
LLenergyMeasure

Measure how implementation choices drive LLM inference efficiency.

A CLI-first framework for energy, throughput, and FLOPs measurement - with extensible, multi-engine introspected parameter spaces, structured sweep logic, experiment deduplication, and sampling.

What is this?

LLenergyMeasure is an open research tool for measuring LLM inference efficiency. It stitches energy samplers (Zeus, CodeCarbon, NVML), inference engines (transformers, vLLM, TensorRT-LLM), and a methodologically rigorous measurement harness into a single coherent pipeline that takes a researcher's spec and runs it. The tool discovers and exposes engine parameters programmatically, then uses invariant mining and deduplication to keep the resulting parameter space tractable for sweeps.

While both the CLI and its underlying Python library can be used for any LLM efficiency research, the primary question the CLI was built to answer is: how do implementation choices downstream of model selection drive LLM serving efficiency?

First measurement

$ llem run --model gpt2 --engine transformers

engine:      transformers
model:       gpt2
energy:      12.43 J
throughput:  47.2 tok/s
duration:    3.18 s

Methodology

  • GPU power sampled via NVML at 100 ms intervals (default sampler) - see the sketch after this list
  • Idle baseline power subtracted via a two-container baseline measurement
  • Warmup convergence required (CV < 0.05) before the measurement window opens
  • Energy reported in joules; reproducibility notes attached to every result
  • Per-engine Docker isolation - each engine's dependency stack is pinned in its own image; runs reproduce bit-identically against the same pinned versions
  • Sampler plugins: NVML, Zeus, CodeCarbon
  • Engine plugins (extensible via plugin protocol; ships with transformers, vLLM, TensorRT-LLM)
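To make the energy arithmetic concrete, here is a minimal sketch in plain Python of how such a pipeline can combine the pieces above: a CV-based warmup gate, idle-baseline subtraction, and integration of power samples over the measurement window. The constants match the defaults listed above (100 ms interval, CV < 0.05); the function names, sample values, and left-Riemann integration choice are illustrative, not LLenergyMeasure's API.

# Illustrative sketch of the measurement arithmetic described above.
# Assumes power samples arrive at a fixed 100 ms interval (e.g. from NVML);
# all names below are illustrative, not LLenergyMeasure's API.
from statistics import mean, pstdev

SAMPLE_INTERVAL_S = 0.100   # 100 ms polling interval (default sampler)
CV_THRESHOLD = 0.05         # warmup counts as converged when CV < 0.05


def coefficient_of_variation(samples_w: list[float]) -> float:
    """CV = std / mean over a window of power samples (watts)."""
    m = mean(samples_w)
    return pstdev(samples_w) / m if m > 0 else float("inf")


def warmup_converged(recent_samples_w: list[float]) -> bool:
    """Open the measurement window only once recent power is stable."""
    return coefficient_of_variation(recent_samples_w) < CV_THRESHOLD


def energy_joules(samples_w: list[float], baseline_w: float) -> float:
    """Integrate (power - idle baseline) over the measurement window.

    Left Riemann sum: each sample is assumed to hold for one interval.
    """
    return sum(max(p - baseline_w, 0.0) for p in samples_w) * SAMPLE_INTERVAL_S


# Toy example: 32 power samples (~3.2 s window), 45 W idle baseline.
baseline_w = 45.0
window = [248.0, 251.5, 249.3, 250.8] * 8
if warmup_converged(window[-10:]):
    print(f"energy: {energy_joules(window, baseline_w):.2f} J")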

Read the full methodology ->

Built for a moving target

The inference stack moves quickly: vLLM, TensorRT-LLM, and SGLang ship new versions on a monthly cadence, each adding tens to hundreds of configuration parameters. Manual curation does not keep up.

LLenergyMeasure handles this through three coupled mechanisms:

  • Programmatic introspection. Each engine's full parameter surface is discovered automatically from Pydantic schemas and class signatures, not from a hand-curated list (see the sketch after this list).
  • Invariant mining. Validation constraints - which parameter combinations the engine rejects, warns on, or silently normalises - are mined automatically from each library's source via AST walking of validator methods and dynamic probing. Mined rules are deduplicated across engines by fingerprint.
  • Renovate-driven refresh. When an engine version bumps upstream, CI re-runs introspection and mining inside the new image, regenerates the schema and invariant artefacts, and posts the diff for review.
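The sketch below illustrates the first two mechanisms on a self-contained toy config class: Pydantic-schema introspection for the parameter surface, and an AST walk over a validator for reject conditions. The class, field names, and helpers are illustrative stand-ins (the real pipelines run against the engine libraries themselves and also use dynamic probing); this is a minimal sketch assuming Pydantic v2, not LLenergyMeasure's API.

# Illustrative sketch of parameter discovery and invariant mining.
# A toy Pydantic model stands in for an engine's config schema so the
# sketch stays self-contained. All names are illustrative. Requires Pydantic v2.
import ast
import inspect
import textwrap

from pydantic import BaseModel, Field, model_validator


class ToyEngineConfig(BaseModel):
    """Stand-in for an engine's config schema."""
    max_num_seqs: int = Field(256, ge=1)
    gpu_memory_utilization: float = Field(0.9, gt=0.0, le=1.0)
    enable_prefix_caching: bool = False

    @model_validator(mode="after")
    def check_memory_headroom(self):
        if self.enable_prefix_caching and self.gpu_memory_utilization > 0.95:
            raise ValueError("prefix caching needs memory headroom")
        return self


def discover_parameters(model_cls: type[BaseModel]) -> dict[str, dict]:
    """Programmatic introspection: parameter surface from the Pydantic schema."""
    return {
        name: {"type": field.annotation, "default": field.default,
               "required": field.is_required()}
        for name, field in model_cls.model_fields.items()
    }


def mine_reject_conditions(model_cls: type) -> list[str]:
    """Invariant mining: AST-walk validator source for conditions that raise."""
    tree = ast.parse(textwrap.dedent(inspect.getsource(model_cls)))
    conditions = []
    for node in ast.walk(tree):
        if isinstance(node, ast.If) and any(
            isinstance(child, ast.Raise) for child in ast.walk(node)
        ):
            conditions.append(ast.unparse(node.test))
    return conditions


print(discover_parameters(ToyEngineConfig))
print(mine_reject_conditions(ToyEngineConfig))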

Together these make sweeps tractable across an otherwise vast parameter space. A study can request thousands of cells; deduplication collapses semantically identical configs (sketched below); the resolved plan is reviewable before any GPU time is spent.
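As an illustration of that deduplication step, the sketch below fingerprints each resolved config by hashing a canonicalised JSON form and keeps one representative per fingerprint; the same idea can deduplicate mined rules across engines. The function names and cell fields are hypothetical, and a real implementation would also fill in engine defaults and normalise parameter aliases before hashing.

# Sketch of config fingerprinting for sweep deduplication. Names are illustrative.
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Stable hash of a config: same semantics -> same fingerprint."""
    canonical = json.dumps(config, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


def deduplicate(configs: list[dict]) -> list[dict]:
    """Collapse semantically identical cells before any GPU time is spent."""
    seen: dict[str, dict] = {}
    for cfg in configs:
        seen.setdefault(fingerprint(cfg), cfg)
    return list(seen.values())


cells = [
    {"engine": "vllm", "batch_size": 8, "dtype": "bf16"},
    {"dtype": "bf16", "engine": "vllm", "batch_size": 8},  # same cell, different key order
    {"engine": "vllm", "batch_size": 16, "dtype": "bf16"},
]
print(len(deduplicate(cells)))  # -> 2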

Engine introspection pipelines ->

What this isn't

  • Inference, not training. Training accounting belongs elsewhere.
  • Efficiency, not capability - pair with lm-evaluation-harness for the capability side.
  • Measurement, not benchmark - pair with MLPerf Inference for standardised inter-model comparison.
  • Outputs inform benchmark design and policy; LLenergyMeasure is not the benchmark itself.

Ecosystem positioning ->

Where to start

Researcher

Design and run a multi-engine implementation-parameter study. Control engine, sampler, and sweep axes from a single YAML config.

Multi-engine study tutorial

Engineer

Run LLenergyMeasure in your CI pipeline with Docker and vLLM. Reproducible containers; no host CUDA dependency.

Run with Docker + vLLM

Serving open-weights models

You run open-weights inference and want to make it more efficient. Apply the same measurement methodology that supports published comparisons to your own serving stack.

Where to start

LLenergyMeasure grew from a master's thesis on LLM energy efficiency by Henry Baker.

BibTeX citation
@software{baker2026llenergymeasure,
  author    = {Baker, Henry C. G.},
  title     = {{LLenergyMeasure}: Measure how implementation choices drive
                LLM inference efficiency},
  year      = {2026},
  version   = {0.9.0},
  url       = {https://github.com/henrycgbaker/llenergymeasure},
  note      = {Pre-1.0 release. See GitHub releases for the current version.
                Multi-engine, methodology-first measurement framework for
                LLM inference efficiency.}
}

Full citation page ->