
import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';

# Quick start

## Run your first measurement

<Tabs>
<TabItem value="cli" label="CLI">

```bash
llem run --model gpt2 --engine transformers
```

</TabItem>
<TabItem value="python" label="Python">

```python
from llenergymeasure import run_experiment

result = run_experiment(model="gpt2", engine="transformers")
print(result)
```

</TabItem>
</Tabs>

On first run, GPT-2 (around 500 MB) is downloaded from Hugging Face and the engine Docker image is resolved. Subsequent runs use the local cache and typically complete in under two minutes.

A progress indicator prints to stderr. When the experiment finishes, a short summary prints to stdout and a structured `result.json` is written under `results/`. Numeric values vary by hardware; the shape is:

```text
Result: gpt2-transformers-bf16-<timestamp>

Energy
  Total       <joules>
  Baseline    <watts>
  Adjusted    <joules - baseline * duration>

Performance
  Throughput  <tokens/sec>
  FLOPs       <estimate>

Timing
  Duration    <wall-clock>
  Warmup      <n prompts excluded>
```
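To work with the structured output programmatically, the newest experiment directory can be located and loaded with the standard library alone. This is a sketch under the `results/<experiment-id>/result.json` layout shown on this page; the keys inside `result.json` are defined by the Results schema, not spelled out here.

```python
import json
from pathlib import Path


def load_latest_result(results_root: str = "results") -> dict:
    """Load result.json from the most recently modified experiment directory.

    Assumes the results/<experiment-id>/result.json layout from the quick
    start; the fields inside the returned dict come from the Results schema.
    """
    root = Path(results_root)
    experiment_dirs = [p for p in root.iterdir() if p.is_dir()]
    latest = max(experiment_dirs, key=lambda p: p.stat().st_mtime)
    with open(latest / "result.json") as f:
        return json.load(f)
```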

## Read the result

| Field | What it means |
|---|---|
| Total (J) | Raw GPU energy across the prompt set |
| Baseline (W) | Idle GPU power measured before the run |
| Adjusted (J) | Total minus Baseline × Duration: the energy attributable to inference |
| Throughput (tok/s) | Output tokens per second across all prompts |
| FLOPs | Estimated floating-point operations (validity check, not a headline metric) |
| Duration | Wall-clock time for the full experiment |
| Warmup | Prompts excluded for thermal stabilisation |

:::tip Use Adjusted for cross-experiment comparisons
Adjusted isolates inference energy from idle-GPU draw. Use it whenever you are comparing two configurations (engine choice, dtype, batch size) so that ambient GPU power does not inflate the difference.
:::
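The adjustment is plain arithmetic: idle power (W) times run duration (s) gives the idle energy (J), which is subtracted from the raw total. A minimal sketch (the function and argument names are illustrative, not part of the library API):

```python
def adjusted_energy_j(total_j: float, baseline_w: float, duration_s: float) -> float:
    """Energy attributable to inference: raw total minus idle draw.

    baseline_w * duration_s converts idle power (W) over the run (s)
    into idle energy (J), which is then subtracted from the raw total.
    """
    return total_j - baseline_w * duration_s


# 1200 J total over a 20 s run with a 40 W idle baseline:
# 1200 - 40 * 20 = 400 J attributable to inference
adjusted = adjusted_energy_j(total_j=1200.0, baseline_w=40.0, duration_s=20.0)
```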

For the full field list in `result.json`, see the Results schema.


## Result file

Results are written to `results/` in your working directory:

```text
results/
└── gpt2-transformers-bf16-<timestamp>/
    └── result.json
```

The experiment ID encodes the model, engine, dtype, and a timestamp.
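When post-processing many runs, it can be handy to recover those components from the directory name. A sketch, assuming the `<model>-<engine>-<dtype>-<timestamp>` shape inferred from the example above (this format is not stated as a stable contract):

```python
def parse_experiment_id(experiment_id: str) -> dict:
    """Split a <model>-<engine>-<dtype>-<timestamp> experiment ID.

    Splits from the right because model names (e.g. Llama-3-8B) may
    themselves contain hyphens. The ID format is inferred from the
    quick-start example, not a documented contract.
    """
    model, engine, dtype, timestamp = experiment_id.rsplit("-", 3)
    return {"model": model, "engine": engine, "dtype": dtype, "timestamp": timestamp}
```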


## What's next