
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Your first measurement

In the next five minutes you'll install llenergymeasure, run a measurement on GPT-2, and read the result file. By the end you'll know:

- How llem is structured (host orchestrator + per-engine Docker images)
- What a single experiment measures (energy, throughput, FLOPs)
- Where results are written and how to read them

This is a tutorial - guided and linear. For goal-driven recipes (e.g. "how do I run with vLLM?"), see How-to.

## Prerequisites

- llenergymeasure installed - see How to install
- Docker + NVIDIA Container Toolkit - every engine, including Transformers, runs inside a per-engine Docker image
- An NVIDIA GPU available

## Step 1: Verify your environment

```bash
llem config
```

Check that the output shows your GPU detected and an energy sampler selected. Engines will show as "not installed" on the host; that is expected, since they run inside Docker. See the Docker setup how-to if any of the pre-flight checks fail.

## Step 2: Run your first experiment

<Tabs>
<TabItem value="cli" label="CLI">

```bash
llem run --model gpt2 -e transformers
```

</TabItem>
<TabItem value="python" label="Python">

```python
from llenergymeasure import run_experiment

result = run_experiment(model="gpt2", engine="transformers")
```

</TabItem>
</Tabs>

This runs GPT-2 (124M parameters). On first run, the model downloads from HuggingFace (~500 MB). Subsequent runs use the cache.

Default settings: 100 prompts, aienergyscore dataset, bfloat16 dtype.

You'll see a progress indicator on stderr, then results printed to stdout:

```text
Result: gpt2_20260507_143208          # unique experiment ID

Energy                                # GPU energy consumed
  Total       847 J                   # total joules for all 100 prompts
  Baseline    12.3 W                  # idle GPU power (subtracted from total)
  Adjusted    723 J                   # energy minus baseline x duration

Performance                           # throughput and compute
  Throughput  312 tok/s               # output tokens per second
  FLOPs       4.21e+11                # estimated from architecture

Timing                                # wall-clock time
  Duration    1m 38s                  # total experiment wall time
  Warmup      5 prompts excluded      # thermal stabilisation, not in metrics
```

## Step 3: Read the results

Each field maps to a measurement decision:

| Field | What it measures |
| --- | --- |
| Total (J) | Raw GPU energy consumed during the experiment |
| Baseline (W) | Idle GPU power measured before the run |
| Adjusted (J) | Total minus Baseline × Duration, i.e. net inference energy |
| Throughput (tok/s) | Output tokens generated per second across all prompts |
| FLOPs | Estimated floating-point operations (method and confidence shown) |
| Duration | Wall-clock time for the full experiment |
| Warmup | Prompts run for thermal stabilisation, excluded from metrics |
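For intuition on the FLOPs line: a common back-of-envelope estimate for dense decoder models (not necessarily the estimator llem uses internally) is roughly 2 FLOPs per parameter per generated token:

```python
def approx_forward_flops(n_params: float, n_tokens: int) -> float:
    """Rule-of-thumb forward-pass cost: ~2 FLOPs per parameter per token."""
    return 2.0 * n_params * n_tokens

# GPT-2 small has ~124M parameters; 1000 generated tokens as an example.
print(f"{approx_forward_flops(124e6, 1000):.2e}")  # -> 2.48e+11
```

This is only a sanity check; the result file reports the estimation method and its confidence alongside the number.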

Why subtract baseline? Because you're measuring inference, not "GPU plugged in." The full reasoning is on the methodology page and the energy-measurement explanation.
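The adjustment itself is simple arithmetic. A minimal sketch with made-up illustrative numbers (not the values from the run above):

```python
def adjusted_energy(total_j: float, baseline_w: float, duration_s: float) -> float:
    """Net inference energy: total energy minus idle draw over the run."""
    return total_j - baseline_w * duration_s

# Hypothetical values: 900 J total, 15 W idle baseline, 20 s duration.
print(adjusted_energy(900.0, 15.0, 20.0))  # -> 600.0
```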

## Step 4: Inspect the output files

Results are written to `results/` by default:

```text
results/
└── gpt2_20260507_143208/
    └── result.json        # full record (all metrics, config, metadata)
```

The JSON file is the scientific record - all raw metrics, the resolved config, timestamps, and any measurement warnings. See How to interpret results for a walkthrough.
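Because the record is plain JSON, you can load it with the standard library. A minimal sketch; the top-level keys vary by version, so inspect your own file rather than assuming a schema:

```python
import json
from pathlib import Path

def load_result(result_dir: str) -> dict:
    """Load the full experiment record written by llem run."""
    return json.loads(Path(result_dir, "result.json").read_text())

# e.g. record = load_result("results/gpt2_20260507_143208")
# Explore the schema by printing the keys: print(sorted(record))
```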

Specify a different output directory with `--output`:

```bash
llem run --model gpt2 -e transformers --output /data/experiments
```

:::tip Reproducibility
Keep the `result.json` and `effective_config.json` together. The config file records every resolved parameter value - including engine defaults you didn't set - so you can reproduce the exact run later.
:::

## What you've learned

- `llem run` runs one experiment end-to-end, writes a `result.json`, and prints a human-readable summary.
- Even Transformers runs go through a Docker image, which keeps the measurement environment reproducible.
- Energy is measured and baseline-adjusted; both numbers ship in the result file, so you can pick the framing your study needs.

## Where to go next