run_experiment

from llenergymeasure import run_experiment

Concept

run_experiment is the simplest entry point for a one-off measurement. It accepts a model name, an engine, and optional measurement settings, runs inference against the aienergyscore prompt set (or a dataset you specify), and returns an ExperimentResult containing energy, throughput, FLOPs, and timing data.

Use run_experiment when you want to measure a single model-engine pair. When you need to sweep across multiple models, engines, or parameter axes - or run the same configuration more than once for statistical reliability - use run_study instead. Internally, run_experiment wraps a degenerate StudyConfig containing exactly one experiment and unwraps the result before returning it.
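The wrap-and-unwrap behaviour can be pictured in a few lines. This is an illustrative sketch of the contract only, with hypothetical names (`run_experiment_sketch`, `run_study_fn`, the dict shape), not the library's actual source:

```python
# Illustrative sketch only: run_experiment delegates to the study runner
# with a degenerate one-experiment study and unwraps the lone result.
# Names here are hypothetical stand-ins, not llenergymeasure internals.
def run_experiment_sketch(config, run_study_fn):
    study = {"experiments": [config]}  # degenerate "StudyConfig" analogue
    results = run_study_fn(study)      # study runners return a list of results
    assert len(results) == 1, "a one-experiment study yields exactly one result"
    return results[0]                  # unwrap before returning
```

The point of the sketch: a single measurement and a study share one execution path, so behaviour stays consistent between the two entry points.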


Simple usage

from llenergymeasure import run_experiment

result = run_experiment(model="gpt2", engine="transformers")

print(f"Energy: {result.total_energy_j:.1f} J")
print(f"Throughput: {result.avg_tokens_per_second:.1f} tok/s")
print(f"Efficiency: {result.mj_per_tok_total:.3f} mJ/tok")

The return value is an ExperimentResult. For field definitions and the on-disk JSON schema it mirrors, see Results schema.


Config-from-file usage

result = run_experiment("experiment.yaml")

experiment.yaml:

task:
  model: meta-llama/Llama-3.1-8B
  dataset:
    source: aienergyscore
    n_prompts: 100

engine: transformers

measurement:
  warmup:
    n_warmup: 5
  baseline:
    enabled: true
    strategy: validated
  energy_sampler: auto

Any kwarg you pass alongside a YAML path overrides the corresponding field:

# Load the YAML but force a different model
result = run_experiment("experiment.yaml", model="gpt2")

Parameter table

run_experiment accepts three mutually exclusive call forms:

| Form | First argument | Required kwargs |
| --- | --- | --- |
| YAML path | str or Path to a YAML file | none |
| Config object | ExperimentConfig instance | none |
| kwargs | None (omit) | model= |

Shared keyword arguments

These apply to all three call forms:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| skip_preflight | bool | False | Skip Docker pre-flight checks (GPU visibility, CUDA/driver compatibility). Useful in CI or when using a remote Docker daemon. |
| progress | ProgressCallback \| None | None | Step-by-step progress callback. Receives on_step_start / on_step_done events during the preflight, warmup, and inference phases. |
| output_dir | str \| Path \| None | None | Override the base directory for results. A timestamped study subdirectory is created within this path. None defers to results/ (the built-in default). |
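As an illustration of the progress hook, a minimal callback might simply collect the documented on_step_start / on_step_done events. This is a sketch built only on the two hook names listed above; the real ProgressCallback protocol may pass richer event objects than a plain step name:

```python
# Minimal progress-callback sketch. It assumes only the two event hooks
# named in the table above; the actual ProgressCallback protocol may
# deliver richer event payloads than a bare step name.
class CollectingProgress:
    def __init__(self):
        self.events = []  # (phase, step) tuples, in arrival order

    def on_step_start(self, step):
        self.events.append(("start", step))

    def on_step_done(self, step):
        self.events.append(("done", step))
```

An instance of this would then be passed as `progress=CollectingProgress()` alongside the other keyword arguments.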

kwargs-form-only parameters

Used when config is None (omitted):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | (required) | HuggingFace Hub ID or local path (e.g. "gpt2", "meta-llama/Llama-3.1-8B"). |
| engine | str \| None | None | Inference engine: "transformers", "vllm", or "tensorrt". None falls back to the ExperimentConfig default ("transformers"). |
| n_prompts | int | 100 | Number of prompts to run. Matches the DatasetConfig.n_prompts default. |
| dataset | str | "aienergyscore" | Dataset source: a built-in alias or a path to a .jsonl file. |
| **kwargs | Any | - | Additional ExperimentConfig fields. Task-level fields (max_input_tokens, max_output_tokens, random_seed) and measurement-level fields (energy_sampler, etc.) are routed automatically. |

Returns

ExperimentResult - a frozen Pydantic model containing all measurements. Key fields:

result.total_energy_j # float - total GPU energy in joules
result.energy_adjusted_j # float | None - baseline-subtracted energy
result.avg_tokens_per_second # float - throughput
result.mj_per_tok_total # float | None - millijoules per token (total)
result.mj_per_tok_adjusted # float | None - millijoules per token (baseline-adjusted)
result.total_flops # float - estimated FLOPs (reference)
result.total_inference_time_sec # float - wall time

See ExperimentResult for the full field list.
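The per-token efficiency fields are plain unit conversions of the totals. As a worked illustration (simple arithmetic, not a library call), millijoules per token is total joules scaled to millijoules and divided by the token count:

```python
def mj_per_token(total_energy_j: float, n_tokens: int) -> float:
    """Millijoules per token: joules -> millijoules, divided by token count."""
    return total_energy_j * 1000.0 / n_tokens

def joules_to_wh(energy_j: float) -> float:
    """Joules to watt-hours (1 Wh = 3600 J), handy for reporting."""
    return energy_j / 3600.0
```

For example, 120 J spent over 4000 generated tokens works out to 30 mJ/tok.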


Common patterns

Override a single field from a YAML file

# Use YAML for everything except model - quick model swap
result = run_experiment("base_config.yaml", model="meta-llama/Llama-3.1-8B")

Pass an ExperimentConfig object directly

from llenergymeasure import run_experiment, ExperimentConfig
from llenergymeasure.config.models import TaskConfig, MeasurementConfig

config = ExperimentConfig(
task=TaskConfig(model="gpt2"),
engine="transformers",
)
result = run_experiment(config)

Capture results to a custom directory

result = run_experiment(
model="gpt2",
engine="transformers",
output_dir="/data/experiments/gpt2-baseline",
)

Compare two single measurements

result_a = run_experiment(model="gpt2", engine="transformers")
result_b = run_experiment(model="gpt2-xl", engine="transformers")

delta_pct = (result_b.mj_per_tok_total - result_a.mj_per_tok_total) / result_a.mj_per_tok_total * 100
print(f"gpt2-xl uses {delta_pct:+.1f}% more energy per token than gpt2")

Raises

| Exception | When |
| --- | --- |
| ConfigError | No config argument and no model= kwarg; invalid YAML path; wrong argument type. |
| pydantic.ValidationError | A field value fails validation (e.g. engine is not a known string, n_prompts < 1). Passed through unchanged. |
| PreFlightError | The engine requires Docker but Docker is not available or not running. |
| ExperimentError | The experiment ran but produced no results (e.g. the engine crashed without raising). |

Pitfalls

When to use run_study instead. If you are running the same configuration multiple times to get a stable mean, or sweeping over a parameter axis, use run_study. Calling run_experiment in a loop bypasses study-level gap controls, cycle ordering, circuit-breaker logic, and the result manifest.

Engines run in Docker. Engine libraries are not installed on the host; each engine runs inside its own Docker image. Docker with the NVIDIA Container Toolkit must be available, and the engine image must be present locally or pullable. See Docker setup and engine configuration. When Docker is unavailable, run_experiment raises PreFlightError at preflight, before any inference starts.

kwargs routing is automatic but strict. When using the kwargs form, known TaskConfig fields (max_input_tokens, max_output_tokens, random_seed) and MeasurementConfig fields (energy_sampler, etc.) are routed to the correct sub-model automatically. Any remaining keys are passed to ExperimentConfig directly, and pydantic.ValidationError is raised for keys the config does not define.
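The routing rule can be pictured as a simple partition of the keyword arguments. This is an illustrative sketch using only the field names mentioned on this page, not the library's actual implementation (the real config models define more fields):

```python
# Hypothetical sketch of the kwargs routing described above. The field
# sets below are the examples from this page; the real TaskConfig and
# MeasurementConfig models define additional fields.
TASK_FIELDS = {"max_input_tokens", "max_output_tokens", "random_seed"}
MEASUREMENT_FIELDS = {"energy_sampler"}

def route_kwargs(**kwargs):
    task, measurement, top_level = {}, {}, {}
    for key, value in kwargs.items():
        if key in TASK_FIELDS:
            task[key] = value
        elif key in MEASUREMENT_FIELDS:
            measurement[key] = value
        else:
            top_level[key] = value  # left for ExperimentConfig to validate
    return task, measurement, top_level
```

Under this picture, a flat call such as `run_experiment(model="gpt2", max_output_tokens=128, energy_sampler="auto")` is split into the appropriate sub-configs before validation.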


See also