run_experiment

from llenergymeasure import run_experiment

Concept

run_experiment is the simplest entry point for a one-off measurement. It accepts a model name, an engine, and optional measurement settings, runs inference against the aienergyscore prompt set (or a dataset you specify), and returns an ExperimentResult containing energy, throughput, FLOPs, and timing data.

Use run_experiment when you want to measure a single model-engine pair. When you need to sweep across multiple models, engines, or parameter axes - or run the same configuration more than once for statistical reliability - use run_study instead. Internally, run_experiment wraps a degenerate StudyConfig containing exactly one experiment and unwraps the result before returning it.
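The wrap-and-unwrap behaviour can be pictured in a few lines. This is an illustrative sketch of the contract only, with hypothetical names (`run_experiment_sketch`, `run_study_fn`, the dict shape), not the library's actual source:

```python
# Illustrative sketch only: run_experiment delegates to the study runner
# with a degenerate one-experiment study and unwraps the lone result.
# Names here are hypothetical stand-ins, not llenergymeasure internals.
def run_experiment_sketch(config, run_study_fn):
    study = {"experiments": [config]}  # degenerate "StudyConfig" analogue
    results = run_study_fn(study)      # study runners return a list of results
    assert len(results) == 1, "a one-experiment study yields exactly one result"
    return results[0]                  # unwrap before returning
```

The point of the sketch: a single measurement and a study share one execution path, so behaviour stays consistent between the two entry points.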


Simple usage

from llenergymeasure import run_experiment

result = run_experiment(model="gpt2", engine="transformers")

print(f"Energy: {result.total_energy_j:.1f} J")
print(f"Throughput: {result.avg_tokens_per_second:.1f} tok/s")
print(f"Efficiency: {result.mj_per_tok_total:.3f} mJ/tok")

The return value is an ExperimentResult. For field definitions and the on-disk JSON schema it mirrors, see Results schema.


Config-from-file usage

result = run_experiment("experiment.yaml")

experiment.yaml:

task:
  model: meta-llama/Llama-3.1-8B
  dataset:
    source: aienergyscore
    n_prompts: 100

engine: transformers

measurement:
  warmup:
    n_warmup: 5
  baseline:
    enabled: true
    strategy: validated
  energy_sampler: auto

Any kwarg you pass alongside a YAML path overrides the corresponding field:

# Load the YAML but force a different model
result = run_experiment("experiment.yaml", model="gpt2")

Parameter table

run_experiment accepts three mutually exclusive call forms:

| Form | First argument | Required kwargs |
| --- | --- | --- |
| YAML path | str or Path to a YAML file | none |
| Config object | ExperimentConfig instance | none |
| kwargs | None (omit) | model= |

Shared keyword arguments

These apply to all three call forms:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| skip_preflight | bool | False | Skip Docker pre-flight checks (GPU visibility, CUDA/driver compatibility). Useful in CI or when using a remote Docker daemon. |
| progress | ProgressCallback \| None | None | Step-by-step progress callback. Receives on_step_start / on_step_done events during the preflight, warmup, and inference phases. |
| output_dir | str \| Path \| None | None | Override the base directory for results. A timestamped study subdirectory is created within this path. None defers to results/ (the built-in default). |
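As an illustration of the progress hook, a minimal callback might simply collect the documented on_step_start / on_step_done events. This is a sketch built only on the two hook names listed above; the real ProgressCallback protocol may pass richer event objects than a plain step name:

```python
# Minimal progress-callback sketch. It assumes only the two event hooks
# named in the table above; the actual ProgressCallback protocol may
# deliver richer event payloads than a bare step name.
class CollectingProgress:
    def __init__(self):
        self.events = []  # (phase, step) tuples, in arrival order

    def on_step_start(self, step):
        self.events.append(("start", step))

    def on_step_done(self, step):
        self.events.append(("done", step))
```

An instance of this would then be passed as `progress=CollectingProgress()` alongside the other keyword arguments.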

kwargs-form-only parameters

Used when config is None (omitted):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | (required) | HuggingFace Hub ID or local path (e.g. "gpt2", "meta-llama/Llama-3.1-8B"). |
| engine | str \| None | None | Inference engine: "transformers", "vllm", or "tensorrt". None falls back to the ExperimentConfig default ("transformers"). |
| n_prompts | int | 100 | Number of prompts to run. Matches the DatasetConfig.n_prompts default. |
| dataset | str | "aienergyscore" | Dataset source: a built-in alias or a path to a .jsonl file. |
| **kwargs | Any | - | Additional ExperimentConfig fields. Task-level fields (max_input_tokens, max_output_tokens, random_seed) and measurement-level fields (energy_sampler, etc.) are routed automatically. |

Returns

ExperimentResult - a frozen Pydantic model containing all measurements. Key fields:

result.total_energy_j # float - total GPU energy in joules
result.energy_adjusted_j # float | None - baseline-subtracted energy
result.avg_tokens_per_second # float - throughput
result.mj_per_tok_total # float | None - millijoules per token (total)
result.mj_per_tok_adjusted # float | None - millijoules per token (baseline-adjusted)
result.total_flops # float - estimated FLOPs (reference)
result.total_inference_time_sec # float - wall time

See ExperimentResult for the full field list.
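The per-token efficiency fields are plain unit conversions of the totals. As a worked illustration (simple arithmetic, not a library call), millijoules per token is total joules scaled to millijoules and divided by the token count:

```python
def mj_per_token(total_energy_j: float, n_tokens: int) -> float:
    """Millijoules per token: joules -> millijoules, divided by token count."""
    return total_energy_j * 1000.0 / n_tokens

def joules_to_wh(energy_j: float) -> float:
    """Joules to watt-hours (1 Wh = 3600 J), handy for reporting."""
    return energy_j / 3600.0
```

For example, 120 J spent over 4000 generated tokens works out to 30 mJ/tok.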


Common patterns

Override a single field from a YAML file

# Use YAML for everything except model - quick model swap
result = run_experiment("base_config.yaml", model="meta-llama/Llama-3.1-8B")

Pass an ExperimentConfig object directly

from llenergymeasure import run_experiment, ExperimentConfig
from llenergymeasure.config.models import TaskConfig, MeasurementConfig

config = ExperimentConfig(
task=TaskConfig(model="gpt2"),
engine="transformers",
)
result = run_experiment(config)

Capture results to a custom directory

result = run_experiment(
model="gpt2",
engine="transformers",
output_dir="/data/experiments/gpt2-baseline",
)

Compare two single measurements

result_a = run_experiment(model="gpt2", engine="transformers")
result_b = run_experiment(model="gpt2-xl", engine="transformers")

delta_pct = (result_b.mj_per_tok_total - result_a.mj_per_tok_total) / result_a.mj_per_tok_total * 100
print(f"gpt2-xl uses {delta_pct:+.1f}% more energy per token than gpt2")

Raises

| Exception | When |
| --- | --- |
| ConfigError | No config argument and no model= kwarg; invalid YAML path; wrong argument type. |
| pydantic.ValidationError | A field value fails validation (e.g. engine is not a known string, n_prompts < 1). Passed through unchanged. |
| PreFlightError | The engine requires Docker but Docker is not available or not running. |
| ExperimentError | The experiment ran but produced no results (e.g. the engine crashed without raising). |

Pitfalls

When to use run_study instead. If you are running the same configuration multiple times to get a stable mean, or sweeping over a parameter axis, use run_study. Calling run_experiment in a loop bypasses study-level gap controls, cycle ordering, circuit-breaker logic, and the result manifest.

Engines run in Docker. Engine libraries are not installed on the host; each engine runs inside its own Docker image. Docker with the NVIDIA Container Toolkit must be available, and the engine image must be present locally or pullable. See Docker setup and engine configuration. When Docker is unavailable, run_experiment raises PreFlightError at preflight, before any inference starts.

kwargs routing is automatic but strict. When using the kwargs form, known TaskConfig fields (max_input_tokens, max_output_tokens, random_seed) and MeasurementConfig fields (energy_sampler, etc.) are routed to the correct sub-model automatically. Any remaining keys are passed to ExperimentConfig directly, and pydantic.ValidationError is raised for keys the config does not define.
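The routing rule can be pictured as a simple partition of the keyword arguments. This is an illustrative sketch using only the field names mentioned on this page, not the library's actual implementation (the real config models define more fields):

```python
# Hypothetical sketch of the kwargs routing described above. The field
# sets below are the examples from this page; the real TaskConfig and
# MeasurementConfig models define additional fields.
TASK_FIELDS = {"max_input_tokens", "max_output_tokens", "random_seed"}
MEASUREMENT_FIELDS = {"energy_sampler"}

def route_kwargs(**kwargs):
    task, measurement, top_level = {}, {}, {}
    for key, value in kwargs.items():
        if key in TASK_FIELDS:
            task[key] = value
        elif key in MEASUREMENT_FIELDS:
            measurement[key] = value
        else:
            top_level[key] = value  # left for ExperimentConfig to validate
    return task, measurement, top_level
```

Under this picture, a flat call such as `run_experiment(model="gpt2", max_output_tokens=128, energy_sampler="auto")` is split into the appropriate sub-configs before validation.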


See also