ExperimentConfig

from llenergymeasure import ExperimentConfig

Concept

ExperimentConfig is the pure scientific specification for a single measurement. It captures everything that defines what is being measured and how it is measured, with no knowledge of studies, sweeps, cycles, or output paths (those live on StudyConfig).

The design is intentional: two ExperimentConfig objects with identical field values always represent the same experiment, regardless of when or how many times they are run. This property powers the deduplication and reproducibility tracking that the study layer builds on top of it.

ExperimentConfig is a Pydantic BaseModel with extra="forbid" and frozen instances (immutable after construction). Validation runs at construction time; invalid configs raise pydantic.ValidationError immediately.
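The two properties above, value-based identity and immutability, can be illustrated with a plain frozen dataclass. This is a simplified stand-in, not the library's actual class:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class MiniConfig:
    """Illustrative stand-in for ExperimentConfig (not the real class)."""
    model: str
    engine: str = "transformers"

a = MiniConfig(model="gpt2")
b = MiniConfig(model="gpt2")

# Identical field values => the "same" experiment.
assert a == b

# Frozen: mutation after construction raises.
try:
    a.engine = "vllm"
except Exception as exc:
    print(type(exc).__name__)  # FrozenInstanceError

# Derive a variant instead of mutating.
c = replace(a, engine="vllm")
```

Pydantic's frozen models behave the same way, with model_copy(update=...) playing the role of replace().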


Construction

From YAML (via loader)

The standard path - the loader handles file I/O, sweep expansion, and returns validated objects ready for execution:

from llenergymeasure.config.loader import load_experiment_config
from pathlib import Path

config = load_experiment_config(path=Path("experiment.yaml"))
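For orientation, a minimal experiment.yaml might look like the following. The field names match the tables below; the specific values are illustrative:

```yaml
task:
  model: gpt2
  max_output_tokens: 128
engine: transformers
measurement:
  warmup:
    n_warmup: 5
  energy_sampler: auto
```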

From kwargs

from llenergymeasure import ExperimentConfig
from llenergymeasure.config.models import TaskConfig

config = ExperimentConfig(
    task=TaskConfig(model="gpt2"),
    engine="transformers",
)

From another config (override pattern)

Pydantic's model_copy(update=...) creates a new instance with selected fields changed. Useful for building a family of configs from a base:

base = ExperimentConfig(task=TaskConfig(model="gpt2"), engine="transformers")

# Derive a variant with a different model
large = base.model_copy(update={"task": base.task.model_copy(update={"model": "gpt2-large"})})

Fields

Top-level fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `task` | `TaskConfig` | (required) | What to measure: model, dataset, token limits, seed. |
| `engine` | `str` | `"transformers"` | Inference engine: `"transformers"`, `"vllm"`, or `"tensorrt"`. |
| `measurement` | `MeasurementConfig` | `MeasurementConfig()` | How to measure: warmup, baseline, energy sampler. |
| `sampling_preset` | `"deterministic" \| "standard" \| "creative" \| "factual" \| None` | `None` | When set, preset values are merged into the active engine's sampling section at parse time. Explicit YAML values take precedence. |
| `transformers` | `TransformersConfig \| None` | `None` | Transformers-specific settings (only used when `engine="transformers"`). |
| `vllm` | `VLLMConfig \| None` | `None` | vLLM-specific settings (only used when `engine="vllm"`). |
| `tensorrt` | `TensorRTConfig \| None` | `None` | TensorRT-LLM settings (only used when `engine="tensorrt"`). |
| `lora` | `LoRAConfig \| None` | `None` | LoRA adapter configuration (adapter Hub ID or local path). |
| `passthrough_kwargs` | `dict[str, Any] \| None` | `None` | Extra kwargs forwarded to the engine at execution time. Keys must not collide with top-level `ExperimentConfig` fields. |
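The sampling_preset merge rule ("explicit YAML values take precedence") is a standard defaults merge. A hypothetical sketch of that behaviour; the preset names come from the table above, but the preset values here are invented for illustration:

```python
# Hypothetical preset table -- the library's actual preset values may differ.
PRESETS = {
    "deterministic": {"temperature": 0.0, "top_p": 1.0},
    "creative": {"temperature": 1.0, "top_p": 0.95},
}

def merge_preset(preset: str, explicit: dict) -> dict:
    """Preset supplies defaults; explicitly configured keys win."""
    merged = dict(PRESETS[preset])
    merged.update(explicit)  # explicit values take precedence
    return merged

sampling = merge_preset("creative", {"temperature": 0.7})
print(sampling)  # {'temperature': 0.7, 'top_p': 0.95}
```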

TaskConfig fields

Nested under task: in YAML, or TaskConfig(...) in Python:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `str` | (required) | HuggingFace Hub ID or local path. |
| `dataset` | `DatasetConfig` | `DatasetConfig()` | Dataset source and prompt count. |
| `max_input_tokens` | `int \| None` | `256` | Truncate input prompts to this many tokens. `None` = no truncation. |
| `max_output_tokens` | `int \| None` | `256` | Maximum generated tokens (`max_new_tokens`). `None` = generate until EOS. |
| `random_seed` | `int` | `42` | Seed for all stochasticity: inference RNG and dataset ordering. |

MeasurementConfig fields

Nested under measurement: in YAML, or MeasurementConfig(...) in Python:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `warmup` | `WarmupConfig` | `WarmupConfig()` | Warmup phase settings (see below). |
| `baseline` | `BaselineConfig` | `BaselineConfig()` | Idle power baseline settings (see below). |
| `energy_sampler` | `"auto" \| "nvml" \| "zeus" \| "codecarbon" \| None` | `"auto"` | Energy measurement sampler. `"auto"` selects the best available (Zeus > NVML > CodeCarbon). `None` disables energy measurement. |

WarmupConfig fields (nested under measurement.warmup)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | `bool` | `True` | Enable warmup phase. |
| `n_warmup` | `int` | `5` | Number of full-length warmup prompts before measurement starts. Minimum 1. |
| `thermal_floor_seconds` | `float` | `60.0` | Minimum seconds to wait after warmup for thermal stabilisation. Minimum 30s enforced. |
| `convergence_detection` | `bool` | `False` | Enable CV-based adaptive convergence detection (additive to `n_warmup`). |
| `cv_threshold` | `float` | `0.05` | CV target for convergence (0.01-0.50, only used when `convergence_detection=True`). |
| `max_prompts` | `int` | `20` | Safety cap on warmup prompts in CV mode. |
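CV-based convergence uses the coefficient of variation (standard deviation divided by mean) over recent measurements. The idea can be sketched as follows; the window size and the exact statistic the library computes are assumptions here:

```python
import statistics

def has_converged(timings: list[float], cv_threshold: float = 0.05, window: int = 5) -> bool:
    """True when the CV (stdev/mean) of the last `window` samples is under threshold."""
    if len(timings) < window:
        return False
    recent = timings[-window:]
    mean = statistics.mean(recent)
    if mean == 0:
        return False
    cv = statistics.stdev(recent) / mean
    return cv < cv_threshold

# Stable timings converge; noisy ones do not.
print(has_converged([1.00, 1.01, 0.99, 1.00, 1.01]))  # True
print(has_converged([1.0, 1.5, 0.7, 1.2, 2.0]))       # False
```

max_prompts caps how many extra warmup prompts such a loop may run before giving up.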

BaselineConfig fields (nested under measurement.baseline)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | `bool` | `True` | Enable idle power baseline measurement. |
| `duration_seconds` | `float` | `30.0` | Baseline measurement duration (5-120s). |
| `strategy` | `"cached" \| "validated" \| "fresh"` | `"validated"` | `"cached"`: disk-persisted with TTL. `"validated"`: cached with periodic spot-checks. `"fresh"`: measure every experiment (most accurate, ~30s overhead per experiment). |
| `cache_ttl_seconds` | `float` | `7200.0` | How long a cached baseline remains valid. Used with `"cached"` or `"validated"`. |
| `validation_interval` | `int` | `5` | Re-validate every N experiments. Used with `"validated"` only. |
| `drift_threshold` | `float` | `0.10` | Power drift fraction to trigger re-measurement. Used with `"validated"` only. |
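The three strategies differ only in when a cached reading is trusted. A sketch of the decision logic behind the "validated" strategy, with simplified explicit inputs (the library tracks these internally; this is not its API):

```python
def needs_remeasure(
    age_seconds: float,
    experiments_since_validation: int,
    observed_drift: float,
    *,
    cache_ttl_seconds: float = 7200.0,
    validation_interval: int = 5,
    drift_threshold: float = 0.10,
) -> bool:
    """'validated' strategy: re-measure on TTL expiry, periodic check, or power drift."""
    if age_seconds > cache_ttl_seconds:
        return True  # cache expired
    if experiments_since_validation >= validation_interval:
        return True  # periodic spot-check due
    if observed_drift > drift_threshold:
        return True  # idle power drifted too far from the cached value
    return False

print(needs_remeasure(100.0, 2, 0.02))  # False -- cached baseline still trusted
print(needs_remeasure(100.0, 2, 0.15))  # True  -- drift exceeds 10%
```

"cached" is this logic without the spot-check and drift branches; "fresh" always re-measures.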

Validation

Pydantic validation runs at construction time. Common errors:

Engine-section mismatch. Providing a transformers: section when engine="vllm" is a configuration error and raises ValidationError. The engine section must match the engine field:

# Raises: transformers: config section provided but engine='vllm'
ExperimentConfig(
    task=TaskConfig(model="gpt2"),
    engine="vllm",
    transformers=TransformersConfig(batch_size=4),  # wrong engine
)

passthrough_kwargs collision. Keys in passthrough_kwargs must not overlap with top-level ExperimentConfig field names:

# Raises: passthrough_kwargs keys collide with ExperimentConfig fields: ['engine']
ExperimentConfig(
    task=TaskConfig(model="gpt2"),
    passthrough_kwargs={"engine": "custom"},  # collides
)
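The collision check itself is a straightforward set intersection between the kwargs and the model's declared field names. A sketch, with the field names taken from the top-level table above:

```python
TOP_LEVEL_FIELDS = {
    "task", "engine", "measurement", "sampling_preset",
    "transformers", "vllm", "tensorrt", "lora", "passthrough_kwargs",
}

def check_passthrough(passthrough_kwargs: dict) -> None:
    """Raise if any passthrough key shadows a top-level config field."""
    collisions = sorted(set(passthrough_kwargs) & TOP_LEVEL_FIELDS)
    if collisions:
        raise ValueError(
            f"passthrough_kwargs keys collide with ExperimentConfig fields: {collisions}"
        )

check_passthrough({"max_batch_tokens": 8192})  # fine
# check_passthrough({"engine": "custom"})      # would raise ValueError
```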

FlashAttention dtype. attn_implementation="flash_attention_2" or "flash_attention_3" requires dtype="float16" or dtype="bfloat16". Combining either with dtype="float32" raises ValidationError.

Engine invariants. The invariants corpus may emit ConfigValidationWarning for configurations the library will silently override (dormant invariants), or raise ValidationError for invalid combinations (e.g. FP8 on A100 with engine="tensorrt").


Common patterns

Programmatic config building

from llenergymeasure import ExperimentConfig
from llenergymeasure.config.models import TaskConfig, MeasurementConfig, WarmupConfig

config = ExperimentConfig(
    task=TaskConfig(
        model="meta-llama/Llama-3.1-8B",
        max_input_tokens=512,
        max_output_tokens=256,
    ),
    engine="transformers",
    measurement=MeasurementConfig(
        warmup=WarmupConfig(n_warmup=10, thermal_floor_seconds=90.0),
        energy_sampler="nvml",
    ),
)

Override pattern - build a family from a base

base = ExperimentConfig(task=TaskConfig(model="gpt2"), engine="transformers")

variants = [
    base.model_copy(update={"task": base.task.model_copy(update={"model": m})})
    for m in ["gpt2", "gpt2-medium", "gpt2-large"]
]

Inspect the config hash

from llenergymeasure.domain.experiment import compute_declared_config_hash

h = compute_declared_config_hash(config)
print(h) # 16-char hex, e.g. "a3f2b19c7e4d0a81"

Two configs with identical fields produce the same hash, regardless of construction order.
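A deterministic, order-independent 16-character hash like this is typically a truncated SHA-256 over a canonical serialisation. A sketch of one common scheme; the library's exact canonicalisation may differ:

```python
import hashlib
import json

def config_hash(config_dict: dict) -> str:
    """Hash a config's field values, independent of key insertion order."""
    canonical = json.dumps(config_dict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

a = config_hash({"engine": "transformers", "task": {"model": "gpt2"}})
b = config_hash({"task": {"model": "gpt2"}, "engine": "transformers"})
assert a == b  # identical fields, different construction order, same hash
```

sort_keys=True is what makes construction order irrelevant: both dicts serialise to the same canonical string.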


See also