ExperimentConfig

from llenergymeasure import ExperimentConfig

Concept

ExperimentConfig is the pure scientific specification for a single measurement. It captures everything that defines what is being measured and how it is measured, with no knowledge of studies, sweeps, cycles, or output paths (those live on StudyConfig).

The design is intentional: two ExperimentConfig objects with identical field values always represent the same experiment, regardless of when or how many times they are run. This property powers the deduplication and reproducibility tracking that the study layer builds on top of it.

ExperimentConfig is a Pydantic BaseModel with extra="forbid" and frozen instances (immutable after construction). Validation runs at construction time; invalid configs raise pydantic.ValidationError immediately.
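The two properties above, value-based identity and immutability, can be illustrated with a plain frozen dataclass. This is a simplified stand-in, not the library's actual class:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class MiniConfig:
    """Illustrative stand-in for ExperimentConfig (not the real class)."""
    model: str
    engine: str = "transformers"

a = MiniConfig(model="gpt2")
b = MiniConfig(model="gpt2")

# Identical field values => the "same" experiment.
assert a == b

# Frozen: mutation after construction raises.
try:
    a.engine = "vllm"
except Exception as exc:
    print(type(exc).__name__)  # FrozenInstanceError

# Derive a variant instead of mutating.
c = replace(a, engine="vllm")
```

Pydantic's frozen models behave the same way, with model_copy(update=...) playing the role of replace().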


Construction

From YAML (via loader)

The standard path - the loader handles file I/O, sweep expansion, and returns validated objects ready for execution:

from llenergymeasure.config.loader import load_experiment_config
from pathlib import Path

config = load_experiment_config(path=Path("experiment.yaml"))
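For orientation, a minimal experiment.yaml might look like the following. The field names match the tables below; the specific values are illustrative:

```yaml
task:
  model: gpt2
  max_output_tokens: 128
engine: transformers
measurement:
  warmup:
    n_warmup: 5
  energy_sampler: auto
```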

From kwargs

from llenergymeasure import ExperimentConfig
from llenergymeasure.config.models import TaskConfig

config = ExperimentConfig(
    task=TaskConfig(model="gpt2"),
    engine="transformers",
)

From another config (override pattern)

Pydantic's model_copy(update=...) creates a new instance with selected fields changed. Useful for building a family of configs from a base:

base = ExperimentConfig(task=TaskConfig(model="gpt2"), engine="transformers")

# Derive a variant with a different model
large = base.model_copy(update={"task": base.task.model_copy(update={"model": "gpt2-large"})})

Fields

Top-level fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `task` | `TaskConfig` | (required) | What to measure: model, dataset, token limits, seed. |
| `engine` | `str` | `"transformers"` | Inference engine: `"transformers"`, `"vllm"`, or `"tensorrt"`. |
| `measurement` | `MeasurementConfig` | `MeasurementConfig()` | How to measure: warmup, baseline, energy sampler. |
| `sampling_preset` | `"deterministic" \| "standard" \| "creative" \| "factual" \| None` | `None` | When set, preset values are merged into the active engine's sampling section at parse time. Explicit YAML values take precedence. |
| `transformers` | `TransformersConfig \| None` | `None` | Transformers-specific settings (only used when `engine="transformers"`). |
| `vllm` | `VLLMConfig \| None` | `None` | vLLM-specific settings (only used when `engine="vllm"`). |
| `tensorrt` | `TensorRTConfig \| None` | `None` | TensorRT-LLM settings (only used when `engine="tensorrt"`). |
| `lora` | `LoRAConfig \| None` | `None` | LoRA adapter configuration (adapter Hub ID or local path). |
| `passthrough_kwargs` | `dict[str, Any] \| None` | `None` | Extra kwargs forwarded to the engine at execution time. Keys must not collide with top-level `ExperimentConfig` fields. |
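The sampling_preset merge rule ("explicit YAML values take precedence") is a standard defaults merge. A hypothetical sketch of that behaviour; the preset names come from the table above, but the preset values here are invented for illustration:

```python
# Hypothetical preset table -- the library's actual preset values may differ.
PRESETS = {
    "deterministic": {"temperature": 0.0, "top_p": 1.0},
    "creative": {"temperature": 1.0, "top_p": 0.95},
}

def merge_preset(preset: str, explicit: dict) -> dict:
    """Preset supplies defaults; explicitly configured keys win."""
    merged = dict(PRESETS[preset])
    merged.update(explicit)  # explicit values take precedence
    return merged

sampling = merge_preset("creative", {"temperature": 0.7})
print(sampling)  # {'temperature': 0.7, 'top_p': 0.95}
```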

TaskConfig fields

Nested under task: in YAML, or TaskConfig(...) in Python:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `str` | (required) | HuggingFace Hub ID or local path. |
| `dataset` | `DatasetConfig` | `DatasetConfig()` | Dataset source and prompt count. |
| `max_input_tokens` | `int \| None` | `256` | Truncate input prompts to this many tokens. `None` = no truncation. |
| `max_output_tokens` | `int \| None` | `256` | Maximum generated tokens (`max_new_tokens`). `None` = generate until EOS. |
| `random_seed` | `int` | `42` | Seed for all stochasticity: inference RNG and dataset ordering. |

MeasurementConfig fields

Nested under measurement: in YAML, or MeasurementConfig(...) in Python:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `warmup` | `WarmupConfig` | `WarmupConfig()` | Warmup phase settings (see below). |
| `baseline` | `BaselineConfig` | `BaselineConfig()` | Idle power baseline settings (see below). |
| `energy_sampler` | `"auto" \| "nvml" \| "zeus" \| "codecarbon" \| None` | `"auto"` | Energy measurement sampler. `"auto"` selects the best available (Zeus > NVML > CodeCarbon). `None` disables energy measurement. |

WarmupConfig fields (nested under measurement.warmup)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | `bool` | `True` | Enable warmup phase. |
| `n_warmup` | `int` | `5` | Number of full-length warmup prompts before measurement starts. Minimum 1. |
| `thermal_floor_seconds` | `float` | `60.0` | Minimum seconds to wait after warmup for thermal stabilisation. Minimum 30s enforced. |
| `convergence_detection` | `bool` | `False` | Enable CV-based adaptive convergence detection (additive to `n_warmup`). |
| `cv_threshold` | `float` | `0.05` | CV target for convergence (0.01-0.50, only used when `convergence_detection=True`). |
| `max_prompts` | `int` | `20` | Safety cap on warmup prompts in CV mode. |
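CV-based convergence uses the coefficient of variation (standard deviation divided by mean) over recent measurements. The idea can be sketched as follows; the window size and the exact statistic the library computes are assumptions here:

```python
import statistics

def has_converged(timings: list[float], cv_threshold: float = 0.05, window: int = 5) -> bool:
    """True when the CV (stdev/mean) of the last `window` samples is under threshold."""
    if len(timings) < window:
        return False
    recent = timings[-window:]
    mean = statistics.mean(recent)
    if mean == 0:
        return False
    cv = statistics.stdev(recent) / mean
    return cv < cv_threshold

# Stable timings converge; noisy ones do not.
print(has_converged([1.00, 1.01, 0.99, 1.00, 1.01]))  # True
print(has_converged([1.0, 1.5, 0.7, 1.2, 2.0]))       # False
```

max_prompts caps how many extra warmup prompts such a loop may run before giving up.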

BaselineConfig fields (nested under measurement.baseline)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | `bool` | `True` | Enable idle power baseline measurement. |
| `duration_seconds` | `float` | `30.0` | Baseline measurement duration (5-120s). |
| `strategy` | `"cached" \| "validated" \| "fresh"` | `"validated"` | `"cached"`: disk-persisted with TTL. `"validated"`: cached with periodic spot-checks. `"fresh"`: measure every experiment (most accurate, ~30s overhead per experiment). |
| `cache_ttl_seconds` | `float` | `7200.0` | How long a cached baseline remains valid. Used with `"cached"` or `"validated"`. |
| `validation_interval` | `int` | `5` | Re-validate every N experiments. Used with `"validated"` only. |
| `drift_threshold` | `float` | `0.10` | Power drift fraction to trigger re-measurement. Used with `"validated"` only. |
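The three strategies differ only in when a cached reading is trusted. A sketch of the decision logic behind the "validated" strategy, with simplified explicit inputs (the library tracks these internally; this is not its API):

```python
def needs_remeasure(
    age_seconds: float,
    experiments_since_validation: int,
    observed_drift: float,
    *,
    cache_ttl_seconds: float = 7200.0,
    validation_interval: int = 5,
    drift_threshold: float = 0.10,
) -> bool:
    """'validated' strategy: re-measure on TTL expiry, periodic check, or power drift."""
    if age_seconds > cache_ttl_seconds:
        return True  # cache expired
    if experiments_since_validation >= validation_interval:
        return True  # periodic spot-check due
    if observed_drift > drift_threshold:
        return True  # idle power drifted too far from the cached value
    return False

print(needs_remeasure(100.0, 2, 0.02))  # False -- cached baseline still trusted
print(needs_remeasure(100.0, 2, 0.15))  # True  -- drift exceeds 10%
```

"cached" is this logic without the spot-check and drift branches; "fresh" always re-measures.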

Validation

Pydantic validation runs at construction time. Common errors:

Engine-section mismatch. Providing a transformers: section when engine="vllm" is a configuration error and raises ValidationError. The engine section must match the engine field:

# Raises: transformers: config section provided but engine='vllm'
ExperimentConfig(
    task=TaskConfig(model="gpt2"),
    engine="vllm",
    transformers=TransformersConfig(batch_size=4),  # wrong engine
)

passthrough_kwargs collision. Keys in passthrough_kwargs must not overlap with top-level ExperimentConfig field names:

# Raises: passthrough_kwargs keys collide with ExperimentConfig fields: ['engine']
ExperimentConfig(
    task=TaskConfig(model="gpt2"),
    passthrough_kwargs={"engine": "custom"},  # collides
)
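The collision check itself is a straightforward set intersection between the kwargs and the model's declared field names. A sketch, with the field names taken from the top-level table above:

```python
TOP_LEVEL_FIELDS = {
    "task", "engine", "measurement", "sampling_preset",
    "transformers", "vllm", "tensorrt", "lora", "passthrough_kwargs",
}

def check_passthrough(passthrough_kwargs: dict) -> None:
    """Raise if any passthrough key shadows a top-level config field."""
    collisions = sorted(set(passthrough_kwargs) & TOP_LEVEL_FIELDS)
    if collisions:
        raise ValueError(
            f"passthrough_kwargs keys collide with ExperimentConfig fields: {collisions}"
        )

check_passthrough({"max_batch_tokens": 8192})  # fine
# check_passthrough({"engine": "custom"})      # would raise ValueError
```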

FlashAttention dtype. attn_implementation="flash_attention_2" or "flash_attention_3" requires dtype="float16" or dtype="bfloat16". Combining either with dtype="float32" raises ValidationError.

Engine invariants. The invariants corpus may emit ConfigValidationWarning for configurations the library will silently override (dormant invariants), or raise ValidationError for invalid combinations (e.g. FP8 on A100 with engine="tensorrt").


Common patterns

Programmatic config building

from llenergymeasure import ExperimentConfig
from llenergymeasure.config.models import TaskConfig, MeasurementConfig, WarmupConfig

config = ExperimentConfig(
    task=TaskConfig(
        model="meta-llama/Llama-3.1-8B",
        max_input_tokens=512,
        max_output_tokens=256,
    ),
    engine="transformers",
    measurement=MeasurementConfig(
        warmup=WarmupConfig(n_warmup=10, thermal_floor_seconds=90.0),
        energy_sampler="nvml",
    ),
)

Override pattern - build a family from a base

base = ExperimentConfig(task=TaskConfig(model="gpt2"), engine="transformers")

variants = [
    base.model_copy(update={"task": base.task.model_copy(update={"model": m})})
    for m in ["gpt2", "gpt2-medium", "gpt2-large"]
]

Inspect the config hash

from llenergymeasure.domain.experiment import compute_declared_config_hash

h = compute_declared_config_hash(config)
print(h) # 16-char hex, e.g. "a3f2b19c7e4d0a81"

Two configs with identical fields produce the same hash, regardless of construction order.
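A deterministic, order-independent 16-character hash like this is typically a truncated SHA-256 over a canonical serialisation. A sketch of one common scheme; the library's exact canonicalisation may differ:

```python
import hashlib
import json

def config_hash(config_dict: dict) -> str:
    """Hash a config's field values, independent of key insertion order."""
    canonical = json.dumps(config_dict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

a = config_hash({"engine": "transformers", "task": {"model": "gpt2"}})
b = config_hash({"task": {"model": "gpt2"}, "engine": "transformers"})
assert a == b  # identical fields, different construction order, same hash
```

sort_keys=True is what makes construction order irrelevant: both dicts serialise to the same canonical string.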


See also