Pipeline Architecture

This doc is the chain-diagram reference for the engine-coupling pipeline. SSOT-driven Renovate cycles flow through these stages: trigger, two per-concern CI workflows, validated artefacts, and the human curation checkpoint before merge.

Asymmetric engine architecture (locked design choice)

The three engines run different pipelines in CI for a load-bearing reason. Don't undo this asymmetry without re-reading #518; the conclusion has held across re-litigations on 2026-04-30, 2026-05-01, and 2026-05-05.

Engine	Image source	CI flow on PR
transformers	First-party `docker/Dockerfile.transformers` (FA3-included; no upstream provides this)	`engine-pipeline :: build-transformers` (rebuild) → `engine-pipeline :: invariants-transformers + schemas-transformers` (probe + mine/introspect) → [merge] → `publish-engine-image` (mirror to production tag)
vllm	Upstream `vllm/vllm-openai:v<VER>` directly + bind-mount llem source	`engine-pipeline :: invariants-others + schemas-others` matrix cells fire on `pull_request: paths` (no first-party build)
tensorrt	Upstream `nvcr.io/nvidia/tensorrt-llm/release:<VER>` directly + bind-mount llem source	Same shape as vllm

Why asymmetric. vllm + tensorrt's upstream images empirically contain everything llem needs at runtime (PoC verified 2026-04-30: pydantic, typer, pyarrow, rich, dotenv, pyyaml all present transitively). Transformers' upstream images don't include FA3, which is non-negotiable for production-equivalent CI runs. So transformers gets a first-party Dockerfile; the others stay upstream-direct.
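As a rough illustration of the table above, the per-engine image choice can be written as a lookup. This is a minimal sketch, not code from the repo: `IMAGE_TEMPLATES`, `FIRST_PARTY`, and `ci_image` are hypothetical names, and the version strings passed in are placeholders.

```python
# Hypothetical helper; names and signature are illustrative, not from the repo.
IMAGE_TEMPLATES = {
    # First-party cache image built by build-transformers (FA3 included).
    "transformers": "ghcr.io/{repo}/transformers-cache:transformers-{version}",
    # Upstream images used directly, with llem source bind-mounted.
    "vllm": "vllm/vllm-openai:v{version}",
    "tensorrt": "nvcr.io/nvidia/tensorrt-llm/release:{version}",
}
FIRST_PARTY = {"transformers"}


def ci_image(engine: str, version: str, repo: str = "<repo>") -> tuple[str, bool]:
    """Return (image reference, needs_first_party_build) for a PR-time run."""
    template = IMAGE_TEMPLATES[engine]
    return template.format(repo=repo, version=version), engine in FIRST_PARTY
```

Only the transformers entry reports that a first-party build is needed; the other two resolve straight to an upstream tag.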

Drift safety. The only argument for first-party-everywhere is "what if upstream drops a transitive dep llem needs?" If that happens, the migration cost from upstream-direct to first-party is bounded (~1 day, well-defined recipe per #518). Running first-party-everywhere, by contrast, means paying for the FA3 build on two extra engines that don't need it.

Transformers PR-time CI flow (rebuild + probe/mine/introspect chain)

PR trigger. engine-pipeline.yml fires when a PR touches any of the path-filter inputs: engine_versions/transformers.yaml, docker/Dockerfile.transformers, or .github/workflows/engine-pipeline.yml.
build-transformers. Builds the transformers runtime image. Cache hits land in ~10-15 min; cold FA3 builds take ~60-90 min. Pushes to ghcr.io/<repo>/transformers-cache:transformers-<VER>.
invariants-transformers + schemas-transformers cells. Orchestrator's needs: graph fires these on build success. Each cell pulls the transformers-cache image, runs probe -> mine/introspect -> validate, and uploads a writeback artefact.
Probe + CI verdict. A probe failure turns CI red. The accept-probe-fail PR label bypasses the gate for known-drift cases (admin escalation; see #547).
publish-engine-image.yml. Fires directly on push to main (no rebuild). Tag-copy via docker buildx imagetools create: transformers-cache:transformers-<VER> -> transformers:transformers-<VER> and transformers:latest. Registry-side metadata op only; seconds, no build infrastructure. The production image is bit-identical to the cache image that CI validated on the PR.
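The tag-copy in the publish step can be sketched as a single `docker buildx imagetools create` invocation. This is an assumption-laden sketch: `promote_command` is a hypothetical helper, the production image is assumed to live under the same `ghcr.io/<repo>/` prefix as the cache image, and the real workflow may issue one command per tag rather than one command with two `--tag` flags.

```python
def promote_command(repo: str, version: str) -> list[str]:
    """Build the argv for a registry-side tag copy (no rebuild, no pull)."""
    source = f"ghcr.io/{repo}/transformers-cache:transformers-{version}"
    targets = [
        # Assumed production location: same registry prefix as the cache image.
        f"ghcr.io/{repo}/transformers:transformers-{version}",
        f"ghcr.io/{repo}/transformers:latest",
    ]
    argv = ["docker", "buildx", "imagetools", "create"]
    for target in targets:
        argv += ["--tag", target]
    argv.append(source)  # the image that CI validated on the PR
    return argv
```

Because the command only creates new tags for the existing image, the promoted image keeps the content that CI validated.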

vllm + tensorrt PR-time CI flow (no rebuild; upstream-direct)

The diagram below applies to vllm + tensorrt only - engine-pipeline.yml's invariants-others + schemas-others matrix cells fire on pull_request: paths (no build-transformers dependency). They pull the upstream image at the SSOT-pinned version, bind-mount llem source, and probe/mine/introspect inside the upstream container.

Pipeline shape: Renovate -> per-concern workflows -> writeback -> human curation

The vllm + tensorrt cycle uses two per-concern workflow cells (engine-invariants + engine-schemas) coordinating via sibling-wait. Each cell runs its own probe -> producer -> diff -> comment + label sequence; the last-finishing cell performs an atomic writeback. Cross-pipeline rollup state lives on PR labels.

The diagram captures the high-level flow; per-step detail follows below.

Trigger contract

Renovate. Scans upstream library releases on the configured schedule. Custom regex manager bumps two file targets together: engine_versions/{engine}.yaml:library.current_version (the SSOT, canonical) and docker/Dockerfile.{engine} ARG (derived, auto-templated from SSOT).
Path-filtered fan-out. When Renovate's PR opens, paths-filter routes the change to two workflows in parallel: the engine-invariants pipeline and the engine-schemas pipeline.
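The fan-out described above can be approximated in a few lines. This is a sketch under stated assumptions: the real routing is done by workflow path filters, not Python, and `engines_touched` and `fan_out` are hypothetical names used only for illustration.

```python
import re

ENGINES = ("transformers", "vllm", "tensorrt")
CONCERNS = ("invariants", "schemas")

# The two Renovate bump targets named above: the SSOT file and the Dockerfile.
_BUMP_TARGET = re.compile(
    r"^(?:engine_versions/(?P<ssot>[a-z]+)\.yaml|docker/Dockerfile\.(?P<docker>[a-z]+))$"
)


def engines_touched(changed_files):
    """Return the sorted engines whose bump targets appear in the change set."""
    hit = set()
    for path in changed_files:
        match = _BUMP_TARGET.match(path)
        if match:
            engine = match.group("ssot") or match.group("docker")
            if engine in ENGINES:
                hit.add(engine)
    return sorted(hit)


def fan_out(changed_files):
    """Return the (engine, concern) cells that would run, one per concern."""
    return [(e, c) for e in engines_touched(changed_files) for c in CONCERNS]
```

A change that touches only `engine_versions/vllm.yaml` yields two cells, one for each concern.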

engine-invariants cell (per-engine matrix)

Layers over: invariant-miner + invalidity-miner + lift modules + validation-CI gate.

PROBE - inline python -m scripts._probe --producer invariants; verdict pass or fail.
MINE (only if probe passes) - build_corpus.py writes src/llenergymeasure/engines/{engine}/invariants.proposed.yaml.
VALIDATE-REPLAY - validate_invariants.py plus the compare_expected_vs_observed contract from _invariant_validation_common.py. Replays kwargs_positive + kwargs_negative against the live library; classifies outcomes (positive_confirmed, negative_confirmed, divergence). Writes src/llenergymeasure/engines/{engine}/invariants.validated.yaml.
DIFF vs HEAD for both proposed.yaml and validated.yaml artefacts.
REGENERATE docs/reference/engines/invariants-{engine}.md (Invariants section - fact base, encompasses dormancy + invalidity + miner output + introspection + runtime catch-all).
COMMENT + LABEL (suppress on empty).
Probe-fail branch - same 3-route handling as the schemas pipeline below; apply probe-blocked label; exit 0 (not a CI failure).
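The VALIDATE-REPLAY classification can be illustrated with a toy replay loop. This is not the `compare_expected_vs_observed` contract itself, whose signature is not shown here; `classify_replay` and `replay` are hypothetical, and the sketch assumes "positive" means the library accepts the kwargs and "negative" means it raises.

```python
def classify_replay(expected_valid: bool, observed_valid: bool) -> str:
    """Map an expected/observed pair onto the three outcome labels."""
    if expected_valid != observed_valid:
        return "divergence"
    return "positive_confirmed" if expected_valid else "negative_confirmed"


def replay(construct, kwargs_positive, kwargs_negative):
    """Call `construct` with each kwargs set and classify what happened.

    `construct` stands in for the live-library entry point (assumption).
    """
    results = []
    for expected_valid, cases in ((True, kwargs_positive), (False, kwargs_negative)):
        for kwargs in cases:
            try:
                construct(**kwargs)
                observed_valid = True
            except Exception:
                # Any raised exception counts as a rejection in this sketch.
                observed_valid = False
            results.append(classify_replay(expected_valid, observed_valid))
    return results
```

A negative case that the library now accepts, or a positive case that it now rejects, comes back as `divergence`.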

engine-schemas cell (per-engine matrix)

Layers over: parameter-discovery + typed-schema-discovery.

PROBE - inline python -m scripts._probe --producer schemas; verdict pass or fail.
DISCOVER (only if probe passes) - engine_introspectors writes src/llenergymeasure/config/discovered_schemas/{engine}/schema.discovered.json.
DIFF vs HEAD.
REGENERATE docs/reference/engines/curation-{engine}.md (Parameters section - fact base for the human curator; pre-existing behaviour preserved).
COMMENT + LABEL (suppress on empty).
Probe-fail branch - post probe-fail comment with 3 routes (per §3 of the design doc: patch code, /approve-reuse, escalate). Apply probe-blocked label; exit 0 (not a CI failure).
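The DIFF vs HEAD step and the suppress-on-empty rule can be sketched as a set comparison. The layout of `schema.discovered.json` is not specified here, so this sketch assumes a flat mapping of field name to descriptor; `diff_schema` and `is_empty` are hypothetical names.

```python
def diff_schema(head: dict, discovered: dict) -> dict:
    """Compare two field-name -> descriptor mappings (assumed flat layout)."""
    added = sorted(discovered.keys() - head.keys())
    removed = sorted(head.keys() - discovered.keys())
    changed = sorted(
        name for name in head.keys() & discovered.keys()
        if head[name] != discovered[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


def is_empty(diff: dict) -> bool:
    """True when nothing changed, i.e. the comment and label are suppressed."""
    return not any(diff.values())
```

An empty diff is the case in which the cell posts no comment and applies no label.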

Per-cell artefact contract

Each cell:

Uploads engine-step-diff-{engine}-{concern}.yaml.
Posts its OWN per-pipeline comment (suppress on empty).
Applies its own per-pipeline label (invariants/schemas-changed, invariants/schemas-breaking, corpus-changed, probe-blocked).
Waits for the sibling pipeline to complete (lewagon/wait-on-check-action; an already-finished sibling exits immediately).

Atomic writeback

The last-finishing workflow performs an in-line atomic writeback:

git add src/llenergymeasure/engines/{engine}/invariants.proposed.yaml \
        src/llenergymeasure/engines/{engine}/invariants.validated.yaml \
        src/llenergymeasure/engines/{engine}/schema.discovered.json \
        docs/reference/engines/curation-{engine}.md \
        docs/reference/engines/invariants-{engine}.md \
        engine_versions/{engine}.compat.json \
        engine_versions/{engine}.yaml   # only if /approve-reuse fired
git commit && git push --force-with-lease

The same workflow then applies the cross-pipeline rollup label (safe-bump or probe-blocked).

:::note No summariser workflow
There is no summariser workflow file and no composite action. Cross-pipeline state lives on labels - a GitHub-native primitive. "Did the cycle run?" reads off the check-status badge. "Anything change?" reads off the per-pipeline comments and commits. "What's the rollup state?" reads off the label.
:::

PR state after a Renovate cycle

2 per-concern check statuses.
Up to 2 comments per cycle (suppress-on-empty): engine-invariants pipeline and engine-schemas pipeline.
1 atomic bot commit (all artefacts; written by whichever workflow finished last).
Cross-pipeline rollup label (safe-bump or probe-blocked).

Human curation checkpoint

This is the only crossing of the human-as-final-checkpoint boundary (P6) inside the otherwise-automated validated half. Bots never edit src/llenergymeasure/config/engine_configs.py.

The dev consumes auto-generated digests:

docs/reference/engines/curation-{engine}.md - Section 1: Parameters (discovered fields with Pydantic-curated yes/no, deltas vs previous SSOT version).
docs/reference/engines/invariants-{engine}.md - Section 1: Invariants (corpus rules added/changed/removed, classified by added_by; encompasses dormancy + invalidity + miner output + introspection + runtime catch-all).

The dev manually edits engine_configs.py:

which discovered params to expose in Pydantic;
which Literal narrowings to pin;
which sub-config taxonomy to use;
which custom @model_validator decorators to add.

A push triggers a re-run of the CI cycle; each updated per-pipeline comment supersedes the prior one (edited in place via a comment-id key, so comments do not proliferate).

Decision routes after digest review

Route	Action
`safe-bump` + green CI	squash-merge
`corpus-changed` + mechanical	squash-merge
`invariants-breaking`	edit `engine_configs.py`
`schemas-breaking`	edit `engine_configs.py`
`probe-blocked`	resolve via §3 routes: patch producer code, `/approve-reuse`, or escalate
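The routing table above can be read as a small decision function. This is a sketch only: the precedence between labels (probe-blocked first, then breaking labels) and the `needs human review` fallback are assumptions, and `next_action` is a hypothetical name.

```python
def next_action(labels: set[str], ci_green: bool, mechanical: bool = False) -> str:
    """Pick the reviewer's next step from the PR labels (precedence assumed)."""
    if "probe-blocked" in labels:
        return "resolve via §3 routes: patch producer code, /approve-reuse, or escalate"
    if labels & {"invariants-breaking", "schemas-breaking"}:
        return "edit engine_configs.py"
    if "corpus-changed" in labels and mechanical:
        return "squash-merge"
    if "safe-bump" in labels and ci_green:
        return "squash-merge"
    # Fallback not in the table; added so the sketch is total.
    return "needs human review"
```

For example, `next_action({"safe-bump"}, ci_green=True)` returns `"squash-merge"`, while a breaking label takes priority over a `safe-bump` label on the same PR.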

:::note Guided curation UX is deferred
The guided curation UX (RFC-style YAML decision file + libcst applier) is deferred to issue #475. The current redesign ships self-serve curation only: devs hand-edit engine_configs.py based on the digest. After 2-3 Renovate cycles of operational data, the #475 reactivation will evaluate whether the guided UX pays off.
:::

Probe-fail human checkpoint

This is the OTHER human touchpoint (per P6) - inside the otherwise-automated CI half. When a probe fails (inline step 1 of either workflow), three resolution routes are available.

Route 1 - Patch producer code. The dev edits scripts/engine_miners/{engine}_*_miner.py or scripts/engine_introspectors/{engine}_introspector.py to fix the broken landmark (e.g. follow an upstream rename). Pushing the commit re-runs the workflow; the probe re-runs; if it passes, downstream stages proceed.

Route 2 - Approve reuse via slash command. The dev posts @llem-ci-bot /approve-reuse <engine> <producer> as a PR comment. Producer is one of {invariants, schemas} (per-producer granularity - vllm invariants might be reusable while vllm schemas are not). approve-reuse-bot.yml is the issue_comment: created listener; it validates the dev's approval rights, updates engine_versions/{engine}.yaml miner_pins.{producer} to widen the SpecifierSet to include the bumped version, and commits the SSOT change via the llem-ci-bot App token (cascades; GITHUB_TOKEN would not). The probe re-runs against the widened range; the verdict flips to PASS and downstream stages proceed.

Route 3 - Escalate / block. The dev applies the probe-blocked label. Renovate stops retrying this bump until the label is removed; route 1 or 2 must follow before merge.
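The comment parsing behind Route 2 could look roughly like the following. This is a sketch, not the `approve-reuse-bot.yml` implementation: `parse_approve_reuse` is a hypothetical helper, and the check of the dev's approval rights and the SSOT edit are deliberately left out.

```python
import re

ENGINES = {"transformers", "vllm", "tensorrt"}
PRODUCERS = {"invariants", "schemas"}

# Matches: @llem-ci-bot /approve-reuse <engine> <producer>
_CMD = re.compile(r"^@llem-ci-bot\s+/approve-reuse\s+(\S+)\s+(\S+)\s*$")


def parse_approve_reuse(comment_body: str):
    """Return (engine, producer) for a valid command, otherwise None."""
    for line in comment_body.splitlines():
        match = _CMD.match(line.strip())
        if match and match.group(1) in ENGINES and match.group(2) in PRODUCERS:
            return match.group(1), match.group(2)
    return None
```

Anything other than the single supported command, including an unknown producer, returns `None`, in keeping with the one-gate-per-(engine, producer) scope.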

:::caution No other slash commands
/rerun, /skip-probe, /force-merge were explicitly rejected as footguns. The deliberate scope is one binary approval gate per (engine, producer) - no escape hatches.
:::

Adjacent pipelines

These pipelines run independent of the per-PR Renovate cycle.

`engine-versions-sweep.yml` (scheduled, advisory)

Runs scripts/_probe.py over a curated version range (e.g. vllm v0.9..v0.12) on a weekly schedule. Updates engine_versions/{engine}.compat.json (probe cache + compat-matrix in one file; closes #470). Populates the probe-result cache so per-PR probes hit a warm cache.

Runtime side-products (study-local, not CI)

runtime_observations.jsonl and equivalence_groups.json are emitted at study-runtime, not by CI.

runtime_observations.jsonl:

Producer: src/llenergymeasure/study/runtime_observations.py (warnings.catch_warnings + logger handler wrapping each worker body); wired in runner.py.
Schema: schema_version=1; one record per (study_run_id, config_hash, cycle); outcome in {success, exception, subprocess_died}.
Consumer (today): llem report-gaps with --source runtime-warnings (the only wired source). Output is a YAML fragment for manual append to the corpus, with # TODO: human markers on placeholder fields. Preserved as an escape-hatch.
Consumer (long-term): subsume into curation digest Section 3 ("Runtime gaps observed"). Deferred to #475; reactivate after 2-3 Renovate cycles of operational data.
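A minimal reader for `runtime_observations.jsonl` might tally outcomes per file. This sketch assumes the JSON keys are literally `schema_version` and `outcome`, matching the names used above; `tally_outcomes` is a hypothetical helper and not part of `llem report-gaps`.

```python
import json
from collections import Counter

OUTCOMES = {"success", "exception", "subprocess_died"}


def tally_outcomes(lines):
    """Count outcomes across JSONL records, skipping unknown schema versions."""
    counts = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        record = json.loads(raw)
        if record.get("schema_version") != 1:
            continue  # only schema_version=1 is described in this document
        outcome = record.get("outcome")
        if outcome in OUTCOMES:
            counts[outcome] += 1
    return dict(counts)
```

It accepts any iterable of lines, so an open file handle can be passed directly.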

equivalence_groups.json:

Detects observed_config_hash collisions across configs: configs that Pydantic distinguishes (resolved_config_hash differs) but the engine collapses (observed_config_hash matches). Flagged as gap_detected: true - a dormancy signal.
The proposed_invariant_id field is currently always None; the consumer is deferred until a researcher hits a real gap_detected: true group and asks for tooling. Tracked in #405 and #474.
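The collision detection can be sketched by grouping on the observed hash. The field names come from the description above, but the input record shape and the output layout are assumptions, and `find_equivalence_groups` is a hypothetical name.

```python
from collections import defaultdict


def find_equivalence_groups(records):
    """Group configs by observed hash; flag groups the engine collapsed."""
    by_observed = defaultdict(set)
    for rec in records:
        by_observed[rec["observed_config_hash"]].add(rec["resolved_config_hash"])
    return [
        {
            "observed_config_hash": observed,
            "resolved_config_hashes": sorted(resolved),
            # More than one resolved hash behind one observed hash = dormancy signal.
            "gap_detected": len(resolved) > 1,
            "proposed_invariant_id": None,  # currently always None
        }
        for observed, resolved in sorted(by_observed.items())
    ]
```

Two configs that Pydantic hashes differently but that share one observed hash end up in a single group with `gap_detected` set to `True`.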

Legend

Marker	Meaning
`[auto]`	fully automated, no human action
`[chk]`	human checkpoint - required dev input
`[info]`	informational artefact, advisory

For the full design rationale (including the resolution of the per-engine vs per-concern split, the wait-for-sibling coordination decision, and the rejected summariser-workflow alternative), see the engine-coupling design discussion captured across PRs #477-#492.
