import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Run an experiment with vLLM (Docker)
This recipe runs a single measurement against the vLLM engine. Use it
when you want to measure inference under vLLM's continuous-batching
runtime rather than the HuggingFace `transformers` backend.
:::caution vLLM requires Docker
The vLLM engine runs inside a Docker container. Attempting to run vLLM
without Docker raises a `PreFlightError` during pre-flight checks. Ensure
Docker and the NVIDIA Container Toolkit are installed before proceeding.
:::
## Prerequisites
- `llenergymeasure` installed (host-side orchestrator)
- Docker + NVIDIA Container Toolkit - see Docker setup
- vLLM Docker image built or pullable from GHCR - see Contributing > Development
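
Before the first run, it's worth confirming that containers can see the GPU at all. A quick sanity check (the CUDA image tag below is only an example; any image that ships `nvidia-smi` works):

```bash
# Should print the host GPU table from inside a container; if this errors,
# the NVIDIA Container Toolkit is not set up correctly.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```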
## 1. Create a config file
<Tabs>
<TabItem value="yaml" label="YAML config">

Create `experiment.yaml`:

```yaml
engine: vllm
task:
  model: gpt2
  dataset:
    source: aienergyscore
    n_prompts: 50
runners:
  vllm: docker
```

</TabItem>
<TabItem value="python" label="Python API">

The Python API takes the same settings as keyword arguments and runs the
experiment directly, with no config file:

```python
from llenergymeasure import run_experiment

result = run_experiment(
    model="gpt2",
    engine="vllm",
    n_prompts=50,
)
print(result)
```

</TabItem>
</Tabs>
## 2. Run the experiment
<Tabs>
<TabItem value="cli" label="CLI">

```bash
llem run experiment.yaml
```

</TabItem>
<TabItem value="python" label="Python">

```python
from llenergymeasure import run_experiment

result = run_experiment("experiment.yaml")
```

</TabItem>
</Tabs>
What happens:
- Pre-flight checks run: Docker CLI, NVIDIA Container Toolkit, GPU visibility inside the container, CUDA/driver compatibility.
- The vLLM Docker image is pulled on first run (`ghcr.io/henrycgbaker/llenergymeasure/vllm:v0.9.0`).
- The container launches, runs the experiment, and streams results back.
- Results are printed to stdout and saved to `results/`.
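
Since the image pull happens lazily, you can fetch it ahead of time so the first measurement isn't dominated by the download; the tag is the one listed above:

```bash
docker pull ghcr.io/henrycgbaker/llenergymeasure/vllm:v0.9.0
```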
## 3. Read the results
The output format matches the Transformers track. The key difference is
`engine: vllm` in the experiment ID and result file. See
How to interpret results for the field-by-field walkthrough.
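
If `results/` accumulates runs from several engines, a plain-text search is a quick way to pick out the vLLM ones. This assumes the result files are text-based; this page doesn't specify their format:

```bash
# List result files that mention the vLLM engine.
grep -rl 'vllm' results/
```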
## Related
- Tutorial: Your first measurement - start here if you've never run `llem`
- How to: run with TensorRT-LLM - sister recipe for the TRT-LLM engine
- Reference: engine configuration - every vLLM-specific config field