vLLM Engine Schema

Engine version: 0.7.3
Discovered at: 2026-05-06T22:57:22+02:00
Discovery method: `dataclasses.fields(EngineArgs)` + `msgspec.json.schema(SamplingParams)`
Schema version: 1.0.0

Summary: 104 engine parameters, 31 sampling parameters.

Discovery limitations

  • sampling_params — value constraints (e.g. `temperature >= 0`, `top_p` in `(0, 1]`) are enforced imperatively in `_verify_args()` and cannot be recovered from field metadata.
  • engine_params — per-field descriptions are unavailable; vLLM's `EngineArgs` carries only a class-level docstring, not per-field documentation.

Engine Parameters

Per-field descriptions are not discoverable (see Discovery limitations above), so the Description column is omitted; `-` marks a default of `None`.

| Field | Type | Default |
| --- | --- | --- |
| `model` | `str` | `facebook/opt-125m` |
| `served_model_name` | `str \| list[str] \| None` | - |
| `tokenizer` | `str \| None` | - |
| `task` | `Literal['auto', 'generate', 'embedding', 'embed', 'classify', 'score', 'reward', 'transcription']` | `auto` |
| `skip_tokenizer_init` | `bool` | `false` |
| `tokenizer_mode` | `str` | `auto` |
| `trust_remote_code` | `bool` | `false` |
| `allowed_local_media_path` | `str` | `""` |
| `download_dir` | `str \| None` | - |
| `load_format` | `str` | `auto` |
| `config_format` | `ConfigFormat` | `auto` |
| `dtype` | `str` | `auto` |
| `kv_cache_dtype` | `str` | `auto` |
| `seed` | `int` | `0` |
| `max_model_len` | `int \| None` | - |
| `distributed_executor_backend` | `str \| type[ExecutorBase] \| None` | - |
| `pipeline_parallel_size` | `int` | `1` |
| `tensor_parallel_size` | `int` | `1` |
| `max_parallel_loading_workers` | `int \| None` | - |
| `block_size` | `int \| None` | - |
| `enable_prefix_caching` | `bool \| None` | - |
| `disable_sliding_window` | `bool` | `false` |
| `use_v2_block_manager` | `bool` | `true` |
| `swap_space` | `float` | `4` |
| `cpu_offload_gb` | `float` | `0` |
| `gpu_memory_utilization` | `float` | `0.9` |
| `max_num_batched_tokens` | `int \| None` | - |
| `max_num_partial_prefills` | `int \| None` | `1` |
| `max_long_partial_prefills` | `int \| None` | `1` |
| `long_prefill_token_threshold` | `int \| None` | `0` |
| `max_num_seqs` | `int \| None` | - |
| `max_logprobs` | `int` | `20` |
| `disable_log_stats` | `bool` | `false` |
| `revision` | `str \| None` | - |
| `code_revision` | `str \| None` | - |
| `rope_scaling` | `dict[str, Any] \| None` | - |
| `rope_theta` | `float \| None` | - |
| `hf_overrides` | `dict[str, Any] \| Callable[[PretrainedConfig], PretrainedConfig] \| None` | - |
| `tokenizer_revision` | `str \| None` | - |
| `quantization` | `str \| None` | - |
| `enforce_eager` | `bool \| None` | - |
| `max_seq_len_to_capture` | `int` | `8192` |
| `disable_custom_all_reduce` | `bool` | `false` |
| `tokenizer_pool_size` | `int` | `0` |
| `tokenizer_pool_type` | `str \| type[BaseTokenizerGroup]` | `ray` |
| `tokenizer_pool_extra_config` | `dict[str, Any] \| None` | - |
| `limit_mm_per_prompt` | `Mapping[str, int] \| None` | - |
| `mm_processor_kwargs` | `dict[str, Any] \| None` | - |
| `disable_mm_preprocessor_cache` | `bool` | `false` |
| `enable_lora` | `bool` | `false` |
| `enable_lora_bias` | `bool` | `false` |
| `max_loras` | `int` | `1` |
| `max_lora_rank` | `int` | `16` |
| `enable_prompt_adapter` | `bool` | `false` |
| `max_prompt_adapters` | `int` | `1` |
| `max_prompt_adapter_token` | `int` | `0` |
| `fully_sharded_loras` | `bool` | `false` |
| `lora_extra_vocab_size` | `int` | `256` |
| `long_lora_scaling_factors` | `tuple[float] \| None` | - |
| `lora_dtype` | `str \| dtype \| None` | - |
| `max_cpu_loras` | `int \| None` | - |
| `device` | `str` | `auto` |
| `num_scheduler_steps` | `int` | `1` |
| `multi_step_stream_outputs` | `bool` | `true` |
| `ray_workers_use_nsight` | `bool` | `false` |
| `num_gpu_blocks_override` | `int \| None` | - |
| `num_lookahead_slots` | `int` | `0` |
| `model_loader_extra_config` | `dict \| None` | - |
| `ignore_patterns` | `str \| list[str] \| None` | - |
| `preemption_mode` | `str \| None` | - |
| `scheduler_delay_factor` | `float` | `0.0` |
| `enable_chunked_prefill` | `bool \| None` | - |
| `guided_decoding_backend` | `str` | `xgrammar` |
| `logits_processor_pattern` | `str \| None` | - |
| `speculative_model` | `str \| None` | - |
| `speculative_model_quantization` | `str \| None` | - |
| `speculative_draft_tensor_parallel_size` | `int \| None` | - |
| `num_speculative_tokens` | `int \| None` | - |
| `speculative_disable_mqa_scorer` | `bool \| None` | `false` |
| `speculative_max_model_len` | `int \| None` | - |
| `speculative_disable_by_batch_size` | `int \| None` | - |
| `ngram_prompt_lookup_max` | `int \| None` | - |
| `ngram_prompt_lookup_min` | `int \| None` | - |
| `spec_decoding_acceptance_method` | `str` | `rejection_sampler` |
| `typical_acceptance_sampler_posterior_threshold` | `float \| None` | - |
| `typical_acceptance_sampler_posterior_alpha` | `float \| None` | - |
| `qlora_adapter_name_or_path` | `str \| None` | - |
| `disable_logprobs_during_spec_decoding` | `bool \| None` | - |
| `otlp_traces_endpoint` | `str \| None` | - |
| `collect_detailed_traces` | `str \| None` | - |
| `disable_async_output_proc` | `bool` | `false` |
| `scheduling_policy` | `Literal['fcfs', 'priority']` | `fcfs` |
| `scheduler_cls` | `str \| type[object]` | `vllm.core.scheduler.Scheduler` |
| `override_neuron_config` | `dict[str, Any] \| None` | - |
| `override_pooler_config` | `PoolerConfig \| None` | - |
| `compilation_config` | `CompilationConfig \| None` | - |
| `worker_cls` | `str` | `auto` |
| `kv_transfer_config` | `KVTransferConfig \| None` | - |
| `generation_config` | `str \| None` | - |
| `override_generation_config` | `dict[str, Any] \| None` | - |
| `enable_sleep_mode` | `bool` | `false` |
| `model_impl` | `str` | `auto` |
| `calculate_kv_scales` | `bool \| None` | - |
| `additional_config` | `dict[str, Any] \| None` | - |

Sampling Parameters

Field-level descriptions and constraints are absent from the generated JSON schema (see Discovery limitations); `unknown` marks fields whose type could not be resolved, and `-` marks an absent default.

| Field | Type | Default |
| --- | --- | --- |
| `n` | integer | `1` |
| `best_of` | unknown | - |
| `_real_n` | unknown | - |
| `presence_penalty` | number | `0.0` |
| `frequency_penalty` | number | `0.0` |
| `repetition_penalty` | number | `1.0` |
| `temperature` | number | `1.0` |
| `top_p` | number | `1.0` |
| `top_k` | integer | `-1` |
| `min_p` | number | `0.0` |
| `seed` | unknown | - |
| `stop` | unknown | - |
| `stop_token_ids` | unknown | - |
| `bad_words` | unknown | - |
| `ignore_eos` | boolean | `false` |
| `max_tokens` | unknown | `16` |
| `min_tokens` | integer | `0` |
| `logprobs` | unknown | - |
| `prompt_logprobs` | unknown | - |
| `detokenize` | boolean | `true` |
| `skip_special_tokens` | boolean | `true` |
| `spaces_between_special_tokens` | boolean | `true` |
| `logits_processors` | unknown | - |
| `include_stop_str_in_output` | boolean | `false` |
| `truncate_prompt_tokens` | unknown | - |
| `output_kind` | unknown | `0` |
| `output_text_buffer_length` | integer | `0` |
| `_all_stop_token_ids` | array | `[]` |
| `guided_decoding` | unknown | - |
| `logit_bias` | unknown | - |
| `allowed_token_ids` | unknown | - |
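
As recorded under Discovery limitations, the sampling constraints live in imperative checks rather than in schema metadata. A minimal sketch of what such a check looks like — hypothetical, written in the style of `SamplingParams._verify_args()`, not vLLM's actual implementation:

```python
def verify_sampling_args(temperature: float, top_p: float, top_k: int) -> None:
    """Range checks in the style of SamplingParams._verify_args().

    Because these rules live in code rather than in field metadata,
    msgspec.json.schema(SamplingParams) cannot surface them, which is why
    the table above carries no constraint information.
    """
    if temperature < 0.0:
        raise ValueError(f"temperature must be non-negative, got {temperature}.")
    if not 0.0 < top_p <= 1.0:
        raise ValueError(f"top_p must be in (0, 1], got {top_p}.")
    if top_k < -1 or top_k == 0:
        raise ValueError(f"top_k must be -1 (disabled) or >= 1, got {top_k}.")
```

A schema generator only sees the field types (`number`, `integer`); recovering ranges like these would require parsing the validation code itself.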