Back to Vllm

Engine-Core Smoke Tests

rust/src/engine-core-client/examples/README.md

0.22.01.7 KB
Original Source

Engine-Core Smoke Tests

Start headless vllm:

bash
source ../vllm/.venv/bin/activate
HF_HUB_OFFLINE=1 \
VLLM_LOGGING_LEVEL=DEBUG \
VLLM_CPU_KVCACHE_SPACE=2 \
VLLM_HOST_IP=127.0.0.1 \
VLLM_LOOPBACK_IP=127.0.0.1 \
python3 -m vllm.entrypoints.cli.main serve Qwen/Qwen3-0.6B \
  --headless \
  --enable-sleep-mode \
  --data-parallel-address 127.0.0.1 \
  --data-parallel-rpc-port 62100 \
  --data-parallel-size-local 1 \
  --max-model-len 512 \
  --dtype float16

Run the Rust smoke test through the vllm-engine-core-client utility interface:

bash
cargo run -p vllm-engine-core-client --example external_engine_utility_call -- \
  --handshake-address tcp://127.0.0.1:62100 \
  --host 127.0.0.1

If your current engine setup does not support sleep mode, skip the sleep / wake_up part of the smoke:

bash
cargo run -p vllm-engine-core-client --example external_engine_utility_call -- \
  --handshake-address tcp://127.0.0.1:62100 \
  --host 127.0.0.1 \
  --skip-sleep-wake

Run the Rust smoke test for sample logprobs decoding through the raw engine-core request path:

bash
cargo run -p vllm-engine-core-client --example external_engine_logprobs -- \
  --handshake-address tcp://127.0.0.1:62100 \
  --host 127.0.0.1

This smoke requests a small generated-token logprobs payload plus prompt logprobs over a much longer prompt, so it exercises both the inline and aux-frame decode paths against a real engine. The Rust client decodes those payloads into semantic per-position records rather than exposing the raw ndarray/tensor wire shape.

IMPORTANT: You must restart vllm each time you run the smoke test, as the vLLM engine cannot manage frontend closures and subsequent reconnects. In other words, do not reuse existing vllm instances, if any.