docs/examples/README.md
vLLM's examples are organized into the following categories:
- `basic/` – Minimal examples for offline inference and online serving.
- `generate/` – Text generation examples, including multimodal models.
- `pooling/` – Examples for embedding, classification, scoring, reward, etc.
- `speech_to_text/` – Speech transcription, translation, and real-time audio examples.
- `features/` – Demonstrations of individual vLLM features: automatic prefix caching, speculative decoding, LoRA, structured outputs, prompt embedding, pause/resume, batch invariance, KV events, data parallelism, and more.
- `reasoning/` – Examples for reasoning with vLLM.
- `tool_calling/` – Examples for function/tool calling with vLLM.
- `applications/` – Application examples such as chatbots and RAG (Retrieval-Augmented Generation).
- `rl/` – Reinforcement learning examples.
- `deployment/` – Examples for deploying vLLM in production.
- `ray_serving/` – Scalable serving using Ray.
- `disaggregated/` – Examples for disaggregated serving (separate prefill and decode), including various KV cache connectors (LMCache, Mooncake, FlexKV, P2P NCCL) and failure recovery.
- `observability/` – Metrics, logging, tracing (OpenTelemetry), and dashboards (Grafana, Perses).