NVIDIA Dynamo

docs/deployment/integrations/dynamo.md

NVIDIA Dynamo is an open-source framework for distributed LLM inference. It can run vLLM on Kubernetes and supports flexible serving architectures, such as aggregated or disaggregated serving, with an optional router and planner.
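To make the architecture options above more concrete, here is a minimal conceptual sketch in Python. It is not Dynamo's API or configuration format: `ServingTopology`, its fields, and the component names (`frontend`, `worker`, `prefill-worker`, `decode-worker`) are all hypothetical, chosen only for illustration. The sketch assumes the usual meaning of the terms: in aggregated serving a single worker handles both prefill and decode, while in disaggregated serving those phases run on separate workers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServingTopology:
    """Hypothetical description of a serving layout (NOT a Dynamo API)."""

    disaggregated: bool = False  # True: prefill and decode run on separate workers
    router: bool = False         # optional request router
    planner: bool = False        # optional planner

    def components(self) -> List[str]:
        """Return the illustrative component names this layout would contain."""
        parts = ["frontend"]  # assumed entry point, for illustration only
        if self.router:
            parts.append("router")
        if self.disaggregated:
            # Disaggregated: the two inference phases are split across workers.
            parts += ["prefill-worker", "decode-worker"]
        else:
            # Aggregated: one worker type handles both prefill and decode.
            parts.append("worker")
        if self.planner:
            parts.append("planner")
        return parts


aggregated = ServingTopology()
disaggregated = ServingTopology(disaggregated=True, router=True, planner=True)
```

Calling `aggregated.components()` and `disaggregated.components()` lists the pieces each layout would involve under these assumptions. For the manifests and component names Dynamo actually uses, rely on the Dynamo documentation rather than this sketch.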

For Kubernetes deployment instructions and examples (including vLLM), see the Deploying Dynamo on Kubernetes guide.

Background reading: InfoQ news coverage — NVIDIA Dynamo simplifies Kubernetes deployment for LLM inference.