llm-d

vLLM can be deployed with llm-d, a Kubernetes-native distributed inference serving stack that provides well-lit paths for serving large generative AI models at scale. It is designed to reach state-of-the-art (SOTA) performance quickly for key open-source models across most hardware accelerators and infrastructure providers.

You can deploy vLLM with llm-d directly by following this guide, or through KServe's LLMInferenceService.
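Whichever deployment path you choose, clients ultimately talk to the OpenAI-compatible HTTP API that vLLM exposes behind the llm-d gateway. The sketch below builds a request for the `/v1/completions` endpoint; the gateway address and model name are placeholders, not values from this guide.

```python
import json
from urllib import request

def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 64) -> request.Request:
    """Build an HTTP request for vLLM's OpenAI-compatible /v1/completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical in-cluster gateway address; substitute your llm-d endpoint.
req = build_completion_request(
    "http://llm-d-gateway.llm-d.svc.cluster.local",
    "meta-llama/Llama-3.1-8B-Instruct",
    "Hello",
)
# response = request.urlopen(req)  # run this from inside the cluster
```

The request is constructed but not sent, so the snippet can be adapted to any HTTP client (`requests`, `httpx`, or the OpenAI Python SDK pointed at the gateway URL).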