(serving-llms)=

# Serving LLMs

Ray Serve LLM provides a high-performance, scalable framework for deploying Large Language Models (LLMs) in production. It specializes Ray Serve primitives for distributed LLM serving workloads, offering enterprise-grade features with OpenAI API compatibility.
Ray Serve LLM excels at highly distributed, multi-node inference workloads. To get started, install Ray with the Serve and LLM extras:

```bash
pip install "ray[serve,llm]"
```
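Once installed, a deployment can be described declaratively and launched with `serve run`. The sketch below is illustrative only: it assumes the `ray.serve.llm:build_openai_app` application builder and uses a small Hugging Face model chosen as an example; see the Quickstart for the exact, current configuration schema.

```yaml
# config.yaml -- illustrative sketch, not a verified schema.
# Launch with: serve run config.yaml
applications:
- name: llm-app
  route_prefix: /
  import_path: ray.serve.llm:build_openai_app   # assumed builder entry point
  args:
    llm_configs:
    - model_loading_config:
        model_id: qwen-0.5b                       # name clients pass in OpenAI API calls
        model_source: Qwen/Qwen2.5-0.5B-Instruct  # example Hugging Face model
      deployment_config:
        autoscaling_config:
          min_replicas: 1
          max_replicas: 2
```

Once the application is running, it exposes OpenAI-compatible endpoints (for example `POST /v1/chat/completions`) on Serve's HTTP port, which defaults to 8000.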
```{toctree}
:hidden:

Quickstart <quick-start>
Examples <examples>
User Guides <user-guides/index>
Architecture <architecture/index>
Benchmarks <benchmarks>
Troubleshooting <troubleshooting>
```
- [Quickstart](quick-start) - Deploy your first LLM with Ray Serve
- [Examples](examples) - Production-ready deployment tutorials
- [User Guides](user-guides/index) - Practical guides for advanced features
- [Architecture](architecture/index) - Technical design and implementation details
- [Troubleshooting](troubleshooting) - Common issues and solutions