User guides

How-to guides for deploying and configuring Ray Serve LLM features.

{toctree}
:maxdepth: 1

Cross-node parallelism <cross-node-parallelism>
Data parallel attention <data-parallel-attention>
Deployment initialization <deployment-initialization>
Prefill/decode disaggregation <prefill-decode>
KV cache offloading <kv-cache-offloading>
Prefix-aware routing <prefix-aware-routing>
Multi-LoRA deployment <multi-lora>
vLLM compatibility <vllm-compatibility>
Fractional GPU serving <fractional-gpu>
Observability and monitoring <observability>