docs/index.rst
.. raw:: html
<a class="github-button" href="https://github.com/sgl-project/sglang" data-size="large" data-show-count="true" aria-label="Star sgl-project/sglang on GitHub">Star</a> <a class="github-button" href="https://github.com/sgl-project/sglang/fork" data-icon="octicon-repo-forked" data-size="large" data-show-count="true" aria-label="Fork sgl-project/sglang on GitHub">Fork</a>
<script async defer src="https://buttons.github.io/buttons.js"></script> </br>SGLang is a high-performance serving framework for large language models and multimodal models. It is designed to deliver low-latency and high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters. Its core features include:
.. toctree:: :maxdepth: 1 :caption: Get Started
get_started/install.md
.. toctree:: :maxdepth: 1 :caption: Basic Usage
basic_usage/send_request.ipynb basic_usage/openai_api.rst basic_usage/ollama_api.md basic_usage/offline_engine_api.ipynb basic_usage/native_api.ipynb basic_usage/sampling_params.md basic_usage/popular_model_usage.rst
.. toctree:: :maxdepth: 1 :caption: Advanced Features
advanced_features/server_arguments.md advanced_features/object_storage.md advanced_features/hyperparameter_tuning.md advanced_features/attention_backend.md advanced_features/speculative_decoding.ipynb advanced_features/adaptive_speculative_decoding.md advanced_features/structured_outputs.ipynb advanced_features/structured_outputs_for_reasoning_models.ipynb advanced_features/tool_parser.ipynb advanced_features/separate_reasoning.ipynb advanced_features/quantization.md advanced_features/quantized_kv_cache.md advanced_features/expert_parallelism.md advanced_features/dp_dpa_smg_guide.md advanced_features/lora.ipynb advanced_features/pd_disaggregation.md advanced_features/epd_disaggregation.md advanced_features/pipeline_parallelism.md advanced_features/hicache.rst advanced_features/pd_multiplexing.md advanced_features/vlm_query.ipynb advanced_features/dp_for_multi_modal_encoder.md advanced_features/cuda_graph_for_multi_modal_encoder.md advanced_features/piecewise_cuda_graph.md advanced_features/breakable_cuda_graph.md advanced_features/sgl_model_gateway.md advanced_features/deterministic_inference.md advanced_features/observability.md advanced_features/checkpoint_engine.md advanced_features/sglang_for_rl.md
.. toctree:: :maxdepth: 2 :caption: Supported Models
supported_models/text_generation/index supported_models/retrieval_ranking/index supported_models/specialized/index supported_models/extending/index
.. toctree:: :maxdepth: 2 :caption: SGLang Diffusion
diffusion/index diffusion/installation diffusion/compatibility_matrix diffusion/api/cli diffusion/api/openai_api diffusion/performance/index diffusion/performance/ring_sp_performance diffusion/performance/attention_backends diffusion/performance/cache/index diffusion/quantization diffusion/contributing
.. toctree:: :maxdepth: 1 :caption: Hardware Platforms
platforms/amd_gpu.md platforms/cpu_server.md platforms/tpu.md platforms/nvidia_jetson.md platforms/ascend/ascend_npu_support.rst platforms/xpu.md
.. toctree:: :maxdepth: 1 :caption: Developer Guide
developer_guide/contribution_guide.md developer_guide/development_guide_using_docker.md developer_guide/development_jit_kernel_guide.md developer_guide/benchmark_and_profiling.md developer_guide/bench_serving.md developer_guide/evaluating_new_models.md
.. toctree:: :maxdepth: 1 :caption: References
references/faq.md references/environment_variables.md references/production_metrics.md references/production_request_trace.md references/multi_node_deployment/multi_node_index.rst references/custom_chat_template.md references/frontend/frontend_index.rst references/post_training_integration.md references/release_lookup references/learn_more.md
.. toctree:: :maxdepth: 1 :caption: Security Acknowledgement
security/acknowledgements.md