<a class="github-button" href="https://github.com/sgl-project/sglang" data-size="large" data-show-count="true" aria-label="Star sgl-project/sglang on GitHub">Star</a> <a class="github-button" href="https://github.com/sgl-project/sglang/fork" data-icon="octicon-repo-forked" data-size="large" data-show-count="true" aria-label="Fork sgl-project/sglang on GitHub">Fork</a>
<script async defer src="https://buttons.github.io/buttons.js"></script>

<br />

<CardGroup cols={2}> <Card title="Performance & Runtime" icon="arrow-trend-up"> Designed for low-latency, high-throughput inference with RadixAttention, prefix caching, and multi-GPU parallelism. </Card> <Card title="Models & Ecosystem" icon="hexagon-nodes"> Broad support for Llama, Qwen, DeepSeek, and more. Compatible with Hugging Face and OpenAI APIs. </Card> <Card title="Extensive Hardware Support" icon="microchip"> Native support across <a href="./docs/hardware-platforms/overview">Hardware Platforms</a> including NVIDIA, AMD, Intel Xeon, Google TPU, and Ascend NPU accelerators. </Card> <Card title="Community & Training" icon="users"> Open-source with widespread adoption, powering 400k+ GPUs and integrated with major RL frameworks. </Card> </CardGroup>

SGLang powers large-scale production deployments, generating trillions of tokens each day across more than 400,000 GPUs worldwide. It is hosted under the non-profit open-source organization LMSYS.
SGLang is an inference framework built for production-grade serving. It is designed to deliver low-latency, high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters.
<CardGroup cols={2}> <Card title="Install SGLang" icon="angles-down" href="./docs/get-started/install"> Install SGLang with pip, from source, or via Docker on your preferred hardware platform. </Card> <Card title="Quickstart" icon="zap" href="./docs/get-started/quickstart"> Launch your first model server and send requests in minutes with OpenAI-compatible APIs. </Card> </CardGroup> </div>
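As a minimal sketch of the quickstart flow, you can launch a server and query it through the OpenAI-compatible API (the model name and port below are illustrative; see the Quickstart guide for full details):

```shell
# Launch an OpenAI-compatible server (model and port are example values)
python -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --port 30000

# Query it with the standard OpenAI chat completions API
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
```

Because the endpoint follows the OpenAI API schema, existing OpenAI client SDKs work unchanged by pointing their base URL at the server.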
<div style={{ padding: "0.9rem 1rem 1rem" }}>
<p
style={{
margin: 0,
fontWeight: 600,
lineHeight: 1.35,
fontSize: "0.98rem",
}}
>
{"DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles"}
</p>
<p
style={{
margin: "0.55rem 0 0",
fontSize: "0.85rem",
opacity: 0.75,
}}
>
{"April 25, 2026"}
</p>
</div>
</a>
<a
href="https://lmsys.org/blog/2026-04-10-sglang-hisparse/"
target="_blank"
rel="noopener noreferrer"
style={{
display: "block",
border: "1px solid rgba(128, 128, 128, 0.3)",
borderRadius: "0.75rem",
overflow: "hidden",
textDecoration: "none",
color: "inherit",
height: "100%",
}}
>
<div
style={{
aspectRatio: "16 / 9",
overflow: "hidden",
background: "rgba(128, 128, 128, 0.15)",
}}
>
</div>
<div style={{ padding: "0.9rem 1rem 1rem" }}>
<p
style={{
margin: 0,
fontWeight: 600,
lineHeight: 1.35,
fontSize: "0.98rem",
}}
>
{"HiSparse: Turbocharging Sparse Attention with Hierarchical Memory"}
</p>
<p
style={{
margin: "0.55rem 0 0",
fontSize: "0.85rem",
opacity: 0.75,
}}
>
{"April 10, 2026"}
</p>
</div>
</a>
<a
href="https://lmsys.org/blog/2026-03-25-gtc2026/"
target="_blank"
rel="noopener noreferrer"
style={{
display: "block",
border: "1px solid rgba(128, 128, 128, 0.3)",
borderRadius: "0.75rem",
overflow: "hidden",
textDecoration: "none",
color: "inherit",
height: "100%",
}}
>
<div
style={{
aspectRatio: "16 / 9",
overflow: "hidden",
background: "rgba(128, 128, 128, 0.15)",
}}
>
</div>
<div style={{ padding: "0.9rem 1rem 1rem" }}>
<p
style={{
margin: 0,
fontWeight: 600,
lineHeight: 1.35,
fontSize: "0.98rem",
}}
>
{"Highlights of SGLang at NVIDIA GTC 2026"}
</p>
<p
style={{
margin: "0.55rem 0 0",
fontSize: "0.85rem",
opacity: 0.75,
}}
>
{"March 31, 2026"}
</p>
</div>
</a>
<a
href="https://lmsys.org/blog/2026-03-25-eep-partial-failure-tolerance/"
target="_blank"
rel="noopener noreferrer"
style={{
display: "block",
border: "1px solid rgba(128, 128, 128, 0.3)",
borderRadius: "0.75rem",
overflow: "hidden",
textDecoration: "none",
color: "inherit",
height: "100%",
}}
>
<div
style={{
aspectRatio: "16 / 9",
overflow: "hidden",
background: "rgba(128, 128, 128, 0.15)",
}}
>
</div>
<div style={{ padding: "0.9rem 1rem 1rem" }}>
<p
style={{
margin: 0,
fontWeight: 600,
lineHeight: 1.35,
fontSize: "0.98rem",
}}
>
{"Elastic EP in SGLang: Achieving Partial Failure Tolerance for DeepSeek MoE Deployments"}
</p>
<p
style={{
margin: "0.55rem 0 0",
fontSize: "0.85rem",
opacity: 0.75,
}}
>
{"March 25, 2026"}
</p>
</div>
</a>
<a
href="https://lmsys.org/blog/2026-03-17-rocm-miles-rl-amd/"
target="_blank"
rel="noopener noreferrer"
style={{
display: "block",
border: "1px solid rgba(128, 128, 128, 0.3)",
borderRadius: "0.75rem",
overflow: "hidden",
textDecoration: "none",
color: "inherit",
height: "100%",
}}
>
<div
style={{
aspectRatio: "16 / 9",
overflow: "hidden",
background: "rgba(128, 128, 128, 0.15)",
}}
>
</div>
<div style={{ padding: "0.9rem 1rem 1rem" }}>
<p
style={{
margin: 0,
fontWeight: 600,
lineHeight: 1.35,
fontSize: "0.98rem",
}}
>
{"ROCm Support for Miles: Large-Scale RL Post-Training on AMD Instinct\u2122 GPUs"}
</p>
<p
style={{
margin: "0.55rem 0 0",
fontSize: "0.85rem",
opacity: 0.75,
}}
>
{"March 17, 2026"}
</p>
</div>
</a>
<a
href="https://lmsys.org/blog/2026-03-11-run-nvidia-nemotron-3-super/"
target="_blank"
rel="noopener noreferrer"
style={{
display: "block",
border: "1px solid rgba(128, 128, 128, 0.3)",
borderRadius: "0.75rem",
overflow: "hidden",
textDecoration: "none",
color: "inherit",
height: "100%",
}}
>
<div
style={{
aspectRatio: "16 / 9",
overflow: "hidden",
background: "rgba(128, 128, 128, 0.15)",
}}
>
</div>
<div style={{ padding: "0.9rem 1rem 1rem" }}>
<p
style={{
margin: 0,
fontWeight: 600,
lineHeight: 1.35,
fontSize: "0.98rem",
}}
>
{"SGLang Adds Day-0 Support for NVIDIA Nemotron 3 Super for Building High-Efficiency Multi-Agent Systems"}
</p>
<p
style={{
margin: "0.55rem 0 0",
fontSize: "0.85rem",
opacity: 0.75,
}}
>
{"March 11, 2026"}
</p>
</div>
</a>