You can install SGLang using one of the methods below. This page primarily applies to common NVIDIA GPU platforms. For other or newer platforms, please refer to the dedicated pages for AMD GPUs, Intel Xeon CPUs, Google TPU, NVIDIA DGX Spark, NVIDIA Jetson, Ascend NPUs, and Intel XPU.
<Note>Prerequisites: Python 3.10 or higher.</Note>

## Method 1: With pip or uv

It is recommended to use uv for faster installation:

```bash
pip install --upgrade pip
pip install uv
uv pip install sglang
```
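You can verify the installation with a quick import check (a minimal sketch):

```bash
# Should print the installed SGLang version
python3 -c "import sglang; print(sglang.__version__)"
```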
For NVIDIA B300/GB300 (CUDA 13), Docker is recommended (see the note under Method 3). If you do not have Docker access, follow these steps. First, install a CUDA 13 build of torch, then SGLang:

```bash
# Replace X.Y.Z with the torch version required by your SGLang install
uv pip install torch==X.Y.Z torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
uv pip install sglang
```

Then install the sglang-kernel wheel for CUDA 13 from the sgl-project whl releases. Replace X.Y.Z with the sglang-kernel version required by your SGLang install (you can find this by running `uv pip show sglang-kernel`). Examples:

```bash
# x86_64
uv pip install "https://github.com/sgl-project/whl/releases/download/vX.Y.Z/sglang_kernel-X.Y.Z+cu130-cp310-abi3-manylinux2014_x86_64.whl"

# aarch64
uv pip install "https://github.com/sgl-project/whl/releases/download/vX.Y.Z/sglang_kernel-X.Y.Z+cu130-cp310-abi3-manylinux2014_aarch64.whl"
```
If you hit `ptxas fatal : Value 'sm_103a' is not defined for option 'gpu-name'` on B300/GB300, fix it with:

```bash
export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas
```
If you encounter `OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.`, set the variable yourself:

```bash
export CUDA_HOME=/usr/local/cuda-<your-cuda-version>
```

## Method 2: From source

```bash
# Use the last release branch
git clone -b v0.5.9 https://github.com/sgl-project/sglang.git
cd sglang

# Install the python packages
pip install --upgrade pip
pip install -e "python"
```
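To confirm the editable install resolves to your checkout rather than a previously installed wheel, a small sanity check:

```bash
# Should print a path inside the cloned sglang repo
python3 -c "import sglang, os; print(os.path.dirname(sglang.__file__))"
```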
## Method 3: Using docker
The docker images are available on Docker Hub at lmsysorg/sglang, built from Dockerfile; a development image is published as lmsysorg/sglang:dev. Replace `<secret>` below with your huggingface hub token.
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000
```
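Once the server reports it is ready, you can smoke-test it through the OpenAI-compatible API (a minimal sketch assuming the default port mapping above):

```bash
curl -s http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```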
For production deployments, use the runtime variant, which is significantly smaller (~40% reduction) because it excludes build tools and development dependencies:
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest-runtime \
  python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000
```
You can also find the nightly docker images here.
Notes:

- For CUDA 13 (B300/GB300), use the dev image lmsysorg/sglang:dev-cu13 or the stable image lmsysorg/sglang:latest-cu130-runtime. Please do not re-install the project as editable inside the docker image, since it will override the library versions pinned by the cu13 docker image.

## Method 4: Using Kubernetes

Please check out OME, a Kubernetes operator for enterprise-grade management and serving of large language models (LLMs).
<Accordion title="More">Option 1: For single node serving (typically when the model size fits into GPUs on one node)
Execute `kubectl apply -f docker/k8s-sglang-service.yaml` to create the k8s deployment and service, with llama-31-8b as an example.
Option 2: For multi-node serving (usually when a large model requires more than one GPU node, such as DeepSeek-R1)
Modify the LLM model path and arguments as necessary, then execute `kubectl apply -f docker/k8s-sglang-distributed-sts.yaml` to create a two-node k8s StatefulSet and serving service.

</Accordion>
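After applying either manifest, you can watch the rollout (a generic sketch; pod names depend on the YAML you applied):

```bash
# Watch the sglang pods come up
kubectl get pods -w

# Follow the server logs once a pod is running (replace <pod-name>)
kubectl logs -f <pod-name>
```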
## Method 5: Using docker compose

This method is recommended if you plan to serve it as a service; a better approach is to use the k8s-sglang-service.yaml. Run `docker compose up -d` in your terminal.

## Method 6: Using SkyPilot

To deploy on Kubernetes or 12+ clouds, you can use SkyPilot.
<Accordion title={<>SkyPilot YAML: <code>sglang.yaml</code></>}>

```yaml
# sglang.yaml
envs:
  HF_TOKEN: null

resources:
  image_id: docker:lmsysorg/sglang:latest
  accelerators: A100
  ports: 30000

run: |
  conda deactivate
  python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 \
    --port 30000
```

</Accordion>
```bash
# Deploy on any cloud or Kubernetes cluster. Use --cloud <cloud> to select a specific cloud provider.
HF_TOKEN=<secret> sky launch -c sglang --env HF_TOKEN sglang.yaml

# Get the HTTP API endpoint
sky status --endpoint 30000 sglang
```
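With the endpoint in hand, a quick smoke test (a minimal sketch; `/v1/models` is part of the OpenAI-compatible API the server exposes):

```bash
# List the models served at the SkyPilot-reported endpoint
ENDPOINT=$(sky status --endpoint 30000 sglang)
curl -s "http://$ENDPOINT/v1/models"
```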
## Method 7: AWS SageMaker

To deploy SGLang on AWS SageMaker, check out AWS SageMaker Inference. Amazon Web Services provides support for SGLang containers along with routine security patching; for available SGLang containers, check out the AWS SGLang DLCs.

To host a model with your own container, follow these steps:
<Accordion title={<>Dockerfile Build Script: <code>build-and-push.sh</code></>}>

```bash
#!/bin/bash
AWS_ACCOUNT="<YOUR_AWS_ACCOUNT>"
AWS_REGION="<YOUR_AWS_REGION>"
REPOSITORY_NAME="<YOUR_REPOSITORY_NAME>"
IMAGE_TAG="<YOUR_IMAGE_TAG>"
ECR_REGISTRY="${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com"
IMAGE_URI="${ECR_REGISTRY}/${REPOSITORY_NAME}:${IMAGE_TAG}"
echo "Starting build and push process..."
# Login to ECR
echo "Logging into ECR..."
aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${ECR_REGISTRY}
# Build the image
echo "Building Docker image..."
docker build -t ${IMAGE_URI} -f sagemaker.Dockerfile .
echo "Pushing ${IMAGE_URI}"
docker push ${IMAGE_URI}
echo "Build and push completed successfully!"
The container's entry point launches `python3 -m sglang.launch_server --model-path /opt/ml/model --host 0.0.0.0 --port 8080`, which is optimal for hosting your own model with SageMaker. You can set any option from the `python3 -m sglang.launch_server --help` CLI by specifying environment variables with the prefix `SM_SGLANG_`: the container strips `SM_SGLANG_` from, for example, `SM_SGLANG_INPUT_ARGUMENT` and turns it into `--input-argument`, which is then parsed by the `python3 -m sglang.launch_server` CLI. For instance, set `SM_SGLANG_MODEL_PATH=Qwen/Qwen3-0.6B` and `SM_SGLANG_REASONING_PARSER=qwen3`.

## Quick fixes to common problems

- If you run into FlashInfer-related kernel issues, fall back to the Triton kernels with `--attention-backend triton --sampling-backend pytorch` (see the sketch after this list) and open an issue on GitHub.
- If the FlashInfer installation is broken, reinstall it with `pip3 install --upgrade flashinfer-python --force-reinstall --no-deps` and then delete the cache with `rm -rf ~/.cache/flashinfer`.
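As a concrete example of the first fix, a launch command with the fallback backends (a minimal sketch; the model path is just a placeholder):

```bash
# Use Triton attention kernels and PyTorch sampling instead of FlashInfer
python3 -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --attention-backend triton \
  --sampling-backend pytorch \
  --host 0.0.0.0 --port 30000
```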