docs/developer_guide/setup_github_runner.md
**You can mount a folder for the shared Hugging Face model weights cache.** The command below uses `/tmp/huggingface` as an example.
```bash
docker pull nvidia/cuda:12.9.1-devel-ubuntu22.04
# Nvidia
docker run --shm-size 128g -it -v /tmp/huggingface:/hf_home --gpus all nvidia/cuda:12.9.1-devel-ubuntu22.04 /bin/bash
# AMD
docker run --rm --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 128g -it -v /tmp/huggingface:/hf_home lmsysorg/sglang:v0.5.8-rocm700-mi30x /bin/bash
# AMD, just the last 2 GPUs
docker run --rm --device=/dev/kfd --device=/dev/dri/renderD176 --device=/dev/dri/renderD184 --group-add video --shm-size 128g -it -v /tmp/huggingface:/hf_home lmsysorg/sglang:v0.5.8-rocm700-mi30x /bin/bash
```
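Before registering the runner, it can be worth confirming that the GPUs are actually visible inside the container. A minimal check, assuming the usual vendor tools are present in the image (`nvidia-smi` in the CUDA image, `rocm-smi` in the ROCm image):

```bash
# Inside the container: confirm the expected GPUs show up.
nvidia-smi   # Nvidia
rocm-smi     # AMD
```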
**Configure the runner with `config.sh`.** Run these commands inside the container.
```bash
apt update && apt install -y curl python3-pip git
pip install --upgrade pip
export RUNNER_ALLOW_RUNASROOT=1
```
Then follow https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners to run `config.sh`.
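For orientation, the registration flow from the GitHub docs looks roughly like the sketch below. The runner version, the `OWNER/REPO` URL, and the registration token are placeholders; GitHub's "New self-hosted runner" page shows the exact values to use.

```bash
# Download and unpack the runner (version is a placeholder;
# copy the exact URL from GitHub's "New self-hosted runner" page).
mkdir actions-runner && cd actions-runner
curl -o actions-runner-linux-x64.tar.gz -L \
  https://github.com/actions/runner/releases/download/v2.x.y/actions-runner-linux-x64-2.x.y.tar.gz
tar xzf ./actions-runner-linux-x64.tar.gz

# Register the runner against your repository (URL and token are placeholders).
./config.sh --url https://github.com/OWNER/REPO --token <REGISTRATION_TOKEN>
```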
**Notes**
- Give the runner a name (e.g., `test-sgl-gpu-0`) and some labels (e.g., `1-gpu-h100`). The labels can be edited later in GitHub Settings.

**Run the runner with `run.sh`.** Set the environment variables inside the container:

```bash
export HF_HOME=/hf_home
export SGLANG_IS_IN_CI=true
export HF_TOKEN=hf_xxx
export OPENAI_API_KEY=sk-xxx
export CUDA_VISIBLE_DEVICES=0
```

Then run it in a loop so it restarts automatically if it ever exits:

```bash
while true; do ./run.sh; echo "Restarting..."; sleep 2; done
```
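The loop above occupies the current shell. If you need the runner to survive after you disconnect, one option (an assumption about your setup, not part of the GitHub instructions) is to run the same loop under `nohup`:

```bash
# Run the restart loop in the background and log its output,
# so it keeps going after the shell exits.
nohup bash -c 'while true; do ./run.sh; echo "Restarting..."; sleep 2; done' \
  > runner.log 2>&1 &
```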