docs/contributing/ci/nightly_builds.md
vLLM maintains a per-commit wheel repository (commonly referred to as "nightly") at `https://wheels.vllm.ai` that provides pre-built wheels for every commit on the `main` branch since `v0.5.3`. This document explains how the nightly wheel index mechanism works.
Wheels are built in the `Release` pipeline (`.buildkite/release-pipeline.yaml`) after a PR is merged into the `main` branch, with multiple variants:

- Backend variants: `cpu` and `cuXXX` (e.g., `cu129`, `cu130`).
- Architecture variants: `x86_64` and `aarch64`.

Each build step:

- Gives the wheel the appropriate manylinux platform tag (`manylinux_2_31`) for PEP 600 compliance.
- Uploads the wheel to the S3 bucket `vllm-wheels` under `/{commit_hash}/`.

After uploading each wheel, the `.buildkite/scripts/upload-wheels.sh` script:

1. Lists all wheels in the commit directory.
2. Generates indices using `.buildkite/scripts/generate-nightly-index.py`:
    - Creates HTML index files (`index.html`) for PyPI compatibility.
    - Generates `metadata.json` files.
3. Uploads the indices to the following locations:
    - `/{commit_hash}/` - Always uploaded for commit-specific access.
    - `/nightly/` - Only for commits on the `main` branch (not PRs).
    - `/{version}/` - Only for release wheels (no `dev` in the version).

!!! tip "Handling Concurrent Builds"
    The index generation script can handle multiple variants being built concurrently: it always lists all wheels in the commit directory before generating indices, which avoids race conditions.
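
The upload rules above can be summarized in a few lines of Python. This is only an illustrative sketch, not the logic of `upload-wheels.sh` itself: the function name `index_destinations` and its parameters are hypothetical, and stripping the local version part (everything after `+`) to form the `/{version}/` directory is an assumption.

```python
from typing import List


def index_destinations(commit_hash: str, version: str, on_main: bool) -> List[str]:
    """Return the S3 prefixes that should receive freshly generated indices.

    Hypothetical helper for illustration only.
    """
    # The commit directory always receives the indices.
    destinations = [f"{commit_hash}/"]
    # The nightly alias is only refreshed for commits on the main branch.
    if on_main:
        destinations.append("nightly/")
    # Release wheels are recognized by the absence of "dev" in the version.
    if "dev" not in version:
        # Assumption: the release directory uses the public version only,
        # i.e. without a local suffix such as "+cu130".
        public_version = version.split("+")[0]
        destinations.append(f"{public_version}/")
    return destinations
```

For example, a development wheel built on `main` would map to the commit directory and `nightly/`, while a release wheel would additionally map to its version directory.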
The S3 bucket structure follows this pattern:

    s3://vllm-wheels/
    ├── {commit_hash}/              # Commit-specific wheels and indices
    │   ├── vllm-*.whl              # All wheel files
    │   ├── index.html              # Project list (default variant)
    │   ├── vllm/
    │   │   ├── index.html          # Package index (default variant)
    │   │   └── metadata.json       # Metadata (default variant)
    │   ├── cu129/                  # Variant subdirectory
    │   │   ├── index.html          # Project list (cu129 variant)
    │   │   └── vllm/
    │   │       ├── index.html      # Package index (cu129 variant)
    │   │       └── metadata.json   # Metadata (cu129 variant)
    │   ├── cu130/                  # Variant subdirectory
    │   ├── cpu/                    # Variant subdirectory
    │   └── .../                    # More variant subdirectories
    ├── nightly/                    # Latest main branch wheels (mirror of latest commit)
    └── {version}/                  # Release version indices (e.g., 0.11.2)

All built wheels are stored in /{commit_hash}/, while different indices are generated and reference them.
This avoids duplication of wheel files.
For example, you can specify the following URLs to use different indices:

- `https://wheels.vllm.ai/nightly/cu130` for the latest `main` branch wheels built with CUDA 13.0.
- `https://wheels.vllm.ai/{commit_hash}` for wheels built at a specific commit (default variant).
- `https://wheels.vllm.ai/0.12.0/cpu` for 0.12.0 release wheels built for the CPU variant.

Please note that not all variants are present on every commit. The available variants are subject to change over time, e.g., changing `cu130` to `cu131`.
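
The URL pattern in these examples is simply `{base}/{ref}` with an optional `/{variant}` suffix. A minimal sketch, where `index_url` is a hypothetical helper name and `ref` stands for `nightly`, a commit hash, or a release version:

```python
from typing import Optional

BASE_URL = "https://wheels.vllm.ai"


def index_url(ref: str, variant: Optional[str] = None) -> str:
    """Build an index URL for a ref (nightly, commit hash, or version).

    Omitting the variant selects the default variant at the root of the ref.
    """
    parts = [BASE_URL, ref]
    if variant:
        parts.append(variant)
    return "/".join(parts)
```

A URL built this way can then be handed to an installer as an additional package index, for example through pip's `--extra-index-url` option.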
Indices are organized by variant:

- Wheels of the default variant (built with the current `VLLM_MAIN_CUDA_VERSION`) are placed in the root.
- Wheels with a variant suffix (e.g., `+cu130`, `.cpu`) are organized in subdirectories.
- The default variant is also available under an alias (`cu129` for now) for consistency and convenience.

The variant is extracted from the wheel filename (as described in the file name convention):

- The variant is encoded in the local version identifier (e.g., `+cu129` or `dev<N>+g<hash>.cu130`).
- Examples:
    - `vllm-0.11.2.dev278+gdbc3d9991-cp38-abi3-manylinux1_x86_64.whl` → default variant
    - `vllm-0.10.2rc2+cu129-cp38-abi3-manylinux2014_aarch64.whl` → `cu129` variant
    - `vllm-0.11.1rc8.dev14+gaa384b3c0.cu130-cp38-abi3-manylinux1_x86_64.whl` → `cu130` variant

The `generate-nightly-index.py` script performs the following:

- Groups wheels by package. Currently only `vllm` is built, but the structure supports multiple packages in the future.
- Generates HTML indices:
    - Project list `index.html`: Lists all packages and variant subdirectories.
    - Package `index.html`: Lists all wheel files for that package.
- Generates `metadata.json`:
    - Includes a `path` field with the URL-encoded relative path to the wheel file.
    - Is used by `setup.py` to locate compatible pre-compiled wheels during Python-only builds.

The wheels and indices are stored directly on AWS S3, and we use AWS CloudFront as a CDN in front of the S3 bucket.
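
Before moving on to the serving layer, the variant extraction rule described above can be sketched as follows. This is not the code of `generate-nightly-index.py`; `extract_variant` is a hypothetical name, and the assumption that a variant is always either `cpu` or `cu` followed by digits is inferred from the examples in this document.

```python
import re
from typing import Optional

# Assumption: a variant segment is either "cpu" or "cu" followed by digits.
_VARIANT_RE = re.compile(r"^(cpu|cu\d+)$")


def extract_variant(wheel_filename: str) -> Optional[str]:
    """Return the variant encoded in a wheel filename, or None for the default."""
    # Wheel filenames start with "{name}-{version}-", so the version is field 1.
    version = wheel_filename.split("-")[1]
    # The local version identifier is everything after the "+".
    _, _, local = version.partition("+")
    # The local part may hold several dot-separated segments,
    # e.g. "gaa384b3c0.cu130" (git hash, then variant).
    for segment in local.split("."):
        if _VARIANT_RE.match(segment):
            return segment
    return None
```

Applied to the three example filenames, this yields `None` (default variant), `cu129`, and `cu130` respectively. Git hash segments start with `g`, so they cannot be mistaken for a variant.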
Since S3 does not provide proper directory listing, to support PyPI-compatible simple repository API behavior, we deploy a CloudFront Function that:

- Redirects any URL that does not end with `/` and does not look like a file (i.e., does not contain a dot `.` in the last path segment) to the same URL with a trailing `/`.
- Rewrites any URL that ends with `/` to the corresponding `/index.html`.

For example, the following requests would be handled as:

- `/nightly` -> `/nightly/index.html`
- `/nightly/cu130/` -> `/nightly/cu130/index.html`
- `/nightly/index.html` or `/nightly/vllm.whl` -> unchanged

!!! note "AWS S3 Filename Escaping"
    S3 will automatically escape filenames upon upload according to its [naming rule](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html). The direct impact on vLLM is that `+` in filenames will be converted to `%2B`. We take special care in the index generation script to escape filenames properly when generating the HTML indices and JSON metadata, to ensure the URLs are correct and can be used directly.
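
The request handling and the filename escaping described above can be modeled in Python. This is a sketch of the logic only: CloudFront Functions are written in JavaScript, the names `route` and `wheel_href` are hypothetical, and treating the first rule as a redirect (rather than an internal rewrite) is an assumption.

```python
from typing import Tuple
from urllib.parse import quote


def route(path: str) -> Tuple[str, str]:
    """Model the URL handling: return ("redirect", path) or ("serve", path)."""
    last_segment = path.rsplit("/", 1)[-1]
    # Rule 1: no trailing slash and no dot in the last segment means the
    # request targets a directory, so send the client to the slash form.
    if not path.endswith("/") and "." not in last_segment:
        return ("redirect", path + "/")
    # Rule 2: directory URLs are served from their index.html object.
    if path.endswith("/"):
        return ("serve", path + "index.html")
    # Anything that looks like a file is passed through unchanged.
    return ("serve", path)


def wheel_href(filename: str) -> str:
    """Percent-encode a wheel filename for use in an index link."""
    # quote() leaves letters, digits and "_.-~" alone and encodes "+" as "%2B".
    return quote(filename)
```

Under this model `/nightly` is first redirected to `/nightly/`, which is then served from `/nightly/index.html`, matching the first example in the list above.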
## Precompiled wheels usage in `setup.py` {#precompiled-wheels-usage}

When installing vLLM with `VLLM_USE_PRECOMPILED=1`, the `setup.py` script:

1. Determines the wheel location via `precompiled_wheel_utils.determine_wheel_url()`:
    - The env var `VLLM_PRECOMPILED_WHEEL_LOCATION` (user-specified URL/path) always takes precedence and skips all other steps.
    - The variant is determined from `VLLM_MAIN_CUDA_VERSION` (can be overridden with env var `VLLM_PRECOMPILED_WHEEL_VARIANT`); the default variant will also be tried as a fallback.
    - The base commit of the current branch is determined (can be overridden with env var `VLLM_PRECOMPILED_WHEEL_COMMIT`).
2. Fetches the metadata from `https://wheels.vllm.ai/{commit}/vllm/metadata.json` (for the default variant) or `https://wheels.vllm.ai/{commit}/{variant}/vllm/metadata.json` (for a specific variant).
3. Selects a compatible wheel for the package (`vllm`).
4. Extracts the precompiled binaries (`.so` files) from the wheel.

!!! note "What is the base commit?"
    The base commit is determined by finding the merge-base
    between the current branch and upstream `main`, ensuring
    compatibility between source code and precompiled binaries.

Note: it is the user's responsibility to ensure that there are no native code changes (e.g., C++ or CUDA) before using precompiled wheels.
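
The precedence rules for locating a precompiled wheel can be sketched as below. This is not the implementation of `precompiled_wheel_utils.determine_wheel_url()`: the function names `metadata_url` and `resolve` are hypothetical, the environment is passed in as a plain mapping, and the variant derived from `VLLM_MAIN_CUDA_VERSION` is assumed to be supplied by the caller as `default_variant` (e.g., `cu129`).

```python
from typing import List, Mapping, Optional, Tuple

BASE_URL = "https://wheels.vllm.ai"


def metadata_url(commit: str, variant: Optional[str]) -> str:
    """Build the metadata.json URL for a commit and an optional variant."""
    if variant:
        return f"{BASE_URL}/{commit}/{variant}/vllm/metadata.json"
    return f"{BASE_URL}/{commit}/vllm/metadata.json"


def resolve(
    env: Mapping[str, str], base_commit: str, default_variant: str
) -> Tuple[str, List[str]]:
    """Return ("wheel", [location]) or ("metadata", [urls to try in order])."""
    # An explicit wheel location wins and skips every other step.
    location = env.get("VLLM_PRECOMPILED_WHEEL_LOCATION")
    if location:
        return ("wheel", [location])
    # Both the variant and the commit can be overridden through env vars.
    variant = env.get("VLLM_PRECOMPILED_WHEEL_VARIANT", default_variant)
    commit = env.get("VLLM_PRECOMPILED_WHEEL_COMMIT", base_commit)
    # Try the selected variant first, then fall back to the default variant.
    return ("metadata", [metadata_url(commit, variant), metadata_url(commit, None)])
```

Calling `resolve(os.environ, base_commit, "cu129")` would return either the user-specified wheel location or the ordered list of `metadata.json` URLs to try.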
Key files involved in the nightly wheel mechanism:

- `.buildkite/release-pipeline.yaml`: CI pipeline that builds wheels
- `.buildkite/scripts/upload-wheels.sh`: Script that uploads wheels and generates indices
- `.buildkite/scripts/generate-nightly-index.py`: Python script that generates PyPI-compatible indices
- `setup.py`: Contains the `precompiled_wheel_utils` class for fetching and using precompiled wheels