# Update PyTorch version on vLLM OSS CI/CD
vLLM's current policy is to always use the latest PyTorch stable release in CI/CD. It is standard practice to submit a PR to update the PyTorch version as early as possible when a new PyTorch stable release becomes available. This process is non-trivial due to the gap between PyTorch releases. Using https://github.com/vllm-project/vllm/pull/16859 as an example, this document outlines common steps to achieve this update along with a list of potential issues and how to address them.
Updating PyTorch in vLLM after the official release is not ideal because any issues discovered at that point can only be resolved by waiting for the next release or by implementing hacky workarounds in vLLM. The better solution is to test vLLM with PyTorch release candidates (RC) to ensure compatibility before each release.
PyTorch release candidates can be downloaded from the [PyTorch test index](https://download.pytorch.org/whl/test). For example, the torch 2.7.0 release candidate built against CUDA 12.8 can be installed using the following command:
```bash
uv pip install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/test/cu128
```
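To confirm that the release candidate was actually picked up from the test index rather than a stable wheel from PyPI, checking the installed version string is a quick sanity test; the exact version shown will vary with the RC being tested:

```bash
# An RC install should report a version like "2.7.0+cu128", not a plain
# stable release pulled from PyPI.
python -c "import torch; print(torch.__version__)"
```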
When the final RC is ready for testing, it will be announced to the community on the PyTorch dev-discuss forum. After this announcement, we can begin testing vLLM integration by drafting a pull request following this 3-step process:
1. Update the [requirements files](https://github.com/vllm-project/vllm/tree/main/requirements) to point to the new releases for `torch`, `torchvision`, and `torchaudio`.

2. Use the following option to get the final release candidates' wheels. Some common platforms are `cpu`, `cu128`, and `rocm6.2.4`:

    ```text
    --extra-index-url https://download.pytorch.org/whl/test/<PLATFORM>
    ```

3. Since vLLM uses `uv`, ensure the `unsafe-best-match` index strategy is applied, either via the environment variable:

    ```bash
    export UV_INDEX_STRATEGY=unsafe-best-match
    ```

    or via the command-line option (a combined example of all three steps is shown after this list):

    ```bash
    --index-strategy unsafe-best-match
    ```
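Putting the three steps together, here is a minimal sketch of installing the release candidates with `uv`, assuming the `cu128` platform; adjust the platform suffix and package list as needed:

```bash
# Let uv consider all configured indexes and pick the best matching wheels.
export UV_INDEX_STRATEGY=unsafe-best-match

# Install the RC wheels for CUDA 12.8 from the PyTorch test index,
# while still resolving other dependencies from PyPI.
uv pip install torch torchvision torchaudio \
    --extra-index-url https://download.pytorch.org/whl/test/cu128
```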
If failures are found in the pull request, raise them as issues on vLLM and cc the PyTorch release team to initiate discussion on how to address them.
The PyTorch release matrix includes both stable and experimental CUDA versions. Due to limitations, only the latest stable CUDA version (for example, torch 2.7.1+cu126) is uploaded to PyPI. However, vLLM may require a different CUDA version, such as 12.8 for Blackwell support. This complicates the process, as the out-of-the-box `pip install torch torchvision torchaudio` command cannot be used. The solution is to pass `--extra-index-url` in vLLM's Dockerfiles so that PyTorch wheels are downloaded from the index matching the target platform:
| Platform | --extra-index-url |
|---|---|
| CUDA 12.8 | https://download.pytorch.org/whl/cu128 |
| CPU | https://download.pytorch.org/whl/cpu |
| ROCm 6.2 | https://download.pytorch.org/whl/rocm6.2.4 |
| ROCm 6.3 | https://download.pytorch.org/whl/rocm6.3 |
| XPU | https://download.pytorch.org/whl/xpu |
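For instance, a CUDA 12.8 build would install PyTorch along the following lines. This is an illustrative command rather than an excerpt from vLLM's Dockerfiles:

```bash
# PyPI only hosts the default CUDA build, so fall back to the PyTorch
# CUDA 12.8 index for the matching torch/torchvision/torchaudio wheels.
uv pip install torch torchvision torchaudio \
    --extra-index-url https://download.pytorch.org/whl/cu128
```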
The new version also needs to be reflected in `.buildkite/release-pipeline.yaml` and `.buildkite/scripts/upload-wheels.sh`, which build and publish the vLLM wheels for release.

When building vLLM with a new PyTorch/CUDA version, the vLLM sccache S3 bucket will not have any cached artifacts, which can cause CI build jobs to exceed 5 hours. Furthermore, vLLM's fastcheck pipeline operates in read-only mode and does not populate the cache, making it ineffective for cache warm-up purposes.
To address this, manually trigger a build on Buildkite with `RUN_ALL=1` and `NIGHTLY=1` set; the compiled artifacts from this run warm up the sccache S3 bucket so that subsequent builds are faster.

Rather than attempting to update all vLLM platforms in a single pull request, it's more manageable to handle some platforms separately. The separation of requirements files and Dockerfiles for different platforms in vLLM's CI/CD allows us to selectively choose which platforms to update. For instance, updating XPU requires a corresponding release of Intel Extension for PyTorch from Intel. While https://github.com/vllm-project/vllm/pull/16859 updated vLLM to PyTorch 2.7.0 on CPU, CUDA, and ROCm, https://github.com/vllm-project/vllm/pull/17444 completed the update for XPU.
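As a concrete illustration of that separation, the torch pins for each platform live in per-platform requirements files. A quick way to find every pin that needs updating (file names assumed from the current repository layout and may change over time):

```bash
# List the torch-family pins across the per-platform requirements files,
# e.g. requirements/cuda.txt, requirements/cpu.txt, requirements/rocm.txt.
grep -n "^torch" requirements/*.txt
```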