docs/advance/fp8.md
Last updated: 03/05/2026
verl supports two FP8 modes for accelerating RL training:
| Mode | Training Precision | Rollout Precision |
|---|---|---|
| FP8 Rollout Only | BF16 | FP8 |
| FP8 End-to-End | FP8 (Megatron) | FP8 (vLLM) |
> [!TIP]
> For ready-to-run scripts, see the low-precision recipe directory.
## FP8 Rollout Only

FP8 rollout-only mode keeps training in BF16 and quantizes rollout inference to FP8. This reduces GPU memory during generation and speeds up rollout without affecting training precision.
We monkey patch several vLLM functions to enable FP8 rollout for reinforcement learning. In particular, `vllm.model_executor.layers.quantization.fp8.Fp8LinearMethod.process_weights_after_loading` is patched to handle weight processing after quantization. For SGLang, this patch is not needed because it natively supports loading quantized weights. A minimal sketch of the patching pattern is shown below.
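As a rough illustration of the patching mechanism (this is not verl's actual patch body; the wrapper here only delegates to vLLM's stock implementation), the pattern looks like this:

```python
# Sketch of the monkey-patch pattern used for FP8 rollout.
# NOTE: not verl's actual patch body -- the wrapper below only delegates to
# vLLM's original method; verl's version replaces the weight processing so
# weights synced from the BF16 trainer are re-quantized for FP8 rollout.
from vllm.model_executor.layers.quantization import fp8 as vllm_fp8

_original_process = vllm_fp8.Fp8LinearMethod.process_weights_after_loading


def patched_process_weights_after_loading(self, layer):
    # Hook point: customize how (re)loaded weights are quantized and laid out
    # for FP8 before, or instead of, calling the original implementation.
    _original_process(self, layer)


vllm_fp8.Fp8LinearMethod.process_weights_after_loading = patched_process_weights_after_loading
```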
Enable it in the config file:

```yaml
rollout:
  quantization: "fp8"
```
Or via the command line:

```bash
actor_rollout_ref.rollout.quantization=fp8
```
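For context, a full launch might look like the sketch below; the `verl.trainer.main_ppo` entry point and the `rollout.name` override are assumptions here and should be adapted to your own run script:

```bash
# Hypothetical launch sketch -- entry point and rollout.name are assumptions;
# only the quantization flag is the setting documented above.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.quantization=fp8
```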
### Configuration
### Accuracy

*Figure: dark green: BF16, orange: FP8 rollout + token-level TIS, light green: FP8 rollout without TIS.*
Results and observations:
### Performance
*Figure: green: BF16, orange: FP8 rollout + CUDA 12.6 + DeepGEMM, purple: FP8 rollout + CUDA 12.9 + DeepGEMM.*
Results and observations:
### Configuration
### Accuracy

*Figure: grey: BF16 + token-level TIS, red: FP8 rollout + token-level TIS.*
Results and observations:
### Performance
*Figure: grey: BF16 + token-level TIS, red: FP8 rollout + token-level TIS.*
Results and observations:
## FP8 End-to-End

FP8 E2E applies FP8 to the entire RL pipeline: forward/backward passes via Transformer Engine, FP8 optimizer states, and FP8 rollout inference via vLLM. This maximizes memory savings and throughput.
First, set the environment variable:

```bash
export NVTE_FP8_BLOCK_SCALING_FP32_SCALES=1
```

Then enable FP8 for training, the optimizer, and rollout:

```yaml
# FP8 training via Transformer Engine
actor_rollout_ref.actor.megatron.override_transformer_config:
  fp8: "hybrid"            # FP8 forward + backward; also supports "e4m3"
  fp8_recipe: "blockwise"  # block-wise scaling

# FP8 optimizer
actor_rollout_ref.actor.optim.override_optimizer_config:
  fp8_recipe: "blockwise"

# FP8 rollout inference (vLLM)
actor_rollout_ref.rollout:
  quantization: fp8
  fp8_recipe: "blockwise"
```
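The same settings can also be passed as command-line overrides. The sketch below is illustrative only: the entry point and the `+`-prefixed dotted override syntax for the nested dictionaries are assumptions, while the keys and values mirror the YAML above.

```bash
# Illustrative end-to-end FP8 launch sketch; entry point and override syntax
# are assumptions -- only the key/value pairs mirror the YAML configuration.
export NVTE_FP8_BLOCK_SCALING_FP32_SCALES=1

python3 -m verl.trainer.main_ppo \
    +actor_rollout_ref.actor.megatron.override_transformer_config.fp8=hybrid \
    +actor_rollout_ref.actor.megatron.override_transformer_config.fp8_recipe=blockwise \
    +actor_rollout_ref.actor.optim.override_optimizer_config.fp8_recipe=blockwise \
    actor_rollout_ref.rollout.quantization=fp8 \
    +actor_rollout_ref.rollout.fp8_recipe=blockwise
```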
### Configuration

*Figure: orange: BF16, green: FP8 E2E, red: FP8 rollout + BF16 training.*
Results and observations:
For more extensive experiments, ablation studies, and analysis on FP8 reinforcement learning, please refer to our technical report:
```bibtex
@article{qiu2026fp8rl,
  title={FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning},
  author={Qiu, Zhaopeng and Yu, Shuang and Zhang, Jingqi and Zhang, Shuai and Huang, Xue and Yang, Jingyi and Lai, Junjie},
  journal={arXiv preprint arXiv:2601.18150},
  year={2026},
  url={https://arxiv.org/abs/2601.18150}
}
```