NVFP4 QAT (Quantization-Aware Training) in verl

docs/low_precision/nvfp4_qat.md

Last updated: 04/02/2026

verl supports NVFP4 Quantization-Aware Training (QAT). During training, fake quantization is applied to the weights so that the model learns to tolerate NVFP4 quantization error. At rollout time, the weights are packed into real NVFP4 format for vLLM inference. Because training already sees the quantization error that rollout introduces, the precision gap between training and inference closes, which prevents KL divergence explosion.

| Training Backend | Training Precision | Rollout Precision | vLLM Quant Method |
| --- | --- | --- | --- |
| FSDP | BF16 + fake quantization | NVFP4 W4A16 | compressed-tensors |
| Megatron | BF16 + fake quantization | NVFP4 W4A16 | modelopt |

> [!TIP]
> For ready-to-run scripts, environment setup, and experimental results, see the QAT recipe.
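
To make "fake quantization" concrete, the sketch below quantizes and immediately dequantizes a flat list of weights in groups of 16, snapping each value onto the FP4 (E2M1) grid. The values stay in floating point but carry NVFP4-style rounding error. This is an illustration only, not verl's implementation: the function names are hypothetical, the per-group scale is kept as a plain Python float (NVFP4 stores block scales in FP8 and adds a per-tensor scale), and the straight-through gradient that QAT typically relies on is omitted.

```python
import math

# Magnitudes representable in FP4 E2M1, the element format used by NVFP4.
FP4_E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = FP4_E2M1_VALUES[-1]


def fake_quantize_group(group):
    """Quantize then dequantize one group of weights (hypothetical helper)."""
    amax = max(abs(w) for w in group)
    if amax == 0.0:
        return [0.0 for _ in group]
    # Map the largest magnitude in the group onto the largest FP4 value.
    # Simplification: real NVFP4 would also round this scale to FP8.
    scale = amax / FP4_MAX
    result = []
    for w in group:
        # Round the scaled magnitude to the nearest representable FP4 value.
        nearest = min(FP4_E2M1_VALUES, key=lambda v: abs(abs(w) / scale - v))
        result.append(math.copysign(nearest * scale, w))
    return result


def fake_quantize(weights, group_size=16):
    """Apply fake quantization group by group over a flat list of weights."""
    if len(weights) % group_size != 0:
        raise ValueError("len(weights) must be a multiple of group_size")
    out = []
    for start in range(0, len(weights), group_size):
        out.extend(fake_quantize_group(weights[start:start + group_size]))
    return out


weights = [6.0, -2.4, 0.7, 0.1] + [0.0] * 12
print(fake_quantize(weights)[:4])  # [6.0, -2.0, 0.5, 0.0]
```

The output has the same length and type as the input, so the rest of a BF16 training step can consume it unchanged. Only the rounding error differs.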


Key Configuration

FSDP Backend

Configured under actor_rollout_ref.actor.fsdp_config.qat:

```yaml
actor_rollout_ref:
  actor:
    fsdp_config:
      qat:
        enable: true
        mode: "w4a16"
        group_size: 16
        ignore_patterns:
          - "lm_head"
          - "embed_tokens"
          - "re:.*mlp.gate$"
        quantization_config_path: "recipe/qat/config/nvfp4_w4a16.json"
```

| Parameter | Description | Default |
| --- | --- | --- |
| `fsdp_config.qat.enable` | Enable QAT | `False` |
| `fsdp_config.qat.mode` | Quantization mode | `"w4a16"` |
| `fsdp_config.qat.group_size` | Quantization group size | `16` |
| `fsdp_config.qat.ignore_patterns` | Layers to skip. Supports `re:` prefix for regex, otherwise substring match | `["lm_head", "embed_tokens", "re:.*mlp.gate$"]` |
| `fsdp_config.qat.quantization_config_path` | vLLM quantization config JSON path | Required |
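
The matching rule for the FSDP `ignore_patterns` (regex with the `re:` prefix, substring otherwise) can be sketched as follows. The helper name and the module names are hypothetical. Matching the regex from the start of the name with `re.match` is an assumption, since the source does not say which regex call is used.

```python
import re


def is_ignored_fsdp_style(module_name, ignore_patterns):
    """Return True if module_name matches any pattern (hypothetical helper)."""
    for pattern in ignore_patterns:
        if pattern.startswith("re:"):
            # Assumption: the regex is matched from the start of the name.
            if re.match(pattern[len("re:"):], module_name):
                return True
        elif pattern in module_name:
            # Patterns without the prefix are treated as plain substrings.
            return True
    return False


DEFAULT_PATTERNS = ["lm_head", "embed_tokens", "re:.*mlp.gate$"]

# Illustrative module names, not taken from a specific model.
for name in (
    "lm_head",
    "model.embed_tokens",
    "model.layers.0.mlp.gate",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.self_attn.q_proj",
):
    print(name, is_ignored_fsdp_style(name, DEFAULT_PATTERNS))
```

Under these assumptions, the trailing `$` in the default regex is what keeps a module such as `mlp.gate_proj` quantized while a module named exactly `mlp.gate` is skipped. A plain substring pattern `mlp.gate` would skip both.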

Megatron Backend

Configured under actor_rollout_ref.actor.megatron.qat:

```yaml
actor_rollout_ref:
  actor:
    megatron:
      qat:
        enable: true
        mode: "w4a16"
        group_size: 16
        ignore_patterns:
          - "lm_head"
          - "*mlp.gate"
        quantization_config_path: "recipe/qat/config/nvfp4_w4a16_megatron.json"
```

| Parameter | Description | Default |
| --- | --- | --- |
| `megatron.qat.enable` | Enable QAT | `False` |
| `megatron.qat.mode` | Quantization mode | `"w4a16"` |
| `megatron.qat.group_size` | Quantization group size | `16` |
| `megatron.qat.ignore_patterns` | Layers to skip. Uses fnmatch glob syntax | `["lm_head", "*mlp.gate"]` |
| `megatron.qat.quantization_config_path` | vLLM quantization config JSON path | Required |
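
For comparison, glob-style matching with Python's standard `fnmatch` module can be sketched as below. The helper and the module names are hypothetical, and using the case-sensitive `fnmatchcase` against the full module name is an assumption about how the patterns are applied.

```python
from fnmatch import fnmatchcase


def is_ignored_glob_style(module_name, ignore_patterns):
    """Return True if any glob pattern matches the whole name (hypothetical helper)."""
    return any(fnmatchcase(module_name, pattern) for pattern in ignore_patterns)


DEFAULT_PATTERNS = ["lm_head", "*mlp.gate"]

# Illustrative module names, not taken from a specific model.
print(is_ignored_glob_style("lm_head", DEFAULT_PATTERNS))                      # True
print(is_ignored_glob_style("decoder.layers.0.mlp.gate", DEFAULT_PATTERNS))    # True
print(is_ignored_glob_style("decoder.layers.0.mlp.gate_proj", DEFAULT_PATTERNS))  # False

# An FSDP-style "re:" pattern is read as a literal glob here and matches nothing.
print(is_ignored_glob_style("decoder.layers.0.mlp.gate", ["re:.*mlp.gate$"]))  # False
```

A glob must match the entire string, so `*mlp.gate` needs the leading `*` to absorb the layer prefix. The last call shows why a pattern written for the FSDP backend does not carry over.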

Support Matrix

  • NVFP4 W4A16 (weight-only FP4 quantization)
  • Dense models and MoE models
  • FSDP and Megatron training backends
  • Full quantization and FFN-only quantization strategies
  • Verified on Qwen3-8B-Base and Qwen3-30B-A3B-Base
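
As a rough illustration of what weight-only FP4 storage means, the sketch below encodes values that already lie on the FP4 (E2M1) grid as 4-bit codes and packs two codes per byte. The helper names and the nibble order (first element in the low nibble) are assumptions. Real kernels define their own layout, and the bits-per-weight figure ignores any per-tensor scale.

```python
# Magnitudes representable in FP4 E2M1; the tuple index equals the low 3 bits.
FP4_E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def encode_fp4(value):
    """Encode an on-grid value as a 4-bit code: 1 sign bit + 3 magnitude bits."""
    magnitude_index = FP4_E2M1_VALUES.index(abs(value))
    sign_bit = 0b1000 if value < 0 else 0
    return sign_bit | magnitude_index


def pack_fp4(values):
    """Pack pairs of FP4 codes into bytes (hypothetical layout)."""
    if len(values) % 2 != 0:
        raise ValueError("need an even number of values")
    codes = [encode_fp4(v) for v in values]
    # Assumption: the first element of each pair goes into the low nibble.
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))


packed = pack_fp4([6.0, -2.0, 0.5, 0.0])
print(packed.hex())  # c701

# Storage cost per weight with one 8-bit scale shared by a group of 16 weights.
group_size = 16
bits_per_weight = 4 + 8 / group_size
print(bits_per_weight)  # 4.5
```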

Notes

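  • The FSDP backend has scalability limitations for very large models. For large-scale training, use the Megatron backend.
  • The two backends interpret ignore_patterns differently: FSDP uses substring matching, or regex when a pattern carries the re: prefix, while Megatron uses fnmatch glob syntax. Patterns written for one backend are not interchangeable with the other.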