REINFORCE++

examples/reinforce_plus_plus_trainer/README.md

REINFORCE++ is a simple, critic-free policy-gradient variant that extends REINFORCE with token-level KL penalties and advantage whitening.

Reference: REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models.
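As a rough illustration of the two ingredients named above, the sketch below computes whitened advantages from a per-sequence outcome reward and a per-token KL penalty. It is one reading of the description, not verl's implementation: the function name `reinforce_pp_advantages`, the coefficient `beta`, the per-token KL estimate (policy log-prob minus reference log-prob), and the choice to whiten over all valid tokens in the batch are assumptions made for this example.

```python
import math


def reinforce_pp_advantages(outcome_rewards, logp_policy, logp_ref, mask,
                            beta=0.01, eps=1e-8):
    """Hypothetical helper: whitened token-level advantages without a critic.

    outcome_rewards: one scalar reward per sequence.
    logp_policy, logp_ref: per-token log-probs (lists of lists, right-padded).
    mask: 1 for response tokens, 0 for padding; each row needs at least one 1.
    """
    returns = []
    for reward, lp, lr, m in zip(outcome_rewards, logp_policy, logp_ref, mask):
        # Token-level KL penalty (assumed estimator: log pi - log pi_ref).
        token_rewards = [-beta * (p - q) * v for p, q, v in zip(lp, lr, m)]
        # The outcome reward is credited at the last valid token.
        last = max(i for i, v in enumerate(m) if v)
        token_rewards[last] += reward
        # Reward-to-go: sum of token rewards from position t to the end.
        running = 0.0
        rtg = [0.0] * len(m)
        for t in reversed(range(len(m))):
            running += token_rewards[t]
            rtg[t] = running * m[t]
        returns.append(rtg)

    # Advantage whitening: normalize over every valid token in the batch.
    valid = [g for row, m in zip(returns, mask) for g, v in zip(row, m) if v]
    mean = sum(valid) / len(valid)
    std = math.sqrt(sum((g - mean) ** 2 for g in valid) / len(valid))
    return [[(g - mean) / (std + eps) * v for g, v in zip(row, m)]
            for row, m in zip(returns, mask)]


# Two toy responses; the policy matches the reference, so the KL term is zero.
adv = reinforce_pp_advantages(
    outcome_rewards=[1.0, 0.0],
    logp_policy=[[-1.0, -1.0, 0.0], [-1.0, -1.0, -1.0]],
    logp_ref=[[-1.0, -1.0, 0.0], [-1.0, -1.0, -1.0]],
    mask=[[1, 1, 0], [1, 1, 1]],
)
```

In this toy batch the rewarded response ends up with positive advantages and the unrewarded one with negative advantages, while padded positions stay at zero. No value network is involved: the batch statistics take the place of a learned baseline.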

Canonical Scripts

| Script | Infer | Train | Platform |
| --- | --- | --- | --- |
| run_qwen3_8b_fsdp.sh | vLLM | FSDP | NVIDIA |

Switch to the baseline variant by setting ADV_ESTIMATOR=reinforce_plus_plus_baseline when running the script.