REINFORCE++

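REINFORCE++ is a simple, critic-free policy-gradient (PG) method. It extends REINFORCE with a token-level KL penalty against a reference policy and with advantage whitening (normalizing advantages to zero mean and unit variance), so no value network has to be trained.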
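The two additions can be illustrated with a minimal NumPy sketch. It is not this repository's implementation: the function name `reinforce_pp_advantages`, the `kl_coef` default, the simple log-ratio KL estimate, and the choice to whiten over all valid tokens in the batch are assumptions made for illustration. It also assumes right-padded responses with at least one valid token each.

```python
import numpy as np

def reinforce_pp_advantages(seq_rewards, logp, ref_logp, mask, kl_coef=0.01, eps=1e-8):
    """Hypothetical sketch: token-level KL penalty + whitened reward-to-go.

    seq_rewards: (B,) scalar reward per response
    logp, ref_logp: (B, T) per-token log-probs under the policy / reference policy
    mask: (B, T) 1.0 for response tokens, 0.0 for right padding
    """
    seq_rewards = np.asarray(seq_rewards, dtype=float)
    mask = np.asarray(mask, dtype=float)

    # Token-level KL penalty (assumed estimator: log-ratio of policy to reference).
    kl = (np.asarray(logp, dtype=float) - np.asarray(ref_logp, dtype=float)) * mask
    token_rewards = -kl_coef * kl

    # Place the sequence-level reward on the last valid token of each response.
    rows = np.arange(mask.shape[0])
    last = mask.sum(axis=1).astype(int) - 1
    token_rewards[rows, last] += seq_rewards

    # Undiscounted reward-to-go: reverse cumulative sum along the token axis.
    returns = np.cumsum(token_rewards[:, ::-1], axis=1)[:, ::-1] * mask

    # Advantage whitening over all valid tokens in the batch.
    n = mask.sum()
    mean = (returns * mask).sum() / n
    var = (((returns - mean) ** 2) * mask).sum() / n
    return (returns - mean) / np.sqrt(var + eps) * mask


# Usage: two responses, the second one padded by one token.
mask = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 0.0]])
logp = np.zeros((2, 3))
adv = reinforce_pp_advantages([1.0, 0.0], logp, logp, mask)
print(adv)
```

Because the whitened returns are used directly as advantages, no learned critic appears anywhere in the computation.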

Canonical Scripts

Script	Inference	Training	Platform
`run_qwen3_8b_fsdp.sh`	vLLM	FSDP	NVIDIA

To switch to the baseline variant, set `ADV_ESTIMATOR=reinforce_plus_plus_baseline` when running the script.
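This page does not spell out what the baseline variant computes. One common reading, sketched below purely as an assumption, is that each response's reward is first centered by the mean reward of the other responses to the same prompt, and the result is then whitened across the batch. The function name `baseline_advantages` and the `prompt_ids` argument are hypothetical and not taken from the script.

```python
import numpy as np
from collections import defaultdict

def baseline_advantages(seq_rewards, prompt_ids, mask, eps=1e-8):
    """Hypothetical sketch: per-prompt mean baseline, then batch-level whitening.

    seq_rewards: (B,) scalar reward per response
    prompt_ids: length-B sequence; responses to the same prompt share an id
    mask: (B, T) 1.0 for response tokens, 0.0 for padding
    """
    seq_rewards = np.asarray(seq_rewards, dtype=float)
    mask = np.asarray(mask, dtype=float)

    # Group response indices by the prompt they answer.
    groups = defaultdict(list)
    for i, pid in enumerate(prompt_ids):
        groups[pid].append(i)

    # Subtract the group mean reward (groups of size 1 are left uncentered).
    centered = seq_rewards.copy()
    for idx in groups.values():
        if len(idx) > 1:
            centered[idx] -= seq_rewards[idx].mean()

    # Broadcast the centered score to every valid token, then whiten over the batch.
    scores = centered[:, None] * mask
    n = mask.sum()
    mean = scores.sum() / n
    var = (((scores - mean) ** 2) * mask).sum() / n
    return (scores - mean) / np.sqrt(var + eps) * mask


# Usage: two prompts ("a", "b") with two sampled responses each.
adv = baseline_advantages([1.0, 0.0, 1.0, 1.0], ["a", "a", "b", "b"], np.ones((4, 2)))
print(adv)
```

Under this reading, a prompt whose responses all receive the same reward contributes no centered signal, which is the usual motivation for a per-prompt baseline.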