examples/router_replay/README.md
Router Replay is a feature of the verl framework for Mixture of Experts (MoE) models. It records routing decisions and replays them later, which enables deterministic training and keeps model behavior consistent across training runs.
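The record-and-replay idea can be illustrated with a small, self-contained sketch. This is not verl's implementation: `RoutingRecorder`, `RoutingReplayer`, and `gate_weights` are hypothetical names invented for this example, and recomputing the gate weights from the current logits over the replayed experts is an assumption made for illustration only.

```python
import math


def top_k_route(logits, k):
    """Pick the k experts with the highest router logits for one token."""
    ranked = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)
    return sorted(ranked[:k])


def gate_weights(logits, experts):
    """Softmax over the selected experts only (assumption for this sketch)."""
    exps = [math.exp(logits[e]) for e in experts]
    total = sum(exps)
    return [x / total for x in exps]


class RoutingRecorder:
    """Hypothetical helper: routes normally and stores each decision."""

    def __init__(self):
        self.decisions = {}

    def route(self, layer, token, logits, k):
        experts = top_k_route(logits, k)
        self.decisions[(layer, token)] = experts
        return experts


class RoutingReplayer:
    """Hypothetical helper: ignores the logits and returns stored decisions."""

    def __init__(self, decisions):
        self.decisions = dict(decisions)

    def route(self, layer, token, logits, k):
        return self.decisions[(layer, token)]


# First pass: record which experts the router picks for token 0 in layer 0.
first_pass_logits = [2.0, 1.0, 0.9, -1.0]
recorder = RoutingRecorder()
recorded = recorder.route(0, 0, first_pass_logits, k=2)

# Second pass: slightly different logits would flip the second expert.
second_pass_logits = [2.0, 0.9, 1.0, -1.0]
fresh = top_k_route(second_pass_logits, k=2)

# Replaying the recorded decision keeps the expert selection unchanged.
replayer = RoutingReplayer(recorder.decisions)
replayed = replayer.route(0, 0, second_pass_logits, k=2)
weights = gate_weights(second_pass_logits, replayed)

print("recorded:", recorded, "fresh:", fresh, "replayed:", replayed)
```

In the sketch, a small difference in logits between the two passes changes the freshly computed expert set, while the replayed set stays identical to the recorded one.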
- disabled: Router replay functionality is completely disabled
- R2: Standard router replay mode for recording and replaying routing decisions
- R3: Rollout-specific router replay mode optimized for reinforcement learning workflows

router_replay:
  mode: "disabled" # Available options: disabled, R2, R3
  record_file: null # Path for recording routing decisions
  replay_file: null # Path for replaying recorded decisions
Add the following to your training configuration:
actor:
  router_replay:
    mode: "R2"
Enable R2 mode via command-line parameters:
actor_rollout_ref.actor.router_replay.mode="R2"
Configure both actor and rollout settings:
# Actor configuration
router_replay:
  mode: "R3"
# Rollout configuration
enable_rollout_routing_replay: True
Enable R3 mode via command-line parameters:
actor_rollout_ref.actor.router_replay.mode="R3"
actor_rollout_ref.rollout.enable_rollout_routing_replay=True
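The relationship between the dotted command-line overrides and the nested configuration can be sketched in plain Python. This is an illustration only and not verl's actual configuration loader: `apply_override` and `check_router_replay` are hypothetical helpers, and treating a missing `enable_rollout_routing_replay` as `False` is an assumption.

```python
def apply_override(config, override):
    """Apply one 'dotted.key=value' override to a nested dict in place."""
    key, _, raw = override.partition("=")
    value = raw.strip('"')
    if value in ("True", "False"):
        value = value == "True"
    *parents, leaf = key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


def check_router_replay(config):
    """Return a list of problems with the router replay settings."""
    section = config.get("actor_rollout_ref", {})
    mode = section.get("actor", {}).get("router_replay", {}).get("mode", "disabled")
    # Assumption: the rollout flag counts as False when it is not set.
    rollout_flag = section.get("rollout", {}).get("enable_rollout_routing_replay", False)
    problems = []
    if mode not in ("disabled", "R2", "R3"):
        problems.append(f"unknown mode: {mode!r}")
    if mode == "R3" and not rollout_flag:
        problems.append("R3 also needs rollout.enable_rollout_routing_replay=True")
    return problems


cfg = {}
apply_override(cfg, 'actor_rollout_ref.actor.router_replay.mode="R3"')
problems_before = check_router_replay(cfg)  # rollout flag not set yet

apply_override(cfg, "actor_rollout_ref.rollout.enable_rollout_routing_replay=True")
problems_after = check_router_replay(cfg)

print(cfg)
print(problems_before, problems_after)
```

The check reflects the point made above that R3 needs both the actor and the rollout settings; setting only the actor mode leaves the configuration incomplete.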
R3 mode requires a rollout backend that can return router selection results. This functionality is currently being tested against the vLLM implementation in https://github.com/vllm-project/vllm/pull/28284, together with the bug fix in https://github.com/vllm-project/vllm/pull/33013, and against the SGLang implementation in https://github.com/sgl-project/sglang/commit/bed301a5acaa9577c9aa706468bdf242f6a43051.