docs/advance/rollout_skip.rst
Last updated: 08/01/2025.
The RolloutSkip functionality accelerates the rollout stage of reinforcement learning training by caching previously generated sequences and reusing them on later runs. This feature is particularly useful when:

- You need to repeatedly run experiments with the same configuration.
- You want to save time by skipping redundant sequence generation, e.g. while iterating on the non-rollout parts of training.
2.1 Trainer Adaptation
^^^^^^^^^^^^^^^^^^^^^^
Both ``RayDAPOTrainer`` (in ``verl/recipe/dapo/dapo_ray_trainer.py``) and ``RayPPOTrainer`` (in ``verl/trainer/ppo/ray_trainer.py``) have already been adapted.
Below is an example of how to patch ``RolloutSkip`` into ``RayPPOTrainer``:
.. code-block:: python

    #* Import the RolloutSkip class
    from verl.utils.rollout_skip import RolloutSkip

    ...

    class RayPPOTrainer:
        ...

        def fit(self):
            ...
            #* Add the following code:
            rollout_skip = RolloutSkip(self.config, self.actor_rollout_wg)
            rollout_skip.wrap_generate_sequences()
            ...

            for epoch in range(self.config.trainer.total_epochs):
                for batch_dict in self.train_dataloader:
                    ...
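To make the mechanism concrete, here is a minimal, hypothetical sketch of the cache-or-generate pattern that wrapping ``generate_sequences`` relies on. This is **not** the actual ``RolloutSkip`` implementation; the function name ``wrap_with_cache`` and the dump file name are placeholders used purely for illustration.

.. code-block:: python

    # Minimal sketch of the cache-or-generate idea (NOT the actual RolloutSkip code).
    # On the first run the real rollout is executed and its output is dumped to disk;
    # later runs with the same dump path load the cached output instead of rolling out.
    import pickle
    from pathlib import Path


    def wrap_with_cache(generate_fn, dump_dir):
        """Return a wrapper around `generate_fn` that reuses a dumped result if present."""
        dump_path = Path(dump_dir) / "rollout_cache.pkl"  # hypothetical file name

        def cached_generate(batch):
            if dump_path.exists():
                with dump_path.open("rb") as f:
                    return pickle.load(f)  # skip rollout: reuse the cached sequences
            output = generate_fn(batch)  # first run: perform the real (expensive) rollout
            dump_path.parent.mkdir(parents=True, exist_ok=True)
            with dump_path.open("wb") as f:
                pickle.dump(output, f)  # cache the result for subsequent runs
            return output

        return cached_generate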
2.2 Basic Configuration
^^^^^^^^^^^^^^^^^^^^^^^
Then add the following parameters to your config to enable the RolloutSkip feature:
.. code-block:: bash

    actor_rollout_ref.rollout.skip_rollout=True \
    actor_rollout_ref.rollout.skip_dump_dir="/tmp/rollout_dump" \
Note:

- ``skip_dump_dir`` is the directory where the cached sequences are stored. Ensure that this directory is writable and accessible by your training process.
- Use an absolute path for ``skip_dump_dir``: Ray runs workers from ``/tmp/ray/session_<session_id>/``, so a relative path will not resolve to the intended location inside the worker.
- Cached data is written to a subdirectory named ``{experiment_name}_{project_name}_TrainGBS{train_gbs}__InferGBS{gen_gbs}__N{n}``. Once you change ``experiment_name``, ``project_name``, ``train_gbs``, ``gen_gbs``, or ``n``, the cached data will be stored in a new directory (see the naming sketch below).
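For reference, the subdirectory name is composed from these fields along the lines of the following sketch. The values are illustrative only (not defaults), and the exact string construction inside verl may differ slightly.

.. code-block:: python

    # Illustration of the cache subdirectory naming pattern described above.
    experiment_name = "my_experiment"
    project_name = "my_project"
    train_gbs = 512   # training global batch size (example value)
    gen_gbs = 512     # generation/rollout global batch size (example value)
    n = 8             # number of responses sampled per prompt (example value)

    subdir = f"{experiment_name}_{project_name}_TrainGBS{train_gbs}__InferGBS{gen_gbs}__N{n}"
    print(subdir)
    # -> my_experiment_my_project_TrainGBS512__InferGBS512__N8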