docs/advance/rollout_skip.rst
Last updated: 08/01/2025.
The RolloutSkip functionality accelerates the rollout stage of reinforcement learning training by caching previously generated sequences and reusing them on later runs. This feature is particularly useful when:

- You need to repeatedly run experiments with the same configuration.
- You want to save time by skipping redundant sequence generation, e.g. while iterating on the non-rollout parts of training.
2.1 Trainer Adaptation
^^^^^^^^^^^^^^^^^^^^^^
Both ``RayDAPOTrainer`` (in ``verl/recipe/dapo/dapo_ray_trainer.py``) and ``RayPPOTrainer`` (in ``verl/trainer/ppo/ray_trainer.py``) have already been adapted.
Below is an example of how to patch ``RolloutSkip`` into ``RayPPOTrainer``:
.. code-block:: python

    #* Import the RolloutSkip class
    from verl.utils.rollout_skip import RolloutSkip

    ...

    class RayPPOTrainer:
        ...

        def fit(self):
            ...
            #* Add the following code:
            rollout_skip = RolloutSkip(self.config, self.actor_rollout_wg)
            rollout_skip.wrap_generate_sequences()
            ...

            for epoch in range(self.config.trainer.total_epochs):
                for batch_dict in self.train_dataloader:
                    ...
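To make the mechanism concrete, here is a minimal, hypothetical sketch of the cache-or-generate pattern that wrapping ``generate_sequences`` relies on. This is **not** the actual ``RolloutSkip`` implementation; the function name ``wrap_with_cache`` and the dump file name are placeholders used purely for illustration.

.. code-block:: python

    # Minimal sketch of the cache-or-generate idea (NOT the actual RolloutSkip code).
    # On the first run the real rollout is executed and its output is dumped to disk;
    # later runs with the same dump path load the cached output instead of rolling out.
    import pickle
    from pathlib import Path


    def wrap_with_cache(generate_fn, dump_dir):
        """Return a wrapper around `generate_fn` that reuses a dumped result if present."""
        dump_path = Path(dump_dir) / "rollout_cache.pkl"  # hypothetical file name

        def cached_generate(batch):
            if dump_path.exists():
                with dump_path.open("rb") as f:
                    return pickle.load(f)  # skip rollout: reuse the cached sequences
            output = generate_fn(batch)  # first run: perform the real (expensive) rollout
            dump_path.parent.mkdir(parents=True, exist_ok=True)
            with dump_path.open("wb") as f:
                pickle.dump(output, f)  # cache the result for subsequent runs
            return output

        return cached_generate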
2.2 Basic Configuration
^^^^^^^^^^^^^^^^^^^^^^^
Then add the following parameters to your config to enable the RolloutSkip feature:
.. code-block:: bash

    actor_rollout_ref.rollout.skip_rollout=True \
    actor_rollout_ref.rollout.skip_dump_dir="/tmp/rollout_dump" \
Note:

- ``skip_dump_dir`` is the directory where the cached sequences are stored. Ensure that this directory is writable and accessible by your training process.
- Use an absolute path for ``skip_dump_dir``: Ray runs workers from ``/tmp/ray/session_<session_id>/``, so a relative path will not resolve to the intended location inside the worker.
- Cached data is written to a subdirectory named ``{experiment_name}_{project_name}_TrainGBS{train_gbs}__InferGBS{gen_gbs}__N{n}``. Once you change ``experiment_name``, ``project_name``, ``train_gbs``, ``gen_gbs``, or ``n``, the cached data will be stored in a new directory (see the naming sketch below).
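For reference, the subdirectory name is composed from these fields along the lines of the following sketch. The values are illustrative only (not defaults), and the exact string construction inside verl may differ slightly.

.. code-block:: python

    # Illustration of the cache subdirectory naming pattern described above.
    experiment_name = "my_experiment"
    project_name = "my_project"
    train_gbs = 512   # training global batch size (example value)
    gen_gbs = 512     # generation/rollout global batch size (example value)
    n = 8             # number of responses sampled per prompt (example value)

    subdir = f"{experiment_name}_{project_name}_TrainGBS{train_gbs}__InferGBS{gen_gbs}__N{n}"
    print(subdir)
    # -> my_experiment_my_project_TrainGBS512__InferGBS512__N8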