This folder contains code for the Zero Bubble distributed RL framework. It currently supports GRPO and DAPO. See the main README for general installation instructions and usage.
Note: This project is under active development — expect changes.
We aim to reduce the “bubble” — the idle time that occurs between rollouts and training steps (illustrated in Fig. 1).
Fig. 1 - In an all-sync online RL framework, rollout workers wait for the trainer to finish training and synchronize weights, and the trainer waits for rollouts. This causes large GPU idle time.
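To make the cost of the all-sync schedule concrete, here is a rough back-of-the-envelope sketch. It assumes the rollout and training phases strictly alternate and that weight synchronization time is folded into the training phase. The function name and the timings are illustrative, not measurements from this project.

```python
def sync_idle_fractions(rollout_s: float, train_s: float) -> tuple[float, float]:
    """Idle fractions of (rollout workers, trainer) per step when phases alternate.

    Assumption: rollout workers idle for the whole training phase and the
    trainer idles for the whole rollout phase.
    """
    step_s = rollout_s + train_s
    return train_s / step_s, rollout_s / step_s


# Hypothetical timings: 30 s of rollout followed by 10 s of training per step.
rollout_idle, trainer_idle = sync_idle_fractions(rollout_s=30.0, train_s=10.0)
print(f"rollout workers idle {rollout_idle:.0%}, trainer idle {trainer_idle:.0%}")
# prints: rollout workers idle 25%, trainer idle 75%
```

Under these assumed timings the trainer GPUs sit idle for most of every step, which is the bubble this pipeline targets.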
Fig. 2 - Our Zero Bubble pipeline follows a producer–consumer pattern: inference workers produce rollouts into a shared data buffer, and the trainer consumes them from it.
Under ideal conditions (inference workers produce data at the same rate the trainer consumes it), the pipeline eliminates idle time. We call it zero bubble because, with an unlimited data buffer, inference and training can run indefinitely without waiting. In practice, to avoid wasted compute and stale/off-policy data, we set a bounded buffer size so inference workers will briefly wait when the buffer is full.
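The bounded-buffer behavior described above can be sketched with Python's standard `queue.Queue`. This is only a single-process illustration of the producer–consumer idea, not the framework's actual implementation. The names `run_pipeline`, `inference_worker`, `trainer`, and `buffer_size_limit` are hypothetical.

```python
import queue
import threading


def run_pipeline(num_batches: int, buffer_size_limit: int) -> list[int]:
    """Push rollout batch ids through a bounded buffer and return the consumed order."""
    buffer: queue.Queue = queue.Queue(maxsize=buffer_size_limit)
    consumed: list[int] = []

    def inference_worker() -> None:
        for batch_id in range(num_batches):
            # put() blocks while the buffer is full, so the producer
            # waits briefly instead of piling up stale rollouts.
            buffer.put(batch_id)
        buffer.put(None)  # sentinel: no more rollouts

    def trainer() -> None:
        while True:
            batch = buffer.get()  # blocks while the buffer is empty
            if batch is None:
                break
            consumed.append(batch)  # stand-in for a training step

    threads = [threading.Thread(target=inference_worker),
               threading.Thread(target=trainer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed


print(run_pipeline(num_batches=6, buffer_size_limit=2))
# prints: [0, 1, 2, 3, 4, 5]
```

A smaller limit keeps the consumed data closer to the current policy at the cost of more producer waiting. A larger limit does the opposite.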
In addition to the general parameters (see the main README), the Zero Bubble pipeline introduces one additional parameter:
data_actor_buffer_size_limit - Maximum number of rollout batches the data buffer may hold. Defaults to twice the trainer’s mini-batch size. Avoid setting this too large: a very large buffer makes training more off-policy. For DAPO, since only effective prompts count, you may need to raise data_actor_buffer_size_limit depending on sample utility.

Example: RL training on 8 GPUs with Zero Bubble (zero2)
python rl_example_zero_bubble.py \
--dataset /path/to/your/dataset.jsonl \
--model /path/to/your/model \
-t 4 -i 4 -b vllm -a DAPO \
-imbs 8 -ibs 8 -tbs 8 -e 2 -rt boxed \
-si 25 -s "Please reason step by step, and put your final answer within \\boxed{}." \
-tMbs 2 -tmbs 2 -p Rebase_Experiments -zero 2 -mpt 512 -mnt 3584
Fig. 3 - Performance of the Zero Bubble pipeline tested with an unlimited buffer size.