docs/workers/engine_workers.rst
Last updated: 04/20/2026.
:mod:verl.workers.engine_workers provides the worker-layer classes that
RayWorkerGroup instantiates for PPO / GRPO / SFT style RL training.
They are engine agnostic – FSDP, FSDP2, Megatron-LM, Automodel,
TorchTitan and VeOmni are all wired in through the same entry points.
The specific backend is selected at runtime from actor.strategy /
critic.strategy and resolved by
:class:verl.workers.engine.EngineRegistry.
For the engine-layer design (how BaseEngine subclasses implement
forward_step, parallelism, checkpointing, weight export, etc.) see
:doc:model_engine.
::
ActorRolloutRefWorker # hybrid worker, co-locates actor + rollout + optional ref ├── self.actor : TrainingWorker (built if role contains "actor") ├── self.ref : TrainingWorker (built if role contains "ref") ├── self.rollout: BaseRollout (vLLM / SGLang, built if role contains "rollout") └── self.checkpoint_engine (built if role contains "actor")
TrainingWorker # generic "one engine + optimizer + profiler" worker └── self.engine : BaseEngine (fsdp / fsdp2 / megatron / automodel / veomni / torchtitan)
TrainingWorker is also used standalone for the critic, reference
model, reward model and SFT / DPO training – it's essentially a
Ray-wrapped BaseEngine with a Tinker-like API
(https://thinkingmachines.ai/tinker/) exposed as RPCs.
:class:verl.workers.engine_workers.ActorRolloutRefWorker is the
hybrid worker used for actor, rollout and (optional) reference policy.
The role argument selects which sub-workers are constructed:
========================= ===========================================================================
role What is built inside init_model
========================= ===========================================================================
actor self.actor (TrainingWorker) + checkpoint engine
rollout self.rollout (BaseRollout)
ref self.ref (TrainingWorker with forward_only engine config)
actor_rollout actor + rollout + checkpoint engine (most common for colocated PPO)
actor_rollout_ref all three
========================= ===========================================================================
Key RPCs ^^^^^^^^
init_model
.. code:: python
@register(dispatch_mode=Dispatch.ONE_TO_ALL) def init_model(self):
ONE_TO_ALL: the driver calls init_model and the same routine
runs on every worker. It builds the TrainingWorker (which in turn
builds the BaseEngine via EngineRegistry.new), the rollout
engine, and the checkpoint engine used for trainer→rollout weight
sync.
compute_log_prob / compute_ref_log_prob
.. code:: python
@register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="actor")) def compute_log_prob(self, data: TensorDict) -> TensorDict: return self.actor.infer_batch(data)
@register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="ref")) def compute_ref_log_prob(self, data: TensorDict) -> TensorDict: return self.ref.infer_batch(data)
TrainingWorker.infer_batch drives BaseEngine.infer_batch (eval
mode + no_grad). The n-d dispatch function is built from the
engine's actual parallel topology, so Megatron's PP dimension is
surfaced as an extra DP dimension to the single controller without
needing a backend-specific dispatch mode.
update_actor
.. code:: python
@register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="actor")) def update_actor(self, data: TensorDict) -> TensorDict: return self.actor.train_mini_batch(data=data)
train_mini_batch splits the batch into mini-batches, iterates
over PPO epochs, and calls TrainingWorker.train_batch for each
mini-batch (one optimizer step per mini-batch). The PPO loss
or distillation loss is wired by init_model via
TrainingWorker.set_loss_fn.
update_weights
.. code:: python
@register(dispatch_mode=Dispatch.ONE_TO_ALL, blocking=False) async def update_weights(self, global_steps: int = None):
Push the freshest trainer weights to the rollout engine.
checkpoint_engine.backend == "naive"): export per-tensor parameters via
engine.get_per_tensor_param and call rollout.update_weights
directly. LoRA adapters are merged into base weights up-front when
model.lora.merge=True.self.checkpoint_engine.send_weights instead.save_checkpoint / load_checkpoint
Both delegate to the actor TrainingWorker, which in turn calls
BaseEngine.save_checkpoint / load_checkpoint. The backend
engine is responsible for sharded model + optimizer + scheduler state
(and HuggingFace export when applicable).
:class:verl.workers.engine_workers.TrainingWorker is the generic
worker for a single engine + optimizer + profiler. It is used:
self.actor / self.ref inside ActorRolloutRefWorker.add_critic_worker /
add_reward_model_worker in verl/trainer/main_ppo.py).Construction takes a single
:class:verl.workers.config.TrainingWorkerConfig which bundles the
model_config, engine_config, optimizer_config,
checkpoint_config and profiler_config. The backend is chosen
from engine_config.strategy (fsdp, fsdp2, megatron,
automodel, veomni, torchtitan).
Key RPCs ^^^^^^^^
reset() – first call initializes the engine; subsequent calls
reload weights and reset optimizer / scheduler state.to(device, model=True, optimizer=True, grad=True) – manual
load/offload control. device must be either "cpu" or
"device" (which is mapped to the actual accelerator name).set_loss_fn(loss_fn) – install the loss closure (PPO loss,
distillation loss, or any custom callable that
accepts (model_output, batch)).train_mini_batch(data) – mini-batch + PPO-epoch loop; one
optimizer step per mini-batch; allgather metrics across DP.train_batch(data) – single mini-batch train step. Usually invoked
indirectly via train_mini_batch.infer_batch(data) – forward-only step used for log-prob / value /
reward / distillation-teacher computation. Supports
no_lora_adapter=True to temporarily disable the adapter at
inference.save_checkpoint / load_checkpoint – delegate to
BaseEngine.Set the strategy field on actor.engine / critic.engine /
ref.engine in your Hydra config:
.. code-block:: yaml
actor_rollout_ref: actor: strategy: fsdp2 # or: fsdp, megatron, automodel, veomni, torchtitan engine: strategy: fsdp2 param_offload: False # ...
The EngineRegistry dispatches on (model_type, backend, device) –
for example (language_model, fsdp2, cuda) or
(language_model, megatron, npu):
===================== ====================== ===================== =====================================================================
model_type backend device Engine class
===================== ====================== ===================== =====================================================================
language_model fsdp / fsdp2 cuda / npu verl.workers.engine.fsdp.FSDPEngineWithLMHead
language_model megatron cuda verl.workers.engine.megatron.MegatronEngineWithLMHead
language_model megatron npu verl.workers.engine.mindspeed.MindspeedEngineWithLMHead
language_model mindspeed_megatron npu verl.workers.engine.mindspeed.MindSpeedMegatronEngineWithLMHead
language_model automodel cuda verl.workers.engine.automodel.AutomodelEngineWithLMHead
language_model veomni cuda / npu verl.workers.engine.veomni.VeOmniEngineWithLMHead
language_model torchtitan cuda / npu verl.workers.engine.torchtitan.TorchTitanEngineWithLMHead
value_model fsdp / fsdp2 cuda / npu verl.workers.engine.fsdp.FSDPEngineWithValueHead
value_model megatron cuda verl.workers.engine.megatron.MegatronEngineWithValueHead
===================== ====================== ===================== =====================================================================
The legacy verl.workers.fsdp_workers / verl.workers.megatron_workers
modules (together with verl.workers.actor / verl.workers.critic
/ verl.workers.sharding_manager / verl.workers.legacy) have been
removed. The table below summarises the equivalent entry points:
============================================================== =========================================================================
Legacy (removed) Current (verl.workers.engine_workers)
============================================================== =========================================================================
verl.workers.fsdp_workers.ActorRolloutRefWorker ActorRolloutRefWorker (strategy=fsdp/fsdp2)
verl.workers.megatron_workers.ActorRolloutRefWorker ActorRolloutRefWorker (strategy=megatron)
verl.workers.fsdp_workers.CriticWorker TrainingWorker (with critic config + value-model engine)
verl.workers.megatron_workers.CriticWorker TrainingWorker (with critic config + value-model engine)
verl.workers.actor.DataParallelPPOActor FSDPEngineWithLMHead + TrainingWorker
verl.workers.actor.MegatronPPOActor MegatronEngineWithLMHead + TrainingWorker
verl.workers.critic.DataParallelPPOCritic FSDPEngineWithValueHead + TrainingWorker
verl.workers.critic.MegatronPPOCritic MegatronEngineWithValueHead + TrainingWorker
verl.workers.sharding_manager.FSDPUlyssesShardingManager verl.utils.ulysses.FSDPUlyssesShardingManager
Dispatch.MEGATRON_PP_AS_DP_PROTO make_nd_compute_dataproto_dispatch_fn(mesh_name=...) (derived from engine)
use_legacy_worker_impl: True (removed; only the unified engine is available)
============================================================== =========================================================================
To add a new backend, implement a BaseEngine subclass under
verl/workers/engine/<your_backend>/ and register it with
@EngineRegistry.register(model_type=..., backend=...). The worker
layer (TrainingWorker / ActorRolloutRefWorker) is already
engine-agnostic and will pick up the new backend as soon as
engine_config.strategy is set accordingly. See :doc:model_engine
for the detailed extension guide and the test harness under
tests/special_e2e/sft/.
verl.workers.engine_workers –
engine_workers.py <https://github.com/volcengine/verl/blob/main/verl/workers/engine_workers.py>__verl.workers.engine –
engine/ <https://github.com/volcengine/verl/tree/main/verl/workers/engine>__verl.workers.rollout –
rollout/ <https://github.com/volcengine/verl/tree/main/verl/workers/rollout>__verl/trainer/main_ppo.py <https://github.com/volcengine/verl/blob/main/verl/trainer/main_ppo.py>__