docs/workers/automodel_workers.rst
Last updated: 03/07/2026.

We support the Automodel (``nemo_automodel``) backend by implementing the
``AutomodelEngine`` and ``AutomodelEngineWithLMHead`` engine classes.
The Automodel backend delegates model building, parallelization, optimizer
sharding, LR scheduling, gradient clipping, and checkpointing to
``nemo_automodel``'s infrastructure while using verl's training loop,
data pipeline, and loss function.

Requirements

Pros

- Supports FSDP2 and TP distributed strategies out of the box.
- Native support for Mixture-of-Experts (MoE) models with Expert Parallelism (EP) via DeepEP.
- TransformerEngine (TE) integration for optimized attention, linear layers, and RMSNorm.
- Readily supports any HuggingFace model without checkpoint conversion.

Cons

We provide example SFT training scripts using the Automodel backend in
`examples/sft/gsm8k/ <https://github.com/verl-project/verl/blob/main/examples/sft/gsm8k/>`_.

Basic: Qwen2.5-0.5B with FSDP2 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A minimal example using ``Qwen/Qwen2.5-0.5B-Instruct`` with FSDP2 and
no additional parallelism:

.. code:: shell

   bash examples/sft/gsm8k/run_qwen2_5_0_5b_automodel.sh 4 /tmp/automodel_sft_test

See `run_qwen2_5_0_5b_automodel.sh <https://github.com/verl-project/verl/blob/main/examples/sft/gsm8k/run_qwen2_5_0_5b_automodel.sh>`_.

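
To make the invocation easier to adapt, the command can be wrapped in a small
helper that prints it before running it. This is only a sketch: the ``launch``
function and the ``NPROC``, ``SAVE_PATH``, and ``DRY_RUN`` variables are
hypothetical names. Reading the two positional arguments as a process/GPU count
followed by an output directory is an assumption that should be checked against
the script itself.

.. code:: shell

   # Hypothetical wrapper around the example script.
   # Assumption: argument 1 is the process/GPU count, argument 2 the output dir.
   NPROC="${NPROC:-4}"
   SAVE_PATH="${SAVE_PATH:-/tmp/automodel_sft_test}"
   SCRIPT="examples/sft/gsm8k/run_qwen2_5_0_5b_automodel.sh"

   # Print the command by default; run it only when DRY_RUN=0.
   launch() {
       if [ "${DRY_RUN:-1}" = "1" ]; then
           echo "bash $SCRIPT $NPROC $SAVE_PATH"
       else
           bash "$SCRIPT" "$NPROC" "$SAVE_PATH"
       fi
   }

   launch

With the defaults this only echoes the command shown above. Setting
``DRY_RUN=0`` runs it from the current directory, which is assumed to be the
verl repository root.
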

Advanced: Qwen3-30B MoE with Expert Parallelism
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A larger-scale example using ``Qwen/Qwen3-30B-A3B-Base`` (an MoE model)
with Expert Parallelism (EP=8), DeepEP, the TransformerEngine backend, and
the ``torch_mm`` experts backend:

.. code:: shell

   bash examples/sft/gsm8k/run_qwen3_30b_automodel.sh 8 /tmp/automodel_sft_30b

See `run_qwen3_30b_automodel.sh <https://github.com/verl-project/verl/blob/main/examples/sft/gsm8k/run_qwen3_30b_automodel.sh>`_.

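
Because EP=8 needs at least eight ranks, it can help to confirm that enough GPUs
are visible before starting a run of this size. The pre-flight check below is a
sketch, not part of verl: ``count_gpus`` and ``have_enough_gpus`` are
hypothetical helpers. It assumes that ``nvidia-smi -L`` lists one GPU per line
and that the first script argument is the GPU count.

.. code:: shell

   REQUIRED_GPUS=8   # assumption: matches EP=8 and the first script argument
   SCRIPT="examples/sft/gsm8k/run_qwen3_30b_automodel.sh"

   # Count visible GPUs; prints 0 when nvidia-smi is missing or finds none.
   count_gpus() {
       if command -v nvidia-smi >/dev/null 2>&1; then
           nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
       else
           echo 0
       fi
   }

   # Succeeds when the available count ($1) meets the required count ($2).
   have_enough_gpus() {
       [ "$1" -ge "$2" ]
   }

   if have_enough_gpus "$(count_gpus)" "$REQUIRED_GPUS" && [ -f "$SCRIPT" ]; then
       bash "$SCRIPT" "$REQUIRED_GPUS" /tmp/automodel_sft_30b
   else
       echo "Skipping launch: need $REQUIRED_GPUS GPUs and $SCRIPT" >&2
   fi

The check looks only at the local node, so a multi-node launch would need a
different test.
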