Automodel Backend


Last updated: 03/07/2026.

We support the Automodel (``nemo_automodel``) backend by implementing the ``AutomodelEngine`` and ``AutomodelEngineWithLMHead`` engine classes. The Automodel backend delegates model building, parallelization, optimizer sharding, LR scheduling, gradient clipping, and checkpointing to ``nemo_automodel``'s infrastructure while using verl's training loop, data pipeline, and loss function.

Requirements

  • Automodel r0.3.0
  • transformers v5.0.0
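
A quick way to compare an installed package version against the versions listed above is sketched below. It assumes the listed versions are minimums (the list does not say so), that ``pip`` is on the ``PATH``, and that ``sort -V`` (GNU coreutils) is available. ``version_ge`` is a hypothetical helper, not part of verl or Automodel.

.. code:: shell

   # version_ge A B: succeed when version A >= version B (sort -V ordering).
   version_ge() {
       [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
   }

   # Read the installed transformers version via pip (empty if not installed).
   installed="$(pip show transformers 2>/dev/null | awk '/^Version:/ {print $2}' || true)"

   # Assumption: 5.0.0 is treated as a minimum version, not an exact pin.
   if [ -n "$installed" ] && version_ge "$installed" "5.0.0"; then
       echo "transformers $installed satisfies >= 5.0.0"
   else
       echo "transformers >= 5.0.0 not found (installed: ${installed:-none})"
   fi

The same helper can be pointed at the Automodel package once its installed distribution name is known.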

Pros

  • Supports FSDP2 and TP distributed strategies out of the box.

  • Native support for Mixture-of-Experts (MoE) models with Expert Parallelism (EP) via DeepEP.

  • TransformerEngine (TE) integration for optimized attention, linear layers, and RMSNorm.

  • Readily supports any HuggingFace model without checkpoint conversion.

Cons

  • Pipeline parallelism is not yet supported.

SFT Examples

We provide example SFT training scripts using the Automodel backend in `examples/sft/gsm8k/ <https://github.com/verl-project/verl/blob/main/examples/sft/gsm8k/>`_.

Basic: Qwen2.5-0.5B with FSDP2 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A minimal example using Qwen/Qwen2.5-0.5B-Instruct with FSDP2 only (no additional parallelism such as TP or EP):

.. code:: shell

   bash examples/sft/gsm8k/run_qwen2_5_0_5b_automodel.sh 4 /tmp/automodel_sft_test

See `run_qwen2_5_0_5b_automodel.sh <https://github.com/verl-project/verl/blob/main/examples/sft/gsm8k/run_qwen2_5_0_5b_automodel.sh>`_.
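
The launch command above can be wrapped so that its two positional arguments are named. This sketch assumes the first argument is the number of GPUs per node and the second is the output directory, which the text above does not state, so check the script before relying on it. ``launch_cmd``, ``NPROC_PER_NODE``, ``SAVE_PATH``, and ``RUN`` are hypothetical names used only for illustration.

.. code:: shell

   # Assumed argument meaning (not confirmed above): <GPUs per node> <save path>.
   SCRIPT=examples/sft/gsm8k/run_qwen2_5_0_5b_automodel.sh
   NPROC_PER_NODE=4
   SAVE_PATH=/tmp/automodel_sft_test

   # launch_cmd GPUS PATH: print the command line without running it.
   launch_cmd() {
       printf 'bash %s %s %s\n' "$SCRIPT" "$1" "$2"
   }

   # Show what would be launched.
   launch_cmd "$NPROC_PER_NODE" "$SAVE_PATH"

   # Set RUN=1 and start from the verl repository root to actually train.
   if [ "${RUN:-0}" = "1" ]; then
       bash "$SCRIPT" "$NPROC_PER_NODE" "$SAVE_PATH"
   fi

Printing the command first makes it easy to confirm the arguments before committing GPUs to a run.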

Advanced: Qwen3-30B MoE with Expert Parallelism
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A larger-scale example using Qwen/Qwen3-30B-A3B-Base (an MoE model) with Expert Parallelism (EP=8), DeepEP, the TransformerEngine backend, and the ``torch_mm`` experts backend:

.. code:: shell

   bash examples/sft/gsm8k/run_qwen3_30b_automodel.sh 8 /tmp/automodel_sft_30b

See `run_qwen3_30b_automodel.sh <https://github.com/verl-project/verl/blob/main/examples/sft/gsm8k/run_qwen3_30b_automodel.sh>`_.
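
To make EP=8 in the example above concrete: with expert parallelism, the experts of each MoE layer are split across the EP ranks. The sketch below assumes an even split and assumes 128 routed experts per MoE layer, as listed in the published Qwen3-30B-A3B configuration. ``experts_per_rank`` is a hypothetical helper, not part of verl or Automodel.

.. code:: shell

   # experts_per_rank TOTAL EP: experts held by each EP rank under an even split.
   experts_per_rank() {
       epr_total=$1
       epr_ep=$2
       if [ $((epr_total % epr_ep)) -ne 0 ]; then
           echo "error: $epr_total experts do not divide evenly across EP=$epr_ep" >&2
           return 1
       fi
       echo $((epr_total / epr_ep))
   }

   # 128 experts (assumed) spread over EP=8 ranks.
   experts_per_rank 128 8   # prints 16

An EP size that does not divide the expert count makes the helper fail, which mirrors the even-split assumption stated above.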