docs/examples/megatron_fsdp_example.rst
Last updated: 04/29/2026.
In this example, we run SFT and RL training with Megatron-FSDP:

- ``verlai/verl:vllm011.dev7``

Download Megatron-LM and Megatron-Bridge. The required Megatron-FSDP support has already been merged into
Megatron-LM main
(<https://github.com/NVIDIA/Megatron-LM/pull/3191>) and
Megatron-Bridge main
(<https://github.com/NVIDIA-NeMo/Megatron-Bridge/pull/3512>).

.. code:: bash

   git clone https://github.com/NVIDIA/Megatron-LM.git
   git clone https://github.com/NVIDIA-NeMo/Megatron-Bridge.git

Before launching SFT, check and update the key fields ``MODEL_PATH`` and ``SAVE_PATH`` in
``examples/sft/gsm8k/run_qwen_megatron_fsdp.sh``, then run:

.. code:: bash

   bash examples/sft/gsm8k/run_qwen_megatron_fsdp.sh

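To review the key fields before launching, a small helper like the one below can print where a script assigns them. This is a minimal sketch, not part of verl: ``show_key_fields`` is a hypothetical name, and it assumes the script sets each field with a plain ``NAME=value`` (or ``export NAME=value``) assignment at the start of a line, which may not match the actual script layout.

```bash
# Hypothetical helper (not shipped with verl): list the lines of a launch
# script that assign the given variables, with line numbers.
# Assumption: each field is set as NAME=... or "export NAME=..." at line start.
show_key_fields() {
    local script="$1"
    shift
    local name
    for name in "$@"; do
        # -n prints line numbers; warn on stderr when no assignment is found.
        grep -nE "^(export[[:space:]]+)?${name}=" "$script" \
            || echo "warning: ${name} is not assigned in ${script}" >&2
    done
}

# Example call (run from the verl repository root):
# show_key_fields examples/sft/gsm8k/run_qwen_megatron_fsdp.sh MODEL_PATH SAVE_PATH
```

If a field is reported as missing, it is probably set in a different form than this sketch assumes, so open the script and check it by hand.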
Before launch, check and update key fields in
``examples/grpo_trainer/run_qwen2-7b_math_megatron_fsdp.sh``:

- ``actor_rollout_ref.model.path``: model name or local model path.
- ``train_files`` / ``test_files``: parquet paths for GSM8K and MATH.
- ``trainer.n_gpus_per_node`` and ``trainer.nnodes``: hardware topology.
- ``trainer.project_name`` and ``trainer.experiment_name``: experiment identifiers.

Then run:

.. code:: bash

   bash examples/grpo_trainer/run_qwen2-7b_math_megatron_fsdp.sh

The script launches RL training and enables Megatron-FSDP with:

- ``actor_rollout_ref.actor.megatron.use_mbridge=True``
- ``actor_rollout_ref.actor.megatron.vanilla_mbridge=False``
- ``actor_rollout_ref.actor.megatron.use_megatron_fsdp=True``

Megatron-FSDP checkpoints are saved as DTensor checkpoints under ``dist_ckpt``.
When ``checkpoint.save_contents`` includes ``model``, verl also saves the HuggingFace config and
tokenizer under ``huggingface``; HF weights can also be exported through Megatron-Bridge.
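After a run, the checkpoint layout described above can be checked with a short script. The following is a sketch under stated assumptions: ``check_ckpt_layout`` is a hypothetical helper, and the directory you pass in is whatever path your run writes the checkpoint to (the exact parent path is not specified here). Only the ``dist_ckpt`` and ``huggingface`` subdirectory names come from the text.

```bash
# Hypothetical helper (not shipped with verl): report whether the expected
# subdirectories exist under a checkpoint directory.
# Assumption: "huggingface" is only expected when checkpoint.save_contents
# includes "model".
check_ckpt_layout() {
    local ckpt_dir="$1"
    local status=0
    local sub
    for sub in dist_ckpt huggingface; do
        if [ -d "${ckpt_dir}/${sub}" ]; then
            echo "found:   ${ckpt_dir}/${sub}"
        else
            echo "missing: ${ckpt_dir}/${sub}"
            status=1
        fi
    done
    # Non-zero exit status when at least one subdirectory is absent.
    return "$status"
}
```

Call it as ``check_ckpt_layout /path/to/your/checkpoint`` and use the exit status in a wrapper script if you want to fail fast on an incomplete save.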
Current Megatron-FSDP checkpoint examples assume:

- ``use_distributed_optimizer=True``.
- ``CUDA_DEVICE_MAX_CONNECTIONS`` is unset or greater than 1.
- ``checkpoint.async_save=True`` is not used; it is not covered for Megatron-FSDP DTensor checkpoints yet.
- ``checkpoint.save_contents`` includes ``model`` whenever ``optimizer`` is listed.
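Two of these assumptions are easy to verify before launching. The sketch below is illustrative only: ``check_max_connections`` and ``check_save_contents`` are hypothetical helpers, not verl or Megatron-LM commands. The second one takes the ``save_contents`` value as a comma-separated string, which is an assumption about how you pass it on the command line. An empty ``CUDA_DEVICE_MAX_CONNECTIONS`` is treated the same as unset.

```bash
# Hypothetical pre-flight helpers (not shipped with verl or Megatron-LM).

# Succeeds when CUDA_DEVICE_MAX_CONNECTIONS is unset (or empty) or is an
# integer greater than 1.
check_max_connections() {
    local value="${CUDA_DEVICE_MAX_CONNECTIONS-}"
    if [ -z "$value" ]; then
        return 0
    fi
    case "$value" in
        *[!0-9]*) return 1 ;;  # reject anything that is not a plain integer
    esac
    [ "$value" -gt 1 ]
}

# Succeeds unless the list names "optimizer" without also naming "model".
# Accepts forms such as "model,optimizer,extra" or "[model, optimizer]".
check_save_contents() {
    local contents
    # Drop brackets and spaces, then pad with commas so that each entry
    # can be matched as a whole word.
    contents=",$(printf '%s' "$1" | tr -d '[] '),"
    case "$contents" in
        *,optimizer,*)
            case "$contents" in
                *,model,*) return 0 ;;
                *) return 1 ;;
            esac
            ;;
    esac
    return 0
}
```

For example, ``check_max_connections && check_save_contents "model,optimizer,extra"`` returns success only when both conditions hold, so it can gate the launch command in a wrapper script.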