megatron/rl/README.md
08/27/2025: Megatron-RL is actively under development. While it is functional internally at NVIDIA, it is not yet usable by external users because not all required code has been released. The available code and examples may change as development progresses. For a current roadmap of planned Megatron-RL features, please see #1776.
Megatron-RL is adding native reinforcement learning (RL) based post-training to Megatron-LM. It provides a flexible library for defining RL environments and agents, extending the Megatron-LM training loop with RL algorithm support.
The bulk of the new library code is located in `megatron/rl`. Example environments for Megatron-RL can be found in `examples/rl`.
Megatron-RL is designed for research teams exploring RL post-training of LLMs at scale on state-of-the-art foundation models with cutting-edge performance on the latest NVIDIA hardware.
It is not intended as an enterprise framework and won't necessarily provide out-of-the-box support for any given open model. For those capabilities, please refer to NeMo RL.
The design philosophy of Megatron RL is to keep the agent/environment design as decoupled as possible from the underlying RL implementation.
Agents are provided with an inference interface (i.e. `.generate(prompt, **generation_args)`) and must return experience rollouts along with rewards. Below we describe the different conceptual components and how they divide responsibility:

- **Agents**: are given an `InferenceInterface` and must return `Rollout` or `EvaluationResponse` objects.
- **Environments**: an `AgenticEnvironment` coordinates the `InferenceInterface` and Agents, using the interface to run `.generate(prompt, **generation_args)`.

See `examples/rl` for demonstrations of:

- `InferenceInterface` endpoints
- Agents
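To illustrate the decoupling described above, here is a minimal sketch of the agent/interface contract. Only `InferenceInterface`, `Rollout`, and `.generate(prompt, **generation_args)` come from this README; every other class, method, and field name (`EchoAgent`, `rollout`, `trajectory`, the reward logic) is a hypothetical stand-in, not the actual Megatron-RL API.

```python
# Hypothetical sketch of the agent/environment contract: the agent never
# touches the model directly; it only calls .generate() on an interface
# and returns a Rollout carrying the trajectory and its reward.
from dataclasses import dataclass


@dataclass
class Rollout:
    """A completed trajectory plus its scalar reward (illustrative shape)."""
    trajectory: list[str]
    reward: float


class InferenceInterface:
    """Wraps model inference; agents only ever call .generate()."""

    def generate(self, prompt: str, **generation_args) -> str:
        # Stand-in for a real model call (e.g. a Megatron inference backend).
        return prompt + " -> <model completion>"


class EchoAgent:
    """Toy agent: queries the interface once and scores the result."""

    def rollout(self, interface: InferenceInterface, prompt: str) -> Rollout:
        completion = interface.generate(prompt, temperature=0.7)
        # A real agent would compute a task-specific reward here.
        reward = 1.0 if "<model completion>" in completion else 0.0
        return Rollout(trajectory=[prompt, completion], reward=reward)


agent = EchoAgent()
result = agent.rollout(InferenceInterface(), "2 + 2 = ?")
print(result.reward)  # 1.0
```

Because the agent depends only on the `.generate()` call, the same agent code can be reused whether inference is served by a local model, a remote endpoint, or a mock like the one above.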