TRL - Transformers Reinforcement Learning

TRL is a full stack library where we provide a set of tools to train transformer language models with methods like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.

🎉 What's New

TRL v1: We released TRL v1 — a major milestone that marks a real shift in what TRL is. Read the blog post to learn more.

Taxonomy

Below is the current list of TRL trainers, organized by method type (⚡️ = vLLM support; 🧪 = experimental).

Online methods

GRPOTrainer ⚡️
RLOOTrainer ⚡️
OnlineDPOTrainer 🧪 ⚡️
NashMDTrainer 🧪 ⚡️
PPOTrainer 🧪
XPOTrainer 🧪 ⚡️

Reward modeling

RewardTrainer
PRMTrainer 🧪

</div> <div style="flex: 1; min-width: 0;">

Offline methods

Knowledge distillation

GKDTrainer 🧪
MiniLLMTrainer 🧪

</div> </div>

You can also explore TRL-related models, datasets, and demos in the TRL Hugging Face organization.

Learn

Learn post-training with TRL and other libraries in 🤗 smol course.

The documentation is organized into the following sections:

Getting Started: installation and quickstart guide.
Conceptual Guides: dataset formats, training FAQ, and understanding logs.
How-to Guides: reducing memory usage, speeding up training, distributing training, etc.
Integrations: DeepSpeed, Liger Kernel, PEFT, etc.
Examples: example overview, community tutorials, etc.
API: trainers, utils, etc.

Blog posts

  <p class="text-gray-500 text-sm">Published March 27, 2026</p>
  <p class="text-gray-700">TRL v1: Post-Training Library That Holds When the Field Invalidates Its Own Assumptions</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/openenv">
  
  <p class="text-gray-500 text-sm">Published October 23, 2025</p>
  <p class="text-gray-700">Building the Open Agent Ecosystem Together: Introducing OpenEnv</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/trl-vlm-alignment">
  
  <p class="text-gray-500 text-sm">Published on August 7, 2025</p>
  <p class="text-gray-700">Vision Language Model Alignment in TRL ⚡️</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/vllm-colocate">
  
  <p class="text-gray-500 text-sm">Published on June 3, 2025</p>
  <p class="text-gray-700">NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/liger-grpo">
  
  <p class="text-gray-500 text-sm">Published on May 25, 2025</p>
  <p class="text-gray-700">🐯 Liger GRPO meets TRL</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/open-r1">
  
  <p class="text-gray-500 text-sm">Published on January 28, 2025</p>
  <p class="text-gray-700">Open-R1: a fully open reproduction of DeepSeek-R1</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/dpo_vlm">
  
  <p class="text-gray-500 text-sm">Published on July 10, 2024</p>
  <p class="text-gray-700">Preference Optimization for Vision Language Models with TRL</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/putting_rl_back_in_rlhf_with_rloo">
  
  <p class="text-gray-500 text-sm">Published on June 12, 2024</p>
  <p class="text-gray-700">Putting RL back in RLHF</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/trl-ddpo">
  
  <p class="text-gray-500 text-sm">Published on September 29, 2023</p>
  <p class="text-gray-700">Finetune Stable Diffusion Models with DDPO via TRL</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/dpo-trl">
  
  <p class="text-gray-500 text-sm">Published on August 8, 2023</p>
  <p class="text-gray-700">Fine-tune Llama 2 with DPO</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/stackllama">
  
  <p class="text-gray-500 text-sm">Published on April 5, 2023</p>
  <p class="text-gray-700">StackLLaMA: A hands-on guide to train LLaMA with RLHF</p>

</a> <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/trl-peft">

  <p class="text-gray-500 text-sm">Published on March 9, 2023</p>
  <p class="text-gray-700">Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/rlhf">
  
  <p class="text-gray-500 text-sm">Published on December 9, 2022</p>
  <p class="text-gray-700">Illustrating Reinforcement Learning from Human Feedback</p>
</a>

</div> </div>

Talks

  <p class="text-gray-500 text-sm">Talk given on October 30, 2025</p>
  <p class="text-gray-700">Fine tuning with TRL</p>
</a>

</div> </div>

TRL - Transformers Reinforcement Learning

TRL - Transformers Reinforcement Learning

🎉 What's New

Taxonomy

Online methods

Reward modeling

Offline methods

Knowledge distillation

Learn

Contents

Blog posts

Talks