# TRL
[TRL](https://huggingface.co/docs/trl) is a post-training framework for foundation models. It includes methods like supervised fine-tuning (SFT), GRPO, and DPO. Each method has a dedicated trainer that builds on [`Trainer`] and scales from a single GPU to multi-node clusters. The example below runs GRPO on a math dataset with TRL's built-in accuracy reward.
```python
from datasets import load_dataset
from trl import GRPOTrainer
from trl.rewards import accuracy_reward

dataset = load_dataset("trl-lib/DeepMath-103K", split="train")

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    train_dataset=dataset,
)
trainer.train()
```
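`reward_funcs` also accepts plain Python callables that take the generated `completions` (plus any extra dataset columns via `**kwargs`) and return one score per completion. A minimal sketch, with a made-up toy scoring rule:

```python
# Toy reward (illustrative only): prefer shorter completions
def reward_short_answers(completions, **kwargs):
    return [-float(len(c)) for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_short_answers,  # swap in the custom callable
    train_dataset=dataset,              # reuses the dataset loaded above
)
```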
TRL extends Transformers APIs and adds method-specific settings. Trainers build on [`Trainer`]; method-specific trainers like [`~trl.GRPOTrainer`] add generation, reward scoring, and loss computation on top. Config classes extend [`TrainingArguments`] with method-specific fields.
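For example, [`~trl.GRPOConfig`] accepts every [`TrainingArguments`] field alongside GRPO-specific ones. A short sketch (the hyperparameter values here are arbitrary):

```python
from trl import GRPOConfig, GRPOTrainer
from trl.rewards import accuracy_reward

args = GRPOConfig(
    output_dir="qwen2-grpo",
    # Inherited from TrainingArguments
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    # GRPO-specific fields
    num_generations=8,          # completions sampled per prompt
    max_completion_length=256,  # cap on generated tokens
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    args=args,
    train_dataset=dataset,  # dataset from the example above
)
```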
Model loading resolves the checkpoint's config with [`AutoConfig.from_pretrained`], then instantiates the model class named in that config with the class's own `from_pretrained` method.
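A minimal sketch of that loading path (the checkpoint is the same example model used above):

```python
import transformers
from transformers import AutoConfig

model_id = "Qwen/Qwen2-0.5B-Instruct"

# Resolve the config to discover the concrete model class
config = AutoConfig.from_pretrained(model_id)
architecture = getattr(transformers, config.architectures[0])  # e.g. Qwen2ForCausalLM

# Instantiate that class with its own from_pretrained
model = architecture.from_pretrained(model_id)
```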