
# Community Tutorials


Community tutorials are made by active members of the Hugging Face community who want to share their knowledge and expertise with others. They are a great way to learn about the library and its features, and to get started with its core classes and modalities.

## Language Models

### Tutorials

| Task | Class | Description | Author | Tutorial | Colab |
| --- | --- | --- | --- | --- | --- |
| Reinforcement Learning | [GRPOTrainer] | Efficient Online Training with GRPO and vLLM in TRL | Sergio Paniego | Link | |
| Reinforcement Learning | [GRPOTrainer] | Post training an LLM for reasoning with GRPO in TRL | Sergio Paniego | Link | |
| Reinforcement Learning | [GRPOTrainer] | Mini-R1: Reproduce DeepSeek R1 "aha moment", an RL tutorial | Philipp Schmid | Link | |
| Reinforcement Learning | [GRPOTrainer] | RL on LLaMA 3.1-8B with GRPO and Unsloth optimizations | Andrea Manzoni | Link | |
| Instruction tuning | [SFTTrainer] | Fine-tuning Google Gemma LLMs using ChatML format with QLoRA | Philipp Schmid | Link | |
| Structured Generation | [SFTTrainer] | Fine-tuning Llama-2-7B to generate Persian product catalogs in JSON using QLoRA and PEFT | Mohammadreza Esmaeilian | Link | |
| Preference Optimization | [DPOTrainer] | Align Mistral-7b using Direct Preference Optimization for human preference alignment | Maxime Labonne | Link | |
| Preference Optimization | [experimental.orpo.ORPOTrainer] | Fine-tuning Llama 3 with ORPO, combining instruction tuning and preference alignment | Maxime Labonne | Link | |
| Instruction tuning | [SFTTrainer] | How to fine-tune open LLMs in 2025 with Hugging Face | Philipp Schmid | Link | |
| Step-Level Reasoning | [GRPOTrainer] | Supervised Reinforcement Learning (SRL) for step-by-step reasoning with vLLM | Deepak Swaminathan | Link | |

### Videos

| Task | Title | Author | Video |
| --- | --- | --- | --- |
| Instruction tuning | Fine-tuning open AI models using Hugging Face TRL | Wietse Venema | |
| Instruction tuning | How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset | Mayurji | |
<details> <summary>⚠️ Deprecated features notice for "How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset" (click to expand)</summary>

> [!WARNING]
> The tutorial uses two deprecated features:
>
> - `SFTTrainer(..., tokenizer=tokenizer)`: use `SFTTrainer(..., processing_class=tokenizer)` instead, or simply omit it (it will be inferred from the model).
> - `setup_chat_format(model, tokenizer)`: use `SFTConfig(..., chat_template_path="Qwen/Qwen3-0.6B")`, where `chat_template_path` specifies the model whose chat template you want to copy.
</details>
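The two replacements in the notice above can be sketched together. This is a minimal, hypothetical migration example, not the tutorial's own code: the base model, output directory, and dataset are illustrative placeholders, and a recent `trl` release is assumed.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Illustrative dataset; any conversational SFT dataset works here.
dataset = load_dataset("trl-lib/Capybara", split="train")

# Old, deprecated pattern:
#   trainer = SFTTrainer(model, tokenizer=tokenizer)        # `tokenizer=` kwarg
#   model, tokenizer = setup_chat_format(model, tokenizer)  # helper function

# Current pattern: pass the tokenizer as `processing_class` (or omit it so it
# is inferred from the model), and copy a chat template via `chat_template_path`.
training_args = SFTConfig(
    output_dir="smollm-sft",               # hypothetical output directory
    chat_template_path="Qwen/Qwen3-0.6B",  # model whose chat template to copy
)
trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",    # illustrative base model
    args=training_args,
    train_dataset=dataset,
    # processing_class=tokenizer,  # optional; replaces the old tokenizer=...
)
trainer.train()
```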

## Vision Language Models

### Tutorials

| Task | Class | Description | Author | Tutorial | Colab |
| --- | --- | --- | --- | --- | --- |
| Visual QA | [SFTTrainer] | Fine-tuning Qwen2-VL-7B for visual question answering on the ChartQA dataset | Sergio Paniego | Link | |
| Visual QA | [SFTTrainer] | Fine-tuning SmolVLM with TRL on a consumer GPU | Sergio Paniego | Link | |
| SEO Description | [SFTTrainer] | Fine-tuning Qwen2-VL-7B for generating SEO-friendly descriptions from images | Philipp Schmid | Link | |
| Visual QA | [DPOTrainer] | PaliGemma 🤝 Direct Preference Optimization | Merve Noyan | Link | |
| Visual QA | [DPOTrainer] | Fine-tuning SmolVLM using direct preference optimization (DPO) with TRL on a consumer GPU | Sergio Paniego | Link | |
| Object Detection Grounding | [SFTTrainer] | Fine-tuning a VLM for Object Detection Grounding using TRL | Sergio Paniego | Link | |
| Visual QA | [DPOTrainer] | Fine-Tuning a Vision Language Model with TRL using MPO | Sergio Paniego | Link | |
| Reinforcement Learning | [GRPOTrainer] | Post training a VLM for reasoning with GRPO using TRL | Sergio Paniego | Link | |

## Speech Language Models

### Tutorials

| Task | Class | Description | Author | Tutorial |
| --- | --- | --- | --- | --- |
| Text-to-Speech | [GRPOTrainer] | Post training a Speech Language Model with GRPO using TRL | Steven Zheng | Link |

## Contributing

If you have a tutorial that you would like to add to this list, please open a PR to add it. We will review it and merge it if it is relevant to the community.