docs/source/training/axolotl.md
This guide will help you get started with post-training (SFT, RLHF, RM, PRM) for Qwen3 / Qwen3_MOE using Axolotl, and covers optimizations to enable for better performance.
bf16 and Flash Attention, or AMD GPU
You can install Axolotl using PyPI, Conda, Git, or Docker, or launch a cloud environment.
:::{important}
Install PyTorch before installing Axolotl to ensure CUDA compatibility.
:::
For the latest instructions, see the official Axolotl Installation Guide.
We have provided a sample YAML config for SFT with Qwen/Qwen3-32B: SFT 32B QLoRA config.
# Train the model
axolotl train path/to/32b-qlora.yaml
# Merge LoRA weights with the base model
# This will create a new `merged` directory under `{output_dir}`
axolotl merge-lora path/to/32b-qlora.yaml
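The two commands above can also be chained from a script so that merging only runs when training succeeds. The following is a minimal sketch, not part of Axolotl itself: the helper names `build_commands` and `train_and_merge` are hypothetical, and only the two CLI invocations are taken from this guide.

```python
import subprocess


def build_commands(config_path):
    """Return the two CLI invocations from this guide, in order."""
    return [
        ["axolotl", "train", config_path],
        ["axolotl", "merge-lora", config_path],
    ]


def train_and_merge(config_path, runner=subprocess.run):
    """Run training, then merging.

    check=True makes subprocess.run raise CalledProcessError on a
    non-zero exit code, so merging is skipped if training fails.
    `runner` is injectable (hypothetical parameter) to allow dry runs.
    """
    for cmd in build_commands(config_path):
        runner(cmd, check=True)
```

Calling `train_and_merge("path/to/32b-qlora.yaml")` would run both steps with the same config file.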
:::{tip}
To train a smaller model, edit the base_model in your config:
base_model: Qwen/Qwen3-8B
:::
Qwen3 works with all Axolotl features including Flash Attention, bf16, LoRA, torch_compile, and QLoRA.
To run on more than a single GPU, see the Multi-GPU Training Guide or the Multi-node Training Guide.
For RLHF, see the RLHF Guide for the required dataset formats and examples for each method.
For reward modelling, refer to the Reward Modelling Guide for the required dataset formats and config examples.
By default, the example config uses the mlabonne/FineTome-100k dataset from the Hugging Face Hub. You can substitute your own dataset.
Axolotl handles various SFT dataset formats, but the current recommended format (for use with chat_template) is the OpenAI Messages format:
[
  {
    "messages": [
      {
        "role": "user",
        "content": "What is Qwen3?"
      },
      {
        "role": "assistant",
        "content": "Qwen3 is a language model..."
      }
    ]
  }
]
Use this in your config:
datasets:
  - path: path/to/your/dataset.json
    type: chat_template
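A dataset file in the messages format can be produced and sanity-checked with a short script before it is handed to Axolotl. This is a minimal sketch using only the standard library. The helper names and the set of accepted roles are assumptions made for illustration, not checks that Axolotl itself performs.

```python
import json
from pathlib import Path

# Assumption: only these roles appear in the dataset. Extend as needed.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has non-string content")
    return problems


def write_dataset(records, path):
    """Validate every record, then write them all as one JSON array."""
    for n, record in enumerate(records):
        problems = validate_record(record)
        if problems:
            raise ValueError(f"record {n}: {'; '.join(problems)}")
    text = json.dumps(records, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


records = [
    {
        "messages": [
            {"role": "user", "content": "What is Qwen3?"},
            {"role": "assistant", "content": "Qwen3 is a language model..."},
        ]
    }
]
```

Calling `write_dataset(records, "path/to/your/dataset.json")` would then write a file matching the `path` used in the config.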
You can also load datasets from multiple sources: Hugging Face Hub, local files, directories, S3, GCS, Azure, etc.
See the Dataset Loading Guide for more details.
To load different dataset formats, refer to the SFT Dataset Formats Guide.
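As an alternative to configuring another format, records can be converted into the messages format ahead of time. The sketch below assumes a ShareGPT-style source layout (a `conversations` list whose turns have `from` and `value` keys). That layout, the role names in `ROLE_MAP`, and the helper `to_messages` are illustrative assumptions, so adjust them to match your data.

```python
# Assumed mapping from source role names to messages-format roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def to_messages(record, turns_key="conversations"):
    """Convert one ShareGPT-style record into the messages format."""
    messages = []
    for turn in record[turns_key]:
        role = ROLE_MAP.get(turn["from"])
        if role is None:
            # Fail loudly instead of silently dropping an unknown speaker.
            raise ValueError(f"unmapped role: {turn['from']!r}")
        messages.append({"role": role, "content": turn["value"]})
    return {"messages": messages}


sample = {
    "conversations": [
        {"from": "human", "value": "What is Qwen3?"},
        {"from": "gpt", "value": "Qwen3 is a language model..."},
    ]
}
converted = to_messages(sample)
```

Here `converted` holds a single record with one `user` turn followed by one `assistant` turn, in the same shape as the messages example in this guide.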
With Qwen3/Qwen3_MOE, you can leverage Axolotl's custom optimizations for improved speed and reduced memory usage: