# Axolotl
Axolotl is a fine-tuning and post-training framework for large language models. It supports adapter-based tuning, ND-parallel distributed training, GRPO, and QAT. Through TRL, Axolotl also handles preference learning, reinforcement learning, and reward modeling workflows.
Define your training run in a YAML config file.
```yaml
base_model: NousResearch/Nous-Hermes-llama-1b-v1
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

datasets:
  - path: tatsu-lab/alpaca
    type: alpaca

output_dir: ./outputs

sequence_len: 512
micro_batch_size: 1
gradient_accumulation_steps: 1
num_epochs: 1
learning_rate: 2.0e-5
```
Launch training with the `train` command.

```bash
axolotl train my_config.yml
```
Axolotl's `ModelLoader` wraps the Transformers load flow.

The model config is built with [AutoConfig.from_pretrained]. Preload setup then configures the device map, quantization config, and attention backend.

`ModelLoader` automatically selects the appropriate [AutoModel] class ([AutoModelForCausalLM], [AutoModelForImageTextToText], [AutoModelForSequenceClassification]) or a model-specific class from its multimodal mapping. Weights are loaded with the selected class's `from_pretrained` method. When `reinit_weights` is set, Axolotl uses `from_config` instead for random initialization.
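The load flow described above corresponds roughly to the following Transformers calls. This is a sketch, not Axolotl's internals; a tiny hand-built LLaMA config stands in for a real checkpoint so the snippet runs without downloading weights.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Step 1: build the model config, as ModelLoader does with
# AutoConfig.from_pretrained(base_model). A tiny LLaMA config stands in here.
config = AutoConfig.for_model(
    "llama",
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    intermediate_size=128,
    vocab_size=1000,
)

# Step 2a: normal path - load pretrained weights with the selected class:
# model = AutoModelForCausalLM.from_pretrained(base_model, config=config,
#                                              device_map="auto")

# Step 2b: reinit_weights path - random initialization from the config alone.
model = AutoModelForCausalLM.from_config(config)
print(type(model).__name__)
```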
When PEFT-based techniques such as LoRA and QLoRA are enabled, Axolotl uses Transformers, PEFT, and bitsandbytes to apply adapters after model initialization. A patch manager applies additional optimizations before and after model loading.
`AxolotlTrainer` extends [Trainer], adding Axolotl mixins while using the [Trainer] training loop and APIs.
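The mixin pattern layers behavior on top of [Trainer] without replacing its training loop. A minimal sketch with a hypothetical logging mixin (not one of Axolotl's actual mixins):

```python
from transformers import Trainer

class LoggingMixin:
    """Hypothetical mixin: annotates metrics, then defers to Trainer.log."""

    def log(self, logs, *args, **kwargs):
        logs = {**logs, "source": "axolotl-style-mixin"}
        return super().log(logs, *args, **kwargs)

class MyTrainer(LoggingMixin, Trainer):
    """Mixins come first in the MRO so their overrides run before Trainer's,
    while the training loop itself stays Trainer's."""
```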