:::{attention}
To be updated for Qwen3.
:::
Here we provide a script for supervised finetuning (SFT) Qwen2.5 with LLaMA-Factory. In the following, we introduce more details about how to use the script.
Before you start, make sure you have installed the following packages:
```bash
pip install deepspeed
pip install flash-attn --no-build-isolation
```
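If you have not set up LLaMA-Factory itself yet, a typical installation is to clone the repository and install it in editable mode. The sketch below is one common way to do this; the editable-install extras are an assumption, so check the official README for the recommended ones:

```bash
# Clone LLaMA-Factory and install it in editable mode
# (the "[torch,metrics]" extras are an assumption; see the official README)
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
```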
LLaMA-Factory provides several training datasets in its `data` folder, which you can use directly. If you are using a custom dataset, please prepare it as follows.

Put your data in a json file in the `data` folder. LLaMA-Factory supports datasets in `alpaca` or `sharegpt` format.

The dataset in `alpaca` format should follow the format below:

```json
[
  {
    "instruction": "user instruction (required)",
    "input": "user input (optional)",
    "output": "model response (required)",
    "system": "system prompt (optional)",
    "history": [
      ["user instruction in the first round (optional)", "model response in the first round (optional)"],
      ["user instruction in the second round (optional)", "model response in the second round (optional)"]
    ]
  }
]
```
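For instance, a single record in a hypothetical `data/my_dataset.json` in `alpaca` format could look like this (the file name and contents are purely illustrative):

```json
[
  {
    "instruction": "Translate the following sentence into French.",
    "input": "The weather is nice today.",
    "output": "Il fait beau aujourd'hui.",
    "system": "You are a helpful translation assistant."
  }
]
```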
The dataset in `sharegpt` format should follow the format below:

```json
[
  {
    "conversations": [
      {
        "from": "human",
        "value": "user instruction"
      },
      {
        "from": "gpt",
        "value": "model response"
      }
    ],
    "system": "system prompt (optional)",
    "tools": "tool description (optional)"
  }
]
```
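Similarly, a single two-turn record in `sharegpt` format might look like this (again purely illustrative):

```json
[
  {
    "conversations": [
      {
        "from": "human",
        "value": "What is the capital of France?"
      },
      {
        "from": "gpt",
        "value": "The capital of France is Paris."
      },
      {
        "from": "human",
        "value": "And of Germany?"
      },
      {
        "from": "gpt",
        "value": "The capital of Germany is Berlin."
      }
    ],
    "system": "You are a helpful assistant."
  }
]
```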
Then, provide your dataset definition in `data/dataset_info.json` in the following format.

For the `alpaca` format dataset, the columns in `dataset_info.json` should be:

```json
"dataset_name": {
  "file_name": "dataset_name.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "system": "system",
    "history": "history"
  }
}
```
For the `sharegpt` format dataset, the columns in `dataset_info.json` should be:

```json
"dataset_name": {
  "file_name": "dataset_name.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "conversations",
    "system": "system",
    "tools": "tools"
  },
  "tags": {
    "role_tag": "from",
    "content_tag": "value",
    "user_tag": "human",
    "assistant_tag": "gpt"
  }
}
```
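Note that the two snippets above are fragments of the top-level JSON object in `data/dataset_info.json`. A complete, purely illustrative file registering one custom `alpaca`-format dataset named `my_dataset` would look like:

```json
{
  "my_dataset": {
    "file_name": "my_dataset.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output",
      "system": "system",
      "history": "history"
    }
  }
}
```

The key (`my_dataset` here) is the name you then pass to `--dataset` in the training command.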
Execute the following training command:
DISTRIBUTED_ARGS="
--nproc_per_node $NPROC_PER_NODE \
--nnodes $NNODES \
--node_rank $NODE_RANK \
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT
"
torchrun $DISTRIBUTED_ARGS src/train.py \
--deepspeed $DS_CONFIG_PATH \
--stage sft \
--do_train \
--use_fast_tokenizer \
--flash_attn \
--model_name_or_path $MODEL_PATH \
--dataset your_dataset \
--template qwen \
--finetuning_type lora \
--lora_target q_proj,v_proj\
--output_dir $OUTPUT_PATH \
--overwrite_cache \
--overwrite_output_dir \
--warmup_steps 100 \
--weight_decay 0.1 \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--ddp_timeout 9000 \
--learning_rate 5e-6 \
--lr_scheduler_type cosine \
--logging_steps 1 \
--cutoff_len 4096 \
--save_steps 1000 \
--plot_loss \
--num_train_epochs 3 \
--bf16
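The command above expects a number of environment variables to be set beforehand. A minimal single-node sketch is given here; every value (GPU count, addresses, model path, DeepSpeed config, output directory) is an illustrative assumption to be replaced with your own:

```bash
# Distributed launch settings for a single node with 8 GPUs (illustrative values)
export NPROC_PER_NODE=8
export NNODES=1
export NODE_RANK=0
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29500

# Paths are assumptions: point them at your base model, DeepSpeed config, and output directory
export MODEL_PATH=Qwen/Qwen2.5-7B-Instruct
export DS_CONFIG_PATH=examples/deepspeed/ds_z3_config.json
export OUTPUT_PATH=output/qwen2_5-7b-sft-lora
```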
Then you can enjoy the training process. To make changes to your training, you can modify the arguments in the training command to adjust the hyperparameters. One argument to note is `cutoff_len`, which is the maximum length of the training data; control this parameter to avoid OOM errors.
If you train your model with LoRA, you probably need to merge the adapter parameters into the base model. Run the following command to merge the LoRA adapters.
```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli export \
    --model_name_or_path path_to_base_model \
    --adapter_name_or_path path_to_adapter \
    --template qwen \
    --finetuning_type lora \
    --export_dir path_to_export \
    --export_size 2 \
    --export_legacy_format False
```
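To sanity-check the merged checkpoint, you can start an interactive chat session with it (assuming your LLaMA-Factory version provides the `chat` subcommand):

```bash
# Interactive smoke test of the merged model (the `chat` subcommand is assumed to be available)
CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat \
    --model_name_or_path path_to_export \
    --template qwen
```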
The above content is the simplest way to use LLaMA-Factory to train Qwen. Feel free to dive into the details by checking the official repo!