recipes/finetune/deepspeed/finetune_fullparameter_multi_gpu.ipynb
Tongyi Qianwen (Qwen) is a large language model developed by Alibaba Cloud. It is based on the Transformer architecture and trained on an extensive, diverse corpus of pre-training data, including large amounts of web text, specialized books, and code. On top of the pre-trained model, an AI assistant called Qwen-Chat has been built using alignment techniques.
This notebook uses Qwen-1.8B-Chat as an example to show how to run full-parameter fine-tuning of a Qwen model on multiple GPUs with DeepSpeed.
Please refer to requirements.txt to install the required dependencies.
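For example, assuming requirements.txt is available in the repository root (run from there or adjust the path), the dependencies can be installed in one step:

!pip install -r requirements.txt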
First, download the model files. You can choose to download directly from ModelScope.
from modelscope.hub.snapshot_download import snapshot_download

# Download Qwen-1.8B-Chat into the current directory; model_dir holds the local path.
model_dir = snapshot_download('Qwen/Qwen-1_8B-Chat', cache_dir='.', revision='master')
Next, download the data required for training. Here we provide a tiny example dataset sampled from Belle.
Disclaimer: this dataset may only be used for research purposes.
!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/qwen_recipes/Belle_sampled_qwen.json
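Before training, it is worth taking a quick look at the downloaded file. A minimal sketch, assuming the JSON file lands in the current working directory:

import json

# Inspect the sampled Belle data: count the samples and look at one record.
with open("Belle_sampled_qwen.json", "r", encoding="utf-8") as f:
    data = json.load(f)

print(f"number of samples: {len(data)}")
print(json.dumps(data[0], ensure_ascii=False, indent=2))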
You can also prepare your own dataset in this format. Below is a simple example list containing one sample:
[
  {
    "id": "identity_0",
    "conversations": [
      {
        "from": "user",
        "value": "你好"
      },
      {
        "from": "assistant",
        "value": "我是一个语言模型,我叫通义千问。"
      }
    ]
  }
]
You can also use multi-turn conversations as the training set. Here is a simple example:
[
  {
    "id": "identity_0",
    "conversations": [
      {
        "from": "user",
        "value": "你好,能告诉我遛狗的最佳时间吗?"
      },
      {
        "from": "assistant",
        "value": "当地最佳遛狗时间因地域差异而异,请问您所在的城市是哪里?"
      },
      {
        "from": "user",
        "value": "我在纽约市。"
      },
      {
        "from": "assistant",
        "value": "纽约市的遛狗最佳时间通常在早晨6点至8点和晚上8点至10点之间,因为这些时间段气温较低,遛狗更加舒适。但具体时间还需根据气候、气温和季节变化而定。"
      }
    ]
  }
]
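To generate such a file programmatically, you can build the conversation list in Python and serialize it with json.dump. A minimal sketch using the single-turn sample above; the my_data.json filename is hypothetical:

import json

# Assemble one single-turn sample in the expected schema.
samples = [
    {
        "id": "identity_0",
        "conversations": [
            {"from": "user", "value": "你好"},
            {"from": "assistant", "value": "我是一个语言模型,我叫通义千问。"},
        ],
    }
]

# ensure_ascii=False keeps the Chinese text readable in the output file.
with open("my_data.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)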
You can directly run the prepared training script to fine-tune the model. The nproc_per_node argument specifies the number of GPUs used for training.
!torchrun --nproc_per_node 2 --nnodes 1 --node_rank 0 --master_addr localhost --master_port 6601 ../../finetune.py \
--model_name_or_path "Qwen/Qwen-1_8B-Chat/" \
--data_path "Belle_sampled_qwen.json" \
--bf16 True \
--output_dir "output_qwen" \
--num_train_epochs 5 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 16 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1000 \
--save_total_limit 10 \
--learning_rate 1e-5 \
--weight_decay 0.1 \
--adam_beta2 0.95 \
--warmup_ratio 0.01 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--report_to "none" \
--model_max_length 512 \
--gradient_checkpointing True \
--lazy_preprocess True \
--deepspeed "../../finetune/ds_config_zero2.json"
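With this command, 2 GPUs, a per-device batch size of 1, and 16 gradient-accumulation steps give an effective global batch size of 2 × 1 × 16 = 32. The --deepspeed flag points the HuggingFace Trainer at a DeepSpeed JSON config. Below is a minimal illustrative ZeRO stage-2 config written from Python; this is a sketch of the typical shape of such a file, not necessarily the exact contents of the repository's ds_config_zero2.json:

import json

# An illustrative ZeRO stage-2 DeepSpeed config (assumption: the shipped
# ds_config_zero2.json may differ). "auto" values are resolved by the
# HuggingFace Trainer from the corresponding training arguments.
ds_config = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,                   # partition optimizer states and gradients
        "overlap_comm": True,         # overlap gradient communication with compute
        "contiguous_gradients": True,
        "reduce_bucket_size": "auto",
    },
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
}

with open("ds_config_zero2.json", "w") as f:
    json.dump(ds_config, f, indent=2)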
We can test the model as follows:
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Load the fine-tuned checkpoint saved in output_qwen.
tokenizer = AutoTokenizer.from_pretrained("output_qwen", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "output_qwen",
    device_map="auto",
    trust_remote_code=True
).eval()

# Run a single-turn chat query against the fine-tuned model.
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
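Since model.chat returns the updated conversation history, you can keep the dialogue going by passing it back in. A short usage sketch; the follow-up prompt is just an example:

# Continue the conversation by threading the returned history back in.
response, history = model.chat(tokenizer, "Tell me about yourself.", history=history)
print(response)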