finetuning/dpo/README.md
## Create a Conda Environment

Use the following command to create and activate a new environment for DPO training:
```bash
conda create -n dpo_env python=3.10
conda activate dpo_env
```
## Install Dependencies

After activating the environment, install all required dependencies by running:
```bash
pip install -r requirements.txt
```
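To sanity-check the installation, you can run a quick import test. This is a minimal sketch assuming a typical DPO stack (PyTorch, Transformers, TRL); the exact packages are whatever `requirements.txt` actually installs.

```python
# Quick sanity check that the core libraries imported correctly.
# Assumes a typical DPO stack (torch / transformers / trl); adjust to match
# what requirements.txt actually installs.
import torch
import transformers
import trl

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("trl:", trl.__version__)
print("CUDA available:", torch.cuda.is_available())
```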
## Constructing DPO Data

Provide the data for DPO training as a JSONL file, where each line is a single JSON object with the following fields:
```json
{"prompt": "Prompt", "chosen": "The chosen response", "rejected": "The rejected response"}
```
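If you are building the preference file yourself, the sketch below shows one way to write records in this format. The file name and example records are placeholders, not part of this repo.

```python
# Minimal sketch: writing preference records to a JSONL file in the format above.
# The output file name and the example records are placeholders.
import json

records = [
    {
        "prompt": "Write a function that reverses a string.",
        "chosen": "def reverse(s):\n    return s[::-1]",
        "rejected": "def reverse(s):\n    return s",
    },
]

with open("preference_data.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        # Each line must be a single JSON object with prompt/chosen/rejected keys.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```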
## Training

Once the environment is ready and the model paths are configured, run the DPO training by executing the following script:
```bash
DATA_PATH="/path/to/preference/data"
SFT_MODEL="/path/to/sft/model"
OUTPUT_DIR="/path/to/output"

bash ./scripts/dpo_qwencoder.sh
```
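The script above wraps the actual training invocation; its contents are not shown here. As a rough illustration of what a DPO run over data in this format typically looks like, here is a minimal sketch using Hugging Face TRL's `DPOTrainer`. The library choice, model loading, and hyperparameters are assumptions for illustration, not the repo's actual configuration.

```python
# Illustrative DPO training loop using TRL's DPOTrainer.
# Paths and hyperparameters are placeholders; dpo_qwencoder.sh may differ.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_model_path = "/path/to/sft/model"   # corresponds to SFT_MODEL above
data_path = "/path/to/preference/data"  # corresponds to DATA_PATH above
output_dir = "/path/to/output"          # corresponds to OUTPUT_DIR above

model = AutoModelForCausalLM.from_pretrained(sft_model_path)
tokenizer = AutoTokenizer.from_pretrained(sft_model_path)

# Expects JSONL with "prompt", "chosen", and "rejected" fields, as described above.
dataset = load_dataset("json", data_files=data_path, split="train")

training_args = DPOConfig(
    output_dir=output_dir,
    beta=0.1,                       # DPO temperature; placeholder value
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=5e-7,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,     # older TRL versions use tokenizer= instead
)
trainer.train()
trainer.save_model(output_dir)
```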