finetuning/dpo/README.md

Setup

  1. Create a Conda Environment Use the following command to create and activate a new environment for the DPO training:

    bash
    conda create -n dpo_env python=3.10
    conda activate dpo_env
    
  2. Install Dependencies After activating the environment, install all required dependencies by running:

    bash
    pip install -r requirements.txt
    
  3. Constructing DPO Data Provide the data for DPO training as follows: a JSONL file in which each line is a single JSON object with `prompt`, `chosen`, and `rejected` fields.

    json
    {"prompt": "Prompt", "chosen": "The chosen response", "rejected": "The rejected response"}
    
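A small Python sketch of building such a file and validating that every record carries the three required fields before training. The example pairs and the `dpo_data.jsonl` output path are illustrative, not part of this repository:

    python
    import json

    # Hypothetical preference pairs; real data would come from your own pipeline.
    pairs = [
        {
            "prompt": "Write a function that reverses a string.",
            "chosen": "def reverse(s):\n    return s[::-1]",
            "rejected": "def reverse(s):\n    return s",
        },
    ]

    REQUIRED_KEYS = {"prompt", "chosen", "rejected"}

    def to_jsonl(records):
        """Serialize records to JSONL, rejecting any record with missing fields."""
        lines = []
        for rec in records:
            missing = REQUIRED_KEYS - rec.keys()
            if missing:
                raise ValueError(f"record missing keys: {missing}")
            lines.append(json.dumps(rec, ensure_ascii=False))
        return "\n".join(lines) + "\n"

    # One JSON object per line, as the trainer expects.
    with open("dpo_data.jsonl", "w", encoding="utf-8") as f:
        f.write(to_jsonl(pairs))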
  4. Training Once the environment is ready and the model paths are configured, run the DPO training by executing the following script:

    bash
    export DATA_PATH="/path/to/preference/data"
    export SFT_MODEL="/path/to/sft/model"
    export OUTPUT_DIR="/path/to/output"
    bash ./scripts/dpo_qwencoder.sh
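
For reference, DPO optimizes a pairwise loss over each (chosen, rejected) pair. A minimal sketch of the per-example loss, assuming you already have per-response log-probabilities from the policy and from a frozen reference (SFT) model; the function and argument names here are illustrative, not variables used by the training script:

    python
    import math

    def dpo_loss(policy_chosen_logp, policy_rejected_logp,
                 ref_chosen_logp, ref_rejected_logp, beta=0.1):
        """Per-example DPO loss: -log sigmoid(beta * margin of log-ratios)."""
        # Log-ratio of policy vs. reference for each response
        chosen_ratio = policy_chosen_logp - ref_chosen_logp
        rejected_ratio = policy_rejected_logp - ref_rejected_logp
        # The loss pushes the chosen log-ratio above the rejected one
        margin = beta * (chosen_ratio - rejected_ratio)
        return -math.log(1.0 / (1.0 + math.exp(-margin)))

    # With no margin (identical log-ratios) the loss is log(2) ≈ 0.693
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))

`beta` controls how strongly the policy is penalized for drifting from the reference model; the default of 0.1 here mirrors a common choice in DPO setups, but the script's own hyperparameters govern the actual run.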