finetuning/dpo/README.md

Setup

  1. Create a Conda Environment Use the following command to create and activate a new environment for the DPO training:

    bash
    conda create -n dpo_env python=3.10
    conda activate dpo_env
    
  2. Install Dependencies After activating the environment, install all required dependencies by running:

    bash
    pip install -r requirements.txt
    
  3. Constructing DPO Data Provide the data for DPO training as follows: a JSONL file in which each line is a single JSON object with `prompt`, `chosen`, and `rejected` fields.

    json
    {"prompt": "Prompt", "chosen": "The chosen response", "rejected": "The rejected response"}
    
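A small Python sketch of building such a file and validating that every record carries the three required fields before training. The example pairs and the `dpo_data.jsonl` output path are illustrative, not part of this repository:

    python
    import json

    # Hypothetical preference pairs; real data would come from your own pipeline.
    pairs = [
        {
            "prompt": "Write a function that reverses a string.",
            "chosen": "def reverse(s):\n    return s[::-1]",
            "rejected": "def reverse(s):\n    return s",
        },
    ]

    REQUIRED_KEYS = {"prompt", "chosen", "rejected"}

    def to_jsonl(records):
        """Serialize records to JSONL, rejecting any record with missing fields."""
        lines = []
        for rec in records:
            missing = REQUIRED_KEYS - rec.keys()
            if missing:
                raise ValueError(f"record missing keys: {missing}")
            lines.append(json.dumps(rec, ensure_ascii=False))
        return "\n".join(lines) + "\n"

    # One JSON object per line, as the trainer expects.
    with open("dpo_data.jsonl", "w", encoding="utf-8") as f:
        f.write(to_jsonl(pairs))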
  4. Training Once the environment is ready and the model paths are configured, run the DPO training by executing the following script:

    bash
    export DATA_PATH="/path/to/preference/data"
    export SFT_MODEL="/path/to/sft/model"
    export OUTPUT_DIR="/path/to/output"
    bash ./scripts/dpo_qwencoder.sh
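
For reference, DPO optimizes a pairwise loss over each (chosen, rejected) pair. A minimal sketch of the per-example loss, assuming you already have per-response log-probabilities from the policy and from a frozen reference (SFT) model; the function and argument names here are illustrative, not variables used by the training script:

    python
    import math

    def dpo_loss(policy_chosen_logp, policy_rejected_logp,
                 ref_chosen_logp, ref_rejected_logp, beta=0.1):
        """Per-example DPO loss: -log sigmoid(beta * margin of log-ratios)."""
        # Log-ratio of policy vs. reference for each response
        chosen_ratio = policy_chosen_logp - ref_chosen_logp
        rejected_ratio = policy_rejected_logp - ref_rejected_logp
        # The loss pushes the chosen log-ratio above the rejected one
        margin = beta * (chosen_ratio - rejected_ratio)
        return -math.log(1.0 / (1.0 + math.exp(-margin)))

    # With no margin (identical log-ratios) the loss is log(2) ≈ 0.693
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))

`beta` controls how strongly the policy is penalized for drifting from the reference model; the default of 0.1 here mirrors a common choice in DPO setups, but the script's own hyperparameters govern the actual run.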