lpm_kernel/L2/dpo/README.md
The DPO (Direct Preference Optimization) workflow is a systematic process designed to optimize models based on preference signals. The entire workflow consists of the following key stages:
This document provides a comprehensive guide to executing the DPO workflow, both automatically and manually.
Before executing the subsequent steps, ensure that your API key and base URL are configured in `lpm_kernel/L2/dpo/utils.py`. You also need to manually fill in the global bio, including your interests, occupation, and so on.
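As a rough illustration, the configuration might look like the following. The variable names below are placeholders; check `lpm_kernel/L2/dpo/utils.py` for the actual names it expects.

```python
# Hypothetical configuration values -- the real variable names live in
# lpm_kernel/L2/dpo/utils.py and may differ.
API_KEY = "sk-..."  # your provider API key
BASE_URL = "http://localhost:8080/v1"  # OpenAI-compatible endpoint
GLOBAL_BIO = "Software engineer interested in machine learning and hiking."
```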
To execute the DPO workflow automatically, follow these steps:
```bash
# Example command to run the SFT model with llama.cpp
./llama.cpp/build/llama_server --model /path/to/sft_model --port 8080
```
```bash
bash lpm_kernel/L2/dpo/dpo_pipeline.sh
```
This script encapsulates all the necessary steps for the DPO workflow, streamlining the process for users.
For users who prefer granular control, the DPO workflow can be executed step-by-step as follows:
Before synthesizing DPO training data, deploy the SFT model using llama.cpp. Before deploying, convert the model into GGUF format. Ensure the model is accessible via an API endpoint on port 8080.
```bash
# Example command to run the SFT model with llama.cpp
./llama.cpp/build/llama_server --model /path/to/sft_model --port 8080
```
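If your SFT model is in Hugging Face format, the GGUF conversion can be done with the converter script that ships with llama.cpp. The paths below are placeholders, and the script name may differ on older llama.cpp checkouts (e.g. `convert-hf-to-gguf.py`):

```shell
# Convert a Hugging Face-format model directory to GGUF (f16 weights).
python llama.cpp/convert_hf_to_gguf.py /path/to/sft_model \
    --outfile /path/to/sft_model/model.gguf \
    --outtype f16
```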
Once the SFT model is deployed, proceed to generate DPO training data.
```bash
python lpm_kernel/L2/dpo/dpo_data.py
```
This script synthesizes the required data for DPO training.
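The synthesized examples are preference pairs. A hypothetical record in the JSONL style that DPO trainers typically consume is sketched below; the field names and content are illustrative, so check `dpo_data.py` for the actual schema:

```python
import json

# Hypothetical preference-pair record: one prompt with a preferred
# ("chosen") and a dispreferred ("rejected") response.
record = {
    "prompt": "What are my main interests?",
    "chosen": "Based on your bio, you focus on machine learning and hiking.",
    "rejected": "I don't know anything about you.",
}

# Typically serialized as one JSON object per line (JSONL).
line = json.dumps(record)
```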
After completing the data synthesis, train the model with the following command:
```bash
python lpm_kernel/L2/dpo/dpo_train.py \
    --num_train_epochs 2 \
    --learning_rate 5e-6 \
    --lora_r 32 \
    --lora_alpha 64 \
    --batch_size 4
```
This command initiates the training process with specified hyperparameters. Adjust these parameters as needed for optimal results.
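For intuition on what the trainer is optimizing: DPO minimizes a contrastive loss over each preference pair, pushing the policy to prefer the chosen response over the rejected one relative to a frozen reference model. A minimal per-example sketch (not the project's actual implementation; `beta` is a temperature hyperparameter, commonly around 0.1):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (policy margin - reference margin)))."""
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy and reference assign identical log-probabilities, the loss is log 2; it decreases as the policy widens the chosen-vs-rejected margin beyond the reference's.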
After training, if you used LoRA for DPO, merge the adapter weights with the base model using the following command:
```bash
python lpm_kernel/L2/merge_lora_weights.py \
    --base_model_path "resources/model/output/merged_model" \
    --lora_adapter_path "resources/model/output/dpo_model/adapter" \
    --output_model_path "resources/model/output/dpo_model/merged_model"
```
If you did not use LoRA, skip this step.
## Additional Notes
Feel free to expand upon this framework with specific details pertinent to your use case.