Here are some minimal commands to run the whole pipeline on the collected data.
Make sure you are using Python >= 3.10; otherwise you will run into the [issue].
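A quick way to verify the interpreter version before running anything; `check_python_version` is just an illustrative helper, not part of the repo:

```python
import sys

def check_python_version(min_version=(3, 10)):
    """Return True if the running interpreter is at least min_version."""
    return sys.version_info >= min_version
```

Calling it at the top of a script lets you fail fast with a clear message instead of hitting a confusing error deep in the pipeline.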
```sh
mkdir -p .cache
mkdir -p .saved_models

export DATA_PATH=$PWD/.cache
export MODEL_PATH=$PWD/.saved_models
```
Create a new configuration section, or modify an existing one, in the
`config.yaml` (SFT), `config_rm.yaml` (RM) or `config_rl.yaml` (RL) YAML
configuration files located in the `model_training/configs/` directory, and
specify the OA JSONL data file or HuggingFace dataset to use.
To use a local OA JSONL file (`.jsonl` or `.jsonl.gz`), specify the file name
with the `input_file_path` configuration option. Place the file either in the
`cache_dir` (`DATA_PATH`) or specify an absolute path.

```sh
cp /path/to/<oasst.trees.jsonl> $DATA_PATH
```
Example:

```yaml
my_data_config:
  datasets:
    - oasst_export:
        input_file_path: oasst_export.trees.jsonl.gz
```
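The placement rule above (bare file name looked up in the cache directory, absolute path used as-is) can be sketched as follows; `resolve_input_file` is a hypothetical helper for illustration, not the repo's actual loader:

```python
from pathlib import Path

def resolve_input_file(input_file_path, cache_dir):
    """Resolve a configured data file: absolute paths are used as-is,
    bare file names are looked up inside cache_dir."""
    p = Path(input_file_path)
    return p if p.is_absolute() else Path(cache_dir) / p
```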
To use a dataset from the HuggingFace hub, specify it with the
`hf_dataset_name` configuration option. Example:

```yaml
my_data_config:
  datasets:
    - oasst_export:
        hf_dataset_name: OpenAssistant/oasst1
```
Note: If both `hf_dataset_name` and `input_file_path` are specified,
`input_file_path` takes precedence.

See the OpenAssistant/oasst1 dataset card on the HuggingFace hub for more
information.
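The precedence rule can be sketched like this; `resolve_dataset_source` is a hypothetical helper illustrating the behavior described above, not the actual loader code:

```python
def resolve_dataset_source(config):
    """Pick the dataset source from a config dict:
    input_file_path wins when both options are present."""
    if config.get("input_file_path"):
        return ("local", config["input_file_path"])
    if config.get("hf_dataset_name"):
        return ("hub", config["hf_dataset_name"])
    raise ValueError("no dataset source configured")
```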
```sh
cd model_training

# export shared modules
export PYTHONPATH=$PYTHONPATH:../../oasst-shared

python trainer_sft.py --configs defaults oa_dataset_only pythia --cache_dir $DATA_PATH --output_dir $MODEL_PATH/sft_model

# if you want to use wandb, add
# --wandb_entity your_username/team_name
```
To change the model used, e.g. a larger Pythia version, create a new config in
`model_training/configs/config.yaml` or set the flag `--model_name` to
`EleutherAI/pythia-{size}-deduped`. Larger models will probably also need
adjusted `--learning_rate` and `--per_device_train_batch_size` flags.
```sh
# choose a specific checkpoint
export SFT_MODEL=$MODEL_PATH/sft_model/<checkpoint-X>

# or get the latest checkpoint
export SFT_MODEL=$MODEL_PATH/sft_model/$(ls -t $MODEL_PATH/sft_model/ | head -n 1)
```
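`ls -t` orders entries by modification time, which can be misleading if checkpoints were copied around. A sketch that picks the highest step number instead; `latest_checkpoint` is a hypothetical helper, not part of the repo:

```python
from pathlib import Path

def latest_checkpoint(model_dir):
    """Return the checkpoint-N subdirectory with the highest step number,
    or None if no checkpoints exist."""
    ckpts = [
        p for p in Path(model_dir).glob("checkpoint-*")
        if p.name.split("-")[-1].isdigit()
    ]
    return max(ckpts, key=lambda p: int(p.name.split("-")[-1]), default=None)
```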
```sh
cd model_training

python trainer_rm.py --configs defaults_rm oasst-rm-1-pythia-1b

# choose a specific checkpoint
export REWARD_MODEL=$MODEL_PATH/reward_model/<checkpoint-X>

# or get the latest checkpoint
export REWARD_MODEL=$MODEL_PATH/reward_model/$(ls -t $MODEL_PATH/reward_model/ | head -n 1)
```
```sh
cd model_training

python trainer_rl.py --configs defaults_rlhf --cache_dir $DATA_PATH --rank_model $REWARD_MODEL --sft_model $SFT_MODEL --output_dir $MODEL_PATH/rl_model
```
See the `MESSAGE_AND_TOKEN_FORMAT.md` file for information about the pattern we
are using.