packages/training/python/scripts/local-finetune/README.md
This directory contains scripts to train RL adapters from Babylon simulation logs.
## Quick Start

```sh
bun packages/engine/examples/generate-training-data.ts
python ingest_and_score.py
python train_from_csv.py
python test_adapter.py
```

If you do not have a local Postgres database, Atropos server, or vLLM instance running, you can use this Offline Pipeline instead: it writes data to JSON files and trains directly with PyTorch/HuggingFace libraries.
## Prerequisites

- `GROQ_API_KEY` or `OPENAI_API_KEY` set in the environment.
- Python dependencies:

  ```sh
  pip install torch transformers peft pandas datasets trl
  ```

## Step 1: Generate Training Data

Runs the game simulation in-memory and dumps "Observation -> Action" logs to JSON.
```sh
# Runs 24 simulated hours
bun packages/engine/examples/generate-training-data.ts
```

Output: `training-data-output/trajectories/*.json`
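Each trajectory file is a JSON log of observation/action steps. A minimal sketch of reading one back — note the `observation`/`action` field names here are illustrative assumptions, not the engine's actual schema:

```python
import json
from pathlib import Path

# Hypothetical record shape; the real schema is defined by
# generate-training-data.ts.
sample = [
    {"observation": "Agent sees a market stall.", "action": "inspect_stall"},
    {"observation": "Stall sells bread.", "action": "buy_bread"},
]
path = Path("trajectory-example.json")
path.write_text(json.dumps(sample))

# Load the trajectory and walk its observation -> action pairs.
records = json.loads(path.read_text())
for step in records:
    print(step["observation"], "->", step["action"])
```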
## Step 2: Ingest and Score

Converts the raw JSON logs into a scored CSV dataset (System/User/Assistant format).
```sh
cd packages/training/python/scripts/local-finetune
python ingest_and_score.py
```

Output: `packages/training/data/scored_trajectories.csv`
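Conceptually, the conversion maps each trajectory step to one CSV row. A minimal sketch of that mapping with pandas — the column names, system prompt wording, and `reward` field are assumptions for illustration, not what `ingest_and_score.py` necessarily emits:

```python
import pandas as pd

# Assumed system prompt; the real one lives in ingest_and_score.py.
SYSTEM_PROMPT = "You are a Babylon agent. Choose the next action."

# Hypothetical scored steps derived from the JSON logs.
steps = [
    {"observation": "Agent sees a market stall.", "action": "inspect_stall", "reward": 1.0},
    {"observation": "Stall sells bread.", "action": "buy_bread", "reward": 0.5},
]

# One CSV row per step, in System/User/Assistant layout plus a score.
rows = [
    {
        "system": SYSTEM_PROMPT,
        "user": s["observation"],
        "assistant": s["action"],
        "score": s["reward"],
    }
    for s in steps
]
df = pd.DataFrame(rows)
df.to_csv("scored_trajectories_example.csv", index=False)
```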
## Step 3: Train

Fine-tunes a base model (Qwen2.5-0.5B by default) on your scored data using LoRA.
```sh
python train_from_csv.py --output ./my-model-v1
```
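Before training, the System/User/Assistant columns have to be flattened into a single text string per example. A minimal sketch of that step, assuming a ChatML-style layout (Qwen-family chat models use this format; the actual script may instead apply the tokenizer's built-in chat template):

```python
def to_training_text(system: str, user: str, assistant: str) -> str:
    """Flatten one CSV row into a single ChatML-style training string."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>\n"
    )

text = to_training_text(
    "You are a Babylon agent. Choose the next action.",
    "Stall sells bread.",
    "buy_bread",
)
```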
## Step 4: Test

Chat interactively with your new LoRA adapter to verify its behavior.
```sh
python test_adapter.py
```
For the full cloud-based pipeline involving Postgres, GRPO, and Tinker compute, see `scripts/run_full_pipeline.py`.