Babylon Training Pipeline

⚠️ Experimental - Under active development. APIs may change.

RL training for Babylon agents using trajectory-based learning with GRPO (Group Relative Policy Optimization).

Quick Start

1. Generate Trajectories

```bash
bun run dev  # Start server first

babylon train parallel --archetypes trader --num-agents 5 --ticks 20
```

2. Train Locally

```bash
cd packages/training/python
python3.11 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Run full training pipeline (starts services, trains, logs to W&B)
python scripts/run_training.py --steps 100
```

Local GRPO Training

The local training pipeline uses the Atropos framework for GRPO-based RL training.

Prerequisites

  1. Python 3.11+ with CUDA support
  2. PostgreSQL with trajectory data
  3. GPU with at least 12GB VRAM (for 3B model)
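
GRPO's central trick is the group-relative advantage: several rollouts are sampled for the same prompt, and each rollout's reward is normalized against the group's mean and standard deviation, so no learned value function is needed. The sketch below is illustrative only (names and shapes are made up, not the Atropos implementation):

```python
# Illustrative sketch of GRPO's group-relative advantage computation.
# Not the actual Atropos code; the function name and data layout are assumptions.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each rollout's reward against its group's mean and std.

    A rollout's advantage is how much better its reward is than the group
    average for the same prompt.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts of the same prompt, scored by the AI judge:
advs = group_relative_advantages([0.9, 0.4, 0.6, 0.1])
```

Advantages within a group always sum to (approximately) zero, which is what makes the scheme "relative".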

Quick Run

```bash
cd packages/training/python
source venv/bin/activate

# Full pipeline (recommended)
python scripts/run_training.py --steps 100

# Or run components separately:
# Terminal 1: Atropos API
run-api --port 8000

# Terminal 2: Babylon Environment
python -m src.training.babylon_env serve --slurm false

# Terminal 3: GRPO Trainer
python -m src.training.atropos_trainer --steps 100
```
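
When not run in separate terminals, the same three components could be launched from one process. A hypothetical sketch (the real `ServiceManager` in `run_training.py` may work differently):

```python
# Hypothetical sketch of launching the three components as subprocesses.
# The real ServiceManager in run_training.py may differ.
import subprocess

COMMANDS = {
    "atropos_api": ["run-api", "--port", "8000"],
    "babylon_env": ["python", "-m", "src.training.babylon_env", "serve", "--slurm", "false"],
    "grpo_trainer": ["python", "-m", "src.training.atropos_trainer", "--steps", "100"],
}

def launch_all() -> dict[str, subprocess.Popen]:
    """Start each component; callers should poll and terminate the handles."""
    return {name: subprocess.Popen(cmd) for name, cmd in COMMANDS.items()}
```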

Training Configuration

| Flag | Description | Default |
| --- | --- | --- |
| `--steps` | Training steps | 100 |
| `--batch-size` | Batch size | 4 |
| `--lr` | Initial learning rate | 1e-5 |
| `--min-lr` | Minimum learning rate | 1e-7 |
| `--lr-scheduler` | LR scheduler: `constant`, `linear`, `cosine` | cosine |
| `--warmup-steps` | Warmup steps | 10 |
| `--model` | Base model | Qwen/Qwen2.5-3B-Instruct |
| `--save-path` | Checkpoint directory | ./trained_models |
| `--save-every` | Save checkpoint every N steps | 5 |
| `--resume` | Resume from checkpoint path | - |

Weights & Biases Integration

W&B logging is optional and works in offline mode if no API key is set.

```bash
# With W&B (online)
export WANDB_API_KEY=your_key
python scripts/run_training.py --steps 100 --wandb-project babylon-training

# Offline mode (automatic if no API key)
python scripts/run_training.py --steps 100

# Disable W&B entirely
python scripts/run_training.py --steps 100 --no-wandb
```

Tracked Metrics

| Metric | Description |
| --- | --- |
| `train/loss` | GRPO training loss |
| `train/learning_rate` | Current learning rate |
| `train/grad_norm` | Gradient norm |
| `train/pos_logp` | Log prob for positive advantages |
| `train/neg_logp` | Log prob for negative advantages |
| `train/aiJudgeReward` | Average AI Judge composite score |
| `train/format_score` | Average format quality score |
| `train/reasoning_score` | Average reasoning quality score |
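
The `pos_logp`/`neg_logp` pair tracks whether training is pushing probability mass toward well-rewarded behavior. One way to compute them, assuming per-sample `(advantage, logp)` pairs (the trainer's actual data layout may differ):

```python
# Sketch of the train/pos_logp and train/neg_logp metrics: average log-prob
# of samples bucketed by advantage sign. Data layout is an assumption.

def split_logp_by_advantage(advantages: list[float], logps: list[float]) -> dict[str, float]:
    pos = [lp for a, lp in zip(advantages, logps) if a > 0]
    neg = [lp for a, lp in zip(advantages, logps) if a < 0]
    return {
        "train/pos_logp": sum(pos) / len(pos) if pos else 0.0,
        "train/neg_logp": sum(neg) / len(neg) if neg else 0.0,
    }

metrics = split_logp_by_advantage([1.2, -0.5, 0.3, -1.0], [-2.0, -4.0, -1.0, -5.0])
```

As training progresses you would expect `pos_logp` to rise relative to `neg_logp`.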

Resume from Checkpoint

```bash
# Resume training from a checkpoint
python scripts/run_training.py --resume ./trained_models/step_50

# Or with full control
python -m src.training.atropos_trainer \
  --resume ./trained_models/step_50 \
  --steps 100
```

Learning Rate Schedules

Three schedules are available:

| Schedule | Description |
| --- | --- |
| `constant` | Fixed learning rate |
| `linear` | Linear decay from initial to min LR |
| `cosine` | Cosine annealing from initial to min LR (default) |

All schedules support warmup:

```bash
python scripts/run_training.py \
  --lr 1e-5 \
  --min-lr 1e-7 \
  --lr-scheduler cosine \
  --warmup-steps 10
```
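
For reference, cosine annealing with linear warmup, matching the `--lr`, `--min-lr`, and `--warmup-steps` flags above, can be sketched as follows (the trainer's actual scheduler may differ in detail):

```python
# Sketch of cosine annealing with linear warmup, using the defaults of the
# flags above. Illustrative only; the trainer's scheduler may differ.
import math

def cosine_lr(step: int, total_steps: int, lr: float = 1e-5,
              min_lr: float = 1e-7, warmup_steps: int = 10) -> float:
    if step < warmup_steps:
        return lr * (step + 1) / warmup_steps          # linear warmup to peak
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * progress))
```

The rate climbs linearly to the peak over the warmup steps, then decays along a half cosine toward `min_lr` at the final step.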

Hardware Requirements

| Platform | Backend | Model | VRAM |
| --- | --- | --- | --- |
| Mac M1/M2 (16GB) | MLX | mlx-community/Qwen2.5-1.5B-Instruct-4bit | 8GB |
| Mac M1/M2 (32GB+) | MLX | mlx-community/Qwen2.5-3B-Instruct-4bit | 16GB |
| RTX 3060+ (12GB) | CUDA | Qwen/Qwen2.5-1.5B-Instruct | 12GB |
| RTX 4090 (24GB) | CUDA | Qwen/Qwen2.5-3B-Instruct | 20GB |
| Any | Tinker | Cloud-based | N/A |

CLI Commands

Generate Data

```bash
babylon train parallel --archetypes trader,degen --num-agents 3 --ticks 20
babylon train parallel -a all -n 2 -t 10      # All archetypes
babylon train parallel --dry-run               # Preview
```

| Flag | Description | Default |
| --- | --- | --- |
| `-a, --archetypes` | Comma-separated or `all` | trader |
| `-n, --num-agents` | Agents per archetype | 2 |
| `-t, --ticks` | Ticks per agent | 10 |
| `-p, --parallel` | Max concurrent agents | 5 |
| `--cleanup` | Delete agents after | false |

Score & Export

```bash
babylon train score                           # Score all trajectories
babylon train archetype -a trader             # Score + export for archetype
babylon train archetype -a trader --score-only
```

Train

```bash
babylon train pipeline -a trader              # Full pipeline
babylon train run -a all                      # All archetypes
```

Python Training

Local Training

```bash
cd packages/training/python
source venv/bin/activate

python scripts/train_local.py                 # Auto-detect backend
python scripts/train_local.py --backend mlx   # Force MLX
python scripts/train_local.py --backend cuda  # Force CUDA
```
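
Backend auto-detection could be as simple as checking the platform and CUDA availability. A hypothetical sketch (see `scripts/train_local.py` for the real selection logic):

```python
# Hypothetical sketch of --backend auto-detection; the real logic lives in
# scripts/train_local.py and may differ.
import importlib.util
import platform

def detect_backend() -> str:
    # Apple Silicon -> MLX
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx"
    # NVIDIA GPU with PyTorch installed -> CUDA
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    return "cpu"  # fallback
```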

Options:

```bash
python scripts/train_local.py \
  --backend mlx \
  --model mlx-community/Qwen2.5-1.5B-Instruct-4bit \
  --output ./trained_models/my_model \
  --iters 100 \
  --batch-size 2 \
  --lr 1e-5 \
  --min-actions 3 \
  --lookback-hours 168 \
  --max-trajectories 500 \
  --validate
```

Cloud Training (Tinker)

```bash
export TINKER_API_KEY=your_key
export DATABASE_URL=postgresql://...
export OPENAI_API_KEY=sk-...

python scripts/run_tinker_training.py --steps 100
```

Archetypes

| Archetype | Description |
| --- | --- |
| `trader` | Disciplined, profit-focused trader |
| `degen` | High-risk YOLO trader |
| `scammer` | Manipulative, spreads misinformation |
| `researcher` | Analytical, data-driven |
| `social-butterfly` | Focused on community engagement |
| `information-trader` | News/signal-based |
| `perps-trader` | Perpetual futures specialist |
| `super-predictor` | Prediction market expert |
| `infosec` | Security-conscious |
| `goody-twoshoes` | Helpful, ethical |
| `ass-kisser` | Follows crowd consensus |
| `liar` | Consistently misleading |
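
To make the table concrete, an archetype config might look roughly like the sketch below. This shape is entirely hypothetical; the real definitions live in `src/archetypes/` and may differ:

```python
# Hypothetical sketch of an archetype config; the real definitions are in
# packages/training/src/archetypes/ (TypeScript) and may differ in shape.
from dataclasses import dataclass

@dataclass(frozen=True)
class ArchetypeConfig:
    name: str
    system_prompt: str     # persona injected into the agent's prompt
    risk_tolerance: float  # 0.0 (cautious) .. 1.0 (YOLO); invented field

TRADER = ArchetypeConfig("trader", "You are a disciplined, profit-focused trader.", 0.3)
DEGEN = ArchetypeConfig("degen", "You are a high-risk YOLO trader.", 0.95)
```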

Architecture

```
Agent Trajectories → TrajectoryRecorder → Database
                                           ↓
                                  LLM-as-Judge Scoring (AI Judge)
                                           ↓
                                      GRPO Training
                                           ↓
                              W&B Logging (optional)
                                           ↓
                                    Trained Model
```
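
The LLM-as-judge step turns a judge model's reply into the scalar reward consumed by GRPO. A sketch of the parsing side, where the JSON field names and weights are assumptions rather than the pipeline's actual rubric:

```python
# Sketch of turning a judge reply into a composite reward for GRPO.
# The JSON field names and weights are assumptions, not the real rubric.
import json

WEIGHTS = {"format_score": 0.3, "reasoning_score": 0.7}  # assumed weighting

def composite_score(judge_response: str) -> float:
    """Combine per-rubric scores from the judge's JSON reply into one reward."""
    scores = json.loads(judge_response)
    return sum(WEIGHTS[k] * float(scores[k]) for k in WEIGHTS)

reward = composite_score('{"format_score": 0.8, "reasoning_score": 0.6}')
```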

Training Pipeline Components

| Component | Description |
| --- | --- |
| `ServiceManager` | Manages Atropos API and vLLM servers |
| `BabylonRLAIFEnv` | RLAIF environment for trajectory scoring |
| `BabylonAtroposTrainer` | GRPO trainer with LR scheduling |
| `run_training.py` | Orchestrates the full pipeline |

TypeScript (src/)

| Directory | Purpose |
| --- | --- |
| `archetypes/` | Archetype configs |
| `generation/` | Trajectory generation |
| `training/` | Recording and export |
| `scoring/` | LLM-as-judge |
| `rubrics/` | Evaluation rubrics |
| `benchmark/` | Model benchmarking |
| `huggingface/` | HuggingFace upload |

Python (python/src/)

| Directory | Purpose |
| --- | --- |
| `data_bridge/` | Database reader |
| `training/` | Training modules |

Environment Variables

```bash
# Required
DATABASE_URL=postgresql://...       # PostgreSQL connection
OPENAI_API_KEY=sk-...               # For RLAIF judge

# Optional
WANDB_API_KEY=your_key              # For W&B logging (offline if not set)
TINKER_API_KEY=your_key             # For cloud training
```

Troubleshooting

**No trajectory data**

```bash
bun run dev
babylon train parallel --archetypes trader --num-agents 5 --ticks 20
```

**Not enough samples** - Need 20+ trajectories with LLM calls. Run more agents.

**MLX fails** - `pip install mlx mlx-lm`

**CUDA OOM** - Use a smaller model or add `--lora`

**Database issues** - Check `DATABASE_URL` in `.env` and ensure PostgreSQL is running

**vLLM startup timeout** - Increase the timeout or check GPU memory with `nvidia-smi`

**W&B offline mode** - If you see "offline mode", set `WANDB_API_KEY` or use `--no-wandb`

Scripts Reference

The scripts/ directory contains standalone utilities for training operations:

| Script | Description |
| --- | --- |
| `train-and-test.ts` | Full pipeline: train model + game test |
| `run-full-pipeline.ts` | Complete training workflow orchestration |
| `run-baseline-comparison.ts` | Head-to-head benchmark: random vs trained |
| `real-archetype-benchmark.ts` | Benchmark using real agent data |
| `json-mode-benchmark.ts` | Benchmark without database dependency |
| `test-model-in-game.ts` | Test trained model in simulation |
| `test-trained-model.ts` | Validate trained model from DB or path |
| `test-scoring.ts` | Debug LLM-as-judge scoring |
| `e2e-training-test.ts` | End-to-end pipeline verification |
| `assess-training-data.ts` | Analyze training data quality |
| `export-rubrics.ts` | Export rubrics to JSON |
| `generate-research-report.ts` | Generate research documentation |
| `verify-final.ts` | Post-training verification checks |

Run any script with:

```bash
bun packages/training/scripts/<script-name>.ts [options]
```

Development

```bash
bun test packages/training
bun run typecheck
bun run packages/training/scripts/e2e-training-test.ts  # E2E validation
```

Python Tests

```bash
cd packages/training/python
source venv/bin/activate
pytest tests/ -v
```