docs/rl-for-myflow-harbor.md
Use this plan to turn the current export/prep automation into a measurable RL improvement loop for agent behavior.
Current system already has:

- Export artifacts (`assistant_sft.jsonl`, `train_events.jsonl`, `summary.json`)
- Deterministic splits (`train/val/test/canary` + `manifest.json`)

Goal: convert this into a closed loop where training updates are driven by observed failures/regressions and promoted only through hard gates.
Primary outcomes:

1. Fail-fast validation. Done definition: prep aborts when `assistant_sft.jsonl` is empty or split counts are invalid.
2. Outcome telemetry. Done definition: every run appends an event to `train_events.jsonl` (success, retries, rollback, human override, time-to-fix).
3. Versioned rewards. Done definition: reward records carry a schema version (`reward_schema_version`).
4. Gated promotion. Done definition: promotion is blocked unless the gate metrics pass their thresholds.
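The telemetry outcome above can be sketched as a small append-only JSONL logger. Only the field names (`train_events.jsonl`, success, retries, rollback, human override, time-to-fix, `reward_schema_version`) come from this plan; the exact record layout is an assumption.

```python
import json
import time
from pathlib import Path

# Hypothetical event record; field names follow this plan,
# the concrete schema is an assumption.
def log_train_event(path: Path, *, success: bool, retries: int,
                    rollback: bool, human_override: bool,
                    time_to_fix_s: float) -> dict:
    event = {
        "ts": time.time(),
        "reward_schema_version": 1,  # bump on any reward/schema change
        "success": success,
        "retries": retries,
        "rollback": rollback,
        "human_override": human_override,
        "time_to_fix_s": time_to_fix_s,
    }
    # Append one JSON object per line (JSONL), matching the export format.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event
```

Keeping the log append-only means downstream metric jobs can re-derive rates (e.g. override rate) from the raw events at any schema version.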
Gate metrics:

- `canary_reward_delta_mean`
- `canary_reward_delta_ci95_low/high`
- `action_error_rate`
- `fallback_or_override_rate`
- `time_to_resolution_p50/p95`
- `hardcase_recurrence_rate`

```bash
# 1) Export latest data from myflow to Harbor
cd ~/code/myflow
f harbor-export-data-maple

# 2) Prepare deterministic splits
cd ~/repos/laude-institute/harbor
python3 scripts/prepare_myflow_dataset.py --snapshot latest

# 3) Train/eval candidate in Harbor (task names TBD in harbor)

# 4) Promote only if holdout + canary gates pass
```
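Step 4's promotion gate can be sketched as a pure check over the canary eval report. The metric names come from this plan; the flat-dict report layout and the threshold values are assumptions to be tuned.

```python
# Hypothetical promotion gate over a canary eval report.
# Metric names follow this plan; thresholds and the flat dict
# layout of the report are assumptions.
def gates_pass(report: dict,
               max_action_error_rate: float = 0.05,
               max_fallback_or_override_rate: float = 0.10) -> bool:
    # Require the canary reward delta to be positive with its whole
    # 95% CI above zero, and error/override rates within bounds.
    return (
        report["canary_reward_delta_mean"] > 0.0
        and report["canary_reward_delta_ci95_low"] > 0.0
        and report["action_error_rate"] <= max_action_error_rate
        and report["fallback_or_override_rate"] <= max_fallback_or_override_rate
    )
```

Gating on `canary_reward_delta_ci95_low` rather than the mean alone keeps a noisy-but-lucky canary run from promoting a regression.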
Planned Harbor tasks:

- `myflow-validate-snapshot` (manifest + split sanity checks)
- `myflow-eval-canary` (fixed JSON report schema for promotion gate)
- `myflow-mine-hardcases` (from failed canary/prod traces)
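The `myflow-validate-snapshot` checks could look like the sketch below. The failure conditions (empty `assistant_sft.jsonl`, invalid split counts) come from the outcomes above; the manifest layout (a `"splits"` mapping of split name to example count) is an assumption.

```python
import json
from pathlib import Path

# Hypothetical snapshot validator; returns a list of error strings,
# empty when the snapshot is sane.
def validate_snapshot(snapshot_dir: Path) -> list[str]:
    errors: list[str] = []

    # Fail fast on a missing or empty SFT export.
    sft = snapshot_dir / "assistant_sft.jsonl"
    if not sft.exists() or sft.stat().st_size == 0:
        errors.append("assistant_sft.jsonl is missing or empty")

    manifest_path = snapshot_dir / "manifest.json"
    if not manifest_path.exists():
        errors.append("manifest.json is missing")
        return errors

    # Assumed layout: {"splits": {"train": N, "val": N, "test": N, "canary": N}}
    manifest = json.loads(manifest_path.read_text())
    splits = manifest.get("splits", {})
    for name in ("train", "val", "test", "canary"):
        count = splits.get(name)
        if not isinstance(count, int) or count <= 0:
            errors.append(f"invalid count for split {name!r}: {count!r}")
    return errors
```

Returning a list of errors instead of raising on the first one lets the task report every problem in a snapshot in a single run.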