docs/rl-myflow-harbor-task-specs.md
Concrete task contracts for the Harbor RL loop. These map directly to scripts currently in ~/repos/laude-institute/harbor/scripts.
`scripts/myflow_validate_snapshot.py`

```bash
python3 scripts/myflow_validate_snapshot.py \
  --snapshot <snapshot|latest> \
  --myflow-dir data/myflow \
  --prepared-dir data/myflow_prepared \
  --require-train-events \
  --report-out data/myflow_prepared/<snapshot>/validation_report.json
```

Files:

- `data/myflow/<snapshot>/assistant_sft.jsonl`
- `data/myflow/<snapshot>/train_events.jsonl` (required when `--require-train-events` is set)
- `data/myflow_prepared/<snapshot>/manifest.json`

Exit codes:

- `0` = pass

`scripts/myflow_build_reward_labels.py`

```bash
python3 scripts/myflow_build_reward_labels.py \
  --snapshot <snapshot|latest> \
  --myflow-dir data/myflow \
  --out-dir data/myflow_rewards
```

Files:

- `data/myflow/<snapshot>/train_events.jsonl`
- `data/myflow_rewards/<snapshot>/train_event_rewards.jsonl`
- `data/myflow_rewards/<snapshot>/reward_summary.json`

Exit codes:

- `0` = labels generated

`scripts/myflow_eval_canary.py`

```bash
python3 scripts/myflow_eval_canary.py \
  --candidate data/myflow_rewards/<snapshot>/train_event_rewards.jsonl \
  --baseline data/myflow_rewards/<baseline_snapshot>/train_event_rewards.jsonl \
  --report-out data/myflow_reports/<snapshot>/canary_gate.json \
  --min-candidate-mean 0.55 \
  --min-delta-mean 0.00 \
  --min-delta-ci95-low -0.02
```

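The three thresholds in the command above suggest a gate that checks the candidate mean, the mean delta against the baseline, and the lower bound of a 95% confidence interval on that delta. The sketch below is one plausible reading, not the script's actual logic: the function name `canary_gate`, the unpaired normal-approximation interval, and the rule that all three checks must hold are assumptions made for illustration.

```python
import math
import statistics
from typing import Dict, Sequence


def canary_gate(
    candidate: Sequence[float],
    baseline: Sequence[float],
    min_candidate_mean: float = 0.55,
    min_delta_mean: float = 0.00,
    min_delta_ci95_low: float = -0.02,
) -> Dict[str, object]:
    """Hypothetical promotion gate; defaults mirror the flags in the command."""
    cand_mean = statistics.fmean(candidate)
    base_mean = statistics.fmean(baseline)
    delta_mean = cand_mean - base_mean

    # Assumption: the two reward sets are treated as independent samples and
    # the interval uses a normal approximation (z = 1.96).
    std_err = math.sqrt(
        statistics.variance(candidate) / len(candidate)
        + statistics.variance(baseline) / len(baseline)
    )
    delta_ci95_low = delta_mean - 1.96 * std_err

    # Assumption: every threshold must be met for the gate to pass.
    passed = (
        cand_mean >= min_candidate_mean
        and delta_mean >= min_delta_mean
        and delta_ci95_low >= min_delta_ci95_low
    )
    return {
        "candidate_mean": cand_mean,
        "delta_mean": delta_mean,
        "delta_ci95_low": delta_ci95_low,
        "passed": passed,
        "exit_code": 0 if passed else 1,  # matches the documented 0/1 exit codes
    }
```

A negative `--min-delta-ci95-low` such as `-0.02` tolerates a small amount of statistical noise: the candidate may pass even when the interval dips slightly below zero, provided the mean delta itself is non-negative.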
Additional flag: `--rollouts`, for the action-dominance gate.

Exit codes:

- `0` = promotion gate pass
- `1` = gate fail (expected for regressions)

`scripts/myflow_mine_hardcases.py`

```bash
python3 scripts/myflow_mine_hardcases.py \
  --snapshot <snapshot|latest> \
  --prepared-dir data/myflow_prepared \
  --candidate-rewards data/myflow_rewards/<snapshot>/train_event_rewards.jsonl \
  --baseline-rewards data/myflow_rewards/<baseline_snapshot>/train_event_rewards.jsonl \
  --out-dir data/myflow_hardcases \
  --top-k 100
```

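The command takes candidate and baseline reward files plus a `--top-k` limit, which suggests some ranking of events by how the candidate compares with the baseline. The following is a minimal sketch of that idea only. The record fields `event_id` and `reward`, the helper names, and the "largest regression first" ordering are all assumptions; the real schema and selection rule are not specified here.

```python
import json
from typing import Dict, Iterable, List


def load_rewards(lines: Iterable[str]) -> Dict[str, float]:
    """Parse JSONL lines into {event_id: reward}; both field names are assumed."""
    rewards: Dict[str, float] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        rewards[row["event_id"]] = float(row["reward"])
    return rewards


def mine_hardcases(
    candidate: Dict[str, float],
    baseline: Dict[str, float],
    top_k: int = 100,
) -> List[Dict[str, object]]:
    """Rank events present in both files by reward delta, worst regression first."""
    shared = candidate.keys() & baseline.keys()
    rows = [
        {
            "event_id": event_id,
            "candidate_reward": candidate[event_id],
            "baseline_reward": baseline[event_id],
            "delta": candidate[event_id] - baseline[event_id],
        }
        for event_id in shared
    ]
    # Sort by delta ascending; tie-break on event_id so output is deterministic.
    rows.sort(key=lambda r: (r["delta"], r["event_id"]))
    return rows[:top_k]
```

In use, each reward file would be opened and passed to `load_rewards`, and the resulting list written out one JSON object per line.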
Files:

- `hardcases.jsonl`
- `next_train_seed.jsonl`
- `hardcase_summary.json`

Exit codes:

- `0` = hardcases emitted

Task names:

- `myflow-validate-snapshot`
- `myflow-build-reward-labels`
- `myflow-eval-canary`
- `myflow-mine-hardcases`

Use these names in Harbor task orchestration so docs/runbooks stay stable.
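Because every script reports success through exit code `0`, the four tasks can be chained so that a failure stops the run. The sketch below shows one way to do that. It is not Harbor's orchestration API: `TASKS`, `build_command`, and `run_pipeline` are hypothetical names, and the one-to-one pairing of task names with scripts is inferred from their spelling.

```python
import subprocess
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: each task name corresponds to the script with the matching name.
TASKS: List[Tuple[str, str]] = [
    ("myflow-validate-snapshot", "scripts/myflow_validate_snapshot.py"),
    ("myflow-build-reward-labels", "scripts/myflow_build_reward_labels.py"),
    ("myflow-eval-canary", "scripts/myflow_eval_canary.py"),
    ("myflow-mine-hardcases", "scripts/myflow_mine_hardcases.py"),
]


def build_command(script: str, args: Sequence[str]) -> List[str]:
    """Build the argv list for one task."""
    return ["python3", script, *args]


def _run(cmd: List[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_pipeline(
    task_args: Dict[str, Sequence[str]],
    runner: Callable[[List[str]], int] = _run,
) -> List[Tuple[str, int]]:
    """Run tasks in order and stop at the first non-zero exit code."""
    results: List[Tuple[str, int]] = []
    for name, script in TASKS:
        code = runner(build_command(script, task_args.get(name, ())))
        results.append((name, code))
        if code != 0:
            break  # e.g. exit code 1 from the canary gate blocks later tasks
    return results
```

Passing the runner as a parameter keeps the chaining logic testable without launching real processes; whether a canary failure should also block hardcase mining is a policy choice this sketch makes for simplicity.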
Executed end-to-end on a deterministic fixture snapshot:
- `prepare_myflow_dataset.py` (30 rows input)
- `myflow_validate_snapshot.py` -> PASS
- `myflow_build_reward_labels.py` -> labels generated
- `myflow_eval_canary.py` -> Promotion gate: PASS
- `myflow_mine_hardcases.py` -> hardcases + next seed generated

Artifact root used during verification:
/var/folders/.../tmp.5arfBojfhp/* (temporary run directory)