plugins/ruflo-metaharness/skills/harness-evolve/SKILL.md

Surfaces the upstream metaharness-darwin evolve CLI as a ruflo skill. It is the write layer that pairs with ADR-150's read layer (score / genome / mcp-scan / threat-model / oia-audit). Use it when a harness's readiness scores are flat and you want to discover which surface mutation moves them, without retraining the foundation model.

When to use

  • A harness-score result is below target and you don't know which policy surface is responsible.
  • You're seeding a harness for a new vertical and want to find a good starting configuration empirically rather than hand-tuning.
  • You're comparing your hand-tuned harness against an evolved baseline (treat darwin's champion as the strawman).

When NOT to use

  • For continuous background optimization. Darwin Mode is human-initiated. Wire it into CI for one-shot exploration, not for autonomous self-modification.
  • For ruflo itself in CI. ADR-153 §5 explicitly rejects auto-evolving ruflo — the CI gate verifies graceful degradation, not convergence.

Algorithm

Implementation: scripts/evolve.mjs.

  1. Validate args (--repo exists, caps on --generations ≤ 50, --children ≤ 20, --concurrency ≤ 8, sandbox/selection/mutator are known values).
  2. Without --confirm: print plan + exit 0 (mirrors harness-mint safety convention; defense in depth over the upstream safety.ts checks).
  3. With --confirm: shell to npx -y @metaharness/darwin@~0.3.1 metaharness-darwin evolve <repo> ... via the shared _darwin.mjs async helper. Per-generation progress is forwarded to stderr; final champion JSON is captured from stdout.
  4. Compute timeout from generations × children × per-variant (per-variant ≈ 60s real, ≈ 2s mock). Caller may override with --timeout-ms.
  5. Honor upstream exit code 99 — propagate as "safety-disqualified", do not remap. This is a designed-in tripwire (a variant tripped inspectVariant for secrets / shell-out / network / dynamic-eval). See ADR-153 §"Safety model".
  6. Optional --alert-on-no-improvement: exit 1 when champion ≤ parent.
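
The cap checks in step 1 and the timeout arithmetic in step 4 can be sketched as follows. This is a minimal illustration, not the contents of scripts/evolve.mjs: the function names, the shape of the `args` object, the `mock` flag, and the lower bound of 1 are all assumptions. Only the caps (50 / 20 / 8) and the per-variant estimates (about 60s real, about 2s mock) come from the steps above.

```javascript
// Hypothetical sketch; names and argument shapes are illustrative only.
const CAPS = { generations: 50, children: 20, concurrency: 8 };
const PER_VARIANT_MS = { real: 60_000, mock: 2_000 };

// Returns a list of human-readable errors; an empty list means the caps hold.
// The lower bound of 1 is an assumption, not stated in the source.
function validateCaps(args) {
  const errors = [];
  for (const [name, max] of Object.entries(CAPS)) {
    const value = args[name];
    if (!Number.isInteger(value) || value < 1 || value > max) {
      errors.push(`--${name} must be an integer between 1 and ${max}`);
    }
  }
  return errors;
}

// generations x children x per-variant, unless the caller passed --timeout-ms.
function computeTimeoutMs(args, { mock = false } = {}) {
  if (Number.isInteger(args.timeoutMs) && args.timeoutMs > 0) {
    return args.timeoutMs;
  }
  const perVariant = mock ? PER_VARIANT_MS.mock : PER_VARIANT_MS.real;
  return args.generations * args.children * perVariant;
}
```

Under these assumptions, 5 generations of 4 children budget 20 minutes against a real sandbox and 40 seconds against a mock one, which is why a caller running many generations may want to set --timeout-ms explicitly.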

The seven mutation surfaces

| Surface | What it owns |
|---|---|
| planner | task decomposition / step ordering |
| contextBuilder | what gets fed into the prompt |
| reviewer | self-critique / output verification |
| retryPolicy | when + how to retry on failure |
| toolPolicy | which tools the agent may use, under which conditions |
| memoryPolicy | what to persist, recall, forget |
| scorePolicy | how the agent grades its own output |

Each variant carries exactly one mutation. Multi-surface mutations are not allowed, which keeps causal attribution clean.
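
To make the one-mutation rule concrete, here is a small sketch that checks whether a child differs from its parent in exactly one surface. It assumes, purely for illustration, that a harness configuration is a plain object keyed by the seven surface names with JSON-serializable values; the real variant format under variants/<id>/ may look quite different.

```javascript
// Surface names are taken from the list above; everything else is hypothetical.
const SURFACES = [
  "planner",
  "contextBuilder",
  "reviewer",
  "retryPolicy",
  "toolPolicy",
  "memoryPolicy",
  "scorePolicy",
];

// Lists the surfaces whose serialized value differs between parent and child.
// Note: JSON.stringify is key-order sensitive, so this is a rough comparison.
function mutatedSurfaces(parent, child) {
  return SURFACES.filter(
    (name) => JSON.stringify(parent[name]) !== JSON.stringify(child[name])
  );
}

// True only when exactly one surface changed, so a score delta has one cause.
function isSingleSurfaceMutation(parent, child) {
  return mutatedSurfaces(parent, child).length === 1;
}
```

If a child changed both retryPolicy and toolPolicy, a score improvement could not be attributed to either one alone, which is the situation the rule is meant to prevent.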

Output

Reports land under <repo>/.metaharness/:

.metaharness/
  archive.json         # full lineage tree (sampling next gen draws from this)
  lineage.json         # parent→child edges only
  variants/<id>/       # per-variant code (kept for audit)
  runs/<id>/           # per-variant sandbox test output
  reports/winner.json  # final champion + score delta vs parent

Skill stdout = JSON {success, data: {champion, plan, durationMs, improved}}.
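
A caller might consume that stdout roughly as sketched below. The field names come from the shape above, but their types are assumptions: `improved` is treated as a boolean and `durationMs` as a number of milliseconds. The `summarizeResult` helper is hypothetical.

```javascript
// Hypothetical consumer of the skill's stdout; field types are assumed.
function summarizeResult(stdoutText) {
  const result = JSON.parse(stdoutText);
  if (!result.success || !result.data) {
    return { ok: false, message: "skill reported failure or returned no data" };
  }
  const { improved, durationMs } = result.data;
  const seconds = (durationMs / 1000).toFixed(1);
  const verdict = improved
    ? "champion improved on parent"
    : "no improvement over parent";
  return { ok: true, message: `${verdict} in ${seconds}s` };
}
```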

Exit codes

| Code | Meaning |
|---|---|
| 0 | Evolved OK, or dry-run, or degraded (Darwin absent) |
| 1 | --alert-on-no-improvement and champion did not beat parent |
| 2 | Config error or evolution infrastructure failure |
| 99 | Upstream "safety-disqualified" (PROPAGATED, not remapped) |
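
The mapping above can be read as a small decision function. The sketch below is one possible ordering of the checks, not the logic in scripts/evolve.mjs: the input object and its field names are hypothetical, and treating every other non-zero upstream code as an infrastructure failure is an assumption. What it preserves from the source is that 99 passes through unchanged.

```javascript
// Exit codes from the list above; precedence between them is assumed.
const EXIT = { OK: 0, NO_IMPROVEMENT: 1, FAILURE: 2, SAFETY_DISQUALIFIED: 99 };

function resolveExitCode({
  upstreamExitCode = 0,
  configError = false,
  degraded = false,
  dryRun = false,
  alertOnNoImprovement = false,
  improved = false,
} = {}) {
  if (configError) return EXIT.FAILURE;          // bad args never reach Darwin
  if (degraded || dryRun) return EXIT.OK;        // nothing was evolved
  if (upstreamExitCode === 99) return EXIT.SAFETY_DISQUALIFIED; // propagate as-is
  if (upstreamExitCode !== 0) return EXIT.FAILURE; // assumed: infrastructure failure
  if (alertOnNoImprovement && !improved) return EXIT.NO_IMPROVEMENT;
  return EXIT.OK;
}
```

Keeping 99 distinct from 2 lets a CI pipeline tell a tripped safety check apart from an ordinary failure without parsing logs.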

Graceful degradation (ADR-150 constraint 3 + ADR-153)

When @metaharness/darwin is not installed, the script emits {degraded: true, reason: 'metaharness-darwin-not-available', hint: ...} and exits 0. ruflo continues to function. A CI job in the style of no-metaharness-smoke.yml asserts this path.
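
A smoke check for the degraded path could look roughly like this. It assumes the degraded payload is printed to stdout as a single top-level JSON document, which the source does not state; the `isGracefulDegradation` helper is hypothetical and is not the actual CI job.

```javascript
// Hypothetical assertion helper for a "Darwin absent" smoke test.
// Assumes the degraded payload is the top-level JSON object on stdout.
function isGracefulDegradation(exitCode, stdoutText) {
  if (exitCode !== 0) return false; // degradation must not fail the run
  let payload;
  try {
    payload = JSON.parse(stdoutText);
  } catch {
    return false; // non-JSON output is not the documented degraded shape
  }
  return (
    payload !== null &&
    typeof payload === "object" &&
    payload.degraded === true &&
    payload.reason === "metaharness-darwin-not-available"
  );
}
```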