plugins/ruflo-metaharness/skills/harness-security-bench/SKILL.md
Surfaces the upstream `metaharness-darwin security bench` command. The bench implements the upstream project's own ADR-155 (Darwin Shield), and it is the closest reference implementation for ruflo's nightly self-learning security harness (#2417).
ruflo's ADR-155 proposes three learning loops (per-dimension confidence, severity calibration, auto-fix bid). Loop A trains on accumulated `(finding, dimension, human_outcome)` tuples, but that gradient signal is only sound if the underlying detection mechanism converges on a known-good corpus. Darwin Shield evolves exactly that mechanism against a ground-truth set of 10 seeded vulnerabilities and 9 decoys. Running this nightly gives us:
Implementation: `scripts/security-bench.mjs`.
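The `tpr` and `fpr` values the bench reports are measured against the 10-vuln/9-decoy ground-truth set described above. A minimal sketch of the standard definitions, assuming a hypothetical corpus shape (`{ id, isVuln }`) and a set of flagged ids. Neither `buildCorpus` nor `scoreHarness` comes from `scripts/security-bench.mjs` or the upstream package, and the bench's fitness formula is deliberately not reproduced.

```javascript
// Hypothetical corpus shape: each item is { id, isVuln }.
// Defaults mirror the bench's ground truth: 10 seeded vulns + 9 decoys.
function buildCorpus(vulns = 10, decoys = 9) {
  const corpus = [];
  for (let i = 0; i < vulns; i++) corpus.push({ id: `vuln-${i}`, isVuln: true });
  for (let i = 0; i < decoys; i++) corpus.push({ id: `decoy-${i}`, isVuln: false });
  return corpus;
}

// Standard definitions: TPR = TP / (TP + FN) over the vulns,
// FPR = FP / (FP + TN) over the decoys.
function scoreHarness(corpus, flaggedIds) {
  let tp = 0, fn = 0, fp = 0, tn = 0;
  for (const item of corpus) {
    const flagged = flaggedIds.has(item.id);
    if (item.isVuln) {
      if (flagged) tp++;
      else fn++;
    } else if (flagged) {
      fp++;
    } else {
      tn++;
    }
  }
  return { tpr: tp / (tp + fn), fpr: fp / (fp + tn) };
}

// A noisy harness: catches 3 of 10 vulns but flags every decoy.
const corpus = buildCorpus();
const decoyIds = corpus.filter((c) => !c.isVuln).map((c) => c.id);
const noisy = new Set(["vuln-0", "vuln-1", "vuln-2", ...decoyIds]);
console.log(scoreHarness(corpus, noisy)); // { tpr: 0.3, fpr: 1 }
```

Those numbers match the static-only baseline in the sample output; a harness that flags exactly the 10 vulns and no decoys scores `{ tpr: 1, fpr: 0 }`, like the Darwin champion.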
```bash
npx -y @metaharness/darwin@~0.3.1 metaharness-darwin security bench --population N --cycles N [--seed S]
```

Approximate runtime: 3s × 19 evaluations × population × cycles + 30s overhead.
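The runtime estimate above is a simple product plus a constant, so it is easy to check against the shapes quoted for it. A sketch; the constant names and `estimateBenchSeconds` are illustrative, not part of the script, and the 3s per evaluation is itself an approximation, so real runs will vary.

```javascript
// Documented cost model: ~3s per evaluation, 19 evaluations (10 vulns + 9 decoys)
// per population member per cycle, plus ~30s of fixed overhead.
const SECONDS_PER_EVAL = 3;
const EVALS_PER_MEMBER = 19;
const OVERHEAD_SECONDS = 30;

function estimateBenchSeconds(population, cycles) {
  return SECONDS_PER_EVAL * EVALS_PER_MEMBER * population * cycles + OVERHEAD_SECONDS;
}

console.log(estimateBenchSeconds(2, 1)); // 144
console.log((estimateBenchSeconds(4, 3) / 60).toFixed(1)); // 11.9 (minutes)
```

Because the cost grows with `population × cycles`, doubling either flag roughly doubles the variable part of the run while the 30s overhead stays fixed.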
At the default `--population 2 --cycles 1` this comes to ≈ 144s; at `--population 4 --cycles 3`, ≈ 12 min.

`--alert-on-fail`: exit 1 when overall = FAIL.

Example output (abridged):

```json
{
  "success": true,
  "data": {
    "overall": { "ok": true, "icon": "✅" },
    "gates": {
      "total": 11,
      "passed": 11,
      "failed": 0,
      "details": [{ "ok": true, "criterion": "TPR improvement ≥ 25% vs fixed harness", "measured": "+150% (B2 0.4 → B3 1)" }, ...]
    },
    "baselines": [
      { "harness": "static-only", "fitness": 0.5665, "tpr": 0.3, "fpr": 1, "unsafe": 0, ... },
      { "harness": "LLM single-pass", "fitness": 0.1365, ... },
      { "harness": "fixed agent", "fitness": 0.598, ... },
      { "harness": "Darwin champion", "fitness": 0.93275, "tpr": 1, "fpr": 0, ... }
    ],
    "rawMarkdown": "...",
    "shape": { "population": 2, "cycles": 1, "seed": null },
    "durationMs": 142000
  }
}
```
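A consumer of this envelope mostly needs the pass/fail verdict, the champion's fitness, and any failing gate criteria. A minimal sketch, assuming the field layout shown above; `summarizeBench` and `relativeImprovementPct` are hypothetical helpers, and the degraded branch assumes the degraded payload carries a top-level `degraded` flag. The percentage helper reproduces the `0.4 → 1` arithmetic behind the TPR gate's `+150%`.

```javascript
// Hypothetical helper over the envelope above:
// { success, data: { overall, gates, baselines, shape, durationMs } }.
function summarizeBench(result) {
  // Degraded payload (package unavailable) carries no bench data.
  if (result.degraded) return { degraded: true, reason: result.reason };
  if (!result.success) throw new Error("bench reported success: false");
  const { overall, gates, baselines } = result.data;
  const champion = baselines.find((b) => b.harness === "Darwin champion");
  return {
    pass: overall.ok,
    championFitness: champion ? champion.fitness : undefined,
    failedGates: gates.details.filter((g) => !g.ok).map((g) => g.criterion),
  };
}

// Relative improvement as a rounded percentage; rounding absorbs
// floating-point noise in the division.
function relativeImprovementPct(before, after) {
  return Math.round(((after - before) / before) * 100);
}

// Trimmed copy of the sample: one gate detail, two baselines.
const sample = {
  success: true,
  data: {
    overall: { ok: true, icon: "✅" },
    gates: {
      total: 11, passed: 11, failed: 0,
      details: [{ ok: true, criterion: "TPR improvement ≥ 25% vs fixed harness", measured: "+150% (B2 0.4 → B3 1)" }],
    },
    baselines: [
      { harness: "static-only", fitness: 0.5665, tpr: 0.3, fpr: 1 },
      { harness: "Darwin champion", fitness: 0.93275, tpr: 1, fpr: 0 },
    ],
    shape: { population: 2, cycles: 1, seed: null },
    durationMs: 142000,
  },
};

console.log(summarizeBench(sample).pass); // true
console.log(relativeImprovementPct(0.4, 1)); // 150
```

Keying the champion lookup on the `harness` label rather than array position keeps the helper working if the upstream reorders or adds baselines.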
The ADR-155 nightly workflow (per #2418 task W1.5) will spawn this as one of the active-pentest dimension's calls, and its results become a trajectory record:
```json
{
  "dimension": "mcp-pentest",
  "subdimension": "darwin-shield-bench",
  "champion_fitness": 0.93275,
  "champion_tpr": 1, "champion_fpr": 0,
  "gates_passed": 11, "gates_failed": 0,
  "shape": { "population": 4, "cycles": 3 }
}
```
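How the nightly workflow will assemble this record is not spelled out here. One plausible sketch, assuming each field is copied from the bench's `data` object and that the champion is the baseline labeled `Darwin champion`; `toTrajectoryRecord` is a hypothetical name, and dropping `seed` from `shape` simply mirrors the example record.

```javascript
// Hypothetical mapper from the bench's `data` object to a trajectory record
// shaped like the example above.
function toTrajectoryRecord(data) {
  const champion = data.baselines.find((b) => b.harness === "Darwin champion");
  if (!champion) throw new Error("no Darwin champion baseline in bench output");
  return {
    dimension: "mcp-pentest",
    subdimension: "darwin-shield-bench",
    champion_fitness: champion.fitness,
    champion_tpr: champion.tpr,
    champion_fpr: champion.fpr,
    gates_passed: data.gates.passed,
    gates_failed: data.gates.failed,
    // Keep only population/cycles; the example record omits seed.
    shape: { population: data.shape.population, cycles: data.shape.cycles },
  };
}

const record = toTrajectoryRecord({
  gates: { passed: 11, failed: 0 },
  baselines: [{ harness: "Darwin champion", fitness: 0.93275, tpr: 1, fpr: 0 }],
  shape: { population: 4, cycles: 3, seed: null },
});
console.log(JSON.stringify(record));
```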
Loop A learns: if `darwin-shield-bench` consistently passes on the seeded corpus, findings caught only by `mcp-pentest` should be weighted higher.
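"Consistently passes" needs a concrete definition before Loop A can act on it. A sketch of one possible rule, assuming a rolling pass rate over recent nightly `overall.ok` results; the window size, pass-rate threshold, boost factor, and the `{ dimensions, baseWeight }` finding shape are all hypothetical, not values from ADR-155.

```javascript
// Hypothetical Loop A rule: if darwin-shield-bench passed in at least
// `minPassRate` of the last `window` nightly runs, boost the weight of
// findings that only mcp-pentest caught. All parameters are illustrative.
function findingWeight(finding, recentBenchPasses, { window = 7, minPassRate = 0.8, boost = 1.5 } = {}) {
  const recent = recentBenchPasses.slice(-window);
  const passRate = recent.length ? recent.filter(Boolean).length / recent.length : 0;
  const benchTrusted = passRate >= minPassRate;
  const onlyPentest = finding.dimensions.length === 1 && finding.dimensions[0] === "mcp-pentest";
  return benchTrusted && onlyPentest ? finding.baseWeight * boost : finding.baseWeight;
}

// Six of the last seven benches passed (pass rate ≈ 0.86), so the boost applies.
const pentestOnly = { dimensions: ["mcp-pentest"], baseWeight: 1 };
console.log(findingWeight(pentestOnly, [true, true, false, true, true, true, true])); // 1.5
```

Gating the boost on a rolling rate rather than the latest run keeps a single flaky night from swinging the weights.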
| Code | Meaning |
|---|---|
| 0 | Bench ran (overall PASS or FAIL; distinguish via `data.overall.ok` in the JSON), or degraded mode |
| 1 | `--alert-on-fail` was set and `overall.ok === false` |
| 2 | Config error or upstream infrastructure failure |
When `@metaharness/darwin` is absent, the script emits `{ degraded: true, reason: 'metaharness-darwin-not-available' }` and exits 0.
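The exit-code table and the degraded-mode rule reduce to a small pure function. A sketch, assuming a hypothetical tagged `outcome` object; the real script's internals, including how it detects a missing `@metaharness/darwin`, are not shown here.

```javascript
// Exit-code contract from the table above.
// `outcome` is a hypothetical tagged union:
//   { kind: "ran", overallOk }   bench completed (PASS or FAIL)
//   { kind: "degraded" }         package unavailable
//   { kind: "error" }            config error or upstream infrastructure failure
function exitCodeFor(outcome, alertOnFail) {
  switch (outcome.kind) {
    case "degraded":
      return 0; // emit { degraded: true, ... } and exit cleanly
    case "error":
      return 2;
    case "ran":
      // A FAIL only changes the exit code when --alert-on-fail is set.
      return alertOnFail && outcome.overallOk === false ? 1 : 0;
    default:
      throw new Error(`unknown outcome kind: ${outcome.kind}`);
  }
}

console.log(exitCodeFor({ kind: "ran", overallOk: false }, false)); // 0
console.log(exitCodeFor({ kind: "ran", overallOk: false }, true)); // 1
```

Keeping degraded mode at exit 0 means a nightly job without the upstream package does not show up as a failure; callers that care must inspect the `degraded` flag in the JSON.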