v3/docs/adr/ADR-152-genome-similarity-search.md
Status: Accepted (spike landed iter 35 — both invariants pass)
Date: 2026-06-16 (revised same-day with spike result)
Parent: ADR-151 (Phase 3 scope shell — Harness Intelligence Layer)
Inherits: ADR-150's four architectural constraints (removable / optional / graceful / CI-gate)
Spike: plugins/ruflo-metaharness/scripts/_spike-similarity.mjs (iter-35 commit)
similarity(LEGAL, LEGAL).overall = 1.0000 ✓ Invariant 1 (self-match) — exact
similarity(LEGAL, SUPPORT).overall = 0.8296
similarity(LEGAL, DEVOPS).overall = 0.5840 ✓ Invariant 2 (vertical affinity) — support > devops
Per-component (LEGAL vs SUPPORT): cosine=0.9987 categorical=0.75 jaccard=0.2857
Per-component (LEGAL vs DEVOPS): cosine=0.9734 categorical=0 jaccard=0
Notable findings from the spike:
- categorical: 0 for LEGAL/DEVOPS correctly fires because they share no enum field (different archetypes, different templates, different recommendedMode). This is a strong feature of the design: categorical disagreement is a clean kill-switch.

ADR-151 Phase-3 §3.1 calls for a similarity function over MetaHarness genome + scorecard JSON. It is the critical-path dependency for §3.2 (Recommendation Engine), §3.3 (Fleet Drift), and §3.5 (Plugin Compatibility): three of the four other Phase-3 sub-capabilities consume similarity scores.
What we have already (post-ADR-150 implementation):
- harness genome <repo> emits a 7-section JSON blob: repo_type, agent_topology[], risk_score, mcp_surface, test_confidence, publish_readiness.
- harness score <repo> emits a 5-dimension JSON blob: harnessFit, compileConfidence, taskCoverage, toolSafety, memoryUsefulness, plus archetype and template.
- harness threat-model <repo> emits a worst severity + a list of categorized findings.
- oia-audit records bundle all three above per timestamp in metaharness-audit memory namespace.

What we need: a function similarity(genomeA, genomeB) → number ∈ [0,1] plus a per-dimension breakdown explaining where the two harnesses agree or differ.
Implement genome similarity as a pure-TS function in the existing ruflo-metaharness plugin with three components:
Project the structured genome + scorecard into a fixed-length numerical vector:
| Index | Source | Field | Normalization |
|---|---|---|---|
| 0 | score | harnessFit | already 0..100 → divide by 100 |
| 1 | score | compileConfidence | divide by 100 |
| 2 | score | taskCoverage | divide by 100 |
| 3 | score | toolSafety | divide by 100 |
| 4 | score | memoryUsefulness | divide by 100 |
| 5 | genome | risk_score | already 0..1 |
| 6 | genome | test_confidence | already 0..1 |
| 7 | genome | publish_readiness | already 0..1 |
| 8 | score | estCostPerRunUsd | log-transformed: log10(usd + 0.001) / log10(10) clamped to 0..1 |
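Cosine similarity over these 9 dims gives a [0, 1] score (every component is non-negative after normalization), where 1 means the two vectors point in the same direction and 0 means they are orthogonal. Identical inputs therefore always score 1. Cheap, deterministic, byte-identical for identical inputs.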
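The projection and cosine step can be sketched as follows. This is a minimal illustration, not the shipped module: the `ScoreJson` and `GenomeJson` interfaces and the function names are hypothetical and carry only the fields named in the table, and returning 0 for an all-zero vector is an assumption the ADR does not state.

```typescript
// Hypothetical input shapes: only the fields listed in the table above.
interface ScoreJson {
  harnessFit: number;        // 0..100
  compileConfidence: number; // 0..100
  taskCoverage: number;      // 0..100
  toolSafety: number;        // 0..100
  memoryUsefulness: number;  // 0..100
  estCostPerRunUsd: number;  // USD, >= 0
}

interface GenomeJson {
  risk_score: number;        // 0..1
  test_confidence: number;   // 0..1
  publish_readiness: number; // 0..1
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Index 8: log10(usd + 0.001) / log10(10), clamped to 0..1, as written in the table.
function normalizeCost(usd: number): number {
  return clamp01(Math.log10(usd + 0.001) / Math.log10(10));
}

// Build the fixed-length 9-dim vector in table order (indices 0..8).
function toVector(score: ScoreJson, genome: GenomeJson): number[] {
  return [
    score.harnessFit / 100,
    score.compileConfidence / 100,
    score.taskCoverage / 100,
    score.toolSafety / 100,
    score.memoryUsefulness / 100,
    genome.risk_score,
    genome.test_confidence,
    genome.publish_readiness,
    normalizeCost(score.estCostPerRunUsd),
  ];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // Assumption: an all-zero vector has no direction, so report 0 instead of NaN.
  if (normA === 0 || normB === 0) return 0;
  return clamp01(dot / Math.sqrt(normA * normB));
}
```

With the cost formula exactly as written in the table, any cost below roughly $1 yields a negative logarithm and clamps to 0, so index 8 only separates harnesses in the $1 to $10 range.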
Four fields are categorical:
| Field | Source | Possible values |
|---|---|---|
| repo_type | genome | e.g. node_mcp_ci, python_lib, rust_cli |
| archetype | score | e.g. typescript-sdk-harness, python-agent-harness |
| template | score | one of the 20 metaharness templates |
| recommendedMode | score | CLI, MCP, or CLI + MCP |
Each contributes 1 if matching, 0 if not. Sum / 4 → categorical-agreement score.
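A minimal sketch of the categorical-agreement score, assuming the four fields from the table have already been gathered into one flat record. The `CategoricalFields` interface and the function name are hypothetical, and strict string equality is assumed as the matching rule.

```typescript
// Hypothetical shape: the four categorical fields from the table above,
// merged into one record regardless of whether they came from genome or score.
interface CategoricalFields {
  repo_type: string;
  archetype: string;
  template: string;
  recommendedMode: string;
}

const CATEGORICAL_KEYS = ["repo_type", "archetype", "template", "recommendedMode"] as const;

// Each field contributes 1 on an exact match and 0 otherwise; the sum is divided by 4.
function categoricalAgreement(a: CategoricalFields, b: CategoricalFields): number {
  let matches = 0;
  for (const key of CATEGORICAL_KEYS) {
    if (a[key] === b[key]) matches += 1;
  }
  return matches / CATEGORICAL_KEYS.length;
}
```

Three matching fields out of four give 0.75, which is the categorical value the spike reports for LEGAL vs SUPPORT.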
The agent_topology field is an array (e.g. ["maintainer", "tester", "security", "release"]). Jaccard similarity = |A ∩ B| / |A ∪ B|.
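The Jaccard step can be sketched like this. The function name is hypothetical, duplicates in the array are ignored by converting to sets, and treating two empty topologies as identical (score 1) is an assumption, since the ADR does not define the 0/0 case.

```typescript
// Jaccard similarity over two agent_topology arrays: |A ∩ B| / |A ∪ B|.
function jaccardSimilarity(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  // Assumption: two empty topologies count as identical.
  if (setA.size === 0 && setB.size === 0) return 1;
  let intersection = 0;
  setA.forEach((role) => {
    if (setB.has(role)) intersection += 1;
  });
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  const union = setA.size + setB.size - intersection;
  return intersection / union;
}
```

For example, `["maintainer", "tester", "security", "release"]` against `["maintainer", "tester", "docs"]` shares two roles out of five distinct ones, giving 0.4.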
overallSimilarity = 0.6 · cosine + 0.25 · categorical + 0.15 · jaccard
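As a worked check, the weighted sum can be applied to the per-component values the spike reports. The `Components` interface and the function name below are hypothetical; the weights and input numbers come from this ADR.

```typescript
interface Components {
  cosine: number;      // [0,1]
  categorical: number; // [0,1]
  jaccard: number;     // [0,1]
}

// Global default weights from the formula above; they sum to 1.
const WEIGHTS = { cosine: 0.6, categorical: 0.25, jaccard: 0.15 } as const;

function combineComponents(c: Components): number {
  return (
    WEIGHTS.cosine * c.cosine +
    WEIGHTS.categorical * c.categorical +
    WEIGHTS.jaccard * c.jaccard
  );
}

// Per-component values as reported by the spike (already rounded to 4 decimals there).
const legalVsSupport = combineComponents({ cosine: 0.9987, categorical: 0.75, jaccard: 0.2857 });
const legalVsDevops = combineComponents({ cosine: 0.9734, categorical: 0, jaccard: 0 });

// 0.6 * 0.9987 + 0.25 * 0.75 + 0.15 * 0.2857 = 0.829575, i.e. 0.8296 at 4 decimals
// 0.6 * 0.9734 = 0.58404, i.e. 0.5840 at 4 decimals
```

Both results match the spike's reported overall scores. The second case also shows that when categorical and jaccard are both 0, the overall score cannot exceed the cosine weight of 0.6.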
Weights chosen so that:
Return shape:

    interface SimilarityResult {
      overall: number; // [0,1]
      components: {
        cosine: number; // [0,1]
        categorical: number; // [0,1]
        jaccard: number; // [0,1]
      };
      perDimension: Record<string, {
        a: number | string | string[];
        b: number | string | string[];
        contribution: number; // signed [-w, +w]
      }>;
      generatedAt: string;
    }

The perDimension breakdown lets consumers explain why two harnesses scored as they did — critical for the §3.2 Recommendation Engine's confidence calculation and the §3.3 Drift Detection's alert reason.
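One way a consumer might turn perDimension into an explanation is to rank entries by the magnitude of their contribution. The helper below is a hypothetical sketch: the ADR does not define how contribution is computed, so the function only sorts and formats whatever values it is given.

```typescript
type DimensionValue = number | string | string[];

interface DimensionEntry {
  a: DimensionValue;
  b: DimensionValue;
  contribution: number; // signed, per the return shape above
}

// Hypothetical helper: list the n dimensions with the largest |contribution|.
function topContributors(
  perDimension: Record<string, DimensionEntry>,
  n: number,
): string[] {
  const ranked: Array<{ name: string; entry: DimensionEntry }> = [];
  for (const name of Object.keys(perDimension)) {
    const entry = perDimension[name];
    if (entry !== undefined) ranked.push({ name, entry });
  }
  // Largest absolute contribution first.
  ranked.sort(
    (x, y) => Math.abs(y.entry.contribution) - Math.abs(x.entry.contribution),
  );
  return ranked.slice(0, n).map(({ name, entry }) => {
    const sign = entry.contribution >= 0 ? "+" : "";
    const a = JSON.stringify(entry.a);
    const b = JSON.stringify(entry.b);
    return `${name}: ${a} vs ${b} (${sign}${entry.contribution.toFixed(3)})`;
  });
}
```

A drift alert or recommendation could then quote the top one or two lines as its reason string.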
- plugins/ruflo-metaharness/scripts/_similarity.mjs (shared module convention; see iter-1 _harness.mjs and iter-73 _sessions.mjs).
- harness-similarity invoked as npx ruflo metaharness similarity --a <genomeA.json> --b <genomeB.json> (or --a-key/--b-key for memory-namespace lookup, mirroring iter-15 audit-trend).
- mcp__claude-flow__metaharness_similarity so agents can call it during conversation.

NO new dependency on @metaharness/*: the function operates on JSON shapes that the existing CLI already emits. Genuinely zero blast radius on ADR-150's four constraints:
- degraded if inputs are malformed)
- npx required)

Before ADR-152 is marked Accepted, ship a 30-LOC proof that:
- harness genome + harness score JSON files from disk.
- cosine over the 9 numeric dims.

The spike script: plugins/ruflo-metaharness/scripts/_spike-similarity.mjs. Lives in the plugin to avoid polluting global scripts; deleted after ADR-152 graduates.
- threat-model worst because it's categorical and would inflate the categorical component too far. If worst regression is what matters for drift detection (§3.3), drift consumers must check worst separately.

Alternative A: Use sentence embeddings over the README + agent prompt text.
Alternative B: Skip cosine, use only categorical + jaccard.
- harnessFit: 82 vs harnessFit: 45 should clearly contribute to similarity, and categorical can't see that.

Alternative C: Train a similarity model on real harness pairs.
- mcp-policy.json? Lean no: the 0.6/0.25/0.15 split should be a single global default. Per-org tuning is out of scope for §3.1.
- --format mermaid. Cheap to add post-spike.
- audit-trend script? audit-trend already diffs two timestamps of the SAME harness; this ADR adds a separate primitive for diffing two DIFFERENT harnesses. Both should live as siblings, not subtypes.

- audit-trend: plugins/ruflo-metaharness/scripts/audit-trend.mjs (sibling drift-detection primitive)
- _harness.mjs: shared-module convention reference
- harness genome / score / threat-model outputs documented in ruvnet/agent-harness-generator source