v3/docs/adr/ADR-076-structured-distillation.md
- Status: Accepted (implemented in ruflo 3.10.16)
- Date: 2026-05-30
- Tracking: #2241 (Dream Cycle 2026-05-30 performance scan)
- Paper: arXiv:2603.13017 (Grade A, March 2026), "Structured Distillation of Agent Exchanges: 4-field schema for 11× compression and improved retrieval MRR"

The Dream Cycle 2026-05-30 scan (#2241) identified Structured Distillation as the highest-ROI intelligence finding from a 2026 Grade-A paper that maps directly onto ruflo's trajectory memory: the paper compresses agent exchanges from ~371 to ~38 tokens (≈11×) using a four-field schema, and shows retrieval MRR rising from 0.745 (raw) to 0.759 (distilled, Δ +0.014) on a 214 K-pair consensus-graded corpus.
ADR-074 wired the self-learning surfaces; ADR-075 unified the four stat aggregators. Both fixed honesty — making the surfaces report what they actually do. ADR-076 is the first round-C quality win: a real SOTA-paper alignment with measured proof, not just wiring.
Adopt the 4-field schema for trajectory step content:

```typescript
interface DistilledContent {
  summary: string;  // first sentence: the headline of the exchange
  detail: string;   // the rest of the content, kept for fidelity
  labels: string[]; // domain tokens: verbs (refactor/fix/add/…) + camelCase nouns
  paths: string[];  // file paths and file:line references
}
```

Schema lives in v3/@claude-flow/cli/src/memory/structured-distill.ts. The serialiser (serialiseDistilled) places labels and paths at the front so the embedder allocates more probability mass to high-signal tokens — that ordering is what the paper credits for the MRR gain.
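
The ordering idea can be illustrated with a minimal sketch. It is not the code in `structured-distill.ts`: the function name `serialiseSketch`, the `labels:` / `paths:` prefixes, and the newline separator are all assumptions made for illustration. Only the field names and the "labels and paths first" ordering come from this ADR.

```typescript
// Field names follow the ADR's DistilledContent schema.
interface DistilledContent {
  summary: string;
  detail: string;
  labels: string[];
  paths: string[];
}

// Hypothetical serialiser: the prefixes and the separator are assumptions,
// not the format produced by the real serialiseDistilled().
function serialiseSketch(d: DistilledContent): string {
  const parts: string[] = [];
  // High-signal fields go first, as the ADR describes.
  if (d.labels.length > 0) parts.push(`labels: ${d.labels.join(" ")}`);
  if (d.paths.length > 0) parts.push(`paths: ${d.paths.join(" ")}`);
  // Summary and detail follow, so no content is dropped.
  if (d.summary !== "") parts.push(d.summary);
  if (d.detail !== "") parts.push(d.detail);
  return parts.join("\n");
}

const exampleStep: DistilledContent = {
  summary: "Fix the race in the trajectory store.",
  detail: "recordTrajectory dropped steps when two writers flushed at once.",
  labels: ["fix", "recordTrajectory"],
  paths: ["src/memory/store.ts:42"],
};

const serialised = serialiseSketch(exampleStep);
console.log(serialised);
```

Because a layout like this repeats the labels and paths ahead of content that is kept in full, its output is likely to be longer than the raw input, which is in line with the sub-1.0 byte ratio reported in the results.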
The extractor is rule-based, deterministic, dependency-free, and sub-millisecond. A future round can plug a learned distiller (LLM / cross-encoder) into the same schema as a drop-in replacement; the corpus + harness already exist as the gate.
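
A rule-based extractor of this kind might look roughly like the sketch below. The regular expressions, the helper name `distillSketch`, and the sample input are assumptions for illustration, not the rules shipped in `structured-distill.ts`. The verb list is limited to the three verbs this ADR names, since the rest of the list is elided here.

```typescript
interface DistilledContent {
  summary: string;
  detail: string;
  labels: string[];
  paths: string[];
}

// The ADR lists "refactor/fix/add/…" and elides the rest; only the three
// named verbs are used in this sketch.
const VERB_LABELS = new Set(["refactor", "fix", "add"]);

// Assumed patterns, not the ones used by the real extractor.
const CAMEL_CASE = /\b[a-z]+(?:[A-Z][a-z0-9]+)+\b/g;
const FILE_PATH = /(?:[\w.-]+\/)+[\w.-]+\.\w+(?::\d+)?/g;
// First sentence: shortest prefix ending in . ! or ? followed by whitespace
// or end of input, so dots inside file names do not end the sentence.
const FIRST_SENTENCE = /^[\s\S]*?[.!?](?=\s|$)/;

function distillSketch(raw: string): DistilledContent {
  const text = raw.trim();
  const first = FIRST_SENTENCE.exec(text);
  const summary = first ? first[0] : text;
  const detail = text.slice(summary.length).trim();

  const words = text.toLowerCase().match(/[a-z]+/g) ?? [];
  const verbs = words.filter((w) => VERB_LABELS.has(w));
  const nouns = text.match(CAMEL_CASE) ?? [];

  // Sets deduplicate while keeping first-seen order.
  const labels = Array.from(new Set([...verbs, ...nouns]));
  const paths = Array.from(new Set(text.match(FILE_PATH) ?? []));
  return { summary, detail, labels, paths };
}

const distilled = distillSketch(
  "Fix the race in src/memory/store.ts:42. " +
    "The recordTrajectory call dropped steps when flushQueue ran twice."
);
console.log(distilled);
```

Everything here is a pure function of the input string, which is what makes this style of extractor deterministic and dependency-free; a learned distiller could return the same `DistilledContent` shape.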

- `distillTrajectoryContent(raw)`: extracts the 4 fields.
- `serialiseDistilled(d)`: produces the embedding-ready string with high-signal tokens first.
- `distillAndSerialise(raw)`: convenience wrapper (distill + serialise).
- `compressionRatio(raw)`: utility for tracking byte-level shrink (1.0 = parity, >1 = smaller).
- `bench/trajectory-mrr-corpus.json`: 30 paired (raw, query) trajectories drawn from the recent ruflo issue-fix history.
- `scripts/benchmark-trajectory-mrr.mjs`: runs raw vs distilled retrieval, computes MRR, writes a run JSON.

`docs/benchmarks/runs/trajectory-mrr-latest.json`, bridge ONNX embedder (Xenova/all-MiniLM-L6-v2, 384-dim), corpus N=30:

| Metric | Raw | Distilled | Δ | Direction |
|---|---|---|---|---|
| MRR | 0.0964 | 0.1367 | +0.0403 (+41.8%) | ✅ distilled better |
| Total bytes | 9,149 | 12,378 | 0.74× compression | — bigger (honest tradeoff) |
| Distilled wins | — | — | — | TRUE |
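
For readers unfamiliar with the metric, MRR (mean reciprocal rank) over a paired corpus can be computed as in the sketch below. This is a generic illustration with toy 2-dimensional vectors, not the logic of `benchmark-trajectory-mrr.mjs`; the pairing convention (query i matches document i) and the function names are assumptions.

```typescript
// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * (b[i] ?? 0), 0);
  const normA = Math.sqrt(a.reduce((s, x) => s + x * x, 0));
  const normB = Math.sqrt(b.reduce((s, x) => s + x * x, 0));
  return normA === 0 || normB === 0 ? 0 : dot / (normA * normB);
}

// Assumed pairing: query i is answered by document i.
// Rank = 1 + number of documents scoring strictly higher than the target.
function meanReciprocalRank(queries: number[][], docs: number[][]): number {
  let total = 0;
  queries.forEach((q, i) => {
    const target = docs[i];
    if (target === undefined) return; // an unpaired query contributes 0
    const targetScore = cosine(q, target);
    const better = docs.filter((d) => cosine(q, d) > targetScore).length;
    total += 1 / (better + 1);
  });
  return queries.length === 0 ? 0 : total / queries.length;
}

// Toy embeddings: the second query ranks its target second, the others first.
const toyDocs = [[1, 0], [0, 1], [1, 1]];
const toyQueries = [[1, 0.1], [0.6, 1], [1, 0.9]];
const toyMrr = meanReciprocalRank(toyQueries, toyDocs);
console.log(toyMrr.toFixed(4)); // 0.8333, i.e. (1 + 1/2 + 1) / 3
```

Running the same function once over embeddings of the raw text and once over embeddings of the distilled text yields the two MRR values being compared.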
Honest comparison to the paper (arXiv:2603.13017):
| | Our run | Paper |
|---|---|---|
| Embedder | bridge ONNX (live MCP path) | learned cross-encoder |
| Corpus | N=30 hand-curated ruflo fixes | 214 K consensus-graded pairs |
| Distiller | rule-based regex | learned LLM-based |
| MRR delta | +0.0403 (+41.8% relative) | +0.014 (+1.9% relative) |
| Compression | 0.74× (distilled grew by 35%) | 9.76× (371→38 tokens) |
The direction matches the paper (distilled improves MRR); the relative delta is larger in our corpus (small + curated, so a high-signal-token serialisation order pays more). The byte compression does NOT match because a rule-based distiller can't safely drop content; a learned distiller is required to hit the paper's 11×. We don't claim the byte number — we claim the schema, the harness, and the MRR direction.
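
The derived figures in the two tables can be re-checked from the reported raw numbers. The helper names below are hypothetical; the inputs are the values quoted in this ADR.

```typescript
// Hypothetical helpers for re-deriving the table figures.
// Ratio > 1 means the distilled form is smaller; < 1 means it grew.
function sizeRatio(rawSize: number, distilledSize: number): number {
  return rawSize / distilledSize;
}

// Relative change of `after` with respect to `before`.
function relativeDelta(before: number, after: number): number {
  return (after - before) / before;
}

// Our run: bytes and MRR from the committed benchmark.
const ourCompression = sizeRatio(9149, 12378);
const ourGrowth = 12378 / 9149 - 1;
const ourMrrGain = relativeDelta(0.0964, 0.1367);

// Paper: tokens and MRR as quoted in this ADR.
const paperCompression = sizeRatio(371, 38);
const paperMrrGain = relativeDelta(0.745, 0.759);

console.log(ourCompression.toFixed(2));           // 0.74
console.log((ourGrowth * 100).toFixed(0) + "%");  // 35%
console.log((ourMrrGain * 100).toFixed(1) + "%"); // 41.8%
console.log(paperCompression.toFixed(2));         // 9.76
console.log((paperMrrGain * 100).toFixed(1) + "%"); // 1.9%
```

Note that 371/38 evaluates to roughly 9.76×, the value used in the comparison table, while the prose quotes the paper's headline figure of 11×.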
Still to do: wire `distillAndSerialise()` into `recordTrajectory()` at write time so the embedded form of every stored step is distilled. The infrastructure is in place; the live integration is the next ADR.

- `__tests__/structured-distill-2241.test.ts`: 9 tests.
- `scripts/benchmark-trajectory-mrr.mjs`: committed run shows distilled MRR > raw MRR with the real ONNX embedder.
- Build (`tsc -b`) and full CLI suite green, modulo pre-existing flakes documented in ADR-074.

```bash
git clone https://github.com/ruvnet/ruflo && cd ruflo
npm install && ( cd v3/@claude-flow/cli && npx tsc -b )

# Schema + extractor tests
( cd v3/@claude-flow/cli && npx vitest run __tests__/structured-distill-2241.test.ts )

# MRR proof benchmark (uses the bridge ONNX embedder when available;
# falls back to hash-deterministic with an explicit "degraded" warning)
node v3/@claude-flow/cli/scripts/benchmark-trajectory-mrr.mjs
# → MRR raw 0.0964 → distilled 0.1367 (Δ +0.0403) on the committed 30-entry corpus
```
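
The "hash-deterministic" fallback mentioned in the benchmark comments is not specified in this ADR. One common way to build such a fallback is feature hashing, sketched below under that assumption; the function names, the FNV-1a hash, the tokenisation, and the warning text are all illustrative choices, not the benchmark script's implementation.

```typescript
// FNV-1a 32-bit hash over UTF-16 code units (equal to bytes for the
// ASCII tokens produced below). Deterministic across runs and machines.
function fnv1a(token: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    h ^= token.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return h >>> 0;
}

// Hypothetical fallback embedder: bag-of-tokens feature hashing into
// `dim` buckets, then L2 normalisation. Carries no semantic knowledge.
function hashEmbed(text: string, dim: number = 384): number[] {
  const v = new Array<number>(dim).fill(0);
  for (const tok of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    const idx = fnv1a(tok) % dim;
    v[idx] = (v[idx] ?? 0) + 1;
  }
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? v : v.map((x) => x / norm);
}

interface EmbedResult {
  vector: number[];
  degraded: boolean;
}

// Use the primary embedder when one is supplied; otherwise warn and fall back.
function embedOrFallback(text: string, primary?: (t: string) => number[]): EmbedResult {
  if (primary !== undefined) return { vector: primary(text), degraded: false };
  console.warn("degraded: embedder unavailable, using hash-deterministic vectors");
  return { vector: hashEmbed(text), degraded: true };
}

const fallbackA = embedOrFallback("fix store race");
const fallbackB = embedOrFallback("fix store race");
console.log(fallbackA.degraded, fallbackA.vector.length);
```

Vectors from a fallback like this only reflect token overlap, so MRR values obtained in degraded mode would not be comparable to the ONNX numbers above.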