plugins/ruflo-cost-tracker/docs/adrs/0003-implementation-arc-v0.5-to-v0.15.md
ADR-0001 fixed namespace routing (v0.2.2). ADR-0002 added the Agent Booster integration with verification (v0.3.0–v0.4.0). At that point the plugin had a credible verification harness — but it was almost entirely a consumer of the cost-tracking AgentDB namespace. Nothing in the plugin actually wrote to that namespace. Every skill operated on empty data.
This ADR documents the implementation arc that closed that gap (v0.4.0 → v0.15.0) and the deliberate scope decisions made along the way.
Eleven priorities were implemented as separate plugin-local commits, each with version bump + smoke contract growth. Headline numbers from a fresh measured bench (2026-05-05, corpus v3 = 25 cases):
| Endpoint | Tier 1 win | Adversarial correct | Avg latency | Cost / edit | Speedup vs Booster |
|---|---|---|---|---|---|
| Agent Booster (WASM) | 18/18 | escalates 7/7 | 0.36 ms | $0 | — |
| Gemini 2.0 Flash | 18/18 | 3/7 | 807.56 ms | $0.000028 | 2243× |
| Claude Sonnet 4.6 | 18/18 | 2/7 | 1270.64 ms | $0.000933 | 3530× |
| Claude Opus 4.7 | 18/18 | 5/7 | 1563.72 ms | $0.005943 | 4344× |
cost-track (v0.5.0) — the most embarrassing gap. Reads ~/.claude/projects/<encoded-cwd>/<session>.jsonl, sums per-message usage by model, persists to cost-tracking namespace. Without this every other skill operated on empty data.cost-budget-check (v0.6.0) — the README documented a 50/75/90/100% alert ladder but no code enforced it. Wired the producer (cost track) to the consumer (budget check) with a real fail-closed exit-1 path on HARD_STOP.hooks_model-outcome (v0.7.0) — cost-optimize step 8 was prose. Replaced with a wrapper script (outcome.mjs) and a cost outcome subcommand so applied recommendations actually train the router.compact.mjs (v0.8.0) — dropped the inline node --input-type=module -e '...' block from cost-compact-context. True MCP wrapping (modifying @claude-flow/cli source) deliberately deferred — see "Riskiest assumption" below.cost-trend (v0.9.0) — the binary smoke gate misses curves. Trend across all runs/*.json flags drifts the gate doesn't.expectedTier1 field and 7 adversarial cases. Win rate now means something (was tautological at 100% across all endpoints on v1).cost-conversation (v0.12.0) — per-conversation lens (different aggregation axis from cost-report's per-agent / per-model).cost-export (v0.13.0) — Prometheus textfile collector + webhook POST. External observability.cost-federation (v0.14.0) — ADR-097 Phase 3 consumer wired. Activates when upstream emits.cost-summary (v0.15.0) — stable programmatic JSON contract for inter-plugin consumption.Positive:
cost-track writing to cost-tracking) and four lenses on the same data (report, optimize, conversation, summary). Before this work, every skill consumed an empty namespace.budget check && spawn … fail-closed pattern guards expensive operations.Negative:
cost_report / cost_summary MCP tools requires modifying v3/@claude-flow/cli/src/mcp-tools/, which is outside plugin-local scope and deserves its own ADR. The current summary.mjs provides equivalent functionality via Bash shell-out, but it is not an MCP tool.npx @claude-flow/cli memory store rejects keys that memory retrieve doesn't see (a UNIQUE-constraint inconsistency in the @claude-flow/cli memory layer). budget.mjs works around this by writing timestamped keys (budget-config-<ms>) and resolving the latest at retrieve time. This is functional but indicates an upstream bug that should be fixed in a separate ADR.federation_send completion events flow into the federation-spend namespace. Documented; not a hard issue but means the metric is currently zero.BENCH_LLM_BASELINE=1 BENCH_ANTHROPIC=1.Neutral:
allowed-tools (no wildcards). Smoke step 10 enforces this.agent-booster installed when the bench isn't being run. All other skills (track, budget, outcome, trend, conversation, export, federation, summary) avoid the booster import entirely. Verified live: cost-track, cost-budget-check, cost-summary all run cleanly outside the v3/ tree.Smoke contract is at 44 checks (bash plugins/ruflo-cost-tracker/scripts/smoke.sh). Live verified:
cost-track captured this session: 1597 messages, $1546.36 on Opus 4.7cost-budget set / get / check: $2500 budget, 61.9% utilization → 🟡 INFOoutcome.mjs: emitted haiku=success to router, [OK] Outcome recordedcost-trend: 6+ runs analyzed, 100% win rate stable, latency drifted 0.58→0.36 mscost-export --prometheus: textfile cleanly written with all 6 metric typescost-federation: with mock events, peer-bob $7.50 → ⚠ flagged over $5/24h thresholdcost-summary --format json: stable contract verifiedBench result persisted at docs/benchmarks/runs/latest.json (corpus v3, 4 endpoints).
15fae7a5495026f025fb3baf721c20ea — bench proof + corpus + harnessc0b44f45 — driver of this implementation arc; cancelled at end of sessioncost_report / cost_summary as registered MCP tools, not Bash shell-out)memory store UNIQUE-constraint inconsistencyruflo-federation plugin, not this one)