ADR-098: Plugin Capability Sync + Token / Performance / Intelligence / Self-Optimization Pass

v3/docs/adr/ADR-098-plugin-capability-sync-and-optimization.md


Status: Proposed
Date: 2026-05-04
Version: target v3.6.x (multi-iteration, no single-version pin)
Supersedes: nothing
Related: ADR-094 (transformers loader), ADR-095 (architectural gaps), ADR-096 (encryption-at-rest), ADR-097 (federation budget circuit breaker), plugins/ruflo-* directory

Context

The plugins/ruflo-* tree is the user-facing surface of Ruflo on Claude Code: 32 plugins distributed via the Ruflo marketplace, each bundling agent prompts, skills, slash commands, and (in some cases) hooks. End users install via /plugin install ruflo-X@ruflo and immediately get that plugin's agents and commands.

Recent shipped work (ADR-094, 095, 096, 097) added or modified capabilities that the plugin tree doesn't yet surface:

| Recent capability | Plugin that should know about it | Current coverage |
|---|---|---|
| ADR-096 encryption-at-rest (CLAUDE_FLOW_ENCRYPT_AT_REST gate, fs-secure helpers) | ruflo-aidefence, ruflo-security-audit, ruflo-rag-memory, ruflo-rvf | None of these mention it |
| ADR-097 federation budget circuit breaker (maxHops, maxTokens, maxUsd) | ruflo-federation ✅, ruflo-cost-tracker should consume federation_spend events | Federation has it; cost-tracker doesn't |
| validateEnv() loader-hijack denylist | ruflo-aidefence, ruflo-security-audit (relevant for threat agents) | Not surfaced |
| validateBudget() / enforceBudget() (federation) | ruflo-cost-tracker | Not surfaced |
| AgentDB controllers activated in 3.6.24 (G7: gnn, rvf, mut, att, gvb) | ruflo-agentdb, ruflo-rag-memory, ruflo-knowledge-graph | Skill docs don't mention them |
| 3-tier model routing (haiku / sonnet / opus per ADR-026) | All plugins with agents | Some plugins use model: opus where haiku would do |

A scan of the 32 plugin trees (auto-extracted via scripts/inventory-capabilities.mjs) surfaced five categories of debt:

Audit findings

1. Capability sync (high priority — user-visible)

Only ruflo-federation references ADR-096 / ADR-097 / encryption / budget concepts. The other 31 plugins don't reference any post-3.6.13 capabilities. End users installing ruflo-aidefence or ruflo-security-audit see no mention of the new file-mode-0600 default, the encryption-at-rest gate, or the loader-hijack denylist — even though those plugins' agent prompts are explicitly about security posture.
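For readers of the security plugins who have not seen these defaults, the sketch below illustrates what an owner-only (0600) file mode and an environment-variable gate look like in practice. It is not Ruflo's fs-secure implementation: `write_owner_only` and `encryption_gate_enabled` are hypothetical helper names, and treating only the value `1` as "enabled" is an assumption based on the `CLAUDE_FLOW_ENCRYPT_AT_REST=1` form used elsewhere in this ADR. The file-mode calls are POSIX-only.

```python
import os
import stat
import tempfile


def encryption_gate_enabled(env=None) -> bool:
    """Return True when the encryption-at-rest gate is set to "1".

    Assumption: only the literal value "1" enables the gate.
    """
    env = os.environ if env is None else env
    return env.get("CLAUDE_FLOW_ENCRYPT_AT_REST") == "1"


def write_owner_only(path: str, data: bytes) -> None:
    """Write data to path so that only the owning user can read or write it."""
    # Create with mode 0600; the umask can only remove bits, never add them.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        # Tighten the mode as well, in case the file already existed with wider bits.
        os.fchmod(handle.fileno(), 0o600)
        handle.write(data)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "example.db")
    write_owner_only(target, b"payload")
    print(oct(stat.S_IMODE(os.stat(target).st_mode)), encryption_gate_enabled())
```

A plugin README could point at the real helpers instead; the sketch only makes the two concepts concrete.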

2. Token-cost overage (medium priority — runtime cost)

Per-agent prompt sizes range from 19 to 105 lines. Outliers at roughly 80 lines or more:

| Plugin | Agent | Lines | Reason |
|---|---|---|---|
| ruflo-cost-tracker | cost-analyst | 105 | Heavy command-table inlining |
| ruflo-adr | adr-architect | 96 | Lifecycle-state machine inlined |
| ruflo-ddd | domain-modeler | 93 | DDD vocabulary table inlined |
| ruflo-iot-cognitum | device-coordinator | ~80 | Trust-tier table inlined |

Each line in the agent prompt is loaded into context every time the agent is spawned. A 100-line prompt at ~12 tokens/line is ~1200 tokens per spawn just for the agent definition — multiplied by spawn frequency, that's measurable spend. Reference tables and command catalogs belong in skills (loaded on-demand) or in a sibling REFERENCE.md file, not in the agent prompt itself.
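The arithmetic above can be applied to each outlier. The sketch below uses the ADR's ~12 tokens/line approximation (a rough estimate, not a measured tokenizer count) and the 60-line target from Part 2; the function names are illustrative only.

```python
# Rough per-spawn token estimate for agent prompts.
# TOKENS_PER_LINE is the ADR's approximation, not a measured value.
TOKENS_PER_LINE = 12
TARGET_LINES = 60  # Part 2 target

# Line counts from the outlier table (device-coordinator is approximate).
OUTLIER_LINES = {
    "cost-analyst": 105,
    "adr-architect": 96,
    "domain-modeler": 93,
    "device-coordinator": 80,
}


def tokens_per_spawn(lines: int, tokens_per_line: int = TOKENS_PER_LINE) -> int:
    """Estimated tokens loaded each time an agent with this prompt is spawned."""
    return lines * tokens_per_line


def savings_per_spawn(lines: int, target_lines: int = TARGET_LINES) -> int:
    """Estimated tokens saved per spawn if the prompt is trimmed to the target."""
    return tokens_per_spawn(max(lines - target_lines, 0))


if __name__ == "__main__":
    for agent, lines in OUTLIER_LINES.items():
        print(f"{agent}: ~{tokens_per_spawn(lines)} tokens/spawn, "
              f"~{savings_per_spawn(lines)} saved at {TARGET_LINES} lines")
```

Multiplying the per-spawn saving by an observed spawn count would give the actual spend reduction; this ADR does not state spawn frequencies, so none are assumed here.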

3. Performance / model-tier mismatch (medium priority — cost)

Three agents use model: opus (the highest tier). The first two below are clearly justified by task complexity; the third is debatable:

  • ruflo-federation/federation-coordinator — multi-phase coordination, trust scoring, audit-grade logging.
  • ruflo-neural-trader/trading-strategist — real-money trading decisions; opus is correct.
  • ruflo-security-audit/security-auditor — debatable; security review work is sonnet-tier in practice.

The third should drop to sonnet (~5× cheaper per token) unless the task scope actually warrants opus.

4. Intelligence / learning gap (low priority — self-improvement)

7 of 43 agents (16%) lack a hooks post-task --train-neural true invocation in their prompt. These agents complete tasks without feeding the SONA learning loop. The agent prompts that DO have it form the dominant pattern; the missing ones are an oversight that costs the system long-term improvement signal.

Specific gaps:

| Plugin | Agent | Missing hook |
|---|---|---|
| (audit script will produce the 7 names) | | post-task neural training |

Lack of post-edit --train-neural is a smaller concern (post-edit hooks fire from the runtime, not from the agent), but the post-task call is agent-emitted and easy to standardize.
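A minimal sketch of how the missing-hook list could be produced, assuming agent prompts live at plugins/ruflo-*/agents/*.md (the layout implied by the paths in this ADR). This is not the actual audit script; `missing_train_hook` and `load_agent_prompts` are hypothetical names, and the regular expression is one reasonable way to match the invocation.

```python
import re
from pathlib import Path

# Matches "hooks post-task ... --train-neural true" on a single line.
HOOK_PATTERN = re.compile(r"hooks\s+post-task\b[^\n]*--train-neural\s+true")


def missing_train_hook(prompts: dict[str, str]) -> list[str]:
    """Return the names of prompts that never invoke the post-task training hook."""
    return sorted(name for name, text in prompts.items()
                  if not HOOK_PATTERN.search(text))


def load_agent_prompts(root: str = "plugins") -> dict[str, str]:
    """Read every agent prompt under root (assumed layout: ruflo-*/agents/*.md)."""
    return {str(path): path.read_text(encoding="utf-8")
            for path in Path(root).glob("ruflo-*/agents/*.md")}


if __name__ == "__main__":
    for name in missing_train_hook(load_agent_prompts()):
        print(name)
```

Separating the text check from the file walk keeps the check testable without a plugin tree on disk.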

5. Self-optimization signal absence (low — long-term)

No plugin agent currently dispatches background workers (hooks worker dispatch --trigger optimize) on completion. The optimize, audit, testgaps workers exist precisely to consume successful agent runs as training data — but no plugin invokes them. This is a missed feedback loop.

Decision

Ship a 5-part remediation plan. Each part is one iteration and can ship on its own; the only ordering preference is that Part 5 lands after Part 4.

Part 1 — Capability sync

For every plugin whose surface meaningfully overlaps a post-3.6.13 capability, add a brief reference (1 paragraph or ≤5 bullets) in the plugin README and the relevant agent prompt:

| Plugin | Add reference to |
|---|---|
| ruflo-aidefence | validateEnv loader-hijack denylist; chmod 0600 file mode; encryption-at-rest gate (defense-in-depth pairing) |
| ruflo-security-audit | Same set, plus the github-tools / update/executor shell injection patterns to scan for |
| ruflo-rag-memory, ruflo-rvf | Encryption-at-rest gate (memory.db wraps under CLAUDE_FLOW_ENCRYPT_AT_REST=1) |
| ruflo-cost-tracker | Federation budget breaker; federation_spend events; per-peer rolling aggregation API (when ADR-097 P3 lands) |
| ruflo-agentdb, ruflo-knowledge-graph | The 5 activated G7 controllers (gnn, rvf, mut, att, gvb) and their MCP tools |
| ruflo-federation | Already done in v0.2.0 |

Bump plugin versions where the surface materially changed (0.1.0 → 0.2.0).

Part 2 — Token-cost diet for fat agent prompts

Move reference tables / command catalogs out of the agent prompt and into either (a) a skill that the agent can load on-demand, or (b) a sibling REFERENCE.md file the agent reads only when needed. Target: keep agent prompts ≤ 60 lines.

Affected:

  • ruflo-cost-tracker/agents/cost-analyst.md (105 → ≤ 60)
  • ruflo-adr/agents/adr-architect.md (96 → ≤ 60)
  • ruflo-ddd/agents/domain-modeler.md (93 → ≤ 60)
  • ruflo-iot-cognitum/agents/device-coordinator.md (~80 → ≤ 60)

Acceptance: agent prompts under 60 lines AND agent still passes its existing skill tests (those that have them).
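The line-count half of this acceptance check is easy to automate. The sketch below counts raw lines, which is an assumption: the ADR does not say whether blank lines or front-matter count toward the 60-line budget. The function names are hypothetical.

```python
from pathlib import Path

MAX_PROMPT_LINES = 60  # Part 2 target


def over_budget(prompts: dict[str, str],
                limit: int = MAX_PROMPT_LINES) -> dict[str, int]:
    """Map each prompt name to its line count when it exceeds the limit."""
    counts = {name: len(text.splitlines()) for name, text in prompts.items()}
    return {name: count for name, count in counts.items() if count > limit}


def load_agent_prompts(root: str = "plugins") -> dict[str, str]:
    """Read agent prompts (assumed layout: plugins/ruflo-*/agents/*.md)."""
    return {str(path): path.read_text(encoding="utf-8")
            for path in Path(root).glob("ruflo-*/agents/*.md")}


if __name__ == "__main__":
    offenders = over_budget(load_agent_prompts())
    for name, count in sorted(offenders.items()):
        print(f"{name}: {count} lines (limit {MAX_PROMPT_LINES})")
    raise SystemExit(1 if offenders else 0)
```

Exiting non-zero when any prompt is over budget would let the check run as a pre-merge gate on token-diet PRs.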

Part 3 — Model-tier rightsizing

Change ruflo-security-audit/agents/security-auditor.md from model: opus → model: sonnet. Justification: security review is bounded-scope analysis that sonnet handles cleanly; opus's long-context advantage isn't load-bearing here. Track for a release cycle and revert if quality drops.
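To keep tier choices from drifting again, a small scan can list agents on opus that are not on the justified list from the audit. The sketch assumes each agent file declares its tier on a line of the form `model: <tier>`; the exact front-matter format is an assumption, and the names below are illustrative.

```python
import re

# Assumed declaration format: a line reading "model: opus" (or sonnet, haiku).
MODEL_LINE = re.compile(r"^model:[ \t]*(\S+)[ \t]*$", re.MULTILINE)

# Agents the audit considers clearly justified on opus.
OPUS_JUSTIFIED = {"federation-coordinator", "trading-strategist"}


def model_tier(prompt_text: str):
    """Return the declared model tier, or None when no declaration is found."""
    match = MODEL_LINE.search(prompt_text)
    return match.group(1) if match else None


def opus_review_list(prompts: dict[str, str]) -> list[str]:
    """Return agents declared as opus that are not on the justified list."""
    return sorted(agent for agent, text in prompts.items()
                  if model_tier(text) == "opus" and agent not in OPUS_JUSTIFIED)
```

Before Part 3 lands, this should report only security-auditor; afterwards it should report nothing.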

Part 4 — Intelligence / learning hook standardization

For every plugin agent without hooks post-task --train-neural true, append the standard 3-line tail:

```markdown
### Neural learning
After completing tasks, store the outcome:
`npx @claude-flow/cli@latest hooks post-task --task-id "$TASK_ID" --success $SUCCESS --train-neural true`
```

Targets the 7 agents flagged by audit. Adds ~3 lines per agent — ~21 lines net repository-wide. Standardizes the learning-feedback contract.

Part 5 — Self-optimization worker dispatch

For agents whose work materially contributes to long-term quality (coder, reviewer, tester, security-auditor, perf-analyzer, etc.), append a worker-dispatch line:

```markdown
### Self-optimization
On successful completion, trigger background optimization:
`npx @claude-flow/cli@latest hooks worker dispatch --trigger <relevant-worker> --task-id "$TASK_ID"`
```

Worker mapping per agent class:

| Agent class | Worker |
|---|---|
| coder, refactor | optimize |
| tester, testgen | testgaps |
| reviewer, security-auditor | audit |
| docs, api-docs | document |
| analyzer, perf-analyzer | benchmark |

Lower priority than Parts 1-4 because workers run async and benefit from stable upstream signal — Part 4 should land first.
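The agent-class mapping can be expressed as a lookup that fills in the `--trigger` placeholder. This is a sketch for generating or validating the prompt tail, not part of the CLI; `dispatch_command` is a hypothetical helper, and the command arguments are copied from the tail shown in this part.

```python
import shlex

# Worker mapping per agent class, as listed in Part 5.
WORKER_BY_AGENT_CLASS = {
    "coder": "optimize",
    "refactor": "optimize",
    "tester": "testgaps",
    "testgen": "testgaps",
    "reviewer": "audit",
    "security-auditor": "audit",
    "docs": "document",
    "api-docs": "document",
    "analyzer": "benchmark",
    "perf-analyzer": "benchmark",
}


def dispatch_command(agent_class: str, task_id: str):
    """Build the worker-dispatch argv for an agent class, or None if unmapped."""
    worker = WORKER_BY_AGENT_CLASS.get(agent_class)
    if worker is None:
        return None
    return ["npx", "@claude-flow/cli@latest", "hooks", "worker", "dispatch",
            "--trigger", worker, "--task-id", task_id]


if __name__ == "__main__":
    argv = dispatch_command("tester", "task-123")
    print(shlex.join(argv))
```

Returning None for unmapped classes keeps agents outside the table from dispatching a worker by accident.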

Scope guardrails

  • This ADR does not change runtime code in @claude-flow/cli. All edits are in plugins/ruflo-*/.
  • Each part is independently shippable.
  • No new ADR cycle unless a part surfaces a runtime gap (e.g. Part 5 might need a new MCP tool for worker telemetry; if so, separate ADR).
  • Per-plugin version bumps follow semver: capability sync = minor (0.1.0 → 0.2.0); token diet alone = patch (0.1.0 → 0.1.1).
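The version-bump rule in the last guardrail can be written down as a small function. The change-kind labels `capability-sync` and `token-diet` are names chosen for this sketch, not identifiers from the repository.

```python
def bump(version: str, change: str) -> str:
    """Apply the ADR's bump rule to a MAJOR.MINOR.PATCH version string.

    capability-sync -> minor bump (patch resets to 0)
    token-diet      -> patch bump
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "capability-sync":
        return f"{major}.{minor + 1}.0"
    if change == "token-diet":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change!r}")
```

For a plugin that receives both a capability sync and a token diet in the same release, the minor bump alone would be the usual semver reading, since the higher-order change takes precedence.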

Implementation status

| Part | Scope | Status |
|---|---|---|
| 1 | Capability sync (6 plugins) | pending |
| 2 | Token diet (4 plugins) | pending |
| 3 | Model tier rightsizing (1 agent) | pending |
| 4 | Neural training hook (7 agents) | pending |
| 5 | Worker dispatch (variable) | pending; lands after Part 4 |

Acceptance criteria

The pass is done when:

  • Each affected plugin (per Part 1) has a paragraph or bullet list referencing the relevant ADR-094/095/096/097 capability.
  • All 4 outlier agent prompts are ≤ 60 lines.
  • ruflo-security-audit/security-auditor.md is on model: sonnet.
  • All 43 plugin agents include a hooks post-task --train-neural true invocation.
  • At least 8 work-producing agents include a hooks worker dispatch invocation tied to the right background worker.
  • No regression in the plugin marketplace install path (/plugin install ruflo-X@ruflo still resolves).
  • Spot-check: ruflo doctor -c agentic-flow and the broader doctor output stay green.

Trade-offs

| Decision | Alternative | Why this |
|---|---|---|
| Reference tables → skills/REFERENCE.md | Keep them in the agent prompt | Skills load on-demand, REFERENCE.md loads on-explicit-read; either avoids paying tokens every spawn. |
| Sonnet for security-audit | Stay on opus | If post-rollout QA shows degradation, revert. The 5× cost difference is worth a trial period. |
| Standard post-task tail across all agents | Hand-tune each | Standard tail is auditable; users can grep the plugin tree to confirm coverage. |
| Workers via hooks-dispatch | Direct in-process trigger | Hooks are the existing surface; in-process dispatch would require a new API. |
| Per-plugin minor bump for capability sync | Bulk 0.2.0 across all | Selective bumps signal which plugins materially changed; bulk would obscure that. |

Risks

  1. Skills/REFERENCE refactor breaks agent flow — moving reference tables out of the prompt may break agents that implicitly relied on them being in-context. Mitigation: each token-diet PR runs the existing skill tests and a smoke spawn before merge.
  2. Sonnet drop hurts security-audit quality — possible. Roll out behind a doc note, monitor for one release cycle, revert if needed.
  3. Worker dispatch firehose — if every agent fires a worker, the daemon could fall behind. Mitigation: workers already have priority queues; if backpressure shows, cap dispatch frequency in the agent prompt to 1/N tasks.
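One way to read the "1/N tasks" cap in the third mitigation is a simple modulo rule on the count of completed tasks. The ADR places the cap in the agent prompt rather than in code, so the sketch below only illustrates the intended behaviour; `should_dispatch` is a hypothetical name.

```python
def should_dispatch(completed_tasks: int, every_n: int) -> bool:
    """Return True on every N-th completed task (1-based count)."""
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    if completed_tasks < 1:
        return False
    return completed_tasks % every_n == 0


if __name__ == "__main__":
    # With N = 5, only tasks 5 and 10 out of the first ten dispatch a worker.
    print([n for n in range(1, 11) if should_dispatch(n, 5)])
```

With N = 1 the rule reduces to dispatching on every task, which matches the uncapped behaviour described in Part 5.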