v3/docs/adr/ADR-145-plugin-supply-chain-integrity-memory-governance.md
Status: Proposed Date: 2026-06-02 Issue: ruvnet/ruflo#2254 Related: ADR-004 (Plugin Architecture), ADR-006 (Unified Memory), ADR-015 (Unified Plugin System), ADR-131 (ToolOutputGuardrail — content layer), ADR-144 (Authorization Propagation — runtime layer)
ADR-131 closed the content boundary and ADR-144 closes the runtime action boundary. Both assume the code that's about to run is trustworthy in the first place. Two Grade A papers published April–May 2026 demonstrate that assumption is broken under current Ruflo defenses.
Semantic Compliance Hijacking — SCH (arXiv:2605.14460, May 14, 2026, Grade A): Eliminates the recognizable payload entirely. Instead of injecting malicious code, SCH wraps malicious intent in natural-language "compliance rule" text within plugin descriptions. The LLM agent reads the rule and generates the harmful code itself at runtime. Benchmark: 77.67% confidentiality breach, 67.33% RCE success, 0.00% scanner detection. Signature-based defenses are completely blind because there is no static payload to sign against.
DDIPE (arXiv:2604.03081, Apr 3, 2026, Grade A): Document-Driven Implicit Payload Execution embeds malicious logic in plugin documentation and configuration templates. Tested across 4 frameworks, 5 models, 1,070 adversarial skills covering 15 MITRE ATT&CK categories. Bypass rate 11.6–33.5%; 2.5% of adversarial skills evade both detection and alignment systems. Four CVEs were disclosed from this research.
Ruflo's current plugin install path (v3/@claude-flow/cli/src/plugins/store/discovery.ts) fetches manifests from IPFS via Pinata. It carries no signature verification and no semantic intent analysis.
MINJA (arXiv:2601.05504, Jan 2026, Grade A): Query-only memory injection achieves 95% success under idealized conditions; 28–38% on production-deployed strong models. The realistic-production number is still unacceptable for a system where compromised memory steers future agent behavior.
Plan Injection (arXiv:2506.17318, Jun 2026, Grade A): Context-chained memory attacks achieve 3× higher attack success than prompt-based injection and +17.7% exfiltration gain by creating logical bridges between unrelated memory entries. The fix is namespace isolation with explicit write grants — agents cannot poison a namespace they have no business writing to.
Mnemonic Sovereignty survey (arXiv:2604.16548, Apr 2026, Grade A): Catalogs nine governance primitives required for secure long-term agent memory. No existing published architecture satisfies all nine. The Ruflo gap: the shared collaboration namespace (and all AgentDB namespaces) accepts writes from any agent with no per-namespace authorization. ADR-131 catches what gets read out; this ADR catches who is allowed to write in.
Distinct from existing security ADRs:
| Layer | ADR | Concern |
|---|---|---|
| Install-time integrity | this ADR (Part A) | Is the plugin code trustworthy enough to load? |
| Memory write authority | this ADR (Part B) | Is this agent allowed to write to this namespace? |
| Runtime action authority | ADR-144 | Is this agent allowed to call this tool right now? |
| Tool/memory output content | ADR-131 | Does this content contain hijacking instructions? |
ADR-145 introduces two new trust boundaries, two new module surfaces, a protocol addition to the IPFS plugin registry manifest format, and an API addition to AgentDB. All four are architectural.
PluginIntegrityVerifierAdd PluginIntegrityVerifier to @claude-flow/security with two verification stages run at plugins install time.
Stage 1 — Signature verification (blocks DDIPE's static-payload variants):
discovery.ts MUST refuse to install unsigned plugins when CLAUDE_FLOW_STRICT_PLUGINS=true (default: warn-only for backwards compatibility).v3/@claude-flow/cli/src/plugins/trust/trust-anchors.json. Edits are gated on CODEOWNERS review.Stage 2 — Semantic intent scan (blocks SCH):
plugins install, the verifier pipes every natural-language field (description, README excerpt, "compliance rules", any field that ends up in agent context) through a lightweight intent classifier.CLAUDE_FLOW_PLUGIN_SCH_THRESHOLD (default 0.8).plugins install in environments without LLM credentials.Implementation targets:
v3/@claude-flow/security/src/plugins/integrity-verifier.ts (new)v3/@claude-flow/cli/src/plugins/store/discovery.ts — verification hook on installv3/@claude-flow/cli/src/plugins/trust/trust-anchors.json (new)Address governance primitives 1–3 from the Mnemonic Sovereignty taxonomy (write authorization, read authorization, update authorization). Primitives 4–9 (retention, decay, audit, etc.) deferred for a future ADR.
writeNamespaces: string[] grant.MemoryWriteDenied on write attempt.readNamespaces grant is optional in v1 and becomes required in v4 (matching the strict-mode escalation in ADR-144).Implementation targets:
v3/@claude-flow/memory/src/namespaces/authorization.ts (new)v3/@claude-flow/memory/src/agent-db.ts — grant enforcementv3/@claude-flow/cli/src/agent/spawn.ts — writeNamespaces parameter| Phase | Scope | Where |
|---|---|---|
| P1 | PluginIntegrityVerifier skeleton + Stage-1 signature path; trust-anchors file with the existing official-plugin keys | @claude-flow/security/src/plugins/, @claude-flow/cli/src/plugins/trust/ |
| P2 | Stage-2 semantic scan (pattern fallback first, classifier opt-in) | same files |
| P3 | Memory namespace ACL primitives 1–3 in AgentDB | @claude-flow/memory/src/namespaces/ |
| P4 | agent spawn --write-namespaces plumbing through every spawn callsite | @claude-flow/cli/src/agent/spawn.ts, hooks |
| P5 | Strict-mode flips to default in v4.0; legacy mode requires explicit env var to re-enable | release docs + breaking-change ADR |
CLAUDE_FLOW_STRICT_PLUGINS=false). Existing unsigned plugins continue to install with a warning.writeNamespaces retain legacy full-access until CLAUDE_FLOW_STRICT_MEMORY=true is set.audit-env-var-precedence.mjs with rationale.Pattern-matching SCH at content boundary (ADR-131 extension). SCH attacks succeed against content screening because the malicious content is the description — there's no instruction-shaped hijack to match. Catching it requires semantic intent classification at install, before the description ever enters agent context.
Per-plugin sandboxing instead of signing. Process-level sandboxing buys defense-in-depth but doesn't solve SCH: the malicious behavior is generated by the host model, not by sandboxed plugin code. Signing addresses the trust question; sandboxing addresses the blast-radius question — both belong on the roadmap, but signing closes the more urgent gap.
Skip Part B; rely on ADR-131 for memory. ADR-131 catches read-side injection. Plan Injection (arXiv:2506.17318) shows that allowing arbitrary writes lets attackers stage payloads that look innocuous individually but compose into a hijack across multiple reads — content screening cannot catch that compositional pattern. Write authorization is the missing piece.
Positive:
Negative / risks:
plugins install — acceptable at install time, would be unacceptable at runtime.agent_spawn callsite that uses shared namespaces. Existing pipelines fail open in legacy mode until v4.Telemetry / observability:
pass, signature-missing, signature-invalid, sch-blocked) MUST be logged with plugin id, publisher fingerprint, and category.MemoryWriteDenied MUST be logged with agent id, namespace, and the granted-namespaces set at spawn time.P1 lands with:
plugins install ./unsigned-plugin warns by default, errors under CLAUDE_FLOW_STRICT_PLUGINS=true.writeNamespaces: ['a'] cannot memory_store to namespace b under strict-memory mode; legacy mode allows it with a warning log.