v3/@claude-flow/guidance/docs/adrs/ADR-G022-adversarial-model.md
Status: Accepted

Date: 2026-02-01

Author: Guidance Control Plane Team
The governance substrate assumes agents are well-intentioned but fallible. It does not model adversarial scenarios: prompt injection through tool inputs, memory poisoning through coordinated writes, privilege escalation through shard manipulation, or collusion between agents to circumvent gates. In a multi-agent system where agents process external inputs and communicate with each other, these threat vectors are real and must be addressed at the governance layer.
Introduce three security components: ThreatDetector, CollusionDetector, and MemoryQuorum.
ThreatDetector: Six threat categories with configurable detection patterns:
| Category | Detection Method | Default Pattern |
|---|---|---|
| prompt-injection | Regex + heuristic | `/ignore previous\|system prompt\|you are now\|forget instructions/i` |
| memory-poisoning | Regex + rate analysis | `/\b(admin\|root\|sudo)\b.*=.*true/i`, >10 writes/minute |
| shard-manipulation | Pattern matching | `/shard[_-]?id\|shard[_-]?override\|inject[_-]?shard/i` |
| malicious-delegation | Pattern matching | `/delegate.*all\|transfer.*authority\|impersonate/i` |
| privilege-escalation | Pattern matching | `/\b(chmod\|chown\|setuid\|capabilities)\b/i` |
| data-exfiltration | Regex + encoding | `/\b(curl\|wget\|fetch)\s+https?:\/\//i`, base64 blocks |
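The default patterns in the table can be exercised with a short sketch. Everything below except the regular expressions and the 10 writes/minute limit is an assumption made for illustration: `matchCategories` and `WriteRateTracker` are hypothetical names, not the package's API, and the base64 and heuristic checks are left out.

```typescript
// Illustrative sketch only: the type, function, and class names are assumptions.
type ThreatCategory =
  | "prompt-injection"
  | "memory-poisoning"
  | "shard-manipulation"
  | "malicious-delegation"
  | "privilege-escalation"
  | "data-exfiltration";

// Default patterns copied from the table (regex portion only).
const DEFAULT_PATTERNS: Record<ThreatCategory, RegExp> = {
  "prompt-injection": /ignore previous|system prompt|you are now|forget instructions/i,
  "memory-poisoning": /\b(admin|root|sudo)\b.*=.*true/i,
  "shard-manipulation": /shard[_-]?id|shard[_-]?override|inject[_-]?shard/i,
  "malicious-delegation": /delegate.*all|transfer.*authority|impersonate/i,
  "privilege-escalation": /\b(chmod|chown|setuid|capabilities)\b/i,
  "data-exfiltration": /\b(curl|wget|fetch)\s+https?:\/\//i,
};

// Returns every category whose default pattern matches the text.
function matchCategories(text: string): ThreatCategory[] {
  const categories = Object.keys(DEFAULT_PATTERNS) as ThreatCategory[];
  return categories.filter((category) => DEFAULT_PATTERNS[category].test(text));
}

// Sliding-window counter for the ">10 writes/minute" rate rule.
class WriteRateTracker {
  private readonly writes = new Map<string, number[]>();
  private readonly maxWritesPerWindow: number;
  private readonly windowMs: number;

  constructor(maxWritesPerWindow = 10, windowMs = 60000) {
    this.maxWritesPerWindow = maxWritesPerWindow;
    this.windowMs = windowMs;
  }

  // Records a write at nowMs and reports whether the agent now exceeds the limit.
  recordWrite(agentId: string, nowMs: number): boolean {
    const previous = this.writes.get(agentId) ?? [];
    const recent = previous.filter((t) => nowMs - t < this.windowMs);
    recent.push(nowMs);
    this.writes.set(agentId, recent);
    return recent.length > this.maxWritesPerWindow;
  }
}
```

Plain regex matching of this kind is broad: any text that merely mentions `chmod` or `shard_id` matches, so a real detector would likely combine these hits with context before flagging an agent.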
- analyzeInput(input, context): scans tool inputs for injection and exfiltration
- analyzeMemoryWrite(key, value, agentId): detects poisoning patterns and rate violations
- getThreatScore(agentId): aggregated score (0-1) with recency weighting

CollusionDetector:
detectCollusion() identifies three suspicious patterns:
CollusionReport with detected flag, suspicious patterns, agents involved, and confidence scores

MemoryQuorum:
- propose(key, value, proposerId) → proposalId
- vote(proposalId, voterId, approve) → records vote
- resolve(proposalId) → checks whether the quorum threshold is met (default 0.67, approximately a 2/3 majority)
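The propose, vote, and resolve flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the class name `MemoryQuorumSketch`, the fixed `voterCount`, the ID format, and the rule "approvals divided by total voters must reach the threshold" are all hypothetical.

```typescript
// Illustrative sketch only: class shape, ID format, and resolution rule are assumptions.
interface Proposal {
  key: string;
  value: unknown;
  proposerId: string;
  votes: Map<string, boolean>; // voterId -> approve
}

class MemoryQuorumSketch {
  private readonly proposals = new Map<string, Proposal>();
  private readonly voterCount: number;
  private readonly threshold: number;
  private nextId = 1;

  constructor(voterCount: number, threshold = 0.67) {
    this.voterCount = voterCount;
    this.threshold = threshold;
  }

  // Registers a proposed memory write and returns its ID.
  propose(key: string, value: unknown, proposerId: string): string {
    const proposalId = `proposal-${this.nextId++}`;
    this.proposals.set(proposalId, { key, value, proposerId, votes: new Map() });
    return proposalId;
  }

  // Records a vote; a repeat vote from the same voter overwrites the earlier one.
  vote(proposalId: string, voterId: string, approve: boolean): void {
    const proposal = this.proposals.get(proposalId);
    if (!proposal) throw new Error(`Unknown proposal: ${proposalId}`);
    proposal.votes.set(voterId, approve);
  }

  // True when the share of approving voters reaches the threshold.
  resolve(proposalId: string): boolean {
    const proposal = this.proposals.get(proposalId);
    if (!proposal) throw new Error(`Unknown proposal: ${proposalId}`);
    let approvals = 0;
    proposal.votes.forEach((approve) => {
      if (approve) approvals++;
    });
    return approvals / this.voterCount >= this.threshold;
  }
}

const quorum = new MemoryQuorumSketch(4);
const id = quorum.propose("config/limit", 42, "agent-a");
quorum.vote(id, "agent-a", true);
quorum.vote(id, "agent-b", true);
quorum.vote(id, "agent-c", true);
const accepted = quorum.resolve(id); // 3 of 4 approvals = 0.75, which meets 0.67
```

Under this assumed rule, 2 approvals out of 3 voters (about 0.667) falls just short of 0.67, so a deployment that wants exactly two-thirds to pass would need a threshold of 2/3 or a comparison over vote counts. How the real `resolve` treats that boundary is not specified here.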