v3/implementation/adrs/ADR-022-aidefence-integration.md
Proposed - Design Review
2026-01-12
The aidefence npm package (v2.1.1) provides a production-ready AI Manipulation Defense System (AIMDS) with capabilities that complement and enhance Claude Flow V3's security architecture:
| Component | Performance | Description |
|---|---|---|
| Detection Layer | <10ms (~8ms actual) | Pattern matching, prompt injection (50+ patterns), PII detection |
| Analysis Layer | <100ms (~80ms actual) | Behavioral analysis, Lyapunov chaos detection, LTL policy verification |
| Response Layer | <50ms | Adaptive mitigation with 25-level meta-learning (strange-loop) |
| API Throughput | >12,000 req/s | Production-grade performance |
| aidefence Feature | Claude Flow V3 Equivalent | Synergy |
|---|---|---|
| AgentDB integration | @claude-flow/memory with AgentDB | Direct compatibility - both use AgentDB for vector search |
| HNSW threat search | HNSW pattern search (150x faster) | Shared infrastructure - unified threat pattern index |
| Prompt injection detection | Security domain service | Enhancement - 50+ patterns vs current regex-based |
| Behavioral analysis | SecurityDomainService.detectThreats() | Enhancement - temporal/chaos analysis |
| Meta-learning (strange-loop) | ReasoningBank pattern learning | Integration - shared learning substrate |
| Express REST API | MCP server HTTP transport | Bridge - unified security API |
| Prometheus metrics | CLI performance metrics | Observability - unified dashboards |
The current @claude-flow/security module addresses CVE-2, CVE-3, HIGH-1, HIGH-2 but lacks:
Integrate aidefence as a security enhancement layer within Claude Flow V3 using a bounded context approach with clear domain boundaries.
┌─────────────────────────────────────────────────────────────────────────────┐
│ Claude Flow V3 Security Domain │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────┐ ┌────────────────────────────────────┐ │
│ │ @claude-flow/security │ │ @claude-flow/aidefence │ │
│ │ (Core Security Context) │ │ (AI Defense Context) │ │
│ ├────────────────────────────┤ ├────────────────────────────────────┤ │
│ │ • CVE remediation │◄──►│ • Prompt injection detection │ │
│ │ • Password hashing │ │ • Behavioral analysis │ │
│ │ • Safe execution │ │ • Adaptive response │ │
│ │ • Path validation │ │ • Meta-learning (strange-loop) │ │
│ │ • Token generation │ │ • PII detection │ │
│ │ • Input validation │ │ • Policy verification (LTL) │ │
│ └────────────────────────────┘ └────────────────────────────────────┘ │
│ │ │ │
│ └──────────────┬─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ Shared Security Infrastructure Layer │ │
│ ├──────────────────────────────────────────────────────────────────────┤ │
│ │ • AgentDB Vector Store (HNSW-indexed threat patterns) │ │
│ │ • ReasoningBank (shared learning patterns) │ │
│ │ • MCP Security Endpoints │ │
│ │ • Prometheus Metrics │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
@claude-flow/security)Responsibility: Foundational security primitives and CVE remediation
// Domain entities remain unchanged
interface CoreSecurityContext {
passwordHasher: PasswordHasher; // CVE-2 fix
credentialGenerator: CredentialGenerator; // CVE-3 fix
safeExecutor: SafeExecutor; // HIGH-1 fix
pathValidator: PathValidator; // HIGH-2 fix
inputValidator: InputValidator;
tokenGenerator: TokenGenerator;
}
@claude-flow/aidefence) - NEWResponsibility: AI-specific adversarial defense
// New domain entities from aidefence
interface AIDefenseContext {
// Detection subdomain
detection: {
promptInjectionDetector: PromptInjectionDetector;
piiDetector: PIIDetector;
patternMatcher: AhoCorasickMatcher;
};
// Analysis subdomain
analysis: {
behavioralAnalyzer: BehavioralAnalyzer;
chaosDetector: LyapunovChaosDetector;
policyVerifier: LTLPolicyVerifier;
anomalyDetector: StatisticalAnomalyDetector;
};
// Response subdomain
response: {
mitigationEngine: AdaptiveMitigationEngine;
metaLearner: StrangeLoopMetaLearner;
rollbackManager: RollbackManager;
};
}
Translate between aidefence and claude-flow domains:
// v3/@claude-flow/aidefence/src/infrastructure/aidefence-adapter.ts
import { DefenseResult as AIDefenseResult } from 'aidefence';
import { ThreatDetectionResult } from '@claude-flow/security';
export class AIDefenceAdapter {
private aidefence: AIMDSClient;
constructor(config: AIDefenceConfig) {
this.aidefence = new AIMDSClient(config);
}
/**
* Translate aidefence detection result to claude-flow threat format
*/
async detectThreats(input: string): Promise<ThreatDetectionResult> {
const result: AIDefenseResult = await this.aidefence.defend({
action: input,
source: 'claude-flow-agent'
});
return this.translateToThreatResult(result);
}
/**
* Batch analysis for swarm coordination
*/
async analyzeAgentBehavior(
agentId: string,
actions: string[]
): Promise<BehavioralAnalysisResult> {
const embeddings = await this.generateActionEmbeddings(actions);
return this.aidefence.analyzeBehavior({
entityId: agentId,
actionEmbeddings: embeddings,
temporalWindow: '1h'
});
}
/**
* Store threat pattern in shared AgentDB
*/
async learnThreatPattern(
pattern: ThreatPattern,
effectiveness: number
): Promise<void> {
// Store in shared AgentDB namespace
await this.aidefence.storePattern({
...pattern,
namespace: 'security_threats',
reward: effectiveness
});
}
private translateToThreatResult(
result: AIDefenseResult
): ThreatDetectionResult {
return {
safe: result.status === 'safe',
threats: result.detections.map(d => ({
type: this.mapThreatType(d.type),
severity: this.mapSeverity(d.confidence),
description: d.description,
location: d.location
}))
};
}
private mapThreatType(aidefenceType: string): string {
const mapping: Record<string, string> = {
'prompt_injection': 'prompt-injection',
'jailbreak': 'jailbreak-attempt',
'pii_exposure': 'credential-exposure',
'adversarial': 'adversarial-input'
};
return mapping[aidefenceType] ?? aidefenceType;
}
private mapSeverity(confidence: number): 'low' | 'medium' | 'high' | 'critical' {
if (confidence >= 0.9) return 'critical';
if (confidence >= 0.7) return 'high';
if (confidence >= 0.5) return 'medium';
return 'low';
}
}
// v3/@claude-flow/mcp/src/tools/aidefence-tools.ts
export const aidefenceTools: ToolDefinition[] = [
{
name: 'aidefence_scan',
description: 'Scan input for AI manipulation attempts (prompt injection, jailbreak, PII)',
inputSchema: {
type: 'object',
properties: {
input: { type: 'string', description: 'Input to scan' },
mode: { enum: ['quick', 'thorough', 'paranoid'], default: 'thorough' }
},
required: ['input']
},
handler: async (params, context) => {
const adapter = context.get<AIDefenceAdapter>('aidefence');
return adapter.detectThreats(params.input);
}
},
{
name: 'aidefence_analyze_behavior',
description: 'Analyze agent behavior patterns for anomalies',
inputSchema: {
type: 'object',
properties: {
agentId: { type: 'string' },
timeWindow: { type: 'string', default: '1h' }
},
required: ['agentId']
},
handler: async (params, context) => {
const adapter = context.get<AIDefenceAdapter>('aidefence');
const actions = await context.get<Memory>('memory')
.searchByAgent(params.agentId, params.timeWindow);
return adapter.analyzeAgentBehavior(params.agentId, actions);
}
},
{
name: 'aidefence_verify_policy',
description: 'Verify agent behavior against LTL security policies',
inputSchema: {
type: 'object',
properties: {
agentId: { type: 'string' },
policy: { type: 'string', description: 'LTL policy formula' }
},
required: ['agentId', 'policy']
},
handler: async (params, context) => {
const adapter = context.get<AIDefenceAdapter>('aidefence');
return adapter.verifyPolicy(params.agentId, params.policy);
}
}
];
// v3/@claude-flow/cli/src/commands/security.ts (extension)
// Add aidefence subcommands to existing security command
securityCommand
.command('defend')
.description('Run AI manipulation defense scan')
.option('-i, --input <text>', 'Input text to scan')
.option('-f, --file <path>', 'File to scan')
.option('-m, --mode <mode>', 'Scan mode: quick|thorough|paranoid', 'thorough')
.option('--json', 'Output as JSON')
.action(async (options) => {
const adapter = await getAIDefenceAdapter();
const input = options.file
? await readFile(options.file, 'utf-8')
: options.input;
const result = await adapter.detectThreats(input);
if (options.json) {
console.log(JSON.stringify(result, null, 2));
} else {
printDefenseResult(result);
}
});
securityCommand
.command('behavior')
.description('Analyze agent behavioral patterns')
.requiredOption('-a, --agent <id>', 'Agent ID to analyze')
.option('-w, --window <duration>', 'Time window', '1h')
.action(async (options) => {
const adapter = await getAIDefenceAdapter();
const result = await adapter.analyzeAgentBehavior(
options.agent,
options.window
);
printBehaviorAnalysis(result);
});
// v3/@claude-flow/cli/src/hooks/aidefence-hooks.ts
export const aidefenceHooks: HookDefinition[] = [
{
name: 'pre-agent-input',
description: 'Scan agent inputs for manipulation attempts',
handler: async (context) => {
const { input, agentId } = context;
const adapter = getAIDefenceAdapter();
const result = await adapter.detectThreats(input);
if (!result.safe) {
const critical = result.threats.filter(t => t.severity === 'critical');
if (critical.length > 0) {
throw new SecurityError(
`Blocked: ${critical.length} critical threats detected`,
{ threats: critical }
);
}
// Log non-critical threats
await logSecurityEvent('threats_detected', {
agentId,
threats: result.threats
});
}
return { proceed: result.safe || result.threats.every(t => t.severity !== 'critical') };
}
},
{
name: 'post-agent-action',
description: 'Learn from agent actions for behavioral modeling',
handler: async (context) => {
const { agentId, action, result, success } = context;
const adapter = getAIDefenceAdapter();
// Feed action to meta-learner
await adapter.recordAction({
agentId,
action,
result,
success,
timestamp: Date.now()
});
// Periodically check for behavioral anomalies
if (Math.random() < 0.1) { // 10% sampling
const analysis = await adapter.analyzeAgentBehavior(agentId, '10m');
if (analysis.anomalyScore > 0.8) {
await notifySecurityTeam('behavioral_anomaly', { agentId, analysis });
}
}
}
}
];
# v3/@claude-flow/cli/.claude/skills/aidefence.yaml
name: aidefence
version: 1.0.0
description: AI Manipulation Defense System integration for real-time threat detection
author: rUv
capabilities:
- prompt_injection_detection
- behavioral_analysis
- pii_detection
- policy_verification
- adaptive_mitigation
commands:
scan:
description: Scan input for AI manipulation attempts
usage: /aidefence scan <input>
options:
- name: mode
type: choice
choices: [quick, thorough, paranoid]
default: thorough
analyze:
description: Analyze agent behavior for anomalies
usage: /aidefence analyze <agent-id>
options:
- name: window
type: string
default: "1h"
policy:
description: Verify agent against security policy
usage: /aidefence policy <agent-id> <ltl-formula>
hooks:
pre-agent-input:
enabled: true
config:
block_critical: true
log_all: true
post-agent-action:
enabled: true
config:
sampling_rate: 0.1
anomaly_threshold: 0.8
integration:
agentdb:
namespace: security_threats
hnsw_enabled: true
reasoningbank:
store_patterns: true
learn_mitigations: true
# v3/@claude-flow/cli/.claude/agents/v3/security-architect.yaml (enhancement)
# Add to existing security-architect capabilities
capabilities:
# ... existing capabilities ...
# NEW: aidefence integration
- aidefence_threat_detection # Real-time prompt injection detection
- aidefence_behavioral_analysis # Temporal anomaly detection
- aidefence_policy_verification # LTL security policy verification
- aidefence_meta_learning # Adaptive mitigation learning
# Add aidefence-specific hooks
hooks:
pre: |
# ... existing pre-hook ...
# NEW: Check for similar attack patterns via aidefence
ATTACK_PATTERNS=$(npx claude-flow@v3alpha security defend --input "$TASK" --mode thorough --json)
if echo "$ATTACK_PATTERNS" | jq -e '.threats | length > 0' > /dev/null; then
echo "⚠️ Potential manipulation detected in task request"
echo "$ATTACK_PATTERNS" | jq -r '.threats[] | " - \(.type): \(.description)"'
fi
post: |
# ... existing post-hook ...
# NEW: Feed security assessment to aidefence meta-learner
npx claude-flow@v3alpha security behavior --agent "security-architect-$(date +%s)" --record-action "$TASK"
// v3/@claude-flow/memory/src/config/security-namespaces.ts
export const securityNamespaces: NamespaceConfig[] = [
{
name: 'security_threats',
description: 'Shared threat pattern storage (aidefence + claude-flow)',
vectorDimension: 384,
hnswConfig: {
m: 16,
efConstruction: 200,
efSearch: 100
},
schema: {
type: { type: 'string', index: true },
severity: { type: 'string', index: true },
pattern: { type: 'string' },
mitigation: { type: 'string' },
effectiveness: { type: 'number' },
source: { type: 'string', enum: ['aidefence', 'claude-flow', 'manual'] }
}
},
{
name: 'security_behaviors',
description: 'Agent behavioral patterns for anomaly detection',
vectorDimension: 384,
hnswConfig: {
m: 12,
efConstruction: 150,
efSearch: 50
},
schema: {
agentId: { type: 'string', index: true },
actionType: { type: 'string', index: true },
timestamp: { type: 'number', index: true },
lyapunovExponent: { type: 'number' },
attractorType: { type: 'string' }
}
},
{
name: 'security_mitigations',
description: 'Learned mitigation strategies from meta-learning',
vectorDimension: 384,
schema: {
threatType: { type: 'string', index: true },
strategy: { type: 'string' },
effectiveness: { type: 'number' },
rollbackAvailable: { type: 'boolean' },
recursionDepth: { type: 'number' } // strange-loop depth
}
}
];
// v3/@claude-flow/aidefence/src/infrastructure/metrics.ts
import { Registry, Counter, Histogram, Gauge } from 'prom-client';
export function registerAIDefenceMetrics(registry: Registry) {
// Threat detection metrics
new Counter({
name: 'aidefence_threats_detected_total',
help: 'Total threats detected by type',
labelNames: ['type', 'severity'],
registers: [registry]
});
new Histogram({
name: 'aidefence_detection_latency_ms',
help: 'Threat detection latency in milliseconds',
buckets: [1, 5, 10, 25, 50, 100],
registers: [registry]
});
new Histogram({
name: 'aidefence_analysis_latency_ms',
help: 'Behavioral analysis latency in milliseconds',
buckets: [10, 25, 50, 100, 250, 500],
registers: [registry]
});
// Behavioral analysis metrics
new Gauge({
name: 'aidefence_anomaly_score',
help: 'Current anomaly score by agent',
labelNames: ['agentId'],
registers: [registry]
});
// Meta-learning metrics
new Counter({
name: 'aidefence_mitigations_applied_total',
help: 'Total mitigations applied by strategy',
labelNames: ['strategy', 'success'],
registers: [registry]
});
new Gauge({
name: 'aidefence_meta_learning_depth',
help: 'Current strange-loop recursion depth',
registers: [registry]
});
}
v3/@claude-flow/aidefence/
├── package.json
├── src/
│ ├── index.ts # Public API exports
│ ├── domain/
│ │ ├── entities/
│ │ │ ├── threat.ts # Threat domain entity
│ │ │ ├── behavior-pattern.ts # Behavioral pattern entity
│ │ │ └── mitigation.ts # Mitigation strategy entity
│ │ ├── services/
│ │ │ ├── detection-service.ts
│ │ │ ├── analysis-service.ts
│ │ │ └── mitigation-service.ts
│ │ └── events/
│ │ ├── threat-detected.ts
│ │ └── anomaly-detected.ts
│ ├── application/
│ │ ├── commands/
│ │ │ ├── scan-input.ts
│ │ │ └── analyze-behavior.ts
│ │ └── queries/
│ │ ├── get-threat-patterns.ts
│ │ └── get-behavior-analysis.ts
│ └── infrastructure/
│ ├── aidefence-adapter.ts # Anti-corruption layer
│ ├── metrics.ts # Prometheus integration
│ └── agentdb-repository.ts # Shared storage
├── __tests__/
│ ├── unit/
│ ├── integration/
│ └── acceptance/
└── README.md
{
"name": "@claude-flow/aidefence",
"version": "3.0.0-alpha.1",
"dependencies": {
"aidefence": "^2.1.1",
"@claude-flow/security": "workspace:*",
"@claude-flow/memory": "workspace:*",
"@claude-flow/core": "workspace:*",
"agentdb": "^2.0.0-alpha.3"
},
"peerDependencies": {
"prom-client": "^15.1.0"
}
}
| Metric | Requirement | aidefence Baseline |
|---|---|---|
| Detection latency | <15ms p99 | ~8ms actual |
| Analysis latency | <150ms p99 | ~80ms actual |
| API throughput | >5,000 req/s | >12,000 req/s |
| Memory overhead | <50MB | ~30MB |
| Requirement | Validation Method |
|---|---|
| Prompt injection detection | Test suite with 100+ known injection patterns |
| No false negatives on critical threats | Adversarial testing with red team samples |
| PII detection accuracy >95% | Synthetic PII test dataset |
| Behavioral anomaly detection | Simulated attack scenarios |
| Requirement | Validation Method |
|---|---|
| AgentDB namespace sharing works | Integration tests with shared data |
| MCP tools registered correctly | MCP test client validation |
| CLI commands function | E2E CLI tests |
| Hooks fire correctly | Hook integration tests |
| Metrics exposed | Prometheus scrape test |
@claude-flow/aidefence packagesecurity defend commandsecurity behavior command