ADR-036: Test Intelligence Plugin

Status: Proposed Date: 2026-01-24 Category: Advanced Development Tool Author: Plugin Architecture Team Version: 1.0.0 Deciders: Plugin Architecture Team, QA Engineering Team Supersedes: None

Context

Testing is critical for software quality, but teams struggle with test suite optimization, flaky test identification, and efficient test selection for CI/CD pipelines. Traditional approaches run all tests on every change, wasting resources and slowing down development. AI-powered test intelligence can dramatically improve test efficiency while maintaining confidence in code quality.

Decision

Create a Test Intelligence Plugin that leverages RuVector WASM packages for predictive test selection, flaky test detection, test gap analysis, and automated test generation suggestions.

Plugin Name

@claude-flow/plugin-test-intelligence

Description

A comprehensive test intelligence plugin combining reinforcement learning for optimal test selection with graph neural networks for code-to-test mapping. The plugin enables predictive test selection (run only tests likely to fail), flaky test detection, mutation testing optimization, and test coverage gap identification while integrating seamlessly with popular testing frameworks.

Key WASM Packages

Package	Purpose
`micro-hnsw-wasm`	Fast code-to-test similarity matching
`ruvector-learning-wasm`	RL-based test selection and prioritization
`ruvector-gnn-wasm`	Code-test dependency graphs for impact analysis
`ruvector-sparse-inference-wasm`	Efficient flaky test pattern detection
`sona`	Continuous learning from test execution history

MCP Tools

1. `test/select-predictive`

Select tests most likely to fail based on code changes.

typescript

{
  name: 'test/select-predictive',
  description: 'Predictively select tests based on code changes using RL',
  inputSchema: {
    type: 'object',
    properties: {
      changes: {
        type: 'object',
        properties: {
          files: { type: 'array', items: { type: 'string' } },
          gitDiff: { type: 'string' },
          gitRef: { type: 'string' }
        }
      },
      strategy: {
        type: 'string',
        enum: ['fast_feedback', 'high_coverage', 'risk_based', 'balanced'],
        default: 'balanced'
      },
      budget: {
        type: 'object',
        properties: {
          maxTests: { type: 'number' },
          maxDuration: { type: 'number', description: 'Max seconds' },
          confidence: { type: 'number', default: 0.95 }
        }
      }
    },
    required: ['changes']
  }
}

2. `test/flaky-detect`

Identify and analyze flaky tests.

typescript

{
  name: 'test/flaky-detect',
  description: 'Detect flaky tests using pattern analysis',
  inputSchema: {
    type: 'object',
    properties: {
      scope: {
        type: 'object',
        properties: {
          testSuite: { type: 'string' },
          historyDepth: { type: 'number', default: 100, description: 'Runs to analyze' }
        }
      },
      analysis: {
        type: 'array',
        items: {
          type: 'string',
          enum: ['intermittent_failures', 'timing_sensitive', 'order_dependent', 'resource_contention', 'environment_sensitive']
        }
      },
      threshold: { type: 'number', default: 0.1, description: 'Flakiness threshold' }
    }
  }
}

3. `test/coverage-gaps`

Identify test coverage gaps.

typescript

{
  name: 'test/coverage-gaps',
  description: 'Identify test coverage gaps using code-test graph analysis',
  inputSchema: {
    type: 'object',
    properties: {
      targetPaths: { type: 'array', items: { type: 'string' } },
      coverageType: {
        type: 'string',
        enum: ['line', 'branch', 'function', 'semantic'],
        default: 'semantic'
      },
      prioritization: {
        type: 'string',
        enum: ['risk', 'complexity', 'churn', 'recency'],
        default: 'risk'
      },
      minCoverage: { type: 'number', default: 80 }
    }
  }
}

4. `test/mutation-optimize`

Optimize mutation testing for efficiency.

typescript

{
  name: 'test/mutation-optimize',
  description: 'Optimize mutation testing using selective mutation',
  inputSchema: {
    type: 'object',
    properties: {
      targetPath: { type: 'string' },
      budget: { type: 'number', description: 'Max mutations to run' },
      strategy: {
        type: 'string',
        enum: ['random', 'coverage_guided', 'ml_guided', 'historical'],
        default: 'ml_guided'
      },
      mutationTypes: {
        type: 'array',
        items: {
          type: 'string',
          enum: ['arithmetic', 'logical', 'boundary', 'null_check', 'return_value']
        }
      }
    },
    required: ['targetPath']
  }
}

5. `test/generate-suggest`

Suggest test cases for uncovered code.

typescript

{
  name: 'test/generate-suggest',
  description: 'Suggest test cases for uncovered code paths',
  inputSchema: {
    type: 'object',
    properties: {
      targetFunction: { type: 'string', description: 'Function to generate tests for' },
      testStyle: {
        type: 'string',
        enum: ['unit', 'integration', 'property_based', 'snapshot'],
        default: 'unit'
      },
      framework: {
        type: 'string',
        enum: ['jest', 'vitest', 'pytest', 'junit', 'mocha'],
        default: 'vitest'
      },
      edgeCases: { type: 'boolean', default: true },
      mockStrategy: { type: 'string', enum: ['minimal', 'full', 'none'] }
    },
    required: ['targetFunction']
  }
}

Use Cases

CI Optimization: Reduce CI time by 60-80% with predictive test selection
Flaky Test Management: Identify and quarantine unreliable tests
Coverage Improvement: Targeted test generation for uncovered code
Mutation Testing: Efficient mutation testing within time budgets
Test Suite Health: Monitor and improve overall test suite quality

Architecture

+------------------+     +----------------------+     +------------------+
| Test Execution   |---->|  Test Intelligence   |---->| Test Selection   |
| (Jest/Vitest/etc)|     |  (Learning Engine)   |     | (Optimized)      |
+------------------+     +----------------------+     +------------------+
                                   |
                         +---------+---------+
                         |         |         |
                    +----+---+ +---+----+ +--+-----+
                    |   RL   | |  GNN   | | Sparse |
                    |Select  | |Mapping | |Flaky   |
                    +--------+ +--------+ +--------+
                                   |
                              +----+----+
                              |  SONA   |
                              | Learn   |
                              +---------+

Framework Support

Framework	Language	Support Level
Jest	JavaScript/TypeScript	Full
Vitest	JavaScript/TypeScript	Full
pytest	Python	Full
JUnit	Java	Partial
Mocha	JavaScript	Partial
RSpec	Ruby	Basic

Performance Targets

Metric	Target	Baseline (Traditional)	Improvement
Test selection	<1s for 10K tests	~10min (full suite)	600x
Flaky detection	<5s for 1000 test runs	~1hr (manual analysis)	720x
Coverage gap analysis	<10s for 100K LOC	~30min (coverage tools)	180x
Mutation optimization	80% mutation score in 20% time	80% in 100% time	5x
CI time reduction	60-80%	N/A (run all tests)	Novel

Security Considerations

Input Validation (CRITICAL)

All MCP tool inputs MUST be validated using Zod schemas:

typescript

// test/select-predictive input validation
const SelectPredictiveSchema = z.object({
  changes: z.object({
    files: z.array(z.string().max(500)).max(1000).optional(),
    gitDiff: z.string().max(1_000_000).optional(), // 1MB max diff
    gitRef: z.string().max(100).optional()
  }),
  strategy: z.enum(['fast_feedback', 'high_coverage', 'risk_based', 'balanced']).default('balanced'),
  budget: z.object({
    maxTests: z.number().int().min(1).max(100000).optional(),
    maxDuration: z.number().min(1).max(86400).optional(), // Max 24 hours
    confidence: z.number().min(0.5).max(1.0).default(0.95)
  }).optional()
});

// test/flaky-detect input validation
const FlakyDetectSchema = z.object({
  scope: z.object({
    testSuite: z.string().max(500).optional(),
    historyDepth: z.number().int().min(10).max(10000).default(100)
  }).optional(),
  analysis: z.array(z.enum([
    'intermittent_failures', 'timing_sensitive', 'order_dependent',
    'resource_contention', 'environment_sensitive'
  ])).optional(),
  threshold: z.number().min(0.01).max(0.5).default(0.1)
});

// test/generate-suggest input validation
const GenerateSuggestSchema = z.object({
  targetFunction: z.string().max(500),
  testStyle: z.enum(['unit', 'integration', 'property_based', 'snapshot']).default('unit'),
  framework: z.enum(['jest', 'vitest', 'pytest', 'junit', 'mocha']).default('vitest'),
  edgeCases: z.boolean().default(true),
  mockStrategy: z.enum(['minimal', 'full', 'none']).optional()
});

WASM Security Constraints

Constraint	Value	Rationale
Memory Limit	512MB max	Sufficient for test analysis
CPU Time Limit	60 seconds per operation	Prevent runaway analysis
No Test Execution	Analysis only, no actual test runs	Prevent arbitrary code execution
No Network Access	Enforced by WASM sandbox	Prevent data exfiltration
Sandboxed History	Test history isolated per project	Prevent cross-project leakage

Test Execution Safety (CRITICAL)

typescript

// CRITICAL: Plugin MUST NOT execute tests directly
// Test execution happens through the user's test framework, not WASM

// BAD - dangerous, could execute arbitrary code
await wasmInstance.runTests(testFiles);
exec(`npm test ${userSelectedTests}`);

// GOOD - return test selection only, user runs tests
const selectedTests = await wasmInstance.selectTests(changes);
return { testsToRun: selectedTests }; // User executes via their CI/CD

Identified Security Risks

Risk ID	Severity	Description	Mitigation
TEST-SEC-001	HIGH	Arbitrary code execution via test generation	Never auto-execute generated tests
TEST-SEC-002	HIGH	Command injection in test selection	No shell execution, list outputs only
TEST-SEC-003	MEDIUM	Test history data leakage	Per-project isolation, access controls
TEST-SEC-004	MEDIUM	DoS via massive test suites	Input size limits, pagination
TEST-SEC-005	LOW	Timing attacks on test selection	Constant-time selection algorithms

Rate Limiting

typescript

const TestRateLimits = {
  'test/select-predictive': { requestsPerMinute: 60, maxConcurrent: 5 },
  'test/flaky-detect': { requestsPerMinute: 10, maxConcurrent: 2 },
  'test/coverage-gaps': { requestsPerMinute: 30, maxConcurrent: 3 },
  'test/mutation-optimize': { requestsPerMinute: 5, maxConcurrent: 1 },  // Expensive
  'test/generate-suggest': { requestsPerMinute: 30, maxConcurrent: 3 }
};

Risk Assessment

Risk	Likelihood	Impact	Mitigation
Missing failing tests	Low	High	Confidence thresholds, periodic full runs
Learning period inaccuracy	High	Medium	Fallback to full runs, gradual adoption
Framework incompatibility	Medium	Medium	Tier-based support, custom adapter API
Test history data loss	Low	Medium	Persistent storage, backup strategies

Learning Pipeline

Execution History --> SONA Learning --> RL Policy
      |                    |                |
      v                    v                v
[fail rates]       [pattern bank]    [selection model]
[timings]          [failure modes]   [prioritization]
[coverage]         [correlations]    [budget allocation]

Implementation Notes

Phase 1: Core Selection

Test execution history ingestion
Code-test mapping via GNN
Basic predictive selection

Phase 2: Advanced Analysis

Flaky test detection patterns
Coverage gap identification
RL-based optimization

Phase 3: Generation

Test case suggestions
Mutation testing optimization
Continuous learning from feedback

Dependencies

json

{
  "dependencies": {
    "micro-hnsw-wasm": "^0.2.0",
    "ruvector-learning-wasm": "^0.1.0",
    "ruvector-gnn-wasm": "^0.1.0",
    "ruvector-sparse-inference-wasm": "^0.1.0",
    "sona": "^0.1.0",
    "istanbul-lib-coverage": "^3.2.0"
  }
}

Consequences

Positive

60-80% CI time reduction with maintained confidence
Data-driven flaky test management
Improved test suite ROI

Negative

Requires test execution history for training
Initial learning period before optimal performance
May miss edge cases in early stages

Neutral

Can fallback to full test runs when uncertain

ADR	Relationship
ADR-004: Plugin Architecture	Foundation - Defines plugin structure
ADR-017: RuVector Integration	Dependency - Provides WASM packages
ADR-035: Code Intelligence	Related - Code-test mapping
ADR-037: Performance Optimizer	Related - Test performance analysis
ADR-040: Quantum Optimizer	Related - Test selection optimization

References

Google Test Selection: https://testing.googleblog.com/2019/11/debugging-test-selection.html
Predictive Test Selection Research: https://arxiv.org/abs/2108.06123
ADR-017: RuVector Integration
ADR-004: Plugin Architecture

Last Updated: 2026-01-24

ADR-036: Test Intelligence Plugin

ADR-036: Test Intelligence Plugin

Context

Decision

Plugin Name

Description

Key WASM Packages

MCP Tools

1. test/select-predictive

2. test/flaky-detect

3. test/coverage-gaps

4. test/mutation-optimize

5. test/generate-suggest

Use Cases

Architecture

Framework Support

Performance Targets

Security Considerations

Input Validation (CRITICAL)

WASM Security Constraints

Test Execution Safety (CRITICAL)

Identified Security Risks

Rate Limiting

Risk Assessment

Learning Pipeline

Implementation Notes

Phase 1: Core Selection

Phase 2: Advanced Analysis

Phase 3: Generation

Dependencies

Consequences

Positive

Negative

Neutral

Related ADRs

References

1. `test/select-predictive`

2. `test/flaky-detect`

3. `test/coverage-gaps`

4. `test/mutation-optimize`

5. `test/generate-suggest`