Back to Ruflo

ADR-036: Test Intelligence Plugin

v3/implementation/adrs/ADR-036-test-intelligence-plugin.md

3.6.3013.6 KB
Original Source

ADR-036: Test Intelligence Plugin

Status: Proposed Date: 2026-01-24 Category: Advanced Development Tool Author: Plugin Architecture Team Version: 1.0.0 Deciders: Plugin Architecture Team, QA Engineering Team Supersedes: None

Context

Testing is critical for software quality, but teams struggle with test suite optimization, flaky test identification, and efficient test selection for CI/CD pipelines. Traditional approaches run all tests on every change, wasting resources and slowing down development. AI-powered test intelligence can dramatically improve test efficiency while maintaining confidence in code quality.

Decision

Create a Test Intelligence Plugin that leverages RuVector WASM packages for predictive test selection, flaky test detection, test gap analysis, and automated test generation suggestions.

Plugin Name

@claude-flow/plugin-test-intelligence

Description

A comprehensive test intelligence plugin combining reinforcement learning for optimal test selection with graph neural networks for code-to-test mapping. The plugin enables predictive test selection (run only tests likely to fail), flaky test detection, mutation testing optimization, and test coverage gap identification while integrating seamlessly with popular testing frameworks.

Key WASM Packages

PackagePurpose
micro-hnsw-wasmFast code-to-test similarity matching
ruvector-learning-wasmRL-based test selection and prioritization
ruvector-gnn-wasmCode-test dependency graphs for impact analysis
ruvector-sparse-inference-wasmEfficient flaky test pattern detection
sonaContinuous learning from test execution history

MCP Tools

1. test/select-predictive

Select tests most likely to fail based on code changes.

typescript
{
  name: 'test/select-predictive',
  description: 'Predictively select tests based on code changes using RL',
  inputSchema: {
    type: 'object',
    properties: {
      changes: {
        type: 'object',
        properties: {
          files: { type: 'array', items: { type: 'string' } },
          gitDiff: { type: 'string' },
          gitRef: { type: 'string' }
        }
      },
      strategy: {
        type: 'string',
        enum: ['fast_feedback', 'high_coverage', 'risk_based', 'balanced'],
        default: 'balanced'
      },
      budget: {
        type: 'object',
        properties: {
          maxTests: { type: 'number' },
          maxDuration: { type: 'number', description: 'Max seconds' },
          confidence: { type: 'number', default: 0.95 }
        }
      }
    },
    required: ['changes']
  }
}

2. test/flaky-detect

Identify and analyze flaky tests.

typescript
{
  name: 'test/flaky-detect',
  description: 'Detect flaky tests using pattern analysis',
  inputSchema: {
    type: 'object',
    properties: {
      scope: {
        type: 'object',
        properties: {
          testSuite: { type: 'string' },
          historyDepth: { type: 'number', default: 100, description: 'Runs to analyze' }
        }
      },
      analysis: {
        type: 'array',
        items: {
          type: 'string',
          enum: ['intermittent_failures', 'timing_sensitive', 'order_dependent', 'resource_contention', 'environment_sensitive']
        }
      },
      threshold: { type: 'number', default: 0.1, description: 'Flakiness threshold' }
    }
  }
}

3. test/coverage-gaps

Identify test coverage gaps.

typescript
{
  name: 'test/coverage-gaps',
  description: 'Identify test coverage gaps using code-test graph analysis',
  inputSchema: {
    type: 'object',
    properties: {
      targetPaths: { type: 'array', items: { type: 'string' } },
      coverageType: {
        type: 'string',
        enum: ['line', 'branch', 'function', 'semantic'],
        default: 'semantic'
      },
      prioritization: {
        type: 'string',
        enum: ['risk', 'complexity', 'churn', 'recency'],
        default: 'risk'
      },
      minCoverage: { type: 'number', default: 80 }
    }
  }
}

4. test/mutation-optimize

Optimize mutation testing for efficiency.

typescript
{
  name: 'test/mutation-optimize',
  description: 'Optimize mutation testing using selective mutation',
  inputSchema: {
    type: 'object',
    properties: {
      targetPath: { type: 'string' },
      budget: { type: 'number', description: 'Max mutations to run' },
      strategy: {
        type: 'string',
        enum: ['random', 'coverage_guided', 'ml_guided', 'historical'],
        default: 'ml_guided'
      },
      mutationTypes: {
        type: 'array',
        items: {
          type: 'string',
          enum: ['arithmetic', 'logical', 'boundary', 'null_check', 'return_value']
        }
      }
    },
    required: ['targetPath']
  }
}

5. test/generate-suggest

Suggest test cases for uncovered code.

typescript
{
  name: 'test/generate-suggest',
  description: 'Suggest test cases for uncovered code paths',
  inputSchema: {
    type: 'object',
    properties: {
      targetFunction: { type: 'string', description: 'Function to generate tests for' },
      testStyle: {
        type: 'string',
        enum: ['unit', 'integration', 'property_based', 'snapshot'],
        default: 'unit'
      },
      framework: {
        type: 'string',
        enum: ['jest', 'vitest', 'pytest', 'junit', 'mocha'],
        default: 'vitest'
      },
      edgeCases: { type: 'boolean', default: true },
      mockStrategy: { type: 'string', enum: ['minimal', 'full', 'none'] }
    },
    required: ['targetFunction']
  }
}

Use Cases

  1. CI Optimization: Reduce CI time by 60-80% with predictive test selection
  2. Flaky Test Management: Identify and quarantine unreliable tests
  3. Coverage Improvement: Targeted test generation for uncovered code
  4. Mutation Testing: Efficient mutation testing within time budgets
  5. Test Suite Health: Monitor and improve overall test suite quality

Architecture

+------------------+     +----------------------+     +------------------+
| Test Execution   |---->|  Test Intelligence   |---->| Test Selection   |
| (Jest/Vitest/etc)|     |  (Learning Engine)   |     | (Optimized)      |
+------------------+     +----------------------+     +------------------+
                                   |
                         +---------+---------+
                         |         |         |
                    +----+---+ +---+----+ +--+-----+
                    |   RL   | |  GNN   | | Sparse |
                    |Select  | |Mapping | |Flaky   |
                    +--------+ +--------+ +--------+
                                   |
                              +----+----+
                              |  SONA   |
                              | Learn   |
                              +---------+

Framework Support

FrameworkLanguageSupport Level
JestJavaScript/TypeScriptFull
VitestJavaScript/TypeScriptFull
pytestPythonFull
JUnitJavaPartial
MochaJavaScriptPartial
RSpecRubyBasic

Performance Targets

MetricTargetBaseline (Traditional)Improvement
Test selection<1s for 10K tests~10min (full suite)600x
Flaky detection<5s for 1000 test runs~1hr (manual analysis)720x
Coverage gap analysis<10s for 100K LOC~30min (coverage tools)180x
Mutation optimization80% mutation score in 20% time80% in 100% time5x
CI time reduction60-80%N/A (run all tests)Novel

Security Considerations

Input Validation (CRITICAL)

All MCP tool inputs MUST be validated using Zod schemas:

typescript
// test/select-predictive input validation
const SelectPredictiveSchema = z.object({
  changes: z.object({
    files: z.array(z.string().max(500)).max(1000).optional(),
    gitDiff: z.string().max(1_000_000).optional(), // 1MB max diff
    gitRef: z.string().max(100).optional()
  }),
  strategy: z.enum(['fast_feedback', 'high_coverage', 'risk_based', 'balanced']).default('balanced'),
  budget: z.object({
    maxTests: z.number().int().min(1).max(100000).optional(),
    maxDuration: z.number().min(1).max(86400).optional(), // Max 24 hours
    confidence: z.number().min(0.5).max(1.0).default(0.95)
  }).optional()
});

// test/flaky-detect input validation
const FlakyDetectSchema = z.object({
  scope: z.object({
    testSuite: z.string().max(500).optional(),
    historyDepth: z.number().int().min(10).max(10000).default(100)
  }).optional(),
  analysis: z.array(z.enum([
    'intermittent_failures', 'timing_sensitive', 'order_dependent',
    'resource_contention', 'environment_sensitive'
  ])).optional(),
  threshold: z.number().min(0.01).max(0.5).default(0.1)
});

// test/generate-suggest input validation
const GenerateSuggestSchema = z.object({
  targetFunction: z.string().max(500),
  testStyle: z.enum(['unit', 'integration', 'property_based', 'snapshot']).default('unit'),
  framework: z.enum(['jest', 'vitest', 'pytest', 'junit', 'mocha']).default('vitest'),
  edgeCases: z.boolean().default(true),
  mockStrategy: z.enum(['minimal', 'full', 'none']).optional()
});

WASM Security Constraints

ConstraintValueRationale
Memory Limit512MB maxSufficient for test analysis
CPU Time Limit60 seconds per operationPrevent runaway analysis
No Test ExecutionAnalysis only, no actual test runsPrevent arbitrary code execution
No Network AccessEnforced by WASM sandboxPrevent data exfiltration
Sandboxed HistoryTest history isolated per projectPrevent cross-project leakage

Test Execution Safety (CRITICAL)

typescript
// CRITICAL: Plugin MUST NOT execute tests directly
// Test execution happens through the user's test framework, not WASM

// BAD - dangerous, could execute arbitrary code
await wasmInstance.runTests(testFiles);
exec(`npm test ${userSelectedTests}`);

// GOOD - return test selection only, user runs tests
const selectedTests = await wasmInstance.selectTests(changes);
return { testsToRun: selectedTests }; // User executes via their CI/CD

Identified Security Risks

Risk IDSeverityDescriptionMitigation
TEST-SEC-001HIGHArbitrary code execution via test generationNever auto-execute generated tests
TEST-SEC-002HIGHCommand injection in test selectionNo shell execution, list outputs only
TEST-SEC-003MEDIUMTest history data leakagePer-project isolation, access controls
TEST-SEC-004MEDIUMDoS via massive test suitesInput size limits, pagination
TEST-SEC-005LOWTiming attacks on test selectionConstant-time selection algorithms

Rate Limiting

typescript
const TestRateLimits = {
  'test/select-predictive': { requestsPerMinute: 60, maxConcurrent: 5 },
  'test/flaky-detect': { requestsPerMinute: 10, maxConcurrent: 2 },
  'test/coverage-gaps': { requestsPerMinute: 30, maxConcurrent: 3 },
  'test/mutation-optimize': { requestsPerMinute: 5, maxConcurrent: 1 },  // Expensive
  'test/generate-suggest': { requestsPerMinute: 30, maxConcurrent: 3 }
};

Risk Assessment

RiskLikelihoodImpactMitigation
Missing failing testsLowHighConfidence thresholds, periodic full runs
Learning period inaccuracyHighMediumFallback to full runs, gradual adoption
Framework incompatibilityMediumMediumTier-based support, custom adapter API
Test history data lossLowMediumPersistent storage, backup strategies

Learning Pipeline

Execution History --> SONA Learning --> RL Policy
      |                    |                |
      v                    v                v
[fail rates]       [pattern bank]    [selection model]
[timings]          [failure modes]   [prioritization]
[coverage]         [correlations]    [budget allocation]

Implementation Notes

Phase 1: Core Selection

  • Test execution history ingestion
  • Code-test mapping via GNN
  • Basic predictive selection

Phase 2: Advanced Analysis

  • Flaky test detection patterns
  • Coverage gap identification
  • RL-based optimization

Phase 3: Generation

  • Test case suggestions
  • Mutation testing optimization
  • Continuous learning from feedback

Dependencies

json
{
  "dependencies": {
    "micro-hnsw-wasm": "^0.2.0",
    "ruvector-learning-wasm": "^0.1.0",
    "ruvector-gnn-wasm": "^0.1.0",
    "ruvector-sparse-inference-wasm": "^0.1.0",
    "sona": "^0.1.0",
    "istanbul-lib-coverage": "^3.2.0"
  }
}

Consequences

Positive

  • 60-80% CI time reduction with maintained confidence
  • Data-driven flaky test management
  • Improved test suite ROI

Negative

  • Requires test execution history for training
  • Initial learning period before optimal performance
  • May miss edge cases in early stages

Neutral

  • Can fallback to full test runs when uncertain
ADRRelationship
ADR-004: Plugin ArchitectureFoundation - Defines plugin structure
ADR-017: RuVector IntegrationDependency - Provides WASM packages
ADR-035: Code IntelligenceRelated - Code-test mapping
ADR-037: Performance OptimizerRelated - Test performance analysis
ADR-040: Quantum OptimizerRelated - Test selection optimization

References


Last Updated: 2026-01-24