Training Pipeline - Real Code Execution Only

Overview

The Claude Flow Training Pipeline now exclusively uses real code execution. There is no simulation mode - all training runs actual npm tests on real code files to provide genuine learning and improvement.

What Changed

Before (v1 - Simulation Mode)

Used Math.random() to simulate test results
No actual code execution
Artificial scores that didn't reflect reality
Learning from random data

Now (v2 - Real Execution Only)

Creates actual JavaScript files with real code
Runs real npm install and npm test commands
Executes actual Jest tests
Learns from genuine test results
Shows real improvements in agent performance

How It Works

1. Task Generation

The pipeline creates real code files in .claude-flow/training/real-tasks/:

javascript

// Example: Email validation function
function validateEmail(email) {
  const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return regex.test(email);
}

2. Strategy Testing

Three strategies modify the code differently:

Conservative: Adds extra validation (more reliable, slower)
Balanced: Keeps original code (good balance)
Aggressive: Reduces validation (faster, riskier)

3. Real Execution

Each strategy variant is tested using:

bash

npm install  # Install Jest and dependencies
npm test     # Run actual tests
npm run lint # Check code quality

4. Learning from Results

The system learns from actual test results:

Test pass/fail rates
Real execution times
Actual error messages
Performance metrics

Usage

Run Training

bash

# Always runs with real code - no simulation option
./claude-flow train-pipeline run

# Options
./claude-flow train-pipeline run --complexity hard --iterations 5

Check Status

bash

./claude-flow train-pipeline status

# Output shows real metrics:
📊 Training Pipeline Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🤖 Strategy Profiles:
   conservative:
     Success Rate: 40.9%    # Real test pass rate
     Average Score: 40.25   # Based on actual results
     Execution Time: 1633ms # Real npm test time
     Real Executions: 4     # Number of real runs

Validate Performance

bash

./claude-flow train-pipeline validate

# Shows current real performance:
📊 Current Performance:
   Success Rate: 43.1%      # Actual success rate
   Avg Execution Time: 1567ms # Real execution time
   Average Score: 41.05      # Based on real tests

Real Results Example

From actual training runs:

Initial State (Iteration 1)

📊 Learning Results:
   conservative: Score 12.64, Success 0.0%, Time 1839ms
   balanced: Score 12.98, Success 0.0%, Time 1756ms
   aggressive: Score 13.24, Success 0.0%, Time 1691ms

After Fixes (Iteration 2)

📊 Learning Results:
   conservative: Score 42.56, Success 50.0%, Time 1860ms
   balanced: Score 42.57, Success 50.0%, Time 1858ms
   aggressive: Score 43.33, Success 50.0%, Time 1667ms

📈 Improvements:
   Success Rate: +14.3%
   Execution Time: -10.8%
   Score: +3.0%

Task Complexity Levels

Easy

Simple functions (email validation, string manipulation)
Basic tests with clear pass/fail
Quick execution (~2 seconds)

Medium

API endpoints with Express
Integration tests
Moderate execution (~3-4 seconds)

Hard

Complex algorithms (sorting, searching)
Performance-critical code
Comprehensive test suites (~5+ seconds)

Files Created

The training pipeline creates real project structures:

.claude-flow/training/real-tasks/
└── task-[timestamp]/
    └── [taskName]/
        ├── index.js        # Real implementation
        ├── index.test.js   # Real Jest tests
        └── package.json    # Real dependencies

Learning Mechanism

Exponential Moving Average

javascript

// Learning rate: 0.4 for real execution (higher than simulation)
newReliability = oldReliability * 0.6 + newScore * 0.4

Real Metrics Tracked

Success Rate: Actual test pass percentage
Execution Time: Real npm test duration
Score: Weighted combination of success and speed
Trend: Improvement or decline over time

Benefits of Real Execution

Genuine Learning: Agents learn from actual test results
Real Performance: Metrics reflect true execution times
Accurate Predictions: Future predictions based on real data
Practical Improvements: Optimizations that actually work
No Artificial Bias: No random numbers affecting results

Migration from Simulation

If you have existing profiles from the simulation mode:

The system will continue to use them but update with real data
After a few real executions, the data will be fully based on reality
Old simulation scores will be overwritten by real scores

Troubleshooting

Tests Failing

Check that Jest is installed: npm ls jest
Verify test syntax is correct
Ensure proper regex escaping in templates

Slow Execution

Normal for first run (npm install)
Subsequent runs are faster (cached dependencies)
Use --complexity easy for quicker iterations

No Improvement

Real improvement takes multiple iterations
Some randomness in test execution is normal
Focus on trends rather than single runs

Summary

The Training Pipeline now provides real machine learning based on actual code execution. No more simulations - every score, every metric, and every improvement is based on real npm test results. This ensures that agent improvements translate directly to better real-world performance.