Back to Ruflo

Training Pipeline - Real Code Execution Only

v2/docs/development/training-pipeline-real-only.md

3.6.305.6 KB
Original Source

Training Pipeline - Real Code Execution Only

Overview

The Claude Flow Training Pipeline now exclusively uses real code execution. There is no simulation mode - all training runs actual npm tests on real code files to provide genuine learning and improvement.

What Changed

Before (v1 - Simulation Mode)

  • Used Math.random() to simulate test results
  • No actual code execution
  • Artificial scores that didn't reflect reality
  • Learning from random data

Now (v2 - Real Execution Only)

  • Creates actual JavaScript files with real code
  • Runs real npm install and npm test commands
  • Executes actual Jest tests
  • Learns from genuine test results
  • Shows real improvements in agent performance

How It Works

1. Task Generation

The pipeline creates real code files in .claude-flow/training/real-tasks/:

javascript
// Example: Email validation function
function validateEmail(email) {
  const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return regex.test(email);
}

2. Strategy Testing

Three strategies modify the code differently:

  • Conservative: Adds extra validation (more reliable, slower)
  • Balanced: Keeps original code (good balance)
  • Aggressive: Reduces validation (faster, riskier)

3. Real Execution

Each strategy variant is tested using:

bash
npm install  # Install Jest and dependencies
npm test     # Run actual tests
npm run lint # Check code quality

4. Learning from Results

The system learns from actual test results:

  • Test pass/fail rates
  • Real execution times
  • Actual error messages
  • Performance metrics

Usage

Run Training

bash
# Always runs with real code - no simulation option
./claude-flow train-pipeline run

# Options
./claude-flow train-pipeline run --complexity hard --iterations 5

Check Status

bash
./claude-flow train-pipeline status

# Output shows real metrics:
šŸ“Š Training Pipeline Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

šŸ¤– Strategy Profiles:
   conservative:
     Success Rate: 40.9%    # Real test pass rate
     Average Score: 40.25   # Based on actual results
     Execution Time: 1633ms # Real npm test time
     Real Executions: 4     # Number of real runs

Validate Performance

bash
./claude-flow train-pipeline validate

# Shows current real performance:
šŸ“Š Current Performance:
   Success Rate: 43.1%      # Actual success rate
   Avg Execution Time: 1567ms # Real execution time
   Average Score: 41.05      # Based on real tests

Real Results Example

From actual training runs:

Initial State (Iteration 1)

šŸ“Š Learning Results:
   conservative: Score 12.64, Success 0.0%, Time 1839ms
   balanced: Score 12.98, Success 0.0%, Time 1756ms
   aggressive: Score 13.24, Success 0.0%, Time 1691ms

After Fixes (Iteration 2)

šŸ“Š Learning Results:
   conservative: Score 42.56, Success 50.0%, Time 1860ms
   balanced: Score 42.57, Success 50.0%, Time 1858ms
   aggressive: Score 43.33, Success 50.0%, Time 1667ms

šŸ“ˆ Improvements:
   Success Rate: +14.3%
   Execution Time: -10.8%
   Score: +3.0%

Task Complexity Levels

Easy

  • Simple functions (email validation, string manipulation)
  • Basic tests with clear pass/fail
  • Quick execution (~2 seconds)

Medium

  • API endpoints with Express
  • Integration tests
  • Moderate execution (~3-4 seconds)

Hard

  • Complex algorithms (sorting, searching)
  • Performance-critical code
  • Comprehensive test suites (~5+ seconds)

Files Created

The training pipeline creates real project structures:

.claude-flow/training/real-tasks/
└── task-[timestamp]/
    └── [taskName]/
        ā”œā”€ā”€ index.js        # Real implementation
        ā”œā”€ā”€ index.test.js   # Real Jest tests
        └── package.json    # Real dependencies

Learning Mechanism

Exponential Moving Average

javascript
// Learning rate: 0.4 for real execution (higher than simulation)
newReliability = oldReliability * 0.6 + newScore * 0.4

Real Metrics Tracked

  • Success Rate: Actual test pass percentage
  • Execution Time: Real npm test duration
  • Score: Weighted combination of success and speed
  • Trend: Improvement or decline over time

Benefits of Real Execution

  1. Genuine Learning: Agents learn from actual test results
  2. Real Performance: Metrics reflect true execution times
  3. Accurate Predictions: Future predictions based on real data
  4. Practical Improvements: Optimizations that actually work
  5. No Artificial Bias: No random numbers affecting results

Migration from Simulation

If you have existing profiles from the simulation mode:

  1. The system will continue to use them but update with real data
  2. After a few real executions, the data will be fully based on reality
  3. Old simulation scores will be overwritten by real scores

Troubleshooting

Tests Failing

  • Check that Jest is installed: npm ls jest
  • Verify test syntax is correct
  • Ensure proper regex escaping in templates

Slow Execution

  • Normal for first run (npm install)
  • Subsequent runs are faster (cached dependencies)
  • Use --complexity easy for quicker iterations

No Improvement

  • Real improvement takes multiple iterations
  • Some randomness in test execution is normal
  • Focus on trends rather than single runs

Summary

The Training Pipeline now provides real machine learning based on actual code execution. No more simulations - every score, every metric, and every improvement is based on real npm test results. This ensures that agent improvements translate directly to better real-world performance.