# Real Benchmark Engine Quick Start
The Real Benchmark Engine executes actual claude-flow commands and captures comprehensive performance metrics, resource usage, and quality assessments. This guide helps you get started quickly.
## Prerequisites

Ensure `claude-flow` is installed and accessible:

```bash
which claude-flow
# or
claude-flow --version
```
## Installation

```bash
cd benchmark
pip install -r requirements.txt
pip install -e .
```
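To confirm the install worked, a quick import check is enough. This minimal sketch assumes only the `swarm_benchmark` package name used throughout this guide:

```python
# Verify the benchmark package is importable and show where it lives.
import swarm_benchmark

print(swarm_benchmark.__file__)
```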
## Quick Start

Run a basic benchmark with default settings:

```bash
python -m swarm_benchmark real "Create a hello world function"
```
Test a specific SPARC mode:

```bash
python -m swarm_benchmark real "Build a REST API" --sparc-mode coder
python -m swarm_benchmark real "Analyze code quality" --sparc-mode analyzer
python -m swarm_benchmark real "Design system architecture" --sparc-mode architect
```
Test different swarm strategies with coordination modes:

```bash
# Development strategy with hierarchical coordination
python -m swarm_benchmark real "Build a web app" \
  --strategy development \
  --mode hierarchical

# Research strategy with distributed coordination
python -m swarm_benchmark real "Research AI trends" \
  --strategy research \
  --mode distributed \
  --parallel

# Analysis strategy with mesh coordination
python -m swarm_benchmark real "Analyze codebase" \
  --strategy analysis \
  --mode mesh \
  --monitor
```
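The same runs can also be driven from Python, along the lines of the resource-profiling example later in this guide. In this sketch, passing `strategy` through `BenchmarkConfig` and the `StrategyType.DEVELOPMENT` member name are assumptions; check `swarm_benchmark.core.models` for the exact field and enum names:

```python
# Sketch: run one strategy/mode combination programmatically.
import asyncio

from swarm_benchmark.core.real_benchmark_engine import RealBenchmarkEngine
from swarm_benchmark.core.models import BenchmarkConfig, StrategyType

async def main():
    config = BenchmarkConfig(
        name="dev-hierarchical",
        strategy=StrategyType.DEVELOPMENT,  # assumed field and enum member
        parallel=True,
    )
    engine = RealBenchmarkEngine(config)
    result = await engine.run_benchmark("Build a web app")
    print(result["status"], result["duration"])

asyncio.run(main())
```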
Enable parallel task execution for faster benchmarking:

```bash
python -m swarm_benchmark real "Create multiple components" \
  --parallel \
  --max-agents 4 \
  --task-timeout 60
```
Enable detailed resource monitoring:

```bash
python -m swarm_benchmark real "Process large dataset" \
  --monitor \
  --output json sqlite \
  --output-dir ./benchmark-results
```
Test all SPARC modes and swarm strategies:

```bash
# WARNING: This is resource-intensive and may take a long time!
python -m swarm_benchmark real "Build a complete application" \
  --all-modes \
  --parallel \
  --timeout 120
```
## Command-Line Options

| Option | Description | Default |
|---|---|---|
| `--strategy` | Swarm strategy (auto, research, development, etc.) | auto |
| `--mode` | Coordination mode (centralized, distributed, etc.) | centralized |
| `--sparc-mode` | Specific SPARC mode to test | None |
| `--all-modes` | Test all SPARC modes and strategies | False |
| `--max-agents` | Maximum parallel agents | 5 |
| `--timeout` | Overall timeout in minutes | 60 |
| `--task-timeout` | Individual task timeout in seconds | 300 |
| `--parallel` | Enable parallel execution | False |
| `--monitor` | Enable resource monitoring | False |
| `--output` | Output formats (json, sqlite) | json |
| `--output-dir` | Output directory | ./reports |
| `--verbose` | Enable verbose output | False |
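For automation, the same flags can be composed from a script. A minimal sketch that uses only options from the table above:

```python
# Sketch: invoke the benchmark CLI from Python, composing flags
# from the options table above.
import subprocess

cmd = [
    "python", "-m", "swarm_benchmark", "real", "Build a web app",
    "--strategy", "development",
    "--mode", "hierarchical",
    "--parallel",
    "--max-agents", "4",
    "--output", "json", "sqlite",
    "--output-dir", "./benchmark-results",
]
subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```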
## Example Scripts

Compare how different SPARC modes handle the same objective:

```bash
#!/bin/bash
# compare_sparc_modes.sh
OBJECTIVE="Create a user authentication system"

for mode in coder architect reviewer tdd; do
  echo "Testing SPARC mode: $mode"
  python -m swarm_benchmark real "$OBJECTIVE" \
    --sparc-mode "$mode" \
    --output-dir ./sparc-comparison
done
```
Sweep strategies against coordination modes (the results can be aggregated with the sketch after this script):

```bash
#!/bin/bash
# analyze_strategies.sh
OBJECTIVE="Build a data processing pipeline"

for strategy in development research analysis optimization; do
  for mode in centralized distributed hierarchical; do
    echo "Testing $strategy with $mode coordination"
    python -m swarm_benchmark real "$OBJECTIVE" \
      --strategy "$strategy" \
      --mode "$mode" \
      --parallel \
      --monitor \
      --output json sqlite
  done
done
```
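To compare the runs afterwards, the JSON reports can be summarized. A minimal sketch, assuming each run writes a `*.json` report matching the schema in the Output Format section below into the default `./reports` directory:

```python
# Sketch: summarize benchmark JSON reports from the output directory.
import json
from pathlib import Path

for report_path in sorted(Path("./reports").glob("*.json")):
    with report_path.open() as f:
        report = json.load(f)
    times = [r["execution_time"] for r in report.get("results", [])]
    avg = sum(times) / len(times) if times else 0.0
    print(f"{report_path.name}: status={report['status']} "
          f"duration={report['duration']:.1f}s avg_task={avg:.1f}s")
```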
Profile resource usage programmatically:

```python
#!/usr/bin/env python3
# profile_resources.py
import asyncio

from swarm_benchmark.core.real_benchmark_engine import RealBenchmarkEngine
from swarm_benchmark.core.models import BenchmarkConfig

async def profile_task():
    config = BenchmarkConfig(
        name="resource-profile",
        monitoring=True,
        parallel=True,
        max_agents=3
    )
    engine = RealBenchmarkEngine(config)
    result = await engine.run_benchmark("Analyze system performance")

    # Extract resource metrics from the first task result
    if result['results']:
        metrics = result['results'][0]['resource_usage']
        print(f"Peak Memory: {metrics['peak_memory_mb']:.1f} MB")
        print(f"Average CPU: {metrics['average_cpu_percent']:.1f}%")

asyncio.run(profile_task())
```
## Output Format

Each run produces a report like this:

```json
{
  "benchmark_id": "uuid",
  "status": "success",
  "duration": 45.2,
  "results": [{
    "task_id": "uuid",
    "status": "success",
    "execution_time": 42.1,
    "resource_usage": {
      "cpu_percent": 35.2,
      "memory_mb": 128.5,
      "peak_memory_mb": 256.0
    },
    "quality_metrics": {
      "accuracy": 0.9,
      "completeness": 0.85,
      "overall": 0.88
    }
  }]
}
```
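Reading these fields back is straightforward. A short sketch, assuming the report was saved to a file such as `./reports/report.json` (the exact filename is an assumption):

```python
# Sketch: pull per-task quality and resource figures out of a saved report.
import json

with open("./reports/report.json") as f:  # hypothetical filename
    report = json.load(f)

for task in report["results"]:
    quality = task["quality_metrics"]
    usage = task["resource_usage"]
    print(f"task {task['task_id']}: overall quality {quality['overall']:.2f}, "
          f"peak memory {usage['peak_memory_mb']:.0f} MB")
```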
## Metrics Collected

The engine captures three categories of data for every run:

- **Performance metrics**: execution time and overall duration
- **Quality assessment**: accuracy, completeness, and an overall score
- **Resource usage**: CPU utilization and current/peak memory
## Best Practices

1. **Start small**: begin with a single objective and default settings.
2. **Scale gradually**: raise `--max-agents` and enable `--parallel` once baseline runs are stable.
3. **Monitor resources**: add `--monitor` on heavier workloads to catch CPU and memory spikes.
4. **Analyze results**: review the JSON (or SQLite) reports in your output directory after each run.
## Troubleshooting

**`claude-flow` not found**

```bash
# Check installation
which claude-flow

# Add to PATH if needed
export PATH="$PATH:/path/to/claude-flow"
```

**Tasks timing out**

```bash
# Increase timeouts
python -m swarm_benchmark real "Complex task" \
  --task-timeout 600 \
  --timeout 120
```

**High resource usage**

```bash
# Limit parallel execution
python -m swarm_benchmark real "Heavy task" \
  --max-agents 2 \
  --monitor
```
For advanced usage patterns and comprehensive examples, see the other guides in `v2/benchmark/docs/`.
Happy benchmarking! 🚀