# Benchmark Analysis
Learn how to analyze and interpret benchmark results to optimize your swarm performance.
Each benchmark produces a comprehensive result structure:
```json
{
  "benchmark_id": "uuid",
  "name": "benchmark-name",
  "status": "completed",
  "duration": 0.25,
  "config": { ... },
  "tasks": [ ... ],
  "results": [ ... ],
  "metrics": {
    "performance_metrics": { ... },
    "quality_metrics": { ... },
    "resource_usage": { ... },
    "coordination_metrics": { ... }
  }
}
```
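A minimal sketch for loading an exported result and printing a quick summary (assumes a JSON export such as `benchmark_results.json`; field names follow the structure above):

```python
import json

# Load an exported benchmark result (the path is an example)
with open("benchmark_results.json") as f:
    benchmark = json.load(f)

# Quick top-level summary
print(f"Benchmark: {benchmark['name']} ({benchmark['benchmark_id']})")
print(f"Status:    {benchmark['status']}")
print(f"Duration:  {benchmark['duration']:.2f}s")
print(f"Tasks: {len(benchmark['tasks'])}, Results: {len(benchmark['results'])}")
```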
```bash
# Analyze a specific benchmark
swarm-benchmark analyze <benchmark-id>

# Example output:
# Performance Summary:
# - Execution Time: 0.25s
# - Success Rate: 95%
# - Average Quality: 0.87
# - Resource Efficiency: 0.82
```

```bash
# Comprehensive analysis
swarm-benchmark analyze <benchmark-id> --detailed

# Specific analysis types
swarm-benchmark analyze <benchmark-id> --type performance
swarm-benchmark analyze <benchmark-id> --type quality
swarm-benchmark analyze <benchmark-id> --type resource
swarm-benchmark analyze <benchmark-id> --type coordination
```

```bash
# Compare multiple benchmarks
swarm-benchmark compare id1 id2 id3

# Compare specific metrics
swarm-benchmark compare id1 id2 --metrics execution_time,quality_score

# Visual comparison
swarm-benchmark compare id1 id2 --format chart --export comparison.png
```
Understand where time is spent:
```python
# Time distribution analysis: where does the wall-clock time go?
# (Sketch: assumes a result loaded as in the snippet above, with per-result
# "coordination_overhead" and "queue_time" fields.)
results = benchmark["results"]
total_time = benchmark["duration"]

execution_time = sum(r["execution_time"] for r in results)
coordination_time = sum(r["coordination_overhead"] for r in results)
queue_time = sum(r["queue_time"] for r in results)

print(f"Execution:    {execution_time / total_time * 100:.1f}%")
print(f"Coordination: {coordination_time / total_time * 100:.1f}%")
print(f"Queue:        {queue_time / total_time * 100:.1f}%")
```
```bash
# Find performance bottlenecks
swarm-benchmark analyze <id> --bottlenecks

# Output:
# Performance Bottlenecks:
# 1. High coordination overhead (18% of total time)
#    - Consider switching from mesh to hierarchical mode
# 2. Long queue times for analysis tasks
#    - Increase agent pool size or use parallel execution
# 3. Memory peaks during data processing
#    - Enable streaming or batch processing
```
Track performance over time:
```bash
# Analyze performance trends
swarm-benchmark analyze --trend --strategy development --days 7

# Generate trend report
swarm-benchmark report --type trends --period weekly
```
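For a quick custom trend view, a minimal sketch (assuming one exported JSON result per run in a `reports/` directory; the layout is illustrative):

```python
import glob
import json

import matplotlib.pyplot as plt

# Collect (name, duration) pairs from exported results, oldest first
runs = []
for path in sorted(glob.glob("reports/*.json")):
    with open(path) as f:
        data = json.load(f)
    runs.append((data["name"], data["duration"]))

names, durations = zip(*runs)
plt.plot(range(len(durations)), durations, marker="o")
plt.xticks(range(len(names)), names, rotation=45, ha="right")
plt.ylabel("Duration (s)")
plt.title("Benchmark duration over time")
plt.tight_layout()
plt.savefig("trend.png")
```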
Understanding quality metrics:
```text
Overall Quality = (
    accuracy_score * 0.35 +
    completeness_score * 0.30 +
    consistency_score * 0.20 +
    relevance_score * 0.15
)
```
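A direct translation of this weighting into Python, using the weights above:

```python
# Component weights from the formula above
QUALITY_WEIGHTS = {
    "accuracy_score": 0.35,
    "completeness_score": 0.30,
    "consistency_score": 0.20,
    "relevance_score": 0.15,
}

def overall_quality(scores: dict) -> float:
    """Weighted overall quality from the four component scores (each 0-1)."""
    return sum(scores[name] * weight for name, weight in QUALITY_WEIGHTS.items())

# Example: the component scores from the quality breakdown below
print(overall_quality({
    "accuracy_score": 0.92,
    "completeness_score": 0.78,
    "consistency_score": 0.85,
    "relevance_score": 0.90,
}))  # 0.861
```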
```bash
# Analyze quality issues
swarm-benchmark analyze <id> --quality-breakdown

# Example output:
# Quality Analysis:
# - Accuracy: 0.92 ✅
# - Completeness: 0.78 ⚠️ (missing test cases)
# - Consistency: 0.85 ✅
# - Relevance: 0.90 ✅
#
# Recommendations:
# 1. Increase max_retries for better completeness
# 2. Enable review mode for consistency
# 3. Use more specific objectives for relevance
```
```bash
# Compare quality across strategies
swarm-benchmark analyze --compare-quality \
  --strategies research,development,analysis

# Quality by coordination mode
swarm-benchmark analyze --quality-by-mode \
  --task-type "API development"
```
```bash
# Detailed resource analysis
swarm-benchmark analyze <id> --resource-details

# Resource usage over time
swarm-benchmark analyze <id> --resource-timeline --export resources.csv
```
```text
resource_efficiency = (
    (tasks_completed / total_tasks) *
    (1 - (avg_cpu_usage / 100)) *
    (1 - (avg_memory_usage / memory_limit))
)
```
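As a runnable sketch, with hypothetical inputs (19 of 20 tasks completed, 85% average CPU, 450 MB used of a 1024 MB limit):

```python
def resource_efficiency(tasks_completed: int, total_tasks: int,
                        avg_cpu_usage: float, avg_memory_usage: float,
                        memory_limit: float) -> float:
    """Completion rate scaled by CPU and memory headroom, as defined above.

    avg_cpu_usage is a percentage; memory values share the same unit (MB).
    """
    return (
        (tasks_completed / total_tasks)
        * (1 - avg_cpu_usage / 100)
        * (1 - avg_memory_usage / memory_limit)
    )

# Hypothetical example values
eff = resource_efficiency(19, 20, avg_cpu_usage=85.0,
                          avg_memory_usage=450.0, memory_limit=1024.0)
print(f"Resource efficiency: {eff:.2f}")  # ~0.08 for these heavily loaded values
```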
```bash
# Get resource optimization suggestions
swarm-benchmark analyze <id> --optimize-resources

# Example output:
# Resource Optimization Suggestions:
# 1. CPU Usage: 85% average (high)
#    - Consider increasing task_timeout
#    - Use distributed mode to spread load
# 2. Memory Usage: 450MB/1024MB
#    - Optimal, no changes needed
# 3. Network: 15MB transferred
#    - Enable compression for large data
```
Analyze how well agents work together:
```bash
# Coordination analysis
swarm-benchmark analyze <id> --coordination-metrics

# Output:
# Coordination Analysis:
# - Mode: hierarchical
# - Agents: 8
# - Coordination Overhead: 12%
# - Communication Latency: 45ms avg
# - Task Distribution: balanced
# - Agent Utilization: 78%
```
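To compare overhead across coordination modes from your own exported runs, a sketch (assumes one JSON file per run, with the mode recorded under `config` — an assumption about the export layout):

```python
import glob
import json
from collections import defaultdict

# Average coordination share of total duration, grouped by mode
by_mode = defaultdict(list)
for path in glob.glob("reports/*.json"):
    with open(path) as f:
        run = json.load(f)
    coordination = sum(r["coordination_overhead"] for r in run["results"])
    by_mode[run["config"]["mode"]].append(coordination / run["duration"] * 100)

for mode, shares in sorted(by_mode.items()):
    print(f"{mode:>14}: {sum(shares) / len(shares):.1f}% avg coordination overhead")
```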
```bash
# Compare coordination modes for a task type
swarm-benchmark analyze --mode-effectiveness \
  --task-pattern "Build*" \
  --min-samples 10

# Best modes by task type
swarm-benchmark report --coordination-recommendations
```
```python
# Statistical analysis of execution times
import json

import numpy as np
from scipy import stats

# Load benchmark results
with open("benchmark_results.json") as f:
    data = json.load(f)

# Summary statistics
execution_times = [r["execution_time"] for r in data["results"]]
mean_time = np.mean(execution_times)
std_time = np.std(execution_times, ddof=1)  # sample standard deviation
sem = std_time / np.sqrt(len(execution_times))  # standard error of the mean

# 95% confidence interval for the mean (Student's t)
ci_95 = stats.t.interval(0.95, len(execution_times) - 1,
                         loc=mean_time, scale=sem)

print(f"Mean execution time: {mean_time:.3f}s")
print(f"95% CI: [{ci_95[0]:.3f}, {ci_95[1]:.3f}]")
```
Identify patterns in benchmark results:
```bash
# Find patterns in failures
swarm-benchmark analyze --failure-patterns --days 30

# Success patterns by configuration
swarm-benchmark analyze --success-patterns \
  --group-by strategy,mode
```
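You can reproduce the same grouping once results are in a DataFrame (see the pandas export helper at the end of this guide; the `strategy`, `mode`, and `success` columns are assumed to be present in your flattened records):

```python
# Success rate by (strategy, mode) over a flattened results DataFrame
success_rates = (
    df.groupby(["strategy", "mode"])["success"]
      .mean()
      .sort_values(ascending=False)
)
print(success_rates)
```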
```bash
# Predict execution time for a configuration
swarm-benchmark predict \
  --strategy development \
  --mode hierarchical \
  --agents 8 \
  --task-complexity high

# Predicted: 0.35s (±0.05s)
# Based on 50 similar benchmarks
```
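Conceptually, such a prediction can be approximated from history: take previous runs with a matching configuration and report the mean and spread of their durations. A toy illustration (not the tool's actual model; the numbers are made up):

```python
import statistics

# Durations (s) of past runs with a matching configuration (illustrative data)
similar_runs = [0.31, 0.36, 0.33, 0.40, 0.35]

predicted = statistics.mean(similar_runs)
spread = statistics.stdev(similar_runs)
print(f"Predicted: {predicted:.2f}s (±{spread:.2f}s)")
```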
```bash
# Performance dashboard
swarm-benchmark report --type dashboard \
  --period monthly \
  --export dashboard.html

# Strategy comparison chart
swarm-benchmark report --type comparison \
  --strategies all \
  --metrics execution_time,quality_score \
  --export strategy_comparison.png
```
```python
# custom_analysis.py
import json

import matplotlib.pyplot as plt

def analyze_benchmark(benchmark_id):
    # Load benchmark data
    with open(f"reports/{benchmark_id}.json") as f:
        data = json.load(f)

    # Extract metric groups from the result structure shown above
    perf = data["metrics"]["performance_metrics"]
    quality = data["metrics"]["quality_metrics"]

    # Create a 2x2 grid of visualizations
    fig, axes = plt.subplots(2, 2, figsize=(12, 10))

    # Time distribution pie chart
    axes[0, 0].pie(
        [perf["execution_time"], perf["coordination_overhead"], perf["queue_time"]],
        labels=["Execution", "Coordination", "Queue"],
        autopct="%1.1f%%",
    )
    axes[0, 0].set_title("Time Distribution")

    # Quality scores bar chart
    axes[0, 1].bar(list(quality.keys()), list(quality.values()))
    axes[0, 1].set_title("Quality Scores")

    # Resource usage over time
    # ... (additional visualizations)

    plt.tight_layout()
    plt.savefig(f"analysis_{benchmark_id}.png")
```
```bash
#!/bin/bash
# Daily analysis script
DATE=$(date +%Y%m%d)
swarm-benchmark analyze --since yesterday \
  --report daily \
  --export "reports/daily_$DATE.html"
```
Always compare against baselines:
```bash
# Set baseline
swarm-benchmark baseline set <benchmark-id>

# Compare against baseline
swarm-benchmark analyze <new-id> --compare-baseline
```
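The same comparison is easy to script against two exported result files (a sketch; the file names are illustrative):

```python
import json

def load_duration(path: str) -> float:
    """Total duration (s) from an exported benchmark result."""
    with open(path) as f:
        return json.load(f)["duration"]

baseline = load_duration("reports/baseline.json")
current = load_duration("reports/new_run.json")

change = (current - baseline) / baseline * 100
print(f"Duration: {current:.2f}s vs baseline {baseline:.2f}s ({change:+.1f}%)")
```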
Consider multiple factors:
```bash
# Comprehensive analysis
swarm-benchmark analyze <id> \
  --dimensions performance,quality,resource,coordination \
  --export comprehensive_analysis.json
```
Focus on actionable recommendations:
```bash
# Get specific recommendations
swarm-benchmark analyze <id> --recommendations

# Example output:
# Top 3 Recommendations:
# 1. Switch to distributed mode (est. 25% faster)
# 2. Increase quality threshold to 0.9 (improve accuracy)
# 3. Enable parallel execution (reduce time by 40%)
```
A typical performance-tuning workflow:

1. **Baseline Measurement**

   ```bash
   swarm-benchmark run "Task" --name baseline
   swarm-benchmark baseline set <id>
   ```

2. **Identify Issues**

   ```bash
   swarm-benchmark analyze <id> --bottlenecks
   ```

3. **Test Improvements**

   ```bash
   swarm-benchmark run "Task" --mode distributed --parallel
   ```

4. **Compare Results**

   ```bash
   swarm-benchmark compare <baseline-id> <new-id>
   ```

5. **Validate Improvements**

   ```bash
   swarm-benchmark analyze <new-id> --compare-baseline
   ```
The same loop works for quality:

1. **Quality Assessment**

   ```bash
   swarm-benchmark analyze <id> --quality-breakdown
   ```

2. **Apply Improvements**

   ```bash
   swarm-benchmark run "Task" \
     --quality-threshold 0.95 \
     --review \
     --max-retries 5
   ```

3. **Verify Quality**

   ```bash
   swarm-benchmark analyze <new-id> --quality-validation
   ```
```bash
# High failure rate
swarm-benchmark analyze <id> --failure-analysis
# → Increase timeouts, add retries

# Poor quality scores
swarm-benchmark analyze <id> --quality-issues
# → Enable review, increase threshold

# Resource exhaustion
swarm-benchmark analyze <id> --resource-problems
# → Reduce agent count, enable limits

# Coordination overhead
swarm-benchmark analyze <id> --coordination-issues
# → Simplify mode, reduce communication
```
```bash
# Export for further analysis
swarm-benchmark export <id> --format csv --include-raw
swarm-benchmark export <id> --format json --pretty
swarm-benchmark export <id> --format sql --table benchmarks
```
```python
# Export results to a pandas DataFrame
import json

import pandas as pd

def benchmark_to_dataframe(benchmark_file):
    with open(benchmark_file) as f:
        data = json.load(f)

    # Flatten per-task results into one record per row
    records = []
    for result in data["results"]:
        records.append({
            "benchmark_id": data["benchmark_id"],
            "strategy": data["config"]["strategy"],  # assumes strategy is stored in config
            "task_id": result["task_id"],
            "execution_time": result["execution_time"],
            "quality_score": result["quality_metrics"]["overall_quality"],
            "cpu_usage": result["resource_usage"]["cpu_percent"],
            "memory_usage": result["resource_usage"]["memory_mb"],
        })
    return pd.DataFrame(records)

# Analyze with pandas
df = benchmark_to_dataframe("benchmark_results.json")
print(df.describe())
print(df.groupby("strategy").mean(numeric_only=True))
```
Effective benchmark analysis helps you find bottlenecks, raise quality scores, keep resource usage in check, and choose the right strategy and coordination mode for each task.

Remember: regular analysis and comparison are key to continuous improvement!