# Swarm Benchmarking Best Practices
A comprehensive guide to best practices for swarm benchmarking, optimization, and performance tuning.
## Start Simple, Then Optimize

Always begin with the simplest configuration and gradually add complexity:
```bash
# ❌ Don't start with this
swarm-benchmark run "Complex task" \
  --strategy optimization \
  --mode hybrid \
  --max-agents 20 \
  --parallel \
  --distributed \
  --monitor \
  --profile

# ✅ Start with this
swarm-benchmark run "Complex task"

# Then optimize based on the results
swarm-benchmark analyze <id> --recommendations
```
## Write Clear Objectives

The quality of results depends heavily on objective clarity:
```bash
# ❌ Vague objectives
swarm-benchmark run "Make it better"
swarm-benchmark run "Fix the code"
swarm-benchmark run "Analyze stuff"

# ✅ Clear and specific objectives
swarm-benchmark run "Optimize the user authentication API response time to under 100ms"
swarm-benchmark run "Fix the memory leak in the image processing module"
swarm-benchmark run "Analyze Q4 2023 sales data to identify top 3 growth regions"
```
## Establish Baselines

Always establish baselines before making changes:
```bash
# Establish baseline
BASELINE_ID=$(swarm-benchmark run "Task" --output json | jq -r '.benchmark_id')
swarm-benchmark baseline set $BASELINE_ID

# Test optimization
NEW_ID=$(swarm-benchmark run "Task" --mode distributed --output json | jq -r '.benchmark_id')

# Compare results
swarm-benchmark compare $BASELINE_ID $NEW_ID
```
## Choose the Right Strategy

Use this decision matrix:
| Task Contains | Recommended Strategy | Example |
|---|---|---|
| "research", "investigate", "find" | research | "Research best database for our needs" |
| "build", "create", "implement" | development | "Build user authentication service" |
| "analyze", "process", "insights" | analysis | "Analyze customer churn data" |
| "test", "validate", "verify" | testing | "Test API endpoints for security" |
| "optimize", "improve", "faster" | optimization | "Optimize query performance" |
| "update", "document", "refactor" | maintenance | "Update API documentation" |
| Mixed/unclear | auto | "Handle user feedback" |
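If you script benchmark submissions, the matrix translates directly into a keyword lookup. Below is a minimal sketch; `suggest_strategy` is a hypothetical helper that mirrors the table, and the built-in `auto` strategy may use more sophisticated detection:

```python
# Hypothetical helper mirroring the decision matrix above.
STRATEGY_KEYWORDS = {
    "research": ["research", "investigate", "find"],
    "development": ["build", "create", "implement"],
    "analysis": ["analyze", "process", "insights"],
    "testing": ["test", "validate", "verify"],
    "optimization": ["optimize", "improve", "faster"],
    "maintenance": ["update", "document", "refactor"],
}

def suggest_strategy(objective: str) -> str:
    """Return the first strategy whose keywords appear in the objective."""
    text = objective.lower()
    for strategy, keywords in STRATEGY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return strategy
    return "auto"  # Mixed or unclear objectives fall back to auto

print(suggest_strategy("Research best database for our needs"))  # research
```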
### Research Strategy

```bash
# Use distributed mode for broader coverage
swarm-benchmark run "Research topic" \
  --strategy research \
  --mode distributed \
  --max-agents 8

# Set longer timeouts for complex research
swarm-benchmark run "Deep research task" \
  --strategy research \
  --task-timeout 600
```
### Development Strategy

```bash
# Use hierarchical mode for complex projects
swarm-benchmark run "Build system" \
  --strategy development \
  --mode hierarchical \
  --quality-threshold 0.9

# Enable review for critical code
swarm-benchmark run "Security module" \
  --strategy development \
  --review \
  --testing
```
### Analysis Strategy

```bash
# Use mesh mode for collaborative analysis
swarm-benchmark run "Analyze data" \
  --strategy analysis \
  --mode mesh \
  --quality-threshold 0.95

# Enable parallel processing for large datasets
swarm-benchmark run "Big data analysis" \
  --strategy analysis \
  --parallel \
  --max-agents 10
```
## Choose the Right Coordination Mode

| Mode | Optimal Agent Count | Maximum Effective |
|---|---|---|
| Centralized | 2-3 | 5 |
| Distributed | 4-6 | 10 |
| Hierarchical | 5-8 | 15 |
| Mesh | 3-4 | 6 |
| Hybrid | 4-6 | 10 |
A simple heuristic for picking a mode programmatically:

```python
def select_coordination_mode(task_complexity, agent_count, priority):
    """Choose a coordination mode from complexity, agent count, and priority."""
    if agent_count <= 3:
        return "centralized"
    if priority == "speed" and task_complexity == "simple":
        return "centralized"
    elif priority == "reliability":
        return "mesh" if agent_count <= 5 else "distributed"
    elif priority == "scalability":
        return "hierarchical"
    elif task_complexity == "complex":
        return "hierarchical" if agent_count >= 5 else "distributed"
    else:
        return "hybrid"  # Adaptive selection
```
```bash
# Monitor coordination overhead
swarm-benchmark run "Task" --monitor | grep "coordination"

# If overhead > 15%, simplify the mode:
# mesh → distributed → hierarchical → centralized
```
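To enforce that ceiling in a script, something like the sketch below could work. The `coordination_overhead` field name is an assumption; check what your version actually emits in its JSON output:

```bash
# Hypothetical field name; adjust to match your version's JSON output
OVERHEAD=$(swarm-benchmark run "Task" --monitor --output json \
  | jq -r '.metrics.coordination_overhead')

# Flag runs whose coordination overhead exceeds 15%
if awk -v o="$OVERHEAD" 'BEGIN { exit !(o > 0.15) }'; then
  echo "Coordination overhead $OVERHEAD exceeds 15%; try a simpler mode"
fi
```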
## Decompose Large Tasks

Break large tasks into smaller, manageable pieces:
```bash
# ❌ Monolithic task
swarm-benchmark run "Build complete e-commerce platform with payment, inventory, and shipping"

# ✅ Decomposed tasks
swarm-benchmark run "Design e-commerce database schema"
swarm-benchmark run "Implement user authentication module"
swarm-benchmark run "Create product catalog API"
swarm-benchmark run "Build payment processing service"
```
## Parallelize Independent Work

Use parallel execution for independent tasks:
```bash
# Identify parallelizable work
swarm-benchmark analyze <previous-id> --parallelization-opportunities

# Enable parallel execution
swarm-benchmark run "Process multiple files" \
  --parallel \
  --max-agents 8 \
  --mode distributed
```
## Right-Size Resources

Set appropriate resource limits:
```bash
# Calculate optimal resources
AGENTS=$(swarm-benchmark calculate-agents --task-complexity medium)
MEMORY=$(swarm-benchmark calculate-memory --task-type analysis)

# Run with optimized resources
swarm-benchmark run "Task" \
  --max-agents $AGENTS \
  --memory-limit $MEMORY \
  --cpu-limit 70
```
## Cache Repeated Work

```bash
# Enable result caching for repeated tasks
swarm-benchmark run "Daily analysis" \
  --cache-results \
  --cache-ttl 3600

# Use cached research results
swarm-benchmark run "Build on previous research" \
  --use-cache \
  --cache-namespace "research"
```
## Set Quality Thresholds

| Task Criticality | Recommended Threshold | Max Retries |
|---|---|---|
| Critical | 0.95+ | 5 |
| Important | 0.85-0.94 | 3 |
| Standard | 0.75-0.84 | 2 |
| Experimental | 0.70+ | 1 |
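To apply these recommendations consistently, the table can be encoded in a small shell helper; `run_with_criticality` is a hypothetical wrapper around the real CLI:

```bash
# Map task criticality to the recommended threshold and retry count
run_with_criticality() {
  local criticality=$1; shift
  local threshold retries
  case "$criticality" in
    critical)     threshold=0.95; retries=5 ;;
    important)    threshold=0.85; retries=3 ;;
    standard)     threshold=0.75; retries=2 ;;
    experimental) threshold=0.70; retries=1 ;;
    *) echo "Unknown criticality: $criticality" >&2; return 1 ;;
  esac
  swarm-benchmark run "$@" --quality-threshold "$threshold" --max-retries "$retries"
}

run_with_criticality critical "Deploy payment system"
```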
```bash
# Critical task configuration
swarm-benchmark run "Deploy payment system" \
  --quality-threshold 0.95 \
  --max-retries 5 \
  --review \
  --testing \
  --validation-mode strict
```
```bash
# Peer review for important tasks
swarm-benchmark run "API redesign" \
  --review \
  --review-threshold 0.9 \
  --reviewer-count 2

# Automated testing
swarm-benchmark run "New feature" \
  --testing \
  --test-coverage 0.8 \
  --test-types "unit,integration"
```
```bash
# Strict validation for critical paths
swarm-benchmark run "Security implementation" \
  --validation-mode strict \
  --security-check \
  --compliance-check

# Output validation
swarm-benchmark run "Data processing" \
  --output-validation \
  --schema-file output-schema.json
```
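The schema file is whatever structure your pipeline expects; a minimal hypothetical `output-schema.json` in JSON Schema style might look like:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["status", "records_processed"],
  "properties": {
    "status": { "type": "string", "enum": ["success", "partial", "failed"] },
    "records_processed": { "type": "integer", "minimum": 0 }
  }
}
```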
## Monitor Important Runs

```bash
# Always monitor important runs
swarm-benchmark run "Production task" \
  --monitor \
  --alert-on-failure \
  --metrics-interval 2

# Set up alerts
swarm-benchmark run "Critical task" \
  --alert-email [email protected] \
  --alert-threshold 0.8
```
## Debug Systematically

```bash
# Step 1: Verbose dry run
swarm-benchmark run "Problematic task" --dry-run --verbose

# Step 2: Trace execution
swarm-benchmark run "Problematic task" --trace --log-level debug

# Step 3: Profile performance
swarm-benchmark run "Problematic task" --profile --profile-output debug.json

# Step 4: Analyze results
swarm-benchmark analyze <id> --debug --include-logs
```
### Logging

```bash
# Enable comprehensive logging
swarm-benchmark run "Task" \
  --log-level info \
  --log-file benchmark.log \
  --log-format json

# Aggregate logs for analysis
swarm-benchmark logs aggregate --since "1 hour ago" \
  --filter-level warning \
  --export logs-analysis.json
```
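Because `--log-format json` makes each entry machine-readable, standard tooling applies; for example, with `jq` (the `level`, `time`, and `message` field names are assumptions; check your actual log output):

```bash
# Pull error-level entries out of the JSON log (field names assumed)
jq -c 'select(.level == "error") | {time, message}' benchmark.log
```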
## Organize Benchmark Results

```bash
# Use meaningful names and tags
swarm-benchmark run "Q4 Analysis" \
  --name "quarterly-analysis-2024-q4" \
  --tags finance,quarterly,analysis \
  --metadata '{"department": "finance", "priority": "high"}'

# Organize by project
swarm-benchmark run "Task" \
  --output-dir "./projects/project-x/benchmarks" \
  --namespace "project-x"
```
## Back Up Regularly

```bash
#!/bin/bash
# backup-benchmarks.sh
DATE=$(date +%Y%m%d)
BACKUP_DIR="/backups/benchmarks/$DATE"
mkdir -p "$BACKUP_DIR"

# Export important benchmarks
swarm-benchmark export --since "7 days ago" \
  --format json \
  --compress \
  --output "$BACKUP_DIR/weekly-backup.tar.gz"

# Back up configurations
cp -r ~/.swarm-benchmark/configs "$BACKUP_DIR/"
```
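Schedule the script with cron so backups run unattended (the install path is an example):

```bash
# crontab entry: run the backup every Sunday at 02:00
0 2 * * 0 /usr/local/bin/backup-benchmarks.sh
```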
## Clean Up Old Data

```bash
# Automated cleanup policy
swarm-benchmark policy set \
  --retain-successful 30 \
  --retain-failed 90 \
  --archive-after 180 \
  --delete-after 365

# Manual cleanup
swarm-benchmark clean \
  --older-than 30 \
  --keep-tagged \
  --keep-baselines
```
## Manage Configuration

```yaml
# benchmark-config.yaml
defaults:
  strategy: auto
  mode: distributed
  max_agents: 6
  quality_threshold: 0.85
  output_formats: ["json", "sqlite"]

profiles:
  development:
    strategy: development
    mode: hierarchical
    review: true
  production:
    quality_threshold: 0.95
    max_retries: 5
    alert_on_failure: true
```
```bash
# Development environment
export SWARM_BENCHMARK_ENV=development
export SWARM_BENCHMARK_TIMEOUT=300

# Production environment
export SWARM_BENCHMARK_ENV=production
export SWARM_BENCHMARK_TIMEOUT=3600
export [email protected]
```
## Version-Control Benchmark Configurations

```bash
# Track benchmark configurations
git add benchmark-config.yaml
git commit -m "Updated benchmark configuration for v2.0"

# Tag important benchmarks
swarm-benchmark tag <id> --tag "v2.0-baseline"
git tag -a "benchmark-v2.0" -m "Baseline for v2.0 release"
```
## Integrate with CI/CD

```yaml
# .github/workflows/benchmark.yml
name: Automated Benchmarking

on:
  push:
    branches: [main]
  pull_request:

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run Benchmarks
        run: |
          swarm-benchmark run "${{ github.event.head_commit.message }}" \
            --name "ci-${{ github.run_id }}" \
            --tags ci,automated \
            --output json \
            > benchmark-results.json
      - name: Analyze Results
        run: |
          swarm-benchmark analyze $(jq -r '.benchmark_id' benchmark-results.json) \
            --compare-baseline \
            --fail-on-regression
```
### Quality Gates

```bash
# Fail if performance degrades
swarm-benchmark gate \
  --baseline main \
  --max-regression 10 \
  --metrics "execution_time,quality_score"

# Fail if quality drops
swarm-benchmark gate \
  --min-quality 0.85 \
  --required-tests "unit,integration"
```
## Generate Regular Reports

```bash
# Weekly performance report
swarm-benchmark report weekly \
  --include-trends \
  --compare-previous \
  --export weekly-report.html

# Monthly executive summary
swarm-benchmark report monthly \
  --format executive \
  --metrics-summary \
  --cost-analysis
```
### Custom Dashboards

```python
# dashboard.py
# Note: get_benchmarks_since and the plot_* helpers are project-specific
# and assumed to be defined elsewhere.
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

def generate_dashboard():
    # Get recent benchmarks
    benchmarks = get_benchmarks_since(datetime.now() - timedelta(days=7))

    # Create a 2x2 dashboard
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))

    # Performance trends
    plot_performance_trends(axes[0, 0], benchmarks)

    # Strategy effectiveness
    plot_strategy_comparison(axes[0, 1], benchmarks)

    # Resource utilization
    plot_resource_usage(axes[1, 0], benchmarks)

    # Quality scores
    plot_quality_trends(axes[1, 1], benchmarks)

    plt.savefig('weekly_dashboard.png')
```
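Run it directly, or from the same cron setup used for backups:

```python
if __name__ == "__main__":
    generate_dashboard()  # Writes weekly_dashboard.png to the working directory
```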
## Collaborate as a Team

```bash
# Create team baseline
swarm-benchmark run "Standard task" --name "team-baseline-v1"
swarm-benchmark baseline set <id> --shared --team "engineering"

# Everyone compares against the baseline
swarm-benchmark run "My implementation" --compare-baseline team-baseline-v1
```
```bash
# Document successful configurations
swarm-benchmark document <successful-id> \
  --title "Optimal config for API development" \
  --export docs/api-benchmark-guide.md

# Share optimization findings
swarm-benchmark insights export \
  --strategy development \
  --successes \
  --export insights/development-tips.json

# Schedule regular reviews
swarm-benchmark schedule \
  --review-meeting weekly \
  --participants "[email protected]" \
  --include-trends \
  --include-recommendations
```
## Avoid Common Pitfalls

```bash
# ❌ Don't over-configure
swarm-benchmark run "Simple task" \
  --max-agents 20 \
  --mode hybrid \
  --parallel \
  --distributed \
  --monitor \
  --profile \
  --trace \
  --validation-mode strict

# ✅ Use appropriate configuration
swarm-benchmark run "Simple task" --max-agents 3
```
```bash
# ❌ Don't skip baseline comparison
swarm-benchmark run "Optimized version"

# ✅ Always compare
swarm-benchmark run "Optimized version" --compare-baseline

# ❌ Don't optimize without data
swarm-benchmark run "Task" --mode mesh --max-agents 15

# ✅ Measure first, then optimize
swarm-benchmark run "Task"                        # Measure
swarm-benchmark analyze <id> --recommendations    # Analyze
# Then apply the recommended optimizations
```
## Keep Improving

```bash
# Monthly performance review
swarm-benchmark retrospective \
  --period monthly \
  --identify-patterns \
  --generate-recommendations

# Track configuration evolution
git log --follow benchmark-config.yaml

# Document learnings
echo "## Learnings from Q4 2024" >> BENCHMARK_LEARNINGS.md
echo "- Distributed mode works best for research tasks" >> BENCHMARK_LEARNINGS.md
echo "- Quality threshold of 0.9 optimal for production" >> BENCHMARK_LEARNINGS.md

# Share successful patterns
swarm-benchmark patterns export \
  --successful \
  --anonymize \
  --contribute
```
Remember: The best configuration is the one that meets your specific needs while using resources efficiently!