Swarm-Benchmark CLI Usage Guide

Overview

The swarm-benchmark CLI provides comprehensive benchmarking for Claude Flow with real command execution.

Installation

bash

cd benchmark
pip install -e .

Commands

Real Execution (Production)

Execute actual claude-flow commands and measure real performance:

bash

# Basic swarm benchmark
swarm-benchmark real swarm "Build a REST API" --strategy development

# Hive-mind benchmark
swarm-benchmark real hive-mind "Design architecture" --max-workers 8

# SPARC mode benchmark
swarm-benchmark real sparc coder "Implement authentication"

# Test all modes
swarm-benchmark real swarm "Create microservice" --all-modes --parallel

Options

Global Options

--verbose, -v: Enable verbose output
--config PATH: Configuration file path
--version: Show version

Real Command Options

--strategy: Execution strategy (auto, research, development, etc.)
--mode: Coordination mode (centralized, distributed, hierarchical, mesh, hybrid)
--sparc-mode: Specific SPARC mode to test
--all-modes: Test all SPARC modes and strategies
--max-agents: Maximum number of agents (default: 5)
--timeout: Overall timeout in minutes (default: 60)
--task-timeout: Individual task timeout in seconds (default: 300)
--parallel: Enable parallel execution
--monitor: Enable real-time monitoring
--output: Output format (json, sqlite)
--output-dir: Output directory (default: ./reports)

Important Notes on Multi-Word Tasks

When running commands with multi-word tasks from the shell, ensure proper quoting:

bash

# CORRECT - Use quotes around the entire task
swarm-benchmark real hive-mind "Design hello world" --max-workers 4

# For complex tasks with special characters, use single quotes
swarm-benchmark real swarm 'Build REST API with /users endpoint' --strategy development

# Or use a Python script for reliability
python -c "
import subprocess
subprocess.run(['swarm-benchmark', 'real', 'hive-mind', 
                'Design hello world in ./hello-bench/', 
                '--max-workers', '8'])"

Examples

Simple Benchmark

bash

swarm-benchmark real swarm "echo hello world" --max-agents 1 --timeout 1

Development Benchmark

bash

swarm-benchmark real swarm "Build user authentication system" \
  --strategy development \
  --mode hierarchical \
  --max-agents 5 \
  --monitor \
  --output json \
  --output-dir ./results

Comparative Analysis

bash

# Test different strategies
for strategy in auto research development optimization; do
  swarm-benchmark real swarm "Optimize database queries" \
    --strategy $strategy \
    --output json \
    --name "strategy-$strategy"
done

Batch Processing

bash

# Run multiple benchmarks
cat tasks.txt | while read task; do
  swarm-benchmark real swarm "$task" \
    --parallel \
    --output json
done

Output

Results are saved in JSON format with comprehensive metrics:

Token usage (input/output)
Execution time
Agent activity
Memory usage
Error rates
Command details

Example output structure:

json

{
  "benchmark_id": "...",
  "objective": "Build a REST API",
  "metrics": {
    "total_tokens": 4351,
    "execution_time": 95.2,
    "agents_spawned": 3,
    "memory_peak_mb": 156.4
  },
  "command": ["./claude-flow", "swarm", "..."],
  "timestamp": "2025-01-06T02:45:00Z"
}

Performance Tips

Use --parallel for multiple independent tasks
Set appropriate timeouts to avoid hanging
Monitor resource usage with --monitor flag
Use --all-modes for comprehensive testing
Save results with --output-dir for analysis

Troubleshooting

Command not found

bash

# Ensure installation
pip install -e .
# Check PATH
which swarm-benchmark

Timeout issues

bash

# Increase timeouts for complex tasks
swarm-benchmark real swarm "Complex task" --timeout 120 --task-timeout 600

Memory issues

bash

# Limit agents for resource-constrained environments
swarm-benchmark real swarm "Task" --max-agents 2

Integration

CI/CD Pipeline

yaml

# GitHub Actions example
- name: Run benchmarks
  run: |
    cd benchmark
    pip install -e .
    swarm-benchmark real swarm "${{ github.event.inputs.task }}" \
      --output json \
      --output-dir ./artifacts

Python API

python

from swarm_benchmark import BenchmarkEngine

engine = BenchmarkEngine(use_real_executor=True)
result = await engine.run_real_benchmark("Your task here")

Best Practices

Start small: Test with simple tasks first
Monitor resources: Use --monitor for production runs
Save results: Always specify --output-dir
Use timeouts: Prevent hanging on complex tasks
Parallel execution: Use --parallel for independent tasks

For more information, see the full documentation.