v2/benchmark/REAL_EXECUTION.md
The benchmark system is configured to use real claude-flow commands without any simulations.
All SPARC modes work correctly and produce real output:
- `sparc spec` - Specification mode
- `sparc architect` - Architecture mode
- `sparc tdd` - Test-driven development mode
- `sparc integration` - Integration mode
- `sparc refactor` - Refactoring mode

Example:

```bash
swarm-benchmark real sparc tdd "Create a function" --timeout 1
```
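A run like the example above can also be driven programmatically. A minimal sketch using Python's `subprocess`, assuming `swarm-benchmark` is installed and on `PATH` (the helper names here are illustrative, not part of the benchmark package):

```python
import subprocess

def build_sparc_command(mode: str, task: str, timeout_min: int = 1) -> list[str]:
    """Mirror the CLI invocation: swarm-benchmark real sparc <mode> "<task>" --timeout <min>."""
    return ["swarm-benchmark", "real", "sparc", mode, task,
            "--timeout", str(timeout_min)]

def run_sparc_benchmark(mode: str, task: str, timeout_min: int = 1):
    """Run a real SPARC benchmark; returns None if the process exceeds the timeout.

    Assumes the `swarm-benchmark` binary is available on PATH.
    """
    try:
        # Give the process a small grace period beyond the benchmark's own timeout.
        return subprocess.run(build_sparc_command(mode, task, timeout_min),
                              capture_output=True, text=True,
                              timeout=timeout_min * 60 + 30)
    except subprocess.TimeoutExpired:
        return None
```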
The swarm command requires Claude CLI even with --executor flag:
```bash
# This will time out without the Claude CLI installed
swarm-benchmark real swarm "Create API" --strategy development
```
To install Claude CLI:
```bash
npm install -g @anthropic-ai/claude-code
```
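Since `swarm` benchmarks hang without the Claude CLI, it can help to check for the binary before launching a run. A small sketch using `shutil.which` (the helper name is illustrative):

```python
import shutil

def claude_cli_available() -> bool:
    """Return True if the `claude` binary (installed via
    `npm install -g @anthropic-ai/claude-code`) is on PATH."""
    return shutil.which("claude") is not None
```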
This command, like `swarm`, requires the Claude CLI for execution.
All commands are configured to run non-interactively by default:
- `--executor` flag (requires Claude CLI)

```python
# In benchmark/src/swarm_benchmark/core/claude_flow_real_executor.py
class RealClaudeFlowExecutor:
    def __init__(self, force_non_interactive=True):
        # Always uses non-interactive mode
        ...
```
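As a sketch of how such an executor might force non-interactive runs when building its command line. Note this is an assumption about the implementation, and the `--non-interactive` flag name is hypothetical, not confirmed from the source:

```python
class RealClaudeFlowExecutor:
    """Sketch of a forced non-interactive executor (not the actual implementation)."""

    CLAUDE_FLOW = "/workspaces/claude-code-flow/claude-flow"

    def __init__(self, force_non_interactive: bool = True):
        self.force_non_interactive = force_non_interactive

    def build_command(self, *args: str) -> list[str]:
        cmd = [self.CLAUDE_FLOW, *args]
        if self.force_non_interactive:
            cmd.append("--non-interactive")  # hypothetical flag name
        return cmd
```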
All commands use the real claude-flow binary at `/workspaces/claude-code-flow/claude-flow`.
```bash
# Test different SPARC modes
swarm-benchmark real sparc spec "Design a system"
swarm-benchmark real sparc tdd "Create a calculator"
swarm-benchmark real sparc architect "Design API structure"
```
```bash
# View detailed execution metrics
swarm-benchmark real sparc tdd "Build feature" --output-dir ./my-reports
cat ./my-reports/sparc_tdd_*.json
```
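The JSON reports can also be inspected programmatically. A minimal sketch that loads the most recent `sparc_tdd_*.json` report (the file naming follows the example above; the report schema itself is not documented here, so this just returns the parsed JSON):

```python
import glob
import json
import os

def load_latest_report(report_dir: str = "./my-reports"):
    """Return the most recent sparc_tdd_*.json report as a dict, or None if absent."""
    paths = sorted(glob.glob(os.path.join(report_dir, "sparc_tdd_*.json")))
    if not paths:
        return None
    with open(paths[-1]) as f:
        return json.load(f)
```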
The system has been updated to:
For benchmarking without the Claude CLI, use the SPARC commands, which provide real workflow output and execution metrics.