This document provides comprehensive API documentation for the Claude Flow Benchmark System, including the new CLAUDE.md optimizer, MLE-STAR integration, and advanced metrics collection.
The package is organized into the following modules:

- swarm_benchmark.core - Core benchmarking functionality
- swarm_benchmark.claude_optimizer - CLAUDE.md configuration optimization
- swarm_benchmark.mle_star - MLE-STAR ensemble learning integration
- swarm_benchmark.automation - Non-interactive automation systems
- swarm_benchmark.advanced_metrics - Advanced performance metrics
- swarm_benchmark.collective - Collective intelligence benchmarking

ClaudeMdOptimizer is the main class for generating optimized CLAUDE.md configurations.
from swarm_benchmark.claude_optimizer import ClaudeMdOptimizer
ClaudeMdOptimizer()
Creates a new optimizer instance with all use case templates and optimization rules loaded.
generate_optimized_config(
    use_case: str,
    project_context: ProjectContext,
    performance_targets: PerformanceTargets
) -> str
Generate an optimized CLAUDE.md configuration for a specific use case.
Parameters:
- use_case (str): Type of development project. One of: "api_development", "ml_pipeline", "frontend_react", "backend_microservices", "data_pipeline", "devops_automation", "mobile_development", "testing_automation", "documentation", "performance_optimization"
- project_context (ProjectContext): Project-specific information
- performance_targets (PerformanceTargets): Performance optimization goals

Returns:
- str: Complete optimized CLAUDE.md configuration content

Raises:
- ValueError: If use_case is not supported
- RuntimeError: If optimization fails

Example:
from swarm_benchmark.claude_optimizer import ClaudeMdOptimizer, ProjectContext, PerformanceTargets
optimizer = ClaudeMdOptimizer()
context = ProjectContext(
    project_type="web_api",
    team_size=5,
    complexity="medium",
    primary_languages=["Python", "JavaScript"],
    frameworks=["FastAPI", "React"],
    performance_requirements={"response_time": "<100ms"},
    existing_tools=["pytest", "docker"],
    constraints={"budget": "medium"}
)

targets = PerformanceTargets(
    priority="speed",
    target_completion_time=30.0,
    target_token_usage=500,
    target_memory_usage=1024.0,
    target_error_rate=0.01
)
config = optimizer.generate_optimized_config("api_development", context, targets)
print(config)
async benchmark_config_effectiveness(
    claude_md_content: str,
    test_tasks: List[str],
    iterations: int = 3
) -> BenchmarkMetrics
Benchmark the effectiveness of a CLAUDE.md configuration.
Parameters:
- claude_md_content (str): The CLAUDE.md configuration to test
- test_tasks (List[str]): List of tasks to run for benchmarking
- iterations (int, optional): Number of benchmark iterations. Default: 3

Returns:
- BenchmarkMetrics: Comprehensive performance metrics

Example:
tasks = [
    "Create a REST API endpoint",
    "Add input validation",
    "Write unit tests",
    "Generate documentation"
]
metrics = await optimizer.benchmark_config_effectiveness(config, tasks, iterations=5)
print(f"Optimization score: {metrics.optimization_score}")
print(f"Completion rate: {metrics.completion_rate}")
get_optimization_suggestions(
    current_metrics: BenchmarkMetrics,
    target_metrics: PerformanceTargets
) -> List[str]
Generate optimization suggestions based on current performance.
Parameters:
- current_metrics (BenchmarkMetrics): Current performance metrics
- target_metrics (PerformanceTargets): Desired performance targets

Returns:
- List[str]: List of specific optimization suggestions

The supporting data classes are defined as follows:

@dataclass
class ProjectContext:
    project_type: str
    team_size: int
    complexity: str  # "simple", "medium", "complex"
    primary_languages: List[str]
    frameworks: List[str]
    performance_requirements: Dict[str, Any]
    existing_tools: List[str]
    constraints: Dict[str, Any]

@dataclass
class PerformanceTargets:
    priority: str  # "speed", "accuracy", "tokens", "memory"
    target_completion_time: Optional[float] = None
    target_token_usage: Optional[int] = None
    target_memory_usage: Optional[float] = None
    target_error_rate: Optional[float] = None

@dataclass
class BenchmarkMetrics:
    completion_rate: float = 0.0
    avg_tokens_per_task: int = 0
    avg_execution_time: float = 0.0
    error_rate: float = 0.0
    peak_memory_mb: float = 0.0
    optimization_score: float = 0.0
    cache_hit_rate: float = 0.0
    parallel_efficiency: float = 0.0
The TemplateEngine class generates CLAUDE.md files from configuration dictionaries.
from swarm_benchmark.claude_optimizer import TemplateEngine
generate_claude_md(config: Dict[str, Any]) -> str
Generate a complete CLAUDE.md file from configuration.
Parameters:
- config (Dict[str, Any]): Configuration dictionary containing all settings

Returns:
- str: Complete CLAUDE.md content as a string

Example:
template_engine = TemplateEngine()
config = {
    "use_case": "api_development",
    "max_agents": 5,
    "swarm_topology": "hierarchical",
    "preferred_agents": ["backend-dev", "api-docs", "tester"],
    "critical_rules": ["Always validate input parameters", "Batch operations"]
}
claude_md = template_engine.generate_claude_md(config)
The OptimizationRulesEngine class applies optimization rules to CLAUDE.md configurations.
from swarm_benchmark.claude_optimizer import OptimizationRulesEngine
optimize_for_speed(config: Dict) -> Dict
Optimize configuration for maximum execution speed.
Parameters:
- config (Dict): Base configuration dictionary

Returns:
- Dict: Speed-optimized configuration

optimize_for_accuracy(config: Dict) -> Dict
Optimize configuration for maximum accuracy and correctness.
optimize_for_tokens(config: Dict) -> Dict
Optimize configuration for token efficiency.
optimize_for_memory(config: Dict) -> Dict
Optimize configuration for memory efficiency.
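A minimal usage sketch, assuming the rules engine takes no constructor arguments and that each method accepts and returns a plain configuration dictionary like the one shown in the TemplateEngine example:

```python
from swarm_benchmark.claude_optimizer import OptimizationRulesEngine

rules = OptimizationRulesEngine()  # assumed no-argument constructor

base_config = {
    "use_case": "api_development",
    "max_agents": 5,
    "swarm_topology": "hierarchical",
}

# Apply optimizations in sequence; each call returns a new configuration dict.
fast_config = rules.optimize_for_speed(base_config)
lean_config = rules.optimize_for_tokens(fast_config)
```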
The BenchmarkEngine class is the main engine for running benchmark tests.
from swarm_benchmark.core import BenchmarkEngine
BenchmarkEngine(
    strategy_name: str = "development",
    mode_name: str = "hierarchical",
    config: Optional[Dict] = None
)
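A minimal sketch of constructing an engine and running a single benchmark with run_benchmark (documented below), assuming the defaults shown above:

```python
import asyncio
from swarm_benchmark.core import BenchmarkEngine

engine = BenchmarkEngine(strategy_name="development", mode_name="hierarchical")

# run_benchmark is a coroutine, so drive it with asyncio.run in a script.
result = asyncio.run(engine.run_benchmark(
    task="Create a REST API endpoint",
    num_agents=4,
    iterations=1,
))
print(result)
```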
async run_benchmark(
    task: str,
    num_agents: int = 4,
    iterations: int = 1,
    enable_metrics: bool = True
) -> BenchmarkResult
Run a complete benchmark test.
Parameters:
- task (str): Description of the task to benchmark
- num_agents (int): Number of agents to use
- iterations (int): Number of iterations to run
- enable_metrics (bool): Whether to collect detailed metrics

Returns:
- BenchmarkResult: Complete benchmark results

The MLEStarEnsembleExecutor class executes MLE-STAR ensemble benchmarks with multiple models.
from swarm_benchmark.mle_star import MLEStarEnsembleExecutor
MLEStarEnsembleExecutor(config: MLEStarConfig)
async execute_ensemble_benchmark(
    task: str,
    dataset: Any
) -> BenchmarkResult
Execute a complete ensemble benchmark.
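A hedged sketch of the call pattern: the fields of MLEStarConfig are not documented in this reference, and its import location is assumed, so the no-argument construction and the toy dataset below are placeholders only.

```python
import asyncio
from swarm_benchmark.mle_star import MLEStarConfig, MLEStarEnsembleExecutor

config = MLEStarConfig()  # placeholder: real configurations may require fields
executor = MLEStarEnsembleExecutor(config)

# The dataset is whatever the configured models consume; a toy list here.
result = asyncio.run(executor.execute_ensemble_benchmark(
    task="Classify tabular records",
    dataset=[{"feature": 1.0, "label": 0}],
))
```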
The TokenOptimizationTracker class tracks and optimizes token usage across benchmark runs.
from swarm_benchmark.advanced_metrics import TokenOptimizationTracker
measure_token_usage(
    task: str,
    execution_log: ExecutionLog
) -> TokenMetrics
Measure token usage for a task execution.
The MemoryPersistenceProfiler class profiles memory persistence overhead and identifies optimization opportunities.
from swarm_benchmark.advanced_metrics import MemoryPersistenceProfiler
async profile_memory_persistence(swarm_id: str) -> MemoryProfile
Profile memory persistence for a swarm.
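A brief sketch, assuming the profiler takes no constructor arguments and that swarm_id identifies a swarm from a previous or running benchmark:

```python
import asyncio
from swarm_benchmark.advanced_metrics import MemoryPersistenceProfiler

profiler = MemoryPersistenceProfiler()  # assumed no-argument constructor
profile = asyncio.run(profiler.profile_memory_persistence("swarm-1234"))  # hypothetical swarm id
print(profile)
```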
The benchmark system provides a comprehensive CLI for running benchmarks and optimizations.
# Run a basic benchmark
python -m swarm_benchmark benchmark --task "Create API" --strategy development
# Generate optimized CLAUDE.md
python -m swarm_benchmark optimize --use-case api_development --output claude.md
# Run performance analysis
python -m swarm_benchmark analyze --config-file claude.md --tasks tasks.txt
python -m swarm_benchmark benchmark [OPTIONS]
Options:
- --task TEXT: Task description to benchmark
- --strategy CHOICE: Strategy to use (development, testing, optimization, etc.)
- --mode CHOICE: Coordination mode (hierarchical, mesh, distributed, etc.)
- --agents INTEGER: Number of agents to use
- --iterations INTEGER: Number of iterations to run
- --output PATH: Output file for results

python -m swarm_benchmark optimize [OPTIONS]
Options:
- --use-case CHOICE: Use case to optimize for
- --team-size INTEGER: Team size
- --complexity CHOICE: Project complexity level
- --priority CHOICE: Optimization priority (speed, accuracy, tokens, memory)
- --output PATH: Output file for optimized CLAUDE.md

python -m swarm_benchmark analyze [OPTIONS]
Options:
- --config-file PATH: CLAUDE.md configuration file
- --tasks PATH: File containing test tasks
- --iterations INTEGER: Number of benchmark iterations
- --report-format CHOICE: Output report format (json, html, markdown)

Base exception class for benchmark-related errors.
class BenchmarkError(Exception):
    """Base exception for benchmark errors."""
    pass

Exception raised during configuration optimization.

class OptimizationError(BenchmarkError):
    """Exception raised during configuration optimization."""
    pass

Exception raised for invalid configuration.

class ConfigurationError(BenchmarkError):
    """Exception raised for invalid configuration."""
    pass
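A hedged sketch of handling these errors around an optimization call; the import path for the exception classes is an assumption (adjust it to your installation), and the ValueError documented for generate_optimized_config is caught separately:

```python
from swarm_benchmark.claude_optimizer import (
    ClaudeMdOptimizer,
    ProjectContext,
    PerformanceTargets,
)
# Assumption: the exception classes are exposed at the package root.
from swarm_benchmark import BenchmarkError

optimizer = ClaudeMdOptimizer()
context = ProjectContext(
    project_type="web_api",
    team_size=3,
    complexity="simple",
    primary_languages=["Python"],
    frameworks=["FastAPI"],
    performance_requirements={},
    existing_tools=[],
    constraints={},
)
targets = PerformanceTargets(priority="tokens")

try:
    config = optimizer.generate_optimized_config("api_development", context, targets)
except ValueError as exc:
    print(f"Unsupported use case: {exc}")
except BenchmarkError as exc:
    print(f"Benchmark system error: {exc}")
```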
Environment variables:

- CLAUDE_FLOW_LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR)
- CLAUDE_FLOW_CONFIG_PATH: Path to configuration file
- CLAUDE_FLOW_CACHE_DIR: Directory for caching optimization results

The system supports YAML and JSON configuration files:
# benchmark_config.yaml
benchmark:
  default_strategy: "development"
  default_mode: "hierarchical"
  default_agents: 4
  enable_metrics: true

optimization:
  cache_enabled: true
  cache_ttl: 3600
  max_cache_size: 1000

logging:
  level: "INFO"
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
import asyncio
from swarm_benchmark.claude_optimizer import (
    ClaudeMdOptimizer,
    ProjectContext,
    PerformanceTargets
)

async def optimize_for_project():
    # Initialize optimizer
    optimizer = ClaudeMdOptimizer()

    # Define project context
    context = ProjectContext(
        project_type="microservice_api",
        team_size=8,
        complexity="complex",
        primary_languages=["Python", "Go"],
        frameworks=["FastAPI", "Gin", "Docker"],
        performance_requirements={
            "response_time": "<50ms",
            "throughput": ">1000 rps",
            "availability": "99.9%"
        },
        existing_tools=["pytest", "k6", "prometheus"],
        constraints={"budget": "high", "timeline": "aggressive"}
    )

    # Define performance targets
    targets = PerformanceTargets(
        priority="speed",
        target_completion_time=60.0,
        target_token_usage=800,
        target_memory_usage=2048.0,
        target_error_rate=0.005
    )

    # Generate optimized configuration
    config = optimizer.generate_optimized_config(
        "backend_microservices",
        context,
        targets
    )

    # Save to file
    with open("optimized_claude.md", "w") as f:
        f.write(config)

    # Benchmark the configuration
    test_tasks = [
        "Design microservice architecture",
        "Implement API endpoints with validation",
        "Add comprehensive error handling",
        "Write unit and integration tests",
        "Set up monitoring and logging",
        "Deploy with Docker containers"
    ]

    metrics = await optimizer.benchmark_config_effectiveness(
        config,
        test_tasks,
        iterations=5
    )

    print("Configuration Performance:")
    print(f" Optimization Score: {metrics.optimization_score:.3f}")
    print(f" Completion Rate: {metrics.completion_rate:.1%}")
    print(f" Avg Execution Time: {metrics.avg_execution_time:.2f}s")
    print(f" Token Efficiency: {metrics.avg_tokens_per_task} tokens/task")

    # Get optimization suggestions
    suggestions = optimizer.get_optimization_suggestions(metrics, targets)
    if suggestions:
        print("\nOptimization Suggestions:")
        for suggestion in suggestions:
            print(f" - {suggestion}")

# Run the optimization
asyncio.run(optimize_for_project())
This API reference covers the Claude Flow Benchmark System's optimization and benchmarking capabilities. For additional examples and advanced usage patterns, see the examples directory and user guides.