v2/docs/integrations/reasoningbank/REASONINGBANK-BENCHMARK-RESULTS.md
This document contains benchmark results from testing ReasoningBank with 5 real-world software engineering scenarios.
Date: 2025-10-11
Version: 1.5.8
Command: npx tsx src/reasoningbank/demo-comparison.ts
Complexity: Medium
Query: Extract product data from e-commerce site with dynamic pagination and lazy loading
Traditional Approach:
ReasoningBank Approach:
Complexity: High
Query: Integrate with third-party payment API handling authentication, webhooks, and retries
Traditional Approach:
ReasoningBank Approach:
Complexity: High
Query: Migrate PostgreSQL database with foreign keys, indexes, and minimal downtime
Traditional Approach:
ReasoningBank Approach:
Complexity: Medium
Query: Process CSV files with 1M+ rows including validation, transformation, and error recovery
Traditional Approach:
ReasoningBank Approach:
Complexity: High
Query: Deploy microservices with health checks, rollback capability, and database migrations
Traditional Approach:
ReasoningBank Approach:
The system attempts OpenRouter first for cost savings, then falls back to Anthropic:
claude-sonnet-4-5-20250929 fails on OpenRouter (not a valid OpenRouter model ID).
Note: OpenRouter requires different model IDs (e.g., anthropic/claude-sonnet-4.5-20250929).
The current config uses Anthropic's API model ID, which causes the OpenRouter request to fail, but the fallback to Anthropic works correctly.
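The fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Provider` type, `toOpenRouterModelId`, `completeWithFallback`, and the `OPENROUTER_MODEL_IDS` table are hypothetical names introduced here, and the single mapping entry simply mirrors the example model ID given in this document. It shows how translating the model ID per provider would let the OpenRouter attempt use a valid ID, while still falling back to the next provider on any error.

```typescript
// Hypothetical provider shape; the real ReasoningBank code may differ.
type Provider = {
  name: string;
  // Optional per-provider model ID translation (assumption for this sketch).
  resolveModel?: (modelId: string) => string;
  complete: (modelId: string, prompt: string) => Promise<string>;
};

// Assumed mapping from Anthropic API model IDs to OpenRouter model IDs.
// The one entry below is taken from the example in this document.
const OPENROUTER_MODEL_IDS: Record<string, string> = {
  "claude-sonnet-4-5-20250929": "anthropic/claude-sonnet-4.5-20250929",
};

function toOpenRouterModelId(anthropicId: string): string {
  // Unknown IDs pass through unchanged so the caller sees the provider's own error.
  return OPENROUTER_MODEL_IDS[anthropicId] ?? anthropicId;
}

// Try each provider in order; return the first success, or throw with all errors.
async function completeWithFallback(
  providers: Provider[],
  modelId: string,
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    const resolved = provider.resolveModel ? provider.resolveModel(modelId) : modelId;
    try {
      const text = await provider.complete(resolved, prompt);
      return { provider: provider.name, text };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

With a provider list ordered as OpenRouter then Anthropic, omitting `resolveModel` on the OpenRouter entry reproduces the behavior reported here (OpenRouter rejects the ID and the Anthropic fallback answers), while setting it to `toOpenRouterModelId` would let the cheaper provider handle the request first.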
Each failed attempt creates 2 memories on average:
The benchmark successfully demonstrates: