ADR-073: Stub Tool Honesty & Real Predictions

v3/implementation/adrs/ADR-073-stub-tool-honesty-real-predictions.md

Status: Accepted
Date: 2026-04-06 (updated v3.5.59)
Context: Issue #1514 (independent audit), Issues #1058, #1516, #1518, #1521, #1526, #1530, #1531, #1538, PR #1539

Decision

1. Remove fabricated metrics from token-optimizer

The TokenOptimizer class (@claude-flow/integration) contained hardcoded savings numbers:

| Before | After |
|---|---|
| totalTokensSaved += 100 per cache hit | Removed; cache hits are tracked but no token count is fabricated |
| baseline = 1000 (hardcoded) | queryTokenEstimate = query.length / 4 (actual content size) |
| totalTokensSaved += 50 per edit | Removed; edit count is tracked, savings are not fabricated |
| executionMs: 352 fallback | executionMs: 0 (honest: no optimization occurred) |
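A minimal sketch of the "After" behaviour, assuming a plain stats object. Only the `query.length / 4` estimate comes from this ADR; `OptimizerStats`, `estimateQueryTokens` and `recordCacheHit` are hypothetical names used for illustration, not the actual TokenOptimizer API.

```typescript
// Hypothetical stats shape: counts are tracked, token savings are never invented.
interface OptimizerStats {
  cacheHits: number;
  editsOptimized: number;
  totalTokensSaved: number; // only changes when a real measurement exists
}

// Size estimate derived from the actual query content (about 4 characters per token).
function estimateQueryTokens(query: string): number {
  return query.length / 4;
}

// Count the cache hit without adding a fabricated per-hit token saving.
function recordCacheHit(stats: OptimizerStats): OptimizerStats {
  return { ...stats, cacheHits: stats.cacheHits + 1 };
}
```

With this shape, a run that never measures real savings reports `totalTokensSaved: 0` rather than a plausible-looking constant.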

2. Make getOptimalConfig() responsive to agentCount

Previously returned identical config regardless of input. Now scales:

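| Agent Count | Batch Size | Topology | Cache (MB) |
|---|---|---|---|
| 1-2 | 2 | hierarchical | 25 |
| 3-4 | 2 | hierarchical | 50 |
| 5-6 | 4 | hierarchical | 75 |
| 7-8 | 4 | hierarchical-mesh | 100 |
| 9-12 | 6 | hierarchical-mesh | 125-150 |
| 13+ | 6 | mesh | 175-200 |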

Formula: batchSize = agentCount <= 4 ? 2 : agentCount <= 8 ? 4 : 6; cacheSizeMB = min(200, 25 * ceil(agentCount / 2))
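The scaling rule can be sketched as below. The batch size and cache formulas are taken from this ADR; the topology thresholds are an assumption read off the table rows, since the ADR gives no formula for them, and the `OptimalConfig` type is hypothetical.

```typescript
type Topology = "hierarchical" | "hierarchical-mesh" | "mesh";

interface OptimalConfig {
  batchSize: number;
  topology: Topology;
  cacheSizeMB: number;
}

function getOptimalConfig(agentCount: number): OptimalConfig {
  // Formulas as stated in the ADR.
  const batchSize = agentCount <= 4 ? 2 : agentCount <= 8 ? 4 : 6;
  const cacheSizeMB = Math.min(200, 25 * Math.ceil(agentCount / 2));
  // Assumption: thresholds inferred from the table (1-6, 7-12, 13+).
  const topology: Topology =
    agentCount <= 6 ? "hierarchical" : agentCount <= 12 ? "hierarchical-mesh" : "mesh";
  return { batchSize, topology, cacheSizeMB };
}
```

For example, `getOptimalConfig(8)` yields a batch size of 4, the `hierarchical-mesh` topology and a 100 MB cache, matching the 7-8 row.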

3. Wire neural_predict to real embedding similarity

Previously: hardcoded labels ['coder', 'researcher', 'reviewer', 'tester'] with random confidence.

Now:

  • If stored patterns exist: generates real embedding for input, computes cosine similarity against all stored pattern embeddings, returns top-K nearest neighbors
  • If no patterns stored: returns empty array [] (no fake labels, no simulated data)
  • All results include _realEmbedding (bool: ML model loaded) and _hasStoredPatterns (bool: patterns available) transparency flags
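The lookup described above amounts to a top-K nearest-neighbor search by cosine similarity. The sketch below illustrates that idea; it is not the shipped neural_predict code. The `StoredPattern` and `PredictResult` shapes and the `predict` signature are hypothetical; only the two transparency flags are named in this ADR.

```typescript
interface StoredPattern {
  label: string;
  embedding: number[];
}

interface PredictResult {
  predictions: { label: string; similarity: number }[];
  _realEmbedding: boolean;     // true when an ML model produced the embeddings
  _hasStoredPatterns: boolean; // true when there was something to compare against
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // zero vectors match nothing
}

function predict(
  input: number[],
  patterns: StoredPattern[],
  topK: number,
  realEmbedding: boolean
): PredictResult {
  // With no stored patterns this is an empty list: no fallback labels.
  const predictions = patterns
    .map((p) => ({ label: p.label, similarity: cosineSimilarity(input, p.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, topK);
  return { predictions, _realEmbedding: realEmbedding, _hasStoredPatterns: patterns.length > 0 };
}
```

Callers can check `_hasStoredPatterns` to tell "nothing matched" apart from "nothing was ever stored".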

4. neural_train stores real embeddings

Training now generates real embeddings for each training data entry (via ML model or deterministic hash fallback) and stores them as searchable patterns. Accuracy is 1.0 if patterns were stored, 0 otherwise — not simulated. Cosine similarity search against these stored embeddings produces real nearest-neighbor results.
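This ADR does not specify the deterministic hash fallback. As a sketch of what such a fallback could look like, the following derives a unit-length vector from FNV-1a hashes of the text. The hash choice, the dimension count and the names `hashEmbedding` and `trainingAccuracy` are assumptions for illustration only.

```typescript
// FNV-1a 32-bit hash over UTF-16 code units, salted per dimension (illustrative choice).
function fnv1a(text: string, seed: number): number {
  let hash = (2166136261 ^ seed) >>> 0;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 16777619) >>> 0;
  }
  return hash;
}

// Same text always yields the same vector; no randomness involved.
function hashEmbedding(text: string, dims: number = 8): number[] {
  const raw: number[] = [];
  for (let d = 0; d < dims; d++) {
    raw.push((fnv1a(text, d) / 0xffffffff) * 2 - 1); // map hash to [-1, 1]
  }
  const norm = Math.sqrt(raw.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? raw : raw.map((v) => v / norm);
}

// Accuracy as described in the ADR: 1.0 if patterns were stored, 0 otherwise.
function trainingAccuracy(patternsStored: number): number {
  return patternsStored > 0 ? 1.0 : 0;
}
```

A hash-based vector carries no semantic meaning, which is why results expose the `_realEmbedding` flag.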

5. Fix bare model names (#1516)

All embedding model defaults now use Xenova/ prefix (e.g., Xenova/all-MiniLM-L6-v2) so @xenova/transformers can resolve them.

6. Fix intelligence data bloat (#1518, #1526)

  • Deduplicate store entries by ID before building graph (v3.5.54: also persist deduped store in consolidate via preDedupCount tracking)
  • Applied dedup to both v3 and root intelligence.cjs copies
  • Scope bootstrapFromMemoryFiles() to current project only (was scanning all 51+ project dirs)
  • Fix tool_input snake_case mismatch in hook-handler
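Deduplication by ID before graph construction can be sketched as below. `dedupeById` is a hypothetical helper; keeping the most recent entry for each ID is an assumption, and `preDedupCount` is returned only to mirror the tracking mentioned above.

```typescript
function dedupeById<T extends { id: string }>(
  entries: T[]
): { deduped: T[]; preDedupCount: number } {
  const byId = new Map<string, T>();
  for (const entry of entries) {
    // Assumption: the last entry seen for an ID replaces earlier ones.
    byId.set(entry.id, entry);
  }
  return { deduped: Array.from(byId.values()), preDedupCount: entries.length };
}
```

Comparing `preDedupCount` with `deduped.length` tells the caller whether the deduplicated store needs to be persisted.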

7. Deep audit Math.random() removal (v3.5.56)

  • agent_health aggregate: replaced Math.random() CPU/memory/latency with null + _note
  • system_health: replaced hardcoded "healthy" + random latency with real fs.existsSync() checks timed with performance.now()
  • system_status: replaced hardcoded component health (0.95, 0.90, 1.0) with status: 'unknown'
  • coordination_metrics: replaced Math.random() with null + real sync counts
  • github-tools: all 5 tools return _stub: true, Math.random() removed
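  • neural_compress and neural_optimize: return _stub: true (replaced with real implementations in v3.5.59, see section 9)
  • performance bottleneck/profile/optimize: return _stub: true, profile no longer sleeps 100ms (replaced with real implementations in v3.5.59, see section 9)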
  • hooks_metrics: reads real counts from memory store
  • hooks_pretrain: returns _stub: true
  • hooks_intelligence-reset: actually deletes data files now
  • hooks_session-end: reads real task/file/agent counts
  • hooks_explain: reads real success rate from routing-outcomes.json
  • hooks_transfer: returns failure instead of substituting demo data
  • workflow_execute/resume: steps stay pending, don't auto-complete
  • task-tools: fixed path mismatch agents.json → agents/store.json
  • session_restore: syncs to sql.js database after writing legacy JSON
  • claims_rebalance: executes moves when dryRun=false
  • config get: calls configManager.get() instead of hardcoded map
  • process monitor: uses real os.loadavg(), process.memoryUsage()
  • process logs: reads actual log files
  • status: flashAttention/searchSpeed → 'not measured'
  • token-optimizer: removed double-increment of editsOptimized
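The two recurring patterns in this list are "null plus a note" for values that are not measured and an explicit `_stub: true` marker for tools without a backend. A sketch of both follows, under stated assumptions: only `_stub` and `_note` are field names from this ADR; the remaining fields and both helper functions are hypothetical.

```typescript
// Hypothetical shape for an aggregate whose metrics are not actually measured.
interface AgentHealthAggregate {
  cpu: number | null;
  memory: number | null;
  latencyMs: number | null;
  _note: string;
}

// Hypothetical shape for a tool that has no real implementation yet.
interface StubResponse {
  _stub: true;
  tool: string;
  message: string;
}

function unmeasuredAgentHealth(): AgentHealthAggregate {
  // null signals "not measured"; a random number would look like data.
  return {
    cpu: null,
    memory: null,
    latencyMs: null,
    _note: "Per-agent resource metrics are not collected",
  };
}

function stubResponse(tool: string): StubResponse {
  return { _stub: true, tool, message: `${tool} has no real implementation yet` };
}
```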

8. Remaining stub cleanup (v3.5.57)

  • daa_agent_adapt: removed fake 50ms setTimeout delay
  • daa_workflow_execute: steps stay pending instead of auto-completing
  • daa_cognitive_pattern: replaced hardcoded analysis with real agent metrics
  • daa_knowledge_share: added _note clarifying no cross-agent transfer
  • hooks_intelligence_attention: removed fake Math.exp(-i*0.5) exponential-decay weights, returns empty with _stub: true when no backend
  • system_reset + loadMetrics(): use real os.loadavg()/os.totalmem() instead of hardcoded cpu:25, memory:256/1024
  • benchmark CLI: honest zero fallback instead of hardcoded searchTime: 0.5
  • embeddings CLI: "Skipped" instead of "Simulated"
  • providers CLI: clarifying comment on static catalog
  • system_metrics: wired up real agent/task counters from store files

9. Real implementations & AgentDB integration (v3.5.59, PR #1539)

Three performance tool stubs replaced with real implementations:

  • performance_bottleneck: Real CPU load (os.loadavg()), memory (process.memoryUsage()), disk I/O latency (4KB write/read probe), severity classification
  • performance_profile: Real V8 profiling with process.cpuUsage(), performance.now(), operation hotspot detection across memory/io/cpu targets
  • performance_optimize: Real before/after system snapshots, GC collection (when --expose-gc), bottleneck-informed recommendations

Two neural tool stubs replaced with real implementations:

  • neural_compress: Three real methods — quantize (Int8 via quantizeInt8() from memory-initializer, 3.92x compression), prune (remove low-usage patterns), distill (merge by cosine similarity > 0.95)
  • neural_optimize: Target-aware — speed (dedup by hash+cosine), memory (Int8 quantization), accuracy (prune zero-norm embeddings), balanced (all three)
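Int8 quantization of an embedding can be sketched as below. This is not the `quantizeInt8()` from memory-initializer, whose signature is not given here; `quantizeToInt8`, `dequantize` and `compressionRatio` are hypothetical. The ratio calculation shows one set of assumptions that reproduces the 3.92x figure: a 384-dimension float32 vector stored as 384 Int8 values plus 8 bytes of scale and offset.

```typescript
interface QuantizedVector {
  values: Int8Array;
  scale: number; // step size between adjacent Int8 levels
  min: number;   // offset: the value that maps to -128
}

// Affine min/max quantization of float values onto the 256 Int8 levels.
function quantizeToInt8(vec: number[]): QuantizedVector {
  if (vec.length === 0) return { values: new Int8Array(0), scale: 1, min: 0 };
  let min = Infinity;
  let max = -Infinity;
  for (const v of vec) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const scale = max > min ? (max - min) / 255 : 1; // avoid dividing by zero
  const values = new Int8Array(vec.length);
  for (let i = 0; i < vec.length; i++) {
    values[i] = Math.round(((vec[i] ?? 0) - min) / scale) - 128;
  }
  return { values, scale, min };
}

function dequantize(q: QuantizedVector): number[] {
  return Array.from(q.values, (v) => (v + 128) * q.scale + q.min);
}

// Assumption: 4 bytes per float32 before, 1 byte per value plus 8 bytes of metadata after.
function compressionRatio(dims: number): number {
  return (dims * 4) / (dims + 8);
}
```

Under these assumptions `compressionRatio(384)` is 1536 / 392, about 3.92, and the round-trip error per component is at most half of `scale`.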

MCP request tracking:

  • New request-tracker.ts singleton counter module tracks tool invocations in-process
  • Wired into mcp-server.ts (success/error paths)
  • system_metrics reads live counts instead of stale JSON
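An in-process singleton counter of the kind described can be sketched as follows. The class name, method names and snapshot shape are hypothetical; the actual request-tracker.ts API is not shown in this ADR.

```typescript
class RequestTracker {
  private static instance: RequestTracker | undefined;
  private success = 0;
  private error = 0;
  private readonly byTool = new Map<string, number>();

  private constructor() {}

  // One shared tracker per process.
  static getInstance(): RequestTracker {
    if (!RequestTracker.instance) {
      RequestTracker.instance = new RequestTracker();
    }
    return RequestTracker.instance;
  }

  // Called from both the success and the error path of a tool invocation.
  record(tool: string, ok: boolean): void {
    if (ok) this.success++;
    else this.error++;
    this.byTool.set(tool, (this.byTool.get(tool) ?? 0) + 1);
  }

  snapshot(): { total: number; success: number; error: number; byTool: Record<string, number> } {
    const byTool: Record<string, number> = {};
    this.byTool.forEach((count, tool) => {
      byTool[tool] = count;
    });
    return { total: this.success + this.error, success: this.success, error: this.error, byTool };
  }
}
```

Because the counts live in memory, they reset on process restart; that is the trade-off against reading a persisted but possibly stale JSON file.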

AgentDB integration (primary data layer with JSON fallback):

  • hive-mind consensus results → hive-consensus namespace
  • hive-mind shared memory → hive-memory namespace
  • daa_agent_create → daa-agents namespace
  • daa_workflow_execute → daa-workflows namespace
  • All AgentDB writes are after JSON store save, in try/catch — backward compatible
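The write ordering in the last bullet can be sketched as below: the JSON store is saved first, then the AgentDB write is attempted inside try/catch so that a failure does not break the call. `JsonStore`, `AgentDbLike` and `persistWithFallback` are hypothetical names, and the interfaces are synchronous for brevity; a real backend would likely be asynchronous.

```typescript
// Hypothetical minimal interfaces for the two persistence layers.
interface JsonStore {
  save(key: string, value: unknown): void;
}

interface AgentDbLike {
  store(namespace: string, key: string, value: unknown): void;
}

function persistWithFallback(
  json: JsonStore,
  db: AgentDbLike,
  namespace: string,
  key: string,
  value: unknown
): { jsonSaved: boolean; agentDbSaved: boolean } {
  // 1. The JSON store write happens first, so existing readers keep working.
  json.save(key, value);
  // 2. The AgentDB write is best effort; an error leaves the JSON copy intact.
  try {
    db.store(namespace, key, value);
    return { jsonSaved: true, agentDbSaved: true };
  } catch {
    return { jsonSaved: true, agentDbSaved: false };
  }
}
```

Returning `agentDbSaved` keeps the outcome visible to the caller instead of silently swallowing the failure.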

TypeScript fixes:

  • Zero TypeScript compilation errors for the first time: added ambient type declarations for 7 optional packages
  • Fixed unknown[] casts in embeddings.ts, registry as any in memory-bridge.ts

Test coverage:

  • 28 honesty tests (source-level checks for fabrication patterns)
  • 30 feature-gap tests (verify real implementations, AgentDB integration, request tracking)
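A source-level honesty check can be as simple as scanning file contents for fabrication patterns. The sketch below is an assumption about how such a test might work, not the project's actual test suite. The pattern list and `findFabricationPatterns` are hypothetical, and flagging every `setTimeout(` is a deliberate simplification that a real check would narrow to fake delays.

```typescript
// Hypothetical pattern list; real tests may check many more cases.
const FABRICATION_PATTERNS: { name: string; pattern: RegExp }[] = [
  { name: "Math.random()", pattern: /Math\.random\s*\(/ },
  { name: "setTimeout()", pattern: /setTimeout\s*\(/ },
];

// Returns the names of all fabrication patterns found in a source string.
function findFabricationPatterns(source: string): string[] {
  return FABRICATION_PATTERNS.filter((p) => p.pattern.test(source)).map((p) => p.name);
}
```

A test would read each shipped tool file and assert that this function returns an empty list for it.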

Consequences

  • Token optimizer reports honest numbers (will show 0 savings when agentic-flow is not installed)
  • neural_predict returns real cosine similarity results when patterns stored, empty array when not
  • neural_train stores real embeddings, no simulated accuracy
  • Zero instances of Math.random() for confidence/accuracy/metrics in shipped code
  • Zero instances of setTimeout() for fake delays in shipped code
  • All remaining stubs marked _stub: true; all real tools marked _real: true
  • Users can distinguish real ML vs hash-based embedding via _realEmbedding flag
  • hooks explain matchScore uses real keyword ratio instead of random
  • system_metrics returns real agent/task/request counts from persistent stores and live tracker
  • performance_bottleneck/profile/optimize are fully real (V8 profiling, OS metrics, disk I/O)
  • neural_compress/optimize are fully real (Int8 quantization, cosine similarity, pruning)
  • AgentDB is primary data layer for hive-mind and DAA tools with JSON store backward compat
  • 58 automated tests enforce honesty invariants and real implementation requirements

Tools Status (Post-Fix)

| Category | Status | Notes |
|---|---|---|
| Memory/HNSW | Real | Vector search, persistence, embeddings |
| AgentDB | Real | Pattern store, hierarchical recall, HNSW; primary layer for hive-mind + DAA |
| Embeddings | Real | Xenova/transformers, cosine similarity |
| Neural predict | Real (with patterns) | Cosine similarity search; empty array when no patterns |
| Neural train | Real | Embeds training data, stores as searchable patterns |
| Neural compress | Real (v3.5.59) | Int8 quantize (3.92x), prune, distill (cosine > 0.95) |
| Neural optimize | Real (v3.5.59) | Target-aware: speed/memory/accuracy/balanced |
| Performance bottleneck | Real (v3.5.59) | CPU load, memory, disk I/O latency, severity classification |
| Performance profile | Real (v3.5.59) | V8 cpuUsage, memoryUsage, operation hotspots |
| Performance optimize | Real (v3.5.59) | GC collect, before/after snapshots, recommendations |
| Token optimizer | Honest metrics | No fabricated numbers |
| Agent spawn/task | Real state tracking | Store persistence via agents/store.json |
| DAA tools | Real + AgentDB (v3.5.59) | Local state + AgentDB persistence, no fake delays |
| System metrics | Real + live tracker (v3.5.59) | CPU/memory from os, agents/tasks from stores, requests from tracker |
| Hive-mind | Real + AgentDB (v3.5.59) | Vote counting + AgentDB persistence for consensus + shared memory |
| WASM agents | Stub | Echo-based, no WASM runtime |