v3/implementation/adrs/ADR-067-critical-issue-remediation-v3543.md
- Status: Implemented
- Date: 2026-03-25
- Author: RuvNet
- Version: v3.5.42 → v3.5.43
- Tracking: GitHub Issues #1395, #1423, #1425, #1428, #1431, #1399, #1404, #1422
Community-reported issues have identified six critical/high-severity bugs and three medium-severity issues in v3.5.42. Together they degrade core functionality across swarm execution, headless workers, memory initialization, the AgentDB bridge, MCP schema validation, and hive-mind tool routing. Multiple reporters confirm three symptoms: the swarm orchestration engine registers agents but never dispatches work, headless workers fail 100% of the time, and accumulated daemon processes can cause a kernel panic on macOS.
This ADR documents all findings, root causes, and the remediation plan for v3.5.43.
| Priority | Issue | GitHub | Root Cause |
|---|---|---|---|
| P0 — Critical | Headless workers hang forever (stdin never closed) | #1395 (Bug 1) | stdio: ['pipe','pipe','pipe'] opens stdin but never closes it; claude --print blocks waiting for EOF |
| P0 — Critical | Workers fail inside active Claude Code session | #1395 (Bug 2) | Nested session detection kills subprocess; workers can never succeed during normal use |
| P0 — Critical | Swarm agents do not execute work | #1423, #1425 | startSwarm() updates metadata but has no task consumer/dispatcher; commands return hardcoded success |
| P1 — High | Stale/nonexistent model IDs in daemon workers | #1431 | Hardcoded claude-sonnet-4-5-20250929 and claude-haiku-4-5-20251001 — both expired/invalid |
| P1 — High | Daemons never terminate, accumulate across sessions | #1395 (Bug 3) | No PID singleton enforcement; each session spawns a new daemon |
| P1 — High | memory init hangs after completion | #1428 | ONNX worker threads + SQLite connection never terminated; no process.exit() after init |
| P2 — Medium | AgentDB bridge unavailable | #1399 | CLI bundles a @claude-flow/memory release older than 3.0.0-alpha.12 (missing ControllerRegistry); runtime patch targets v1.x paths |
| P2 — Medium | MCP array schema missing items | #1404 | type: 'array' without items in ruvllm-tools.ts; strict schema validators reject it, which breaks VSCode Copilot |
| P2 — Medium | Hive-mind uses native tools instead of MCP | #1422 | No tool preference enforcement; Claude defaults to native tools over Ruflo MCP |
Address all issues in a single v3.5.43 release, prioritized by severity and dependency order.
1.1 — Fix stdin pipe (one-line change)
File: v3/@claude-flow/cli/src/daemon/headless-worker-executor.ts
```diff
- stdio: ['pipe', 'pipe', 'pipe']
+ stdio: ['ignore', 'pipe', 'pipe']
```
Rationale: with `'ignore'`, Node attaches the child's stdin to `/dev/null`, so `claude --print` reads EOF immediately instead of waiting on a pipe that is never closed. This unblocks all headless worker functionality.
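
A minimal sketch of the fixed spawn call, assuming a hypothetical `runHeadless` helper (not the executor's actual API). It shows only the stdio wiring: stdin is ignored, stdout and stderr are collected.

```typescript
import { spawn } from "node:child_process";

interface RunResult {
  code: number | null;
  stdout: string;
  stderr: string;
}

// Hypothetical helper: spawns a headless CLI with stdin attached to /dev/null
// ('ignore'), so a child that reads stdin sees EOF immediately instead of
// waiting on a pipe nobody closes.
function runHeadless(cmd: string, args: string[]): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: ["ignore", "pipe", "pipe"] });
    let stdout = "";
    let stderr = "";
    child.stdout?.setEncoding("utf8");
    child.stderr?.setEncoding("utf8");
    child.stdout?.on("data", (chunk: string) => {
      stdout += chunk;
    });
    child.stderr?.on("data", (chunk: string) => {
      stderr += chunk;
    });
    child.on("error", reject);
    child.on("close", (code: number | null) => resolve({ code, stdout, stderr }));
  });
}
```

The alternative that keeps `'pipe'` is to call `child.stdin.end()` right after spawning. `'ignore'` is simpler because no stdin stream is created at all.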
1.2 — Fix nested session detection
Options (choose one):
- Set a `CLAUDE_CODE_WORKER=1` env var on spawned processes, and patch the Claude Code session check to allow workers
- Move workers off `claude --print` entirely
- Disable the optimize/testgaps workers by default, and document the limitation

1.3 — Update model IDs to aliases
File: v3/@claude-flow/cli/src/daemon/headless-worker-executor.ts
```diff
 const MODEL_IDS = {
-  sonnet: 'claude-sonnet-4-5-20250929',
-  opus: 'claude-opus-4-6',
-  haiku: 'claude-haiku-4-5-20251001',
+  sonnet: 'sonnet',
+  opus: 'opus',
+  haiku: 'haiku',
 };
```
Rationale: Model aliases auto-resolve to the latest version, which prevents future staleness. Additionally, add a `model` field to the worker config in `daemon-state.json` so users can override the model per worker.
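
A sketch of model resolution with the proposed per-worker override. The `WorkerConfig` shape, the `tier` field, and `resolveModel` are hypothetical. They illustrate how a `model` field in `daemon-state.json` could take precedence over the alias table, not the file's current schema.

```typescript
// Alias table from the MODEL_IDS change; each alias auto-resolves to the
// latest model version for that tier.
const MODEL_IDS = {
  sonnet: "sonnet",
  opus: "opus",
  haiku: "haiku",
} as const;

type ModelTier = keyof typeof MODEL_IDS;

// Hypothetical shape of one worker entry in daemon-state.json.
interface WorkerConfig {
  name: string;
  tier: ModelTier;
  model?: string; // optional user override: an alias or a full model ID
}

// Prefer a non-blank user override; otherwise fall back to the tier alias.
function resolveModel(worker: WorkerConfig): string {
  const override = worker.model?.trim();
  return override ? override : MODEL_IDS[worker.tier];
}
```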
1.4 — PID singleton enforcement for daemon
Implement the standard PID-file pattern:
- On `daemon start`, check `$PROJECT/.claude-flow/daemon.pid`
- If the recorded PID is still alive (`kill -0`), skip start

1.5 — Fix orphan process cleanup
When the wrapper timeout fires, send SIGTERM to the child's process group rather than only rejecting the promise. Align the wrapper timeout to `executor_timeout + 60s`.
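
The PID-file check from 1.4 and the process-group termination above can both be sketched with Node's standard library. All function names are hypothetical, and `terminateGroup` is POSIX-only. `process.kill(pid, 0)` delivers no signal and only tests whether the process exists, which is the Node counterpart of `kill -0`. A child spawned with `detached: true` leads its own process group on POSIX, so signalling the negative PID reaches the child and anything it spawned.

```typescript
import { spawn, type ChildProcess } from "node:child_process";
import * as fs from "node:fs";
import * as path from "node:path";

// kill -0 equivalent: signal 0 checks existence without delivering a signal.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Hypothetical singleton guard: returns false if a live daemon owns the PID
// file; otherwise records this process's PID (replacing a stale file).
function acquirePidFile(pidFile: string): boolean {
  if (fs.existsSync(pidFile)) {
    const pid = Number.parseInt(fs.readFileSync(pidFile, "utf8"), 10);
    if (Number.isInteger(pid) && pid > 0 && isAlive(pid)) return false;
  }
  fs.mkdirSync(path.dirname(pidFile), { recursive: true });
  fs.writeFileSync(pidFile, String(process.pid));
  return true;
}

// Spawn the worker as a process-group leader so a timeout can signal the
// whole group, not just the direct child.
function spawnWorker(cmd: string, args: string[]): ChildProcess {
  return spawn(cmd, args, { stdio: ["ignore", "pipe", "pipe"], detached: true });
}

// On wrapper timeout: SIGTERM the child's process group (negative PID).
function terminateGroup(child: ChildProcess): void {
  if (child.pid !== undefined && child.exitCode === null && child.signalCode === null) {
    process.kill(-child.pid, "SIGTERM");
  }
}
```

The existence check and the write are not atomic. Writing with `flag: "wx"`, which fails if the file already exists, would close that race.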
2.1 — Implement task dispatcher
The core gap: startSwarm() in swarm.ts registers agents and updates metadata but has no execution loop. Commands like swarm start, swarm coordinate, and task operations return hardcoded responses.
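
The missing piece is a consumer loop that drains the task queue. A minimal in-memory sketch follows, assuming a hypothetical `AgentWorker` interface whose `execute` would wrap `claude --print` or an SDK call. The real queue, persistence, and status model in `swarm.ts` will differ. Each task ends with a real outcome (`completed` or `failed`) instead of a hardcoded success.

```typescript
type TaskStatus = "pending" | "running" | "completed" | "failed";

interface Task {
  id: string;
  prompt: string;
  status: TaskStatus;
  result?: string;
  error?: string;
}

// Hypothetical worker abstraction; execute() would wrap `claude --print` or an SDK call.
interface AgentWorker {
  id: string;
  status: "idle" | "busy";
  execute(prompt: string): Promise<string>;
}

class TaskDispatcher {
  private readonly queue: Task[] = [];
  private readonly agents: AgentWorker[];
  private readonly pollMs: number;
  private timer: ReturnType<typeof setInterval> | undefined;

  constructor(agents: AgentWorker[], pollMs = 100) {
    this.agents = agents;
    this.pollMs = pollMs;
  }

  submit(task: Task): void {
    this.queue.push(task);
  }

  start(): void {
    if (this.timer) return;
    this.timer = setInterval(() => this.tick(), this.pollMs);
  }

  stop(): void {
    if (this.timer) clearInterval(this.timer);
    this.timer = undefined;
  }

  // One polling pass: pair each idle agent with the next pending task.
  private tick(): void {
    for (const agent of this.agents) {
      if (agent.status !== "idle") continue;
      const task = this.queue.find((t) => t.status === "pending");
      if (!task) return;
      void this.run(agent, task);
    }
  }

  // Statuses flip synchronously before the first await, so the same task
  // is never handed to two agents in one pass.
  private async run(agent: AgentWorker, task: Task): Promise<void> {
    agent.status = "busy";
    task.status = "running";
    try {
      task.result = await agent.execute(task.prompt);
      task.status = "completed";
    } catch (err) {
      task.error = err instanceof Error ? err.message : String(err);
      task.status = "failed";
    } finally {
      agent.status = "idle";
    }
  }
}
```

Finished tasks stay in the queue with their outcome, so status commands could report real results.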
Remediation:
- A `TaskDispatcher` class that polls the task queue and dispatches to agent workers
- Execution through `claude --print` or SDK calls
- Real status tracking (not just `"active"`)

2.2 — Remove hardcoded stubs
Audit all commands in `swarm.ts` and `deployment.ts` for stub responses. Either:
- implement the real behavior, or
- return "Not yet implemented" with a link to the tracking issue

2.3 — Dynamic agent count
File: v3/@claude-flow/cli/src/commands/swarm.ts (line ~645)
Replace the hardcoded 8-agent count with a dynamic fetch from swarm state.
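
A sketch of deriving the count from state instead of a literal. The `SwarmState` shape and `agentCount` are hypothetical stand-ins for whatever `swarm.ts` actually persists.

```typescript
// Hypothetical shape of the persisted swarm state.
interface SwarmAgent {
  id: string;
  status: string;
}

interface SwarmState {
  swarmId: string;
  agents: SwarmAgent[];
}

// Count the agents actually registered; with no swarm state, report 0
// rather than a fixed roster of 8.
function agentCount(state: SwarmState | null): number {
  return state?.agents.length ?? 0;
}
```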
3.1 — Fix memory init hang
File: v3/@claude-flow/cli/src/commands/memory.ts (init handler)
- Call `ort.env.close()` or terminate ONNX inference sessions after init
- Add `process.exit(0)` as a final fallback after cleanup

3.2 — Fix AgentDB bridge
- Bump the `@claude-flow/cli` dependency on `@claude-flow/memory` to `>=3.0.0-alpha.12`
- Fix the `agentdb-runtime-patch.js` path: `dist/controllers/index.js` → `dist/src/controllers/index.js`
- Update the import: `require('./controllers/index.js')` → `require('./index.js')`

3.3 — Fix memory store/search
- Match the embedding dimensions to the model (the `all-MiniLM-L6-v2` output size), or detect them at runtime

4.1 — Add missing `items` to array schemas
Files: ruvllm-tools.ts, process-manager-tools.ts, and all other MCP tool definitions.
Audit all `type: 'array'` properties and add an appropriate `items` schema to each. This is partially addressed in PR #73 (ruflo repo).
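
A sketch of an audit helper that walks a tool's input schema and reports every array node without `items`. The function name is hypothetical. The traversal covers only `properties`, `items`, and the `anyOf`/`oneOf`/`allOf` combinators, which is an assumption about how the tool definitions are nested. The fix at each reported path is an explicit `items` schema such as `{ type: 'string' }`.

```typescript
type Schema = { [key: string]: unknown };

function isSchemaObject(value: unknown): value is Schema {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns JSON-pointer-style paths of array schemas that lack `items`.
function findArraysWithoutItems(schema: unknown, at = "#"): string[] {
  if (!isSchemaObject(schema)) return [];
  const found: string[] = [];

  const type = schema.type;
  const isArrayType = type === "array" || (Array.isArray(type) && type.includes("array"));
  if (isArrayType && schema.items === undefined) found.push(at);

  const props = schema.properties;
  if (isSchemaObject(props)) {
    for (const [name, sub] of Object.entries(props)) {
      found.push(...findArraysWithoutItems(sub, `${at}/properties/${name}`));
    }
  }
  if (schema.items !== undefined) {
    found.push(...findArraysWithoutItems(schema.items, `${at}/items`));
  }
  for (const key of ["anyOf", "oneOf", "allOf"]) {
    const branches = schema[key];
    if (Array.isArray(branches)) {
      branches.forEach((sub: unknown, i: number) => {
        found.push(...findArraysWithoutItems(sub, `${at}/${key}/${i}`));
      });
    }
  }
  return found;
}
```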
4.2 — Enforce Ruflo MCP tool preference in hive-mind
Options:
- Use an `--allowedTools` constraint when spawning hive-mind sessions to prefer Ruflo MCP tools

5.1 — config.yaml support
The daemon currently reads only `config.json`. Add a YAML fallback, or emit a warning when `config.yaml` exists without `config.json`.
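
A sketch of the warning option, which needs only the standard library. A full YAML fallback would need a parser dependency. The loader name and the `configDir` argument are hypothetical, since the daemon's actual lookup directory isn't specified here.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface ConfigLookup {
  config: Record<string, unknown> | null;
  warnings: string[];
}

// Hypothetical loader: config.json stays authoritative; a config.yaml with no
// config.json beside it produces a warning instead of being silently ignored.
function loadDaemonConfig(configDir: string): ConfigLookup {
  const jsonPath = path.join(configDir, "config.json");
  const yamlPath = path.join(configDir, "config.yaml");
  if (fs.existsSync(jsonPath)) {
    return { config: JSON.parse(fs.readFileSync(jsonPath, "utf8")), warnings: [] };
  }
  const warnings = fs.existsSync(yamlPath)
    ? [`${yamlPath} found, but the daemon reads only config.json; its settings are ignored.`]
    : [];
  return { config: null, warnings };
}
```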
5.2 — Parser flag collision
The `-f` flag is used by 50+ subcommands. Audit `parser.ts` for resolution-order issues, and consider namespacing or removing the global `-f` shorthand.
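
One way to script that audit is sketched below. The `CommandDef`/`OptionDef` shapes are hypothetical stand-ins for whatever `parser.ts` actually registers. The sketch maps each short alias to the long option names it stands for and reports aliases with more than one meaning. Those aliases are the ones a single global `-f` resolution would get wrong for some subcommands.

```typescript
// Hypothetical command/option definitions.
interface OptionDef {
  name: string;
  short?: string;
}

interface CommandDef {
  name: string;
  options: OptionDef[];
}

// Returns short flag → sorted long names, only for shorthands whose meaning
// differs between subcommands.
function findShortFlagCollisions(commands: CommandDef[]): Map<string, string[]> {
  const meanings = new Map<string, Set<string>>();
  for (const cmd of commands) {
    for (const opt of cmd.options) {
      if (!opt.short) continue;
      const longNames = meanings.get(opt.short) ?? new Set<string>();
      longNames.add(opt.name);
      meanings.set(opt.short, longNames);
    }
  }
  const collisions = new Map<string, string[]>();
  meanings.forEach((longNames, short) => {
    if (longNames.size > 1) collisions.set(short, Array.from(longNames).sort());
  });
  return collisions;
}
```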
- `deployment.ts`, config commands, and providers commands

Implementation order:

```
Phase 1.1 (stdin fix)           ← trivial, unblocks all workers
Phase 1.3 (model IDs)           ← trivial, unblocks daemon success
Phase 1.4 (PID singleton)       ← low effort, prevents accumulation
Phase 1.5 (orphan cleanup)      ← low effort
Phase 3.1 (memory init hang)    ← medium effort
Phase 4.1 (MCP schema)          ← low effort, partially done in PR #73
Phase 1.2 (nested session)      ← medium effort, design decision needed
Phase 3.2 (AgentDB bridge)      ← medium effort, requires npm publish
Phase 3.3 (memory ID/dims)      ← medium effort
Phase 4.2 (tool preference)     ← medium effort, design decision needed
Phase 2.x (swarm execution)     ← high effort, core architecture work
Phase 5.x (code quality)        ← ongoing
```
Each phase must pass:
- `npx ruflo daemon start` → workers execute successfully
- `npx ruflo memory init` → process exits cleanly
- `agentdb_health` returns `available: true`
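
The "process exits cleanly" criterion can be checked mechanically. The sketch runs a command under a deadline and reports a hang if the command is still alive when the deadline passes. The helper name and the 60-second default are assumptions. It would be pointed at commands such as `npx ruflo memory init`.

```typescript
import { spawn } from "node:child_process";

interface ExitCheck {
  exited: boolean;
  code: number | null;
  signal: NodeJS.Signals | null;
}

// Hypothetical acceptance helper: resolves exited=true if the command exits
// on its own within timeoutMs; otherwise kills it and resolves exited=false.
function exitsWithin(cmd: string, args: string[], timeoutMs = 60_000): Promise<ExitCheck> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: "ignore" });
    const timer = setTimeout(() => {
      child.kill("SIGKILL");
      resolve({ exited: false, code: null, signal: null });
    }, timeoutMs);
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    // After a timeout this second resolve() is a no-op; the promise is settled.
    child.on("exit", (code, signal) => {
      clearTimeout(timer);
      resolve({ exited: true, code, signal });
    });
  });
}
```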