docs/DEBUG_CHECKLIST.md
When agent teams spawns subagent CLI processes, they write to the same session JSONL. On subsequent query() resumes, the CLI reads the JSONL but may pick a stale branch tip (from before the subagent activity), causing the agent's response to land on a branch the host never receives a result for. Fix: pass resumeSessionAt with the last assistant message UUID to explicitly anchor each resume.
Both timers fire at the same time, so containers always exit via hard SIGKILL (code 137) instead of graceful _close sentinel shutdown. The idle timeout should be shorter (e.g., 5 min) so containers wind down between messages, while container timeout stays at 30 min as a safety net for stuck agents.
processGroupMessages advances lastAgentTimestamp before the agent runs. If the container times out, retries find no messages (cursor already past them). Messages are permanently lost on timeout.
# 1. Is the service running?
launchctl list | grep nanoclaw
# Expected: PID 0 com.nanoclaw (PID = running, "-" = not running, non-zero exit = crashed)
# 2. Any running containers?
container ls --format '{{.Names}} {{.Status}}' 2>/dev/null | grep nanoclaw
# 3. Any stopped/orphaned containers?
container ls -a --format '{{.Names}} {{.Status}}' 2>/dev/null | grep nanoclaw
# 4. Recent errors in service log?
grep -E 'ERROR|WARN' logs/nanoclaw.log | tail -20
# 5. Is WhatsApp connected? (look for last connection event)
grep -E 'Connected to WhatsApp|Connection closed|connection.*close' logs/nanoclaw.log | tail -5
# 6. Are groups loaded?
grep 'groupCount' logs/nanoclaw.log | tail -3
# Check for concurrent CLI processes in session debug logs
ls -la data/sessions/<group>/.claude/debug/
# Count unique SDK processes that handled messages
# Each .txt file = one CLI subprocess. Multiple = concurrent queries.
# Check parentUuid branching in transcript
python3 -c "
import json, sys
lines = open('data/sessions/<group>/.claude/projects/-workspace-group/<session>.jsonl').read().strip().split('\n')
for i, line in enumerate(lines):
try:
d = json.loads(line)
if d.get('type') == 'user' and d.get('message'):
parent = d.get('parentUuid', 'ROOT')[:8]
content = str(d['message'].get('content', ''))[:60]
print(f'L{i+1} parent={parent} {content}')
except: pass
"
# Check for recent timeouts
grep -E 'Container timeout|timed out' logs/nanoclaw.log | tail -10
# Check container log files for the timed-out container
ls -lt groups/*/logs/container-*.log | head -10
# Read the most recent container log (replace path)
cat groups/<group>/logs/container-<timestamp>.log
# Check if retries were scheduled and what happened
grep -E 'Scheduling retry|retry|Max retries' logs/nanoclaw.log | tail -10
# Check if messages are being received from WhatsApp
grep 'New messages' logs/nanoclaw.log | tail -10
# Check if messages are being processed (container spawned)
grep -E 'Processing messages|Spawning container' logs/nanoclaw.log | tail -10
# Check if messages are being piped to active container
grep -E 'Piped messages|sendMessage' logs/nanoclaw.log | tail -10
# Check the queue state — any active containers?
grep -E 'Starting container|Container active|concurrency limit' logs/nanoclaw.log | tail -10
# Check lastAgentTimestamp vs latest message timestamp
sqlite3 store/messages.db "SELECT chat_jid, MAX(timestamp) as latest FROM messages GROUP BY chat_jid ORDER BY latest DESC LIMIT 5;"
# Check mount validation logs (shows on container spawn)
grep -E 'Mount validated|Mount.*REJECTED|mount' logs/nanoclaw.log | tail -10
# Verify the mount allowlist is readable
cat ~/.config/nanoclaw/mount-allowlist.json
# Check group's container_config in DB
sqlite3 store/messages.db "SELECT name, container_config FROM registered_groups;"
# Test-run a container to check mounts (dry run)
# Replace <group-folder> with the group's folder name
container run -i --rm --entrypoint ls nanoclaw-agent:latest /workspace/extra/
# Check if QR code was requested (means auth expired)
grep 'QR\|authentication required\|qr' logs/nanoclaw.log | tail -5
# Check auth files exist
ls -la store/auth/
# Re-authenticate if needed
npm run auth
# Restart the service
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
# View live logs
tail -f logs/nanoclaw.log
# Stop the service (careful — running containers are detached, not killed)
launchctl bootout gui/$(id -u)/com.nanoclaw
# Start the service
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.nanoclaw.plist
# Rebuild after code changes
npm run build && launchctl kickstart -k gui/$(id -u)/com.nanoclaw