docs/public/usage/manual-recovery.mdx
Claude-mem's manual recovery system helps you recover observations that get stuck in the processing queue after worker crashes, system restarts, or unexpected shutdowns.
Key Change in v5.x: Automatic recovery on worker startup is now disabled. This gives you explicit control over when reprocessing happens, preventing unexpected duplicate observations.
You should trigger manual recovery when:
The interactive CLI tool is the safest and easiest way to recover stuck observations:
# Check status and prompt for recovery
bun scripts/check-pending-queue.ts
This will:
For scripting or when you're confident recovery is needed:
# Auto-process without prompting
bun scripts/check-pending-queue.ts --process
# Limit to 5 sessions
bun scripts/check-pending-queue.ts --process --limit 5
Messages progress through these lifecycle states:
Messages in processing state for >5 minutes are considered stuck:
pending on worker startupstuckCount field when checking queue statusBest for: Regular users, interactive sessions, when you want visibility into what's happening
bun scripts/check-pending-queue.ts
Example Output:
Checking worker health...
Worker is healthy ✓
Queue Summary:
Pending: 12 messages
Processing: 2 messages (1 stuck)
Failed: 0 messages
Recently Processed: 5 messages in last 30 minutes
Sessions with pending work: 3
Session 44: 5 pending, 1 processing (age: 2m)
Session 45: 4 pending, 1 processing (age: 7m - STUCK)
Session 46: 2 pending
Would you like to process these pending queues? (y/n)
Features:
--process flag--limit NBest for: Automation, scripting, integration with monitoring systems
curl http://localhost:37777/api/pending-queue
Response:
{
"queue": {
"messages": [
{
"id": 123,
"session_db_id": 45,
"claude_session_id": "abc123",
"message_type": "observation",
"status": "pending",
"retry_count": 0,
"created_at_epoch": 1730886600000
}
],
"totalPending": 12,
"totalProcessing": 2,
"totalFailed": 0,
"stuckCount": 1
},
"recentlyProcessed": [...],
"sessionsWithPendingWork": [44, 45, 46]
}
Key Fields:
totalPending - Messages waiting to processtotalProcessing - Messages currently processingstuckCount - Processing messages >5 minutes oldsessionsWithPendingWork - Session IDs needing recoverycurl -X POST http://localhost:37777/api/pending-queue/process \
-H "Content-Type: application/json" \
-d '{"sessionLimit": 10}'
Response:
{
"success": true,
"totalPendingSessions": 15,
"sessionsStarted": 10,
"sessionsSkipped": 2,
"startedSessionIds": [44, 45, 46, 47, 48, 49, 50, 51, 52, 53]
}
Response Fields:
totalPendingSessions - Total sessions with pending messages in databasesessionsStarted - Sessions we started processing this requestsessionsSkipped - Sessions already processing (prevents duplicate agents)startedSessionIds - Database IDs of sessions we started# Check queue status first
curl http://localhost:37777/api/pending-queue
# Or use CLI tool which checks automatically
bun scripts/check-pending-queue.ts
# Process only 5 sessions at a time
bun scripts/check-pending-queue.ts --process --limit 5
This prevents overwhelming the worker with too many concurrent SDK agents.
Watch worker logs while recovery runs:
npm run worker:logs
Look for:
Starting SDK agent for session...Processed observation...ERROR or Failed to process...Check recently processed messages:
curl http://localhost:37777/api/pending-queue | jq '.recentlyProcessed'
Or use the CLI tool which shows this automatically.
Messages that fail 3 times are marked failed and won't auto-retry:
# View failed messages
sqlite3 ~/.claude-mem/claude-mem.db "
SELECT id, session_db_id, message_type, retry_count
FROM pending_messages
WHERE status = 'failed'
ORDER BY completed_at_epoch DESC;
"
You can manually reset them if needed:
sqlite3 ~/.claude-mem/claude-mem.db "
UPDATE pending_messages
SET status = 'pending', retry_count = 0
WHERE status = 'failed';
"
Symptom: Triggered recovery but messages still pending
Solutions:
Verify worker health:
curl http://localhost:37777/health
Check worker logs for errors:
npm run worker:logs | grep -i error
Restart worker:
npm run worker:restart
Check database integrity:
sqlite3 ~/.claude-mem/claude-mem.db "PRAGMA integrity_check;"
Symptom: Messages show as "processing" for hours
Solution: Force reset stuck messages
# Reset all stuck messages to pending
sqlite3 ~/.claude-mem/claude-mem.db "
UPDATE pending_messages
SET status = 'pending', started_processing_at_epoch = NULL
WHERE status = 'processing';
"
# Then trigger recovery
bun scripts/check-pending-queue.ts --process
Symptom: Worker stops while processing recovered messages
Solutions:
Check available memory:
npm run worker:status
Reduce session limit:
bun scripts/check-pending-queue.ts --process --limit 3
Check for SDK errors in logs:
npm run worker:logs | grep -i "sdk"
Increase worker memory (if using custom runner):
export NODE_OPTIONS="--max-old-space-size=4096"
npm run worker:restart
View all pending messages:
sqlite3 ~/.claude-mem/claude-mem.db "
SELECT
id,
session_db_id,
message_type,
status,
retry_count,
datetime(created_at_epoch/1000, 'unixepoch') as created_at,
datetime(started_processing_at_epoch/1000, 'unixepoch') as started_at,
CAST((strftime('%s', 'now') * 1000 - started_processing_at_epoch) / 60000 AS INTEGER) as age_minutes
FROM pending_messages
WHERE status IN ('pending', 'processing')
ORDER BY created_at_epoch;
"
sqlite3 ~/.claude-mem/claude-mem.db "
SELECT status, COUNT(*) as count
FROM pending_messages
GROUP BY status;
"
sqlite3 ~/.claude-mem/claude-mem.db "
SELECT
session_db_id,
COUNT(*) as pending_count,
GROUP_CONCAT(message_type) as message_types
FROM pending_messages
WHERE status IN ('pending', 'processing')
GROUP BY session_db_id;
"
sqlite3 ~/.claude-mem/claude-mem.db "
SELECT
id,
session_db_id,
message_type,
retry_count,
datetime(completed_at_epoch/1000, 'unixepoch') as failed_at
FROM pending_messages
WHERE status = 'failed'
ORDER BY completed_at_epoch DESC
LIMIT 10;
"
#!/bin/bash
# Run every hour to process stuck queues
# Check if worker is healthy
if curl -f http://localhost:37777/health > /dev/null 2>&1; then
# Auto-process up to 5 sessions
bun scripts/check-pending-queue.ts --process --limit 5
else
echo "Worker not healthy, skipping recovery"
exit 1
fi
#!/bin/bash
# Alert if stuck count exceeds threshold
STUCK_COUNT=$(curl -s http://localhost:37777/api/pending-queue | jq '.queue.stuckCount')
if [ "$STUCK_COUNT" -gt 5 ]; then
echo "WARNING: $STUCK_COUNT stuck messages detected"
# Send alert (email, Slack, etc.)
fi
#!/bin/bash
# Process pending queues before system shutdown
echo "Processing pending queues before shutdown..."
bun scripts/check-pending-queue.ts --process --limit 20
echo "Waiting for processing to complete..."
sleep 10
echo "Stopping worker..."
claude-mem stop
If you're upgrading from v4.x to v5.x:
v4.x Behavior (Automatic Recovery):
v5.x Behavior (Manual Recovery):
Migration Steps:
bun scripts/check-pending-queue.tsbun scripts/check-pending-queue.ts --process