claude-mem Production Guide

Practical guide based on 23 days of production usage with 3,400+ observations across two physical servers and 8 projects.

| Setting | Default | Recommended | Why |
|---|---|---|---|
| CLAUDE_MEM_MAX_CONCURRENT_AGENTS | 2 | 3 | Better throughput without overload |
| CLAUDE_MEM_SEMANTIC_INJECT | true | true | Relevant context >> recent context |
| CLAUDE_MEM_SEMANTIC_INJECT_LIMIT | 5 | 5 | Sweet spot for token cost vs coverage |
| CLAUDE_MEM_TIER_ROUTING_ENABLED | true | true | ~52% cost savings, no quality loss |

Health Monitoring

Key metrics to watch

| Metric | Healthy | Warning | Action |
|---|---|---|---|
| pending_messages (pending) | 0-5 | >10 | Check worker logs, may need restart |
| pending_messages (failed) | 0 | >0 growing | Circuit-breaker may be tripping |
| sdk_sessions (active) | 0-3 | >5 stuck | Orphan sessions, worker restart |
| WAL size | <10 MB | >20 MB | Run PRAGMA wal_checkpoint(TRUNCATE) |
| Chroma size | Growing slowly | Sudden jump | Check for sync loops |
| Errors/day in logs | 0-2 | >10 | Investigate log patterns |
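
The numeric thresholds above lend themselves to a scripted check. The following is a minimal sketch, not part of claude-mem: `classify_metric` and `wal_over_limit` are hypothetical helpers, the "elevated" band for values between the Healthy and Warning columns is an assumption (the table does not name that range), and the WAL file is assumed to sit beside the database as `claude-mem.db-wal`, which is SQLite's standard naming.

```bash
# classify_metric VALUE HEALTHY_MAX WARNING_ABOVE
# Prints "healthy" when VALUE <= HEALTHY_MAX, "warning" when
# VALUE > WARNING_ABOVE, and "elevated" (assumed label) in between.
classify_metric() {
  value=$1
  healthy_max=$2
  warning_above=$3
  if [ "$value" -le "$healthy_max" ]; then
    echo healthy
  elif [ "$value" -gt "$warning_above" ]; then
    echo warning
  else
    echo elevated
  fi
}

# wal_over_limit WAL_FILE [LIMIT_BYTES]
# Succeeds (exit 0) when WAL_FILE exists and is larger than LIMIT_BYTES.
# The default of 20971520 bytes mirrors the 20 MB warning threshold.
wal_over_limit() {
  wal_file=$1
  limit_bytes=${2:-20971520}
  [ -f "$wal_file" ] || return 1            # no WAL file: nothing to checkpoint
  size=$(( $(wc -c < "$wal_file") ))        # arithmetic strips any padding from wc
  [ "$size" -gt "$limit_bytes" ]
}

# Example usage (paths and table names taken from this guide):
#   DB=~/.claude-mem/claude-mem.db
#   pending=$(sqlite3 "$DB" "SELECT COUNT(*) FROM pending_messages WHERE status='pending'")
#   echo "pending: $(classify_metric "$pending" 5 10)"
#   if wal_over_limit "$DB-wal"; then
#     sqlite3 "$DB" "PRAGMA wal_checkpoint(TRUNCATE);"
#   fi
```

`PRAGMA wal_checkpoint(TRUNCATE)` returns three columns. A first column of 1 means the checkpoint was blocked by another connection, in which case retrying once the worker is idle is a reasonable next step.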

Quick health check

```bash
# Check worker status
curl -s http://127.0.0.1:37777/api/health | python3 -m json.tool

# Check database stats
sqlite3 ~/.claude-mem/claude-mem.db "
  SELECT 'observations' as metric, COUNT(*) as value FROM observations
  UNION ALL SELECT 'summaries', COUNT(*) FROM session_summaries
  UNION ALL SELECT 'pending', COUNT(*) FROM pending_messages WHERE status='pending'
  UNION ALL SELECT 'active_sessions', COUNT(*) FROM sdk_sessions WHERE status='active';
"
```

Multi-Machine Setup

If running claude-mem on multiple machines, use claude-mem-sync to keep observations in sync:

```bash
claude-mem-sync push <remote-host>    # local -> remote
claude-mem-sync pull <remote-host>    # remote -> local
claude-mem-sync sync <remote-host>    # bidirectional
claude-mem-sync status <remote-host>  # compare counts
```

Deduplication is keyed on (created_at, title), so these commands are safe to run repeatedly.
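
To see why keying on (created_at, title) makes repeated runs harmless, consider a toy merge over tab-separated rows. This is only an illustration of the idea, not how claude-mem-sync is implemented: the helper name `merge_observations` and the column layout (created_at, title, then any remaining fields) are assumptions made for the example.

```bash
# merge_observations FILE...
# Reads tab-separated rows (created_at, title, other fields) from the given
# files in order and keeps only the first row seen for each
# (created_at, title) pair.
merge_observations() {
  awk -F '\t' '!seen[$1 FS $2]++' "$@"
}

# Example usage (hypothetical export files):
#   merge_observations local.tsv remote.tsv > merged.tsv
#   merge_observations merged.tsv remote.tsv   # prints the same rows as merged.tsv
```

Because a row is only kept when its key has not been seen before, merging the result with the same remote data a second time adds nothing.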

Growth Expectations

Based on active daily development usage:

| Metric | Per day | Per month | Notes |
|---|---|---|---|
| Observations | ~120 | ~3,600 | Varies with coding activity |
| Summaries | ~40 | ~1,200 | One per session |
| SQLite | ~0.8 MB | ~24 MB | ~5 KB per observation |
| Chroma | ~4 MB | ~120 MB | ~50 KB per observation (embeddings) |
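
For capacity planning, the per-day figures can be extrapolated linearly. The sketch below assumes the rates stay constant at roughly 0.8 MB/day for SQLite and 4 MB/day for Chroma, which real usage may not match; `project_storage_mb` is a hypothetical helper.

```bash
# project_storage_mb DAYS
# Prints the projected SQLite and Chroma growth in MB after DAYS days,
# using the approximate per-day rates from the growth table.
project_storage_mb() {
  LC_ALL=C awk -v days="$1" \
    'BEGIN { printf "sqlite=%.1f chroma=%.1f\n", 0.8 * days, 4 * days }'
}

# Example usage:
#   project_storage_mb 30    # one month
#   project_storage_mb 365   # one year
```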

Common Issues and Solutions

Summarize error loop

Symptom: Repeated [ERROR] Missing last_assistant_message in logs.

Cause: A transcript with no assistant messages triggers a summary attempt that fails repeatedly.

Fix: PR #1566, which skips the summary when the transcript is empty.

Chroma sync failures

Symptom: [ERROR] Batch add failed... IDs already exist

Cause: An MCP timeout during an add leaves partial writes, and the retry then fails on the IDs that already exist.

Fix: PR #1566, which falls back to delete+add reconciliation.

Port conflict on startup

Symptom: Worker failed to start... Is port 37777 in use?

Cause: Two sessions start simultaneously, and the HTTP check is non-atomic (a TOCTOU race).

Fix: PR #1566, which uses an atomic socket bind on Unix.

Orphaned pending messages

Symptom: The pending_messages table grows with old entries for completed sessions.

Cause: SIGTERM kills the generator before the queue is drained.

Fix: PR #1567, which drains the queue after deleteSession().

Context not relevant to current topic

Symptom: Claude receives observations about CSS when you are asking about authentication.

Cause: The default recency-based injection selects the most recent observations, not the most relevant ones.

Fix: PR #1568, which adds semantic injection via Chroma on every prompt.

Log Analysis Tips

```bash
# Count errors by day (extracts the [YYYY-MM-DD stamp from each error line;
# uses only options available in both GNU and BSD grep)
grep -h '\[ERROR\]' ~/.claude-mem/logs/claude-mem-*.log | \
  grep -oE '\[20[0-9]{2}-[0-9]{2}-[0-9]{2}' | sort | uniq -c

# Find circuit-breaker trips
grep -E 'circuit|Circuit|ABANDONED|abandoned' ~/.claude-mem/logs/claude-mem-*.log

# Check pending message health
grep -E 'CLAIMED|CONFIRMED|FAILED|ABANDONED' ~/.claude-mem/logs/claude-mem-$(date +%Y-%m-%d).log | tail -20
```
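
Tailing the last 20 state lines shows recent activity. A per-state tally gives a quicker read on whether failures are accumulating. The sketch below counts the same four tokens used in the command above; `summarize_queue_states` is a hypothetical helper, and it assumes each token appears once per state-transition log line.

```bash
# summarize_queue_states [FILE...]
# Counts occurrences of each pending-message state token in the given log
# files (or stdin when no file is given) and prints one STATE=COUNT per line.
summarize_queue_states() {
  grep -ohE 'CLAIMED|CONFIRMED|FAILED|ABANDONED' "$@" \
    | sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Example usage:
#   summarize_queue_states ~/.claude-mem/logs/claude-mem-$(date +%Y-%m-%d).log
```

A FAILED or ABANDONED count that keeps rising relative to CONFIRMED would be consistent with the circuit-breaker warning listed under Key metrics.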