Back to Claude Mem

Finish BullMQ Observation Queue Branch — Ship Plan

plans/2026-05-07-finish-bullmq-branch-ship-plan.md

13.2.015.4 KB
Original Source

Finish BullMQ Observation Queue Branch — Ship Plan

Date: 2026-05-07 Branch: bullmq-vs-bee-queue-for-claude-mem-observation-que Base: origin/main @ 0a43ab76 Parent plan: plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md

Reframe

The prior session believed Phase 1 was ungated because two reviewer agents failed (one returned not_found, "Carver" was user-aborted at 111.9s). That belief was based on a stale snapshot that predated commit 4e0fc77a Add Postgres observation storage foundation. Phase 1 is committed. git status shows zero uncommitted changes under src/storage/postgres/.

What is actually dirty in the worktree is Phase 2: Define Server Runtime Boundary. The dirty files map 1:1 to that phase's "What To Implement" section. The remaining work to "finish this branch" is: confirm Phase 1 with concrete checks (not another reviewer agent), land Phase 2, push.

Phases 3–13 (BullMQ queue, event-to-job pipeline, provider extraction, hook routing, MCP, compat, Docker, team auth, observability, final verification) are explicitly out of scope for this branch. The PR is already 167 files / 23.5K insertions. Continuing past Phase 2 here would make review impossible.

Phase 0: Documentation Discovery

Sources Read

  • plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md (parent plan, 987 lines, all 14 sections from Phase 0 through Phase 13)
  • PR_REORIENTATION_REPORT.md (660 lines) — independent inventory of committed + dirty surfaces
  • git status, git log --oneline -15, git diff --stat HEAD
  • Worktree: src/server/runtime/{ServerBetaService.ts,create-server-beta-service.ts,types.ts}
  • Worktree: src/storage/postgres/ — already in commit 4e0fc77a

Concrete Findings

  • Phase 1 (Postgres storage foundation) is committed in 4e0fc77a. Includes scoped addSource, transitionStatus, generation-job event append, FTS via generated content_search tsvector + GIN index, tenant-scoped uniqueness constraints, and 20 integration tests including the negative-scope mutation test.
  • Phase 2 (server runtime boundary) is implemented but uncommitted. Files match the parent plan's Phase 2 deliverables exactly: independent ServerBetaService, create-server-beta-service, disabled boundary types, .server-beta.{pid,port,runtime.json} paths, runtime labels in /api/health and /v1/info, server-beta CLI lifecycle, build-hooks split into a separate server-beta-service.cjs bundle, ephemeral-port test for /api/health and /v1/info.
  • Two doc artifacts (AGENTS.md, PR_REORIENTATION_REPORT.md) are also untracked. Decide before push.

Anti-Pattern Guards (carried from parent plan)

  • Do not spawn a third reviewer agent to "gate" Phase 1. The integration test suite plus the plan's grep checklist is the gate. Reviewer agents are a second opinion, not the primary gate.
  • Do not pull Phase 3+ work into this branch.
  • Do not amend 4e0fc77a to "tidy" Phase 1; create new commits.
  • Do not couple Phase 2 to WorkerService (the entire point of Phase 2 is independence).

Phase A: Re-Confirm Phase 1 Gate (Deterministic, No Reviewer Agent)

What To Run

  1. tsc --noEmit scoped to Postgres storage:
    bash
    bunx tsc --noEmit src/storage/postgres/*.ts
    
  2. Postgres integration suite (requires DATABASE_URL or local Postgres on default port):
    bash
    bun test tests/storage/postgres
    
  3. Anti-pattern greps (must all return zero matches in src/storage/postgres/):
    bash
    rg -n "UNIQUE\s*\(\s*source_type\s*,\s*source_id\s*,\s*job_type\s*\)" src/storage/postgres
    rg -n "UNIQUE\s*\(\s*observation_id\s*,\s*source_type\s*,\s*source_id\s*\)" src/storage/postgres
    
  4. Scoped-mutation grep (must show projectId/teamId parameters):
    bash
    rg -n "addSource|transitionStatus|append" src/storage/postgres
    

Verification Checklist

  • TypeScript clean.
  • All 20 Postgres integration tests pass, including the negative-scope mutation test.
  • Both anti-pattern greps return empty.
  • Scoped-mutation grep shows projectId/teamId in every signature.

Anti-Pattern Guards

  • Do not edit src/storage/postgres/*.ts in this phase. If Phase A fails, open a separate fix-up commit; do not amend 4e0fc77a.

Phase B: Land Phase 2 (Server Runtime Boundary)

What To Run

  1. Phase 2 independence grep — Server beta runtime must not import worker:
    bash
    rg -n "WorkerService|services/worker-service|worker/http" \
      src/server/runtime src/npx-cli/commands/server.ts
    
    Allowed: matches inside src/services/worker-service.ts itself (delegation back to server-beta is fine). Forbidden: any import inside src/server/runtime/.
  2. Server-beta service test:
    bash
    bun test tests/server/server-beta-service.test.ts
    
  3. CLI namespace test:
    bash
    bun test tests/npx-cli-server-namespace.test.ts
    
  4. Build verifies server-beta-service.cjs bundle is produced:
    bash
    npm run build-and-sync
    ls -la plugin/scripts/server-beta-service.cjs
    
  5. Smoke test independence:
    bash
    npx claude-mem server status      # before start
    npx claude-mem server start
    npx claude-mem server status      # running, runtime=server-beta
    curl -s http://127.0.0.1:$(cat ~/.claude-mem/.server-beta.port)/healthz
    curl -s http://127.0.0.1:$(cat ~/.claude-mem/.server-beta.port)/v1/info
    npx claude-mem server stop
    
    Worker start|stop|status must remain functional throughout.

Commit Layout

Two commits, in order:

  1. feat(server-beta): add independent runtime service

    • src/server/runtime/ServerBetaService.ts
    • src/server/runtime/create-server-beta-service.ts
    • src/server/runtime/types.ts
    • src/server/routes/v1/ServerV1Routes.ts (runtime label)
    • src/services/server/Server.ts (runtime option)
    • src/shared/paths.ts (.server-beta.{pid,port,runtime.json})
    • tests/server/server-beta-service.test.ts
  2. feat(server-beta): route CLI lifecycle and build a separate bundle

    • scripts/build-hooks.js (server-beta bundle output)
    • src/npx-cli/commands/runtime.ts (server-beta lifecycle commands)
    • src/npx-cli/commands/server.ts (CLI routing)
    • src/services/worker-service.ts (delegate server-start|stop|restart|status to sibling bundle)
    • tests/npx-cli-server-namespace.test.ts

Documentation References

  • Parent plan, lines 469–514: Phase 2 deliverables and verification checklist.
  • src/services/server/Server.ts: existing route-composition style to copy.
  • src/services/infrastructure/ProcessManager.ts: PID-file safety patterns.

Verification Checklist

  • All five Phase B steps pass.
  • Worker lifecycle still works while server-beta is running, and vice versa.
  • Two commits land cleanly with no --amend or force operations.

Anti-Pattern Guards

  • Do not import WorkerService from src/server/runtime/.
  • Do not overload worker PID/port files.
  • Do not boot worker as a background dependency of server-beta.
  • Do not silently fall back from server-beta to worker.

Phase C: Decide Doc Artifacts

What To Decide

FileRecommendationRationale
PR_REORIENTATION_REPORT.mdUse as PR body, then delete (or move to docs/internal/).It's a snapshot, not durable docs. Useful for the PR reviewer; rots in-tree.
AGENTS.mdRead first, then either commit (if generally useful guidance) or move under .scratch/.Decision depends on content.

Verification

  • Final git status shows only intended doc artifacts (or none).
  • .scratch/ is gitignored if used.

Anti-Pattern Guard

  • Do not push PR_REORIENTATION_REPORT.md to main as a doc; it has a date and a HEAD SHA, it ages immediately.

Phase D: Push and Open/Update PR

What To Run

  1. git push -u origin bullmq-vs-bee-queue-for-claude-mem-observation-que
  2. gh pr view --web (if PR exists) or gh pr create with body sourced from PR_REORIENTATION_REPORT.md.
  3. PR body must explicitly carve scope: "Includes Phase 1 + Phase 2 from plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md. Phases 3–13 are follow-ups on separate branches."

Verification Checklist

  • PR title is short (under 70 chars) and reflects scope: e.g., "Add Postgres storage + independent server-beta runtime (Phases 1–2)".
  • PR body lists out-of-scope phases.
  • CI is green.

Anti-Pattern Guards

  • Do not force-push to main.
  • Do not merge without CI green.

Phase E: Branch Closeout

Once the PR merges, this branch is done. Phase 3 (BullMQ-First Server Queue) starts on a fresh branch off main. Do not reuse this branch for Phase 3 work — keep the queue/runtime split visible in history.

Final Verification (cross-phase)

Run after Phases A–D:

bash
git status                                   # clean or only intended doc artifacts
git log --oneline origin/main..HEAD          # 4e0fc77a + Phase 2 commits, no force-push markers
bun test tests/storage/postgres tests/server tests/npx-cli-server-namespace.test.ts
rg -n "WorkerService|services/worker-service|worker/http" src/server/runtime
rg -n "PendingMessageStore|SessionQueueProcessor" src/server/runtime

Expected:

  • All three test paths green.
  • Both greps return zero matches.
  • Branch ready to merge.

Decisions Locked

  1. Phase 1 gate: orchestrator-managed deterministic checks (no reviewer agent).
  2. AGENTS.md + PR_REORIENTATION_REPORT.md: discard before commit.
  3. Scope: this branch ships Phases 1 + 2 + 3 (BullMQ-First Server Queue). Phase E becomes Phase 3 work, push moves to Phase F.

Phase D (revised): Discard Untracked Doc Artifacts

bash
rm AGENTS.md PR_REORIENTATION_REPORT.md

Verification: git status shows neither file.

Phase E: Implement Phase 3 — BullMQ-First Server Queue

Source: parent plan lines 515–570.

What To Implement

  • src/server/jobs/types.ts — job-shape types:
    • ServerGenerationJob (base)
    • GenerateObservationsForEventJob
    • GenerateObservationsForEventBatchJob
    • GenerateSessionSummaryJob
    • ReindexObservationJob
    • Every job carries team_id, project_id, source_type, source_id, generation_job_id. Event jobs add agent_event_id. Summary jobs add server_session_id. Reindex jobs add target observation ID or deterministic reindex scope ID.
  • src/server/jobs/job-id.ts — deterministic, colon-free job IDs (port the SHA-256-safe pattern from src/server/queue/BullMqObservationQueueEngine.ts).
  • src/server/jobs/ServerJobQueue.ts — thin wrapper around BullMQ Queue, Worker, QueueEvents. Use autorun: false, explicit concurrency: 1 default per lane, and an error listener on every Worker.
  • src/server/jobs/outbox.ts — durable outbox over ObservationGenerationJobRepository. Statuses: queued, processing, completed, failed, cancelled. Tracks attempts, last error, timestamps, and tenant/project/session IDs.
  • Startup reconciliation:
    • Re-enqueue rows in queued or stale processing.
    • Skip rows already completed.
    • Replace terminal BullMQ jobs before reusing deterministic IDs.
  • Wire queue health into /v1/info, /api/health, and claude-mem server status via the existing runtime label hook.
  • Activate the queue boundary in ServerBetaService (Phase 2 left it disabled). Provide a real adapter when CLAUDE_MEM_QUEUE_ENGINE=bullmq and REDIS_URL are present; keep the disabled adapter as the fallback.

Documentation References

Verification Checklist

Unit tests under tests/server/jobs/:

  • job-id.test.ts — deterministic IDs, no colons, stable across runs, content-derived.
  • server-job-queue.test.ts — Queue/Worker lifecycle, error listener attached, concurrency honored, autorun false.
  • outbox.test.ts — duplicate enqueue suppression, terminal job replacement, status transitions, attempt counting.

Integration tests under tests/server/queue-bootstrap/:

  • Start ServerBetaService with Postgres + Valkey + queue boundary enabled.
  • Insert outbox rows directly through ObservationGenerationJobRepository.
  • Enqueue fake jobs; restart before fake processing completes.
  • Assert reconciliation re-enqueues exactly once and outbox status reaches completed exactly once.
  • Assert Redis-down fails Server beta startup when CLAUDE_MEM_QUEUE_ENGINE=bullmq; no silent fallback to SQLite.

Greps:

bash
rg -n "Bull(MQ|Mq).*\.add\(" src/server/jobs        # uses BullMQ Queue.add
rg -n "autorun" src/server/jobs                     # workers explicitly set autorun
rg -n "on\(['\"]error" src/server/jobs              # error listener attached
rg -n ":job:|:obs:" src/server/jobs                 # NO colons in deterministic IDs

The colon-grep must return zero matches.

Anti-Pattern Guards

  • Do not treat BullMQ completed/failed state as canonical history — Postgres outbox is canonical.
  • Do not require event-route wiring or provider generation here (Phase 4 territory).
  • Do not allow duplicate processor side effects on retry — keep observation writes idempotent by deterministic key.
  • Do not use BullMQ Pro-only features (groups).
  • Do not leave pending work only in Redis.
  • Do not silently fall back from BullMQ to SQLite when CLAUDE_MEM_QUEUE_ENGINE=bullmq is set.

Commit Layout

Two commits:

  1. feat(server-beta): add BullMQ job queue primitives

    • src/server/jobs/types.ts
    • src/server/jobs/job-id.ts
    • src/server/jobs/ServerJobQueue.ts
    • src/server/jobs/outbox.ts
    • tests/server/jobs/*.test.ts
  2. feat(server-beta): activate queue boundary in runtime service

    • src/server/runtime/ServerBetaService.ts (queue boundary wiring)
    • src/server/runtime/create-server-beta-service.ts (boundary selection from env)
    • src/server/runtime/types.ts (active queue manager interface)
    • Health surface updates in /v1/info and /api/health if not already covered by Phase 2 runtime label.
    • tests/server/queue-bootstrap/*.test.ts

Phase F: Push and Open/Update PR

bash
git push -u origin bullmq-vs-bee-queue-for-claude-mem-observation-que
gh pr view --web   # if PR exists
# else:
gh pr create --title "Server-beta: Postgres storage + independent runtime + BullMQ queue (Phases 1–3)"

PR body must list:

  • Scope: Phases 1, 2, 3 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.
  • Out of scope: Phases 4–13 (event-to-job pipeline, provider extraction, hook routing, MCP, compat, Docker, team auth, observability, final verification).

Verification Checklist

  • git status clean.
  • git log --oneline origin/main..HEAD shows all expected commits, no force-push markers.
  • CI green.

Final Cross-Phase Verification

bash
git status                                                            # clean
bun test tests/storage/postgres tests/server tests/npx-cli-server-namespace.test.ts
rg -n "WorkerService|services/worker-service|worker/http" src/server/runtime    # zero
rg -n "PendingMessageStore|SessionQueueProcessor" src/server/runtime src/server/jobs  # zero