plans/2026-04-25-cynical-deletion.md
Date: 2026-04-25
Branch: claude-mem-skill-invocation-and-github-issue-2139
Source: Triage of all 29 open issues for thedotmack/claude-mem applied with delete-first lens.
The codebase has accumulated defenders (orphan cleanup → duplicate detection → restart-port-stealing) and tolerators (silent JSON drops, drifted SQL/SSE filters, silent metadata drops). Each defender breeds two more bugs; each tolerator hides the bug it tolerates until it explodes as a "regression." The work is deleting the moats, not patching them.
| Phase | Action | Closes |
|---|---|---|
| P1 | DEL-1 + DEL-2: process-management theater + shell-string spawning | #2090, #2095, #2107, #2111, #2114, #2117, #2135, #2123, #2097 |
| P2 | DEL-9: observer-sessions trust boundary (CLAUDE_MEM_INTERNAL env) | #2126, #2118 |
| P3 | CON-2 + DEL-7: multi-account commit, port/path de-hardcoding | #2103, #2109, #2101 |
| P4 | CON-1: extend env sanitizer to proxy vars | #2115, #2099 |
| P5 | FF-1: fail-fast cleanup | #2089, #2094, #2116 |
| P6 | DEL-4 + DEL-5 + DEL-6 + DEL-8: small deletions | #2113, #2087, #2127, #2098, #2054 |
| P7 | #2106 install fixes (UX + shutdown-before-overwrite + uninstall coverage + real-port query) | #2106 |
| P8 | DEL-3 lite: pin chroma-mcp deterministically (full sqlite-vec migration deferred) | #2046, #2085, #2102 |
| P9 | Verification + close-as-dup/already-fixed | #2112, #2123→#2135, #2097→#2135, #2098→#2127, #2126 (closed by P2) |
child_process.spawn(cmd, [args], { detached, stdio, env }) — Node API used in ProcessManager.ts. Bun.spawn does NOT support detached:true (per process-registry.ts:633-639 comment). Use Node child_process for daemon spawning.Bun.spawn([args], { env }) — used for non-detached children (e.g. chroma-vector-sync.test.ts:25). Arg-array form bypasses shell on all platforms.Agent SDK query({ cwd, env, spawnClaudeCodeProcess }) — used by SDKAgent.ts:145-163 and KnowledgeAgent.ts:75-84. Custom spawnClaudeCodeProcess lets us inject env vars into the spawned claude subprocess.sanitizeEnv() from src/supervisor/env-sanitizer.ts — currently strips CLAUDE_CODE_* and CLAUDECODE_* (preserve list: CLAUDE_CODE_OAUTH_TOKEN, CLAUDE_CODE_GIT_BASH_PATH).SettingsDefaultsManager.get('CLAUDE_MEM_WORKER_PORT') — canonical port reader. Default: 37700 + (uid % 100).paths.ts exports: DATA_DIR, OBSERVER_SESSIONS_DIR, OBSERVER_SESSIONS_PROJECT, USER_SETTINGS_PATH, DB_PATH. All resolve under CLAUDE_MEM_DATA_DIR if set.-EncodedCommand and quoting heuristics are deletable once we stop building shell strings.|| true, silent JSON drops, .passthrough() schemas that drop fields). Fail loud or accept the input.session-init.ts:78 returns EXIT_CODE.SUCCESS on worker-unreachable. Verify against the issue's repro before patching.sanitizeEnv does NOT strip HTTP_PROXY/HTTPS_PROXY/NO_PROXY. Extend the sanitizer; don't add a passthrough knob (#2099).file-context.ts:184,196 truncation is intentional token economics. The bug is that the truncated Read return value confuses Claude into infinite Edit retries. Fix: don't return a partial Read result from a hook — emit an injected-context note instead, or let the full Read happen.Closes: #2090, #2095, #2107, #2111, #2114, #2117, #2135, #2123, #2097
aggressiveStartupCleanup() at src/services/infrastructure/ProcessManager.ts:659-727. Including:
ps -eo pid,etime,command | grep block (lines 607-644)AGGRESSIVE_CLEANUP_PATTERNS and AGE_GATED_CLEANUP_PATTERNS constantsORPHAN_MAX_AGE_MINUTES constantaggressiveStartupCleanup (grep for usage; expected: worker-service.ts startup)-EncodedCommand wrapper at ProcessManager.ts:944-1041. Replace with child_process.spawn(cmd, [args], { detached: true, stdio: 'ignore', windowsHide: true }). Arg-array form bypasses shell on Windows, no quoting needed. The setsid Unix wrapper stays (it's correct).worker-service.ts:1154-1175. Replace with: try httpShutdown(port) → if port still bound after 5s, log error and exit 1 (let user resolve). Don't loop. Don't kill PID by force. The user sees the error and acts.src/cli/worker-cli.js (or wherever the restart entry-point lives). Find the path that triggers duplicate detection on a restart command and remove it. The PID file owns the lock; restart should atomically swap.verifyPidFileOwnership() at process-registry.ts:160-182 and captureProcessStartToken() at lines 94-146 — these are correct. PID file with start-time token is exactly the OS-trust pattern we want.~/.claude-mem/worker.pid (or $DATA_DIR/worker.pid). This is the lock.waitForPortFree() with a short timeout — used to confirm shutdown completed. Stays.git grep -n aggressiveStartupCleanup → list every callsite. Delete the function and every callsite. Run npm run build-and-sync.ProcessManager.ts:944-1041:
child_process.spawn(execPath, args, { detached: true, stdio: 'ignore', windowsHide: true }).unref()setsid wrapper on Unix when available (process-group cleanup on parent death).worker-service.ts:1154-1175 restart case:
await httpShutdown(port)
const free = await waitForPortFree(port, 5000)
if (!free) {
console.error('Port still bound after shutdown. Resolve manually.')
process.exit(1)
}
removePidFile()
spawnDaemon(__filename, port)
session-init.ts:30-80. If yes, log "no-op" in plan execution notes. If the original repro still fires, add || true-equivalent at the hooks.json shell wrapper layer (NOT in the handler itself).git grep aggressiveStartupCleanup returns zero hits.git grep -E "EncodedCommand|powershell.*Start-Process" returns zero hits in src/.C:\Users\Alex Newman\) — confirm spawn works without quoting drama. Closes #2135/#2123/#2097.setsid check.Closes: #2126 (items 2, 3, 4, 6 by deletion; items 1, 5 by small fix), #2118
Replace the cwd === OBSERVER_SESSIONS_DIR discriminator pattern (which has to be repeated by every consumer and inevitably drifts) with a single env-var trust boundary.
Set the env var at every spawn site:
src/services/worker/SDKAgent.ts:113 (buildIsolatedEnv) — add CLAUDE_MEM_INTERNAL: '1' to the returned env.src/services/worker/knowledge/KnowledgeAgent.ts:73 — same.Agent SDK query() with env: isolatedEnv so the spawned claude subprocess inherits.Check the env var first in shouldTrackProject:
src/shared/should-track-project.ts:35-44 — first line of function: if (process.env.CLAUDE_MEM_INTERNAL === '1') return false;isWithin(cwd, OBSERVER_SESSIONS_DIR) check as a belt-and-braces fallback.Delete now-redundant filters:
src/services/worker/PaginationHelper.ts:115-117 — keep (UI hides observer rows; harmless).src/services/worker/PaginationHelper.ts:178 — change hardcoded string 'observer-sessions' to OBSERVER_SESSIONS_PROJECT const for consistency. Tiny fix.src/services/worker/SSEBroadcaster.ts:45-60 — add the SAME filter that SearchManager uses (SearchManager.ts:194). Don't invent a new one. Extract the filter predicate to a shared helper used by both. Closes #2118.#2126 item 1 (basename glob fix): Read the issue's exact bug. Likely EXCLUDED_PROJECTS matches by full path instead of basename. Fix in the matcher; one-liner.
#2126 item 5 (cleanup CLI): Extend src/services/infrastructure/CleanupV12_4_3.ts:185-205 to take a --dry-run and report counts. Don't write a new CLI; add the flag to existing.
SDKAgent, verify the spawned subprocess has CLAUDE_MEM_INTERNAL=1 in its env.shouldTrackProject('/any/path') with CLAUDE_MEM_INTERNAL=1 set returns false./api/events, trigger an observer session, confirm no observer events on the SSE stream.CLAUDE_MEM_OBSERVER_SESSION_DIR env override (#2126 item 2). CLAUDE_MEM_DATA_DIR already overrides; the observer dir is derived.Closes: #2103, #2109, #2101
Discovery showed multi-account is ~80% there: DATA_DIR is fully overridable, per-UID port already exists, PID files are DATA_DIR-relative. The remaining gap is 8 hardcoded 37777 literals + hooks.json bare-port assumption.
Eliminate every hardcoded 37777:
src/ui/viewer/constants/settings.ts:8 — change to read from settings/env at runtime if possible; otherwise leave as build-time default (least bad).src/npx-cli/commands/runtime.ts:154, install.ts:545, uninstall.ts:109 — replace fallback with SettingsDefaultsManager.get('CLAUDE_MEM_WORKER_PORT').src/integrations/opencode-plugin/index.ts:97 — same. Read from settings.src/services/integrations/OpenClawInstaller.ts:171 — drop the default; require the caller to pass it.plugin/skills/timeline-report/SKILL.md:23,53 — replace literal with ${CLAUDE_MEM_WORKER_PORT:-37700} or instruct the skill to read from settings.json. Closes #2103.Fix hooks.json port handling for #2109:
plugin/hooks/hooks.json — every hook command needs to either (a) inherit the port from env or (b) read from settings.json. Update the bun-runner.js wrapper to do this once.node.exe. The bun-runner.js wrapper is the right place.Multi-account commit:
CLAUDE_MEM_DATA_DIR=/path/to/account-N per shell. All paths derive from it. Per-UID port collision is handled automatically.claude-mem profile use <name> that exports the right env vars (or just print the export command for user to eval).git grep -nE "37777" src/ plugin/ returns only the build-time default in settings.ts.CLAUDE_MEM_DATA_DIR values; both bind successfully on different ports; both have separate PID files; both serve separate SSE streams.--port flag to every CLI command. The env / settings.json owns it.Closes: #2115, #2099
src/supervisor/env-sanitizer.ts — extend ENV_PREFIXES and/or add a PROXY_VARS set that strips:
HTTP_PROXY, HTTPS_PROXY, ALL_PROXY, NO_PROXY (and lowercase variants)npm_config_proxy, npm_config_https_proxyclaude for internal AI calls; the user's proxy config should not bleed in.HTTPS_PROXY=http://bad-proxy:1234 in the worker shell. Spawn an SDK subprocess. Confirm the subprocess's env does NOT contain HTTPS_PROXY. Add a test for this.git grep -n "HTTP_PROXY\|HTTPS_PROXY" shows the sanitizer is the only place that knows about them.Closes: #2089, #2094, #2116. #2118 is closed by P2.
src/cli/stdin-reader.ts:156-164 — onEnd resolves with undefined even on parse failure. Change to: if input is non-empty AND parse fails, throw or call the safety-timeout error path. Match what the issue asks for: distinguish "no input" from "malformed input." Document in the function header.
src/cli/handlers/file-context.ts:141-143, 184, 196 — the truncation is intentional (token economics), but returning a truncated Read result confuses Claude. Fix:
additionalContext or skip entirely.{ continue: true } (let the real Read happen) or inject context via additionalContext field./api/memory/save silently drops metadatasrc/services/worker/http/routes/MemoryRoutes.ts:16-20, 38-67 — the schema uses .passthrough() which keeps unknown fields, but discovery suggests fields are dropped at insert time. Audit:
text/title/project are in the INSERT statement, the metadata is dropped silently.metadata JSON column, or reject requests with unknown fields (.strict() instead of .passthrough()). Pick one. Default: accept into a JSON column.MemoryRoutes.ts:40 (const targetProject = project || this.defaultProject) is fine. It uses caller's value if provided. Verify the issue reporter wasn't omitting project field.POST /api/memory/save with metadata: { foo: 'bar' } — confirm the data is retrievable.scripts/build-hooks.js:163-171, 203-230, 294 — add 'zod' to the external list for hook builds.typeof x.foo === 'string' etc.). Don't reach for Zod for hook input.src/hooks/ for Zod imports; replace with hand-rolled checks.worker-service.cjs) can still bundle Zod — the conflict is only in hook-bundled scripts loaded by OpenCode.Verification: node -e "require('./plugin/scripts/<hook>.js')" shows no Zod in the bundle. Run with OpenCode hook environment; #2113's TypeError doesn't reproduce.
src/services/worker/GeminiAgent.ts:130-132 — delete setFallbackAgent.src/services/worker/GeminiAgent.ts:365 — delete the if (this.fallbackAgent) branch. On 429: log + throw.src/services/worker/OpenRouterAgent.ts:79-81 — same.tests/gemini_agent.test.ts:279, 313 — delete the fallback tests; add an explicit "429 throws" test.MAX_SESSION_WALL_CLOCK_MS (likely src/services/worker/sessions/SessionManager.ts or similar). Read the surrounding code: what does the timeout do? (Likely cleanup of stale sessions.)installCLI() alias (#2054)plugin/scripts/smart-install.js:345-395 — delete installCLI function.plugin/scripts/smart-install.js:633 — delete the call.src/npx-cli/commands/uninstall.ts — add a one-time legacy-alias-strip pass:
~/.bashrc, ~/.zshrc, ~/Documents/PowerShell/Microsoft.PowerShell_profile.ps1.^alias claude-mem= or ^function claude-mem.npx claude-mem <cmd> and bunx claude-mem <cmd>.Verification: Fresh install creates no shell-config mutations. Existing user with the alias runs uninstall — alias is gone. which claude-mem after uninstall returns nothing.
Closes: #2106 (items 1, 3, 4, 6 by fix; items 2, 7 by close-as-already-fixed/insufficient-detail; item 5 by documentation).
src/npx-cli/commands/install.ts:275-277 — change initialValues: detected.filter(...).map(...) to initialValues: []. Force explicit opt-in.uninstall.ts:109-132 (HTTP shutdown + poll) to src/services/install/shutdown-helper.ts. Call it from both uninstall.ts and install.ts before copyPluginToMarketplace.src/npx-cli/commands/uninstall.ts — add removal of:
~/.npm/_npx/*/node_modules/claude-mem~/.cache/claude-cli-nodejs/*/mcp-logs-plugin-claude-mem-*~/.claude/plugins/data/claude-mem-thedotmack/install.ts:545 — after smart-install.js completes, hit http://127.0.0.1:<settingsPort>/api/health and report the actually-bound port. If health fails, just print "worker not yet ready" and exit cleanly.~/.claude-mem will be recreated by active hooks."ba37b2b2/68e92edc.find ~/.npm ~/.cache ~/.claude -name "*claude-mem*" returns empty.Closes: #2046, #2085, #2102
Full sqlite-vec migration is a separate plan (would require replacing the embedding pipeline currently owned by chroma-mcp's bundled SBERT). For this plan: stop using uvx --with flags ad-hoc and pin chroma-mcp to a specific version with locked deps.
src/services/sync/ChromaMcpManager.ts:200-244 — change buildCommandArgs() to invoke a specific pinned version: uvx --python 3.11 chroma-mcp==<X.Y.Z> (pick a known-good version that bundles its own deps).--with httpcore --with httpx ONLY if the pinned version requires them. Verify by running the pinned command in a clean uvx cache. If the deps are declared properly upstream, the --with flags are unnecessary.05114bec added transport cleanup on timeout, stale onclose handler guard, and 10s reconnect backoff. Read ChromaMcpManager.ts to confirm these are still present.ChromaSync.ts. Not in this plan.~/.claude-mem/chroma/ populates, chroma_query_documents returns results without errors.git grep -nE "aggressiveStartupCleanup|EncodedCommand|setFallbackAgent|installCLI" — all return zero hits.git grep -nE "37777" src/ plugin/ — only the build-time default in viewer/constants/settings.ts.npm run build-and-sync completes; worker starts; SessionStart context injection works (manual test: open a new session, confirm memory recap appears).All other issues close as part of their respective phase verification.
P1 first (highest leverage; closes 9 issues; reverses regression treadmill). Then P2 (single trust boundary closes 2 issues + prevents future leaks). P3-P8 are independent and can run in parallel by different sessions. P9 last.
If time-constrained, the high-value subset is P1 + P2 + P5: kills the two structural patterns (defenders, tolerators) plus the trust-boundary leak. That alone closes 14 of 29 issues with mostly deletions.