Back to Claude Mem

Installer Failure Transparency — Cross-IDE Matrix

plans/04-installer-transparency.md

13.2.047.0 KB
Original Source

Installer Failure Transparency — Cross-IDE Matrix

Goal: Stop the universal installer (npx claude-mem install) from silently swallowing real failures and falsely reporting "installed successfully" on all 12 IDEs. Convert every error-suppression site to a single installerError(severity, ctx) decision point driven by an explicit taxonomy. Make tree-sitter ERESOLVE conflicts and missing uv fail loudly with platform-specific remediation. Add a 12-IDE × 4-failure-mode validation matrix and CI postinstall regression guards inspired by the v12.6.2 tree-sitter-swift fix.

Net effect:

  • "Installation Complete" is only printed when every ABORT-level dependency was satisfied. Partial outcomes get a yellow "Installation Partial" headline with a remediation block.
  • runNpmInstallInMarketplace() runs strict first; --legacy-peer-deps is only applied on a confirmed ERESOLVE token, with the fallback announced loudly.
  • Missing uv after auto-install attempt = ABORT with platform-specific instructions surfaced as the primary message (not buried under a wrapped "version probe failed" line). When the user has opted out of vector search, downgrade to WARN_CONTINUE.
  • Postinstall regression guard: any new transitive dep with scripts.postinstall or scripts.install that is not in an explicit allowlist fails the build, preventing a re-run of the v12.6.1 tree-sitter-swift hang.
  • Cross-IDE test matrix: 12 IDEs × 4 scenarios (happy / ERESOLVE / missing uv / missing bun) = 48 cells, each asserting exit code, summary text, and remediation presence.

Out of scope (defer to follow-up plans):

  • Replacing bun-runner.js (its own runtime concerns; tracked in plans/2026-04-29-installer-streamline.md).
  • Re-architecting bufferConsole to a structured event stream (this plan only fixes the data loss; full streaming UX is later).
  • Internationalizing the new remediation messages (English-only for now).
  • Migrating openclaw/install.sh from bash to TypeScript (audit only; remediation is in-place hardening).

Problem Statement (with line citations)

Concrete swallowed errors that exist today:

#FileLine(s)Current behaviorWhy it matters
1src/npx-cli/commands/install.ts1126–1135Catches every npm install error, prints console.warn, returns the misleading task message Dependencies may need manual install ⚠. The surrounding install still ends with installed successfully!.A genuine ERESOLVE (or any npm crash) becomes a yellow tip the user immediately ignores.
2src/npx-cli/commands/install.ts565–581runNpmInstallInMarketplace always uses npm install --omit=dev --legacy-peer-deps. The flag papers over real peer conflicts unconditionally.The next time a tree-sitter peer range tightens, --legacy-peer-deps will quietly install a broken tree, and we'll only see runtime failures.
3src/npx-cli/install/setup-runtime.ts206–219If getUvVersion() returns null after auto-install, throws "uv installed but version probe failed." runInstallCommand does not wrap this with platform-specific instructions; the user sees the wrapped error during a clack spinner that may overwrite it.Honors CLAUDE.md's "uv auto-installed if missing" promise on the happy path but degrades to a confusing one-liner on failure.
4src/npx-cli/commands/install.ts163–169, 328–347Per-IDE failures push into pendingErrors[] via bufferConsole (lines 43–64). installStatus (line 1197) only reads failedIDEs.length > 0, so an IDE that throws after bufferConsole returns 0 is invisible. The summary line "Failed: …" is the only signal.A single failed IDE produces a yellow note that scrolls off-screen above the green "installed successfully!" outro.
5src/npx-cli/commands/install.ts1131console.warn('[install] npm install error:', …) — error is logged but not classified, retried, or surfaced in the summary.Same root cause as #1: stderr disappears, exit code stays 0.
6src/npx-cli/commands/install.ts1161–1166disableClaudeAutoMemory failures classified as "WARN_CONTINUE" today (correct severity), but the implementation is ad-hoc.Inconsistent — every other catch in this file uses different logging shapes.
7openclaw/install.sh36 occurrences of 2>/dev/null / || true (e.g. lines 169, 224–229, 251, 255, 289, 293, 405, 435, 471, 495, 572, 612, 631, 670, 1076, 1155, 1161, 1185)Bash-level error suppression on curl/jq/find/health-check pipelines. Many are correct (best-effort probes), but several mask genuine install failures.Some || true patterns hide a missing bun or unwritable plugin dir.
8src/services/integrations/*.ts50+ catch blocks across 7 files (Codex, Cursor, Gemini, OpenCode, OpenClaw, Windsurf, MCP)Each integration installer has its own ad-hoc error handling. Errors return non-zero, are buffered by bufferConsole, then dropped.The IDE matrix has 12 different failure UX paths.
9scripts/build-hooks.jsGenerates plugin/package.json with all tree-sitter deps and trustedDependencies: ['tree-sitter-cli']. No CI guard prevents adding a new package with scripts.postinstall outside this allowlist.The exact root cause of v12.6.1 — re-runnable by anyone editing this file.

Reference incident (canonical learning)

CHANGELOG.md:93–110 documents v12.6.1 → v12.6.2: PR #2300 moved 21 tree-sitter grammars from devDependencies to dependencies; tree-sitter-swift's postinstall pulled a nested tree-sitter-cli that downloaded a Rust binary and SIGINT'd. Lesson: npm does not honor trustedDependencies (Bun-only). Any new transitive dep with a network postinstall can hang npx claude-mem install. Phase 7 turns this into a CI guard.


Phase 0 — Documentation Discovery

Each implementation phase below cites these facts by line number; do not re-derive.

Allowed APIs / patterns to copy

ItemLocationWhat to copy
Existing clack runTasks / bufferConsole patternsrc/npx-cli/commands/install.ts:32–64Tasks return a string; orchestrator handles spinner. Reuse, but route every error through installerError.
describeExecError (stdout/stderr extractor)src/npx-cli/install/setup-runtime.ts:100–112Already canonical for child_process errors. Move to a shared module.
Marker write pattern for partial statesrc/npx-cli/install/setup-runtime.ts:262–275Use the same JSON shape ({ severity, component, phase, cause, …}) for the new ~/.claude-mem/last-install-error.json.
Plugin-cache resolutionsrc/npx-cli/utils/paths.ts (pluginCacheDirectory, marketplaceDirectory)All path resolution must honor CLAUDE_MEM_DATA_DIR; reuse instead of inventing.
Existing IDE list (canonical 12)src/npx-cli/commands/ide-detection.ts:40–129claude-code, gemini-cli, opencode, openclaw, windsurf, codex-cli, cursor, copilot-cli, antigravity, goose, roo-code, warp.
trustedDependencies allowlist (postinstall guard)scripts/build-hooks.js:106–108 and root package.json:190–202The pattern Phase 7 enforces.
Existing install tests (extend, don't replace)tests/install-non-tty.test.ts, tests/setup-runtime.test.ts, tests/install-disable-auto-memory.test.tsSame harness shape (mocked spawn, isolated TMPDIR HOME).
Docker harness (clean Linux)Dockerfile.test-installerAlready supports running install with no bun/uv preinstalled. Phase 6 forks this for the matrix runner.
CLAUDE.md exit-code contractCLAUDE.md "Exit Code Strategy" sectionHooks: exit 0 = success, 1 = non-blocking, 2 = blocking. Installer is NOT a hook — it can exit 1 or 2 for ABORT. Phase 8 cross-references.
Prior plan formatplans/2026-04-29-installer-streamline.md, plans/2026-04-30-onboarding-ux-overhaul.mdPhased layout, file inventory, anti-patterns table.
v12.6.2 incident textCHANGELOG.md:93–110Phase 7 quotes this verbatim in code comments.

External facts (cited)

TopicSource / canonical referenceKey fact
npm ERESOLVE semanticsnpm install docs (npm v10+) and npm RFC 0023ERESOLVE is emitted on stderr with a deterministic prefix npm error code ERESOLVE followed by While resolving: block. --legacy-peer-deps skips peer-dep resolution; --force accepts conflicting trees. They are NOT equivalent — --force is more aggressive and is not what we want.
Bun install errorsbun install source / docsStderr lines start with error:. A peer-dep violation prints error: package "X" has unmet peer "Y". A network failure prints error: failed to resolve.
uv install script return codeshttps://astral.sh/uv/install.shExits 0 on success even when binary lands in a non-PATH dir (e.g. ~/.local/bin not yet on PATH). The version probe must check UV_COMMON_PATHS after the script runs.
Claude Code hook exit-code contractCLAUDE.md "Exit Code Strategy"Worker/hook errors exit 0 (Windows Terminal hygiene). The npx claude-mem install CLI is NOT a hook and is allowed to exit non-zero on ABORT.

Anti-patterns / API methods that DO NOT exist (avoid inventing)

  • There is no central installerError function today. Phase 3 must create it. Do not reach for a non-existent helper.
  • --force is not a substitute for --legacy-peer-deps. Phase 4 must not "upgrade" the fallback to --force — that masks more than ERESOLVE.
  • npm has no --no-postinstall flag at the CLI level. The correct flag is --ignore-scripts. Don't invent.
  • Bun's trustedDependencies is not honored by npm. Do not assume the same allowlist works for both. Phase 7 enforces a separate npm-level guard.
  • process.exitCode = 1 (line 1324 of install.ts) does not abort an in-flight await chain. Phase 3's InstallAbortError must throw, not just set exitCode.
  • The bufferConsole wrapper (install.ts:43–64) swallows stderr inside the buffer; do not assume stderr ever reaches the terminal in non-interactive mode unless explicitly flushed.
  • clack's p.spinner() overwrites the line on .stop(). Errors emitted via console.warn during a spinner are lost. Phase 3's WARN_CONTINUE must enqueue to a summary list, not log live.
  • ensureUv() already throws on failure — but the throw is caught one level up by clack's task runner, which displays the message in a single line. Do not assume the user reads it; Phase 5 must add an explicit ABORT block.
  • The install/public/install.sh and install/public/installer.js files are already deprecated stubs (verified — both just print "use npx claude-mem install"). Don't waste audit time on them.
  • openclaw/install.sh is the active shell installer (1653 lines). It has its own bash-level audit in Phase 1.

File inventory

FileLinesDisposition
src/npx-cli/commands/install.ts1371Edited heavily (Phase 1, 3, 4, 5)
src/npx-cli/install/setup-runtime.ts288Edited (Phase 5, 7)
src/npx-cli/install/error-taxonomy.tsNEWCREATED (Phase 2)
src/npx-cli/install/error-reporter.tsNEWCREATED (Phase 3)
src/services/integrations/CodexCliInstaller.ts~360Edited (Phase 3) — every catch routed to installerError
src/services/integrations/CursorHooksInstaller.ts~530Edited (Phase 3)
src/services/integrations/GeminiCliHooksInstaller.ts~310Edited (Phase 3)
src/services/integrations/OpenCodeInstaller.ts~250Edited (Phase 3)
src/services/integrations/OpenClawInstaller.ts~260Edited (Phase 3)
src/services/integrations/WindsurfHooksInstaller.ts~395Edited (Phase 3)
src/services/integrations/McpIntegrations.ts~220Edited (Phase 3)
openclaw/install.sh1653Audited and selectively hardened (Phase 1)
scripts/build-hooks.js~250Edited (Phase 7) — postinstall allowlist guard
scripts/check-postinstall-allowlist.jsNEWCREATED (Phase 7) — pre-publish CI script
tests/install-error-matrix.test.tsNEWCREATED (Phase 6) — 12 × 4 matrix
tests/install-non-tty.test.ts277Extended (Phase 6)
tests/setup-runtime.test.ts135Extended (Phase 5)
Dockerfile.test-installer-matrixNEWCREATED (Phase 6)
docs/public/troubleshooting.mdxNEW or extendedEdited (Phase 8)
CLAUDE.md "Exit Code Strategy"ExistingEdited (Phase 8) — cross-reference taxonomy
CHANGELOG.mdDO NOT EDIT — generated automatically per CLAUDE.md

Phase 1 — Audit every error-suppression pattern

Goal: Produce a definitive table of every catch, || true, 2>/dev/null, and try {} catch {} in installer paths. Every row gets a proposed Phase 2 classification (ABORT / FAIL_LOUD_PER_IDE / WARN_CONTINUE / SILENT_RETRY).

Deliverable: plans/audit-installer-errors.csv (committed alongside this plan), with columns: file, line, kind (catch | bash-or-true | bash-redirect), current_behavior, proposed_severity, proposed_remediation_text, notes.

What to audit (exact greps)

Run these greps from repo root and turn every hit into a row:

bash
# TS catch blocks
grep -nE 'catch\s*(\(|\{)' src/npx-cli/ src/services/integrations/ -r

# TS empty catch
grep -nB1 'catch\s*\{\s*\}' src/npx-cli/ src/services/integrations/ -r

# TS console.warn after caught error
grep -nE 'catch.*\{' src/npx-cli/ src/services/integrations/ -r -A 3 | grep -A 0 'console\.warn\|log\.warn'

# Shell silent failures
grep -nE '\|\| true|2>/dev/null|2>&1.*\|\|' openclaw/install.sh

# Build / sync scripts
grep -nE 'catch|process\.exit\(0\)' scripts/build-hooks.js scripts/sync-marketplace.cjs

# Plugin hooks
grep -nE 'catch|exit 0' plugin/scripts/version-check.js plugin/scripts/bun-runner.js

Known counts (from the initial audit baked into this plan)

  • src/npx-cli/commands/install.ts: 14 catch blocks (lines 387, 393, 406, 455, 596, 613, 631, 725, 980, 1056, 1131, 1161, 1243, 1252).
  • src/npx-cli/install/setup-runtime.ts: 5 catch blocks (lines 38, 60, 73, 95, 233).
  • src/services/integrations/CursorHooksInstaller.ts: 8 catch blocks.
  • src/services/integrations/CodexCliInstaller.ts: 8 catch blocks.
  • src/services/integrations/WindsurfHooksInstaller.ts: 9 catch blocks.
  • src/services/integrations/OpenCodeInstaller.ts: 8 catch blocks.
  • src/services/integrations/OpenClawInstaller.ts: 4 catch blocks.
  • src/services/integrations/GeminiCliHooksInstaller.ts: 4 catch blocks.
  • src/services/integrations/McpIntegrations.ts: 2 catch blocks.
  • scripts/sync-marketplace.cjs: 6 catch blocks (line 28, 75, 90, 101, 111, 188, 220).
  • scripts/build-hooks.js: 1 catch block (line 422).
  • openclaw/install.sh: 36 || true / 2>/dev/null patterns.

Audit total ≈ 105 sites. Each row in the CSV must end with a Phase 2 severity proposal.

Verification checklist

  • CSV row count ≥ 100 (matches grep counts above ± 5%).
  • Every row has a non-empty proposed_severity.
  • No row has proposed_severity = SILENT — that severity does not exist; the closest valid choice is SILENT_RETRY.
  • CSV is committed at plans/audit-installer-errors.csv and referenced from this plan.

Anti-pattern guards

  • Do not classify "this catch logs a warning today" as "WARN_CONTINUE" automatically. Read each one and decide. Some are genuine ABORTs masquerading as warnings.
  • Do not classify any 2>/dev/null on a curl health probe as ABORT — health probes are best-effort by design.
  • Do not mark installClaudeCode() (line 416–462) failures as ABORT; the user explicitly opted into "install Claude Code now" and a failure should be FAIL_LOUD with manual remediation, not abort the install.

Phase 2 — Define error taxonomy

Goal: Single source-of-truth typed enum + lookup table that classifies every installer error and prescribes a remediation string.

File to create: src/npx-cli/install/error-taxonomy.ts

What to implement

Copy the structure from this skeleton (paraphrased; do not edit copy verbatim — adapt to actual TypeScript types in the repo):

typescript
export enum ErrorSeverity {
  ABORT = 'ABORT',                       // exit 1, do not continue
  FAIL_LOUD_PER_IDE = 'FAIL_LOUD_PER_IDE', // exit 1 if all IDEs fail; otherwise partial summary
  WARN_CONTINUE = 'WARN_CONTINUE',         // print warning to end-of-install summary, continue
  SILENT_RETRY = 'SILENT_RETRY',           // retry once with backoff; escalate to WARN_CONTINUE
}

export interface ErrorCategory {
  id: string;                             // 'tree-sitter-eresolve', 'uv-missing', etc.
  severity: ErrorSeverity;
  match: (cause: unknown, ctx: { component: string; phase: string }) => boolean;
  remediation: (ctx: { platform: NodeJS.Platform; dataDir: string }) => string;
}

export const ERROR_CATEGORIES: ErrorCategory[] = [ /* see seed list below */ ];

Seed taxonomy (the categories Phase 3 must implement)

idSeverityMatch heuristicRemediation summary
bun-missing-after-installABORTcause.message.includes('Bun executable not found')"Install Bun manually then re-run npx claude-mem install. macOS/Linux: curl -fsSL https://bun.sh/install | bash. Windows: winget install Oven-sh.Bun."
uv-missing-after-installABORT (downgradable to WARN_CONTINUE if user opted out of vector search — see Phase 5)cause.message.includes('uv executable not found') || cause.message.includes('uv installed but version probe failed')Platform-specific block from installUv() (lines 164–166) surfaced as primary message.
tree-sitter-eresolveABORT (after one retry with --legacy-peer-deps)stderr contains literal ERESOLVE AND --legacy-peer-deps retry also failed"ERESOLVE conflict in marketplace deps that --legacy-peer-deps could not resolve. Open an issue at https://github.com/thedotmack/claude-mem/issues with the conflicting peer ranges below: <details>."
bun-install-network-failSILENT_RETRY → WARN_CONTINUEbun stderr error: failed to resolve for a known package on first try, repeated on retry"bun install failed to resolve packages — check network connectivity and re-run npx claude-mem install. Cached packages in ~/.bun/install/cache will be reused."
marketplace-dir-not-writableABORTEACCES/EPERM on mkdirSync / writeFileSync to marketplaceDirectory()"Cannot write to marketplace directory ${dataDir}/.claude/plugins/.... Check filesystem permissions or set CLAUDE_MEM_DATA_DIR to a writable path."
plugin-json-corruptABORTJSON.parse error on plugin.json"Existing plugin.json is corrupt. Run rm -rf ~/.claude/plugins/marketplaces/thedotmack and re-run install."
all-ides-failedABORTfailedIDEs.length === selectedIDEs.length && selectedIDEs.length > 0"Every selected IDE integration failed. See per-IDE errors above. Re-run with --ide=<single> to isolate."
single-ide-failedFAIL_LOUD_PER_IDEper-IDE installer non-zero exitEcho first 20 lines of stderr + "Run npx claude-mem install --ide=<name> to retry just this IDE."
mcp-integration-optional-failWARN_CONTINUEMCP installer non-zero AND IDE has alternate (non-MCP) integration path"MCP setup for ${ide} failed; non-MCP features still work. Run npx claude-mem mcp ${ide} later."
path-update-failedWARN_CONTINUEapplyClaudeCodePathSetupIfNeeded write fails"Could not auto-update PATH in ${configFile}. Run manually: echo '...' >> ${configFile}."
auto-memory-toggle-failedWARN_CONTINUEdisableClaudeAutoMemory throws"Could not disable Claude Code auto-memory. Add CLAUDE_CODE_DISABLE_AUTO_MEMORY=1 to ~/.claude/settings.json env block."
version-probe-transientSILENT_RETRY → WARN_CONTINUEbun/uv --version returns non-zero once(no message on first try; on retry: "Could not verify ${tool} version — installation likely OK.")
idempotent-json-merge-raceSILENT_RETRYEEXIST/ENOENT race during writeJsonFileAtomic retry(silent; retry once.)
child-process-timeoutABORTspawnSync/execSync timeout (Phase 7's wrapper)"${command} did not finish in ${timeout}s. Check network connectivity. If the host is slow, set CLAUDE_MEM_INSTALL_TIMEOUT_MS."

Verification checklist

  • error-taxonomy.ts exports ErrorSeverity, ErrorCategory, ERROR_CATEGORIES.
  • ERROR_CATEGORIES contains exactly the 14 rows above (extensions allowed).
  • Every category's remediation() reads dataDir from a passed-in context, not from process.env directly (so multi-account setups work — see CLAUDE.md "Multi-account").
  • npm run typecheck passes.

Anti-pattern guards

  • Do not include a SILENT severity (no remediation, no log). It does not exist in this taxonomy.
  • Do not hard-code ~/.claude-mem paths in remediation strings. Always interpolate dataDir.
  • Do not add a category for "unknown error" with low severity. Unknown errors must default to ABORT until classified — fail loud is the safe default.

Phase 3 — Implement installerError(severity, ctx) central handler

Goal: Single function every catch in installer paths must call. ABORTs throw a typed error; WARN_CONTINUEs enqueue to a summary list; SILENT_RETRYs re-invoke the wrapped action.

Files to create: src/npx-cli/install/error-reporter.ts

What to implement

Skeleton (adapt to actual repo conventions; do not paste verbatim):

typescript
export class InstallAbortError extends Error {
  readonly category: ErrorCategory;
  readonly remediation: string;
  readonly cause: unknown;
}

export interface ErrorContext {
  component: string;       // 'cursor', 'codex-cli', 'marketplace-npm-install', 'uv-install', etc.
  phase: string;           // 'setup-runtime', 'ide-install', 'marketplace-deps', etc.
  cause: unknown;
  remediation?: string;    // optional override; default from taxonomy
  eresolveDetails?: string; // raw stderr block to surface verbatim
}

export interface InstallSummary {
  warnings: Array<{ component: string; message: string; remediation: string }>;
  failedIDEs: string[];
  retryCount: Record<string, number>;
}

export function createInstallSummary(): InstallSummary;

export function installerError(
  severity: ErrorSeverity,
  ctx: ErrorContext,
  summary: InstallSummary
): never | void;

export async function withRetry<T>(
  action: () => Promise<T>,
  ctx: ErrorContext,
  summary: InstallSummary,
  maxAttempts: number = 2
): Promise<T>;

export function flushSummary(summary: InstallSummary, isInteractive: boolean): void;

Behavior contract

SeverityBehavior
ABORTWrite ~/.claude-mem/last-install-error.json (path resolved via pluginCacheDirectory / CLAUDE_MEM_DATA_DIR), print remediation block to stderr (ANSI-colored only when process.stderr.isTTY), throw InstallAbortError with cause chained. The top-level runInstallCommand catches InstallAbortError, prints the headline "Installation Aborted: <category.id>", and process.exit(1).
FAIL_LOUD_PER_IDEAppend to summary.failedIDEs, append a remediation block to summary.warnings. Continue. The top-level summary prints "Installation Partial" (red, not green). Exits 1 only if all IDEs fail (which then triggers all-ides-failed ABORT).
WARN_CONTINUEAppend to summary.warnings. Do not log live (clack spinner would clobber). flushSummary prints all warnings after the spinner / outro.
SILENT_RETRYIncrement summary.retryCount[component]. If count > 1, escalate to WARN_CONTINUE. Caller uses withRetry helper to wrap the action.

Refactor every audited catch

For each row in plans/audit-installer-errors.csv produced by Phase 1, replace the existing handler with a call to installerError(severity, ctx, summary). Before/after example:

Before (install.ts:1126–1135):

typescript
try {
  runNpmInstallInMarketplace();
  return `Dependencies installed ${pc.green('OK')}`;
} catch (error: unknown) {
  console.warn('[install] npm install error:', error instanceof Error ? error.message : String(error));
  return `Dependencies may need manual install ${pc.yellow('!')}`;
}

After:

typescript
try {
  await runNpmInstallInMarketplace();  // Phase 4: now async w/ ERESOLVE handling
  return `Dependencies installed ${pc.green('OK')}`;
} catch (error: unknown) {
  installerError(ErrorSeverity.ABORT, {
    component: 'marketplace-npm-install',
    phase: 'marketplace-deps',
    cause: error,
  }, summary);
  // installerError throws — unreachable, but TypeScript needs a return
  return '';
}

Rework bufferConsole

src/npx-cli/commands/install.ts:43–64 currently swallows stderr into a string buffer and only surfaces it via pendingErrors. After this phase:

  • A non-zero result from the wrapped function must preserve the stderr verbatim in the returned object (already does).
  • setupIDEs (lines 328–347) must call installerError(FAIL_LOUD_PER_IDE, …) with eresolveDetails: output.slice(0, 4000) (first ~80 lines).
  • The IDE summary block must show the exit code + first 20 lines of stderr verbatim, not a generic "X failed" line.

Top-level wiring

In runInstallCommand (install.ts:961), thread summary through:

  1. Create summary at the top.
  2. Pass to setupIDEs, every runTasks task, ensureBun/ensureUv, runNpmInstallInMarketplace.
  3. After all tasks, call flushSummary(summary, isInteractive) before the existing p.note(summaryLines, installStatus).
  4. Wrap the entire body in try { … } catch (e) { if (e instanceof InstallAbortError) { … print + exit 1 } else throw }.

Verification checklist

  • grep -rE 'console\.warn\(.*install' src/npx-cli/ src/services/integrations/ returns 0 hits (all warnings go via installerError).
  • grep -rE 'catch.*\{[^}]*//.*ignore' src/npx-cli/ src/services/integrations/ returns 0 hits.
  • Every catch in the Phase 1 CSV has been edited (verify by line-number cross-check).
  • New unit test: ABORT throws InstallAbortError, WARN_CONTINUE appends to summary, SILENT_RETRY escalates after 2 attempts.
  • npm run typecheck passes.
  • npm run test passes (existing tests must keep passing — refactor must be behavior-preserving on the happy path).

Anti-pattern guards

  • Do not call process.exit() directly inside installerError — throw InstallAbortError so the top-level handler can flush the summary and print a coherent outro.
  • Do not print warnings live during a clack spinner. Always enqueue to summary.warnings and flush at the end.
  • Do not introduce a new global module. summary is an explicit parameter (testability).
  • Do not silence the stack trace inside InstallAbortError — Node's default stack is fine; the user wants debug info.

Phase 4 — tree-sitter ERESOLVE detection and explicit handling

Goal: Replace the unconditional --legacy-peer-deps with strict-first, fall-back-on-confirmed-ERESOLVE-only.

File to edit: src/npx-cli/commands/install.ts:565–581

What to implement

Rewrite runNpmInstallInMarketplace:

typescript
async function runNpmInstallInMarketplace(summary: InstallSummary): Promise<void> {
  const marketplaceDir = marketplaceDirectory();
  const packageJsonPath = join(marketplaceDir, 'package.json');
  if (!existsSync(packageJsonPath)) return;

  // Phase 7: --ignore-scripts is the default. The 12.6.2 incident proved that
  // any new transitive dep with a postinstall (e.g. tree-sitter-swift's
  // tree-sitter-cli download) can hang `npx claude-mem install`.
  const baseFlags = ['install', '--omit=dev', '--ignore-scripts'];

  const strictResult = await runNpmStrict(marketplaceDir, baseFlags);
  if (strictResult.code === 0) return;

  const stderr = strictResult.stderr ?? '';
  const isEresolve = /\bERESOLVE\b/.test(stderr) || /code ERESOLVE/.test(stderr);
  if (!isEresolve) {
    installerError(ErrorSeverity.ABORT, {
      component: 'marketplace-npm-install',
      phase: 'marketplace-deps',
      cause: new Error(`npm install failed (exit ${strictResult.code})`),
      eresolveDetails: stderr.slice(0, 4000),
    }, summary);
  }

  // Confirmed ERESOLVE — log loudly, attempt one fallback with --legacy-peer-deps.
  log.warn(`npm reported ERESOLVE peer-dependency conflict in marketplace deps; retrying with --legacy-peer-deps. Conflict details:`);
  log.warn(extractEresolveBlock(stderr));

  const legacyResult = await runNpmStrict(marketplaceDir, [...baseFlags, '--legacy-peer-deps']);
  if (legacyResult.code === 0) {
    summary.warnings.push({
      component: 'marketplace-npm-install',
      message: 'tree-sitter peer-dep ERESOLVE was resolved with --legacy-peer-deps fallback. This is benign for the marketplace install but should be re-evaluated when tree-sitter peer ranges change.',
      remediation: 'No action required.',
    });
    return;
  }

  installerError(ErrorSeverity.ABORT, {
    component: 'marketplace-npm-install',
    phase: 'marketplace-deps',
    cause: new Error(`npm install --legacy-peer-deps still failed (exit ${legacyResult.code})`),
    eresolveDetails: legacyResult.stderr?.slice(0, 4000),
  }, summary);
}

Helpers (extract to src/npx-cli/install/npm-install-helper.ts):

  • runNpmStrict(cwd, flags): Promise<{ code: number; stdout: string; stderr: string }> — wraps spawnSync with timeout (Phase 7).
  • extractEresolveBlock(stderr): string — pulls the While resolving:Conflicting peer dependency: block for display.

Bun install hardening (installPluginDependencies setup-runtime.ts:221–239)

Same pattern: wrap with runBunStrict, parse stderr for error: failed to resolve (network) vs error: package "X" not found (real missing dep). Network failures = SILENT_RETRY (one retry); real missing = ABORT.

Verification checklist

  • Existing test tests/install-non-tty.test.ts still passes (happy path).
  • New unit test: simulated npm install exit 1 with ERESOLVE in stderr triggers fallback path.
  • New unit test: simulated npm install exit 1 without ERESOLVE → immediate ABORT (no fallback).
  • New unit test: both strict and legacy fail → ABORT with first-20-lines stderr in eresolveDetails.
  • grep -n "legacy-peer-deps" src/npx-cli/commands/install.ts only appears inside runNpmInstallInMarketplace's fallback path, never on first try.

Anti-pattern guards

  • Do not use --force. It accepts conflicting trees that --legacy-peer-deps would skip — different semantics.
  • Do not retry the strict install — strict failure with no ERESOLVE means a real bug; retrying just hides it.
  • Do not assume ERESOLVE is always present in lowercase. The npm format is uppercase; match /\bERESOLVE\b/ not /eresolve/i.
  • Do not parse stderr with a fragile regex; the simple \bERESOLVE\b token check is sufficient. Keep extractEresolveBlock defensive (return raw stderr if the block markers aren't found).

Phase 5 — Missing-uv auto-detection and explicit failure

Goal: Honor CLAUDE.md's "uv auto-installed if missing" promise, but make the failure case loud and platform-specific. Downgrade to WARN_CONTINUE if the user opted out of vector search.

File to edit: src/npx-cli/install/setup-runtime.ts:206–219

What to implement

Augment ensureUv():

typescript
export async function ensureUv(
  summary: InstallSummary,
  options: { allowVectorSearchOptOut?: boolean } = {}
): Promise<{ uvPath: string; version: string } | { uvPath: null; version: null }> {

  if (!isUvInstalled()) {
    installUv();   // existing logic — already throws platform-specific error on failure
  }

  // Post-install verification: PATH may not yet include ~/.local/bin in the
  // current shell. Re-probe UV_COMMON_PATHS explicitly.
  let uvPath = getUvPath();
  if (!uvPath) {
    // One more direct check of UV_COMMON_PATHS (in case install just wrote there).
    uvPath = UV_COMMON_PATHS.find(existsSync) ?? null;
  }

  if (!uvPath) {
    if (options.allowVectorSearchOptOut && userHasOptedOutOfVectorSearch()) {
      installerError(ErrorSeverity.WARN_CONTINUE, {
        component: 'uv-install',
        phase: 'setup-runtime',
        cause: new Error('uv binary not found after install; vector search disabled — continuing.'),
      }, summary);
      return { uvPath: null, version: null };
    }
    installerError(ErrorSeverity.ABORT, {
      component: 'uv-install',
      phase: 'setup-runtime',
      cause: new Error('uv binary not found after auto-install attempt'),
      remediation: platformUvRemediation(),  // surfaced as PRIMARY message
    }, summary);
  }

  const version = getUvVersion();
  if (!version) {
    // Probe failed once — retry with a 1-second sleep (sometimes new binaries need a moment).
    await new Promise((r) => setTimeout(r, 1000));
    const retried = getUvVersion();
    if (!retried) {
      installerError(ErrorSeverity.WARN_CONTINUE, {
        component: 'uv-version-probe',
        phase: 'setup-runtime',
        cause: new Error(`uv binary at ${uvPath} did not respond to --version after retry`),
      }, summary);
      return { uvPath, version: 'unknown' };
    }
    return { uvPath, version: retried };
  }
  return { uvPath, version };
}

Helpers:

  • userHasOptedOutOfVectorSearch() — check SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH) for a CLAUDE_MEM_DISABLE_VECTOR_SEARCH setting (define if it does not exist; default false).
  • platformUvRemediation() — extract the existing platform-specific block from installUv (lines 164–166) into a standalone exported function so both error paths share it.

Apply same pattern to ensureBun

ensureBun (lines 191–204): same retry-after-1s, same platformBunRemediation(). Bun has no opt-out — bun is mandatory for hooks.

Verification checklist

  • tests/setup-runtime.test.ts extended: case where installUv succeeds but getUvPath still returns null (mock existsSync to lie) → ABORT with platform string.
  • Test: same scenario but with vector search opted out → WARN_CONTINUE, ensureUv returns {uvPath: null}.
  • Test: getUvVersion returns null on first call, version on second → returns { version: ...} after retry, no warning.
  • Test: getUvVersion returns null both times → WARN_CONTINUE, version: 'unknown'.

Anti-pattern guards

  • Do not call installUv() more than once per ensureUv() invocation. The auto-install attempt is one-shot; if it fails, ABORT with manual instructions. Do not loop.
  • Do not silently swallow installUv()'s thrown error — its message already contains the platform-specific instructions; let them propagate as the ABORT remediation.
  • Do not add a "press enter to continue" prompt on missing uv — non-interactive installs would hang.

Phase 6 — Cross-IDE validation matrix (12 × 4 = 48 cells)

Goal: Every IDE × every failure mode asserts the right outcome.

Files to create:

  • tests/install-error-matrix.test.ts
  • Dockerfile.test-installer-matrix

What to implement

Use bun test's existing harness. For each of the 12 IDEs (claude-code, gemini-cli, opencode, openclaw, windsurf, codex-cli, cursor, copilot-cli, antigravity, goose, roo-code, warp) and for each of 4 scenarios, generate one test case:

ScenarioFixture / mockAssertions
Happy pathMock spawnSync so bun --version, uv --version, npm install all return 0.exit 0, stdout contains installed successfully, summary failedIDEs.length === 0, summary.warnings.length === 0.
tree-sitter ERESOLVEMock npm install to exit 1 with npm error code ERESOLVE in stderr; mock --legacy-peer-deps retry to also exit 1.exit 1, stderr contains Installation Aborted: tree-sitter-eresolve, stderr contains the conflicting peer ranges block, stdout does not contain installed successfully.
Missing uv (auto-install fails)Mock getUvPath to return null; mock installUv to throw with astral.sh 404.exit 1, stderr contains Installation Aborted: uv-missing-after-install, stderr contains platform-specific manual instructions (curl -LsSf https://astral.sh/uv/install.sh | sh on Linux, winget install astral-sh.uv on Windows).
Missing bun (auto-install fails)Mock getBunPath to return null; mock installBun to throw with bun.sh 404.exit 1, stderr contains Installation Aborted: bun-missing-after-install, stderr contains platform-specific manual instructions.

Helpers needed

  • setupIsolatedHome(): { home: string; cleanup: () => void } — creates a temp HOME, sets CLAUDE_MEM_DATA_DIR=$home/.claude-mem, HOME=$home, returns paths.
  • mockSpawnSync(matrix: Record<string, { code: number; stdout?: string; stderr?: string }>): void — installs a mock that matches by command+arg.
  • runInstallSubprocess(ide: string, env: Record<string, string>): Promise<{ exitCode: number; stdout: string; stderr: string }> — spawns bun src/npx-cli/index.ts install --no-auto-start --ide=${ide} with mocked env via a wrapper that injects the spawn mocks.

Docker matrix runner

Dockerfile.test-installer-matrix extends Dockerfile.test-installer:

  • Adds RUN bun install for the test deps.
  • ENTRYPOINT runs bun test tests/install-error-matrix.test.ts --reporter junit > /workspace/results.xml.
  • A scripts/run-matrix-docker.sh wrapper builds the image and runs it; CI invokes this on every PR that touches src/npx-cli/, src/services/integrations/, scripts/build-hooks.js, or tests/install-*.

Verification checklist

  • bun test tests/install-error-matrix.test.ts produces 48 test cases (12 × 4).
  • Every case asserts at least: exit code, summary headline (installed successfully vs Installation Aborted), specific remediation substring, structured stderr.
  • Docker matrix run completes in < 5 minutes.
  • CI fails the PR if any of the 48 cells regresses.

Anti-pattern guards

  • Do not test against the real ~/.claude — every case must use isolated TMPDIR HOME.
  • Do not mock at the installerError level. Mock the underlying spawnSync/existsSync so the full pipeline is exercised.
  • Do not skip the IDEs marked coming soon in the matrix — the install command can still be invoked with them. The matrix should assert that they exit cleanly with a "support coming soon" message and exit 0 (they are not failures).
  • Do not rely on process.env.HOME mutations inside the test process — spawn a subprocess with the env override.

Phase 7 — Postinstall regression guards (12.6.2 lesson)

Goal: Prevent another tree-sitter-swift-style hang. CI must fail when a new transitive dep with scripts.postinstall or scripts.install lands outside the explicit allowlist.

Files to create / edit:

  • scripts/check-postinstall-allowlist.js (NEW, pre-publish CI)
  • package.json prepublishOnly script (extend)
  • src/npx-cli/install/setup-runtime.ts installPluginDependencies (timeout wrapper)

CI guard

scripts/check-postinstall-allowlist.js:

javascript
#!/usr/bin/env node
// Enforces: no transitive dep with scripts.postinstall|scripts.install may
// land in plugin/ or root node_modules unless allowlisted.
//
// Why: see CHANGELOG.md:93–110 (12.6.1 → 12.6.2 incident). npm does NOT honor
// trustedDependencies (Bun-only). Any new package with a network postinstall
// will hang `npx claude-mem install`.

const ALLOWLIST = new Set([
  'tree-sitter-cli',     // builds bindings; trusted because we explicitly need it
  'esbuild',             // platform-specific binary download is the package itself
]);

// Walk node_modules, parse each package.json, fail if scripts.postinstall or
// scripts.install is present and the package name is not in ALLOWLIST.
// Run against both root and plugin/ trees.

Wire into prepublishOnly: "prepublishOnly": "npm run build && node scripts/check-postinstall-allowlist.js".

Runtime --ignore-scripts default

installPluginDependencies (setup-runtime.ts:228–233): pass --ignore-scripts to bun install. Add comment:

typescript
// Per CHANGELOG.md:93–110 (v12.6.1 → v12.6.2): tree-sitter-swift's
// nested tree-sitter-cli postinstall downloads a Rust binary and can
// hang the install. We allowlist the small set of packages that legitimately
// need postinstall (tree-sitter-cli, esbuild) via package.json
// trustedDependencies. Bun honors trustedDependencies; npm does not, which is
// why we additionally pass --ignore-scripts and why root devDependencies stay
// out of npx fetch (v12.6.2 fix).
execSync(`${bunCmd} install --ignore-scripts`, { ... });

runNpmInstallInMarketplace already has --ignore-scripts from Phase 4.

Timeout wrapper

Every execSync/spawnSync install command must have an explicit timeout:

typescript
const TIMEOUT_FIRST_RUN_MS = 5 * 60 * 1000;   // 5 min
const TIMEOUT_SUBSEQUENT_MS = 2 * 60 * 1000;  // 2 min
const installTimeout = process.env.CLAUDE_MEM_INSTALL_TIMEOUT_MS
  ? Number(process.env.CLAUDE_MEM_INSTALL_TIMEOUT_MS)
  : (isFirstRun ? TIMEOUT_FIRST_RUN_MS : TIMEOUT_SUBSEQUENT_MS);

spawnSync returns signal === 'SIGTERM' on timeout. Convert to ABORT with child-process-timeout category.

Apply to all install spawns

Audit-driven list of spawns to wrap:

  • installBun (line 122–127) — curl pipe-bash, 5 min timeout, allow override.
  • installUv (line 152–155) — curl pipe-bash, 5 min timeout.
  • installPluginDependencies bun install — 5 min first run, 2 min subsequent.
  • runNpmStrict and runNpmStrict --legacy-peer-deps — 5 min first run, 2 min subsequent.
  • installClaudeCode (line 426) — already has its own spinner, but no timeout. Add 5 min.

Verification checklist

  • node scripts/check-postinstall-allowlist.js against the current tree exits 0 (no offenders today).
  • Adding tree-sitter-haskell-evil (hypothetical fixture) with a fake postinstall breaks CI.
  • grep -n "ignore-scripts" src/npx-cli/install/setup-runtime.ts src/npx-cli/commands/install.ts shows the flag in both bun install and npm install paths.
  • Test: spawnSync with timeout: 100ms on a slow command returns signal: 'SIGTERM' and triggers ABORT.

Anti-pattern guards

  • Do not auto-add packages to the allowlist when CI fails. Failing CI is the point — a human reviews each new postinstall.
  • Do not add tree-sitter-cli to the allowlist twice (it already lives in trustedDependencies in package.json:190 and scripts/build-hooks.js:106). The new allowlist is just a CI-time guard, not a duplicate of trustedDependencies.
  • Do not remove --ignore-scripts from bun install even though Bun honors trustedDependencies — the belt-and-suspenders is intentional.
  • Do not make the timeout configurable per-IDE — one global CLAUDE_MEM_INSTALL_TIMEOUT_MS env var is sufficient.

Phase 8 — Documentation and cross-references

Goal: Document the taxonomy and remediation map for end-users and contributors. Update CLAUDE.md to cross-reference.

Files to edit / create:

  • docs/public/troubleshooting.mdx (CREATE or EXTEND if it exists)
  • CLAUDE.md "Exit Code Strategy" section
  • plans/04-installer-transparency.md (this file — already)

What to write

docs/public/troubleshooting.mdx:

  • Section "Installation errors": lists each id from the taxonomy table, the error message format, and the remediation. Markdown table mirroring Phase 2's seed taxonomy.
  • Section "Reading the error": shows a sample stderr block and how to copy-paste the bottom block into a GitHub issue.
  • Section "Debug": doc the CLAUDE_MEM_INSTALL_TIMEOUT_MS env var and ~/.claude-mem/last-install-error.json.

CLAUDE.md "Exit Code Strategy" — append:

markdown
**Installer exit codes** (note: installer is NOT a hook; it follows standard CLI exit semantics):

- **Exit 0**: install succeeded; "Installation Complete" headline; summary may include `WARN_CONTINUE` warnings.
- **Exit 1**: ABORT or partial-IDE failures. Headline is "Installation Aborted: \<category\>" or "Installation Partial". Structured cause written to `~/.claude-mem/last-install-error.json` (or `$CLAUDE_MEM_DATA_DIR/last-install-error.json`). See `src/npx-cli/install/error-taxonomy.ts` for the full category list.

docs.json (Mintlify nav): add a link to the new troubleshooting page.

Verification checklist

  • troubleshooting.mdx covers all 14 categories from Phase 2.
  • CLAUDE.md cross-reference points to the right file.
  • docs.json updated.
  • CHANGELOG.md is NOT edited (auto-generated per CLAUDE.md's "No need to edit the changelog ever, it's generated automatically.").

Anti-pattern guards

  • Do not edit CHANGELOG.md.
  • Do not add a "report this error to support" link to a non-existent endpoint. Use the GitHub issues URL from package.json:25–27.
  • Do not localize the remediation strings yet — English-only for this phase.

Phase 9 — Final verification

Whole-system checks

  • npm run typecheck passes (root + viewer).
  • npm run test passes (all suites including the new matrix).
  • bun test tests/install-error-matrix.test.ts produces 48 test cases, all green.
  • Docker matrix runner (scripts/run-matrix-docker.sh) green on clean Linux.
  • npm run build-and-sync completes without errors and the worker restarts cleanly.
  • Manual test: bun src/npx-cli/index.ts install --no-auto-start on a fresh test home (HOME=/tmp/test-home) — should succeed and produce a clean summary.
  • Manual test: same command after mv ~/.bun /tmp/.bun-stash (simulate missing bun) — should ABORT with platform-specific instructions.
  • grep -nE 'console\.warn\(' src/npx-cli/ src/services/integrations/ — should only show non-installer-error usage (e.g. bug-report script), no swallowed-error patterns.
  • grep -nE '\|\| true' openclaw/install.sh — sites that should remain (best-effort probes) are documented; sites that should fail loud are converted to \|\| { error "..."; exit 1; }.

Anti-pattern guards (sweep)

  • No new try {} catch {} empty handlers introduced.
  • No new console.warn in installer paths that bypass installerError.
  • No use of --force anywhere in install scripts.
  • No removal of --ignore-scripts from bun install or npm install calls.
  • No edits to CHANGELOG.md.

Rollback plan

If post-merge a real-world install regression appears:

  1. Revert PR. Each phase is on a separate commit so partial revert is feasible.
  2. The pre-existing --legacy-peer-deps unconditional behavior is preserved in git history at the line numbers cited in this plan.
  3. The ~/.claude-mem/last-install-error.json file written by installerError provides a reproducible diagnostic for any user who hits an ABORT — capture this in the rollback issue.

Phase boundaries / ordering

Phases must execute in numerical order:

  • Phase 0 → Phase 1: discovery before audit.
  • Phase 2 (taxonomy) blocks Phase 3 (reporter uses the enum).
  • Phase 3 (reporter) blocks Phase 4 / 5 (both call installerError).
  • Phase 4 + 5 land independently after Phase 3.
  • Phase 6 (matrix tests) needs 3, 4, 5 complete to assert correct behavior.
  • Phase 7 (postinstall guards) can land any time after Phase 3 — independent.
  • Phase 8 (docs) is last (documents what shipped).

Each phase is a separate commit (and each is a runnable mini-task in a fresh chat context).