plans/06-worker-env-isolation.md
Goal: Stop host-side environment variables from contaminating the worker's Anthropic SDK subprocess. Two confirmed bugs anchor this plan: `ANTHROPIC_BASE_URL` leaks from the parent shell while `ANTHROPIC_AUTH_TOKEN` is blocked, breaking proxy/gateway auth (#2375); and `CLAUDE_CODE_EFFORT_LEVEL` propagates from host CLI settings into the SDK subprocess, where it triggers a permanent HTTP 400 that the retry classifier mistakes for transient (#2357). Adjacent feature #2289 (`$TIER` alias syntax) is in scope where it shares the same env/model-resolution surface.

Net effect:

- The OAuth-skip predicate requires a real credential (`ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN`), not a bare `ANTHROPIC_BASE_URL`. Proxy/gateway users put credentials in `~/.claude-mem/.env`; nothing relies on parent-shell leaks.
- `BLOCKED_ENV_VARS` adds `ANTHROPIC_BASE_URL` and the `CLAUDE_CODE_EFFORT_LEVEL` / `CLAUDE_CODE_ALWAYS_ENABLE_EFFORT` pair (defense in depth alongside the existing `env-sanitizer.ts` `CLAUDE_CODE_*` prefix filter).
- The Claude provider's error classifier explicitly handles HTTP 400 as unrecoverable, matching `GeminiProvider` / `OpenRouterProvider`. No more unbounded retry loop on permanent-error responses.
- Every spawn boundary that hands env to a child process applies BOTH `buildIsolatedEnv` and `sanitizeEnv`. A grep-based CI check forbids spawning subprocesses with raw `process.env`.
- `~/.claude-mem/.env` becomes the single source of truth for non-OAuth Anthropic credentials. The loader's whitelist documents this contract.

Out of scope:

- Hook-side env handling (Plan 01 / 02 territory).
- Worker daemon lifecycle, DB bloat, and chroma-mcp leaks (Plan 03).
- Observer/Knowledge SDK tool enforcement (Plan 05).
- Re-auth UX flow (different concern; out of scope for this plan).
- General provider-router refactor: the `$TIER` alias is scoped to model resolution only (Phase 4).

`ANTHROPIC_BASE_URL` leaks, OAuth gets skipped, `ANTHROPIC_AUTH_TOKEN` is missing (#2375)

`src/shared/EnvManager.ts` lines 14–24 (`BLOCKED_ENV_VARS`):

```typescript
const BLOCKED_ENV_VARS = [
  'ANTHROPIC_API_KEY', // #733
  'ANTHROPIC_AUTH_TOKEN', // added 5edf1557 (2026-05-04): leak prevention
  'CLAUDECODE',
  'CLAUDE_CODE_OAUTH_TOKEN', // #2215
];
```

ANTHROPIC_BASE_URL is not in the list, so it survives buildIsolatedEnv() (lines 166–205) and reaches isolatedEnv from process.env.
buildIsolatedEnvWithFreshOAuth() lines 222–288 then runs the OAuth-skip predicate at lines 237–244:

```typescript
if (
  isolatedEnv.ANTHROPIC_API_KEY ||
  isolatedEnv.ANTHROPIC_BASE_URL ||
  isolatedEnv.ANTHROPIC_AUTH_TOKEN
) {
  clearStaleMarker();
  return isolatedEnv;
}
```

The bare BASE_URL branch was added in commit a122d34e (2026-05-04) under the rationale "tokenless gateways may exist." Combined with the AUTH_TOKEN block from 5edf1557 the same day, the subprocess ends up with:

- `ANTHROPIC_BASE_URL` ✅ (leaked from parent)
- `ANTHROPIC_AUTH_TOKEN` ❌ (blocked, never re-injected because `~/.claude-mem/.env` is empty for first-time proxy users)
- `CLAUDE_CODE_OAUTH_TOKEN` ❌ (skip path bypassed the keychain read)

Result: `Not logged in · Please run /login` from every SDK subprocess.

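The interaction described above can be reproduced in a few lines. This is a minimal sketch, assuming a simplified deny-list filter and the pre-fix predicate as quoted; `buildIsolatedEnvPreFix`, `skipsOAuthPreFix`, and the sample values are hypothetical stand-ins, not the real `EnvManager` code.

```typescript
// Simplified model of the pre-fix flow. Names mirror the plan, but the
// bodies are illustrative assumptions, not the real EnvManager code.
type Env = Record<string, string | undefined>;

const BLOCKED_ENV_VARS_PRE_FIX = [
  'ANTHROPIC_API_KEY',
  'ANTHROPIC_AUTH_TOKEN',
  'CLAUDECODE',
  'CLAUDE_CODE_OAUTH_TOKEN',
];

// Layer under test: drop every blocked name from the parent env.
function buildIsolatedEnvPreFix(parentEnv: Env): Env {
  const result: Env = {};
  for (const [key, value] of Object.entries(parentEnv)) {
    if (!BLOCKED_ENV_VARS_PRE_FIX.includes(key)) result[key] = value;
  }
  return result;
}

// Pre-fix predicate: a bare BASE_URL is enough to skip the OAuth lookup.
function skipsOAuthPreFix(env: Env): boolean {
  return Boolean(
    env.ANTHROPIC_API_KEY || env.ANTHROPIC_BASE_URL || env.ANTHROPIC_AUTH_TOKEN,
  );
}

// A parent shell configured for a proxy (hypothetical values).
const parentShell: Env = {
  PATH: '/usr/bin',
  ANTHROPIC_BASE_URL: 'https://proxy.example',
  ANTHROPIC_AUTH_TOKEN: 'parent-shell-token',
};

const isolatedPreFix = buildIsolatedEnvPreFix(parentShell);
const hasCredential = Boolean(
  isolatedPreFix.ANTHROPIC_API_KEY ||
    isolatedPreFix.ANTHROPIC_AUTH_TOKEN ||
    isolatedPreFix.CLAUDE_CODE_OAUTH_TOKEN,
);

console.log({ skipsOAuth: skipsOAuthPreFix(isolatedPreFix), hasCredential });
```

In this model the OAuth lookup is skipped even though no credential survives the filter, which is the state that produces the login error in the subprocess.
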
`CLAUDE_CODE_EFFORT_LEVEL` triggers permanent 400 + unbounded retry (#2357)

The Anthropic SDK subprocess reads `CLAUDE_CODE_EFFORT_LEVEL` from its env and forwards it as the `effort` parameter on Messages API calls. claude-mem's source contains zero references to `effort`: the leak path is environmental, not code. Models without effort support (Haiku 4.5, Sonnet 4.5, older) reject with HTTP 400.

`src/supervisor/env-sanitizer.ts` lines 1–51 already filters `CLAUDE_CODE_*` via `ENV_PREFIXES` (with explicit allowances in `ENV_PRESERVE`). But:

- `buildIsolatedEnv` does NOT call `sanitizeEnv` internally; callers are expected to chain them.
- `BLOCKED_ENV_VARS` is the canonical leak deny-list and does not name `CLAUDE_CODE_EFFORT_LEVEL`. Defense-in-depth is currently single-layer.
- `src/services/worker/ClaudeProvider.ts` has no HTTP 400 case; the default branch at line 98 returns `kind: 'transient'`, so a permanent 400 loops forever.
- `src/services/worker/GeminiProvider.ts` lines 89–94 and `src/services/worker/OpenRouterProvider.ts` lines 82–87 already classify 400 as unrecoverable; that pattern is the copy-target for `ClaudeProvider`.

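To make the single-layer gap concrete, here is a rough sketch of the two filters applied separately and chained. The list contents, the preserved name `CLAUDE_CODE_EXAMPLE_PRESERVED`, and the function bodies are assumptions for illustration; the real constants live in `EnvManager.ts` and `env-sanitizer.ts`.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical stand-ins for the real constants.
const BLOCKED_SKETCH = [
  'ANTHROPIC_API_KEY',
  'ANTHROPIC_AUTH_TOKEN',
  'CLAUDECODE',
  'CLAUDE_CODE_OAUTH_TOKEN',
];
const PREFIXES_SKETCH = ['CLAUDE_CODE_'];
// Assumption: some CLAUDE_CODE_* names are explicitly preserved; this one is made up.
const PRESERVE_SKETCH = new Set(['CLAUDE_CODE_EXAMPLE_PRESERVED']);

// Layer 1: exact-name deny-list (models buildIsolatedEnv).
function isolateSketch(parentEnv: Env): Env {
  return Object.fromEntries(
    Object.entries(parentEnv).filter(([key]) => !BLOCKED_SKETCH.includes(key)),
  );
}

// Layer 2: prefix filter with explicit allowances (models sanitizeEnv).
function sanitizeSketch(env: Env): Env {
  return Object.fromEntries(
    Object.entries(env).filter(
      ([key]) =>
        PRESERVE_SKETCH.has(key) ||
        !PREFIXES_SKETCH.some((prefix) => key.startsWith(prefix)),
    ),
  );
}

const hostEnv: Env = {
  PATH: '/usr/bin',
  CLAUDE_CODE_EFFORT_LEVEL: 'MAX',
  ANTHROPIC_API_KEY: 'sk-parent',
};

const layer1Only = isolateSketch(hostEnv);
const chained = sanitizeSketch(isolateSketch(hostEnv));

console.log(Object.keys(layer1Only), Object.keys(chained));
```

A caller that uses only the first layer still hands `CLAUDE_CODE_EFFORT_LEVEL` to the child; only the chained form removes it. That is why naming the variable in the deny-list as well makes the protection independent of whether a caller chains.
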
`$TIER` alias syntax (#2289)

`src/shared/SettingsDefaultsManager.ts` line 116 already implements a portable `'haiku'` alias for `CLAUDE_MEM_TIER_SIMPLE_MODEL` (per #1463). What's missing is the user-facing `$TIER` syntax in the `CLAUDE_MEM_MODEL` field that resolves to a provider-appropriate model at request time. It touches the same code surface (model resolution in `ClaudeProvider.getModelId` at lines 442–446), so the extension is minimal.

Findings below are direct file reads dated 2026-05-08. Each implementation phase cites by line number; do not re-derive. Confidence: HIGH on file/API inventory. Local-only files were read end-to-end.

| Item | Location | What to copy |
|---|---|---|
| `BLOCKED_ENV_VARS` array | `src/shared/EnvManager.ts:14–24` | Add new entries; keep the comment-per-entry convention |
| `buildIsolatedEnv` filter pattern | `src/shared/EnvManager.ts:166–205` | Filter on `BLOCKED_ENV_VARS.includes(key)`; defensive `delete isolatedEnv.X` post-filter |
| `buildIsolatedEnvWithFreshOAuth` skip-check | `src/shared/EnvManager.ts:237–244` | Restrict predicate to real credentials only |
| `loadClaudeMemEnv` whitelist + `ClaudeMemEnv` interface | `src/shared/EnvManager.ts:26–32, 79–100` | Single source of truth for what `~/.claude-mem/.env` accepts |
| `ENV_PRESERVE` / `ENV_EXACT_MATCHES` / `ENV_PREFIXES` | `src/supervisor/env-sanitizer.ts:1–51` | Whitelist-based env stripping; do NOT add `CLAUDE_CODE_EFFORT_LEVEL` to `ENV_PRESERVE` |
| Provider error classifier (HTTP 400 → unrecoverable) | `src/services/worker/GeminiProvider.ts:89–94`, `src/services/worker/OpenRouterProvider.ts:82–87` | Identical pattern to apply in `ClaudeProvider` |
| `ClassifiedProviderError` constructor + `kind: 'unrecoverable' \| 'auth_invalid' \| 'transient' \| 'rate_limit' \| 'quota_exhausted'` | `src/services/worker/retry.ts` | Use existing `kind` enum; do not invent `permanent` |
| `isRetryableKind` predicate | `src/services/worker/retry.ts:37–44` | Used by all retry sites; no edit needed once classifier is correct |
| Tier model resolution + `'haiku'` alias | `src/services/worker/http/routes/SessionRoutes.ts:503–521`, `src/shared/SettingsDefaultsManager.ts:51–53, 115–117` | Pattern for extending `$TIER` syntax |
| Settings flat-key + `loadFromFile` | `src/shared/SettingsDefaultsManager.ts:6–67, 70–131, 137–139, 161–206` | New keys MUST be added to interface AND `DEFAULTS` block |
| Plan format (phase numbering, line-cited edits, anti-patterns block) | `plans/01-hook-io-discipline.md`, `plans/05-observer-tool-enforcement.md` | Reuse layout |

- claude-mem's source contains zero references to `effort`, `CLAUDE_CODE_EFFORT_LEVEL`, `CLAUDE_CODE_ALWAYS_ENABLE_EFFORT`, or `reasoning_effort`. Do not "remove the effort parameter we forward": there is none. The leak is the SDK subprocess reading the env var directly.
- `BLOCKED_ENV_VARS` is an `Array<string>` with `.includes` lookup. Do NOT convert to `Set` in the same change; that touches every caller and is an unrelated refactor.
- `ClassifiedProviderError.kind` does NOT support the value `'permanent'`. The existing enum is `'transient' | 'rate_limit' | 'unrecoverable' | 'auth_invalid' | 'quota_exhausted'`. Use `unrecoverable` for permanent 400s.
- `pending_messages` has no `retry_count` column (dropped; see the `deadColumns` array at `src/services/sqlite/SessionStore.ts:104`). Issue #2357's "retry counter climbed past #1874" refers to log-line numbering, not a DB counter. Do not add a counter as part of this plan; that's Plan 03 territory.
- `sanitizeEnv` is whitelist-based (preserves a fixed set; strips everything matching `CLAUDE_CODE_*` etc). It is NOT idempotent if you re-add a name to `ENV_PRESERVE`. Do not add `CLAUDE_CODE_EFFORT_LEVEL` to `ENV_PRESERVE`; that's the opposite of what we want.
- `buildIsolatedEnv` and `sanitizeEnv` are independent layers. Some callers chain (`sanitizeEnv(buildIsolatedEnv(...))`); some only use one. Do not assume chaining is universal; Phase 5 audits every spawn boundary.
- The `~/.claude-mem/.env` loader at `src/shared/EnvManager.ts:79–100` uses property-by-property assignment as an implicit whitelist. Do NOT replace it with `Object.assign(result, parsed)`; that breaks the whitelist guarantee.

| File | Lines | Disposition |
|---|---|---|
| `src/shared/EnvManager.ts` | 319 | Edited heavily (Phase 2, Phase 5) |
| `src/supervisor/env-sanitizer.ts` | 51 | Light edit (Phase 3: comment change only; `CLAUDE_CODE_*` prefix already filters `EFFORT_LEVEL`) |
| `src/services/worker/ClaudeProvider.ts` | 448 | Edited (Phase 3: error classifier on `query()` rejection path) |
| `src/services/worker/retry.ts` | small | Confirm-only (Phase 3: `isRetryableKind` already correct) |
| `src/services/worker/GeminiProvider.ts` | reference only | Read for pattern (Phase 3) |
| `src/services/worker/OpenRouterProvider.ts` | reference only | Read for pattern (Phase 3) |
| `src/shared/SettingsDefaultsManager.ts` | 209 | Edited (Phase 4: `$TIER` alias resolution) |
| `src/services/worker/http/routes/SessionRoutes.ts` | reference | Read tier-routing pattern (Phase 4) |
| `src/services/infrastructure/ProcessManager.ts` | line 415 | Audit (Phase 5): confirm `sanitizeEnv` chain is sufficient |
| `src/services/sync/ChromaMcpManager.ts` | line 585 | Audit (Phase 5) |
| `src/supervisor/process-registry.ts` | line 539 | Audit (Phase 5) |
| `src/services/worker-service.ts` | line 412 | Audit (Phase 5) |
| `src/services/worker/knowledge/KnowledgeAgent.ts` | lines 54, 149 | Confirm-only (Phase 5) |
| `tests/env-isolation.test.ts` | NEW | CREATED (Phase 6) |
| `scripts/check-spawn-env-discipline.cjs` | NEW | CREATED (Phase 7) |
| `CLAUDE.md` | small | Edited (Phase 7: document `~/.claude-mem/.env` contract) |

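The whitelist guarantee of the `.env` loader is easiest to see next to the `Object.assign` alternative it must not be replaced with. The sketch below is an assumption-laden model: `ClaudeMemEnvSketch`, `parseDotEnvSketch`, and `loadWhitelistedSketch` are hypothetical names, the accepted keys are limited to the three Anthropic variables this plan discusses, and the parser ignores quoting rules.

```typescript
// Hypothetical shape; the real ClaudeMemEnv interface may accept other keys.
interface ClaudeMemEnvSketch {
  ANTHROPIC_API_KEY?: string;
  ANTHROPIC_AUTH_TOKEN?: string;
  ANTHROPIC_BASE_URL?: string;
}

// Minimal KEY=VALUE parser: skips blank lines and '#' comments, no quoting rules.
function parseDotEnvSketch(text: string): Record<string, string> {
  const parsed: Record<string, string> = {};
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq <= 0) continue;
    parsed[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return parsed;
}

// Property-by-property assignment: only the named keys can ever come through.
function loadWhitelistedSketch(text: string): ClaudeMemEnvSketch {
  const parsed = parseDotEnvSketch(text);
  const result: ClaudeMemEnvSketch = {};
  if (parsed.ANTHROPIC_API_KEY) result.ANTHROPIC_API_KEY = parsed.ANTHROPIC_API_KEY;
  if (parsed.ANTHROPIC_AUTH_TOKEN) result.ANTHROPIC_AUTH_TOKEN = parsed.ANTHROPIC_AUTH_TOKEN;
  if (parsed.ANTHROPIC_BASE_URL) result.ANTHROPIC_BASE_URL = parsed.ANTHROPIC_BASE_URL;
  return result;
}

const fileText = [
  '# proxy config',
  'ANTHROPIC_BASE_URL=https://proxy.example',
  'ANTHROPIC_AUTH_TOKEN=test-token',
  'CLAUDE_CODE_EFFORT_LEVEL=MAX',
].join('\n');

const whitelisted = loadWhitelistedSketch(fileText);
// The forbidden shortcut: every parsed key passes straight through.
const assigned = Object.assign({}, parseDotEnvSketch(fileText));

console.log(Object.keys(whitelisted), Object.keys(assigned));
```

With explicit assignment, a stray `CLAUDE_CODE_EFFORT_LEVEL` line in the file never reaches the result; with `Object.assign` it does, which would reopen the #2357 leak through the config file.
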
Goal: Pin down current behavior with red tests so the fix can prove itself green. No production-code changes in this phase.
New test file (`tests/env-isolation.test.ts`)

Use `bun:test` per `package.json` `"test": "bun test"`. Pattern from `tests/claude-provider-resume.test.ts:1`.

- `buildIsolatedEnvWithFreshOAuth` strips `ANTHROPIC_BASE_URL` when no `.env` credentials are configured
  - Setup: `process.env.ANTHROPIC_BASE_URL = 'https://proxy.example'`, no `~/.claude-mem/.env`, no API_KEY/AUTH_TOKEN in env.
  - Call `buildIsolatedEnvWithFreshOAuth()`.
  - Assert the result does not contain `ANTHROPIC_BASE_URL` (post-fix). Currently RED.
- OAuth-skip does not fire on bare `ANTHROPIC_BASE_URL`
  - Mock `readClaudeOAuthToken`.
  - Assert `readClaudeOAuthToken` was called (because BASE_URL alone is not enough to skip). Currently RED: `readClaudeOAuthToken` is NOT called today.
- `ANTHROPIC_AUTH_TOKEN` from `~/.claude-mem/.env` reaches the isolated env
  - Setup: a temp `.env` with `ANTHROPIC_AUTH_TOKEN=test-token` and `ANTHROPIC_BASE_URL=https://proxy.example`.
  - Assert `isolatedEnv.ANTHROPIC_AUTH_TOKEN === 'test-token'` AND `isolatedEnv.ANTHROPIC_BASE_URL === 'https://proxy.example'`. Currently GREEN (already works); test guards against regression.
- `CLAUDE_CODE_EFFORT_LEVEL` is stripped from the isolated env
  - Setup: `process.env.CLAUDE_CODE_EFFORT_LEVEL = 'MAX'`.
  - Assert `sanitizeEnv(buildIsolatedEnv())` does NOT contain `CLAUDE_CODE_EFFORT_LEVEL`. Currently GREEN via `env-sanitizer.ENV_PREFIXES`; test guards against regression.
- `CLAUDE_CODE_EFFORT_LEVEL` is in `BLOCKED_ENV_VARS` for defense-in-depth
  - Assert `BLOCKED_ENV_VARS.includes('CLAUDE_CODE_EFFORT_LEVEL')`. Currently RED.
- HTTP 400 from Claude SDK is classified `unrecoverable`
  - Setup: a fake SDK error (`error.status === 400`, body contains `does not support the effort parameter`).
  - Assert `classifyClaudeProviderError(err).kind === 'unrecoverable'`. Currently RED: falls through to `transient`.
- HTTP 400 with effort-parameter body emits a once-only warn log
  - Capture `logger.warn` calls.
  - Assert a single warn tagged `SDK` with a hint pointing at #2357 / `~/.claude-mem/.env`. Currently RED.

Verification:

- `bun test tests/env-isolation.test.ts` runs cleanly (RED tests fail with the expected assertion, no other errors).
- No production code changed (`git diff src/` empty).

Anti-patterns:

- Do not mock `EnvManager.buildIsolatedEnv`: it's the unit under test.
- Do not use `vi.*` (project uses `bun:test`, not vitest).
- Do not write to real `.env` files. Use a per-test `beforeEach`/`afterEach` with `mkdtempSync`.

Goal: Make the OAuth-skip require a real credential, and add `ANTHROPIC_BASE_URL` to the deny-list so it can only be configured via `~/.claude-mem/.env`.

`src/shared/EnvManager.ts:14–24`: extend `BLOCKED_ENV_VARS`

Before:

```typescript
const BLOCKED_ENV_VARS = [
  'ANTHROPIC_API_KEY',
  'ANTHROPIC_AUTH_TOKEN',
  'CLAUDECODE',
  'CLAUDE_CODE_OAUTH_TOKEN',
];
```

After (add `ANTHROPIC_BASE_URL`):

```typescript
const BLOCKED_ENV_VARS = [
  'ANTHROPIC_API_KEY', // #733
  'ANTHROPIC_AUTH_TOKEN', // 5edf1557: leak prevention; re-injected from ~/.claude-mem/.env when configured
  // #2375: same leak class as AUTH_TOKEN; re-injected from ~/.claude-mem/.env.
  // Without this entry, a leaked BASE_URL alone triggered the OAuth-skip while
  // no auth credential reached the subprocess.
  'ANTHROPIC_BASE_URL',
  'CLAUDECODE',
  'CLAUDE_CODE_OAUTH_TOKEN', // #2215
];
```

`src/shared/EnvManager.ts:237–244`: restrict OAuth-skip to real credentials

Before:

```typescript
if (
  isolatedEnv.ANTHROPIC_API_KEY ||
  isolatedEnv.ANTHROPIC_BASE_URL ||
  isolatedEnv.ANTHROPIC_AUTH_TOKEN
) {
  clearStaleMarker();
  return isolatedEnv;
}
```

After:

```typescript
// Skip OAuth lookup ONLY when a real credential is configured. A bare
// ANTHROPIC_BASE_URL is not a credential: every documented gateway needs
// either an AUTH_TOKEN or an API_KEY. This guards #2375 against a class of
// leaks where a parent shell exports BASE_URL (e.g. for the Claude Code CLI
// itself) while no token is present.
if (isolatedEnv.ANTHROPIC_API_KEY || isolatedEnv.ANTHROPIC_AUTH_TOKEN) {
  clearStaleMarker();
  return isolatedEnv;
}
```

`~/.claude-mem/.env` re-injection at `src/shared/EnvManager.ts:178–195`

Currently the loader path covers BASE_URL re-injection from `.env`. Confirm by reading the function. No code change is required here, but add a TS comment block above lines 178–195 documenting the new contract:

```typescript
// Contract (post-#2375): ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and
// ANTHROPIC_API_KEY are *only* populated from ~/.claude-mem/.env. They are
// in BLOCKED_ENV_VARS so parent-shell values never leak through.
```

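The post-fix contract can be sketched end to end: block the three Anthropic variables from the parent shell, re-inject them only from the parsed `.env`, and skip OAuth only on a real credential. Everything here (`buildIsolatedEnvPostFix`, `skipsOAuthPostFix`, the `dotEnv` argument, the sample URLs) is a hypothetical model of the behavior the plan describes, not the actual function bodies.

```typescript
type Env = Record<string, string | undefined>;

const BLOCKED_POST_FIX = [
  'ANTHROPIC_API_KEY',
  'ANTHROPIC_AUTH_TOKEN',
  'ANTHROPIC_BASE_URL',
  'CLAUDECODE',
  'CLAUDE_CODE_OAUTH_TOKEN',
];

// Assumption: these are the only names re-injected from ~/.claude-mem/.env.
const REINJECTED_FROM_DOTENV = [
  'ANTHROPIC_API_KEY',
  'ANTHROPIC_AUTH_TOKEN',
  'ANTHROPIC_BASE_URL',
];

function buildIsolatedEnvPostFix(parentEnv: Env, dotEnv: Env): Env {
  const result: Env = {};
  // Step 1: parent-shell values for blocked names never pass.
  for (const [key, value] of Object.entries(parentEnv)) {
    if (!BLOCKED_POST_FIX.includes(key)) result[key] = value;
  }
  // Step 2: the .env contents are the only source for the Anthropic trio.
  for (const key of REINJECTED_FROM_DOTENV) {
    if (dotEnv[key]) result[key] = dotEnv[key];
  }
  return result;
}

// Post-fix predicate: BASE_URL alone no longer counts as a credential.
function skipsOAuthPostFix(env: Env): boolean {
  return Boolean(env.ANTHROPIC_API_KEY || env.ANTHROPIC_AUTH_TOKEN);
}

const shellEnv: Env = {
  PATH: '/usr/bin',
  ANTHROPIC_BASE_URL: 'https://shell-proxy.example',
};

// First-time proxy user: nothing in ~/.claude-mem/.env yet.
const firstTimeUser = buildIsolatedEnvPostFix(shellEnv, {});
// Configured proxy user: URL and token both come from the file.
const configuredUser = buildIsolatedEnvPostFix(shellEnv, {
  ANTHROPIC_BASE_URL: 'https://proxy.example',
  ANTHROPIC_AUTH_TOKEN: 'test-token',
});

console.log(skipsOAuthPostFix(firstTimeUser), skipsOAuthPostFix(configuredUser));
```

In this model the first-time user falls through to the OAuth lookup with no stray `ANTHROPIC_BASE_URL`, while the configured user gets the URL and token from the file rather than from the shell.
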
Verification:

- Tests pass (`bun test`).
- `grep -n "ANTHROPIC_BASE_URL" src/shared/EnvManager.ts` shows entries at `BLOCKED_ENV_VARS`, the `ClaudeMemEnv` interface, the loader, and re-injection, but NOT in the OAuth-skip predicate's condition.
- With a `~/.claude-mem/.env` containing `ANTHROPIC_BASE_URL=...` and `ANTHROPIC_AUTH_TOKEN=...`, the worker actually authenticates against the proxy. Test with BigModel or any sandboxed proxy.

Anti-patterns:

- Do not add `ANTHROPIC_BASE_URL` to `ENV_PRESERVE` in `env-sanitizer.ts`. `BLOCKED_ENV_VARS` is the right layer; env-sanitizer is a downstream filter.
- Do not remove the `delete isolatedEnv.CLAUDE_CODE_OAUTH_TOKEN` defensive line at line 229. That guard is intact; it's belt-and-suspenders for #2215 and orthogonal to this plan.

Goal: Two-layer defense for the env leak (existing `CLAUDE_CODE_*` prefix filter + new `BLOCKED_ENV_VARS` entries), plus a permanent classification for the resulting HTTP 400 so the retry loop terminates if the leak ever sneaks past either layer.

`src/shared/EnvManager.ts:14–24`: add EFFORT entries to `BLOCKED_ENV_VARS`

After the Phase 2 edit, the list is:

```typescript
const BLOCKED_ENV_VARS = [
  'ANTHROPIC_API_KEY',
  'ANTHROPIC_AUTH_TOKEN',
  'ANTHROPIC_BASE_URL',
  'CLAUDECODE',
  'CLAUDE_CODE_OAUTH_TOKEN',
  // #2357: host CLI config, not part of the plugin's contract. The
  // env-sanitizer's CLAUDE_CODE_* prefix filter strips these for spawn paths
  // that go through it, but BLOCKED_ENV_VARS is the canonical deny-list and
  // belongs in defense-in-depth.
  'CLAUDE_CODE_EFFORT_LEVEL',
  'CLAUDE_CODE_ALWAYS_ENABLE_EFFORT',
];
```

`src/services/worker/ClaudeProvider.ts`: classify HTTP 400 as unrecoverable

Locate the existing error-classification path. The Anthropic SDK raises errors with `error.status` and a body containing the failure description. Pattern from `src/services/worker/GeminiProvider.ts:89–94` (the canonical copy-target):

```typescript
if (status === 400) {
  return new ClassifiedProviderError(
    `Gemini bad request (status 400)`,
    { kind: 'unrecoverable', cause: input.cause },
  );
}
```

Add the equivalent in `ClaudeProvider`'s error classifier (new function or existing; read the file and create it if absent, mirroring the `GeminiProvider` shape):

```typescript
function classifyClaudeProviderError(input: { cause: unknown }): ClassifiedProviderError {
  const err = input.cause;
  const status = (err as { status?: number })?.status;
  const bodyText = String((err as { message?: string })?.message ?? '');

  // Permanent: SDK rejected the request itself. Most common cause in the wild
  // is a leaked CLAUDE_CODE_EFFORT_LEVEL the SDK subprocess forwarded as
  // `effort` against a model that doesn't support it (#2357). The leak is
  // also blocked at BLOCKED_ENV_VARS + env-sanitizer; this classifier ends
  // the retry loop if either layer is bypassed.
  if (status === 400) {
    if (/effort parameter/i.test(bodyText)) {
      logger.warn(
        'SDK',
        'Claude API rejected effort parameter: likely CLAUDE_CODE_EFFORT_LEVEL leaked into SDK env (issue #2357). Configure CLAUDE_MEM_MODEL or set credentials in ~/.claude-mem/.env.',
        { status, bodyText },
      );
    }
    return new ClassifiedProviderError(
      `Claude bad request (status 400): ${bodyText}`,
      { kind: 'unrecoverable', cause: input.cause },
    );
  }

  // 401 / 403 → auth_invalid (existing pattern from GeminiProvider:96-103)
  if (status === 401 || status === 403) {
    return new ClassifiedProviderError(
      `Claude auth rejected (status ${status})`,
      { kind: 'auth_invalid', cause: input.cause },
    );
  }

  // 429 → rate_limit
  if (status === 429) {
    return new ClassifiedProviderError(
      `Claude rate limited (status 429)`,
      { kind: 'rate_limit', cause: input.cause },
    );
  }

  // Default: transient (preserves the existing fall-through behavior).
  return new ClassifiedProviderError(
    `Claude SDK error: ${bodyText}`,
    { kind: 'transient', cause: input.cause },
  );
}
```

Wire this classifier into the existing `try { ... } catch` around `query(...)` in `ClaudeProvider.ts`. Read the actual catch shape before editing: the function lives near lines 180–195, and the existing `for await` over `queryResult` is where rejections surface.

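Why the classification matters for loop termination can be shown with a toy retry loop. This is a sketch under stated assumptions: `isRetryableKindSketch` guesses that only `transient` and `rate_limit` are retried (the real predicate is `isRetryableKind` in `retry.ts`), the attempt cap is an illustrative choice, and the attempt callback is synchronous for brevity.

```typescript
type ErrorKind =
  | 'transient'
  | 'rate_limit'
  | 'unrecoverable'
  | 'auth_invalid'
  | 'quota_exhausted';

type AttemptResult = { ok: true } | { ok: false; httpStatus?: number };

// Same status mapping as the classifier above, reduced to the kind only.
function classifyStatusSketch(httpStatus: number | undefined): ErrorKind {
  if (httpStatus === 400) return 'unrecoverable';
  if (httpStatus === 401 || httpStatus === 403) return 'auth_invalid';
  if (httpStatus === 429) return 'rate_limit';
  return 'transient';
}

// Assumption: only these two kinds are worth retrying.
function isRetryableKindSketch(kind: ErrorKind): boolean {
  return kind === 'transient' || kind === 'rate_limit';
}

// Retries until success, a non-retryable kind, or the (illustrative) cap.
function runWithRetrySketch(
  attempt: () => AttemptResult,
  maxAttempts: number,
): { attempts: number; outcome: ErrorKind | 'ok' } {
  let lastKind: ErrorKind = 'transient';
  for (let attempts = 1; attempts <= maxAttempts; attempts++) {
    const result = attempt();
    if (result.ok) return { attempts, outcome: 'ok' };
    lastKind = classifyStatusSketch(result.httpStatus);
    if (!isRetryableKindSketch(lastKind)) return { attempts, outcome: lastKind };
  }
  return { attempts: maxAttempts, outcome: lastKind };
}

// A permanent 400 stops after the first attempt.
const permanentRun = runWithRetrySketch(() => ({ ok: false, httpStatus: 400 }), 5);

// A flaky upstream (two 503s, then success) is still retried.
let flakyCalls = 0;
const flakyRun = runWithRetrySketch(
  () => (++flakyCalls < 3 ? { ok: false, httpStatus: 503 } : { ok: true }),
  5,
);

console.log(permanentRun, flakyRun);
```

If the 400 case fell through to `transient`, the first run would burn every attempt, which is the unbounded-retry symptom from #2357 in miniature.
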
`src/supervisor/env-sanitizer.ts` already strips `CLAUDE_CODE_EFFORT_LEVEL`

Read lines 1–51. Verify:

- `ENV_PREFIXES` includes `'CLAUDE_CODE_'`.
- `ENV_PRESERVE` does NOT include `CLAUDE_CODE_EFFORT_LEVEL` or `CLAUDE_CODE_ALWAYS_ENABLE_EFFORT`.

Add an inline comment at the `ENV_PREFIXES` declaration:

```typescript
// Filters CLAUDE_CODE_* unless explicitly preserved in ENV_PRESERVE.
// This is layer 2 of defense for #2357; layer 1 is BLOCKED_ENV_VARS in EnvManager.
```

No code change to behavior here.

Verification:

- `grep -rn "CLAUDE_CODE_EFFORT_LEVEL" src/ tests/` returns hits in `EnvManager.ts` (`BLOCKED_ENV_VARS`) and the test file. Nothing else.
- Manual run with the variable exported:

  ```bash
  CLAUDE_CODE_EFFORT_LEVEL=MAX bun run src/services/worker-service.ts --daemon
  # Observe: no `effort` parameter on outgoing requests.
  ```

- When a 400 with the effort-parameter body does occur, `logger.warn` fires once.

Anti-patterns:

- Do not invent a `permanent` kind. `kind: 'unrecoverable'` already exists and is the right slot.
- `error.status === 400` is the deterministic signal; the body text check is purely for the user-facing log hint.
- Do not log from `classifyClaudeProviderError` for every 400. Only the effort-parameter sub-case warrants a hint; generic 400s are noisy enough at the call site.
- Do not classify a `/effort/i` match as `auth_invalid`. That would trigger the "re-login" flow incorrectly. Use `unrecoverable`.
- Do not look for an `effort` SDK option that we strip. The SDK type does not expose `effort`; the leak is the SDK's own subprocess (`pathToClaudeCodeExecutable`) reading the env var. Stripping at our env layer is the only fix we control.

`$TIER` alias syntax (#2289)

Goal: Allow `CLAUDE_MEM_MODEL=$TIER:summary` (and similar) to resolve at request time to a provider-appropriate model, reusing the existing `'haiku'` portable alias machinery (line 116, #1463). Optional phase; can be deferred without blocking Phase 2/3.

`src/shared/SettingsDefaultsManager.ts`: extend tier interface

Add to the `SettingsDefaults` interface near lines 51–53:

```typescript
CLAUDE_MEM_TIER_FAST_MODEL: string;  // for $TIER:fast; defaults to 'haiku'
CLAUDE_MEM_TIER_SMART_MODEL: string; // for $TIER:smart; defaults to 'sonnet' (or provider-equivalent)
```

Add to the `DEFAULTS` block near lines 115–117:

```typescript
CLAUDE_MEM_TIER_FAST_MODEL: 'haiku',
CLAUDE_MEM_TIER_SMART_MODEL: 'sonnet',
```

`src/services/worker/ClaudeProvider.ts:442–446`: add `$TIER` resolution

Replace `getModelId()`:

```typescript
private getModelId(): string {
  const settingsPath = paths.settings();
  const settings = SettingsDefaultsManager.loadFromFile(settingsPath);
  return resolveTierAlias(settings.CLAUDE_MEM_MODEL, settings);
}
```

Add `resolveTierAlias` to a shared util (`src/services/worker/model-aliases.ts`, NEW):

```typescript
import type { SettingsDefaults } from '../../shared/SettingsDefaultsManager';

// Anchored on both ends: only an exact `$TIER:<name>` string is an alias.
const TIER_PATTERN = /^\$TIER:(fast|smart|simple|summary)$/;

export function resolveTierAlias(model: string, settings: SettingsDefaults): string {
  const match = TIER_PATTERN.exec(model);
  if (!match) return model; // non-alias model IDs pass through unchanged
  switch (match[1]) {
    case 'fast': return settings.CLAUDE_MEM_TIER_FAST_MODEL || 'haiku';
    case 'smart': return settings.CLAUDE_MEM_TIER_SMART_MODEL || 'sonnet';
    case 'simple': return settings.CLAUDE_MEM_TIER_SIMPLE_MODEL || 'haiku';
    // NOTE: when `model` itself came from CLAUDE_MEM_MODEL, this fallback
    // returns the unresolved `$TIER:summary` string; confirm the intended
    // default before shipping.
    case 'summary': return settings.CLAUDE_MEM_TIER_SUMMARY_MODEL || settings.CLAUDE_MEM_MODEL;
    default: return model;
  }
}
```

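The anchoring of the pattern is what keeps ordinary model IDs and near-miss strings from being treated as aliases. A small standalone check, using a local copy of the regex (`TIER_PATTERN_SKETCH` and `tierNameOf` are hypothetical names used only for this illustration):

```typescript
// Local copy of the pattern so the example runs on its own.
const TIER_PATTERN_SKETCH = /^\$TIER:(fast|smart|simple|summary)$/;

// Returns the tier name for an exact alias, or null for anything else.
function tierNameOf(model: string): string | null {
  const match = TIER_PATTERN_SKETCH.exec(model);
  return match?.[1] ?? null;
}

const samples = [
  '$TIER:fast',
  '$TIER:summary',
  'claude-haiku-4-5-20251001', // plain model ID: passthrough
  '$TIER:fast-v2',             // trailing text: not an alias
  'x$TIER:fast',               // leading text: not an alias
  '$TIER:unknown',             // tier name outside the fixed set
  '$tier:fast',                // the pattern is case-sensitive
];

for (const sample of samples) {
  console.log(sample, '->', tierNameOf(sample));
}
```

Only the first two samples resolve to a tier; every other string returns `null` and would pass through `resolveTierAlias` unchanged.
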
`KnowledgeAgent.ts:149` (`getModelId`)

Apply the same `resolveTierAlias` wrap. Knowledge agent uses the same settings path.

Verification:

- `resolveTierAlias('$TIER:fast', settings)` returns `settings.CLAUDE_MEM_TIER_FAST_MODEL`.
- `resolveTierAlias('claude-haiku-4-5-20251001', settings)` returns the input unchanged (non-tier passthrough).
- Setting `CLAUDE_MEM_MODEL=$TIER:fast` and starting the worker actually queries against the fast-tier model.
- Update `docs/public/configuration.mdx` with the four tier aliases.

Anti-patterns:

- Do not match `$TIER:*` greedily; the regex is anchored.
- Do not add `$PROVIDER:` or `$MODEL:` aliases in this phase. They are out of scope; one syntax at a time.
- Do not load `settings` inside `resolveTierAlias`; pure function only.

Goal: Every place claude-mem spawns a subprocess must apply both `buildIsolatedEnv` (or the async variant) AND `sanitizeEnv`. A grep-based check codifies the rule.

| File | Line | Spawn target | Env construction | Sufficient? |
|---|---|---|---|---|
| `src/services/worker/ClaudeProvider.ts` | 155 | Anthropic SDK subprocess | `sanitizeEnv(await buildIsolatedEnvWithFreshOAuth())` | ✅ |
| `src/services/worker/knowledge/KnowledgeAgent.ts` | 54, 149 | Knowledge SDK subprocess | `sanitizeEnv(await buildIsolatedEnvWithFreshOAuth())` | ✅ |
| `src/services/infrastructure/ProcessManager.ts` | 415 | Worker daemon | `sanitizeEnv({...process.env, CLAUDE_MEM_WORKER_PORT, ...extraEnv})` | ⚠️ daemon inherits parent env then sanitizes; it does not pass through `buildIsolatedEnv`. Document why this is OK: daemon is the trust boundary; parent env IS the truth. But it should still strip `CLAUDE_CODE_EFFORT_LEVEL` via the prefix filter. Confirm. |
| `src/services/sync/ChromaMcpManager.ts` | 585 | chroma-mcp subprocess | `sanitizeEnv(process.env)` | ⚠️ same as above. |
| `src/supervisor/process-registry.ts` | 539 | Generic spawn factory | `sanitizeEnv(options.env ?? process.env)` | ⚠️ same. |
| `src/services/worker-service.ts` | 412 | MCP server subprocess | `sanitizeEnv(process.env)` | ⚠️ same. |

For the worker-daemon and downstream MCP/chroma spawns, parent-process env IS the source of truth — they are pre-credential paths. As long as CLAUDE_CODE_EFFORT_LEVEL and the Anthropic credentials are stripped (which sanitizeEnv does via CLAUDE_CODE_* prefix and the existing ANTHROPIC_AUTH_TOKEN block), behavior is correct. The plan does not change these paths — it adds tests that prove they stay correct.
`tests/env-isolation.test.ts`

- Every documented spawn site applies `sanitizeEnv`
  - Assert each file in the table above contains `sanitizeEnv(`. Currently GREEN; test prevents regression.
- Worker-daemon spawn env does not contain `CLAUDE_CODE_EFFORT_LEVEL`
  - Setup: `process.env.CLAUDE_CODE_EFFORT_LEVEL = 'MAX'`.
  - Assert the spawn env does not contain `CLAUDE_CODE_EFFORT_LEVEL`. Currently GREEN.

Anti-patterns:

- Do not add `buildIsolatedEnv` calls to ProcessManager / ChromaMcpManager / MCP server spawn paths. They legitimately need parent-shell `PATH`, `HOME`, etc.; those would be wiped by the credential-isolated builder.

Goal: Smoke test the proxy/gateway path so we know the fix works in the real world.

Proxy/gateway auth (#2375):

```bash
# Setup:
cat > ~/.claude-mem/.env <<'EOF'
ANTHROPIC_BASE_URL=https://open.bigmodel.cn/api/anthropic
ANTHROPIC_AUTH_TOKEN=<your-bigmodel-token>
EOF
chmod 600 ~/.claude-mem/.env

# Reset worker:
npm run build-and-sync
pkill -f worker-service.cjs

# Trigger:
# In any Claude Code session, use any tool; the PostToolUse hook should land an observation.

# Verify:
tail -f ~/.claude-mem/logs/claude-mem-$(date +%Y-%m-%d).log
# Expect: no "Not logged in" errors; observations land via the proxy.
```

Effort-level leak (#2357):

```bash
# Setup:
export CLAUDE_CODE_EFFORT_LEVEL=MAX
export CLAUDE_CODE_ALWAYS_ENABLE_EFFORT=true
# Restart Claude Code so the env propagates to the hook subprocess.

# Verify:
tail -f ~/.claude-mem/logs/claude-mem-$(date +%Y-%m-%d).log
# Expect: NO repeated "API Error: 400 This model does not support the effort parameter."
# Expect: NO "PARSER returned non-XML response; marking messages as failed for retry".
```

Verification:

- `bun test` is green.
- `~/.claude-mem/.env` is the only knob users need for proxy auth.

Goal: A grep-based CI check rejects PRs that introduce a subprocess spawn without `sanitizeEnv`. Documentation aligns with the new contract.

`scripts/check-spawn-env-discipline.cjs`

Pattern from `plans/01-hook-io-discipline.md` Phase 6 (`scripts/check-hook-io-discipline.cjs`):

```js
#!/usr/bin/env node
// Forbid raw process.env in subprocess spawn calls. Every spawn must use
// sanitizeEnv(...) and (where credentials are involved) buildIsolatedEnv*.
const { execSync } = require('node:child_process');
const { readFileSync } = require('node:fs');

const VIOLATIONS = [];

// Find every `spawn(` / `spawnSync(` / `child_process.spawn(` call in src/.
// `|| true` keeps execSync from throwing when the pipeline matches nothing.
const grep = execSync(
  `grep -rEn "spawn(Sync)?\\(" src/ | grep -v "node_modules" | grep -v "\\.test\\." || true`,
  { encoding: 'utf8' },
);

for (const line of grep.split('\n').filter(Boolean)) {
  // Allow if the same logical block contains sanitizeEnv
  // (heuristic: scan the matching line plus the 8 lines after it).
  const [filePath, lineNumStr] = line.split(':', 2);
  const lineNum = Number.parseInt(lineNumStr, 10);
  const src = readFileSync(filePath, 'utf8').split('\n');
  const window = src.slice(lineNum - 1, lineNum + 8).join('\n');
  if (!/sanitizeEnv\s*\(/.test(window)) {
    VIOLATIONS.push(`${filePath}:${lineNum}: spawn without sanitizeEnv`);
  }
}

if (VIOLATIONS.length > 0) {
  console.error('Spawn-env discipline check FAILED:');
  VIOLATIONS.forEach(v => console.error('  ' + v));
  process.exit(1);
}
console.log('Spawn-env discipline check passed.');
```

Wire to package.json scripts.test:env-discipline. Add to CI alongside existing hook checks.
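The window heuristic at the core of the script can be isolated as a pure function, which makes its behavior (and its blind spots) easy to test without touching the filesystem. This is a sketch only: `findSpawnViolations`, the `Violation` shape, and the default window of 9 lines (mirroring `slice(lineNum - 1, lineNum + 8)`) are assumptions for illustration, not part of the script above.

```typescript
interface Violation {
  line: number; // 1-based line of the spawn call
  text: string; // trimmed source of that line
}

// Flags each spawn call whose following window contains no sanitizeEnv( call.
function findSpawnViolations(source: string, windowSize = 9): Violation[] {
  const lines = source.split('\n');
  const violations: Violation[] = [];
  lines.forEach((text, index) => {
    if (!/spawn(Sync)?\(/.test(text)) return;
    // Window starts AT the spawn line and extends forward only.
    const windowText = lines.slice(index, index + windowSize).join('\n');
    if (!/sanitizeEnv\s*\(/.test(windowText)) {
      violations.push({ line: index + 1, text: text.trim() });
    }
  });
  return violations;
}

const compliant = [
  "const child = spawn('node', args, {",
  '  env: sanitizeEnv(process.env),',
  '});',
].join('\n');

const rawEnv = [
  "const child = spawn('node', args, {",
  '  env: process.env,',
  '});',
].join('\n');

// Env sanitized on an EARLIER line: correct code, but outside the window.
const hoisted = [
  'const env = sanitizeEnv(process.env);',
  "const child = spawn('node', args, { env });",
].join('\n');

console.log(
  findSpawnViolations(compliant).length,
  findSpawnViolations(rawEnv).length,
  findSpawnViolations(hoisted).length,
);
```

The third sample shows a limitation worth knowing before wiring the check into CI: because the window only looks forward from the spawn line, code that sanitizes the env on a preceding line is reported as a violation. Whether to widen the window backwards or keep the stricter rule is a design choice for the script's author.
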
`CLAUDE.md`: document the `~/.claude-mem/.env` contract

Add a section under "Configuration":

````markdown
### Anthropic Credentials (proxies, gateways, BigModel, etc.)

For non-OAuth Anthropic credentials (proxies / gateways / `ANTHROPIC_AUTH_TOKEN` / `ANTHROPIC_API_KEY`), put them in `~/.claude-mem/.env`:

```
ANTHROPIC_BASE_URL=https://your-proxy.example
ANTHROPIC_AUTH_TOKEN=your-token
```

The file is read at worker spawn time and re-injected into the SDK subprocess. **Parent-shell exports of these variables are intentionally ignored**: they are in `BLOCKED_ENV_VARS` to prevent host-config bleed-through (#2375).

If you only have an OAuth subscription, no `.env` is needed; the worker reads the token from your keychain at spawn time.
````

Verification:

- `npm run test:env-discipline` passes on the post-fix tree.

Anti-patterns:

- Do not flag every `process.env` read. Only `spawn*()` call sites need `sanitizeEnv`; reads are fine.
- Do not add the `.env` file path to `.gitignore`. It lives in `~/.claude-mem/`, not in the repo, so it's already outside.

`KnowledgeAgent` is touched in both plans (this one for `getModelId`, Plan 05 for tool enforcement). Sequence Plan 05 first (security urgency), then Plan 06.

- Confirm `BLOCKED_ENV_VARS` is still an `Array<string>` and not converted to a `Set` (Phase 2 refactor risk).
- Confirm tests are green on `main` before starting (`bun test`).
- Confirm `effort` is still absent from `src/` (`grep -rn "effort" src/`). If a future change adds the parameter, Phase 3.2's regex needs revisiting.
- Read `node_modules/@anthropic-ai/claude-agent-sdk/sdk.d.ts` to confirm the `query()` options do NOT support `effort` natively. If the SDK adds it, Phase 3.2's body-text regex still works as a fallback, but a code-level strip becomes the right fix.
- Confirm `~/.claude-mem/.env` permissions are `0o600` post-fix (the saver enforces this; readers should not weaken it).