docs/public/configuration/litellm-gateway.mdx
claude-mem can route its background memory agent through a LiteLLM proxy. This lets teams keep claude-mem's Claude Agent SDK workflow while using LiteLLM for model routing, centralized credentials, usage tracking, budgets, audit logs, and provider failover.
The important detail: claude-mem does not call LiteLLM with the OpenAI client directly. claude-mem still uses the Claude Agent SDK, and the SDK sends Anthropic-format requests to LiteLLM. LiteLLM then translates those requests to the upstream model provider you configured.
Claude Code session
-> claude-mem hooks
-> claude-mem worker
-> Claude Agent SDK subprocess
-> ANTHROPIC_BASE_URL=http://localhost:4000
-> LiteLLM proxy
-> OpenAI / Azure / Vertex / Bedrock / OpenRouter / local model
This keeps the memory agent on one implementation path. The Claude provider, knowledge agents, session resume behavior, XML observation prompts, and queue retry logic all continue to use the same SDK code path whether the upstream model is Anthropic or routed through LiteLLM.
Use LiteLLM gateway mode when you want:

- Centralized credentials instead of per-developer Anthropic keys
- Model routing and provider failover behind one endpoint
- Usage tracking, budgets, and audit logs for the memory agent's traffic
Use the native OpenRouter Provider or Gemini Provider instead if you want claude-mem's REST providers directly and do not need the Claude Agent SDK path.
The LiteLLM integration is intentionally small. There is no custom LiteLLM provider, no Python handler, and no OpenAI-compatible server embedded in claude-mem.
At runtime:
- Gateway credentials are stored in ~/.claude-mem/.env.
- ~/.claude-mem/settings.json keeps CLAUDE_MEM_PROVIDER set to claude.
- The SDK subprocess is launched with ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN.

The code paths involved are:
| Layer | Responsibility |
|---|---|
| src/npx-cli/commands/install.ts | Prompts for "LiteLLM or custom gateway", stores the gateway URL/token, and allows custom gateway model names |
| src/shared/EnvManager.ts | Stores credentials in ~/.claude-mem/.env, blocks shell-leaked auth vars, and injects only explicit claude-mem credentials |
| src/services/worker/ClaudeProvider.ts | Starts the Claude Agent SDK for observation extraction with the isolated environment |
| src/services/worker/knowledge/KnowledgeAgent.ts | Uses the same isolated SDK path for knowledge corpus Q&A |
CLAUDE_MEM_PROVIDER stays claude. LiteLLM is a gateway for the Claude Agent SDK path, not a fourth claude-mem provider.
{
  "CLAUDE_MEM_PROVIDER": "claude",
  "CLAUDE_MEM_CLAUDE_AUTH_METHOD": "gateway",
  "CLAUDE_MEM_MODEL": "claude-haiku-4-5-20251001"
}
Keeping the provider as claude matters because the worker should continue to use ClaudeProvider, not the native Gemini or OpenRouter REST providers. The gateway URL changes where the SDK sends model traffic; it does not change how claude-mem manages memory sessions.
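As a rough model of that rule, gateway mode boils down to three settings checks. The following sketch is illustrative; the `isGatewayMode` helper and `MemSettings` shape are hypothetical, not part of claude-mem:

```typescript
// Hypothetical check for a gateway-mode settings object: the provider must
// stay "claude", the auth method must be "gateway", and any non-empty model
// string is accepted, because LiteLLM owns the valid model list.
interface MemSettings {
  CLAUDE_MEM_PROVIDER: string;
  CLAUDE_MEM_CLAUDE_AUTH_METHOD: string;
  CLAUDE_MEM_MODEL: string;
}

function isGatewayMode(s: MemSettings): boolean {
  return (
    s.CLAUDE_MEM_PROVIDER === "claude" &&
    s.CLAUDE_MEM_CLAUDE_AUTH_METHOD === "gateway" &&
    s.CLAUDE_MEM_MODEL.length > 0
  );
}

console.log(isGatewayMode({
  CLAUDE_MEM_PROVIDER: "claude",
  CLAUDE_MEM_CLAUDE_AUTH_METHOD: "gateway",
  CLAUDE_MEM_MODEL: "claude-haiku-4-5-20251001",
})); // true
```

Note that switching the provider to anything other than claude would route memory traffic through a different code path entirely, which is exactly what gateway mode avoids.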
LiteLLM must expose an Anthropic-compatible endpoint for Claude Code / Claude Agent SDK traffic. Anthropic's gateway guidance recommends the unified LiteLLM endpoint as the normal setup:
export ANTHROPIC_BASE_URL=http://localhost:4000
For claude-mem, that value goes in ~/.claude-mem/.env, not your shell, so the background worker uses it consistently across restarts.
Create a LiteLLM config that defines the model name claude-mem will request:
# litellm-config.yaml
model_list:
  - model_name: claude-haiku-4-5-20251001
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

general_settings:
  master_key: sk-litellm-local
Start LiteLLM:
OPENAI_API_KEY=sk-your-openai-key \
litellm --config litellm-config.yaml --host 127.0.0.1 --port 4000
In this example, claude-mem asks the SDK for claude-haiku-4-5-20251001, LiteLLM accepts that model alias, and LiteLLM forwards the request to openai/gpt-4o-mini.
Run the installer:
npx claude-mem install
Choose:
- Claude Agent SDK as the provider
- API key or gateway as the auth method
- LiteLLM or custom gateway as the gateway type
- http://127.0.0.1:4000 as the gateway URL

The installer stores provider settings in ~/.claude-mem/settings.json and gateway credentials in ~/.claude-mem/.env.
Edit ~/.claude-mem/settings.json:
{
  "CLAUDE_MEM_PROVIDER": "claude",
  "CLAUDE_MEM_CLAUDE_AUTH_METHOD": "gateway",
  "CLAUDE_MEM_MODEL": "claude-haiku-4-5-20251001"
}
Edit ~/.claude-mem/.env:
# ~/.claude-mem/.env
ANTHROPIC_BASE_URL=http://127.0.0.1:4000
ANTHROPIC_AUTH_TOKEN=sk-litellm-local
If your LiteLLM proxy does not require authentication, omit ANTHROPIC_AUTH_TOKEN.
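To see how a file like this is consumed, here is a generic KEY=VALUE sketch. The `parseDotEnv` function is a hypothetical parser for illustration, not claude-mem's actual loader:

```typescript
// Hypothetical .env parser: one KEY=VALUE pair per line, '#' comments and
// blank lines ignored. ANTHROPIC_AUTH_TOKEN is simply absent when the
// gateway runs unauthenticated.
function parseDotEnv(text: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // skip malformed lines
    vars[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return vars;
}

const parsed = parseDotEnv(
  "# ~/.claude-mem/.env\nANTHROPIC_BASE_URL=http://127.0.0.1:4000\n",
);
console.log(parsed.ANTHROPIC_BASE_URL);        // http://127.0.0.1:4000
console.log("ANTHROPIC_AUTH_TOKEN" in parsed); // false
```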
Restart the worker after manual edits:
npm run worker:restart
claude-mem deliberately does not trust whatever Anthropic credentials happen to be exported in your shell or project .env file.
The worker blocks inherited ANTHROPIC_API_KEY, ANTHROPIC_AUTH_TOKEN, and stale CLAUDE_CODE_OAUTH_TOKEN values. It then re-injects only the credentials stored in ~/.claude-mem/.env.
This avoids two common failure modes:
- A shell-exported ANTHROPIC_API_KEY silently bypasses LiteLLM and bills the public Anthropic API.
- A configured gateway silently falls back to api.anthropic.com. If ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, or ANTHROPIC_API_KEY is present in ~/.claude-mem/.env, the worker treats that as explicit gateway/API configuration and skips the Claude OAuth lookup.
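The block-then-reinject step can be sketched as follows. This is a simplified model: `buildIsolatedEnv` and the blocked-variable list are illustrative, and claude-mem's real logic lives in EnvManager:

```typescript
// Hypothetical sketch of credential isolation: drop inherited Anthropic
// auth variables, then re-inject only what ~/.claude-mem/.env explicitly sets.
const BLOCKED_VARS = [
  "ANTHROPIC_API_KEY",
  "ANTHROPIC_AUTH_TOKEN",
  "CLAUDE_CODE_OAUTH_TOKEN",
];

function buildIsolatedEnv(
  inherited: Record<string, string>,
  dotEnv: Record<string, string>,
): Record<string, string> {
  const env: Record<string, string> = {};
  // Copy everything except shell-leaked auth variables.
  for (const [key, value] of Object.entries(inherited)) {
    if (!BLOCKED_VARS.includes(key)) env[key] = value;
  }
  // Re-inject only the credentials claude-mem itself stores.
  return { ...env, ...dotEnv };
}

// A shell-leaked ANTHROPIC_API_KEY is dropped; the gateway values win.
const isolated = buildIsolatedEnv(
  { PATH: "/usr/bin", ANTHROPIC_API_KEY: "sk-leaked" },
  {
    ANTHROPIC_BASE_URL: "http://127.0.0.1:4000",
    ANTHROPIC_AUTH_TOKEN: "sk-litellm-local",
  },
);
console.log("ANTHROPIC_API_KEY" in isolated); // false
console.log(isolated.ANTHROPIC_BASE_URL);     // http://127.0.0.1:4000
```

The design choice here is that non-auth variables (like PATH) pass through untouched, so the subprocess still behaves like a normal child process for everything except credentials.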
CLAUDE_MEM_MODEL is passed through to the Claude Agent SDK. In gateway mode, claude-mem allows any non-empty model string because the valid model list is owned by LiteLLM.
Recommended pattern:
model_list:
  - model_name: claude-haiku-4-5-20251001
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
Then keep:
{
  "CLAUDE_MEM_MODEL": "claude-haiku-4-5-20251001"
}
Alternatively, use a descriptive custom alias:
model_list:
  - model_name: memory-compressor
    litellm_params:
      model: azure/gpt-4o-mini-memory
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-10-21"
{
  "CLAUDE_MEM_MODEL": "memory-compressor"
}
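Either way, the contract between the two configs is exact string equality on the model name. A quick sanity-check sketch (the alias list below is illustrative, not read from a real LiteLLM config):

```typescript
// Hypothetical sanity check: CLAUDE_MEM_MODEL must exactly match one of the
// model_name aliases LiteLLM exposes, or the proxy will reject the request.
const litellmAliases = ["claude-haiku-4-5-20251001", "memory-compressor"];

function aliasIsServed(model: string, aliases: string[]): boolean {
  // Exact string match only -- no normalization, no prefix matching.
  return aliases.includes(model);
}

console.log(aliasIsServed("memory-compressor", litellmAliases)); // true
console.log(aliasIsServed("Memory-Compressor", litellmAliases)); // false: case matters
```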
When a Claude Code session produces transcript events, claude-mem's worker queues them for observation extraction. In gateway mode the extraction flow is:
1. ClaudeProvider builds the observation prompt and selects the model.
2. buildIsolatedEnvWithFreshOAuth() loads ~/.claude-mem/.env.
3. The Claude Agent SDK subprocess starts with ANTHROPIC_BASE_URL pointing at LiteLLM.
4. The knowledge-agent APIs use the same gateway environment, so corpus priming and corpus Q&A route through LiteLLM too.
LiteLLM replaces:

- The direct connection to api.anthropic.com
- Per-developer Anthropic credentials (the gateway token authenticates instead)
- Provider selection, routing, budgets, and failover

LiteLLM does not replace:

- The Claude Agent SDK subprocess and its Anthropic-format requests
- ClaudeProvider, the knowledge agents, session resume behavior, XML observation prompts, and queue retry logic
Check claude-mem's worker logs:
npm run worker:logs
You should see SDK startup logs that report gateway auth, followed by normal observation processing.
Check LiteLLM's logs for a corresponding request to the configured model alias. If LiteLLM never receives traffic, confirm:
- CLAUDE_MEM_PROVIDER is claude
- CLAUDE_MEM_CLAUDE_AUTH_METHOD is gateway
- ANTHROPIC_BASE_URL is set in ~/.claude-mem/.env and does not need a /v1 suffix for the unified Anthropic endpoint
- The model name sent by claude-mem matches a LiteLLM model_name. Make CLAUDE_MEM_MODEL and the LiteLLM alias match exactly.
Check ~/.claude-mem/.env. Gateway settings must be stored there. Shell exports are not the reliable configuration source for the worker.
Also make sure ANTHROPIC_BASE_URL is present. A token alone authenticates a gateway, but the base URL is what redirects traffic away from the default Anthropic endpoint.
If LiteLLM uses a master key or virtual key, store it as ANTHROPIC_AUTH_TOKEN in ~/.claude-mem/.env. The Claude Agent SDK sends this value as gateway authorization.
If you previously configured a direct Anthropic API key, remove ANTHROPIC_API_KEY from ~/.claude-mem/.env for gateway mode unless your gateway explicitly expects that variable.
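The precedence rule above can be modeled as a single decision. The `chooseAuthMode` helper is hypothetical, not the actual EnvManager code:

```typescript
// Hypothetical precedence check: any explicit gateway/API variable in
// ~/.claude-mem/.env means "use that configuration" and skip Claude OAuth.
type AuthMode = "gateway-or-api" | "oauth";

function chooseAuthMode(dotEnv: Record<string, string>): AuthMode {
  const explicit = ["ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_API_KEY"];
  return explicit.some((key) => key in dotEnv) ? "gateway-or-api" : "oauth";
}

console.log(chooseAuthMode({ ANTHROPIC_BASE_URL: "http://127.0.0.1:4000" })); // gateway-or-api
console.log(chooseAuthMode({})); // oauth
```

This is why a leftover ANTHROPIC_API_KEY in the file matters: its mere presence changes the decision, even when you intended traffic to flow through the gateway.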
Restart the worker:
npm run worker:restart
The SDK environment is rebuilt each time a subprocess is spawned. Restarting guarantees the next memory agent process sees the new gateway values.
claude-mem's memory worker disables file and shell tools for observation extraction. The LiteLLM gateway is only handling the model call used to compress and summarize memory; it is not a replacement for your interactive Claude Code tool loop.