OpenRouter Provider

Claude-mem supports OpenRouter as an alternative provider for observation extraction. OpenRouter provides a unified API to access 100+ models from different providers including Google, Meta, Mistral, DeepSeek, and many others—often with generous free tiers.

<Tip> **Free Models Available**: OpenRouter offers several completely free models, making it an excellent choice for reducing observation extraction costs to zero while maintaining quality. </Tip>

Why Use OpenRouter?

  • Access to 100+ models: Choose from models across multiple providers through one API
  • Free tier options: Several high-quality models are completely free to use
  • Cost flexibility: Pay-as-you-go pricing on premium models with no commitments
  • Clear error behavior: 429s, 5xx, and network failures throw, leaving messages pending so they can be retried
  • Hot-swappable: Switch providers without restarting the worker
  • Multi-turn conversations: Full conversation history maintained across API calls

Free Models on OpenRouter

OpenRouter actively supports democratizing AI access by offering free models. These are production-ready models suitable for observation extraction.

| Model | ID | Parameters | Context | Best For |
|-------|----|-----------|---------|----------|
| Xiaomi MiMo-V2-Flash | `xiaomi/mimo-v2-flash:free` | 309B (15B active, MoE) | 256K | Reasoning, coding, agents |
| Gemini 2.0 Flash | `google/gemini-2.0-flash-exp:free` | | 1M | General purpose |
| Gemini 2.5 Flash | `google/gemini-2.5-flash-preview:free` | | 1M | Latest capabilities |
| DeepSeek R1 | `deepseek/deepseek-r1:free` | 671B | 64K | Reasoning, analysis |
| Llama 3.1 70B | `meta-llama/llama-3.1-70b-instruct:free` | 70B | 128K | General purpose |
| Llama 3.1 8B | `meta-llama/llama-3.1-8b-instruct:free` | 8B | 128K | Fast, lightweight |
| Mistral Nemo | `mistralai/mistral-nemo:free` | 12B | 128K | Efficient performance |
<Note> **Default Model**: Claude-mem uses `xiaomi/mimo-v2-flash:free` by default—a 309B parameter mixture-of-experts model that ranks #1 on SWE-bench Verified and excels at coding and reasoning tasks. </Note>

Free Model Considerations

  • Rate limits: Free models may have stricter rate limits than paid models
  • Availability: Free capacity depends on provider partnerships and demand
  • Queue times: During peak usage, requests may be queued briefly
  • Max tokens: Most free models support 65,536 completion tokens

All free models support:

  • Tool use and function calling
  • Temperature and sampling controls
  • Stop sequences
  • Streaming responses

Getting an API Key

  1. Go to OpenRouter (https://openrouter.ai)
  2. Sign in with Google, GitHub, or email
  3. Navigate to API Keys
  4. Click Create Key
  5. Copy and securely store your API key
<Tip> **Free to start**: No credit card required to create an account or use free models. Add credits only if you want to use premium models. </Tip>

Configuration

Settings

| Setting | Values | Default | Description |
|---------|--------|---------|-------------|
| `CLAUDE_MEM_PROVIDER` | `claude`, `gemini`, `openrouter` | `claude` | AI provider for observation extraction |
| `CLAUDE_MEM_OPENROUTER_API_KEY` | string | | Your OpenRouter API key |
| `CLAUDE_MEM_OPENROUTER_MODEL` | string | `xiaomi/mimo-v2-flash:free` | Model identifier (see list above) |
| `CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES` | number | `20` | Max messages in conversation history |
| `CLAUDE_MEM_OPENROUTER_MAX_TOKENS` | number | `100000` | Token budget safety limit |
| `CLAUDE_MEM_OPENROUTER_SITE_URL` | string | | Optional: URL for analytics attribution |
| `CLAUDE_MEM_OPENROUTER_APP_NAME` | string | `claude-mem` | Optional: app name for analytics |

Using the Settings UI

  1. Open the viewer at http://localhost:37777
  2. Click the gear icon to open Settings
  3. Under AI Provider, select OpenRouter
  4. Enter your OpenRouter API key
  5. Optionally select a different model

Settings are applied immediately—no restart required.

Manual Configuration

Edit ~/.claude-mem/settings.json:

```json
{
  "CLAUDE_MEM_PROVIDER": "openrouter",
  "CLAUDE_MEM_OPENROUTER_API_KEY": "sk-or-v1-your-key-here",
  "CLAUDE_MEM_OPENROUTER_MODEL": "xiaomi/mimo-v2-flash:free"
}
```

Alternatively, set the API key via environment variable:

```bash
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
```

The settings file takes precedence over the environment variable.
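This precedence can be sketched as a small resolver. Note this is an illustrative sketch: `resolveApiKey` is a hypothetical name, not claude-mem's internal API, but the key names match the settings documented above.

```typescript
// Sketch of API-key resolution: settings.json takes precedence
// over the OPENROUTER_API_KEY environment variable.
type Dict = Record<string, string | undefined>;

function resolveApiKey(settings: Dict, env: Dict): string | undefined {
  // Prefer the settings-file key, fall back to the environment.
  return settings["CLAUDE_MEM_OPENROUTER_API_KEY"] ?? env["OPENROUTER_API_KEY"];
}
```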

Model Selection Guide

For Free Usage (No Cost)

Recommended: xiaomi/mimo-v2-flash:free

  • Best-in-class performance on coding benchmarks
  • 256K context window handles large observations
  • 65K max completion tokens
  • Mixture-of-experts architecture (15B active parameters)

Alternatives:

  • google/gemini-2.0-flash-exp:free - 1M context, Google's flagship
  • deepseek/deepseek-r1:free - Excellent reasoning capabilities
  • meta-llama/llama-3.1-70b-instruct:free - Strong general purpose

For Paid Usage (Higher Quality/Speed)

| Model | Price (per 1M tokens) | Best For |
|-------|----------------------|----------|
| `anthropic/claude-3.5-sonnet` | $3 in / $15 out | Highest quality observations |
| `google/gemini-2.0-flash` | $0.075 in / $0.30 out | Fast, cost-effective |
| `openai/gpt-4o` | $2.50 in / $10 out | GPT-4 quality |

Context Window Management

The OpenRouter agent implements context management to prevent runaway costs:

Automatic Truncation

The agent uses a sliding window strategy:

  1. Checks if message count exceeds MAX_CONTEXT_MESSAGES (default: 20)
  2. Checks if estimated tokens exceed MAX_TOKENS (default: 100,000)
  3. If limits exceeded, keeps most recent messages only
  4. Logs warnings with dropped message counts
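The sliding-window strategy above can be sketched as follows. This is a hedged sketch under the defaults stated in this section (20 messages, 100,000 tokens, 1 token ≈ 4 characters); `truncateContext` is an illustrative name, not claude-mem's actual function.

```typescript
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

// Sliding-window truncation: enforce the message-count limit first,
// then drop the oldest messages until the estimated token budget fits.
function truncateContext(
  messages: ChatMessage[],
  maxMessages = 20,
  maxTokens = 100_000
): ChatMessage[] {
  let kept = messages;
  // 1. Keep only the most recent maxMessages messages.
  if (kept.length > maxMessages) kept = kept.slice(-maxMessages);
  // 2. Conservative token estimate: 1 token ≈ 4 characters.
  const estimate = (msgs: ChatMessage[]) =>
    msgs.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
  while (kept.length > 1 && estimate(kept) > maxTokens) kept = kept.slice(1);
  // 3. Log how many messages were dropped.
  const dropped = messages.length - kept.length;
  if (dropped > 0) console.warn(`Context truncated: dropped ${dropped} message(s)`);
  return kept;
}
```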

Token Estimation

  • Conservative estimate: 1 token ≈ 4 characters
  • Used for proactive context management
  • Actual usage logged from API response

Cost Tracking

Logs include detailed usage information:

```
OpenRouter API usage: {
  model: "xiaomi/mimo-v2-flash:free",
  inputTokens: 2500,
  outputTokens: 1200,
  totalTokens: 3700,
  estimatedCostUSD: "0.00",
  messagesInContext: 8
}
```
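The `estimatedCostUSD` figure can be derived from per-1M-token pricing. A minimal sketch, assuming the simple linear pricing shown in this page's tables (`estimateCostUSD` is an illustrative name):

```typescript
// Estimate request cost from per-1M-token input/output prices.
// Free models have both prices at 0, so the estimate is "0.00".
function estimateCostUSD(
  inputTokens: number,
  outputTokens: number,
  inPricePerM: number,
  outPricePerM: number
): string {
  const cost =
    (inputTokens / 1_000_000) * inPricePerM +
    (outputTokens / 1_000_000) * outPricePerM;
  return cost.toFixed(2);
}
```

For example, the log entry above (2,500 input and 1,200 output tokens on a free model) yields `"0.00"`.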

Provider Switching

You can switch between providers at any time:

  • No restart required: Changes take effect on the next observation
  • Conversation history preserved: When switching mid-session, the new provider sees the full conversation context
  • Seamless transition: All providers use the same observation format

Switching via UI

  1. Open Settings in the viewer
  2. Change the AI Provider dropdown
  3. The next observation will use the new provider

Switching via Settings File

```json
{
  "CLAUDE_MEM_PROVIDER": "openrouter"
}
```

Error Behavior

If an OpenRouter request fails, claude-mem logs the failure and re-throws so the message stays pending for a later retry. There is no Claude SDK fallback: earlier docs claimed automatic Claude fallback, but that wiring was never actually engaged in production (#2087). To switch providers, change CLAUDE_MEM_PROVIDER in settings.

Throwing conditions:

  • Rate limiting (HTTP 429)
  • Server errors (HTTP 500, 502, 503)
  • Network issues (connection refused, timeout)
  • 4xx errors other than 429
  • Missing API key
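These conditions can be summarized in a small status check. A sketch only: `OpenRouterError` and `checkResponse` are hypothetical names illustrating the throw-on-everything behavior described above, not claude-mem's actual classes.

```typescript
// Every error status throws, so the message stays pending for retry.
class OpenRouterError extends Error {
  constructor(message: string, readonly status?: number) {
    super(message);
  }
}

function checkResponse(status: number): void {
  if (status === 429) throw new OpenRouterError("rate limited", status);
  if (status >= 500) throw new OpenRouterError(`server error ${status}`, status);
  if (status >= 400) throw new OpenRouterError(`client error ${status}`, status);
  // 2xx/3xx fall through; network failures throw before a status exists.
}
```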

Multi-Turn Conversation Support

The OpenRouter agent maintains full conversation history across API calls:

```
Session Created
  ↓
Load Pending Messages (observations from queue)
  ↓
For each message:
  → Add to conversation history
  → Call OpenRouter API with FULL history
  → Parse XML response
  → Store observations in database
  → Sync to Chroma vector DB
  ↓
Session complete
```

This enables:

  • Coherent multi-turn exchanges
  • Context preservation across observations
  • Seamless provider switching mid-session
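The per-message loop above can be sketched like this. `callOpenRouter` and `storeObservations` are hypothetical stand-ins for the real internals; the point is that the FULL history is sent on every call:

```typescript
interface Msg { role: "system" | "user" | "assistant"; content: string }

// Process a session's pending messages, accumulating conversation history.
async function processSession(
  pending: string[],
  callOpenRouter: (history: Msg[]) => Promise<string>,
  storeObservations: (xml: string) => void
): Promise<Msg[]> {
  const history: Msg[] = [{ role: "system", content: "You extract observations." }];
  for (const text of pending) {
    history.push({ role: "user", content: text });   // add to history
    const reply = await callOpenRouter(history);     // full history each call
    history.push({ role: "assistant", content: reply });
    storeObservations(reply);                        // parse/store step
  }
  return history;
}
```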

Troubleshooting

"OpenRouter API key not configured"

Either:

  • Set CLAUDE_MEM_OPENROUTER_API_KEY in ~/.claude-mem/settings.json, or
  • Set the OPENROUTER_API_KEY environment variable

Rate Limiting

Free models may have rate limits during peak usage. If you hit rate limits:

  • The agent throws and leaves the message pending — it will be retried later
  • Consider switching to a different free model
  • Add credits for premium model access

Model Not Found

Verify the model ID is correct:

  • Check OpenRouter Models for current availability
  • Use the :free suffix for free model variants
  • Model IDs are case-sensitive

High Token Usage Warning

If you see warnings about high token usage (>50,000 per request):

  • Reduce CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES
  • Reduce CLAUDE_MEM_OPENROUTER_MAX_TOKENS
  • Consider a model with larger context window

Connection Errors

If you see connection errors:

  • Check your internet connection
  • Verify OpenRouter service status at status.openrouter.ai
  • The agent throws and leaves the message pending for later retry

API Details

OpenRouter uses an OpenAI-compatible REST API:

Endpoint: https://openrouter.ai/api/v1/chat/completions

Headers:

```
Authorization: Bearer {apiKey}
HTTP-Referer: https://github.com/thedotmack/claude-mem
X-Title: claude-mem
Content-Type: application/json
```

Request Format:

```json
{
  "model": "xiaomi/mimo-v2-flash:free",
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."}
  ],
  "temperature": 0.3,
  "max_tokens": 4096
}
```
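Putting the endpoint, headers, and body together, a request can be assembled like this. This is a sketch, not claude-mem's code; `buildRequest` is an illustrative helper, and the header and body values mirror the API details above.

```typescript
// Build the OpenRouter chat-completions request described above.
function buildRequest(apiKey: string, model: string, messages: object[]) {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "HTTP-Referer": "https://github.com/thedotmack/claude-mem",
        "X-Title": "claude-mem",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages, temperature: 0.3, max_tokens: 4096 }),
    },
  };
}
```

On Node 18+, the returned pair can be passed straight to `fetch(url, init)`; a non-OK response should throw, per the error behavior described earlier.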

Comparing Providers

| Feature | Claude (SDK) | Gemini | OpenRouter |
|---------|--------------|--------|------------|
| Cost | Pay per token | Free tier + paid | Free models + paid |
| Models | Claude only | Gemini only | 100+ models |
| Quality | Highest | High | Varies by model |
| Rate limits | Based on tier | 5-4000 RPM | Varies by model |
| On error | Throws | Throws | Throws |
| Setup | Automatic | API key required | API key required |
<Tip> **Recommendation**: Start with OpenRouter's free `xiaomi/mimo-v2-flash:free` model for zero-cost observation extraction. If you need higher quality or encounter rate limits, switch to Claude or add OpenRouter credits for premium models. </Tip>

Next Steps