OpenRouter Provider

Claude-mem supports OpenRouter as an alternative provider for observation extraction. OpenRouter provides a unified API to access 100+ models from different providers including Google, Meta, Mistral, DeepSeek, and many others—often with generous free tiers.

<Tip> **Free Models Available**: OpenRouter offers several completely free models, making it an excellent choice for reducing observation extraction costs to zero while maintaining quality. </Tip>

Why Use OpenRouter?

  • Access to 100+ models: Choose from models across multiple providers through one API
  • Free tier options: Several high-quality models are completely free to use
  • Cost flexibility: Pay-as-you-go pricing on premium models with no commitments
  • Clear error behavior: 429s, 5xx, and network failures throw, leaving messages pending so they can be retried
  • Hot-swappable: Switch providers without restarting the worker
  • Multi-turn conversations: Full conversation history maintained across API calls

Free Models on OpenRouter

OpenRouter actively supports democratizing AI access by offering free models. These are production-ready models suitable for observation extraction.

| Model | ID | Parameters | Context | Best For |
|-------|----|-----------|---------|----------|
| Xiaomi MiMo-V2-Flash | `xiaomi/mimo-v2-flash:free` | 309B (15B active, MoE) | 256K | Reasoning, coding, agents |
| Gemini 2.0 Flash | `google/gemini-2.0-flash-exp:free` | | 1M | General purpose |
| Gemini 2.5 Flash | `google/gemini-2.5-flash-preview:free` | | 1M | Latest capabilities |
| DeepSeek R1 | `deepseek/deepseek-r1:free` | 671B | 64K | Reasoning, analysis |
| Llama 3.1 70B | `meta-llama/llama-3.1-70b-instruct:free` | 70B | 128K | General purpose |
| Llama 3.1 8B | `meta-llama/llama-3.1-8b-instruct:free` | 8B | 128K | Fast, lightweight |
| Mistral Nemo | `mistralai/mistral-nemo:free` | 12B | 128K | Efficient performance |
<Note> **Default Model**: Claude-mem uses `xiaomi/mimo-v2-flash:free` by default—a 309B parameter mixture-of-experts model that ranks #1 on SWE-bench Verified and excels at coding and reasoning tasks. </Note>

Free Model Considerations

  • Rate limits: Free models may have stricter rate limits than paid models
  • Availability: Free capacity depends on provider partnerships and demand
  • Queue times: During peak usage, requests may be queued briefly
  • Max tokens: Most free models support 65,536 completion tokens

All free models support:

  • Tool use and function calling
  • Temperature and sampling controls
  • Stop sequences
  • Streaming responses

Getting an API Key

  1. Go to OpenRouter (https://openrouter.ai)
  2. Sign in with Google, GitHub, or email
  3. Navigate to API Keys
  4. Click Create Key
  5. Copy and securely store your API key
<Tip> **Free to start**: No credit card required to create an account or use free models. Add credits only if you want to use premium models. </Tip>

Configuration

Settings

| Setting | Values | Default | Description |
|---------|--------|---------|-------------|
| `CLAUDE_MEM_PROVIDER` | `claude`, `gemini`, `openrouter` | `claude` | AI provider for observation extraction |
| `CLAUDE_MEM_OPENROUTER_API_KEY` | string | | Your OpenRouter API key |
| `CLAUDE_MEM_OPENROUTER_MODEL` | string | `xiaomi/mimo-v2-flash:free` | Model identifier (see list above) |
| `CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES` | number | `20` | Max messages in conversation history |
| `CLAUDE_MEM_OPENROUTER_MAX_TOKENS` | number | `100000` | Token budget safety limit |
| `CLAUDE_MEM_OPENROUTER_SITE_URL` | string | | Optional: URL for analytics attribution |
| `CLAUDE_MEM_OPENROUTER_APP_NAME` | string | `claude-mem` | Optional: app name for analytics |

Using the Settings UI

  1. Open the viewer at http://localhost:37777
  2. Click the gear icon to open Settings
  3. Under AI Provider, select OpenRouter
  4. Enter your OpenRouter API key
  5. Optionally select a different model

Settings are applied immediately—no restart required.

Manual Configuration

Edit ~/.claude-mem/settings.json:

```json
{
  "CLAUDE_MEM_PROVIDER": "openrouter",
  "CLAUDE_MEM_OPENROUTER_API_KEY": "sk-or-v1-your-key-here",
  "CLAUDE_MEM_OPENROUTER_MODEL": "xiaomi/mimo-v2-flash:free"
}
```

Alternatively, set the API key via environment variable:

```bash
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
```

The settings file takes precedence over the environment variable.
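This precedence can be sketched as a small resolver. Note this is an illustrative sketch: `resolveApiKey` is a hypothetical name, not claude-mem's internal API, but the key names match the settings documented above.

```typescript
// Sketch of API-key resolution: settings.json takes precedence
// over the OPENROUTER_API_KEY environment variable.
type Dict = Record<string, string | undefined>;

function resolveApiKey(settings: Dict, env: Dict): string | undefined {
  // Prefer the settings-file key, fall back to the environment.
  return settings["CLAUDE_MEM_OPENROUTER_API_KEY"] ?? env["OPENROUTER_API_KEY"];
}
```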

Model Selection Guide

For Free Usage (No Cost)

Recommended: xiaomi/mimo-v2-flash:free

  • Best-in-class performance on coding benchmarks
  • 256K context window handles large observations
  • 65K max completion tokens
  • Mixture-of-experts architecture (15B active parameters)

Alternatives:

  • google/gemini-2.0-flash-exp:free - 1M context, Google's flagship
  • deepseek/deepseek-r1:free - Excellent reasoning capabilities
  • meta-llama/llama-3.1-70b-instruct:free - Strong general purpose

For Paid Usage (Higher Quality/Speed)

| Model | Price (per 1M tokens) | Best For |
|-------|----------------------|----------|
| `anthropic/claude-3.5-sonnet` | $3 in / $15 out | Highest quality observations |
| `google/gemini-2.0-flash` | $0.075 in / $0.30 out | Fast, cost-effective |
| `openai/gpt-4o` | $2.50 in / $10 out | GPT-4 quality |

Context Window Management

The OpenRouter agent implements context management to prevent runaway costs:

Automatic Truncation

The agent uses a sliding window strategy:

  1. Checks if message count exceeds MAX_CONTEXT_MESSAGES (default: 20)
  2. Checks if estimated tokens exceed MAX_TOKENS (default: 100,000)
  3. If limits exceeded, keeps most recent messages only
  4. Logs warnings with dropped message counts
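The sliding-window strategy above can be sketched as follows. This is a hedged sketch under the defaults stated in this section (20 messages, 100,000 tokens, 1 token ≈ 4 characters); `truncateContext` is an illustrative name, not claude-mem's actual function.

```typescript
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

// Sliding-window truncation: enforce the message-count limit first,
// then drop the oldest messages until the estimated token budget fits.
function truncateContext(
  messages: ChatMessage[],
  maxMessages = 20,
  maxTokens = 100_000
): ChatMessage[] {
  let kept = messages;
  // 1. Keep only the most recent maxMessages messages.
  if (kept.length > maxMessages) kept = kept.slice(-maxMessages);
  // 2. Conservative token estimate: 1 token ≈ 4 characters.
  const estimate = (msgs: ChatMessage[]) =>
    msgs.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
  while (kept.length > 1 && estimate(kept) > maxTokens) kept = kept.slice(1);
  // 3. Log how many messages were dropped.
  const dropped = messages.length - kept.length;
  if (dropped > 0) console.warn(`Context truncated: dropped ${dropped} message(s)`);
  return kept;
}
```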

Token Estimation

  • Conservative estimate: 1 token ≈ 4 characters
  • Used for proactive context management
  • Actual usage logged from API response

Cost Tracking

Logs include detailed usage information:

```
OpenRouter API usage: {
  model: "xiaomi/mimo-v2-flash:free",
  inputTokens: 2500,
  outputTokens: 1200,
  totalTokens: 3700,
  estimatedCostUSD: "0.00",
  messagesInContext: 8
}
```
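The `estimatedCostUSD` figure can be derived from per-1M-token pricing. A minimal sketch, assuming the simple linear pricing shown in this page's tables (`estimateCostUSD` is an illustrative name):

```typescript
// Estimate request cost from per-1M-token input/output prices.
// Free models have both prices at 0, so the estimate is "0.00".
function estimateCostUSD(
  inputTokens: number,
  outputTokens: number,
  inPricePerM: number,
  outPricePerM: number
): string {
  const cost =
    (inputTokens / 1_000_000) * inPricePerM +
    (outputTokens / 1_000_000) * outPricePerM;
  return cost.toFixed(2);
}
```

For example, the log entry above (2,500 input and 1,200 output tokens on a free model) yields `"0.00"`.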

Provider Switching

You can switch between providers at any time:

  • No restart required: Changes take effect on the next observation
  • Conversation history preserved: When switching mid-session, the new provider sees the full conversation context
  • Seamless transition: All providers use the same observation format

Switching via UI

  1. Open Settings in the viewer
  2. Change the AI Provider dropdown
  3. The next observation will use the new provider

Switching via Settings File

```json
{
  "CLAUDE_MEM_PROVIDER": "openrouter"
}
```

Error Behavior

If an OpenRouter request fails, claude-mem logs the failure and re-throws so the message stays pending for a later retry. There is no Claude SDK fallback: earlier docs claimed automatic Claude fallback, but that wiring was never actually engaged in production (#2087). To switch providers, change CLAUDE_MEM_PROVIDER in settings.

Throwing conditions:

  • Rate limiting (HTTP 429)
  • Server errors (HTTP 500, 502, 503)
  • Network issues (connection refused, timeout)
  • 4xx errors other than 429
  • Missing API key
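These conditions can be summarized in a small status check. A sketch only: `OpenRouterError` and `checkResponse` are hypothetical names illustrating the throw-on-everything behavior described above, not claude-mem's actual classes.

```typescript
// Every error status throws, so the message stays pending for retry.
class OpenRouterError extends Error {
  constructor(message: string, readonly status?: number) {
    super(message);
  }
}

function checkResponse(status: number): void {
  if (status === 429) throw new OpenRouterError("rate limited", status);
  if (status >= 500) throw new OpenRouterError(`server error ${status}`, status);
  if (status >= 400) throw new OpenRouterError(`client error ${status}`, status);
  // 2xx/3xx fall through; network failures throw before a status exists.
}
```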

Multi-Turn Conversation Support

The OpenRouter agent maintains full conversation history across API calls:

```
Session Created
  ↓
Load Pending Messages (observations from queue)
  ↓
For each message:
  → Add to conversation history
  → Call OpenRouter API with FULL history
  → Parse XML response
  → Store observations in database
  → Sync to Chroma vector DB
  ↓
Session complete
```

This enables:

  • Coherent multi-turn exchanges
  • Context preservation across observations
  • Seamless provider switching mid-session
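The per-message loop above can be sketched like this. `callOpenRouter` and `storeObservations` are hypothetical stand-ins for the real internals; the point is that the FULL history is sent on every call:

```typescript
interface Msg { role: "system" | "user" | "assistant"; content: string }

// Process a session's pending messages, accumulating conversation history.
async function processSession(
  pending: string[],
  callOpenRouter: (history: Msg[]) => Promise<string>,
  storeObservations: (xml: string) => void
): Promise<Msg[]> {
  const history: Msg[] = [{ role: "system", content: "You extract observations." }];
  for (const text of pending) {
    history.push({ role: "user", content: text });   // add to history
    const reply = await callOpenRouter(history);     // full history each call
    history.push({ role: "assistant", content: reply });
    storeObservations(reply);                        // parse/store step
  }
  return history;
}
```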

Troubleshooting

"OpenRouter API key not configured"

Either:

  • Set CLAUDE_MEM_OPENROUTER_API_KEY in ~/.claude-mem/settings.json, or
  • Set the OPENROUTER_API_KEY environment variable

Rate Limiting

Free models may have rate limits during peak usage. If you hit rate limits:

  • The agent throws and leaves the message pending — it will be retried later
  • Consider switching to a different free model
  • Add credits for premium model access

Model Not Found

Verify the model ID is correct:

  • Check OpenRouter Models for current availability
  • Use the :free suffix for free model variants
  • Model IDs are case-sensitive

High Token Usage Warning

If you see warnings about high token usage (>50,000 per request):

  • Reduce CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES
  • Reduce CLAUDE_MEM_OPENROUTER_MAX_TOKENS
  • Consider a model with larger context window

Connection Errors

If you see connection errors:

  • Check your internet connection
  • Verify OpenRouter service status at status.openrouter.ai
  • The agent throws and leaves the message pending for later retry

API Details

OpenRouter uses an OpenAI-compatible REST API:

Endpoint: https://openrouter.ai/api/v1/chat/completions

Headers:

```
Authorization: Bearer {apiKey}
HTTP-Referer: https://github.com/thedotmack/claude-mem
X-Title: claude-mem
Content-Type: application/json
```

Request Format:

```json
{
  "model": "xiaomi/mimo-v2-flash:free",
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."}
  ],
  "temperature": 0.3,
  "max_tokens": 4096
}
```
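Putting the endpoint, headers, and body together, a request can be assembled like this. This is a sketch, not claude-mem's code; `buildRequest` is an illustrative helper, and the header and body values mirror the API details above.

```typescript
// Build the OpenRouter chat-completions request described above.
function buildRequest(apiKey: string, model: string, messages: object[]) {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "HTTP-Referer": "https://github.com/thedotmack/claude-mem",
        "X-Title": "claude-mem",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages, temperature: 0.3, max_tokens: 4096 }),
    },
  };
}
```

On Node 18+, the returned pair can be passed straight to `fetch(url, init)`; a non-OK response should throw, per the error behavior described earlier.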

Comparing Providers

| Feature | Claude (SDK) | Gemini | OpenRouter |
|---------|--------------|--------|------------|
| Cost | Pay per token | Free tier + paid | Free models + paid |
| Models | Claude only | Gemini only | 100+ models |
| Quality | Highest | High | Varies by model |
| Rate limits | Based on tier | 5-4000 RPM | Varies by model |
| On error | Throws | Throws | Throws |
| Setup | Automatic | API key required | API key required |
<Tip> **Recommendation**: Start with OpenRouter's free `xiaomi/mimo-v2-flash:free` model for zero-cost observation extraction. If you need higher quality or encounter rate limits, switch to Claude or add OpenRouter credits for premium models. </Tip>

Next Steps