
# Overview


The Memory Router is a transparent proxy that sits between your application and your LLM provider, automatically managing context and memories without requiring any code changes.

<Note> **Live Demo**: Try the Memory Router at [supermemory.chat](https://supermemory.chat) to see it in action. </Note> <Tip> **Using Vercel AI SDK?** Check out our [AI SDK integration](/integrations/ai-sdk) for the cleanest implementation with `@supermemory/tools/ai-sdk` - it's our recommended approach for new projects. </Tip>

## What is the Memory Router?

The Memory Router gives your LLM applications:

- **Unlimited Context**: No more token limits - conversations can extend indefinitely
- **Automatic Memory Management**: Intelligently chunks, stores, and retrieves relevant context
- **Zero Code Changes**: Works with your existing OpenAI-compatible clients
- **Cost Optimization**: Save up to 70% on token costs through intelligent context management
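In practice, the "zero code changes" point means the only thing that moves is your client's base URL. A minimal sketch of that idea is below; the exact Memory Router URL scheme shown (prefixing the provider's base URL onto a Supermemory endpoint) is an illustrative assumption - check the setup guide for the real format.

```typescript
// Hypothetical helper: route an OpenAI-compatible base URL through the
// Memory Router by prefixing it with a Supermemory proxy endpoint.
// The "/v3/" prefix scheme here is an assumption for illustration.
function memoryRouterBaseUrl(providerBaseUrl: string): string {
  return `https://api.supermemory.ai/v3/${providerBaseUrl}`;
}

// Before: your client talks to the provider directly.
const direct = "https://api.openai.com/v1";
// After: the same client, same code, routed through Supermemory.
const routed = memoryRouterBaseUrl(direct);
console.log(routed);
```

Everything else in the request - model name, messages, streaming options - stays exactly as it was.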

## How It Works

<Steps> <Step title="Proxy Request"> Your application sends requests to Supermemory instead of directly to your LLM provider </Step> <Step title="Context Management"> Supermemory automatically: - Removes unnecessary context from long conversations - Searches relevant memories from previous interactions - Appends the most relevant context to your prompt </Step> <Step title="Forward to LLM"> The optimized request is forwarded to your chosen LLM provider </Step> <Step title="Async Memory Creation"> New memories are created asynchronously without blocking the response </Step> </Steps>
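The context-management step above can be sketched as a small pipeline: trim stale turns, then prepend retrieved memories. Every name and heuristic in this simulation is illustrative - the real router's chunking and retrieval are far more sophisticated.

```typescript
// Simplified simulation of the router's context-management step
// (illustrative only, not the actual implementation).
type Message = { role: "system" | "user" | "assistant"; content: string };

function manageContext(
  history: Message[],
  memories: string[],
  maxMessages: number,
): Message[] {
  // Step 2a: drop unnecessary context from a long conversation,
  // keeping only the most recent turns.
  const trimmed = history.slice(-maxMessages);
  // Steps 2b/2c: prepend the most relevant retrieved memories
  // as a system message so the model sees them first.
  const memoryBlock: Message = {
    role: "system",
    content: `Relevant memories:\n${memories.join("\n")}`,
  };
  return [memoryBlock, ...trimmed];
}

// A 100-message conversation shrinks to 10 recent turns plus one memory block.
const history: Message[] = Array.from({ length: 100 }, (_, i): Message => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `message ${i}`,
}));
const optimized = manageContext(history, ["User prefers TypeScript"], 10);
console.log(optimized.length); // prints 11
```

The real win is that this trimming happens transparently: your application keeps appending to `history` forever, and the router decides what the model actually sees.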

## Key Benefits

### For Developers

- **Drop-in Integration**: Just change your base URL - no other code changes needed
- **Provider Agnostic**: Works with OpenAI, Anthropic, Google, Groq, and more
- **Shared Memory Pool**: Memories created via API are available to the Router and vice versa
- **Automatic Fallback**: If Supermemory has issues, requests pass through directly

### For Applications

- **Better Long Conversations**: Maintains context even after thousands of messages
- **Consistent Responses**: Memories ensure consistent information across sessions
- **Smart Retrieval**: Only relevant context is included, improving response quality
- **Cost Savings**: Automatic chunking reduces token usage significantly

## When to Use the Memory Router

The Memory Router is ideal for:

<Tabs> <Tab title="Perfect For"> - **Chat Applications**: Customer support, AI assistants, chatbots - **Long Conversations**: Sessions that exceed model context windows - **Multi-Session Memory**: Users who return and continue conversations - **Quick Prototypes**: Get memory capabilities without building infrastructure </Tab> <Tab title="Consider API Instead"> - **Custom Retrieval Logic**: Need specific control over what memories to fetch - **Non-Conversational Use**: Document processing, analysis tools - **Complex Filtering**: Need advanced metadata filtering - **Batch Operations**: Processing multiple documents at once </Tab> </Tabs>

## Supported Providers

The Memory Router works with any OpenAI-compatible endpoint:

| Provider | Base URL | Status |
| --- | --- | --- |
| OpenAI | `api.openai.com/v1` | ✅ Fully Supported |
| Anthropic | `api.anthropic.com/v1` | ✅ Fully Supported |
| Google Gemini | `generativelanguage.googleapis.com/v1beta/openai` | ✅ Fully Supported |
| Groq | `api.groq.com/openai/v1` | ✅ Fully Supported |
| DeepInfra | `api.deepinfra.com/v1/openai` | ✅ Fully Supported |
| OpenRouter | `openrouter.ai/api/v1` | ✅ Fully Supported |
| Custom | Any OpenAI-compatible endpoint | ✅ Supported |
<Warning> **Not Yet Supported**: - OpenAI Assistants API (`/v1/assistants`) </Warning>

## Authentication

The Memory Router requires two API keys:

1. **Supermemory API Key**: For memory management
2. **Provider API Key**: For your chosen LLM provider

You can provide these via:

- **Headers** (recommended for production)
- **URL parameters** (useful for testing)
- **Request body** (for compatibility)
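For the recommended header-based approach, a request carries both keys at once: the provider key in the header the provider already expects, and the Supermemory key in its own header. The header name `x-supermemory-api-key` below is an assumption for illustration - consult the Memory Router reference for the exact names.

```typescript
// Sketch of supplying both API keys via headers.
// "x-supermemory-api-key" is a hypothetical header name for illustration.
function routerHeaders(
  supermemoryKey: string,
  providerKey: string,
): Record<string, string> {
  return {
    // The provider key goes where an OpenAI-compatible client already puts it.
    Authorization: `Bearer ${providerKey}`,
    // The Supermemory key rides alongside in a dedicated header.
    "x-supermemory-api-key": supermemoryKey,
    "Content-Type": "application/json",
  };
}

const headers = routerHeaders("sm_example_key", "sk_example_provider_key");
console.log(headers.Authorization);
```

Keeping the keys in headers (rather than URL parameters) avoids leaking them into server access logs, which is why headers are the production recommendation.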

## How Memories Work

When using the Memory Router:

1. **Automatic Extraction**: Important information from conversations is automatically extracted
2. **Intelligent Chunking**: Long messages are split into semantic chunks
3. **Relationship Building**: New memories connect to existing knowledge
4. **Smart Retrieval**: Only the most relevant memories are included in context

<Note> Memories are shared between the Memory Router and Memory API when using the same `user_id`, allowing you to use both together. </Note>

## Response Headers

The Memory Router adds diagnostic headers to help you understand what's happening:

| Header | Description |
| --- | --- |
| `x-supermemory-conversation-id` | Unique conversation identifier |
| `x-supermemory-context-modified` | Whether context was modified (`true`/`false`) |
| `x-supermemory-tokens-processed` | Number of tokens processed |
| `x-supermemory-chunks-created` | New memory chunks created |
| `x-supermemory-chunks-retrieved` | Memory chunks added to context |
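Reading these headers off a response might look like the sketch below. A plain object stands in for a `fetch()` response's headers so the example is self-contained; the field names follow the table above.

```typescript
// Parse the Memory Router's diagnostic headers into a typed summary.
// A plain Record stands in for real HTTP response headers here.
interface RouterDiagnostics {
  conversationId: string | null;
  contextModified: boolean;
  chunksRetrieved: number;
}

function readDiagnostics(headers: Record<string, string>): RouterDiagnostics {
  return {
    conversationId: headers["x-supermemory-conversation-id"] ?? null,
    contextModified: headers["x-supermemory-context-modified"] === "true",
    chunksRetrieved: Number(headers["x-supermemory-chunks-retrieved"] ?? 0),
  };
}

const diag = readDiagnostics({
  "x-supermemory-conversation-id": "conv_123",
  "x-supermemory-context-modified": "true",
  "x-supermemory-chunks-retrieved": "4",
});
console.log(diag.chunksRetrieved); // prints 4
```

Logging these values per request is a cheap way to confirm the router is actually modifying context rather than silently passing requests through.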

## Error Handling

The Memory Router is designed for reliability:

- **Automatic Fallback**: If Supermemory encounters an error, your request passes through unmodified
- **Error Headers**: The `x-supermemory-error` header provides error details
- **Zero Downtime**: Your application continues working even if memory features are unavailable
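Because fallback is silent by design, it can be useful to detect pass-through mode explicitly. A minimal check, using the header names from this page (the example error string is invented for illustration):

```typescript
// Returns true only if the router actually applied memory features.
// An x-supermemory-error header means the request fell back to a
// direct, unmodified pass-through to the provider.
function memoryFeaturesApplied(headers: Record<string, string>): boolean {
  if ("x-supermemory-error" in headers) return false;
  return headers["x-supermemory-context-modified"] === "true";
}

// Hypothetical error value, for illustration only.
console.log(memoryFeaturesApplied({ "x-supermemory-error": "memory store unavailable" })); // prints false
console.log(memoryFeaturesApplied({ "x-supermemory-context-modified": "true" }));          // prints true
```

A check like this lets you alert on degraded memory service without ever failing the user-facing request, which matches the router's zero-downtime design.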

## Rate Limits & Pricing

### Rate Limits

- No Supermemory-specific rate limits
- Subject only to your LLM provider's limits

### Pricing

- **Free Tier**: 100k tokens stored at no cost
- **Standard Plan**: $20/month after free tier
- **Usage-Based**: Each conversation includes 20k free tokens, then $1 per million tokens
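As a worked example of the usage-based tier: a conversation's first 20k tokens are free, and anything beyond that is billed at $1 per million tokens.

```typescript
// Worked example of the usage-based pricing above:
// 20k free tokens per conversation, then $1 per million tokens.
function conversationCostUsd(tokensStored: number): number {
  const FREE_TOKENS = 20_000;
  const USD_PER_MILLION = 1;
  const billable = Math.max(0, tokensStored - FREE_TOKENS);
  return (billable / 1_000_000) * USD_PER_MILLION;
}

console.log(conversationCostUsd(20_000));    // prints 0 (fully within free allowance)
console.log(conversationCostUsd(1_020_000)); // prints 1 (1M billable tokens = $1)
```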