supermemory Infinite Chat

import GettingAPIKey from '/snippets/getting-api-key.mdx';

supermemory Infinite Chat gives your chat applications unlimited contextual memory. It works as a transparent proxy in front of your existing LLM provider and manages long conversations without requiring any changes to your application logic.

<Tabs>
  <Tab title="Key Features">
    <CardGroup cols={2}>
      <Card title="Unlimited Context" icon="infinity" color="#4F46E5">
        No more token limits - conversations can extend indefinitely
      </Card>
      <Card title="Zero Latency" icon="bolt" color="#10B981">
        Transparent proxying with negligible overhead
      </Card>
      <Card title="Cost Efficient" icon="coins" color="#F59E0B">
        Save up to 70% on token costs for long conversations
      </Card>
      <Card title="Provider Agnostic" icon="plug" color="#6366F1">
        Works with any OpenAI-compatible endpoint
      </Card>
    </CardGroup>
  </Tab>
</Tabs>

Getting Started

To use the Infinite Chat endpoint, you need to:

1. Get a supermemory API key

2. Prefix any OpenAI-compatible API URL with `https://api.supermemory.ai/v3/`

<CodeGroup>

typescript

import OpenAI from "openai";

/**
 * Initialize the OpenAI client so that requests go through the supermemory proxy.
 * OPENAI_API_KEY is your OpenAI API key.
 * SUPERMEMORY_API_KEY is your supermemory API key.
 */
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://api.supermemory.ai/v3/https://api.openai.com/v1",
  // Sent with every request made by this client
  defaultHeaders: {
    "x-supermemory-api-key": process.env.SUPERMEMORY_API_KEY,
    "x-sm-user-id": "Your_users_id",
  },
});

python

import os

from openai import OpenAI

# Configure the OpenAI client to route requests through the supermemory proxy
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],  # Your regular OpenAI key
    base_url="https://api.supermemory.ai/v3/https://api.openai.com/v1",
    default_headers={
        "x-supermemory-api-key": os.environ["SUPERMEMORY_API_KEY"],  # Your supermemory key
        "x-sm-user-id": "Your_users_id",
    },
)

# Create a chat completion with unlimited context
response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[{"role": "user", "content": "Your message here"}],
)

</CodeGroup>
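
The base URL in both snippets is the supermemory prefix followed by the provider's full base URL. A minimal sketch of that composition follows, assuming the `https://api.supermemory.ai/v3/` prefix shown above. The helper name `proxied_base_url` and the constant are illustrative only and are not part of any supermemory or OpenAI SDK.

```python
SUPERMEMORY_PREFIX = "https://api.supermemory.ai/v3/"  # prefix taken from the snippets above


def proxied_base_url(provider_base_url: str) -> str:
    """Return provider_base_url routed through the supermemory proxy.

    Hypothetical helper for illustration; it only concatenates strings.
    """
    if not provider_base_url.startswith(("http://", "https://")):
        raise ValueError("provider_base_url must be an absolute http(s) URL")
    if provider_base_url.startswith(SUPERMEMORY_PREFIX):
        return provider_base_url  # already proxied, avoid double-prefixing
    return SUPERMEMORY_PREFIX + provider_base_url


print(proxied_base_url("https://api.openai.com/v1"))
# prints: https://api.supermemory.ai/v3/https://api.openai.com/v1
```

The result can be passed as `base_url` (Python) or `baseURL` (TypeScript) when constructing the client.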

How It Works

<Steps>
  <Step title="Transparent Proxying">
    All requests pass through supermemory to your chosen LLM provider with negligible latency overhead.
  </Step>
  <Step title="Intelligent Chunking">
    Long conversations are automatically split into segments by our proprietary chunking algorithm, which preserves semantic coherence.
  </Step>
  <Step title="Smart Retrieval">
    When a conversation exceeds the token threshold (20k+ tokens), supermemory retrieves the most relevant context from previous messages.
  </Step>
  <Step title="Automatic Token Management">
    The system balances token usage to keep performance high while minimizing costs.
  </Step>
</Steps>
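
To make the retrieval idea concrete, here is a toy sketch of budget-limited context selection. It is not supermemory's proprietary algorithm: the whitespace token count, the word-overlap relevance score, and the names `count_tokens` and `build_context` are all assumptions made for illustration.

```python
def count_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word."""
    return len(text.split())


def build_context(messages: list[str], query: str, budget: int, keep_recent: int = 1) -> list[str]:
    """Pick messages for the next request without exceeding `budget` tokens.

    Always keeps the `keep_recent` newest messages, then fills the remaining
    budget with older messages ranked by word overlap with `query`.
    """
    if sum(count_tokens(m) for m in messages) <= budget:
        return list(messages)  # short conversation: pass through unchanged

    recent_start = max(len(messages) - keep_recent, 0)
    chosen = set(range(recent_start, len(messages)))
    used = sum(count_tokens(messages[i]) for i in chosen)

    query_words = set(query.lower().split())
    # Rank older messages by shared query words (ties: newest first).
    older = sorted(
        range(recent_start),
        key=lambda i: (len(query_words & set(messages[i].lower().split())), i),
        reverse=True,
    )
    for i in older:
        cost = count_tokens(messages[i])
        if used + cost <= budget:
            chosen.add(i)
            used += cost

    return [messages[i] for i in sorted(chosen)]  # restore chronological order


history = [
    "my dog is named Biscuit",
    "the weather is nice today",
    "I had pasta for lunch",
    "what should I cook tonight",
    "remind me what my dog is called",
]
print(build_context(history, query="what is my dog called", budget=12))
# prints: ['my dog is named Biscuit', 'remind me what my dog is called']
```

The oldest message survives because it is the most relevant to the query, while more recent but unrelated messages are dropped to stay within the budget.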

Performance Benefits

<Accordion title="Reduced Token Usage" defaultOpen icon="coins">
  Save up to 70% on token costs for long conversations through intelligent context management and caching.
</Accordion>
<Accordion title="Unlimited Context" icon="infinity">
  No more 8k/32k/128k token limits - conversations can extend indefinitely with supermemory's advanced retrieval system.
</Accordion>
<Accordion title="Improved Response Quality" icon="sparkles">
  Better context retrieval means more coherent responses even in very long threads, reducing hallucinations and inconsistencies.
</Accordion>
<Accordion title="Zero Performance Penalty" icon="bolt">
  The proxy adds negligible latency to your requests, ensuring fast response times for your users.
</Accordion>
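
A back-of-the-envelope sketch shows why capping the prompt reduces token usage in long threads. The assumptions are illustrative, not measurements: every turn adds a fixed number of tokens, the full history is resent on each turn without the proxy, and with the proxy the prompt is capped at the 20k threshold mentioned in How It Works. Actual savings depend on your conversations.

```python
from typing import Optional


def prompt_tokens_total(turns: int, tokens_per_turn: int, cap: Optional[int] = None) -> int:
    """Total prompt tokens sent over a whole conversation.

    On turn t the prompt holds t * tokens_per_turn tokens of history;
    if `cap` is set, each prompt is limited to at most `cap` tokens.
    """
    total = 0
    for t in range(1, turns + 1):
        prompt = t * tokens_per_turn
        total += prompt if cap is None else min(prompt, cap)
    return total


# Hypothetical conversation: 200 turns of 500 tokens each.
full = prompt_tokens_total(turns=200, tokens_per_turn=500)
capped = prompt_tokens_total(turns=200, tokens_per_turn=500, cap=20_000)
print(full, capped, f"{1 - capped / full:.0%} fewer prompt tokens")
# prints: 10050000 3610000 64% fewer prompt tokens
```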

Pricing

<Tabs>
  <Tab title="Plans">
    <div className="mt-4">
      <div className="grid grid-cols-1 md:grid-cols-3 gap-4">
        <div className="p-4 border rounded-lg">
          <h3 className="text-lg font-bold">Free Tier</h3>
          <p className="text-sm text-gray-600 dark:text-gray-300">100k tokens stored at no cost</p>
        </div>
        <div className="p-4 border rounded-lg">
          <h3 className="text-lg font-bold">Standard Plan</h3>
          <p className="text-sm text-gray-600 dark:text-gray-300">$20/month fixed cost after exceeding free tier</p>
        </div>
        <div className="p-4 border rounded-lg">
          <h3 className="text-lg font-bold">Usage-Based</h3>
          <p className="text-sm text-gray-600 dark:text-gray-300">Each thread includes 20k free tokens, then $1 per million tokens thereafter</p>
        </div>
      </div>
    </div>
  </Tab>
  <Tab title="Comparison">
    <div className="mt-4">
      <table className="min-w-full divide-y divide-gray-200">
        <thead>
          <tr>
            <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Feature</th>
            <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Free</th>
            <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Standard</th>
          </tr>
        </thead>
        <tbody className="divide-y divide-gray-200">
          <tr>
            <td className="px-6 py-4 whitespace-nowrap text-sm">Tokens Stored</td>
            <td className="px-6 py-4 whitespace-nowrap text-sm">100k</td>
            <td className="px-6 py-4 whitespace-nowrap text-sm">Unlimited</td>
          </tr>
          <tr>
            <td className="px-6 py-4 whitespace-nowrap text-sm">Conversations</td>
            <td className="px-6 py-4 whitespace-nowrap text-sm">10</td>
            <td className="px-6 py-4 whitespace-nowrap text-sm">Unlimited</td>
          </tr>
        </tbody>
      </table>
    </div>
  </Tab>
</Tabs>
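
The usage-based line can be turned into a quick estimate. This sketch models only that line (20k free tokens per thread, then $1 per million tokens). How it combines with the $20/month Standard Plan is not specified here, so the fixed fee is left out. The function and constant names are illustrative.

```python
FREE_TOKENS_PER_THREAD = 20_000  # "Each thread includes 20k free tokens"
USD_PER_MILLION_TOKENS = 1.0     # "$1 per million tokens thereafter"


def usage_cost_usd(thread_tokens: list[int]) -> float:
    """Estimate usage-based charges from a list of per-thread token counts."""
    billable = sum(max(t - FREE_TOKENS_PER_THREAD, 0) for t in thread_tokens)
    return billable * USD_PER_MILLION_TOKENS / 1_000_000


# Three hypothetical threads: one under the free allowance, two over it.
print(usage_cost_usd([15_000, 520_000, 2_020_000]))
# prints: 2.5
```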

Error Handling

<Note> supermemory is designed with reliability as the top priority. If any issues occur within the supermemory processing pipeline, the system will automatically fall back to direct forwarding of your request to the LLM provider, ensuring zero downtime for your applications. </Note>

Each response includes diagnostic headers that provide information about the processing:

| Header | Description |
| --- | --- |
| `x-supermemory-conversation-id` | Unique identifier for the conversation thread |
| `x-supermemory-context-modified` | Indicates whether supermemory modified the context ("true" or "false") |
| `x-supermemory-tokens-processed` | Number of tokens processed in this request |
| `x-supermemory-chunks-created` | Number of new chunks created from this conversation |
| `x-supermemory-chunks-deleted` | Number of chunks removed (if any) |
| `x-supermemory-docs-deleted` | Number of documents removed (if any) |

If an error occurs, an additional `x-supermemory-error` header is included with details about what went wrong. Your request is still processed by the underlying LLM provider even if supermemory encounters an error.
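
A small sketch of how an application might read these headers. The header names come from the table above; the `Diagnostics` class, the `parse_diagnostics` function, and the sample values are hypothetical, and the code assumes the numeric headers arrive as plain digit strings.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Diagnostics:
    conversation_id: Optional[str]
    context_modified: bool
    tokens_processed: Optional[int]
    error: Optional[str]

    @property
    def had_error(self) -> bool:
        """True when the x-supermemory-error header was present."""
        return self.error is not None


def parse_diagnostics(headers: Mapping[str, str]) -> Diagnostics:
    """Extract supermemory diagnostic headers from a response's headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    tokens = h.get("x-supermemory-tokens-processed")
    return Diagnostics(
        conversation_id=h.get("x-supermemory-conversation-id"),
        context_modified=h.get("x-supermemory-context-modified", "").lower() == "true",
        tokens_processed=int(tokens) if tokens is not None and tokens.isdigit() else None,
        error=h.get("x-supermemory-error"),
    )


diag = parse_diagnostics({
    "X-Supermemory-Conversation-Id": "conv_123",  # made-up sample values
    "x-supermemory-context-modified": "true",
    "x-supermemory-tokens-processed": "1842",
})
print(diag.context_modified, diag.tokens_processed, diag.had_error)
# prints: True 1842 False
```

With the official `openai` Python client, response headers are available by calling `client.chat.completions.with_raw_response.create(...)`; the returned object exposes `.headers` and a `.parse()` method that yields the usual completion.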

Rate Limiting

<Info> Currently, there are no rate limits specific to supermemory. Your requests are subject only to the rate limits of your underlying LLM provider. </Info>

Supported Models

supermemory works with any OpenAI-compatible API, including:

<CardGroup cols={3}>
  <Card title="OpenAI" icon="openai">
    GPT-3.5, GPT-4, GPT-4o
  </Card>
  <Card title="Anthropic" icon="user-astronaut">
    Claude 3 models
  </Card>
  <Card title="Other Providers" icon="plug">
    Any provider with an OpenAI-compatible endpoint
  </Card>
</CardGroup>