apps/docs/memory-api/ingesting.mdx
Supermemory provides a powerful and flexible ingestion system that can process virtually any type of content. Whether you're adding simple text notes, web pages, PDFs, images, or complex documents from various platforms, our API handles it all seamlessly.
Before diving into the API, it's important to understand how Supermemory processes your content:
When you use the "Add Memory" endpoint, you're actually adding a document. Supermemory's job is to intelligently break that document into optimal memories that can be searched and retrieved.
```text
Your Content  →  Document    →  Processing  →  Multiple Memories
     ↓               ↓              ↓                 ↓
  PDF File    →  Stored Doc  →   Chunking   →  Searchable Memories
```
You can visualize this process in the Supermemory Console where you'll see a graph view showing how your documents are broken down into interconnected memories.
Supermemory accepts content through three main methods: raw text, URLs, and direct file uploads.
The ingestion system consists of several key components:
The primary endpoint for adding content that will be processed into documents.
**Endpoint:** `POST /v3/documents`
```bash
curl https://api.supermemory.ai/v3/documents \
  -H "Authorization: Bearer $SUPERMEMORY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Machine learning is a subset of artificial intelligence that enables computers to learn and make decisions from data without explicit programming.",
    "containerTags": ["ai-research", "user_123"],
    "metadata": {
      "source": "research-notes",
      "category": "education",
      "priority": "high"
    },
    "customId": "ml-basics-001"
  }'
```
```typescript
import Supermemory from 'supermemory'

const client = new Supermemory({
  apiKey: process.env.SUPERMEMORY_API_KEY
})

async function addContent() {
  const result = await client.add({
    content: "Machine learning is a subset of artificial intelligence...",
    containerTags: ["ai-research"],
    metadata: {
      source: "research-notes",
      category: "education",
      priority: "high"
    },
    customId: "ml-basics-001"
  })

  console.log(result) // { id: "abc123", status: "queued" }
}

addContent()
```
```python
import os
from supermemory import Supermemory

client = Supermemory(api_key=os.environ.get("SUPERMEMORY_API_KEY"))

result = client.add(
    content="Machine learning is a subset of artificial intelligence...",
    container_tags=["ai-research"],
    metadata={
        "source": "research-notes",
        "category": "education",
        "priority": "high"
    },
    custom_id="ml-basics-001"
)

print(result)  # { "id": "abc123", "status": "queued" }
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `content` | string | Yes | The content to process into a document. Can be text, a URL, or another supported format |
| `containerTag` | string | No | **Recommended:** single tag to group related memories in a space. Defaults to `"sm_project_default"` |
| `containerTags` | string[] | No | Legacy array format. Use `containerTag` instead for better performance |
| `metadata` | object | No | Additional key-value metadata (strings, numbers, booleans only) |
| `customId` | string | No | Your own identifier for this document (max 255 characters) |
| `raw` | string | No | Raw content to store alongside processed content |
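The examples above use the legacy `containerTags` array. Below is a minimal sketch of the same request using the recommended single `containerTag` field; it assumes the SDK forwards the field exactly as described in the parameter table, and the `customId` is a made-up example value.

```typescript
import Supermemory from 'supermemory'

const client = new Supermemory({ apiKey: process.env.SUPERMEMORY_API_KEY })

// Same request as before, but with the recommended single `containerTag`
// instead of the legacy `containerTags` array.
const result = await client.add({
  content: "Machine learning is a subset of artificial intelligence...",
  containerTag: "ai-research",      // one tag = one space, faster queries
  metadata: {
    source: "research-notes",
    category: "education",
    priority: "high"
  },
  customId: "ml-basics-002"         // hypothetical example ID
})

console.log(result) // e.g. { id: "...", status: "queued" }
```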
When you successfully create a document, you'll get back a simple confirmation with the document ID and its initial processing status:
```json
{
  "id": "D2Ar7Vo7ub83w3PRPZcaP1",
  "status": "queued"
}
```
**What this means:**

- `id`: Your document's unique identifier. Save it to track processing or to reference the document later.
- `status`: Current processing state. `"queued"` means it's waiting to be processed into memories.

Got a PDF, image, or video? Upload it directly and let Supermemory extract the valuable content automatically.
**Endpoint:** `POST /v3/documents/file`
What makes this powerful: Instead of manually copying text from PDFs or transcribing videos, just upload the file. Supermemory handles OCR for images, transcription for videos, and intelligent text extraction for documents.
<CodeGroup>

```bash cURL
curl https://api.supermemory.ai/v3/documents/file \
  -H "Authorization: Bearer $SUPERMEMORY_API_KEY" \
  -F "file=@document.pdf" \
  -F "containerTags=research_project"

# Response:
# {
#   "id": "Mx7fK9pL2qR5tE8yU4nC7",
#   "status": "processing"
# }
```
```typescript TypeScript
import Supermemory from 'supermemory'
import fs from 'fs'

const client = new Supermemory({
  apiKey: process.env.SUPERMEMORY_API_KEY
})

// Method 1: Using SDK uploadFile method (RECOMMENDED)
const result = await client.documents.uploadFile({
  file: fs.createReadStream('/path/to/document.pdf'),
  containerTags: 'research_project' // String, not array!
})

// Method 2: Using fetch with form data (for browser/manual implementation)
// fileInput is an <input type="file"> element in the page
const formData = new FormData()
formData.append('file', fileInput.files[0])
formData.append('containerTags', 'research_project')

const response = await fetch('https://api.supermemory.ai/v3/documents/file', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.SUPERMEMORY_API_KEY}`
  },
  body: formData
})
const uploadResult = await response.json()

console.log(uploadResult)
// Output: { id: "Mx7fK9pL2qR5tE8yU4nC7", status: "processing" }
```
```python Python
import os
import requests
from supermemory import Supermemory

client = Supermemory(api_key=os.environ.get("SUPERMEMORY_API_KEY"))

# Method 1: Using SDK upload_file method (RECOMMENDED)
result = client.documents.upload_file(
    file=open('document.pdf', 'rb'),
    container_tags='research_project'  # String parameter name
)

# Method 2: Using requests with form data
files = {'file': open('document.pdf', 'rb')}
data = {'containerTags': 'research_project'}

response = requests.post(
    'https://api.supermemory.ai/v3/documents/file',
    headers={'Authorization': f'Bearer {os.environ.get("SUPERMEMORY_API_KEY")}'},
    files=files,
    data=data
)
result = response.json()

print(result)
# Output: {'id': 'Mx7fK9pL2qR5tE8yU4nC7', 'status': 'processing'}
```

</CodeGroup>
Supermemory automatically detects content types based on:
```typescript
type MemoryType =
  | 'text'         // Plain text content
  | 'pdf'          // PDF documents
  | 'tweet'        // Twitter/X posts
  | 'google_doc'   // Google Docs
  | 'google_slide' // Google Slides
  | 'google_sheet' // Google Sheets
  | 'image'        // Images with OCR
  | 'video'        // Videos with transcription
  | 'notion_doc'   // Notion pages
  | 'webpage'      // Web pages
  | 'onedrive'     // OneDrive documents

// Examples of automatic detection
const examples = {
  "https://twitter.com/user/status/123": "tweet",
  "https://youtube.com/watch?v=abc": "video",
  "https://docs.google.com/document/d/123": "google_doc",
  "https://docs.google.com/spreadsheets/d/123": "google_sheet",
  "https://docs.google.com/presentation/d/123": "google_slide",
  "https://notion.so/page-123": "notion_doc",
  "https://example.com": "webpage",
  "Regular text content": "text",
  // PDF files uploaded → "pdf"
  // Image files uploaded → "image"
  // OneDrive links → "onedrive"
}
```
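Because detection happens server-side, ingesting any of these types looks exactly like adding text: pass the URL as `content`. A short sketch using the client configured earlier and the placeholder URLs from the table above:

```typescript
// A YouTube URL is detected as "video" and transcribed before chunking.
const video = await client.add({
  content: "https://youtube.com/watch?v=abc",
  containerTags: ["ai-research"]
})

// A plain URL is detected as "webpage" and its text content is extracted.
const article = await client.add({
  content: "https://example.com",
  containerTags: ["ai-research"]
})

console.log(video.status, article.status) // e.g. "queued" while processing
```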
Each content type follows a specialized processing pipeline:
<Accordion title="Text Content" defaultOpen>
  Content is cleaned, normalized, and chunked for optimal retrieval.
</Accordion>
## Error Handling

<Tabs>
<Tab title="Authentication Errors">
```json
// AuthenticationError class
{
  name: "AuthenticationError",
  status: 401,
  message: "401 Unauthorized",
  error: {
    message: "Invalid API key",
    type: "authentication_error"
  }
}
```

**Causes:**
- Missing or invalid API key
- Expired authentication token
- Incorrect authorization header format
</Tab>

<Tab title="Bad Request Errors (400)">
```json
// BadRequestError class
{
  name: "BadRequestError",
  status: 400,
  message: "400 Bad Request",
  error: {
    message: "Invalid request parameters",
    details: {
      content: "Content cannot be empty",
      customId: "customId exceeds maximum length"
    }
  }
}
```

**Causes:**
- Missing required fields
- Invalid parameter types
- Content too large
- Custom ID too long
- Invalid metadata structure
</Tab>

<Tab title="Rate Limiting (429)">
```json
// RateLimitError class
{
  name: "RateLimitError",
  status: 429, // NOT 402!
  message: "429 Too Many Requests",
  error: {
    message: "Rate limit exceeded",
    retry_after: 60
  }
}
```

**Causes:**
- Monthly token quota exceeded
- Rate limits exceeded
- Subscription limits reached

**Fix:** Implement exponential backoff and respect rate limits
</Tab>

<Tab title="Connection Errors">
```json
// APIConnectionTimeoutError class - NEW
{
  name: "APIConnectionTimeoutError",
  message: "Request timed out."
}
```

**Causes:**
- Network connectivity issues
- DNS resolution failures
- Request timeouts
- Proxy/firewall blocking
</Tab>
</Tabs>
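A compact way to react to these errors from application code is to branch on the `status` field shown above. This sketch assumes the thrown error exposes `status` and the nested `error` object as documented; swap in `instanceof` checks against your SDK version's exported error classes if you prefer.

```typescript
async function addWithHandling(content: string) {
  try {
    return await client.add({ content, containerTags: ["ai-research"] })
  } catch (error: any) {
    switch (error?.status) {
      case 401:
        // Authentication errors never succeed on retry - fix the API key instead
        throw new Error("Authentication failed - check SUPERMEMORY_API_KEY")
      case 400:
        // Validation errors should not be retried either
        throw new Error(`Invalid request: ${error.message}`)
      case 429: {
        // Respect the documented retry_after (seconds) before a single retry
        const waitMs = (error.error?.retry_after ?? 60) * 1000
        await new Promise(resolve => setTimeout(resolve, waitMs))
        return client.add({ content, containerTags: ["ai-research"] })
      }
      default:
        throw error
    }
  }
}
```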
## Best Practices
### Container Tags: Optimize for Performance
Use single container tags for better query performance. Multiple tags are supported but increase latency.
```json
{
  "content": "Updated authentication flow to use JWT tokens",
  "containerTags": ["project_alpha"],
  "metadata": {
    "type": "technical_change",
    "author": "sarah_dev",
    "impact": "breaking"
  }
}
```
**Single vs Multiple Tags**

```json
// ✅ Recommended: Single tag, faster queries
{ "containerTags": ["project_alpha"] }

// ⚠️ Allowed but slower: Multiple tags increase latency
{ "containerTags": ["project_alpha", "auth", "backend"] }
```
Single tags perform better because each query stays scoped to one container space; every additional tag adds another filter to the lookup.
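A common way to follow this guidance in multi-tenant apps is to derive the single tag from the user, as the earlier examples do with `"user_123"`. The helper below is a hypothetical sketch, not an SDK feature:

```typescript
// Hypothetical helper: one container tag per end user keeps each user's
// memories in their own space and keeps queries fast.
async function addUserMemory(userId: string, content: string) {
  return client.add({
    content,
    containerTags: [`user_${userId}`],   // single tag scoped to the user
    metadata: { source: "chat" }         // illustrative metadata only
  })
}

await addUserMemory("123", "Prefers concise answers with code examples")
```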
### Custom IDs: Prevent Duplicates and Enable Updates

Custom IDs prevent duplicates and enable document updates. Two update methods are available.
**Method 1: POST with customId (Upsert)**

```bash
# Create document
curl -X POST "https://api.supermemory.ai/v3/documents" \
  -H "Authorization: Bearer $SUPERMEMORY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "API uses REST endpoints",
    "customId": "api_docs_v1",
    "containerTags": ["project_alpha"]
  }'

# Response: {"id": "abc123", "status": "queued"}

# Update same document (same customId = upsert)
curl -X POST "https://api.supermemory.ai/v3/documents" \
  -H "Authorization: Bearer $SUPERMEMORY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "API migrated to GraphQL",
    "customId": "api_docs_v1",
    "containerTags": ["project_alpha"]
  }'
```
**Method 2: PATCH by ID (Update)**

```bash
curl -X PATCH "https://api.supermemory.ai/v3/documents/abc123" \
  -H "Authorization: Bearer $SUPERMEMORY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "API now uses GraphQL with caching",
    "metadata": {"version": 3}
  }'
```
**Custom ID Patterns**

```javascript
// External system sync
"jira_PROJ_123"
"confluence_456789"
"github_issue_987"

// Database entities
"user_profile_12345"
"order_67890"

// Versioned content
"meeting_2024_01_15"
"api_docs_auth"
"requirements_v3"
```
**Update Behavior**

Documents can be updated either by `customId` (POST upsert, as in Method 1) or by `id` (PATCH, as in Method 2).

### Token Usage and Limits
"Hello world" // ≈ 2 tokens
"10-page PDF" // ≈ 2,000-4,000 tokens
"YouTube video (10 min)" // ≈ 1,500-3,000 tokens
"Web article" // ≈ 500-2,000 tokens
**Current Limits**
| Feature | Free | Starter | Growth |
|---|---|---|---|
| Memory Tokens/month | 100,000 | 1,000,000 | 10,000,000 |
| Search Queries/month | 1,000 | 10,000 | 100,000 |
**Limit Exceeded Response**

```bash
curl -X POST "https://api.supermemory.ai/v3/documents" \
  -H "Authorization: Bearer your_api_key" \
  -d '{"content": "Some content"}'
```

Response:

```json
{"error": "Memory token limit reached", "status": 402}
```
### Batch Ingestion

Process large volumes efficiently with rate limiting and error recovery.
```typescript
import Supermemory from 'supermemory'

// The SDK's error classes (AuthenticationError, BadRequestError, RateLimitError)
// are assumed to be available; adjust the import to match your SDK version.

const client = new Supermemory({
  apiKey: process.env.SUPERMEMORY_API_KEY
})

interface Document {
  id: string;
  content: string;
  title?: string;
  createdAt?: string;
  metadata?: Record<string, string | number | boolean>;
}

async function batchIngest(
  documents: Document[],
  options: { batchSize?: number; delayBetweenBatches?: number; maxRetries?: number } = {}
) {
  const {
    batchSize = 5,
    delayBetweenBatches = 2000,
    maxRetries = 3
  } = options;

  const results = [];

  for (let i = 0; i < documents.length; i += batchSize) {
    const batch = documents.slice(i, i + batchSize);
    console.log(`Processing batch ${Math.floor(i / batchSize) + 1}/${Math.ceil(documents.length / batchSize)}`);

    const batchResults = await Promise.allSettled(
      batch.map(doc => ingestWithRetry(doc, maxRetries))
    );
    results.push(...batchResults);

    // Rate limiting between batches
    if (i + batchSize < documents.length) {
      await new Promise(resolve => setTimeout(resolve, delayBetweenBatches));
    }
  }

  return results;
}

async function ingestWithRetry(doc: Document, maxRetries: number) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await client.add({
        content: doc.content,
        customId: doc.id,
        containerTags: ["batch_import_user_123"],
        metadata: {
          source: "migration",
          batch_id: generateBatchId(),
          original_created: doc.createdAt || new Date().toISOString(),
          title: doc.title || "",
          ...doc.metadata
        }
      });
    } catch (error) {
      if (error instanceof AuthenticationError) {
        console.error('Authentication failed - check API key');
        throw error; // Don't retry auth errors
      }
      if (error instanceof BadRequestError) {
        console.error('Invalid document format:', doc.id);
        throw error; // Don't retry validation errors
      }
      if (error instanceof RateLimitError) {
        console.log(`Rate limited on attempt ${attempt}, waiting longer...`);
        const delay = Math.pow(2, attempt) * 2000; // Longer delays for rate limits
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }
      if (attempt === maxRetries) throw error;

      // Exponential backoff for other errors
      const delay = Math.pow(2, attempt) * 1000;
      console.log(`Retry ${attempt}/${maxRetries} for ${doc.id} in ${delay}ms`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

function generateBatchId(): string {
  return `batch_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
}
```
```python
import asyncio
import logging
import time
from typing import Any, Dict, List, Optional

# Assumes an async-capable Supermemory client (`client`) is already configured, and
# that BadRequestError / RateLimitError are imported from your SDK version.

async def batch_ingest(
    documents: List[Dict[str, Any]],
    options: Optional[Dict[str, Any]] = None
):
    options = options or {}
    batch_size = options.get('batch_size', 5)
    delay_between_batches = options.get('delay_between_batches', 2.0)  # seconds
    max_retries = options.get('max_retries', 3)

    results = []

    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        batch_num = i // batch_size + 1
        total_batches = (len(documents) + batch_size - 1) // batch_size
        print(f"Processing batch {batch_num}/{total_batches}")

        # Process batch with proper error handling
        tasks = [ingest_with_retry(doc, max_retries) for doc in batch]
        batch_results = await asyncio.gather(*tasks, return_exceptions=True)
        results.extend(batch_results)

        # Rate limiting between batches
        if i + batch_size < len(documents):
            await asyncio.sleep(delay_between_batches)

    return results

async def ingest_with_retry(doc: Dict[str, Any], max_retries: int):
    for attempt in range(1, max_retries + 1):
        try:
            return await client.add(
                content=doc['content'],
                custom_id=doc['id'],
                container_tags=["batch_import_user_123"],
                metadata={
                    "source": "migration",
                    "batch_id": generate_batch_id(),
                    "original_created": doc.get('created_at', ''),
                    "title": doc.get('title', ''),
                    **doc.get('metadata', {})
                }
            )
        except BadRequestError as e:
            logging.error(f"Invalid document {doc['id']}: {e}")
            raise  # Don't retry validation errors
        except RateLimitError:
            logging.warning(f"Rate limited on attempt {attempt}")
            delay = 2 ** attempt * 2  # Longer delays for rate limits
            await asyncio.sleep(delay)
            continue
        except Exception as error:
            if attempt == max_retries:
                raise error
            # Exponential backoff
            delay = 2 ** attempt
            logging.info(f"Retry {attempt}/{max_retries} for {doc['id']} in {delay}s")
            await asyncio.sleep(delay)

def generate_batch_id() -> str:
    import random
    import string
    suffix = ''.join(random.choices(string.ascii_lowercase, k=8))
    return f"batch_{int(time.time())}_{suffix}"
```
**Sample Output**

```text
Processing batch 1/50 (documents 1-3)
Successfully processed: 2/3 documents
Failed: 1/3 documents (BadRequestError: Invalid content)
Progress: 3/150 (2.0%) - Next batch in 2s
```