Most developers confuse RAG (Retrieval-Augmented Generation) with agent memory. They're not the same thing, and using RAG for memory is why your agents keep forgetting important context. Let's understand the fundamental difference.
When building AI agents, developers often treat memory as just another retrieval problem. They store conversations in a vector database, embed queries, and hope semantic search will surface the right context.
This approach fails because memory isn't about finding similar text—it's about understanding relationships, temporal context, and user state over time.
Supermemory makes a clear distinction between these two concepts:
Documents are the raw content you send to Supermemory—PDFs, web pages, text files. They represent static knowledge that doesn't change based on who's accessing it.

Characteristics:
- Static content that is the same for every user
- Retrieved by semantic similarity
- No per-user state or temporal tracking

Use Cases:
- Product specifications and catalogs
- Help articles, FAQs, and policies
- Reference documentation
Memories are the insights, preferences, and relationships extracted from documents and conversations. They're tied to specific users or entities and evolve over time.

Characteristics:
- Tied to a specific user or entity
- Updated as new information arrives
- Carry temporal validity—facts can become outdated

Use Cases:
- User preferences and habits
- Conversation continuity across sessions
- Account and purchase state
Let's look at a real scenario that illustrates the problem:
<Tabs>
<Tab title="The Scenario">
```
Day 1: "I love Adidas sneakers"
Day 30: "My Adidas broke after a month, terrible quality"
Day 31: "I'm switching to Puma"
Day 45: "What sneakers should I buy?"
```
</Tab>
<Tab title="RAG Approach (Wrong)">
```python
# RAG sees these as isolated embeddings
query = "What sneakers should I buy?"

# Semantic search finds the closest match
result = vector_search(query)
# Returns: "I love Adidas sneakers" (highest similarity)
# Agent recommends Adidas 🤦
```

**Problem**: RAG finds the most semantically similar text but misses the temporal progression and causal relationships.
</Tab>
<Tab title="Memory Approach (Right)">
```python
# Memory retrieval considers:
# 1. Temporal validity (the Adidas preference is outdated)
# 2. Causal relationships (broke → disappointment → switch)
# 3. Current state (now prefers Puma)
#
# Agent correctly recommends Puma ✅
```

**Solution**: Memory systems track when facts become invalid and understand causal chains.
</Tab>
</Tabs>
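The difference fits in a few lines of code. Here is a minimal sketch—the `Memory` dataclass and `invalidated` flag are illustrative, not Supermemory's internals—of how tracking temporal validity changes the answer:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    day: int
    invalidated: bool = False  # set when a later fact supersedes this one

memories = [
    Memory("I love Adidas sneakers", day=1, invalidated=True),
    Memory("My Adidas broke after a month, terrible quality", day=30),
    Memory("I'm switching to Puma", day=31),
]

# RAG-style retrieval would pick the most similar text, ignoring validity.
# Memory-style retrieval drops invalidated facts, then prefers the most recent.
valid = [m for m in memories if not m.invalidated]
current_state = max(valid, key=lambda m: m.day)
print(current_state.text)  # -> "I'm switching to Puma"
```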
```
Query → Embedding → Vector Search → Top-K Results → LLM
```
RAG excels at finding information that's semantically similar to your query. It's stateless—each query is independent.
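As a toy illustration of this stateless pipeline—the vectors below are made-up stand-ins for real embeddings—pure similarity search always returns the closest match, no matter how stale it is:

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical 2-D "embeddings" for the sneaker conversation
docs = {
    "I love Adidas sneakers": [0.9, 0.1],
    "My Adidas broke, terrible quality": [0.6, 0.4],
    "I'm switching to Puma": [0.5, 0.5],
}
query_vec = [0.95, 0.05]  # "What sneakers should I buy?"

# Stateless retrieval: each query only sees similarity, never recency
top = max(docs, key=lambda d: cosine(docs[d], query_vec))
print(top)  # -> "I love Adidas sneakers" (the outdated preference wins)
```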
```
Query → Entity Recognition → Graph Traversal → Temporal Filtering → Context Assembly → LLM
```
Memory systems build a knowledge graph that understands:
- Entities and the relationships between them
- When facts become valid or invalid (temporal context)
- The user's current state, not just their past statements
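The stages above—graph traversal plus temporal filtering—can be sketched as follows. The fact triples and field names are illustrative, not Supermemory's actual data model:

```python
# A tiny knowledge graph as a list of (entity, relation, object) facts,
# each carrying a validity flag maintained as new information arrives.
facts = [
    {"entity": "user_123", "relation": "prefers", "object": "Adidas", "valid": False},
    {"entity": "user_123", "relation": "prefers", "object": "Puma", "valid": True},
]

def current_preferences(entity):
    # Traverse outgoing "prefers" edges, keeping only still-valid facts
    return [
        f["object"]
        for f in facts
        if f["entity"] == entity and f["relation"] == "prefers" and f["valid"]
    ]

print(current_preferences("user_123"))  # -> ['Puma']
```

Unlike top-k similarity search, the superseded Adidas preference never reaches the LLM, because the graph records that it is no longer valid.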
```python
# Good for RAG
"What are the specs of iPhone 15?"
"Compare Nike and Adidas running shoes"
"Show me waterproof jackets"
```
```python
# Needs Memory
"What size do I usually wear?"
"Did I like my last purchase?"
"What's my budget preference?"
```
```python
# Good for RAG
"How do I reset my password?"
"What's your return policy?"
"Troubleshooting WiFi issues"
```
```python
# Needs Memory
"Is my issue from last week resolved?"
"What plan am I on?"
"You were helping me with..."
```
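One way to think about routing—a hypothetical heuristic, not part of the Supermemory API—is that queries referencing the user's own state need memory, while impersonal questions can go straight to RAG:

```python
import re

# Signals that a query is about the user's own state (illustrative list)
MEMORY_SIGNALS = re.compile(r"\b(my|i|me|last week|usually)\b", re.IGNORECASE)

def route(query: str) -> str:
    """Return 'memory' for user-specific queries, 'rag' for general ones."""
    return "memory" if MEMORY_SIGNALS.search(query) else "rag"

print(route("What's your return policy?"))            # -> rag
print(route("Is my issue from last week resolved?"))  # -> memory
```

A production system would use a classifier or simply query both stores, but the split above captures the intuition behind the examples.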
Supermemory provides a unified platform that correctly handles both patterns:
```python
# Add a document for RAG-style retrieval
client.add(
    content="iPhone 15 has a 48MP camera and A17 Pro chip",
    # No user association - universal knowledge
)

# Add a user-specific memory
client.add(
    content="User prefers Android over iOS",
    container_tags=["user_123"],  # User-specific
    metadata={
        "type": "preference",
        "confidence": "high",
    },
)

# Search combines both approaches
results = client.documents.search(
    query="What phone should I recommend?",
    container_tags=["user_123"],  # Gets user memories
    # Also searches general knowledge
)

# Results include:
# - User's Android preference (memory)
# - Latest Android phone specs (documents)
```
Stop treating memory like a retrieval problem. Your agents need both:
- RAG for static, universal knowledge
- Memory for user-specific context that evolves over time
Supermemory provides both capabilities in a unified platform, ensuring your agents have the right context at the right time.