skills/mem0/references/architecture.md
How Mem0 processes, stores, and retrieves memories under the hood.
Mem0 is a managed memory layer that sits between your AI application and its users. Every integration follows the same loop:
User Input → Retrieve relevant memories → Enrich LLM prompt → Generate response → Store new memories
Mem0 handles the complexity of extraction, deduplication, conflict resolution, and semantic retrieval so your application only needs to call search() and add().
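The retrieve → enrich → generate → store loop can be sketched with an in-memory stand-in for the client. The stub store, the word-overlap "search," and the fake LLM below are illustrative assumptions, not Mem0 internals; only the search()/add() call shape mirrors the real client:

```python
from typing import Dict, List

class FakeMemoryStore:
    """In-memory stand-in for Mem0's search()/add() API (illustrative only)."""
    def __init__(self):
        self._memories: List[Dict] = []

    def search(self, query: str, user_id: str) -> List[Dict]:
        # Real Mem0 uses semantic search; here we naively match on shared words.
        words = set(query.lower().split())
        return [m for m in self._memories
                if m["user_id"] == user_id
                and words & set(m["memory"].lower().split())]

    def add(self, messages: List[Dict], user_id: str) -> None:
        # Real Mem0 extracts facts with an LLM; here we store user turns verbatim.
        for msg in messages:
            if msg["role"] == "user":
                self._memories.append({"user_id": user_id, "memory": msg["content"]})

def chat_turn(client: FakeMemoryStore, user_id: str, user_input: str) -> str:
    memories = client.search(user_input, user_id=user_id)   # 1. retrieve
    context = "; ".join(m["memory"] for m in memories)      # 2. enrich prompt
    response = f"[answer using context: {context}]"         # 3. generate (fake LLM)
    client.add([{"role": "user", "content": user_input},
                {"role": "assistant", "content": response}],
               user_id=user_id)                             # 4. store
    return response

client = FakeMemoryStore()
chat_turn(client, "alice", "I am vegetarian")
print(chat_turn(client, "alice", "Suggest a vegetarian dinner"))
```

The second turn retrieves the fact stored by the first, which is the whole point of the loop: context accumulates across calls without the application managing it.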
Dual storage architecture:
client.add():

Messages In
│
▼
┌─────────────────────┐
│ 1. EXTRACTION │ LLM analyzes messages, extracts key facts
│ (infer=True) │ If infer=False, stores raw text as-is
└─────────┬───────────┘
│
▼
┌─────────────────────┐
│ 2. CONFLICT │ Checks existing memories for duplicates
│ RESOLUTION │ Latest truth wins (newer overrides older)
│ │ Only runs when infer=True
└─────────┬───────────┘
│
▼
┌─────────────────────┐
│ 3. STORAGE │ Generates embeddings → vector store
│ │ Optional: entity extraction → graph store
│ │ Indexes metadata, categories, timestamps
└─────────┬───────────┘
│
▼
Memory Object
(id, memory, categories, structured_attributes)
Return shape depends on async_mode:
- Async (default, async_mode=True): {"status": "PENDING", "event_id": "..."}
- Sync (async_mode=False): full results with id, event, memory

Storage behavior depends on infer:
- Inferred (infer=True, default): LLM-extracted facts, deduplicated
- Raw (infer=False): user role messages are stored as-is; assistant messages are ignored

Warning: Don't mix infer=True and infer=False for the same data, or the same fact will be stored twice.
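The extract → resolve → store stages can be mimicked in plain Python. The fact-keying scheme and stub extractor below are assumptions for illustration; Mem0 uses an LLM for both extraction and conflict detection:

```python
from typing import Dict, List

def extract_facts(messages: List[Dict]) -> List[Dict]:
    """Stand-in for LLM extraction: treat each user turn as one fact,
    keyed by its first two words (real extraction is semantic)."""
    facts = []
    for msg in messages:
        if msg["role"] == "user":
            key = " ".join(msg["content"].lower().split()[:2])
            facts.append({"key": key, "memory": msg["content"]})
    return facts

def resolve_conflicts(existing: List[Dict], new_facts: List[Dict]) -> List[Dict]:
    """Latest truth wins: a new fact with the same key replaces the old one."""
    store = {m["key"]: m for m in existing}
    for fact in new_facts:
        store[fact["key"]] = fact  # newer overrides older
    return list(store.values())

store: List[Dict] = []
store = resolve_conflicts(store, extract_facts(
    [{"role": "user", "content": "I live in Paris"}]))
store = resolve_conflicts(store, extract_facts(
    [{"role": "user", "content": "I live in Berlin"}]))
print(store)  # one memory remains: "I live in Berlin"
```

Because both statements resolve to the same fact, the store ends up with a single, current memory rather than two contradictory ones.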
client.search():

Query In
│
▼
┌─────────────────────┐
│ 1. QUERY EMBEDDING │ Convert query to vector representation
└─────────┬───────────┘
│
▼
┌─────────────────────┐
│ 2. VECTOR SEARCH │ Cosine similarity across stored embeddings
│ │ Scoped by filters (user_id, agent_id, etc.)
└─────────┬───────────┘
│
▼ (optional enhancements)
┌─────────────────────┐
│ 3a. KEYWORD SEARCH │ Expands results with specific terms (+10ms)
│ 3b. RERANKING │ Deep semantic reordering (+150-200ms)
│ 3c. FILTER MEMORIES │ Precision filtering, removes low-relevance (+200-300ms)
└─────────┬───────────┘
│
▼ (if enable_graph=True)
┌─────────────────────┐
│ 4. GRAPH LOOKUP │ Finds entity relationships
│ │ Appends relations WITHOUT reranking vector results
└─────────┬───────────┘
│
▼
Results + Relations
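Step 2 of the pipeline, cosine similarity over stored embeddings scoped by filters, can be shown in miniature. The 3-dimensional vectors are toy values (real embeddings have hundreds of dimensions), and the record layout is an assumption:

```python
import math
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def vector_search(query_vec: List[float], records: List[Dict],
                  user_id: str, top_k: int = 2) -> List[Dict]:
    """Scope to one user_id first, then rank by cosine similarity."""
    scoped = [r for r in records if r["user_id"] == user_id]
    ranked = sorted(scoped, key=lambda r: cosine(query_vec, r["embedding"]),
                    reverse=True)
    return ranked[:top_k]

records = [
    {"user_id": "alice", "memory": "prefers tea",     "embedding": [1.0, 0.0, 0.1]},
    {"user_id": "alice", "memory": "allergic to nuts", "embedding": [0.0, 1.0, 0.0]},
    {"user_id": "bob",   "memory": "prefers coffee",  "embedding": [0.9, 0.1, 0.0]},
]
hits = vector_search([1.0, 0.0, 0.0], records, user_id="alice", top_k=1)
print(hits[0]["memory"])  # "prefers tea"
```

Note that bob's record is close to the query vector but never ranked: scoping happens before similarity, which is what prevents cross-user leakage.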
| Configuration | Latency | Best for |
|---|---|---|
| Base search only | ~100ms | Simple lookups |
| keyword_search=True | ~110ms | Entity-heavy queries, broad coverage |
| rerank=True | ~250-300ms | User-facing results, top-N precision |
| keyword_search=True + rerank=True | ~310ms | Balanced (recommended for most apps) |
| rerank=True + filter_memories=True | ~400-500ms | Safety-critical, production systems |
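A rough way to budget a configuration is to sum the per-enhancement costs. The figures below are midpoints of the illustrative ranges in the table above; real latency varies by deployment and data size:

```python
# Approximate cost per stage, midpoints of the ranges above (milliseconds)
COSTS = {"base": 100, "keyword_search": 10, "rerank": 175, "filter_memories": 250}

def estimated_latency_ms(keyword_search: bool = False, rerank: bool = False,
                         filter_memories: bool = False) -> int:
    """Sum the base vector search cost plus each enabled enhancement."""
    total = COSTS["base"]
    if keyword_search:
        total += COSTS["keyword_search"]
    if rerank:
        total += COSTS["rerank"]
    if filter_memories:
        total += COSTS["filter_memories"]
    return total

print(estimated_latency_ms(keyword_search=True, rerank=True))  # 285
```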
When you search with user_id="alice" only, Mem0 returns memories where agent_id, app_id, and run_id are all null. This prevents cross-scope leakage by default.
To include memories with non-null fields, use explicit filters:
# Gets memories for alice regardless of agent/app/run
filters={"OR": [{"user_id": "alice"}]}
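The default scoping rule amounts to a predicate over the scope fields. This is a simulation of the documented behavior, not Mem0's internal code:

```python
from typing import Dict

def matches_default_scope(record: Dict, user_id: str) -> bool:
    """user_id-only search: every other scope field must be null."""
    return (record["user_id"] == user_id
            and record["agent_id"] is None
            and record["app_id"] is None
            and record["run_id"] is None)

records = [
    {"memory": "likes tea",   "user_id": "alice", "agent_id": None,  "app_id": None, "run_id": None},
    {"memory": "ticket note", "user_id": "alice", "agent_id": "bot", "app_id": None, "run_id": None},
]

# Default scoping: only the pure user-level memory comes back
default_hits = [r for r in records if matches_default_scope(r, "alice")]
# Explicit filter {"OR": [{"user_id": "alice"}]}: both records come back
explicit_hits = [r for r in records if r["user_id"] == "alice"]

print(len(default_hits), len(explicit_hits))  # 1 2
```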
CREATE ──→ ACTIVE ──→ UPDATE ──→ ACTIVE
│ │ │
│ ▼ ▼
│ EXPIRED EXPIRED
│ (still stored, (still stored,
│ not retrieved) not retrieved)
│ │ │
▼ ▼ ▼
DELETE DELETE DELETE
(permanent)
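The lifecycle above is a small state machine. The class below is a hypothetical model, not the SDK; it captures the key distinction that expired memories stay stored but drop out of retrieval, while deletion is permanent (whether expiry triggers on or after the date is an assumption here):

```python
from datetime import date
from typing import Optional

class MemoryRecord:
    """Minimal lifecycle model: ACTIVE -> (EXPIRED) -> DELETED."""
    def __init__(self, text: str, expiration_date: Optional[date] = None):
        self.text = text
        self.expiration_date = expiration_date
        self.deleted = False

    def is_retrievable(self, today: date) -> bool:
        if self.deleted:
            return False  # permanently gone
        if self.expiration_date is not None and today >= self.expiration_date:
            return False  # still stored, but excluded from search
        return True

m = MemoryRecord("trial user until March", expiration_date=date(2025, 3, 1))
print(m.is_retrievable(date(2025, 2, 1)))  # True  (active)
print(m.is_retrievable(date(2025, 3, 2)))  # False (expired, data persists)
m.deleted = True
print(m.is_retrievable(date(2025, 2, 1)))  # False (deleted, permanent)
```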
Lifecycle operations:
- Create: client.add(messages, user_id="...") stores memories with a created_at timestamp; accepts timestamp, expiration_date, metadata, and immutable parameters
- Update: client.update(memory_id, text="...") replaces text and reindexes; client.batch_update([...]) handles up to 1000 memories at once; immutable memories (immutable=True) cannot be updated and must be deleted and re-added; updates also happen automatically during add() with infer=True
- Expire: set the expiration_date parameter (ISO 8601 or YYYY-MM-DD)
- Delete: client.delete(memory_id) is permanent with no recovery; client.batch_delete([memory_ids]) handles up to 1000; client.delete_all(user_id="alice") removes all memories for an entity; delete_all() without filters raises an error to prevent accidental data loss
- History: client.history(memory_id) returns a version timeline with {previous_value, new_value, action, timestamps}

Example memory object:

{
"id": "uuid-string",
"memory": "Extracted memory text",
"user_id": "user-identifier",
"agent_id": null,
"app_id": null,
"run_id": null,
"metadata": { "source": "chat", "priority": "high" },
"categories": ["health", "preferences"],
"created_at": "2025-03-12T12:34:56Z",
"updated_at": "2025-03-12T12:34:56Z",
"expiration_date": null,
"immutable": false,
"structured_attributes": {
"day": 12, "month": 3, "year": 2025,
"hour": 12, "minute": 34,
"day_of_week": "wednesday",
"is_weekend": false,
"quarter": 1, "week_of_year": 11
},
"score": 0.85
}
| Field | Type | Description |
|---|---|---|
| id | UUID | Unique identifier, used for update/delete |
| memory | string | Extracted or stored text content |
| user_id | string | Primary entity scope |
| agent_id | string | Agent scope |
| app_id | string | Application scope |
| run_id | string | Session/run scope |
| metadata | object | Custom key-value pairs for filtering |
| categories | array | Auto-assigned or custom category tags |
| created_at | datetime | Creation timestamp |
| updated_at | datetime | Last modification timestamp |
| expiration_date | datetime | Auto-expiry date (stops retrieval, data persists) |
| immutable | boolean | If true, prevents modification |
| structured_attributes | object | Temporal breakdown for time-based queries |
| score | float | Semantic similarity (search results only, 0-1) |
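The structured_attributes block looks like a straightforward breakdown of created_at. A sketch of how such fields could be derived (treating week_of_year as the ISO week number is my assumption):

```python
from datetime import datetime

def structured_attributes(ts: str) -> dict:
    """Derive the temporal breakdown from an ISO timestamp like created_at."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
    return {
        "day": dt.day, "month": dt.month, "year": dt.year,
        "hour": dt.hour, "minute": dt.minute,
        "day_of_week": dt.strftime("%A").lower(),
        "is_weekend": dt.weekday() >= 5,        # Saturday=5, Sunday=6
        "quarter": (dt.month - 1) // 3 + 1,
        "week_of_year": dt.isocalendar()[1],    # ISO week number
    }

print(structured_attributes("2025-03-12T12:34:56Z"))
```

Running this on the example timestamp reproduces the values shown in the memory object above (wednesday, quarter 1, week 11).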
Mem0 separates memories across four dimensions to prevent data mixing:
| Dimension | Field | Purpose | Example |
|---|---|---|---|
| User | user_id | Persistent persona or account | "customer_6412" |
| Agent | agent_id | Distinct agent or tool | "meal_planner" |
| App | app_id | Product surface or deployment | "ios_retail_app" |
| Session | run_id | Short-lived flow or thread | "ticket-9241" |
Each entity combination creates separate records. A memory with user_id="alice" is stored separately from one with user_id="alice" + agent_id="bot".
# This returns NOTHING — user and agent memories are stored separately
filters={"AND": [{"user_id": "alice"}, {"agent_id": "bot"}]}
# Use OR to query multiple scopes
filters={"OR": [{"user_id": "alice"}, {"agent_id": "bot"}]}
# Use wildcard to include any non-null value
filters={"AND": [{"user_id": "*"}]} # All users (excludes null)
# User-level: persistent preferences
client.add(messages, user_id="alice")
# Session-level: temporary context
client.add(messages, user_id="alice", run_id="session_123")
# Clean up when done: client.delete_all(run_id="session_123")
# Agent-level: agent-specific knowledge
client.add(messages, agent_id="support_bot", app_id="helpdesk")
# Multi-tenant: full isolation
client.add(messages, user_id="alice", agent_id="bot", app_id="acme_corp", run_id="ticket_42")
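Each scope combination behaving as a separate record can be modeled with a composite key. This simulation (not SDK code) reproduces why an AND across user_id and agent_id matches nothing when the memories were written under different scopes:

```python
from typing import Dict, List, Optional, Tuple

def scope_key(user_id: Optional[str] = None, agent_id: Optional[str] = None,
              app_id: Optional[str] = None, run_id: Optional[str] = None) -> Tuple:
    """Each distinct combination of scope fields is its own bucket."""
    return (user_id, agent_id, app_id, run_id)

store: Dict[Tuple, List[str]] = {}
store.setdefault(scope_key(user_id="alice"), []).append("prefers tea")
store.setdefault(scope_key(agent_id="bot"), []).append("escalation playbook")

# AND: no single record carries both user_id="alice" AND agent_id="bot",
# because each memory was written under only one of those scopes.
and_hits = [m for key, mems in store.items() for m in mems
            if key[0] == "alice" and key[1] == "bot"]
# OR: collects memories from both scopes.
or_hits = [m for key, mems in store.items() for m in mems
           if key[0] == "alice" or key[1] == "bot"]
print(len(and_hits), len(or_hits))  # 0 2
```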
Mem0 supports three layers of memory, from shortest to longest lived:
- Short-term (session): scoped with the run_id parameter; clean up with client.delete_all(run_id="session_id")
- Long-term (user): scoped with the user_id parameter

def chat(user_input: str, user_id: str, session_id: str) -> str:
    # 1. Retrieve user memories (long-term preferences)
    user_mems = mem0.search(user_input, user_id=user_id)

    # 2. Retrieve session memories (current task context)
    session_mems = mem0.search(user_input, filters={
        "AND": [{"user_id": user_id}, {"run_id": session_id}]
    })

    # 3. Combine both layers for LLM context
    context = format_memories(user_mems) + format_memories(session_mems)

    # 4. Generate response
    response = llm.generate(context=context, input=user_input)

    # 5. Store in session scope (temporary) + user scope (persistent)
    messages = [{"role": "user", "content": user_input},
                {"role": "assistant", "content": response}]
    mem0.add(messages, user_id=user_id, run_id=session_id)
    return response
| Operation | Typical Latency |
|---|---|
| Base vector search | ~100ms |
| + keyword_search | +10ms |
| + reranking | +150-200ms |
| + filter_memories | +200-300ms |
| Add (async, default) | < 50ms response, background processing |
| Add (sync) | 500ms-2s depending on extraction complexity |
| Graph operations | Slight overhead for large stores |
Performance tips:
- Scope by user_id for all user-facing queries (most common, fastest)
- Add run_id for session isolation (narrows the search space)
- Avoid "*" filters on large datasets (scans all non-null records)
- Set top_k to limit result count when you only need a few memories

| Approach | Pros | Cons |
|---|---|---|
| Raw vector DB | Fast, full control | No extraction, no dedup, no conflict resolution |
| In-memory chat history | Zero latency | Lost on restart, no cross-session, grows unbounded |
| RAG over documents | Good for static knowledge | No personalization, no memory updates |
| Mem0 Platform | Managed extraction + dedup + graph + scoping | External dependency, async processing delay |
Mem0 combines the best of vector search (semantic retrieval) with automatic extraction (LLM-powered), conflict resolution (deduplication), and structured scoping (multi-tenancy) — in a single managed API.