docs/cookbooks/essentials/controlling-memory-ingestion.mdx
AI assistants connected to memory systems face a common problem: they often store everything. Not every conversation needs to be remembered, and not every detail belongs in the memory store. Without proper controls, memory systems accumulate unreliable data.
Mem0 lets you control your memory ingestion pipeline. In this cookbook, we'll demonstrate these controls using a medical assistant example - showing how to filter unwanted data, enforce data formats, and implement confidence-based storage.
Without controls, everything gets stored - speculation, low-confidence data, and information that shouldn't persist. This uncontrolled ingestion leads to cluttered memory and retrieval failures.
Mem0 provides three tools to control what gets stored:
In this tutorial, we will:
from mem0 import MemoryClient
client = MemoryClient(api_key="your-api-key")
Uncontrolled ingestion stores everything, including speculation:
# Patient mentions speculation
messages = [{"role": "user", "content": "I think I might be allergic to penicillin"}]
client.add(messages, user_id="patient_123")
# Check what got stored
results = client.search("patient allergies", filters={"user_id": "patient_123"})
print(results['results'][0]['memory'])
Output:
Patient is allergic to penicillin
The speculation became a confirmed fact. Let's add controls.
Custom instructions tell Mem0 what to store and what to ignore.
instructions = """
Only store CONFIRMED medical facts.
Store:
- Confirmed diagnoses from doctors
- Known allergies with documented reactions
- Current medications being taken
Ignore:
- Speculation (words like "might", "maybe", "I think")
- Unverified symptoms
- Casual mentions without confirmation
"""
client.project.update(custom_instructions=instructions)
# Same speculative statement
messages = [{"role": "user", "content": "I think I might be allergic to penicillin"}]
result = client.add(messages, user_id="patient_123")
# Check what this call stored (memories added earlier for patient_123 are not counted)
print(f"Memories stored: {len(result['results'])}")
Output:
Memories stored: 0
The speculation was filtered out.
When designing instructions, consider the trade-off between precision and recall:
Too restrictive: You'll miss important information (false negatives)
# Too strict - filters out useful context
"""
Only store information if explicitly stated by a doctor with full name,
date, time, and medical license number.
"""
Too permissive: You'll store unreliable data (false positives)
# Too loose - stores speculation as fact
"""
Store any health-related information mentioned.
"""
Balanced approach:
# Clear categories with examples
"""
Store CONFIRMED facts:
- Diagnoses: "Dr. Smith diagnosed hypertension on March 15th"
- Allergies: "Patient had hives reaction to penicillin"
- Medications: "Taking Lisinopril 10mg daily"
Ignore SPECULATION:
- "I think I might have..."
- "Maybe it's..."
- "Could be related to..."
"""
Start with clear categories and iterate based on retrieval quality.
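One way to make "iterate based on retrieval quality" concrete is to keep a small labeled set of statements and score each instruction variant against it. The sketch below is plain Python and makes no Mem0 calls. `evaluate_instructions` and the sample observations are hypothetical; the `was_stored` flags stand in for whatever you observe after calling `client.add` on each statement.

```python
def evaluate_instructions(cases):
    """Compute precision and recall for a set of storage decisions.

    Each case is a (should_store, was_stored) pair of booleans:
    should_store is your label, was_stored is what the pipeline did.
    """
    true_pos = sum(1 for should, was in cases if should and was)
    false_pos = sum(1 for should, was in cases if not should and was)
    false_neg = sum(1 for should, was in cases if should and not was)

    # Guard against division by zero when nothing was stored or expected.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 1.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 1.0
    return precision, recall


# Hypothetical observations for an overly strict instruction set:
# it never stores speculation, but it also drops one confirmed fact.
too_strict = [
    (True, True),    # confirmed diagnosis, stored
    (True, False),   # confirmed medication, filtered out (false negative)
    (False, False),  # speculation, filtered out
    (False, False),  # vague symptom, filtered out
]

precision, recall = evaluate_instructions(too_strict)
print(f"precision={precision:.2f} recall={recall:.2f}")
# → precision=1.00 recall=0.50
```

High precision with low recall points to instructions that are too restrictive; the reverse points to instructions that are too permissive.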
Mem0 assigns confidence scores to extracted memories. Use these to filter low-quality data.
Setting the right confidence threshold depends on your application:
Test your pipeline with multiple input examples and threshold combinations to find what works for your use case.
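A threshold sweep along these lines could look like the following sketch. It assumes you already have a numeric confidence score between 0 and 1 for each candidate memory; how you obtain that score is not shown in this cookbook, so the scores and the `sweep_thresholds` helper here are purely hypothetical.

```python
def sweep_thresholds(candidates, thresholds):
    """Count storage errors at each threshold.

    candidates: list of (score, should_store) pairs, where score is an
    assumed confidence value in [0, 1] and should_store is your label.
    Returns {threshold: (false_positives, false_negatives)}.
    """
    report = {}
    for threshold in thresholds:
        # A candidate is stored when its score meets the threshold.
        false_pos = sum(1 for score, should in candidates
                        if score >= threshold and not should)
        false_neg = sum(1 for score, should in candidates
                        if score < threshold and should)
        report[threshold] = (false_pos, false_neg)
    return report


# Hypothetical scores; real values depend on your pipeline.
labeled = [
    (0.95, True),   # confirmed diagnosis with doctor name and date
    (0.80, True),   # current medication with dosage
    (0.55, False),  # speculation ("I think I might...")
    (0.30, False),  # vague mention without details
]

for threshold, (fp, fn) in sweep_thresholds(labeled, [0.5, 0.7, 0.9]).items():
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

With these made-up scores, 0.5 lets the speculation through, 0.9 drops the medication, and 0.7 separates the two groups cleanly.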
# Configure stricter instructions
client.project.update(
custom_instructions="""
Only extract memories with HIGH confidence.
Require specific details (dates, dosages, doctor names) for medical facts.
Skip vague or uncertain statements.
"""
)
# Test with uncertain statement
messages = [{"role": "user", "content": "The doctor mentioned something about my blood pressure"}]
result1 = client.add(messages, user_id="patient_123")
# Test with confirmed fact
messages = [{"role": "user", "content": "Dr. Smith diagnosed me with hypertension on March 15th"}]
result2 = client.add(messages, user_id="patient_123")
print("Vague statement stored:", len(result1['results']) > 0)
print("Confirmed fact stored:", len(result2['results']) > 0)
Output:
Vague statement stored: False
Confirmed fact stored: True
The vague statement was filtered for low confidence. The confirmed fact with specific details was stored.
Custom instructions can prevent storing personal identifiers:
client.project.update(
custom_instructions="""
Medical memory rules:
STORE:
- Confirmed diagnoses
- Verified allergies
- Current medications
NEVER STORE:
- Social Security Numbers
- Insurance policy numbers
- Credit card information
- Full addresses
- Phone numbers
Replace identifiers with generic references if mentioned.
"""
)
# Test with PII
messages = [
{"role": "user", "content": "My SSN is 123-45-6789 and I'm allergic to penicillin"}
]
client.add(messages, user_id="patient_123")
# Check what was stored
results = client.get_all(filters={"user_id": "patient_123"})
for result in results['results']:
    print(result['memory'])
Output:
Patient is allergic to penicillin
The SSN was filtered out, but the allergy was stored.
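Custom instructions act during extraction, so the raw message still leaves your application with the identifier in it. As a complementary safeguard, you could redact obvious identifiers before calling `client.add`. The sketch below is not a Mem0 feature: `redact_identifiers` is a hypothetical helper, and the pattern only covers SSNs written in the `123-45-6789` form.

```python
import re

# Illustrative pattern only: US SSNs written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_identifiers(text):
    """Replace SSN-like strings with a generic placeholder."""
    return SSN_PATTERN.sub("[REDACTED_SSN]", text)


message = "My SSN is 123-45-6789 and I'm allergic to penicillin"
print(redact_identifiers(message))
# → My SSN is [REDACTED_SSN] and I'm allergic to penicillin
```

The redacted string can then be passed as the message content, so the identifier never reaches the memory pipeline at all.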
When information changes, update existing memories instead of creating duplicates.
# Initial allergy stored
result = client.add(
[{"role": "user", "content": "Patient confirmed allergy to penicillin with documented hives reaction"}],
user_id="patient_123"
)
memory_id = result['results'][0]['id']
print(f"Stored memory: {memory_id}")
# Later, patient gets retested - allergy was false positive
client.update(
memory_id=memory_id,
text="Patient tested negative for penicillin allergy on April 2nd, 2025. Previous allergy was false positive.",
metadata={"verified": True, "updated_date": "2025-04-02"}
)
# Retrieve the updated memory
updated = client.get(memory_id)
print(f"\nUpdated memory: {updated['memory']}")
print(f"Metadata: {updated['metadata']}")
Output:
Stored memory: mem_abc123
Updated memory: Patient tested negative for penicillin allergy on April 2nd, 2025. Previous allergy was false positive.
Metadata: {'verified': True, 'updated_date': '2025-04-02'}
Preserves history:
- created_at shows when the memory was first stored
- updated_at shows when it was modified
Avoids conflicts:
Maintains relationships:
| Mode | What it does | Best for | Watch out for |
|---|---|---|---|
| infer=True (default) | Runs the LLM pipeline so Mem0 extracts structured facts and resolves conflicts automatically. | Daily conversations, preference tracking, anything you want deduped. | Slightly slower because inference runs on every write. |
| infer=False | Stores your payload exactly as-is: no inference, no dedupe. | Bulk imports, compliance snapshots, curated facts you already trust. | Later infer=True calls for the same fact will create duplicates you must clean manually. |
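The table describes `infer` as a mode but does not show the call. Assuming it is passed as a keyword argument to `client.add`, a bulk import of curated facts might look like the sketch below. `import_curated_facts` is a hypothetical helper; the client is passed in as a parameter so the function can be exercised against a stub before pointing it at a real project.

```python
def import_curated_facts(client, facts, user_id):
    """Store each fact verbatim, skipping LLM inference.

    Assumes client.add accepts an infer keyword argument, as the
    table above suggests; check your SDK version before relying on it.
    """
    results = []
    for fact in facts:
        result = client.add(
            [{"role": "user", "content": fact}],
            user_id=user_id,
            infer=False,  # store as-is: no extraction, no dedupe
        )
        results.append(result)
    return results
```

Because nothing is deduplicated in this mode, it is worth checking the input list for repeats before importing.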
When should you update vs delete?
# Update: medication dosage changed
# (med_id is the ID of the existing medication memory)
client.update(
memory_id=med_id,
text="Taking Lisinopril 20mg daily (increased from 10mg on March 1st)"
)
# Delete: duplicate entry
client.delete(memory_id)
Here's a complete ingestion pipeline with all controls:
from mem0 import MemoryClient
import os
# Initialize client
client = MemoryClient(api_key=os.getenv("MEM0_API_KEY"))
# Configure custom instructions
client.project.update(
custom_instructions="""
Medical memory assistant rules:
STORE:
- Confirmed diagnoses (with doctor name and date)
- Verified allergies (with reaction details)
- Current medications (with dosage)
IGNORE:
- Speculation (might, maybe, possibly)
- Unverified symptoms
- Personal identifiers (SSN, insurance numbers)
CONFIDENCE:
Require high confidence. Reject vague or uncertain statements.
Require specific details: names, dates, dosages.
"""
)
# Helper function for safe ingestion
def add_medical_memory(content, user_id, metadata=None):
    """Add memory with automatic filtering."""
    result = client.add(
        [{"role": "user", "content": content}],
        user_id=user_id,
        metadata=metadata or {}
    )
    if result['results']:
        print(f"✓ Stored: {result['results'][0]['memory']}")
    else:
        print(f"✗ Filtered: {content}")
    return result
# Test cases
print("Testing ingestion pipeline:\n")
test_cases = [
"I think I might be allergic to penicillin",
"Dr. Johnson confirmed penicillin allergy on Jan 15th with hives reaction",
"Patient SSN is 123-45-6789",
"Currently taking Lisinopril 10mg daily for hypertension",
"Feeling tired lately",
"Dr. Martinez diagnosed Type 2 diabetes on February 3rd, 2025"
]
for content in test_cases:
    add_medical_memory(content, user_id="patient_123")
print()
Output:
Testing ingestion pipeline:
✗ Filtered: I think I might be allergic to penicillin
✓ Stored: Patient has confirmed penicillin allergy diagnosed by Dr. Johnson on January 15th with hives reaction
✗ Filtered: Patient SSN is 123-45-6789
✓ Stored: Patient is currently taking Lisinopril 10mg daily for hypertension
✗ Filtered: Feeling tired lately
✓ Stored: Patient diagnosed with Type 2 diabetes by Dr. Martinez on February 3rd, 2025
You can override project-level instructions for specific conversations:
First, define the custom instructions:
custom_instructions = """
Emergency intake mode:
Store ALL symptoms and observations immediately.
Flag for later review and verification.
"""
# Emergency intake - store everything temporarily
emergency_messages = [
{"role": "user", "content": "Patient arrived with chest pain and shortness of breath"}
]
client.add(
emergency_messages,
user_id="patient_456",
custom_instructions=custom_instructions,
metadata={"type": "emergency", "review_required": True}
)
This is useful for:
You now have a medical assistant with production-grade memory controls:
These controls prevent retrieval failures and ensure your AI assistant works with reliable, verified information.
Start with conservative filters (only store confirmed facts) and iterate based on your application's needs. Combine custom instructions with confidence thresholds for the most reliable memory ingestion pipeline.
<Card title="Build a Mem0 Companion" icon="users" href="/cookbooks/essentials/building-ai-companion"> Learn core memory patterns including temporary vs permanent data handling. </Card>