
Reranker-Enhanced Search


Reranker-enhanced search adds a second scoring pass after vector retrieval so Mem0 can return the most relevant memories first. Enable it when vector similarity alone misses nuance or when you need the highest-confidence context for an agent decision.

<Info> **You’ll use this when…** - Queries are nuanced and require semantic understanding beyond vector distance. - Large memory collections produce too many near matches to review manually. - You want consistent scoring across providers by delegating ranking to a dedicated model. </Info> <Warning> Reranking raises latency and, for hosted models, API spend. Benchmark with production traffic and define a fallback path for latency-sensitive requests. </Warning> <Note> All configuration snippets translate directly to the TypeScript SDK—swap dictionaries for objects while keeping the same keys (`provider`, `config`, `rerank` flags). </Note>

Feature anatomy

  • Initial vector search: Retrieve candidate memories by similarity.
  • Reranker pass: A specialized model scores each candidate against the original query.
  • Reordered results: Mem0 sorts responses using the reranker’s scores before returning them.
  • Optional fallbacks: Toggle reranking per request or disable it entirely if performance or cost becomes a concern.
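The anatomy above can be sketched as a self-contained two-pass pipeline. The scoring functions here are toy stand-ins (word overlap and a phrase-match bonus), not Mem0 internals, but the flow — recall a candidate pool, rescore it, return the reordered list — is the same:

```python
# Toy two-stage retrieval: vector recall, then a reranker pass.
# Both scorers are stand-ins, not Mem0 internals.

def vector_score(query: str, memory: str) -> float:
    # Stand-in for embedding similarity: Jaccard overlap of words.
    q, m = set(query.lower().split()), set(memory.lower().split())
    return len(q & m) / max(len(q | m), 1)

def rerank_score(query: str, memory: str) -> float:
    # Stand-in for a cross-encoder: rewards an exact phrase hit.
    base = vector_score(query, memory)
    return base + (0.5 if query.lower() in memory.lower() else 0.0)

def search(query: str, memories: list[str], top_k: int = 3) -> list[str]:
    # Pass 1: vector recall of a small candidate pool.
    candidates = sorted(
        memories, key=lambda m: vector_score(query, m), reverse=True
    )[:top_k]
    # Pass 2: rerank only the candidates, then return the new order.
    return sorted(candidates, key=lambda m: rerank_score(query, m), reverse=True)

memories = [
    "User likes italian food",
    "User dislikes loud restaurants",
    "User said: likes italian food best on weekends",
]
print(search("likes italian food", memories))
```

Note that the reranker only reorders the candidate pool from pass 1; it never surfaces memories the vector search missed, which is why `top_k` matters.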
<AccordionGroup> <Accordion title="Supported providers"> - **[Cohere](/components/rerankers/models/cohere)** – Multilingual hosted reranker with API-based scoring. - **[Sentence Transformer](/components/rerankers/models/sentence_transformer)** – Local Hugging Face cross-encoders for GPU or CPU. - **[Hugging Face](/components/rerankers/models/huggingface)** – Bring any hosted or on-prem reranker model ID. - **[LLM Reranker](/components/rerankers/models/llm_reranker)** – Use your preferred LLM (OpenAI, etc.) for prompt-driven scoring. - **[Zero Entropy](/components/rerankers/models/zero_entropy)** – High-quality neural reranking tuned for retrieval tasks. </Accordion> <Accordion title="Provider comparison"> | Provider | Latency | Quality | Cost | Local deploy | | --- | --- | --- | --- | --- | | Cohere | Medium | High | API cost | ❌ | | Sentence Transformer | Low | Good | Free | ✅ | | Hugging Face | Low–Medium | Variable | Free | ✅ | | LLM Reranker | High | Very high | API cost | Depends | </Accordion> </AccordionGroup>

Configure it

Basic setup

python
from mem0 import Memory

config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key"
        }
    }
}

m = Memory.from_config(config)
<Info icon="check"> Confirm `results["results"][0]["score"]` reflects the reranker output—if the field is missing, the reranker was not applied. </Info> <Tip> Set `top_k` to the smallest candidate pool that still captures relevant hits. Smaller pools keep reranking costs down. </Tip>

Provider-specific options

python
# Cohere reranker
config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key",
            "top_k": 10,
            "return_documents": True
        }
    }
}

# Sentence Transformer reranker
config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cuda",
            "max_length": 512
        }
    }
}

# Hugging Face reranker
config = {
    "reranker": {
        "provider": "huggingface",
        "config": {
            "model": "BAAI/bge-reranker-base",
            "device": "cuda",
            "batch_size": 32
        }
    }
}

# LLM-based reranker
config = {
    "reranker": {
        "provider": "llm_reranker",
        "config": {
            "provider": "openai",
            "model": "gpt-4o-mini",
            "api_key": "your-openai-api-key",
            "top_k": 5
        }
    }
}
<Note> Keep authentication keys in environment variables when you plug these configs into production projects. </Note>
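A minimal sketch of that pattern, assuming a `COHERE_API_KEY` environment variable (the variable name is an example, not a Mem0 requirement):

```python
import os

def reranker_config_from_env(env=os.environ) -> dict:
    # Pull the key from the environment instead of hardcoding it in
    # source control; fail fast if it is missing.
    api_key = env.get("COHERE_API_KEY")
    if not api_key:
        raise RuntimeError("Set COHERE_API_KEY before starting the service")
    return {
        "reranker": {
            "provider": "cohere",
            "config": {"model": "rerank-english-v3.0", "api_key": api_key},
        }
    }
```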

Full stack example

python
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "host": "localhost",
            "port": 6333
        }
    },
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4",
            "api_key": "your-openai-api-key"
        }
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small",
            "api_key": "your-openai-api-key"
        }
    },
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key",
            "top_k": 15,
            "return_documents": True
        }
    }
}

m = Memory.from_config(config)
<Info icon="check"> A quick search should now return results with both vector and reranker scores, letting you compare improvements immediately. </Info>

Async support

python
import asyncio

from mem0 import AsyncMemory

async_memory = AsyncMemory.from_config(config)

async def search_with_rerank():
    return await async_memory.search(
        "What are my preferences?",
        filters={"user_id": "alice"},
        rerank=True
    )

results = asyncio.run(search_with_rerank())
<Info icon="check"> Inspect the async response to confirm reranking still applies; the scores should match the synchronous implementation. </Info>

Tune performance and cost

python
# GPU-friendly local reranker configuration
config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cuda",
            "batch_size": 32,
            "top_k": 10,
            "max_length": 256
        }
    }
}

# Smart toggle for hosted rerankers
def smart_search(query, user_id, use_rerank=None):
    if use_rerank is None:
        use_rerank = len(query.split()) > 3
    return m.search(query, filters={"user_id": user_id}, rerank=use_rerank)
<Tip> Use heuristics (query length, user tier) to decide when to rerank so high-signal queries benefit without taxing every request. </Tip>

Handle failures gracefully

python
try:
    results = m.search("test query", filters={"user_id": "alice"}, rerank=True)
except Exception as exc:
    print(f"Reranking failed: {exc}")
    results = m.search("test query", filters={"user_id": "alice"}, rerank=False)
<Warning> Always fall back to vector-only search—dropped queries introduce bigger accuracy issues than slightly less relevant ordering. </Warning>

Migrate from v0.x

python
# Before: basic vector search
results = m.search("query", filters={"user_id": "alice"})

# After: same API with reranking enabled via config
config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2"
        }
    }
}

m = Memory.from_config(config)
results = m.search("query", filters={"user_id": "alice"})

See it in action

python
results = m.search(
    "What are my food preferences?",
    filters={"user_id": "alice"}
)

for result in results["results"]:
    print(f"Memory: {result['memory']}")
    print(f"Score: {result['score']}")
<Info icon="check"> Expect each result to list the reranker-adjusted score so you can compare ordering against baseline vector results. </Info>

Toggle reranking per request

python
results_with_rerank = m.search(
    "What movies do I like?",
    filters={"user_id": "alice"},
    rerank=True
)

results_without_rerank = m.search(
    "What movies do I like?",
    filters={"user_id": "alice"},
    rerank=False
)
<Tip> Log the reranked vs. non-reranked lists during rollout so stakeholders can see the improvement before enforcing it everywhere. </Tip> <Info icon="check"> You should see the same memories in both lists, but the reranked response will reorder them based on semantic relevance. </Info>
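For that rollout logging, a small helper can quantify how far each memory moved between the two lists. This is a sketch, and it assumes each result dict carries a stable `"id"` field you can key on:

```python
def rank_shifts(baseline: list[dict], reranked: list[dict]) -> dict:
    # Map each memory id to how many positions reranking moved it;
    # positive means the reranker promoted it. Assumes both lists
    # contain the same ids.
    base_pos = {r["id"]: i for i, r in enumerate(baseline)}
    return {r["id"]: base_pos[r["id"]] - i for i, r in enumerate(reranked)}

baseline = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
reranked = [{"id": "c"}, {"id": "a"}, {"id": "b"}]
print(rank_shifts(baseline, reranked))  # c promoted by 2, a and b demoted by 1
```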

Combine with metadata filters

python
results = m.search(
    "important work tasks",
    filters={
        "AND": [
            {"user_id": "alice"},
            {"category": "work"},
            {"priority": {"gte": 7}}
        ]
    },
    rerank=True,
    top_k=20
)
<Info icon="check"> Verify filtered reranked searches still respect every metadata clause—reranking only reorders candidates, it never bypasses filters. </Info>

Real-world playbooks

Customer support

python
config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key"
        }
    }
}

m = Memory.from_config(config)

results = m.search(
    "customer having login issues with mobile app",
    filters={"agent_id": "support_bot", "category": "technical_support"},
    rerank=True
)
<Info icon="check"> Top results should highlight tickets matching the login issue context so agents can respond faster. </Info>

Content recommendation

python
results = m.search(
    "science fiction books with space exploration themes",
    filters={"user_id": "reader123", "content_type": "book_recommendation"},
    rerank=True,
    top_k=10
)

for result in results["results"]:
    print(f"Recommendation: {result['memory']}")
    print(f"Relevance: {result['score']:.3f}")
<Info icon="check"> Expect high-scoring recommendations that match both the requested theme and any metadata limits you applied. </Info>

Personal assistant

python
results = m.search(
    "What restaurants did I enjoy last month that had good vegetarian options?",
    filters={
        "AND": [
            {"user_id": "foodie_user"},
            {"category": "dining"},
            {"rating": {"gte": 4}},
            {"date": {"gte": "2024-01-01"}}
        ]
    },
    rerank=True
)
<Tip> Reuse this pattern for other lifestyle queries—swap the filters and prompt text without changing the rerank configuration. </Tip> <Note> Each workflow keeps the same `m.search(...)` signature, so you can template these queries across agents with only the prompt and filters changing. </Note>

Verify the feature is working

  • Inspect result payloads for both the vector score and the reranker score; a missing reranker score indicates the reranker didn’t execute.
  • Track latency before and after enabling reranking to ensure SLAs hold.
  • Review provider logs or dashboards for throttling or quota warnings.
  • Run A/B comparisons (rerank on/off) to validate improved relevance before defaulting to reranked responses.
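The first checklist item can be automated with a small guard. This sketch assumes the v2-style response shape (`{"results": [...]}`) shown throughout this page:

```python
def rerank_applied(response: dict) -> bool:
    # Sanity check for the payload-inspection step above: every result
    # should carry a numeric "score" once reranking ran.
    results = response.get("results", [])
    return bool(results) and all(
        isinstance(r.get("score"), (int, float)) for r in results
    )

assert rerank_applied({"results": [{"memory": "likes pizza", "score": 0.91}]})
assert not rerank_applied({"results": [{"memory": "likes pizza"}]})
```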

Best practices

  1. Start local: Try Sentence Transformer models to prove value before paying for hosted APIs.
  2. Monitor latency: Add metrics around reranker duration so you notice regressions quickly.
  3. Control spend: Use top_k and selective toggles to cap hosted reranker costs.
  4. Keep a fallback: Always catch reranker failures and continue with vector-only ordering.
  5. Experiment often: Swap providers or models to find the best fit for your domain and language mix.
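For practice 2 (monitor latency), a minimal timing decorator is often enough to start; wire the measurement into whatever metrics system you already use. The `rerank_search` function here is a hypothetical stand-in for your `m.search(..., rerank=True)` call:

```python
import time
from functools import wraps

def timed(fn):
    # Record wall-clock duration of each call so reranker latency
    # regressions show up in logs or metrics.
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            print(f"{fn.__name__} took {elapsed_ms:.1f} ms")
    return wrapper

@timed
def rerank_search(query: str) -> list:
    # Stand-in for m.search(query, ..., rerank=True).
    return [query]
```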

<CardGroup cols={2}> <Card title="Configure Rerankers" icon="sliders" href="/components/rerankers/config"> Review provider fields, defaults, and environment variables before going live. </Card> <Card title="Build a Custom LLM Reranker" icon="sparkles" href="/components/rerankers/models/llm_reranker"> Extend scoring with prompt-tuned LLM rerankers for niche workflows. </Card> </CardGroup>