Cohere Rerank Guardrail Translation Handler

Handler for processing the rerank endpoint (/v1/rerank) with guardrails.

Overview

This handler processes rerank requests by:

1. Extracting the query text from the request
2. Applying guardrails to the query
3. Updating the request with the guardrailed query
4. Returning the output unchanged (rankings are not text)

Note: Documents are not processed by guardrails as they represent the corpus being searched, not user input. Only the query is guardrailed.
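
The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the handler's actual code: `apply_guardrail_to_query` is a hypothetical name, and the guardrail is assumed to be a plain function from text to text.

```python
from typing import Callable


def apply_guardrail_to_query(request: dict, guardrail: Callable[[str], str]) -> dict:
    """Return a copy of a rerank request with the guardrail applied to `query` only."""
    query = request.get("query")
    if not isinstance(query, str) or not query:
        # Nothing to guardrail; pass the request through untouched.
        return request
    updated = dict(request)  # shallow copy; `documents` is reused as-is
    updated["query"] = guardrail(query)
    return updated


# Hypothetical guardrail that upper-cases text, just to make the change visible.
request = {
    "model": "rerank-english-v3.0",
    "query": "What is the capital of France?",
    "documents": ["Paris is the capital of France."],
}
guarded = apply_guardrail_to_query(request, str.upper)
print(guarded["query"])      # WHAT IS THE CAPITAL OF FRANCE?
print(guarded["documents"])  # ['Paris is the capital of France.']
```

The documents list is passed along by reference and never inspected, which mirrors the rule that only the query is guardrailed.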

Data Format

Input Format

With String Documents:

json

{
  "model": "rerank-english-v3.0",
  "query": "What is the capital of France?",
  "documents": [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain."
  ],
  "top_n": 2
}

With Dict Documents:

json

{
  "model": "rerank-english-v3.0",
  "query": "What is the capital of France?",
  "documents": [
    {"text": "Paris is the capital of France.", "id": "doc1"},
    {"text": "Berlin is the capital of Germany.", "id": "doc2"},
    {"text": "Madrid is the capital of Spain.", "id": "doc3"}
  ],
  "top_n": 2
}

Output Format

json

{
  "id": "rerank-abc123",
  "results": [
    {"index": 0, "relevance_score": 0.98},
    {"index": 2, "relevance_score": 0.12}
  ],
  "meta": {
    "billed_units": {"search_units": 1}
  }
}
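
Because the output carries only indices and scores, a client usually maps each result back to the input document it refers to. The sketch below assumes that `index` is the zero-based position in the submitted `documents` list and that dict documents keep their text under a `"text"` key, as in the input examples; `document_text` and `ranked_documents` are hypothetical helper names.

```python
from typing import Union

Document = Union[str, dict]


def document_text(doc: Document) -> str:
    """Return the text of a document given as a string or a dict with a "text" key."""
    return doc if isinstance(doc, str) else doc["text"]


def ranked_documents(documents: list, results: list) -> list:
    """Pair each result with the text of the input document it points to."""
    return [
        (document_text(documents[r["index"]]), r["relevance_score"])
        for r in results
    ]


documents = [
    "Paris is the capital of France.",
    {"text": "Berlin is the capital of Germany.", "id": "doc2"},
    "Madrid is the capital of Spain.",
]
results = [
    {"index": 0, "relevance_score": 0.98},
    {"index": 2, "relevance_score": 0.12},
]

for text, score in ranked_documents(documents, results):
    print(f"{score:.2f}  {text}")
```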

Usage

The handler is automatically discovered and applied when guardrails are used with the rerank endpoint.

Example: Using Guardrails with Rerank

bash

curl -X POST 'http://localhost:4000/v1/rerank' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer your-api-key' \
-d '{
    "model": "rerank-english-v3.0",
    "query": "What is machine learning?",
    "documents": [
        "Machine learning is a subset of AI.",
        "Deep learning uses neural networks.",
        "Python is a programming language."
    ],
    "guardrails": ["content_filter"],
    "top_n": 2
}'

The guardrail will be applied to the query only (not the documents).

Example: PII Masking in Query

bash

curl -X POST 'http://localhost:4000/v1/rerank' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer your-api-key' \
-d '{
    "model": "rerank-english-v3.0",
    "query": "Find documents about John Doe from john.doe@example.com",
    "documents": [
        "Document 1 content here.",
        "Document 2 content here.",
        "Document 3 content here."
    ],
    "guardrails": ["mask_pii"],
    "top_n": 3
}'

With a PII-masking guardrail such as mask_pii configured, the query sent to the rerank provider becomes: "Find documents about [NAME_REDACTED] from [EMAIL_REDACTED]"
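
To make the effect of masking concrete, here is a toy email masker. It is not the guardrail's implementation: the regular expression is a simplified assumption, `mask_emails` is a hypothetical name, and name detection (which typically needs an entity recognizer) is left out.

```python
import re

# Simplified email pattern for illustration; real PII detection is more involved.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(query: str, placeholder: str = "[EMAIL_REDACTED]") -> str:
    """Replace every email-like substring in the query with a placeholder."""
    return EMAIL_PATTERN.sub(placeholder, query)


print(mask_emails("Find info about jane.doe@example.com"))
# Find info about [EMAIL_REDACTED]
```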

Example: Mixed Document Types

bash

curl -X POST 'http://localhost:4000/v1/rerank' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer your-api-key' \
-d '{
    "model": "rerank-english-v3.0",
    "query": "Technical documentation",
    "documents": [
        {"text": "This is document 1", "metadata": {"source": "wiki"}},
        {"text": "This is document 2", "metadata": {"source": "docs"}},
        "This is document 3 as a plain string"
    ],
    "guardrails": ["content_moderation"]
}'

Implementation Details

Input Processing

Query Field: query (string)
- Processing: Apply guardrail to query text
- Result: Updated query
Documents Field: documents (list)
- Processing: Not processed (corpus being searched, not user input)
- Result: Unchanged

Output Processing

Processing: Not applicable (output contains relevance scores, not text)
Result: Response returned unchanged

Use Cases

PII Protection: Remove PII from queries before reranking
Content Filtering: Filter inappropriate content from search queries
Compliance: Ensure queries meet regulatory or organizational policy requirements before they are sent to the rerank provider
Data Sanitization: Clean up query text before semantic search operations

Extension

Override these methods to customize behavior:

process_input_messages(): Customize how the query is processed
process_output_response(): Currently a no-op; override it if the response needs post-processing
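
A customization might look like the sketch below. The base class is a stand-in defined only for this example, and the method signatures (`data`, `guardrail`, `response`) are assumptions rather than the handler's real API, so check the actual handler class before overriding. `TrimmingRerankHandler` and `lowercase_guardrail` are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable

Guardrail = Callable[[str], Awaitable[str]]


class RerankHandlerSketch:
    """Stand-in for the rerank handler; signatures are assumed, not the real API."""

    async def process_input_messages(self, data: dict, guardrail: Guardrail) -> dict:
        query = data.get("query")
        if isinstance(query, str) and query:
            data["query"] = await guardrail(query)
        return data

    async def process_output_response(self, response: Any, guardrail: Guardrail) -> Any:
        # No-op: rerank output holds indices and scores, not text.
        return response


class TrimmingRerankHandler(RerankHandlerSketch):
    """Hypothetical override that normalizes whitespace before guardrailing."""

    async def process_input_messages(self, data: dict, guardrail: Guardrail) -> dict:
        query = data.get("query")
        if isinstance(query, str):
            data["query"] = " ".join(query.split())
        return await super().process_input_messages(data, guardrail)


async def lowercase_guardrail(text: str) -> str:
    # Hypothetical guardrail used only to make the effect visible.
    return text.lower()


data = {"query": "  Technical   DOCUMENTATION ", "documents": ["Doc 1"]}
result = asyncio.run(
    TrimmingRerankHandler().process_input_messages(data, lowercase_guardrail)
)
print(result["query"])  # technical documentation
```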

Supported Call Types

CallTypes.rerank - Synchronous rerank
CallTypes.arerank - Asynchronous rerank
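
One way to picture this is a lookup table in which both call types resolve to the same handler class. The enum and registry below are stand-ins written for this example; they are not the library's `CallTypes` enum or its discovery mechanism.

```python
from enum import Enum


class CallTypeSketch(str, Enum):
    """Stand-in enum mirroring the two call types listed above."""
    rerank = "rerank"
    arerank = "arerank"


class RerankHandlerSketch:
    """Placeholder for the rerank guardrail handler."""


# Both the sync and async call types resolve to the same handler class.
HANDLERS = {
    CallTypeSketch.rerank: RerankHandlerSketch,
    CallTypeSketch.arerank: RerankHandlerSketch,
}


def get_handler(call_type: str):
    """Look up the handler class for a call type name; None if unsupported."""
    try:
        return HANDLERS[CallTypeSketch(call_type)]
    except ValueError:
        return None


print(get_handler("rerank") is get_handler("arerank"))  # True
print(get_handler("completion"))                        # None
```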

Notes

Only the query is processed by guardrails
Documents are not processed (they represent the corpus, not user input)
Output processing is a no-op since rankings don't contain text
Both sync and async call types use the same handler
Works with all rerank providers (Cohere, Together AI, etc.)

Common Patterns

PII Masking in Search

python

import litellm

response = litellm.rerank(
    model="rerank-english-v3.0",
    query="Find info about john.doe@example.com",
    documents=[
        "Document 1 content.",
        "Document 2 content.",
        "Document 3 content."
    ],
    guardrails=["mask_pii"],
    top_n=2
)

# Query will have PII masked
# query becomes: "Find info about [EMAIL_REDACTED]"
print(response.results)

Content Filtering

python

import litellm

response = litellm.rerank(
    model="rerank-english-v3.0",
    query="Search query here",
    documents=[
        {"text": "Document 1 content", "id": "doc1"},
        {"text": "Document 2 content", "id": "doc2"},
    ],
    guardrails=["content_filter"],
)

Async Rerank with Guardrails

python

import litellm
import asyncio

async def rerank_with_guardrails():
    response = await litellm.arerank(
        model="rerank-english-v3.0",
        query="Technical query",
        documents=["Doc 1", "Doc 2", "Doc 3"],
        guardrails=["sanitize"],
        top_n=2
    )
    return response

result = asyncio.run(rerank_with_guardrails())