docs/examples/rag_mongodb.ipynb
| Step | Tech | Execution |
|---|---|---|
| Embedding | Voyage AI | 🌐 Remote |
| Vector store | MongoDB | 🌐 Remote |
| Gen AI | Azure OpenAI | 🌐 Remote |
This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) pipeline using MongoDB as a vector store and Voyage AI embedding models for semantic search. The workflow involves extracting and chunking text from documents, generating embeddings with Voyage AI, storing vectors in MongoDB, and leveraging Azure OpenAI for generative responses.
By combining these technologies, you can build scalable, production-ready RAG systems for advanced document understanding and question answering.
First, we'll install the necessary libraries and configure our environment. These packages enable document processing, database connections, embedding generation, and AI model interaction. We're using Docling for document handling, PyMongo for MongoDB integration, VoyageAI for embeddings, and the Azure OpenAI client for generation capabilities.
%%capture
%pip install "docling~=2.7.0"
%pip install "pymongo[srv]"
%pip install voyageai
%pip install openai
import logging
import warnings
warnings.filterwarnings("ignore")
logging.getLogger("pymongo").setLevel(logging.ERROR)
Part of what makes Docling so remarkable is that it can run on commodity hardware, so this notebook can be run on a local machine with GPU acceleration. If you're using a MacBook with Apple Silicon, Docling integrates seamlessly with Metal Performance Shaders (MPS). MPS provides out-of-the-box GPU acceleration on macOS, integrates with PyTorch and TensorFlow, offers energy-efficient performance on Apple Silicon, and is compatible with all Metal-supported GPUs.
The code below checks to see if a GPU is available, either via CUDA or MPS.
import torch
# Check if GPU or MPS is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"CUDA GPU is enabled: {torch.cuda.get_device_name(0)}")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    print("MPS GPU is enabled.")
else:
    raise OSError(
        "No GPU or MPS device found. Please check your environment and ensure GPU or MPS support is configured."
    )
To begin, we will focus on a single seminal paper and treat it as the entire knowledge base. Building a Retrieval-Augmented Generation (RAG) pipeline on just one document serves as a clear, controlled baseline before scaling to multiple sources. This helps validate each stage of the workflow (parsing, chunking, embedding, retrieval, generation) without confounding factors introduced by inter-document noise.
# Influential machine learning papers
source_urls = [
"https://arxiv.org/pdf/1706.03762" # Attention is All You Need
]
Convert the source URL with Docling's DocumentConverter. The converter downloads the PDF and parses it into a structured DoclingDocument that we can chunk in the next step. Other conversion methods are available as well; a sketch for converting multiple sources at once follows the code below.
from pprint import pprint
from docling.document_converter import DocumentConverter
# Instantiate the doc converter
doc_converter = DocumentConverter()
# Since we want to use a single document, we will convert just the first URL. For multiple documents, you can use the convert_all() method and then iterate over the converted results.
pdf_doc = source_urls[0]
converted_doc = doc_converter.convert(pdf_doc).document
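If you later want to process several papers at once, here is a minimal sketch of the multi-document variant; it assumes convert_all() yields results in the same order as source_urls and uses the document's export_to_markdown() method to obtain Markdown text.
# Optional sketch: convert every source URL and keep a dict mapping
# each URL to its Markdown export.
conv_results = doc_converter.convert_all(source_urls)
markdown_by_url = {
    url: res.document.export_to_markdown()
    for url, res in zip(source_urls, conv_results)
}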
We use Docling's HierarchicalChunker() to perform hierarchy-aware chunking of the converted document. This preserves some of the structure and relationships within the document, which enables more accurate and relevant retrieval in our RAG pipeline.
from docling_core.transforms.chunker import HierarchicalChunker
# Initialize the chunker
chunker = HierarchicalChunker()
# Perform hierarchical chunking on the converted document and get text from chunks
chunks = list(chunker.chunk(converted_doc))
chunk_texts = [chunk.text for chunk in chunks]
chunk_texts[:20] # Display a few chunk texts
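To see what "hierarchy-aware" means in practice, you can inspect a chunk's metadata. The attribute names below follow docling-core's chunk metadata (e.g. meta.headings) and may differ slightly between versions.
# Peek at the hierarchy metadata attached to one chunk
sample_chunk = chunks[min(10, len(chunks) - 1)]
print("Section headings:", sample_chunk.meta.headings)  # headings the chunk falls under
print("Text preview:", sample_chunk.text[:200])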
We will use a Voyage AI embedding model to convert the chunks above into embeddings and then push them to MongoDB for further consumption.
Voyage AI offers a range of embedding models; for best results here we use voyage-context-3, a contextualized chunk embedding model in which each chunk embedding encodes not only the chunk's own content but also contextual information from the full document.
You can go through the Voyage AI blog post to understand how it performs in comparison to other embedding models.
Create an account on Voyage AI and get your API key.
import voyageai
# Voyage API key
VOYAGE_API_KEY = "**********************"
# Initialize the VoyageAI client
vo = voyageai.Client(VOYAGE_API_KEY)
result = vo.contextualized_embed(inputs=[chunk_texts], model="voyage-context-3")
contextualized_chunk_embds = [emb for r in result.results for emb in r.embeddings]
# Check lengths to ensure they match
print("Chunk Texts Length:", chunk_texts.__len__())
print("Contextualized Chunk Embeddings Length:", contextualized_chunk_embds.__len__())
# Combine chunks with their embeddings
chunk_data = [
{"text": text, "embedding": emb}
for text, emb in zip(chunk_texts, contextualized_chunk_embds)
]
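It is also worth confirming the embedding dimensionality, since it must match the numDimensions value used when we create the vector index below (1024, which should correspond to voyage-context-3's default output size).
# The vector index below is defined with numDimensions=1024; the embeddings must match.
print("Embedding dimension:", len(contextualized_chunk_embds[0]))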
With the generated embeddings prepared, we now insert them into MongoDB so they can be leveraged in the RAG pipeline.
MongoDB is an ideal vector store for RAG applications thanks to its native Atlas Vector Search support.
The chunks and their embeddings will be stored in a MongoDB collection, allowing us to perform similarity searches when responding to user queries.
# Insert to MongoDB
from pymongo import MongoClient
client = MongoClient(
"mongodb+srv://*******.mongodb.net/"
) # Replace with your MongoDB connection string
db = client["rag_db"] # Database name
collection = db["documents"] # Collection name
# Insert chunk data into MongoDB
response = collection.insert_many(chunk_data)
print(f"Inserted {len(response.inserted_ids)} documents into MongoDB.")
Using pymongo we can create a vector index, that will help us search through our vectors and respond to user queries. This index is crucial for efficient similarity searches between user questions and our document chunks. MongoDB Atlas Vector Search provides fast and accurate retrieval of semantically related content, which forms the foundation of our RAG pipeline.
from pymongo.operations import SearchIndexModel
# Create your index model, then create the search index
search_index_model = SearchIndexModel(
definition={
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 1024,
"similarity": "dotProduct",
}
]
},
name="vector_index",
type="vectorSearch",
)
result = collection.create_search_index(model=search_index_model)
print("New search index named " + result + " is building.")
To perform a query on the vectorized data stored in MongoDB, we can use the $vectorSearch aggregation pipeline. This powerful feature of MongoDB Atlas enables semantic search capabilities by finding documents based on vector similarity.
When executing a vector search query, MongoDB compares the query embedding against the stored chunk embeddings and returns the closest matches.
This enables us to find semantically related content rather than relying on exact keyword matches. The similarity metric we're using (dot product) is equivalent to cosine similarity when the embeddings are normalized to unit length, allowing us to identify content that is conceptually similar even if it uses different terminology.
For RAG applications, this vector search capability is crucial as it allows us to retrieve the most relevant context from our document collection based on the semantic meaning of a user's query, providing the foundation for generating accurate and contextually appropriate responses.
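As a quick standalone illustration (not part of the pipeline) of why the dot product and cosine similarity coincide for unit-length embeddings:
import numpy as np
# For unit-length vectors, the dot product equals cosine similarity.
a = np.array([0.6, 0.8])  # length 1
b = np.array([1.0, 0.0])  # length 1
print(float(a @ b))  # dot product -> 0.6
print(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))  # cosine -> 0.6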
We specify a prompt that includes the field we want to search through in the database (in this case it's text), a query that includes our search term, and the number of retrieved results to use in the generation.
import os
from openai import AzureOpenAI
from rich.console import Console
from rich.panel import Panel
# Define the query we want to answer about "Attention is All You Need"
prompt = "Give me top 3 learning points from `Attention is All You Need`, using only the retrieved context."
# Generate embedding for the query using VoyageAI (vo already initialized earlier)
query_embd_context = (
vo.contextualized_embed(
inputs=[[prompt]], model="voyage-context-3", input_type="query"
)
.results[0]
.embeddings[0]
)
# Vector search pipeline
search_pipeline = [
{
"$vectorSearch": {
"index": "vector_index",
"path": "embedding",
"queryVector": query_embd_context,
"numCandidates": 10,
"limit": 10,
}
},
{"$project": {"text": 1, "_id": 0, "score": {"$meta": "vectorSearchScore"}}},
]
results = list(collection.aggregate(search_pipeline))
if not results:
raise ValueError(
"No vector search results returned. Verify the index is built before querying."
)
context_texts = [doc["text"] for doc in results]
combined_context = "\n\n".join(context_texts)
# Azure OpenAI settings: replace with your own values (better yet, load them
# from environment variables rather than hardcoding secrets).
# The endpoint must look like https://your-resource-name.openai.azure.com/
AZURE_OPENAI_API_KEY = "**********************"
AZURE_OPENAI_ENDPOINT = "**********************"
AZURE_OPENAI_API_VERSION = "**********************"
# Initialize Azure OpenAI client (endpoint must NOT include path segments)
client = AzureOpenAI(
api_key=AZURE_OPENAI_API_KEY,
azure_endpoint=AZURE_OPENAI_ENDPOINT.rstrip("/"),
api_version=AZURE_OPENAI_API_VERSION,
)
# Chat completion using retrieved context
response = client.chat.completions.create(
model="gpt-4o-mini", # Azure deployment name
messages=[
{
"role": "system",
"content": "You are a helpful assistant. Use only the provided context to answer questions. If the context is insufficient, say so.",
},
{
"role": "user",
"content": f"Context:\n{combined_context}\n\nQuestion: {prompt}",
},
],
temperature=0.2,
)
response_text = response.choices[0].message.content
console = Console()
console.print(Panel(f"{prompt}", title="Prompt", border_style="bold red"))
console.print(
Panel(response_text, title="Generated Content", border_style="bold green")
)
This notebook demonstrated a powerful RAG pipeline using MongoDB, VoyageAI, and Azure OpenAI. By combining MongoDB's vector search capabilities with VoyageAI's embeddings and Azure OpenAI's language models, we created an intelligent document retrieval system.
Start building your own intelligent document retrieval system today!