
Retrieval with Qdrant


| Step | Tech | Execution |
|--------------|-----------|----------|
| Embedding | FastEmbed | 💻 Local |
| Vector store | Qdrant | 💻 Local |

Overview

This example demonstrates using Docling with Qdrant to perform a hybrid search across your documents using dense and sparse vectors.

We'll chunk the documents using Docling before adding them to a Qdrant collection. By limiting the length of the chunks, we can preserve the meaning in each vector embedding.

Setup

  • 👉 The Qdrant client uses FastEmbed to generate vector embeddings. You can install the fastembed-gpu package instead if you have the hardware to support it.

```python
%pip install --no-warn-conflicts -q qdrant-client docling fastembed
```

Let's import all the classes we'll be working with.

```python
from qdrant_client import QdrantClient

from docling.chunking import HybridChunker
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter
```

  • For Docling, we'll restrict the allowed formats to HTML, since we'll only be working with webpages in this tutorial.
  • If we set a sparse model, the Qdrant client will fuse the dense and sparse results using RRF. Reference.
```python
COLLECTION_NAME = "docling"

doc_converter = DocumentConverter(allowed_formats=[InputFormat.HTML])
client = QdrantClient(location=":memory:")
# The ":memory:" mode is a Python imitation of Qdrant's APIs, useful for prototyping and CI.
# For production deployments, use the Docker image:
#   docker run -p 6333:6333 qdrant/qdrant
# client = QdrantClient(location="http://localhost:6333")

client.set_model("sentence-transformers/all-MiniLM-L6-v2")
client.set_sparse_model("Qdrant/bm25")
```
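For intuition, Reciprocal Rank Fusion can be sketched in a few lines of plain Python. This is an illustration of the idea, not Qdrant's internal implementation; the function name and document ids are made up, and k=60 is the constant suggested in the original RRF paper.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists with Reciprocal Rank Fusion.

    Each ranking is a list of document ids, best first. A document's
    fused score is the sum of 1 / (k + rank) over every list it
    appears in, using 1-based ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense_results = ["a", "b", "c"]   # e.g. from the dense embedding model
sparse_results = ["b", "c", "a"]  # e.g. from BM25
print(rrf_fuse([dense_results, sparse_results]))  # → ['b', 'a', 'c']
```

Because RRF only looks at ranks, it needs no score normalization, which is what makes it a convenient default for combining dense and sparse retrievers.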

We can now download and chunk the document using Docling. For demonstration, we'll use an article about chunking strategies :)

```python
result = doc_converter.convert(
    "https://www.sagacify.com/news/a-guide-to-chunking-strategies-for-retrieval-augmented-generation-rag"
)
documents, metadatas = [], []
for chunk in HybridChunker().chunk(result.document):
    documents.append(chunk.text)
    metadatas.append(chunk.meta.export_json_dict())
```
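To build intuition for why a length cap matters, here is a toy greedy splitter in plain Python. It is only an illustration of length-bounded chunking; Docling's HybridChunker is tokenizer-aware and follows document structure rather than raw word counts.

```python
def naive_chunk(text, max_words=40):
    """Toy length-bounded chunker (NOT Docling's HybridChunker):
    greedily pack whole words into chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]

chunks = naive_chunk("word " * 100, max_words=40)
print([len(c.split()) for c in chunks])  # → [40, 40, 20]
```

Each chunk stays short enough that its embedding represents one focused span of text instead of averaging over a whole document.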

Let's now upload the documents to Qdrant.

  • The add() method batches the documents and uses FastEmbed to generate vector embeddings on our machine.
```python
_ = client.add(
    collection_name=COLLECTION_NAME,
    documents=documents,
    metadata=metadatas,
    batch_size=64,
)
```

Retrieval

```python
points = client.query(
    collection_name=COLLECTION_NAME,
    query_text="Can I split documents?",
    limit=10,
)
```

```python
for i, point in enumerate(points):
    print(f"=== {i} ===")
    print(point.document)
    print()
```