examples/text_embedding_qdrant/README.md
Walk, chunk, embed locally, upsert — incrementally — in plain async Python.
</p> <p align="center"> <strong>Star us ❤️ →</strong> <a href="https://github.com/cocoindex-io/cocoindex" title="Star CocoIndex on GitHub"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-light.svg"></picture></a> · <a href="https://cocoindex.io/docs/examples/text-embedding-qdrant/" title="Read the full walkthrough"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-light.svg"></picture></a> · <a href="https://discord.com/invite/zpA9S2DR7s" title="Join the CocoIndex Discord"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-light.svg"></picture></a> </p> <div align="center"> </div>

This is Semantic Search 101 with one thing changed: instead of Postgres + pgvector, the vectors land in a Qdrant collection. Walking the Markdown files, chunking, and embedding locally with all-MiniLM-L6-v2 are all identical; only the target differs. You declare the transformation in native Python with your own types, as target_state = transformation(source_state), and the heavy lifting (incremental processing, change tracking, managed targets) runs in a Rust engine underneath, so editing one file re-embeds one file, not the whole folder.
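To make the incremental idea concrete, here is a minimal pure-Python sketch of "editing one file re-embeds one file". It is not how the Rust engine is implemented; `ToyIncrementalIndexer`, `fingerprint`, and `split_paragraphs` are hypothetical names invented for illustration, and the sketch only tracks content changes (not code changes) via a hash per file.

```python
import hashlib
from typing import Callable


def fingerprint(text: str) -> str:
    """Content hash used to decide whether a file changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ToyIncrementalIndexer:
    """Hypothetical stand-in for an incremental engine (illustration only)."""

    def __init__(self, transform: Callable[[str], list[str]]) -> None:
        self._transform = transform
        # path -> (content fingerprint, derived output)
        self._memo: dict[str, tuple[str, list[str]]] = {}
        self.transform_calls = 0  # how often the transform actually ran

    def update(self, source_state: dict[str, str]) -> dict[str, list[str]]:
        """Return target_state = transformation(source_state), reusing cached work."""
        new_memo: dict[str, tuple[str, list[str]]] = {}
        for path, text in source_state.items():
            fp = fingerprint(text)
            cached = self._memo.get(path)
            if cached is not None and cached[0] == fp:
                new_memo[path] = cached  # unchanged file: skip the transform
            else:
                self.transform_calls += 1
                new_memo[path] = (fp, self._transform(text))
        # Files missing from source_state are dropped from the target state.
        self._memo = new_memo
        return {path: derived for path, (_, derived) in new_memo.items()}


def split_paragraphs(text: str) -> list[str]:
    """Toy transformation standing in for chunk + embed."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


indexer = ToyIncrementalIndexer(split_paragraphs)
docs = {"a.md": "alpha\n\nbeta", "b.md": "gamma"}
indexer.update(docs)                  # first run processes both files
docs["a.md"] = "alpha\n\nbeta, edited"
target_state = indexer.update(docs)   # second run re-processes only a.md
print(indexer.transform_calls)        # 3
```

The transform runs three times in total (two files on the first pass, one on the second), which is the behavior the paragraph above describes at a much smaller scale.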
Read each file, split into overlapping chunks, embed each chunk, then upsert it as a Qdrant point — text and offsets in the payload, the embedding as the vector. The one Qdrant-specific call is mount_collection_target, which derives the vector dimensions straight from the embedder (QdrantVectorDef(schema=EMBEDDER) — no hardcoded 384) and manages the collection for you. Read it in main.py:
```python
@coco.fn
async def process_chunk(chunk, filename, id_gen, target: qdrant.CollectionTarget) -> None:
    embedding_vec = await coco.use_context(EMBEDDER).embed(chunk.text)
    point = qdrant.PointStruct(
        id=await id_gen.next_id(chunk.text),  # stable id derived from chunk text
        vector=embedding_vec.tolist(),
        payload={"filename": str(filename), "chunk_start": chunk.start.char_offset,
                 "chunk_end": chunk.end.char_offset, "text": chunk.text},
    )
    target.declare_point(point)


@coco.fn
async def app_main(sourcedir: pathlib.Path) -> None:
    target_collection = await qdrant.mount_collection_target(
        QDRANT_DB, collection_name=QDRANT_COLLECTION,
        schema=await qdrant.CollectionSchema.create(vectors=qdrant.QdrantVectorDef(schema=EMBEDDER)),
    )
    files = localfs.walk_dir(sourcedir, recursive=True,
        path_matcher=PatternFilePathMatcher(included_patterns=["**/*.md"]), live=True)
    await coco.mount_each(process_file, files.items(), target_collection)
```
target.declare_point declares the point as a target state; CocoIndex inserts, updates, or deletes it to match — you never write upsert calls yourself.
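The declare-then-reconcile idea can be sketched without CocoIndex or Qdrant at all. The example below is an illustration under stated assumptions, not the library's implementation: `chunk_with_overlap`, `point_id`, `declare_points`, and `reconcile` are hypothetical helpers, the fixed-size character chunker is a stand-in for the real splitter, and deriving ids with `uuid.uuid5` is only one plausible way to get a stable id from chunk text (identical chunk texts would collide in this toy).

```python
import uuid


def chunk_with_overlap(text: str, size: int, overlap: int) -> list[tuple[int, int, str]]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        end = min(start + size, len(text))
        chunks.append((start, end, text[start:end]))
        if end == len(text):
            break  # the last window reached the end of the text
    return chunks


def point_id(chunk_text: str) -> str:
    """Stable id: the same chunk text always maps to the same UUID (assumed scheme)."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, chunk_text))


def declare_points(filename: str, text: str, size: int = 4, overlap: int = 2) -> dict[str, dict]:
    """Build the desired target state: point id -> payload."""
    return {
        point_id(chunk): {"filename": filename, "chunk_start": start,
                          "chunk_end": end, "text": chunk}
        for start, end, chunk in chunk_with_overlap(text, size, overlap)
    }


def reconcile(existing: dict[str, dict], declared: dict[str, dict]):
    """Diff the stored state against the declared state."""
    upserts = {pid: payload for pid, payload in declared.items()
               if existing.get(pid) != payload}
    deletes = sorted(pid for pid in existing if pid not in declared)
    return upserts, deletes


before = declare_points("doc.md", "abcdefghij")
after = declare_points("doc.md", "abcdefghXY")   # only the tail of the file changed
upserts, deletes = reconcile(before, after)
print([p["text"] for p in upserts.values()])     # ['ghXY']
print(deletes == [point_id("ghij")])             # True
```

Because the ids depend only on chunk text, the three unchanged chunks produce identical points and are left alone; only the edited chunk is upserted and the chunk it replaced is deleted.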
The <a href="https://cocoindex.io/docs/examples/text-embedding-qdrant/">step-by-step walkthrough</a> covers the Qdrant client setup, the collection target, the point schema, and the incremental story.
</p>

- mount_collection_target handles collection creation, idempotent point upserts, and orphan cleanup when a file disappears: the same managed-target guarantees pgvector gets in the base example.
- Vector dimensions come from the embedder via QdrantVectorDef(schema=EMBEDDER), so swap the model and the schema follows.
- @coco.fn(memo=True) on process_file skips files whose content and code are unchanged; each point's id is derived from its chunk text, so only changed points are upserted and vanished ones are deleted.
- The Qdrant client connects over gRPC (prefer_grpc=True); the same local all-MiniLM-L6-v2 embedder is reused at query time so indexing and search stay consistent.

1. Start Qdrant, with HTTP on 6333 and gRPC on 6334:
```sh
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant
```
2. Configure & install:
```sh
cp .env.example .env   # no required secrets; optional QDRANT_URL override (default http://localhost:6334/)
pip install -e .
```
3. Build the index — the example ships a markdown_files/ folder of sample docs:
```sh
cocoindex update main      # catch-up: scan, sync, exit
cocoindex update -L main   # live: keep watching for file changes
```
4. Search — embeds your query with the same model and asks Qdrant for the nearest points:
```sh
python main.py "what is self-attention?"
```
The most semantically similar chunks come back ranked, even when they share none of the words in your query. You can also browse the collection in the Qdrant dashboard (http://localhost:6333/dashboard with the Docker setup from step 1).
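As a rough picture of what "nearest points" means at query time, the sketch below ranks a few toy points by cosine similarity in plain Python. It is an illustration only: the 3-dimensional vectors and their texts are made up (real all-MiniLM-L6-v2 embeddings have 384 dimensions), `cosine` and `top_k` are hypothetical helpers, cosine distance is assumed rather than confirmed for this collection, and Qdrant itself uses an approximate index instead of this brute-force scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], points: list[dict], k: int = 3) -> list[tuple[float, str]]:
    """Score every point against the query and return the k best (score, text) pairs."""
    scored = [(cosine(query_vec, p["vector"]), p["payload"]["text"]) for p in points]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy points shaped like the ones declared during indexing: a vector plus a payload.
points = [
    {"vector": [0.9, 0.1, 0.0], "payload": {"text": "Self-attention relates tokens to each other."}},
    {"vector": [0.0, 1.0, 0.0], "payload": {"text": "Tokenizers split text into subwords."}},
    {"vector": [0.7, 0.7, 0.0], "payload": {"text": "Transformers stack attention layers."}},
]

query_vec = [1.0, 0.0, 0.0]  # pretend this is the embedded query
for score, text in top_k(query_vec, points, k=2):
    print(f"{score:.3f}  {text}")
```

The key constraint carried over from the real example is that the query vector must come from the same embedding model as the indexed vectors; otherwise the similarity scores are not comparable.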
<a href="https://cocoindex.io/docs">Docs</a> · <a href="https://cocoindex.io/docs/examples/text-embedding-qdrant/">Walkthrough</a> · <a href="https://discord.com/invite/zpA9S2DR7s">Discord</a> · <a href="https://github.com/cocoindex-io/cocoindex/tree/main/examples"><b>See all examples →</b></a>
</p>