Finer-grained retrieval on dense, text-heavy, busy images — live, in plain async Python.
</p> <p align="center"> <strong>Star us ❤️ →</strong> <a href="https://github.com/cocoindex-io/cocoindex" title="Star CocoIndex on GitHub"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-light.svg"></picture></a> · <a href="https://cocoindex.io/docs/examples/image-search-colpali/" title="Read the full walkthrough"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-light.svg"></picture></a> · <a href="https://discord.com/invite/zpA9S2DR7s" title="Join the CocoIndex Discord"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-light.svg"></picture></a> </p> <div align="center"> </div>

This is the multi-vector cousin of the CLIP image search example. Same idea: type "long neck" and get the giraffe back. The difference is that instead of squeezing each image into a single vector, ColPali emits a bag of vectors, one per image patch, and the query is matched token against patch. You declare the transformation in native Python with your own types (`target_state = transformation(source_state)`), while incremental processing, change tracking, and the managed Qdrant collection run in a Rust engine underneath. Here the whole pipeline runs in live mode inside the API server. The cost is more vectors per image; the payoff is retrieval that holds up on dense, text-heavy, or busy images, where a single embedding blurs everything together.
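A toy comparison makes the "blurs everything together" point concrete. This is a sketch under loud assumptions: the vectors are hand-picked and 3-dimensional (real ColPali vectors are far larger), and mean-pooling the patch vectors is a hypothetical stand-in for "one vector per image". CLIP computes its single vector differently, so treat this as intuition only, not a model of either system.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_pool(patches: list[list[float]]) -> list[float]:
    """Collapse a bag of patch vectors into one vector by averaging each dimension."""
    return [sum(column) / len(patches) for column in zip(*patches)]


# Hypothetical 3-d axes: [long neck, printed text, background clutter].
busy_image = [
    [1.0, 0.0, 0.0],  # the one patch that shows the giraffe
    [0.0, 1.0, 0.0],  # a patch full of caption text
    [0.0, 0.0, 1.0],  # clutter
    [0.0, 0.0, 1.0],  # more clutter
]
query = [1.0, 0.0, 0.0]  # "long neck"

pooled_score = cosine(query, mean_pool(busy_image))           # one vector per image
best_patch_score = max(cosine(query, p) for p in busy_image)  # one vector per patch
print(f"pooled: {pooled_score:.3f}, best patch: {best_patch_score:.3f}")
# pooled: 0.408, best patch: 1.000
```

The giraffe patch still matches perfectly on its own; averaged together with the caption and clutter patches, the same signal drops to about 0.41.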
The indexing path is short. There's no text to chunk, just one multi-vector embedding per image:

- Read each `.jpg` / `.jpeg` / `.png` file from `img/`.
- Embed it with ColPali into a `list[list[float]]` (one vector per patch), not one vector.
- Store it as a Qdrant point whose ID is the `uuid5` of the path.

The store does the heavy lifting on the query side: the query's bag of token vectors and an image's bag of patch vectors are scored late-interaction style, with each query vector finding its best-matching patch and those best scores summed across the query. The only real difference from the CLIP version is the shape of the embedding (and the collection config that accepts it). Read it in `pipeline.py`:
```python
# Excerpt from pipeline.py. Imports and helpers (embed_image_bytes, _image_id, dim,
# MultiVectorSchema, VectorSchema, np) come from elsewhere in that file.
@coco.fn(memo=True)  # an unchanged image is never re-embedded
async def process_file(file: FileLike, target: qdrant.CollectionTarget) -> None:
    content = await file.read()
    embedding = embed_image_bytes(content)  # list[list[float]]: multi-vector, one per patch
    point = qdrant.PointStruct(
        id=_image_id(file.file_path.path),  # uuid5 of the path, so the ID is stable
        vector=embedding,
        payload={"filename": str(file.file_path.path)},
    )
    target.declare_point(point)


# The collection itself carries the multi-vector setup
# (this call sits inside async setup code, not at module top level):
schema = await qdrant.CollectionSchema.create(
    vectors=qdrant.QdrantVectorDef(
        schema=MultiVectorSchema(vector_schema=VectorSchema(dtype=np.dtype(np.float32), size=dim)),
        distance="cosine",
        multivector_comparator="max_sim",  # late-interaction MaxSim
    )
)
```
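Why `uuid5` of the path: Qdrant point IDs must be unsigned integers or UUIDs, so a raw path can't be the ID directly. `uuid5` hashes a namespace plus a name into a deterministic UUID, so the same file always maps to the same point. Re-indexing an edited image then overwrites its point instead of adding a duplicate, and a deleted file's point can be located and removed. A minimal sketch, assuming a hypothetical `image_id` helper and `uuid.NAMESPACE_URL` as the namespace; the example's real `_image_id` may choose differently.

```python
from __future__ import annotations

import uuid
from pathlib import Path

# Hypothetical namespace choice; the example's real _image_id may use another one.
IMAGE_NAMESPACE = uuid.NAMESPACE_URL


def image_id(path: str | Path) -> str:
    """Deterministic point ID for a file path (a hypothetical stand-in for _image_id)."""
    # as_posix() keeps the ID identical whether the path arrives as str or Path.
    return str(uuid.uuid5(IMAGE_NAMESPACE, Path(path).as_posix()))


first = image_id("img/giraffe.jpg")
again = image_id(Path("img/giraffe.jpg"))  # the same file seen on a later sweep
other = image_id("img/cat.jpg")
print(first == again, first == other)  # True False
```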
`api.py` is a FastAPI app whose lifespan starts the flow in live mode, blocks startup until the initial sweep is READY, and then keeps watching `img/` while serving `/search`. That endpoint hands Qdrant the query's bag of vectors and lets Qdrant do the MaxSim scoring.
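The "block startup until the first sweep is done, then keep watching" shape can be sketched with plain `asyncio`. This is not the code in `api.py`: `live_indexer` and `events` are hypothetical stand-ins for CocoIndex's live-mode flow, and the context manager only mirrors the shape of a FastAPI lifespan (which you would hand to `FastAPI(lifespan=...)`). The key move is awaiting a readiness event before `yield`, so no request is served against a half-built index.

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []  # records what happened, for the demo below


async def live_indexer(ready: asyncio.Event) -> None:
    """Hypothetical stand-in for the live flow: one initial sweep, then keep watching."""
    events.append("initial sweep done")  # real code would index every file in img/ here
    ready.set()                          # tell the lifespan the first sweep has finished
    while True:
        await asyncio.sleep(0.05)        # real code would react to changes in img/ here


@asynccontextmanager
async def lifespan(app):
    """Same shape as a FastAPI lifespan: start background work, wait until ready, then serve."""
    ready = asyncio.Event()
    task = asyncio.create_task(live_indexer(ready))
    await ready.wait()                   # startup blocks here until the initial sweep is done
    events.append("serving")
    try:
        yield                            # the app handles requests while suspended here
    finally:
        task.cancel()                    # stop the watcher on shutdown
        try:
            await task
        except asyncio.CancelledError:
            pass
        events.append("stopped")


async def main() -> None:
    async with lifespan(app=None):
        pass                             # requests would be served here


asyncio.run(main())
print(events)  # ['initial sweep done', 'serving', 'stopped']
```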
The [full walkthrough](https://cocoindex.io/docs/examples/image-search-colpali/) goes step by step through multi-vector embeddings, the MaxSim multivector collection, the live-mode API server, and how this example differs from its CLIP sibling.
</p>

- `MultiVectorSchema` + `multivector_comparator="max_sim"` makes Qdrant do the late-interaction scoring; the query side just hands over the query's bag of vectors.
- An image added to `img/` is searchable within a second, with no rebuild step.
- `@coco.fn(memo=True)` skips unchanged images, and each photo is its own processing component, so deleting one removes its Qdrant point automatically.

Needs Qdrant (vector store) plus the ColPali model deps (`torch`, `transformers`, `pillow`), all pulled in by `pip install -e .` (`cocoindex[colpali,qdrant]`).
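The memoization and deletion behavior in the list above can be pictured with a toy in-memory model. CocoIndex does this tracking in its Rust engine, and its real change signal may differ; this sketch assumes a SHA-256 content hash as the "has this image changed?" test, and `ToyIncrementalIndex` plus the fake embedder are hypothetical.

```python
import hashlib
from typing import Callable

Embedding = list[list[float]]


class ToyIncrementalIndex:
    """In-memory model of memoized, per-file indexing (hypothetical; not CocoIndex's engine)."""

    def __init__(self, embed: Callable[[bytes], Embedding]) -> None:
        self.embed = embed
        self.fingerprints: dict[str, str] = {}  # path -> content hash from the last sweep
        self.points: dict[str, Embedding] = {}  # path -> stored multi-vector ("Qdrant point")
        self.embed_calls = 0                    # how many times the model actually ran

    def sweep(self, files: dict[str, bytes]) -> None:
        """Bring the stored points in line with the current folder contents."""
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            if self.fingerprints.get(path) == digest:
                continue                        # unchanged image: keep the old embedding
            self.points[path] = self.embed(content)
            self.fingerprints[path] = digest
            self.embed_calls += 1
        for path in list(self.points):
            if path not in files:               # file is gone, so its point goes too
                del self.points[path]
                del self.fingerprints[path]


# A fake embedder so the sketch runs without a model.
index = ToyIncrementalIndex(lambda content: [[float(len(content))]])
index.sweep({"img/cat.jpg": b"cat", "img/dog.jpg": b"dog"})          # both embedded
index.sweep({"img/cat.jpg": b"cat", "img/dog.jpg": b"dog, edited"})  # only dog re-embedded
index.sweep({"img/cat.jpg": b"cat"})                                 # dog deleted
print(index.embed_calls, sorted(index.points))  # 3 ['img/cat.jpg']
```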
1. Start Qdrant:

   ```sh
   docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant
   ```

2. Configure & install:

   ```sh
   cp .env.example .env   # QDRANT_URL (defaults to the local container above)
   pip install -e .
   ```

3. Run it as a service. The example ships an `img/` folder (a cat, a dog, an elephant, a giraffe). The server runs the index in live mode in the background and blocks startup until the first sweep finishes, so there's no separate indexing command:

   ```sh
   python -m uvicorn api:app --reload --host 0.0.0.0 --port 8000
   ```

4. Open the frontend:

   ```sh
   cd frontend && npm install && npm run dev   # http://localhost:5173
   ```
The React app posts your query to `/search`, which embeds the text into ColPali's per-token space and runs a MaxSim search in Qdrant. The match is by meaning, patch by patch, never by metadata.
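In this example Qdrant computes the MaxSim score server-side, per the `multivector_comparator="max_sim"` config. The sketch below is a brute-force, pure-Python model of that score, not Qdrant's implementation. It assumes cosine similarity computed as a dot product of unit-normalized vectors, toy 3-d vectors, and hypothetical helper names (`max_sim`, `search`). A real ColPali query produces many token vectors, not two.

```python
import math

Vectors = list[list[float]]


def unit(v: list[float]) -> list[float]:
    """Scale to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def max_sim(query: Vectors, patches: Vectors) -> float:
    """Late interaction: each query vector keeps its best patch score; sum over the query."""
    q = [unit(v) for v in query]
    p = [unit(v) for v in patches]
    return sum(max(sum(a * b for a, b in zip(qv, pv)) for pv in p) for qv in q)


def search(query: Vectors, images: dict[str, Vectors], limit: int = 3) -> list[tuple[str, float]]:
    """Brute-force ranking by MaxSim, highest score first."""
    scored = [(name, max_sim(query, patches)) for name, patches in images.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


# Hypothetical 3-d axes: [long neck, big body, whiskers].
images = {
    "cat.jpg": [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
    "elephant.jpg": [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]],
    "giraffe.jpg": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
}
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two query-token vectors

for name, score in search(query, images):
    print(name, round(score, 2))
# giraffe.jpg 2.0 / elephant.jpg 1.4 / cat.jpg 1.0
```

Every query vector gets to pick its own best patch, which is why an image only needs one strong patch per query concept to rank well.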
<a href="https://cocoindex.io/docs">Docs</a> · <a href="https://cocoindex.io/docs/examples/image-search-colpali/">Walkthrough</a> · <a href="https://discord.com/invite/zpA9S2DR7s">Discord</a> · <a href="https://github.com/cocoindex-io/cocoindex/tree/main/examples"><b>See all examples →</b></a>
</p>