examples/image_search/README.md
Vectors live in Qdrant, the index runs live inside a FastAPI app, and it's all plain async Python.
</p> <p align="center"> <strong>Star us ❤️ →</strong> <a href="https://github.com/cocoindex-io/cocoindex" title="Star CocoIndex on GitHub"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-light.svg"></picture></a> · <a href="https://cocoindex.io/docs/examples/image-search/" title="Read the full walkthrough"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-light.svg"></picture></a> · <a href="https://discord.com/invite/zpA9S2DR7s" title="Join the CocoIndex Discord"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-light.svg"></picture></a> </p> <div align="center"> </div>

A folder of photos becomes searchable by meaning the moment you stop relying on filenames and tags. CLIP is the trick: it embeds an image and its caption into the same space, so a text query and a matching picture land near each other. You declare the transformation in native Python with your own types, `target_state = transformation(source_state)`, and the heavy lifting (incremental processing, change tracking, the managed Qdrant collection) runs in a Rust engine underneath. The flow runs in live mode inside the API server, so dropping a new photo into the folder updates the index within a second.
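To make the `target_state = transformation(source_state)` idea concrete, here is a minimal plain-Python sketch of declarative reconciliation. It is not CocoIndex's engine or API: `reconcile`, `fake_embed`, the dict standing in for a collection, and the content-hash memo are all hypothetical, chosen only to show how unchanged inputs can be skipped and deleted inputs removed from the target.

```python
import hashlib


def fake_embed(content: bytes) -> list[float]:
    """Hypothetical stand-in for a real embedding model."""
    digest = hashlib.sha256(content).digest()
    return [b / 255 for b in digest[:4]]


def reconcile(
    source: dict[str, bytes],
    target: dict[str, dict],
    memo: dict[str, str],
) -> dict[str, list[str]]:
    """Bring `target` in line with `source` and report what changed."""
    changes: dict[str, list[str]] = {"embedded": [], "skipped": [], "deleted": []}
    for path, content in source.items():
        fingerprint = hashlib.sha256(content).hexdigest()
        if memo.get(path) == fingerprint and path in target:
            changes["skipped"].append(path)  # unchanged input: keep the old result
            continue
        target[path] = {"vector": fake_embed(content), "filename": path}
        memo[path] = fingerprint
        changes["embedded"].append(path)
    for path in list(target):
        if path not in source:  # the source file is gone, so its point goes too
            del target[path]
            memo.pop(path, None)
            changes["deleted"].append(path)
    return changes


source = {"img/cat.jpg": b"cat-bytes", "img/dog.jpg": b"dog-bytes"}
target: dict[str, dict] = {}
memo: dict[str, str] = {}

first = reconcile(source, target, memo)
del source["img/dog.jpg"]                     # delete a photo
source["img/giraffe.jpg"] = b"giraffe-bytes"  # add a photo
second = reconcile(source, target, memo)
print(first)
print(second)
```

On the second pass only the new file is embedded, the untouched one is skipped, and the deleted one disappears from the target without any explicit "delete" call from the caller.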
The indexing path is short. There's no text to chunk, just one embedding per image:

1. Read each `.jpg` / `.jpeg` / `.png` file in `img/`.
2. Embed it with CLIP.
3. Declare a Qdrant point whose id is the uuid5 of the path, with the filename in the payload.

The whole point is one shared space: the same CLIP model embeds images at index time and text at query time, so a cosine search with a text vector finds the nearest image vectors. Each image runs as its own processing component, so delete a photo and its point is removed automatically. Read it in `pipeline.py`:
```python
@coco.fn(memo=True)  # unchanged image is never re-embedded
async def process_file(file: FileLike, target: qdrant.CollectionTarget) -> None:
    content = await file.read()
    embedding = embed_image_bytes(content)
    point = qdrant.PointStruct(
        id=_image_id(file.file_path.path),  # uuid5 of the path, so the id is stable
        vector=embedding,
        payload={"filename": str(file.file_path.path)},
    )
    target.declare_point(point)


def embed_query(text: str) -> list[float]:  # query side: same model, text encoder
    model, processor = get_clip_model()
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model.get_text_features(**inputs)
    return _projected_features(out)[0].tolist()
```
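The `_image_id` helper is not shown in the excerpt. A minimal sketch of a path-derived id, assuming a fixed namespace (`uuid.NAMESPACE_URL` is a guess, and `image_id_sketch` is a hypothetical name), could look like this. Qdrant accepts unsigned integers or UUIDs as point ids, so a uuid5 string fits.

```python
import uuid

# Assumption: any fixed namespace works; the example's real namespace is not shown.
NAMESPACE = uuid.NAMESPACE_URL


def image_id_sketch(path: str) -> str:
    """Hypothetical stand-in for `_image_id`: same path in, same UUID out."""
    return str(uuid.uuid5(NAMESPACE, path))


first = image_id_sketch("img/giraffe.jpg")
again = image_id_sketch("img/giraffe.jpg")
other = image_id_sketch("img/cat.jpg")
print(first == again, first == other)  # True False
```

Because the id depends only on the path, re-indexing the same file overwrites its existing point instead of creating a duplicate.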
`api.py` is a FastAPI app whose lifespan starts the flow in live mode, blocks startup until the initial sweep is READY, then keeps watching `img/` while serving `/search`. There's no separate "build the index" step.
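The startup behavior described above can be sketched with the standard library alone. This is an illustration of the pattern, not the code in `api.py`: `run_live_index` is a hypothetical stand-in for the live flow, and the `events` list exists only to show the ordering.

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []


async def run_live_index(ready: asyncio.Event) -> None:
    """Hypothetical stand-in for the live flow: initial sweep, then keep watching."""
    await asyncio.sleep(0.01)      # pretend to index the images already on disk
    events.append("initial sweep done")
    ready.set()
    while True:                    # pretend to watch the folder for changes
        await asyncio.sleep(0.01)


@asynccontextmanager
async def lifespan(app):
    ready = asyncio.Event()
    task = asyncio.create_task(run_live_index(ready))
    await ready.wait()             # block startup until the first sweep finishes
    events.append("serving")
    try:
        yield                      # the app handles requests while suspended here
    finally:
        task.cancel()              # stop watching on shutdown
        try:
            await task
        except asyncio.CancelledError:
            pass
        events.append("stopped")


async def main() -> None:
    # A real server enters this context itself; here it is driven by hand.
    async with lifespan(app=None):
        events.append("handling requests")


asyncio.run(main())
print(events)
```

With FastAPI installed, the same context manager would be wired in as `FastAPI(lifespan=lifespan)`, so no request is served before the first sweep completes.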
Step-by-step walkthrough with the shared image-text space, the Qdrant collection setup, the live-mode API server, and the React frontend.
</p>

- Drop a photo into `img/` and it's searchable within a second, no rebuild step.
- `@coco.fn(memo=True)` skips unchanged images; each photo is its own processing component, so deleting one removes its Qdrant point automatically.
- `mount_collection_target` creates and reconciles the collection. The vector size comes straight from `model.config.projection_dim`, so swapping CLIP variants just works.

Needs Qdrant (vector store) and the CLIP model deps (`torch`, `transformers`, `pillow`), all pulled in by `pip install -e .`.
1. Start Qdrant:

   ```sh
   docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant
   ```

2. Configure & install:

   ```sh
   cp .env.example .env   # QDRANT_URL (defaults to the local container above)
   pip install -e .
   ```

3. Run it as a service. The example ships an `img/` folder (a cat, a dog, an elephant, a giraffe). The server runs the index in live mode in the background and blocks startup until the first sweep finishes, so there's no separate indexing command:

   ```sh
   python -m uvicorn api:app --reload --host 0.0.0.0 --port 8000
   ```

4. Open the frontend:

   ```sh
   cd frontend && npm install && npm run dev   # http://localhost:5173
   ```
Query "long neck" and the giraffe ranks first, then the other animals by CLIP similarity — none of which was ever tagged with a word. That's the whole point of a shared image-text space: the match is by meaning.
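For intuition about that ranking, here is a toy cosine-similarity sketch in plain Python. The three-dimensional vectors and filenames are made up for illustration (real CLIP embeddings have hundreds of dimensions), and in the example itself Qdrant performs the search rather than a Python loop.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up image embeddings; a real index would hold CLIP image features.
image_vectors = {
    "giraffe.jpg": [0.9, 0.1, 0.2],
    "elephant.jpg": [0.4, 0.8, 0.1],
    "cat.jpg": [0.1, 0.3, 0.9],
}
query_vector = [1.0, 0.2, 0.1]  # pretend this is the text embedding of "long neck"

ranked = sorted(
    image_vectors,
    key=lambda name: cosine(query_vector, image_vectors[name]),
    reverse=True,
)
print(ranked)  # ['giraffe.jpg', 'elephant.jpg', 'cat.jpg']
```

The image whose vector points in nearly the same direction as the query comes first; no filename or tag is consulted at any point.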
<a href="https://cocoindex.io/docs">Docs</a> · <a href="https://cocoindex.io/docs/examples/image-search/">Walkthrough</a> · <a href="https://discord.com/invite/zpA9S2DR7s">Discord</a> · <a href="https://github.com/cocoindex-io/cocoindex/tree/main/examples"><b>See all examples →</b></a>
</p>