cookbook/data_labeling/image_search/README.md
A working image search engine.
One AgentOS process, one HTML file, four endpoints.
This is the productized version of _09_image_extraction_to_vectordb — that cookbook is the minimal pipeline; this one wraps it in a workflow, endpoints, and a UI.
uv venv .venvs/image_search --python 3.12
source .venvs/image_search/bin/activate
uv pip install -r cookbook/data_labeling/image_search/requirements.txt
./cookbook/scripts/run_pgvector.sh
That brings up agnohq/pgvector:18 on port 5532 with database ai and credentials ai/ai — which is what settings.py expects out of the box. Point DB_URL at your own instance if needed.
export GOOGLE_API_KEY="..."
The demo uses gemini-3.5-flash for vision + structured output and gemini-embedding-001 for embeddings.
fastapi dev cookbook/data_labeling/image_search/run.py --port 7777
Then open http://localhost:7777/ui.
The first time the page loads it will be empty. Click Reindex to fire the ingest workflow against the 38 built-in Lorem Picsum URLs, processed INGEST_CONCURRENCY at a time (default 3) against gemini-3.5-flash. When it completes, gallery and search are populated.
| Endpoint | Source | Purpose |
|---|---|---|
| GET /ui | explicit route | Single-file HTML UI |
| GET /knowledge/content | AgentOS (native) | Gallery list (paginated) |
| POST /knowledge/search | AgentOS (native) | Vector search |
| POST /workflows/image-ingest/runs | AgentOS (native) | Reindex (background, polled) |
All four routes are served by a single AgentOS(knowledge=..., workflows=..., base_app=...) call: three are native AgentOS endpoints, and /ui is the one explicit route.
Ingest — the image-ingest workflow fetches each URL (httpx, redirects on), passes the bytes to a Gemini agent with output_schema=ImageDescription, and inserts the structured result into one shared Knowledge instance. The flattened description (caption + subjects + scene + style + tags) becomes the embedded text; the full ImageDescription plus the source URL becomes the metadata. URLs are processed concurrently with a ThreadPoolExecutor. The workflow is idempotent — items already present in contents_db are skipped.
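The ingest step can be sketched roughly as follows. This is a minimal illustration, not the cookbook's code: the ImageDescription field types are assumptions, `describe` is a hypothetical stand-in for "fetch the URL and ask the Gemini agent", and `already_indexed` stands in for the lookup against contents_db.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class ImageDescription:
    # Field names follow the README; the types are assumptions.
    caption: str
    subjects: list[str] = field(default_factory=list)
    scene: str = ""
    style: str = ""
    tags: list[str] = field(default_factory=list)


def flatten(desc: ImageDescription) -> str:
    """Join caption, subjects, scene, style, and tags into the text to embed."""
    parts = [
        desc.caption,
        ", ".join(desc.subjects),
        desc.scene,
        desc.style,
        ", ".join(desc.tags),
    ]
    # Skip empty fields so the embedded text has no dangling separators.
    return ". ".join(p for p in parts if p)


def ingest(urls, already_indexed, describe, max_workers=3):
    """Describe URLs that are not indexed yet, max_workers at a time.

    `describe` is a hypothetical callable (url -> ImageDescription).
    Returns a dict of url -> flattened text for the newly processed URLs.
    """
    # Idempotency: anything already indexed is skipped up front.
    pending = [u for u in urls if u not in already_indexed]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip() below lines up correctly.
        descriptions = list(pool.map(describe, pending))
    return {url: flatten(d) for url, d in zip(pending, descriptions)}
```

Running `ingest` twice with the same `already_indexed` set updated in between would process nothing the second time, which is the behavior the workflow's skip check is meant to give.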
Gallery — the UI hits GET /knowledge/content. Items render as cards with the image, caption, subjects, scene, visual style, and tag chips.
Search — the UI hits POST /knowledge/search with search_type=hybrid. PgVector combines vector similarity (cosine over GeminiEmbedder vectors) with PostgreSQL full-text search (to_tsvector + websearch_to_tsquery) into one fused score, so car matches cars via stemming without dragging in carnivore. The top hits come back with their full metadata for rendering.
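To make the idea of "one fused score" concrete, here is a small sketch of score fusion. It assumes a simple weighted sum of two scores already scaled to [0, 1]; the README does not say which formula PgVector uses, so treat `fuse_scores` and `vector_weight` as hypothetical names for illustration only.

```python
def fuse_scores(vector_scores, text_scores, vector_weight=0.5):
    """Blend per-document vector and full-text scores into one ranking.

    Both inputs map document id -> score, assumed already scaled to [0, 1].
    A document missing from one side contributes 0 for that side.
    Returns (doc_id, fused_score) pairs, best match first.
    """
    doc_ids = set(vector_scores) | set(text_scores)
    fused = {
        doc_id: vector_weight * vector_scores.get(doc_id, 0.0)
        + (1.0 - vector_weight) * text_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

Under this assumption, a document that matches the query both semantically and lexically outranks one that is only a near neighbor in embedding space, which is the effect described above for car versus carnivore.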
Reindex — the UI's Reindex button hits the workflow endpoint with background=true, polls the run for status, and refreshes the gallery on completion. Top-right counter shows N indexed.
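The poll-until-done pattern behind the Reindex button looks roughly like this. The real UI does this in the browser; the sketch below is in Python for consistency with the rest of the cookbook. `fetch_status` is a hypothetical callable that returns the run's current status string, and the terminal status names are assumptions rather than documented AgentOS values.

```python
import time


def wait_for_run(fetch_status, interval=1.0, timeout=300.0, sleep=time.sleep):
    """Poll fetch_status() until the run reaches a terminal state.

    Returns the terminal status, or raises TimeoutError if the run is
    still in progress once `timeout` seconds have passed.
    """
    terminal = {"COMPLETED", "ERROR", "CANCELLED"}  # assumed status names
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout} seconds")
        sleep(interval)
```

Once `wait_for_run` returns a completed status, the caller would refresh the gallery, mirroring what the UI does on completion.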
In settings.py:
- IMAGE_URLS: swap the Picsum list for your own URLs (e.g. a list pulled from S3).
- INGEST_CONCURRENCY: raise for faster ingest on a higher quota.
- EXTRACTOR_MODEL_ID: bump to gemini-3.5-pro for higher-quality descriptions at the cost of slower, pricier ingest.
- EMBEDDER_MODEL_ID: swap to a different Gemini embedding model.

In schemas.py:

- ImageDescription fields determine what gets embedded and what the UI can render. Keep new fields short and search-flavored.

This is demo-grade. For production:
authorization=True with a JWTValidator).