cookbook/data_labeling/image_search/README.md
A working searchable image library, end-to-end. An extraction agent describes each image with search-tuned metadata, descriptions are embedded and stored in a vector DB, and a browser UI lets you query the library in natural language. One AgentOS process, one HTML file, four endpoints.
This is the productized version of
_09_image_extraction_to_vectordb —
that cookbook is the minimal pipeline; this one wraps it in a workflow,
endpoints, and a UI.
| Endpoint | Source | Purpose |
|---|---|---|
GET /ui | explicit route | Single-file HTML UI |
GET /knowledge/content | AgentOS (native) | Gallery list (paginated) |
POST /knowledge/search | AgentOS (native) | Vector search |
POST /workflows/image-ingest/runs | AgentOS (native) | Reindex (background, polled) |
All four routes come from a single AgentOS(knowledge=..., workflows=..., base_app=...)
call. The only custom Python is the labeling agent and the ingest loop.
Ingest — the image-ingest workflow fetches each URL (httpx,
redirects on), passes the bytes to a Gemini agent with
output_schema=ImageDescription, and inserts the structured result into
one shared Knowledge instance. The flattened description (caption +
subjects + scene + style + tags) becomes the embedded text; the full
ImageDescription plus the source URL becomes the metadata. URLs are
processed concurrently with a ThreadPoolExecutor. The workflow is
idempotent — items already present in contents_db are skipped.
Gallery — the UI hits GET /knowledge/content. Items render as
cards with the image, caption, subjects, scene, visual style, and tag
chips.
Search — the UI hits POST /knowledge/search with
search_type=hybrid. PgVector combines vector similarity (cosine
over GeminiEmbedder vectors) with PostgreSQL full-text search
(to_tsvector + websearch_to_tsquery) into one fused score, so
car matches cars via stemming without dragging in carnivore.
The top hits come back with their full metadata for rendering.
Reindex — the UI's Reindex button hits the workflow endpoint with
background=true, polls the run for status, and refreshes the gallery
on completion. Top-right counter shows N indexed.
cookbook/data_labeling/image_search/
├── README.md
├── TEST_LOG.md
├── requirements.in
├── requirements.txt
├── generate_requirements.sh
├── run.py AgentOS + UI route (entrypoint)
├── settings.py URLs, paths, model IDs, concurrency
├── schemas.py ImageDescription + flatten helper
├── db.py get_db(), get_knowledge() — shared singletons
├── workflows/
│ └── ingest.py extractor agent + Step + Workflow
├── public/
│ └── index.html single-file UI (Tailwind + Alpine + Lucide — all CDN)
└── data/ gitignored, kept for future local artifacts
uv venv .venvs/image_search --python 3.12
source .venvs/image_search/bin/activate
uv pip install -r cookbook/data_labeling/image_search/requirements.txt
./cookbook/scripts/run_pgvector.sh
That brings up agnohq/pgvector:18 on port 5532 with database ai and
credentials ai/ai — which is what settings.py expects
out of the box. Point DB_URL at your own instance if you'd rather.
export GOOGLE_API_KEY="..."
The demo uses gemini-3.5-flash for vision + structured output and
gemini-embedding-001 for embeddings.
fastapi dev cookbook/data_labeling/image_search/run.py --port 7777
Then open http://localhost:7777/ui.
(Port 7777 because Docker Desktop's webview holds 8000, the fastapi-dev default, which silently intercepts requests in confusing ways.)
The first time the page loads it will be empty. Click Reindex to fire
the ingest workflow against the 38 built-in Lorem Picsum URLs. With
INGEST_CONCURRENCY=8 it takes ~15-20s against gemini-3.5-flash. When
it completes, gallery and search are populated.
In settings.py:
IMAGE_URLS — swap the Picsum list for your own URLs (e.g. a list
pulled from S3).INGEST_CONCURRENCY — raise for faster ingest on a higher quota.EXTRACTOR_MODEL_ID — bump to gemini-3.5-pro for higher-quality
descriptions at slower / pricier ingest.EMBEDDER_MODEL_ID — swap to a different Gemini embedding model.In schemas.py:
ImageDescription fields determine what gets embedded and what the
UI can render. Keep new fields short and search-flavored.This is demo-grade. For production:
authorization=True with a JWTValidator).