Back to Agno

Image Search

cookbook/data_labeling/image_search/README.md

2.6.114.5 KB
Original Source

Image Search

A working image search engine.

  1. An extraction agent describes each image with search-tuned metadata
  2. Descriptions are embedded and stored in a vector DB
  3. A browser UI lets you query the library in natural language

One AgentOS process, one HTML file, four endpoints.

This is the productized version of _09_image_extraction_to_vectordb — that cookbook is the minimal pipeline; this one wraps it in a workflow, endpoints, and a UI.

Get started

1. Create a virtual environment

bash
uv venv .venvs/image_search --python 3.12
source .venvs/image_search/bin/activate

2. Install dependencies

bash
uv pip install -r cookbook/data_labeling/image_search/requirements.txt

3. Start pgvector

bash
./cookbook/scripts/run_pgvector.sh

That brings up agnohq/pgvector:18 on port 5532 with database ai and credentials ai/ai — which is what settings.py expects out of the box. Point DB_URL at your own instance if needed.

4. Set your API key

bash
export GOOGLE_API_KEY="..."

The demo uses gemini-3.5-flash for vision + structured output and gemini-embedding-001 for embeddings.

5. Serve

bash
fastapi dev cookbook/data_labeling/image_search/run.py --port 7777

Then open http://localhost:7777/ui.

The first time the page loads it will be empty. Click Reindex to fire the ingest workflow against the 38 built-in Lorem Picsum URLs, processed INGEST_CONCURRENCY at a time (default 3) against gemini-3.5-flash. When it completes, gallery and search are populated.

What you get

EndpointSourcePurpose
GET /uiexplicit routeSingle-file HTML UI
GET /knowledge/contentAgentOS (native)Gallery list (paginated)
POST /knowledge/searchAgentOS (native)Vector search
POST /workflows/image-ingest/runsAgentOS (native)Reindex (background, polled)

All four routes come from a single AgentOS(knowledge=..., workflows=..., base_app=...) call.

How it works

  1. Ingest — the image-ingest workflow fetches each URL (httpx, redirects on), passes the bytes to a Gemini agent with output_schema=ImageDescription, and inserts the structured result into one shared Knowledge instance. The flattened description (caption + subjects + scene + style + tags) becomes the embedded text; the full ImageDescription plus the source URL becomes the metadata. URLs are processed concurrently with a ThreadPoolExecutor. The workflow is idempotent — items already present in contents_db are skipped.

  2. Gallery — the UI hits GET /knowledge/content. Items render as cards with the image, caption, subjects, scene, visual style, and tag chips.

  3. Search — the UI hits POST /knowledge/search with search_type=hybrid. PgVector combines vector similarity (cosine over GeminiEmbedder vectors) with PostgreSQL full-text search (to_tsvector + websearch_to_tsquery) into one fused score, so car matches cars via stemming without dragging in carnivore. The top hits come back with their full metadata for rendering.

  4. Reindex — the UI's Reindex button hits the workflow endpoint with background=true, polls the run for status, and refreshes the gallery on completion. Top-right counter shows N indexed.

Tuning

In settings.py:

  • IMAGE_URLS — swap the Picsum list for your own URLs (e.g. a list pulled from S3).
  • INGEST_CONCURRENCY — raise for faster ingest on a higher quota.
  • EXTRACTOR_MODEL_ID — bump to gemini-3.5-pro for higher-quality descriptions at slower / pricier ingest.
  • EMBEDDER_MODEL_ID — swap to a different Gemini embedding model.

In schemas.py:

  • The ImageDescription fields determine what gets embedded and what the UI can render. Keep new fields short and search-flavored.

Productionizing

This is demo-grade. For production:

  • Auth on the AgentOS (authorization=True with a JWTValidator).
  • Presigned URLs in place of public-read S3.
  • CloudFront in front of the bucket for cold-load latency.
  • Background worker pool for ingest at real scale; the in-process Workflow is fine up to maybe a few thousand items.
  • Move from the local Docker pgvector to a managed Postgres (RDS, Planetscale, etc.) once you outgrow a laptop.