examples/hn_trending_topics/README.md
This example scrapes recent HackerNews threads (and their comments) via the Algolia HN API, uses an LLM to extract topics from each message, and stores everything in Postgres. A small CLI demo prints trending topics ranked by mention score and lets you search messages by topic.
A running Postgres instance. If you don't have one, start a local instance with the compose file in this repo:
docker compose -f ../../dev/postgres.yaml up -d
The POSTGRES_URL environment variable set, e.g.
export POSTGRES_URL="postgres://cocoindex:cocoindex@localhost/cocoindex"
An API key for the LLM. The default model is gemini/gemini-2.5-flash, which needs GEMINI_API_KEY. Any provider supported by litellm works: change LLM_MODEL in main.py and set the matching credential.
You can put these in a .env file in this directory; python main.py loads it automatically.
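Before running anything, it can help to confirm that the settings above are visible to Python. The following is a minimal sketch and is not part of main.py: the helper name `missing_settings` is hypothetical, and the required names assume the default Gemini model (swap `GEMINI_API_KEY` for your provider's credential if you changed LLM_MODEL).

```python
import os

# Names taken from the prerequisites above; GEMINI_API_KEY assumes the default model.
REQUIRED = ("POSTGRES_URL", "GEMINI_API_KEY")


def missing_settings(env=None, required=REQUIRED):
    """Return the required setting names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

This inspects only the process environment. Values that exist solely in a .env file will be reported as missing unless they have been exported or loaded first.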
Install deps:
pip install -e .
Build/update the index (one-shot catch-up; this example doesn't use a live source):
cocoindex update main
Each run fetches the latest MAX_THREADS (default 10) threads, runs LLM topic extraction on each thread and on every one of its comments, and writes rows into coco_examples.hn_messages and coco_examples.hn_topics. CocoIndex memoizes per-message extraction, so re-running is incremental.
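To make the per-message unit concrete, here is a rough sketch of turning one fetched thread into a flat list of messages (the thread itself plus every comment). It assumes the nested item shape returned by the Algolia HN items endpoint (`id`, `author`, `title`, `text`, `children`). The `Message` type and `flatten_thread` function are hypothetical names for illustration and are not taken from main.py.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass(frozen=True)
class Message:
    id: str
    thread_id: str
    author: Optional[str]
    text: str


def flatten_thread(item: dict[str, Any]) -> list[Message]:
    """Walk a nested thread item depth-first; return the thread plus all comments."""
    thread_id = str(item["id"])

    def walk(node: dict[str, Any], is_root: bool) -> Iterator[Message]:
        # The root carries a title; comments only carry text.
        parts = [node.get("title") if is_root else None, node.get("text")]
        text = "\n\n".join(p for p in parts if p)
        if text:  # skip deleted or empty nodes, but still visit their replies
            yield Message(str(node["id"]), thread_id, node.get("author"), text)
        for child in node.get("children") or []:
            yield from walk(child, is_root=False)

    return list(walk(item, is_root=True))
```

Each resulting `Message` would then be the input to one topic-extraction call, which is also the natural granularity for memoization.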
Query: show the top trending topics, then enter a search loop:
python main.py
Or jump straight to a topic search:
python main.py "rust"
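As an illustration of ranking topics by mention score, here is one plausible scheme, which is not necessarily the one main.py uses: count each topic once per message and weight thread-level mentions above comment-level ones. The weights, the `rank_topics` name, and the input shape are all assumptions made for this example.

```python
from collections import Counter
from typing import Iterable

# Hypothetical weights: a topic found in a thread counts more than one in a comment.
THREAD_WEIGHT = 5
COMMENT_WEIGHT = 1


def rank_topics(
    mentions: Iterable[tuple[str, bool]], top_n: int = 10
) -> list[tuple[str, int]]:
    """Rank topics by weighted mention score.

    `mentions` yields (topic, is_thread) pairs, one per topic per message.
    """
    scores: Counter[str] = Counter()
    for topic, is_thread in mentions:
        # Normalize case and whitespace so "Rust" and "rust" share one score.
        scores[topic.strip().lower()] += THREAD_WEIGHT if is_thread else COMMENT_WEIGHT
    # Sort by score descending, then by name so ties are deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Normalizing topic names before counting matters here because LLM-extracted labels often differ only in case or surrounding whitespace.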