README - Cocoindex — ContextQMD

<a href="https://cocoindex.io/docs/examples/product-recommendation/" title="Build a product recommendation graph with LLM taxonomy extraction and CocoIndex — Neo4j, incremental, in plain async Python"> </a>

<h1 align="center">Turn a product catalog into a recommendation graph.</h1>

An LLM tags what each product is and what pairs with it, and the shared taxonomy edges become a "people who bought this also need…" engine, all in plain async Python.

Point it at a folder of product JSON, and it re-extracts only what changes as you edit the catalog.

<div align="center">

Star us ❤️ → <a href="https://github.com/cocoindex-io/cocoindex" title="Star CocoIndex on GitHub"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-light.svg"></picture></a>  ·  <a href="https://cocoindex.io/docs/examples/product-recommendation/" title="Read the full walkthrough"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-light.svg"></picture></a>  ·  <a href="https://discord.com/invite/zpA9S2DR7s" title="Join the CocoIndex Discord"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-light.svg"></picture></a>

</div>

A pile of product listings has the recommendations hiding in plain sight: a pen pairs with ink refills and a notebook, and a monitor pairs with a stand and an HDMI cable. But that knowledge is locked in prose. With CocoIndex you declare the transformation in native Python, using your own types (`target_state = transformation(source_state)`), and the heavy lifting (incremental processing, change tracking, managed graph targets) runs in a Rust engine underneath. Editing one product re-extracts one product, not the whole catalog.

How it works

Two node types, two relationship types, and the recommendation falls out of the graph:

- Product nodes: one per listing (title, price).
- Taxonomy nodes: one per distinct label (gel pen, notebook, ink refill), keyed by value and shared across products.
- `PRODUCT_TAXONOMY` edges (Product → Taxonomy): what the product is.
- `PRODUCT_COMPLEMENTARY_TAXONOMY` edges (Product → Taxonomy): what a buyer might also need.

When one product's complementary taxonomy matches another product's is-a taxonomy, those two products are worth recommending together.
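To make the matching rule concrete, here is a minimal in-memory sketch in plain Python, with no CocoIndex or Neo4j involved. The `ProductTaxonomies` shape mirrors the fields used in `main.py` (`product_id`, `taxonomies`, `complementary`), while the `recommend` helper and the tiny catalog are hypothetical, chosen only to echo the pen and monitor examples above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProductTaxonomies:
    # Hypothetical per-product extraction result: what it is, what pairs with it.
    product_id: str
    taxonomies: list[str]      # is-a labels (PRODUCT_TAXONOMY edges)
    complementary: list[str]   # pairs-with labels (PRODUCT_COMPLEMENTARY_TAXONOMY edges)


def recommend(products: list[ProductTaxonomies], product_id: str) -> set[str]:
    """Ids of products whose is-a labels match this product's complementary labels."""
    # Index: taxonomy label -> ids of products that *are* that label.
    by_label: dict[str, set[str]] = defaultdict(set)
    for p in products:
        for label in p.taxonomies:
            by_label[label].add(p.product_id)

    # Raises StopIteration if product_id is not in the catalog.
    source = next(p for p in products if p.product_id == product_id)
    recs: set[str] = set()
    for need in source.complementary:
        recs |= by_label.get(need, set())
    recs.discard(product_id)  # never recommend a product to itself
    return recs


catalog = [
    ProductTaxonomies("pen", ["gel pen"], ["notebook", "ink refill"]),
    ProductTaxonomies("notepad", ["notebook"], ["gel pen"]),
    ProductTaxonomies("refill", ["ink refill"], ["gel pen"]),
    ProductTaxonomies("monitor", ["monitor"], ["monitor stand", "hdmi cable"]),
]
print(sorted(recommend(catalog, "pen")))  # ['notepad', 'refill']
```

The graph version does the same join, except the "index" is the shared Taxonomy node itself, so no lookup table has to be rebuilt in application code.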

Because taxonomy labels are shared across products, the pipeline runs in two phases — read it top-to-bottom in main.py:

```python
import json

import instructor
import litellm

# Excerpt from main.py: `coco`, the `neo4j` target types, `FileLike`, the data
# classes, LLM_MODEL, and the prompt/template constants are defined elsewhere.


@coco.fn(memo=True)  # caches each extraction by content: re-tag only changed products
async def extract_taxonomy(detail: str) -> ProductTaxonomyInfo:
    client = instructor.from_litellm(litellm.acompletion, mode=instructor.Mode.JSON)
    result = await client.chat.completions.create(
        model=coco.use_context(LLM_MODEL),
        response_model=ProductTaxonomyInfo,
        messages=[
            {"role": "system", "content": TAXONOMY_PROMPT},
            {"role": "user", "content": detail},
        ],
    )
    return ProductTaxonomyInfo.model_validate(result.model_dump())


@coco.fn(memo=True)  # Phase 1, per product: declare the node, extract, carry labels forward
async def process_file(file: FileLike, product_table: neo4j.TableTarget[Product]) -> ProductTaxonomies:
    raw = json.loads(await file.read_text())
    product_id = file.file_path.path.name.removesuffix(".json")
    product_table.declare_record(row=Product(id=product_id, title=raw["title"], price=...))
    info = await extract_taxonomy(PRODUCT_TEMPLATE.render(**raw))
    return ProductTaxonomies(product_id, [t.name for t in info.taxonomies], ...)


@coco.fn  # Phase 2: one pass owns the shared Taxonomy nodes + both edge types
async def build_graph(products, taxonomy_table, product_taxonomy_rel, complementary_rel) -> None:
    # Declare each distinct label once, however many products use it.
    for value in {t for p in products for t in (*p.taxonomies, *p.complementary)}:
        taxonomy_table.declare_record(row=Taxonomy(value=value))
    for p in products:
        for t in set(p.taxonomies):
            product_taxonomy_rel.declare_relation(from_id=p.product_id, to_id=t)
        for t in set(p.complementary):
            complementary_rel.declare_relation(from_id=p.product_id, to_id=t)
```
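Stripped of the CocoIndex targets, Phase 2 amounts to a deduplication plus two edge sets. The sketch below is a plain-Python approximation that assumes the `product_id` / `taxonomies` / `complementary` fields `build_graph` reads; `plan_graph` is a hypothetical helper that returns what would be declared instead of calling `declare_record` / `declare_relation`.

```python
from dataclasses import dataclass


@dataclass
class ProductTaxonomies:
    # Hypothetical stand-in for the Phase 1 result carried into Phase 2.
    product_id: str
    taxonomies: list[str]
    complementary: list[str]


def plan_graph(products: list[ProductTaxonomies]):
    """Return (taxonomy node values, is-a edges, complementary edges) as sets."""
    # One Taxonomy node per distinct label, shared by every product that uses it.
    nodes = {t for p in products for t in (*p.taxonomies, *p.complementary)}
    # Sets also collapse repeated labels within a product, like set(...) in build_graph.
    is_a = {(p.product_id, t) for p in products for t in p.taxonomies}
    pairs_with = {(p.product_id, t) for p in products for t in p.complementary}
    return nodes, is_a, pairs_with


products = [
    ProductTaxonomies("pen", ["gel pen", "gel pen"], ["notebook"]),
    ProductTaxonomies("notepad", ["notebook"], ["gel pen"]),
]
nodes, is_a, pairs_with = plan_graph(products)
print(sorted(nodes))  # ['gel pen', 'notebook']: 2 shared nodes for 5 label uses
print(sorted(is_a))   # [('notepad', 'notebook'), ('pen', 'gel pen')]
```

Because the node set is computed once over all products, no per-product step ever declares a Taxonomy node, which is what gives the shared nodes a single owner.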

📘 <a href="https://cocoindex.io/docs/examples/product-recommendation/">Full Tutorial →</a>

Step-by-step walkthrough with the data model, the two-phase flow, the extraction schema, and exactly what happens on each kind of change.

Why it's worth a star ⭐

- Shared nodes, done right. Taxonomy labels are deduplicated and owned by a single graph pass, so "gel pen" is one node that every product can point at, not a copy per product.
- Incremental by default. `@coco.fn(memo=True)` caches each LLM extraction by content; edit one product and only that product is re-extracted. The graph is then diffed: new nodes and edges are added, and ones no longer supported by any product are removed.
- The graph IS the recommender. No separate model: one Cypher query walks complementary → is-a edges to surface what to cross-sell.
- Plain Python, your stack. Extraction is instructor over LiteLLM, so you can point `LLM_MODEL` at any LiteLLM-supported provider (OpenAI, Ollama, …). No DSL.
- Honest cache busting. `LLM_MODEL` is declared with `detect_change=True`, so switching models re-runs every extraction against the new model, with no cache to clear by hand.
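For intuition, the incremental behavior above can be modeled with two ideas: a memo keyed by input content plus model name, and a set difference between the previously and newly declared edges. This is a toy sketch under those assumptions, not CocoIndex's actual engine (which runs in Rust and tracks far more state); `MemoCache`, `fake_extract`, `declare_edges`, and `diff` are hypothetical names.

```python
import hashlib
from typing import Callable


class MemoCache:
    """Toy content-addressed memo: one cached result per (model, sha256(input))."""

    def __init__(self, extract: Callable[[str, str], list[str]]) -> None:
        self.extract = extract
        self.store: dict[tuple[str, str], list[str]] = {}
        self.calls = 0  # how many times the expensive extractor actually ran

    def __call__(self, model: str, detail: str) -> list[str]:
        # Keying on the model too mimics detect_change=True:
        # a different model misses the cache for every product.
        key = (model, hashlib.sha256(detail.encode()).hexdigest())
        if key not in self.store:
            self.calls += 1
            self.store[key] = self.extract(model, detail)
        return self.store[key]


def diff(old: set, new: set) -> tuple[set, set]:
    """Edges to add and edges to remove when the declared graph changes."""
    return new - old, old - new


def fake_extract(model: str, detail: str) -> list[str]:
    # Hypothetical stand-in for the LLM call: tag with the listing's last word.
    return [detail.split()[-1].lower()]


def declare_edges(memo: MemoCache, model: str, catalog: dict[str, str]) -> set[tuple[str, str]]:
    # Recompute the full target state; the memo makes unchanged products free.
    return {(pid, tag) for pid, detail in catalog.items() for tag in memo(model, detail)}


memo = MemoCache(fake_extract)
catalog = {"p1": "Blue gel Pen", "p2": "Dotted Notebook"}
edges_v1 = declare_edges(memo, "model-a", catalog)

catalog["p2"] = "Spiral Notepad"  # edit one product
edges_v2 = declare_edges(memo, "model-a", catalog)
added, removed = diff(edges_v1, edges_v2)
print(memo.calls)      # 3: p1 was served from the cache
print(added, removed)  # {('p2', 'notepad')} {('p2', 'notebook')}

declare_edges(memo, "model-b", catalog)  # swap the model
print(memo.calls)      # 5: every product re-extracted
```

The design point is that the pipeline always describes the *whole* target state, and the memo plus the diff turn that full description into a small set of changes.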

Run it

1. Start Neo4j:

```sh
docker run -d -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/cocoindex --name cocoindex-neo4j neo4j:5.26-community
```

2. Configure & install:

```sh
cp .env.example .env     # set OPENAI_API_KEY (or LLM_MODEL=ollama/llama3.2)
pip install -e .
```

3. Build the graph — the example ships a products/ folder of sample listings (pens, notebooks, monitors, …):

```sh
cocoindex update main
```

On the 9 sample products, that's 9 Product nodes, ~40 Taxonomy nodes, and both edge types wired up.

4. Explore the recommendations: open Neo4j Browser at http://localhost:7474 (user `neo4j`, password `cocoindex`) and query the graph:

```cypher
// Recommend products to pair with anything that is a "gel pen":
// find products whose is-a taxonomy matches a pen's complementary taxonomy
MATCH (:Taxonomy {value: "gel pen"})<-[:PRODUCT_TAXONOMY]-(:Product)
      -[:PRODUCT_COMPLEMENTARY_TAXONOMY]->(need:Taxonomy)
MATCH (rec:Product)-[:PRODUCT_TAXONOMY]->(need)
RETURN DISTINCT rec.title
```

On the sample data, recommending for a pen surfaces the notepad and the multipurpose paper — exactly the cross-sell you'd want.
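The same checks can be run from Python. This is a sketch assuming the official `neo4j` driver (`pip install neo4j`) and the container from step 1; `graph_counts` and `recommend_for` are hypothetical helpers, and the recommendation query is the one above with the hard-coded `"gel pen"` swapped for a `$value` parameter and the title aliased to `title`.

```python
# Sanity-check counts for the node labels and edge types described above.
COUNT_QUERIES = {
    "Product": "MATCH (n:Product) RETURN count(n) AS n",
    "Taxonomy": "MATCH (n:Taxonomy) RETURN count(n) AS n",
    "PRODUCT_TAXONOMY": "MATCH ()-[r:PRODUCT_TAXONOMY]->() RETURN count(r) AS n",
    "PRODUCT_COMPLEMENTARY_TAXONOMY":
        "MATCH ()-[r:PRODUCT_COMPLEMENTARY_TAXONOMY]->() RETURN count(r) AS n",
}

# The recommendation query from step 4, parameterized on the taxonomy value.
RECOMMEND_QUERY = """
MATCH (:Taxonomy {value: $value})<-[:PRODUCT_TAXONOMY]-(:Product)
      -[:PRODUCT_COMPLEMENTARY_TAXONOMY]->(need:Taxonomy)
MATCH (rec:Product)-[:PRODUCT_TAXONOMY]->(need)
RETURN DISTINCT rec.title AS title
"""


def graph_counts(session) -> dict[str, int]:
    """Number of nodes per label and edges per relationship type."""
    # count() without grouping keys returns one row even when nothing matches.
    return {name: session.run(query).single()["n"] for name, query in COUNT_QUERIES.items()}


def recommend_for(session, value: str) -> list[str]:
    """Sorted titles of products that complement anything tagged `value`."""
    # Keyword arguments to session.run() are bound as query parameters ($value).
    result = session.run(RECOMMEND_QUERY, value=value)
    return sorted(record["title"] for record in result)


# Usage against the local container (not executed here):
# from neo4j import GraphDatabase
# with GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "cocoindex")) as driver:
#     with driver.session() as session:
#         print(graph_counts(session))
#         print(recommend_for(session, "gel pen"))
```

Passing the label as a parameter rather than formatting it into the string avoids Cypher injection and lets Neo4j reuse the query plan across labels.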

If this turned your catalog into a recommender, <a href="https://github.com/cocoindex-io/cocoindex">give CocoIndex a star ⭐</a> — it helps a lot.

<a href="https://cocoindex.io/docs">Docs</a> · <a href="https://cocoindex.io/docs/examples/product-recommendation/">Walkthrough</a> · <a href="https://discord.com/invite/zpA9S2DR7s">Discord</a> · <a href="https://github.com/cocoindex-io/cocoindex/tree/main/examples">See all examples →</a>