examples/product_recommendation/README.md
Point it at a folder of product JSON, and it re-extracts only what changes as you edit the catalog.
</p> <p align="center"> <strong>Star us ❤️ →</strong> <a href="https://github.com/cocoindex-io/cocoindex" title="Star CocoIndex on GitHub"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-light.svg"></picture></a> · <a href="https://cocoindex.io/docs/examples/product-recommendation/" title="Read the full walkthrough"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-light.svg"></picture></a> · <a href="https://discord.com/invite/zpA9S2DR7s" title="Join the CocoIndex Discord"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-light.svg"></picture></a> </p> <div align="center"> </div>

A pile of product listings has the recommendations hiding in plain sight: a pen pairs with ink refills and a notebook; a monitor pairs with a stand and an HDMI cable. But that knowledge is locked in prose. You declare the transformation in native Python and your own types, `target_state = transformation(source_state)`, and the heavy lifting (incremental processing, change tracking, managed graph targets) runs in a Rust engine underneath, so editing one product re-extracts one product, not the catalog.
Two node types, two relationship types, and the recommendation falls out of the graph:
- `Product` nodes: one per listing (title, price).
- `Taxonomy` nodes: one per distinct label (gel pen, notebook, ink refill), keyed by value and shared across products.
- `PRODUCT_TAXONOMY` edges, Product → Taxonomy: what the product is.
- `PRODUCT_COMPLEMENTARY_TAXONOMY` edges, Product → Taxonomy: what a buyer might also need.

Products whose complementary taxonomy matches another product's is-a taxonomy are the things to recommend together.
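The matching rule can be tried without a graph database. The sketch below is a plain-Python illustration, not code from `main.py`: the `ProductLabels` class, the `recommend_for` helper, and the four-item catalog are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ProductLabels:
    """Hypothetical in-memory stand-in for a Product node and its two edge sets."""
    title: str
    is_a: set[str] = field(default_factory=set)           # PRODUCT_TAXONOMY targets
    complementary: set[str] = field(default_factory=set)  # PRODUCT_COMPLEMENTARY_TAXONOMY targets


def recommend_for(label: str, products: list[ProductLabels]) -> list[str]:
    """Titles of products whose is-a labels match what buyers of `label` also need."""
    # Collect the complementary labels of every product that *is* a `label`.
    needs: set[str] = set()
    for p in products:
        if label in p.is_a:
            needs |= p.complementary
    # Recommend every product whose is-a labels intersect those needs.
    return sorted({p.title for p in products if p.is_a & needs})


# Made-up catalog, loosely following the pen and monitor examples in the text.
catalog = [
    ProductLabels("Gel Pen", {"gel pen"}, {"notebook", "ink refill"}),
    ProductLabels("Notepad", {"notebook"}, {"gel pen"}),
    ProductLabels("Monitor", {"monitor"}, {"monitor stand", "hdmi cable"}),
    ProductLabels("HDMI Cable", {"hdmi cable"}, {"monitor"}),
]

print(recommend_for("gel pen", catalog))  # ['Notepad']
print(recommend_for("monitor", catalog))  # ['HDMI Cable']
```

Because labels are compared by value, the rule only works when two products spell a label identically, which is why the graph keeps one shared `Taxonomy` node per distinct value.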
Because taxonomy labels are shared across products, the pipeline runs in two phases — read it top-to-bottom in main.py:
@coco.fn(memo=True)  # caches each extraction by content, so only changed products are re-tagged
async def extract_taxonomy(detail: str) -> ProductTaxonomyInfo:
    client = instructor.from_litellm(litellm.acompletion, mode=instructor.Mode.JSON)
    result = await client.chat.completions.create(
        model=coco.use_context(LLM_MODEL), response_model=ProductTaxonomyInfo,
        messages=[{"role": "system", "content": TAXONOMY_PROMPT}, {"role": "user", "content": detail}],
    )
    return ProductTaxonomyInfo.model_validate(result.model_dump())

@coco.fn(memo=True)  # Phase 1 (per product): declare the node, extract, carry labels forward
async def process_file(file: FileLike, product_table: neo4j.TableTarget[Product]) -> ProductTaxonomies:
    raw = json.loads(await file.read_text())
    product_id = file.file_path.path.name.removesuffix(".json")
    product_table.declare_record(row=Product(id=product_id, title=raw["title"], price=...))
    info = await extract_taxonomy(PRODUCT_TEMPLATE.render(**raw))
    return ProductTaxonomies(product_id, [t.name for t in info.taxonomies], ...)

@coco.fn  # Phase 2: one pass owns the shared Taxonomy nodes + both edge types
async def build_graph(products, taxonomy_table, product_taxonomy_rel, complementary_rel) -> None:
    for value in {t for p in products for t in (*p.taxonomies, *p.complementary)}:
        taxonomy_table.declare_record(row=Taxonomy(value=value))
    for p in products:
        for t in set(p.taxonomies): product_taxonomy_rel.declare_relation(from_id=p.product_id, to_id=t)
        for t in set(p.complementary): complementary_rel.declare_relation(from_id=p.product_id, to_id=t)
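To see why editing one product costs one extraction, here is a toy model of the two phases in plain Python. It is a sketch under stated assumptions, not how CocoIndex implements `memo=True` or its graph targets: `fake_extract` stands in for the LLM call, `memo_extract` is a hypothetical content-hash cache, and the catalog strings use a made-up `is-a | complementary` format.

```python
import hashlib

Labels = tuple[frozenset[str], frozenset[str]]
calls: list[str] = []  # records every "LLM" invocation so cache hits are visible


def _labels(text: str) -> frozenset[str]:
    return frozenset(part.strip() for part in text.split(",") if part.strip())


def fake_extract(detail: str) -> Labels:
    """Hypothetical stand-in for the LLM: is-a labels before '|', complementary after."""
    calls.append(detail)
    is_a, _, complementary = detail.partition("|")
    return _labels(is_a), _labels(complementary)


_cache: dict[str, Labels] = {}


def memo_extract(detail: str) -> Labels:
    """Phase 1: cache by a hash of the content, so unchanged text is never re-extracted."""
    key = hashlib.sha256(detail.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = fake_extract(detail)
    return _cache[key]


def build_state(catalog: dict[str, str]) -> tuple[set[str], set[tuple[str, str, str]]]:
    """Phase 2: derive the full desired graph (shared nodes, both edge types)."""
    nodes: set[str] = set()
    edges: set[tuple[str, str, str]] = set()
    for product_id, detail in catalog.items():
        is_a, complementary = memo_extract(detail)
        nodes |= is_a | complementary
        edges |= {(product_id, "PRODUCT_TAXONOMY", t) for t in is_a}
        edges |= {(product_id, "PRODUCT_COMPLEMENTARY_TAXONOMY", t) for t in complementary}
    return nodes, edges


v1 = {"pen": "gel pen | notebook, ink refill", "pad": "notebook | gel pen"}
nodes1, edges1 = build_state(v1)
calls_after_first = len(calls)   # both products extracted

v2 = dict(v1, pen="gel pen | notebook")  # edit one product only
nodes2, edges2 = build_state(v2)
calls_after_second = len(calls)  # one extra call, for the edited product

removed_nodes = nodes1 - nodes2  # labels no product supports any more
removed_edges = edges1 - edges2
print(calls_after_first, calls_after_second, removed_nodes)  # 2 3 {'ink refill'}
```

The second run recomputes the whole desired state but pays for only one extraction, and the set difference is what a target would have to delete: `ink refill` disappears because no remaining product references it.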
The [step-by-step walkthrough](https://cocoindex.io/docs/examples/product-recommendation/) covers the data model, the two-phase flow, the extraction schema, and exactly what happens on each kind of change.
</p>

- `gel pen` is one node every product can point at, not a copy per product.
- `@coco.fn(memo=True)` caches each LLM extraction by content; edit one product and only that product re-extracts, then the graph diffs, adding new nodes/edges and removing ones no longer supported anywhere.
- Set `LLM_MODEL` for any provider (OpenAI, Ollama, …). No DSL.
- `LLM_MODEL` is declared with `detect_change=True`, so swapping the model re-extracts everything against it with no cache to clear by hand.

1. Start Neo4j:
docker run -d -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/cocoindex --name cocoindex-neo4j neo4j:5.26-community
2. Configure & install:
cp .env.example .env # set OPENAI_API_KEY (or LLM_MODEL=ollama/llama3.2)
pip install -e .
3. Build the graph — the example ships a products/ folder of sample listings (pens, notebooks, monitors, …):
cocoindex update main
On the 9 sample products that's 9 Product nodes, ~40 Taxonomy nodes, and the two edge types wired up.
4. Explore the recommendations — open Neo4j Browser (neo4j / cocoindex) and ask the graph:
// Recommend products to pair with anything that is a "gel pen":
// find products whose is-a taxonomy matches a pen's complementary taxonomy
MATCH (:Taxonomy {value: "gel pen"})<-[:PRODUCT_TAXONOMY]-(:Product)
      -[:PRODUCT_COMPLEMENTARY_TAXONOMY]->(need:Taxonomy)
MATCH (rec:Product)-[:PRODUCT_TAXONOMY]->(need)
RETURN DISTINCT rec.title
On the sample data, recommending for a pen surfaces the notepad and the multipurpose paper — exactly the cross-sell you'd want.
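The same query can also be run from Python instead of the Browser. This is a minimal sketch, assuming the official `neo4j` driver package is installed separately (`pip install neo4j`; it is not the `neo4j` connector module used in `main.py`) and that the container from step 1 is listening on `bolt://localhost:7687`. The `recommend_titles` helper and the `$label` parameter are names introduced here for illustration.

```python
# Parameterised version of the Browser query; $label replaces the hard-coded "gel pen".
RECOMMEND_QUERY = """
MATCH (:Taxonomy {value: $label})<-[:PRODUCT_TAXONOMY]-(:Product)
      -[:PRODUCT_COMPLEMENTARY_TAXONOMY]->(need:Taxonomy)
MATCH (rec:Product)-[:PRODUCT_TAXONOMY]->(need)
RETURN DISTINCT rec.title AS title
"""


def recommend_titles(
    label: str,
    uri: str = "bolt://localhost:7687",            # port mapped by the docker command
    auth: tuple[str, str] = ("neo4j", "cocoindex"),  # credentials from NEO4J_AUTH
) -> list[str]:
    """Return titles of products that complement anything tagged `label`."""
    # Imported lazily so this module loads even where the driver is not installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=auth) as driver:
        # Keyword arguments without a trailing underscore are sent as query parameters.
        records, _summary, _keys = driver.execute_query(RECOMMEND_QUERY, label=label)
    return [record["title"] for record in records]
```

Once the graph has been built, calling `recommend_titles("gel pen")` should return the same titles the Browser query shows. Passing the label as a parameter rather than formatting it into the string avoids quoting problems with labels that contain quotes.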
<a href="https://cocoindex.io/docs">Docs</a> · <a href="https://cocoindex.io/docs/examples/product-recommendation/">Walkthrough</a> · <a href="https://discord.com/invite/zpA9S2DR7s">Discord</a> · <a href="https://github.com/cocoindex-io/cocoindex/tree/main/examples"><b>See all examples →</b></a>
</p>