<p align="center"> <a href="https://cocoindex.io/docs/examples/meeting-notes-to-knowledge-graph/" title="Turn Google Drive meeting notes into a self-updating knowledge graph with CocoIndex and FalkorDB — LLM extraction, embedding-based person resolution, in plain async Python"> </a> </p> <h1 align="center">Turn meeting notes into a <em>self-updating</em> graph in FalkorDB.</h1> <p align="center"> <b>An LLM pulls the organizer, participants, and tasks out of each meeting; an embedding + LLM pass collapses "Alice", "Alice Chen", and "alice c." into <em>one</em> Person node — into FalkorDB, in plain async Python.</b>

Point it at a Drive folder of Markdown notes, and it re-extracts only the note you edited, then reconciles the graph.

</p> <p align="center"> <strong>Star us&nbsp;❤️&nbsp;→</strong>&nbsp;<a href="https://github.com/cocoindex-io/cocoindex" title="Star CocoIndex on GitHub"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-light.svg"></picture></a> &nbsp;·&nbsp; <a href="https://cocoindex.io/docs/examples/meeting-notes-to-knowledge-graph/" title="Read the full walkthrough"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-light.svg"></picture></a> &nbsp;·&nbsp; <a href="https://discord.com/invite/zpA9S2DR7s" title="Join the CocoIndex Discord"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-light.svg"></picture></a> </p> <div align="center">

</div>

This is the meeting-notes knowledge graph example, targeting FalkorDB instead of Neo4j. FalkorDB is a Redis-based property graph database that you query in Cypher. Meeting notes are a graph pretending to be a folder of documents: every note records who ran the meeting, who showed up, what got decided, and who owns each task. But it's prose scattered across a shared drive, so you can full-text search it and not much else. You declare the transformation, target_state = transformation(source_state), in native Python with your own types, and the heavy lifting (incremental processing, change tracking, managed graph targets) runs in a Rust engine underneath. Editing one note re-extracts only that note, and the graph reconciles itself: no orphaned people, no stale edges, no cleanup scripts.

How it works

Three node types, three relationship types, and "who is on the hook for what" becomes an edge you traverse:

  • Meeting nodes: one per meeting section, keyed by a stable integer id derived from (note_file, date).
  • Person nodes: canonical organizers, participants, and assignees, deduplicated by an embedding + LLM entity-resolution pass.
  • Task nodes: tasks decided in meetings, keyed by description.
  • ATTENDED edges: Person → Meeting, carrying an is_organizer flag.
  • DECIDED edges: Meeting → Task.
  • ASSIGNED_TO edges: Person → Task.
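Meeting ids are described as stable integers derived from (note_file, date); in main.py they come from id_generator.next_id(...). One hypothetical way to get that stability with no stored counter is to hash the pair, sketched here with hashlib. `meeting_id` is an illustrative name, and this is not necessarily how the example computes its ids.

```python
import hashlib


def meeting_id(note_file: str, date: str) -> int:
    """Hypothetical stable id: the same (note_file, date) always maps to the same int."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from hashing identically.
    payload = f"{note_file}\x00{date}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    # Read the 8 bytes as an unsigned int, then mask to 63 bits so it fits a signed 64-bit property.
    return int.from_bytes(digest, "big") & ((1 << 63) - 1)
```

Because the id depends only on the note and the date, re-running the pipeline re-declares the same Meeting node instead of minting a new one.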

Because people are shared across notes, the pipeline runs in three phases, which main.py defines top to bottom:

```python
# Excerpt from main.py; "..." marks elided code.

# Phase 1, per note: split into meetings, declare Meeting/Task + DECIDED, carry raw names forward
@coco.fn(memo=True)
async def process_file(file, meeting_table, task_table, decided_rel) -> list[MeetingExtraction]:
    for section in _split_meetings(await file.read_text()):
        extracted = await extract_meeting(section)
        meeting_id = await id_generator.next_id(extracted.time)
        meeting_table.declare_record(row=Meeting(id=meeting_id, ...))
        for task in extracted.tasks:
            task_table.declare_record(row=Task(description=task.description))
            decided_rel.declare_relation(from_id=meeting_id, to_id=task.description)
        ...

# Phase 2: collapse "Alice" / "Alice Chen" / "alice c." into canonical names
@coco.fn(memo=True)
async def _resolve_persons(raw_persons: set[str]) -> ResolvedEntities:
    return await resolve_entities(
        entities=raw_persons,
        embedder=coco.use_context(EMBEDDER),
        resolve_pair=LlmPairResolver(model=coco.use_context(RESOLUTION_LLM_MODEL)),
    )

# Phase 3: declare canonical Person nodes + ATTENDED / ASSIGNED_TO using resolved names
@coco.fn
async def create_person_relations(meetings, persons, person_table, attended_rel, assigned_rel) -> None:
    for canonical_name in persons.canonicals():
        person_table.declare_record(row=Person(name=canonical_name))
    ...
```
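The ordering constraint between the phases can be seen in a dependency-free toy: phase 2 needs the union of raw names from every note, so it can only start once all phase-1 work has finished. Everything in this sketch (`RawMeeting`, `extract`, `toy_resolve`, `run`, and the "Organizer | P1, P2" note format) is hypothetical; the real pipeline uses CocoIndex components and LLM calls, not asyncio.gather.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class RawMeeting:
    """Hypothetical phase-1 output: one meeting plus its raw, unresolved names."""
    meeting_id: int
    organizer: str
    participants: tuple[str, ...]


async def extract(meeting_id: int, text: str) -> RawMeeting:
    """Stand-in for per-note LLM extraction; expects 'Organizer | P1, P2'."""
    await asyncio.sleep(0)  # pretend this awaits an LLM call
    organizer, rest = text.split("|", 1)
    names = tuple(p.strip() for p in rest.split(",") if p.strip())
    return RawMeeting(meeting_id, organizer.strip(), names)


def toy_resolve(raw: set[str]) -> dict[str, str]:
    """Stand-in resolver: names sharing a lowercase first token map to the longest one."""
    groups: dict[str, list[str]] = {}
    for name in raw:
        groups.setdefault(name.lower().split()[0], []).append(name)
    return {n: max(g, key=lambda m: (len(m), m)) for g in groups.values() for n in g}


async def run(notes: dict[int, str]) -> tuple[set[str], dict[tuple[str, int], bool]]:
    # Phase 1: every note independently (here, concurrently).
    meetings = await asyncio.gather(*(extract(i, t) for i, t in notes.items()))
    # Phase 2 needs ALL raw names, so it starts only after phase 1 completes.
    raw = {n for m in meetings for n in (m.organizer, *m.participants)}
    canon = toy_resolve(raw)
    # Phase 3: canonical Person set and ATTENDED edges keyed by (person, meeting).
    persons = set(canon.values())
    attended: dict[tuple[str, int], bool] = {}
    for m in meetings:
        for n in (m.organizer, *m.participants):
            key = (canon[n], m.meeting_id)
            attended[key] = attended.get(key, False) or n == m.organizer
    return persons, attended
```

Usage: `asyncio.run(run({1: "Alice Chen | alice, Bob", 2: "Bob | Alice"}))` yields two Person entries and one ATTENDED edge per (person, meeting), with the organizer flag set where it applies.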

Extraction uses instructor over LiteLLM with your own Pydantic models. DECIDED and ASSIGNED_TO carry no payload, so the FalkorDB connector derives each edge's identity from its endpoints: one edge per (from, to) pair.
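Why a payload-free edge needs no id of its own: with no properties to tell two DECIDED edges apart, the (from, to) endpoints are the identity, so declaring the same pair twice yields a single edge. This stdlib sketch shows only that rule; `EdgeTable` is hypothetical and not the FalkorDB connector's API. In raw Cypher, `MERGE (m)-[:DECIDED]->(t)` gives a comparable one-edge-per-pair result.

```python
from collections.abc import Hashable


class EdgeTable:
    """Hypothetical store for payload-free edges, identified only by their endpoints."""

    def __init__(self) -> None:
        self._edges: set[tuple[Hashable, Hashable]] = set()

    def declare(self, from_id: Hashable, to_id: Hashable) -> None:
        # Same endpoints -> same key, so a repeated declaration is a no-op.
        self._edges.add((from_id, to_id))

    def pairs(self) -> set[tuple[Hashable, Hashable]]:
        return set(self._edges)
```

If a meeting's notes mention the same task twice, declaring DECIDED for it twice still leaves exactly one edge.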

<p align="center"> 📘 <b><a href="https://cocoindex.io/docs/examples/meeting-notes-to-knowledge-graph/">Full Tutorial →</a></b>

The closest walkthrough is the Neo4j version: same extraction, resolution, and three-phase flow; only the graph store differs. It covers, step by step, the property-graph schema, entity resolution, and exactly what happens on each kind of change.

</p>

Why it's worth a star ⭐

  • Entity resolution built in. CocoIndex's entity_resolution op embeds every raw name, filters by vector similarity, and asks the LLM to confirm only the close pairs — so the same person written five ways collapses to one node, cheaply.
  • Cross-file nodes, owned in one place. People are shared across notes, so no single note's component can own a Person node. Instead, the two cross-file phases (resolution and person relations) own the canonical Person set and every edge that touches a person, declaring each exactly once.
  • Incremental by default. @coco.fn(memo=True) caches each extraction by content: edit one note and only that note re-extracts, then the resolution and graph-diff steps pick up the change. A no-change re-run makes zero LLM calls.
  • Two models on purpose. A stronger LLM_MODEL does the structured extraction; a cheaper RESOLUTION_LLM_MODEL confirms resolution pairs — both are LiteLLM provider strings you can swap.
  • Honest cache busting. The model ids and embedder are declared with detect_change=True, so swapping any of them reruns the work that depends on it, with no cache to clear by hand.

Run it

1. Start FalkorDB (Docker) — the image bundles a browser UI on port 3000:

```sh
docker run -d -p 6379:6379 -p 3000:3000 --name cocoindex-falkordb falkordb/falkordb:latest
```

2. Configure & install — this source reads notes from one or more Google Drive folders shared with a service account (see Setting up a service account):

```sh
cp .env.example .env     # set OPENAI_API_KEY, GOOGLE_SERVICE_ACCOUNT_CREDENTIAL, GOOGLE_DRIVE_ROOT_FOLDER_IDS
pip install -e .
```

Both the extraction and resolution models default to openai/gpt-5-mini; set LLM_MODEL or RESOLUTION_LLM_MODEL to any other LiteLLM provider string to override them. FALKORDB_URI and FALKORDB_GRAPH default to falkor://localhost:6379 and meeting_notes, respectively.
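A minimal sketch of how those defaults could be read, assuming plain environment lookups after .env has been loaded; the `Settings` dataclass, `load_settings`, and `falkordb_host_port` are hypothetical, and main.py may wire this differently.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Settings:
    llm_model: str
    resolution_llm_model: str
    falkordb_uri: str
    falkordb_graph: str

    def falkordb_host_port(self) -> tuple[str, int]:
        """Split falkor://host:port into parts (assumes that URI shape)."""
        parts = urlsplit(self.falkordb_uri)
        return parts.hostname or "localhost", parts.port or 6379


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read overrides from `env`, falling back to the defaults documented above."""
    return Settings(
        llm_model=env.get("LLM_MODEL", "openai/gpt-5-mini"),
        resolution_llm_model=env.get("RESOLUTION_LLM_MODEL", "openai/gpt-5-mini"),
        falkordb_uri=env.get("FALKORDB_URI", "falkor://localhost:6379"),
        falkordb_graph=env.get("FALKORDB_GRAPH", "meeting_notes"),
    )
```

Keeping the two model settings separate is what lets a cheaper model handle the many small resolution checks while the stronger one does extraction.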

3. Build the graph:

```sh
cocoindex update main
```

4. Explore the graph: open the bundled FalkorDB Browser at http://localhost:3000, select the meeting_notes graph, and run these queries one at a time:

```cypher
// Who attended which meetings (organizers included; one ATTENDED edge per attendee)
MATCH (p:Person)-[:ATTENDED]->(m:Meeting)
RETURN p.name, m.note_file, m.time

// Everything one person is on the hook for
MATCH (p:Person {name: "Alice Chen"})-[:ASSIGNED_TO]->(t:Task)
RETURN t.description

// Meetings someone organized
MATCH (p:Person)-[:ATTENDED {is_organizer: true}]->(m:Meeting)
RETURN p.name, m.note_file, m.time
```
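The same queries can also be run from Python. This sketch assumes the `falkordb` client package (`pip install falkordb`), whose `FalkorDB(host=..., port=...)`, `select_graph(...)`, and `query(q, params)` calls return a result whose `result_set` is a list of rows; `tasks_for` and `open_graph` are illustrative helper names. Passing the person's name as a `$name` parameter avoids quoting problems with names like O'Brien.

```python
from typing import Any

TASKS_FOR_PERSON = """
MATCH (p:Person {name: $name})-[:ASSIGNED_TO]->(t:Task)
RETURN t.description
"""


def tasks_for(graph: Any, name: str) -> list[str]:
    """Run the 'on the hook for' query with a parameter instead of string formatting."""
    result = graph.query(TASKS_FOR_PERSON, {"name": name})
    return [row[0] for row in result.result_set]


def open_graph(host: str = "localhost", port: int = 6379, graph_name: str = "meeting_notes") -> Any:
    """Connect to the local FalkorDB container (the import is deferred until called)."""
    from falkordb import FalkorDB  # third-party client: pip install falkordb

    return FalkorDB(host=host, port=port).select_graph(graph_name)
```

With the container running and the graph built, `tasks_for(open_graph(), "Alice Chen")` returns that person's task descriptions.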

This pipeline is the docs knowledge graph example plus an entity-resolution pass, the natural next step once the LLM starts naming the same thing two ways. Want Neo4j instead? See the Neo4j variant.


<p align="center"> If this turned your shared drive into a graph, <a href="https://github.com/cocoindex-io/cocoindex"><b>give CocoIndex a star ⭐</b></a> — it helps a lot.

<a href="https://cocoindex.io/docs">Docs</a> · <a href="https://cocoindex.io/docs/examples/meeting-notes-to-knowledge-graph/">Walkthrough</a> · <a href="https://discord.com/invite/zpA9S2DR7s">Discord</a> · <a href="https://github.com/cocoindex-io/cocoindex/tree/main/examples"><b>See all examples →</b></a>

</p>