Extract structured information from meeting notes stored in Google Drive and build a knowledge graph in Neo4j. The flow ingests Markdown notes, splits them by headings into per-meeting sections, uses an LLM (via LiteLLM + instructor) to parse participants, organizer, time, and tasks, and writes nodes and relationships into the graph.
Please give CocoIndex a star on GitHub to support us, and stay tuned for more updates. Thank you so much 🥥🤗.
- Meeting nodes: one per meeting section, keyed by a stable integer id derived from (note_file, date)
- Person nodes: canonical organizers, participants, and task assignees, deduplicated by an embedding + LLM entity-resolution pass (so "Alice", "Alice Chen", and "alice c." collapse to a single node)
- Task nodes: tasks decided in meetings (keyed by description)
- ATTENDED: Person → Meeting (with is_organizer flag)
- DECIDED: Meeting → Task
- ASSIGNED_TO: Person → Task

The source is one or more Google Drive folders shared with a service account. The flow watches for changes and keeps the graph up to date incrementally.
The pipeline runs in three phases:
1. Split each Markdown note by headings (# / ##) into meeting sections, and for each section extract a structured Meeting via LiteLLM + instructor (date, note, organizer, participants, tasks with assignees). Meeting and Task nodes plus DECIDED edges are declared in this phase. Raw person names are carried forward.
2. The raw person names are resolved to canonical names by the embedding + LLM entity-resolution pass.
3. Person nodes are declared, then ATTENDED and ASSIGNED_TO edges are wired up using resolved names.

CocoIndex reconciles changes incrementally: re-running after editing one note only re-processes the affected sections, and the resolution phase only re-runs when the set of raw names changes.
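The deterministic parts of the first phase can be sketched in plain Python. This is a minimal illustration, not the example's actual code: `split_by_headings` and `stable_meeting_id` are hypothetical names, the SHA-256 based key is only one assumed way to derive a stable integer from (note_file, date), and the LLM extraction step is left out entirely.

```python
import hashlib
import re

# Matches level-1 and level-2 Markdown headings only ("# ..." or "## ...").
HEADING = re.compile(r"^(#{1,2})\s+(.*)$")


def split_by_headings(markdown: str) -> list[tuple[str, str]]:
    """Split a note into (heading, body) sections at # / ## headings.

    Text before the first heading is dropped. Deeper headings (### and
    below) stay inside the body of the enclosing section. Heading-like
    lines inside fenced code blocks are not special-cased in this sketch.
    """
    sections: list[tuple[str, str]] = []
    title = None
    body: list[str] = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            if title is not None:
                sections.append((title, "\n".join(body).strip()))
            title = match.group(2).strip()
            body = []
        elif title is not None:
            body.append(line)
    if title is not None:
        sections.append((title, "\n".join(body).strip()))
    return sections


def stable_meeting_id(note_file: str, date: str) -> int:
    """Derive a deterministic integer key from (note_file, date).

    Assumption: any stable hash would do; SHA-256 is used here for
    illustration only.
    """
    # A NUL separator keeps the two fields from running into each other.
    key = f"{note_file}\x00{date}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep 8 bytes and clear the sign bit so the value fits a signed
    # 64-bit integer, which is the integer type Neo4j stores.
    return int.from_bytes(digest[:8], "big") & 0x7FFF_FFFF_FFFF_FFFF
```

A `hashlib` digest is used rather than Python's built-in `hash()` because string hashes are randomized per process, so the key would change between runs and the same meeting would no longer map to the same node.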
A running Neo4j 5.18+ instance:
docker run -d \
-p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/cocoindex \
--name cocoindex-neo4j \
neo4j:5.26-community
The browser UI is at http://localhost:7474; log in with username neo4j and password cocoindex.
Why 5.18+? Vector index DDL (CREATE VECTOR INDEX … OPTIONS { indexConfig: {...} }) shipped in 5.18. Older Neo4j 5 servers need the db.index.vector.createNodeIndex procedure, which this connector doesn't emit. The flow itself doesn't use vector indexes, but the connector requires 5.18+ for parity.
An LLM key (defaults to OpenAI; configure via LLM_MODEL for other
providers — see LiteLLM providers).
A Google Cloud service account with read access to the source folders, and the folder IDs you want to ingest. See Setup for Google Drive.
Set the following variables (copy .env.example to .env and fill in):
export OPENAI_API_KEY=sk-...
export GOOGLE_SERVICE_ACCOUNT_CREDENTIAL=/absolute/path/to/service_account.json
export GOOGLE_DRIVE_ROOT_FOLDER_IDS=folderId1,folderId2
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=cocoindex
export NEO4J_DATABASE=neo4j
export LLM_MODEL=openai/gpt-5.4
export RESOLUTION_LLM_MODEL=openai/gpt-5-mini # used for entity resolution
Then:
set -a && source .env && set +a
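For orientation, here is one way a Python process could read these variables once they are exported. It is a hedged sketch rather than the example's real configuration code: the `Settings` dataclass and `load_settings` function are hypothetical, and treating every variable as required is an assumption.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed above."""

    credential_path: str
    folder_ids: tuple[str, ...]
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    neo4j_database: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, failing fast on missing values."""
    # GOOGLE_DRIVE_ROOT_FOLDER_IDS is a comma-separated list; tolerate
    # stray spaces and trailing commas.
    raw_ids = env.get("GOOGLE_DRIVE_ROOT_FOLDER_IDS", "")
    folder_ids = tuple(part.strip() for part in raw_ids.split(",") if part.strip())
    if not folder_ids:
        raise ValueError("GOOGLE_DRIVE_ROOT_FOLDER_IDS must list at least one folder id")
    # Indexing with env[...] raises KeyError if a variable is not set.
    return Settings(
        credential_path=env["GOOGLE_SERVICE_ACCOUNT_CREDENTIAL"],
        folder_ids=folder_ids,
        neo4j_uri=env["NEO4J_URI"],
        neo4j_user=env["NEO4J_USER"],
        neo4j_password=env["NEO4J_PASSWORD"],
        neo4j_database=env["NEO4J_DATABASE"],
    )
```

Calling `load_settings()` with no arguments reads the current process environment, so it reflects whatever `source .env` exported; passing a plain dict makes the same function easy to exercise in isolation.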
Install dependencies:
uv pip install -e .
Build/update the graph:
cocoindex update main
Open Neo4j Browser at http://localhost:7474, log in, and run Cypher queries:
// All relationships
MATCH p=()-->() RETURN p LIMIT 100
// Who attended which meetings (including organizer; one edge per attendee)
MATCH (p:Person)-[:ATTENDED]->(m:Meeting)
RETURN p.name, m.note_file, m.time, m.id
// Tasks decided in meetings
MATCH (m:Meeting)-[:DECIDED]->(t:Task)
RETURN m.note_file, m.time, t.description
// Task assignments
MATCH (p:Person)-[:ASSIGNED_TO]->(t:Task)
RETURN p.name, t.description
// Meetings someone organized
MATCH (p:Person)-[r:ATTENDED {is_organizer: true}]->(m:Meeting)
RETURN p.name, m.note_file, m.time
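The same kind of query can also be issued from Python with the official `neo4j` driver (5.x), which is assumed to be installed separately. The sketch below is illustrative only: `fetch_attendance` and `organizers` are hypothetical helpers, and the fallback connection values simply mirror the examples in this README.

```python
import os

# Attendance edges with the is_organizer flag, aliased for easy dict access.
ATTENDANCE_QUERY = """
MATCH (p:Person)-[r:ATTENDED]->(m:Meeting)
RETURN p.name AS person, m.note_file AS note_file, r.is_organizer AS is_organizer
"""


def fetch_attendance() -> list[dict]:
    """Run the attendance query and return one dict per row."""
    # Imported here so this module still loads where the driver is absent.
    from neo4j import GraphDatabase

    uri = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
    auth = (
        os.environ.get("NEO4J_USER", "neo4j"),
        os.environ.get("NEO4J_PASSWORD", "cocoindex"),
    )
    with GraphDatabase.driver(uri, auth=auth) as driver:
        records, _summary, _keys = driver.execute_query(
            ATTENDANCE_QUERY,
            database_=os.environ.get("NEO4J_DATABASE", "neo4j"),
        )
        return [record.data() for record in records]


def organizers(rows: list[dict]) -> list[str]:
    """Return the sorted, de-duplicated names of people who organized a meeting."""
    return sorted({row["person"] for row in rows if row["is_organizer"]})


if __name__ == "__main__":
    print(organizers(fetch_attendance()))
```

Filtering in Python is shown only to keep the example small; for larger graphs, pushing the is_organizer condition into the Cypher pattern avoids transferring rows that are then discarded.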
To wipe the graph between runs:
MATCH (n) DETACH DELETE n
You can also use cypher-shell from the command line:
docker exec -it cocoindex-neo4j cypher-shell -u neo4j -p cocoindex \
"MATCH (p:Person)-[:ATTENDED]->(m:Meeting) RETURN p.name, m.note_file, m.time"