Deutsch | English | Español | français | 日本語 | 한국어 | Português | Русский | 中文
</div> <h2 align="center">Built with CocoIndex ❤️</h2> <!-- Flagship: CocoIndex-code — full-bleed clickable hero --> <p align="center"> <a href="https://cocoindex.io/cocoindex-code" title="CocoIndex-code — flagship MCP server for AI coding agents: AST-aware, incremental, semantic code index. Claude Code and Cursor see your whole repo instantly."><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/cocoindex-code-hero-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/cocoindex-code-hero-light.svg"></picture></a> </p> <p align="center"><a href="examples"><b>See all 20+ examples · updated every week →</b></a></p> <h3 align="center">Get started</h3>

```sh
pip install -U cocoindex
```

Declare what should be in your target — CocoIndex keeps it in sync forever, recomputing only the Δ.
```python
import cocoindex as coco
from cocoindex.connectors import localfs, postgres
from cocoindex.ops.text import RecursiveSplitter

# Note: `embed` and `PG` are not defined in this snippet; supply your own
# embedding function and Postgres connection before running it.


@coco.fn(memo=True)  # ← cached by hash(input) + hash(code)
async def index_file(file, table):
    for chunk in RecursiveSplitter().split(await file.read_text()):
        table.declare_row(text=chunk.text, embedding=embed(chunk.text))


@coco.fn
async def main(src):
    table = await postgres.mount_table_target(PG, table_name="docs")
    table.declare_vector_index(column="embedding")
    await coco.mount_each(index_file, localfs.walk_dir(src).items(), table)


coco.App(coco.AppConfig(name="docs"), main, src="./docs").update_blocking()
```
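
To make "cached by hash(input) + hash(code)" and "recomputing only the Δ" concrete, here is a toy, standard-library-only sketch of the general idea. It is not CocoIndex's implementation and uses none of its APIs: `ToyEngine`, `fingerprint`, and `chunk_file` are hypothetical names made up for this illustration, and hashing the function's bytecode is only a crude stand-in for a real code hash.

```python
import hashlib


def fingerprint(fn, content: str) -> str:
    """Hash the function's bytecode together with its input (crude code hash)."""
    h = hashlib.sha256()
    h.update(fn.__code__.co_code)
    h.update(content.encode("utf-8"))
    return h.hexdigest()


def chunk_file(content: str) -> list[str]:
    """Stand-in for real processing: split a file into paragraph chunks."""
    return [p.strip() for p in content.split("\n\n") if p.strip()]


class ToyEngine:
    """Hypothetical illustration: memoized processing plus target reconciliation."""

    def __init__(self) -> None:
        self.memo: dict[str, tuple[str, list[str]]] = {}  # path -> (fingerprint, chunks)
        self.target: dict[tuple[str, int], str] = {}      # stands in for a table

    def update(self, files: dict[str, str]) -> dict[str, int]:
        recomputed = 0
        desired: dict[tuple[str, int], str] = {}

        # 1. Build the desired target state, reusing cached results when
        #    neither the input nor the code has changed.
        for path, content in files.items():
            fp = fingerprint(chunk_file, content)
            cached = self.memo.get(path)
            if cached is not None and cached[0] == fp:
                chunks = cached[1]
            else:
                chunks = chunk_file(content)
                recomputed += 1
                self.memo[path] = (fp, chunks)
            for i, text in enumerate(chunks):
                desired[(path, i)] = text

        # 2. Forget cache entries for files that no longer exist.
        for path in list(self.memo):
            if path not in files:
                del self.memo[path]

        # 3. Reconcile: write only new or changed rows, retire stale ones.
        upserts = {k: v for k, v in desired.items() if self.target.get(k) != v}
        stale = [k for k in self.target if k not in desired]
        self.target.update(upserts)
        for k in stale:
            del self.target[k]

        return {"recomputed": recomputed, "upserted": len(upserts), "deleted": len(stale)}


engine = ToyEngine()
files = {"a.md": "alpha\n\nbeta", "b.md": "gamma"}
print(engine.update(files))  # {'recomputed': 2, 'upserted': 3, 'deleted': 0}

files["a.md"] = "alpha\n\nbeta v2"  # edit one file
del files["b.md"]                   # remove another
print(engine.update(files))  # {'recomputed': 1, 'upserted': 1, 'deleted': 1}
```

On the second run only the edited file is reprocessed, only the changed row is written, and the row belonging to the deleted file is retired. A third run with no changes would report zeros across the board.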
Drop in our <a href="skills/cocoindex/"><b>CocoIndex skill</b></a> so your agent writes correct v1 code — concepts, APIs, patterns, all in one file.
<sub>See <a href="https://cocoindex.io/docs/getting_started/ai_coding_agents/">Use with AI coding agents</a> for install steps.</sub>
</p> <p align="center"> <a href="https://cocoindex.io/docs/getting_started/quickstart" title="Full CocoIndex quickstart — install, declare sources and targets, run the incremental engine, set up vector search or knowledge graph in 5 minutes"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/quickstart-btn-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/quickstart-btn-light.svg"></picture></a> <a href="https://cocoindex.io/docs/programming_guide/core_concepts" title="Learn the CocoIndex core concepts — sources, targets, flows, incremental engine, lineage"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/learn-concept-btn-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/learn-concept-btn-light.svg"></picture></a> </p> <p align="center"> <a href="https://github.com/cocoindex-io/cocoindex" title="Star CocoIndex on GitHub — open-source Python framework for live agent context"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/comm-github-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/comm-github-light.svg"></picture></a> </p> <h2 align="center">React — <em>for data engineering</em></h2> <p align="center"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/react4de-hero-dark.svg"> <source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/react4de-hero-light.svg"> </picture> </p> <p align="center"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/either-side-change-dark.svg"> <source media="(prefers-color-scheme: light)" 
srcset="https://cocoindex.io/blobs/github/homepage/either-side-change-light.svg"> </picture> </p> <p align="center"><a href="https://cocoindex.io/react-cocoindex"><b>See the React ↔ CocoIndex mental model →</b></a></p> <h2 align="center"><em>Incremental engine</em> for long-horizon agents</h2> <p align="center"> Data transformation for any engineer, designed for AI workloads, with a smart incremental engine for <em>always-fresh, explainable data.</em>
</p> <p align="center"> <a href="https://cocoindex.io/docs/programming_guide/core_concepts" title="Learn the CocoIndex core concepts — sources, targets, flows, incremental engine, lineage"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/learn-concept-btn-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/learn-concept-btn-light.svg"></picture></a> </p> <p align="center"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/incremental-engine-dark.svg"> <source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/incremental-engine-light.svg"> </picture> </p> <h2 align="center">Why <em>incremental?</em></h2> <p align="center">Your agents are only as good as the data they see. Batch pipelines drift stale. CocoIndex stays live — and only runs the Δ.</p> <p align="center"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/why-incremental-dark.svg"> <source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/why-incremental-light.svg"> </picture> </p> <h2 align="center">What can you <em>build?</em></h2> <p align="center"><a href="examples" title="Browse all 20+ CocoIndex examples on GitHub — code, PDF, HN, knowledge graph, podcast, CSV-to-Kafka, image, and more"><b>See all 20+ examples · updated every week →</b></a></p> <p align="center"><b>Working starters from <a href="examples">the examples tree</a> — clone, plug your source, ship.</b></p> <p align="center"> <a href="examples/code_embedding" title="Real-time code index — walk a git repo, chunk source files with an AST-aware splitter, embed with sentence-transformers, and upsert to pgvector / LanceDB. Fully incremental: only files touched by the latest commit re-embed. 
Good for coding agents, code review, semantic find-by-meaning."></a> </p> <p align="center"> <a href="examples/pdf_embedding" title="PDF → RAG index — ingest PDFs from local / S3 / Google Drive, extract text, chunk with a recursive splitter, embed each chunk, and upsert into pgvector / LanceDB with a vector index. Classic RAG stack, incremental — only edited PDFs re-embed."></a> </p> <p align="center"> <a href="examples/hn_trending_topics" title="HN trending topics — fetch Hacker News threads via the Algolia API, recursively pull nested comments, LLM-extract typed topic lists per message with Gemini 2.5 Flash, and rank topics by weighted mention count (thread = 5 points, comment = 1 point)."></a> </p> <p align="center"> <a href="examples/conversation_to_knowledge" title="Conversation → knowledge graph — pull people, topics, decisions, and action items out of meeting transcripts, Slack, podcasts, or support calls with an LLM extractor, and upsert into Neo4j or Kuzu. Incremental: only changed turns re-extract."></a> </p> <p align="center"> <a href="examples/multi_codebase_summarization" title="Multi-repo summarization — walk N git repositories, extract READMEs / public APIs / modules, LLM-summarize each one, and roll up into a single top-level summary. Incremental: only repos with new commits re-run."></a> </p> <p align="center"> <a href="examples/patient_intake_extraction_baml" title="Structured extraction — read messy forms, PDFs, invoices, or free-text and extract typed, schema-validated fields with BAML or DSPy, then write rows into Postgres or a warehouse. 
Incremental: only changed documents re-extract."></a> </p> <p align="center"> <a href="examples/conversation_to_knowledge" title="Podcast → knowledge graph — download YouTube podcast audio, transcribe with speaker diarization (Whisper / AssemblyAI), LLM-extract structured statements and entities per speaker, resolve duplicates across episodes with embeddings, and store the whole graph (speakers, statements, topics) in SurrealDB or Neo4j. Incremental."></a> </p> <p align="center"> <a href="examples/csv_to_kafka" title="CSV → Kafka live — watch a folder of CSV files (local or S3) and publish each row as a JSON message keyed by its primary key to a Kafka topic on StreamNative / Confluent / self-hosted. Sub-second incremental — only changed rows publish."></a> </p> <p align="center"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/share-build-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/share-build-light.svg"></picture></p> <p align="center">Building something with CocoIndex? <b>We want to see it.</b> Tag <a href="https://x.com/cocoindex_io" title="Tag @cocoindex_io on X to showcase your CocoIndex project">@cocoindex_io</a> on X or drop a link in <a href="https://discord.com/invite/zpA9S2DR7s" title="Share your project in the CocoIndex Discord #showcase channel">#showcase</a> on Discord. We'll boost it. 
🥥</p> <h2 align="center">Community</h2> <table width="100%" border="0" cellspacing="0" role="presentation"> <tr> <td align="center" valign="middle" width="25%"> <a href="https://discord.com/invite/zpA9S2DR7s" title="Join the CocoIndex Discord — community chat, showcase, help, release notes"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/comm-discord-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/comm-discord-light.svg"></picture></a> </td> <td align="center" valign="middle" width="25%"> <a href="https://www.youtube.com/@cocoindex-io" title="Subscribe to the CocoIndex YouTube channel — live demos, tutorials, and deep dives"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/comm-youtube-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/comm-youtube-light.svg"></picture></a> </td> <td align="center" valign="middle" width="25%"> <a href="https://cocoindex.io/blogs/" title="Read the CocoIndex blog — engineering posts, release notes, and tutorials"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/comm-blog-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/comm-blog-light.svg"></picture></a> </td> <td align="center" valign="middle" width="25%"> <a href="https://x.com/cocoindex_io" title="Follow @cocoindex_io on X (Twitter) for release notes, demos, and updates"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/comm-x-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/comm-x-light.svg"></picture></a> </td> </tr> </table> <p align="center"> </p> <p align="center"> <b>We are <em>so</em> excited to meet you.</b> Every typo fix, new 
connector, doc tweak, or full-on rewrite makes CocoIndex better.
Come hang out — big PRs and small ones, both welcome.
</p> <p align="center"> 📝 <a href="https://cocoindex.io/docs/contributing/guide"><b>Read the contributing guide</b></a> · 🐛 <a href="https://github.com/cocoindex-io/cocoindex/labels/good%20first%20issue"><b>good first issues</b></a> · 💬 <a href="https://discord.com/invite/zpA9S2DR7s"><b>Say hi on Discord</b></a> </p> <h2 align="center">CocoIndex <em>Enterprise</em></h2> <p align="center"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/enterprise-scale-dark.svg"> <source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/enterprise-scale-light.svg"> </picture> </p> <h3 align="center">Large corpus — <em>built for enterprise scale.</em></h3> <p align="center"> Incremental compute is the only way to keep large corpora fresh without re-embedding them every cycle. CocoIndex scales from a single repo to petabyte-scale stores — parallel by default, delta-only by design.
</p> <h3 align="center">Process once. <em>Reconcile forever.</em></h3> <p align="center"> When a source changes, CocoIndex identifies the affected records, propagates the change across joins and lookups, updates the target, and retires stale rows —
without touching anything that didn't change.
</p> <h3 align="center">Built on a <em>Rust engine.</em></h3> <p align="center"> The core is Rust — production-grade from day zero. Parallel chunking, zero-copy transforms where possible, and failure isolation
so one bad record doesn't stall the flow.
</p> <p align="center"> <a href="https://cocoindex.io/enterprise/" title="Explore CocoIndex Enterprise — PB-scale incremental data pipelines for AI agents"></a> </p> <p align="center"><sub>Apache 2.0 · © CocoIndex contributors 🥥</sub></p>