docs/src/content/docs/getting_started/overview.mdx
CocoIndex is an ultra-performant framework for building data processing pipelines for AI workloads, with built-in incremental processing.
CocoIndex uses a declarative, state-driven programming model. You specify what your target should look like as a function of your source data, not how to incrementally update it. CocoIndex handles change detection and applies only the necessary updates automatically.
If you've used React, spreadsheets, or materialized views, this model will feel familiar.
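The idea can be illustrated with a toy sketch. This is not CocoIndex's actual API; it is a minimal, hypothetical engine showing the principle: the target is declared as a pure function of the source, and the engine recomputes only the entries whose input content changed, tracked here by a content hash.

```python
import hashlib


def content_hash(value: str) -> str:
    return hashlib.sha256(value.encode()).hexdigest()


class ToyIncrementalEngine:
    """Illustrative only (not CocoIndex's API): maintain a per-item
    target as a pure function of the source, skipping any item whose
    input content is unchanged since the last run."""

    def __init__(self, transform):
        self.transform = transform  # target item = transform(source item)
        self.state = {}             # key -> (input hash, cached output)
        self.recomputed = []        # keys recomputed on the last run

    def run(self, source: dict) -> dict:
        self.recomputed = []
        # Drop targets whose source rows disappeared.
        for key in list(self.state):
            if key not in source:
                del self.state[key]
        # Recompute only new or changed items.
        for key, value in source.items():
            h = content_hash(value)
            cached = self.state.get(key)
            if cached is None or cached[0] != h:
                self.state[key] = (h, self.transform(value))
                self.recomputed.append(key)
        return {k: out for k, (_, out) in self.state.items()}


engine = ToyIncrementalEngine(str.upper)
engine.run({"a.txt": "hello", "b.txt": "world"})    # first run: both computed
result = engine.run({"a.txt": "hello", "b.txt": "world!"})
# Second run: only "b.txt" is recomputed; "a.txt" is served from state.
```

You still write `transform` as plain batch-style code; the engine, not your code, decides what needs rerunning.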
CocoIndex executes pipelines on a high-performance Rust engine, delivering resilient and scalable data processing.
CocoIndex tracks fine-grained dependencies and recomputes only what changed, whether in the input data or in the pipeline code. End-to-end updates drop from hours or days to seconds while preserving full correctness.
Every processing step, intermediate result, and execution path is inspectable. This supports the transparency requirements of the EU AI Act and satisfies enterprise auditability and traceability needs.
Sources and targets plug in through a standard, open interface (no vendor lock-in). Leverage the full Python ecosystem for models, functions, and libraries.
Pipelines parallelize automatically, with managed concurrency and request batching that reduce GPU cost, RPC fan-out, and end-to-end latency.
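Request batching can be sketched as follows. This is an illustrative micro-batcher, not CocoIndex's implementation: individual requests are buffered and flushed as one batched backend call, amortizing per-call overhead such as GPU kernel launches or RPC round trips.

```python
class ToyBatcher:
    """Illustrative micro-batching (not CocoIndex's API): buffer
    individual requests and flush them as one batched call."""

    def __init__(self, batch_fn, max_batch=8):
        self.batch_fn = batch_fn    # processes a whole list in one call
        self.max_batch = max_batch
        self.pending = []
        self.calls = 0              # number of batched backend calls made

    def submit(self, item):
        self.pending.append(item)
        if len(self.pending) >= self.max_batch:
            return self.flush()
        return []

    def flush(self):
        if not self.pending:
            return []
        batch, self.pending = self.pending, []
        self.calls += 1
        return self.batch_fn(batch)


# A stand-in "model" that embeds a whole batch in one call.
batcher = ToyBatcher(lambda texts: [len(t) for t in texts], max_batch=4)
outputs = []
for text in ["a", "bb", "ccc", "dddd", "ee"]:
    outputs += batcher.submit(text)
outputs += batcher.flush()
# Five requests are served with two backend calls instead of five.
```

A production batcher would also flush on a timeout so small trickles of traffic are not delayed indefinitely; that detail is omitted here for brevity.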
The engine gracefully retries transient failures and resumes from previous progress after interruptions, eliminating manual backfills and replays.
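The retry-and-resume behavior can be sketched with a toy checkpoint, again hypothetical rather than CocoIndex's actual mechanism: completed items are recorded in a checkpoint so a restarted run skips them, and transient failures are retried a bounded number of times.

```python
def run_with_resume(items, process, checkpoint, max_retries=3):
    """Illustrative only (not CocoIndex's API): process items in order,
    skipping items already recorded in the checkpoint and retrying
    transient failures up to max_retries times."""
    results = {}
    for item in items:
        if item in checkpoint:
            results[item] = checkpoint[item]  # resume: already done
            continue
        for _ in range(max_retries):
            try:
                out = process(item)
            except RuntimeError:
                continue  # transient failure: retry
            checkpoint[item] = out  # persist progress as we go
            results[item] = out
            break
        else:
            raise RuntimeError(f"{item} failed after {max_retries} tries")
    return results


# A process that fails transiently on its first attempt at "b".
attempts = {"b": 0}
def flaky(item):
    if item == "b" and attempts["b"] == 0:
        attempts["b"] += 1
        raise RuntimeError("transient")
    return item.upper()

checkpoint = {"a": "A"}  # "a" finished before an interruption
results = run_with_resume(["a", "b", "c"], flaky, checkpoint)
# "a" is skipped, "b" succeeds on retry, "c" runs once.
```

In a real system the checkpoint would live in durable storage (CocoIndex keeps its processing state in a backing store), so a crashed worker picks up where it left off.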
CocoIndex removes the need for elaborate plumbing: refreshing datasets, maintaining state, handling backfills, ensuring correctness, coordinating GPUs, scaling workers, and managing infra are all handled by the engine.
CocoIndex continuously maintains and tracks state while processing only new or changed data. It is designed to support incremental processing from day zero.
What incremental processing means:
You write simple batch-style code, with no delta logic and no state handling. CocoIndex automatically runs your pipeline incrementally and keeps the output up to date for serving, training, or feature computation.