docs/mintlify/cloud/sync/overview.mdx
Chroma Sync is a managed ingestion service for Chroma Cloud. Point a source — an S3 bucket, a GitHub repository, a website, or an individual file upload — at a Chroma database, and Chroma parses, chunks, embeds, and indexes the data into a collection that's ready to query. No ingest infrastructure to write, no embedding API keys to manage. The Sync API is available to all Chroma Cloud users and the first $5 of usage is free with a new account.
Sync runs the same pipeline regardless of source: it parses, chunks, embeds, and indexes the data into a collection.
Chroma Sync supports four source types: S3, GitHub, Web, and File Upload. Each has its own walkthrough and configuration reference.
Need a source type that isn't here? Email [email protected].
Chroma Sync has three primary concepts: source types, sources, and invocations.
A source type defines a kind of entity that can be chunked, embedded, and indexed (e.g. S3, GitHub, Web, File Upload). A source is a configured instance of a source type — for example, a specific S3 bucket with credentials and a path prefix. An invocation is one sync run over a source's data; each invocation produces or appends to one Chroma collection.
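The relationship between the three concepts can be sketched as a small data model. This is only an illustration of the terminology above, not the Sync API's actual schema: the class names, the enum string values, and the `settings` field are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class SourceType(Enum):
    """The four source types named in this overview (string values are illustrative)."""
    S3 = "s3"
    GITHUB = "github"
    WEB = "web"
    FILE_UPLOAD = "file_upload"


@dataclass
class Source:
    """A configured instance of a source type (hypothetical shape)."""
    source_type: SourceType
    database_name: str
    # Source-type-specific settings, e.g. a bucket name and path prefix for S3.
    settings: dict = field(default_factory=dict)


@dataclass
class Invocation:
    """One sync run over a source's data, writing into one collection."""
    source: Source
    target_collection_name: str


# One source can be invoked many times; here both runs target the same collection.
docs_bucket = Source(
    SourceType.S3,
    "my-database",
    {"bucket_name": "docs", "path_prefix": "guides/"},
)
first_run = Invocation(docs_bucket, "docs-guides")
second_run = Invocation(docs_bucket, "docs-guides")
```

The point of the sketch is the cardinality: a source type has many sources, and a source has many invocations, each tied to exactly one collection.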
Every source, regardless of type, is configured with a target database and an embedding configuration. Source-type-specific fields (bucket name, repository, starting URL, etc.) are documented on each source type's page.
{
  "database_name": "string",
  "embedding": {
    "dense": {
      "model": "Qwen/Qwen3-Embedding-0.6B"
    }
  }
}
database_name is the Chroma database in which collections will be created. The database must already exist.

embedding.dense.model is the dense embedding model. Currently only Qwen/Qwen3-Embedding-0.6B is supported. Reach out to [email protected] to request additional models.

You can optionally configure sparse embeddings alongside dense embeddings:
{
  "embedding": {
    "dense": { "model": "Qwen/Qwen3-Embedding-0.6B" },
    "sparse": {
      "model": "Chroma/BM25",
      "key": "sparse_embedding"
    }
  }
}
embedding.sparse.model is the sparse embedding model: Chroma/BM25 or prithivida/Splade_PP_en_v1.

embedding.sparse.key is the metadata key under which sparse embeddings are stored.

You can also override the chunking strategy:
{
  "chunking": {
    "type": "tree_sitter",
    "max_size_bytes": 8192
  }
}
chunking.type is the chunking strategy: tree_sitter (syntax-aware, configured with max_size_bytes) or lines (line-based, configured with max_lines and max_size_bytes).

Each invocation may specify a target collection:
{
  "target_collection_name": "string"
}
target_collection_name is the Chroma collection to write into. The collection is created on first use, or appended to if it already exists. It is required for GitHub and Web invocations, optional for S3 (where it defaults to the source's collection_name), and set automatically for file uploads via the collection_name form field. If the target collection has already finished an ingest (its metadata has finished_ingest=true), creating an invocation returns 409 Conflict.

Source-type-specific invocation fields (S3 object_key, GitHub ref_identifier, etc.) are documented on each source type's page.
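The shared configuration fields and the target collection rule described above can be assembled and checked client-side before anything is sent. The following is a minimal sketch, not part of any Chroma SDK: the function names and the `"s3"`, `"github"`, and `"web"` labels are hypothetical, and it assumes the `chunking` object sits alongside `embedding` at the top level of the source configuration.

```python
# Values listed in this overview; anything else is rejected locally.
DENSE_MODELS = {"Qwen/Qwen3-Embedding-0.6B"}
SPARSE_MODELS = {"Chroma/BM25", "prithivida/Splade_PP_en_v1"}
CHUNKING_TYPES = {"tree_sitter", "lines"}


def build_source_config(database_name, dense_model="Qwen/Qwen3-Embedding-0.6B",
                        sparse=None, chunking=None):
    """Assemble the fields shared by every source type (hypothetical helper)."""
    if dense_model not in DENSE_MODELS:
        raise ValueError(f"unsupported dense model: {dense_model}")
    config = {
        "database_name": database_name,
        "embedding": {"dense": {"model": dense_model}},
    }
    if sparse is not None:
        if sparse["model"] not in SPARSE_MODELS:
            raise ValueError(f"unsupported sparse model: {sparse['model']}")
        config["embedding"]["sparse"] = {"model": sparse["model"], "key": sparse["key"]}
    if chunking is not None:
        if chunking["type"] not in CHUNKING_TYPES:
            raise ValueError(f"unsupported chunking type: {chunking['type']}")
        # Assumption: chunking is a top-level sibling of "embedding".
        config["chunking"] = dict(chunking)
    return config


def resolve_target_collection(source_type, requested=None, source_collection_name=None):
    """Apply the per-source-type rule for target_collection_name.

    File uploads are not handled here because their collection is set
    through the collection_name form field instead.
    """
    if source_type in ("github", "web"):
        if not requested:
            raise ValueError("target_collection_name is required for GitHub and Web")
        return requested
    if source_type == "s3":
        # Optional for S3: fall back to the source's collection_name.
        return requested or source_collection_name
    raise ValueError(f"unhandled source type: {source_type}")


config = build_source_config(
    "my-database",
    sparse={"model": "Chroma/BM25", "key": "sparse_embedding"},
    chunking={"type": "tree_sitter", "max_size_bytes": 8192},
)
```

Validating locally only catches obvious mistakes early. The service remains the authority on what it accepts, and a 409 Conflict for an already-finished collection can only be detected from the API response.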
The Sync API authenticates with a Chroma Cloud API key sent in the x-chroma-token header.
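As a rough illustration, an authenticated request can be prepared with Python's standard library as shown below. The URL is a deliberate placeholder, since endpoint paths are not listed on this page, and the JSON content type is an assumption based on the JSON bodies shown above. Only the `x-chroma-token` header name comes from this document. The request is built but not sent.

```python
import json
import urllib.request

# Placeholder only: look up the real endpoint paths in the Sync API Reference.
PLACEHOLDER_URL = "https://sync.example.invalid/placeholder-endpoint"


def build_sync_request(api_key, body, url=PLACEHOLDER_URL):
    """Prepare (but do not send) a POST request carrying the API key header."""
    if not api_key:
        raise ValueError("a Chroma Cloud API key is required")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-chroma-token": api_key,           # header name from this page
            "Content-Type": "application/json",  # assumed content type
        },
        method="POST",
    )


request = build_sync_request("example-key", {"target_collection_name": "docs"})
```

Passing the prepared request to `urllib.request.urlopen` would send it. In practice the key would typically be loaded from an environment variable or a secret store rather than written into source code.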
For the full request and response schemas of every endpoint, see the Sync API Reference.