import { Callout } from '/snippets/callout.mdx';
A Schema has two main components that work together to control indexing behavior:
Defaults define index configuration for all keys of a given data type. When you add metadata to your collection, Chroma looks at the value type (string, int, float, etc.) and applies the default index configuration for that type.
For example, if you disable string inverted indexes globally, no string metadata fields will be indexed unless you create a key-specific override.
Keys define index configuration for specific metadata fields. These override the defaults for individual fields, giving you fine-grained control.
For example, you might disable string indexing globally but enable it specifically for a "category" field that you frequently filter on.
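When determining whether to index a field, Chroma follows this precedence:

1. A key-specific configuration, if one exists for the field.
2. Otherwise, the default configuration for the field's value type.
3. Otherwise, the built-in default for that value type.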
This means you can set broad defaults and then override them for specific fields as needed.
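The lookup order (key-specific override first, then the schema default for the value type, then the built-in default) can be modelled in a few lines of Python. This is a toy sketch, not Chroma's implementation: `value_type_of`, `is_indexed`, and the dictionaries are hypothetical names, and each field is collapsed to a single on/off inverted-index decision, whereas a real Schema configures index types individually. It assumes the built-in defaults enable inverted indexes for string, int, float, and bool metadata.

```python
from typing import Dict

# Assumed built-in behaviour: every supported metadata type is indexed.
BUILT_IN_DEFAULTS: Dict[str, bool] = {
    "string": True,
    "int": True,
    "float": True,
    "bool": True,
}


def value_type_of(value: object) -> str:
    """Map a Python metadata value to a value-type name (toy model)."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported metadata value: {value!r}")


def is_indexed(
    key: str,
    value: object,
    key_overrides: Dict[str, bool],
    type_defaults: Dict[str, bool],
) -> bool:
    """Decide whether a metadata field is indexed in this toy model."""
    # 1. A key-specific override wins if present.
    if key in key_overrides:
        return key_overrides[key]
    # 2. Otherwise use the schema-level default for the value type.
    value_type = value_type_of(value)
    if value_type in type_defaults:
        return type_defaults[value_type]
    # 3. Otherwise fall back to the built-in default.
    return BUILT_IN_DEFAULTS[value_type]


# String indexing disabled globally, re-enabled only for "category".
type_defaults = {"string": False}
key_overrides = {"category": True}

print(is_indexed("category", "science", key_overrides, type_defaults))  # True
print(is_indexed("title", "Intro", key_overrides, type_defaults))       # False
print(is_indexed("year", 2024, key_overrides, type_defaults))           # True
```

Only `category` is indexed among the string fields, while `year` is untouched by the string-level default and keeps its built-in behaviour.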
Without providing a Schema, collections use built-in defaults for indexing. For a complete overview of all value types, index types, and their defaults, see the Index Configuration Reference.
Chroma uses two reserved key names:
K.DOCUMENT (#document) stores document text content with FTS enabled and String Inverted Index disabled. This allows full-text search while avoiding redundant indexing.
K.EMBEDDING (#embedding) stores dense vector embeddings with Vector Index enabled, sourcing from K.DOCUMENT. This enables semantic similarity search.
<CodeGroup>
```python Python
# Without a Schema, the collection uses the built-in defaults.
# `client` is assumed to be an already-configured Chroma client.
collection = client.create_collection(name="my_collection")

collection.add(
    ids=["id1"],
    documents=["Some text"],      # FTS index
    embeddings=[[1.0, 2.0]],      # Vector index
    metadatas=[{
        "category": "science",    # String inverted index
        "year": 2024,             # Int inverted index
        "score": 0.95,            # Float inverted index
        "published": True         # Bool inverted index
    }]
)
```

```typescript TypeScript
// Without a Schema, the collection uses the built-in defaults
const collection = await client.createCollection({ name: "my_collection" });

await collection.add({
  ids: ["id1"],
  documents: ["Some text"],
  metadatas: [{
    category: "science", // String inverted index
    year: 2024,          // Int inverted index
    score: 0.95,         // Float inverted index
    published: true      // Bool inverted index
  }]
});
```
</CodeGroup>
Create a Schema object to customize index configuration:
<CodeGroup>
```python Python
from chromadb import Schema

# Create an empty schema (starts with defaults)
schema = Schema()
```

```typescript TypeScript
import { Schema } from 'chromadb';

// Create an empty schema (starts with defaults)
const schema = new Schema();

// The schema is now ready to be configured
```
</CodeGroup>
Use create_index() to enable or configure indexes. The method takes:
- config: An index configuration object (or None to enable all indexes for a key)
- key: Optional. A metadata field name for key-specific configuration

The method returns the Schema object, enabling method chaining.
Create indexes that apply globally. This example shows configuring the vector index with custom settings:
<CodeGroup>
```python Python
from chromadb import Schema, VectorIndexConfig
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

schema = Schema()

# Configure vector index with custom embedding function
embedding_function = OpenAIEmbeddingFunction(
    api_key_env_var="OPENAI_API_KEY",
    model_name="text-embedding-3-small"
)

schema.create_index(config=VectorIndexConfig(
    space="cosine",
    embedding_function=embedding_function
))
```

```typescript TypeScript
import { Schema, VectorIndexConfig } from 'chromadb';
import { OpenAIEmbeddingFunction } from '@chroma-core/openai';

const schema = new Schema();

// Configure vector index with custom embedding function
const embeddingFunction = new OpenAIEmbeddingFunction({
  apiKeyEnvVar: "OPENAI_API_KEY",
  modelName: "text-embedding-3-small"
});

schema.createIndex(new VectorIndexConfig({
  space: "cosine",
  embeddingFunction: embeddingFunction
}));
```
</CodeGroup>
Configure indexes for specific metadata fields. This example shows configuring the sparse vector index with custom settings:
<CodeGroup>
```python Python
from chromadb import Schema, SparseVectorIndexConfig, K
from chromadb.utils.embedding_functions import ChromaCloudSpladeEmbeddingFunction

schema = Schema()

# Add sparse vector index for a specific key (required for hybrid search)
sparse_ef = ChromaCloudSpladeEmbeddingFunction()

schema.create_index(
    config=SparseVectorIndexConfig(
        source_key=K.DOCUMENT,
        embedding_function=sparse_ef
    ),
    key="sparse_embedding"
)
```

```typescript TypeScript
import { Schema, SparseVectorIndexConfig, K } from 'chromadb';
import { ChromaCloudSpladeEmbeddingFunction } from '@chroma-core/chroma-cloud-splade';

const schema = new Schema();

// Add sparse vector index for a specific key (required for hybrid search)
const sparseEf = new ChromaCloudSpladeEmbeddingFunction({
  apiKeyEnvVar: "CHROMA_API_KEY"
});

schema.createIndex(
  new SparseVectorIndexConfig({
    sourceKey: K.DOCUMENT,
    embeddingFunction: sparseEf
  }),
  "sparse_embedding"
);
```
</CodeGroup>
Use delete_index() to disable indexes. Like create_index(), it takes:
- config: An index configuration object (or None to disable all indexes for a key)
- key: Optional. A metadata field name for key-specific configuration

The method returns the Schema object for method chaining.
<CodeGroup>
```python Python
from chromadb import Schema, StringInvertedIndexConfig, IntInvertedIndexConfig

schema = Schema()

# Disable string inverted index globally
schema.delete_index(config=StringInvertedIndexConfig())

# Disable int inverted index for a specific key
schema.delete_index(config=IntInvertedIndexConfig(), key="unimportant_count")

# Disable all indexes for a specific key
schema.delete_index(key="temporary_field")
```

```typescript TypeScript
import { Schema, StringInvertedIndexConfig, IntInvertedIndexConfig } from 'chromadb';

const schema = new Schema();

// Disable string inverted index globally
schema.deleteIndex(new StringInvertedIndexConfig());

// Disable int inverted index for a specific key
schema.deleteIndex(new IntInvertedIndexConfig(), "unimportant_count");

// Disable all indexes for a specific key
schema.deleteIndex(undefined, "temporary_field");
```
</CodeGroup>
Both create_index() and delete_index() return the Schema object, enabling fluent method chaining:
<CodeGroup>
```python Python
from chromadb import Schema, StringInvertedIndexConfig, IntInvertedIndexConfig

schema = (Schema()
    .delete_index(config=StringInvertedIndexConfig())                  # Disable globally
    .create_index(config=StringInvertedIndexConfig(), key="category")  # Enable for category
    .create_index(config=StringInvertedIndexConfig(), key="tags")      # Enable for tags
    .delete_index(config=IntInvertedIndexConfig()))                    # Disable int indexing
```

```typescript TypeScript
import { Schema, StringInvertedIndexConfig, IntInvertedIndexConfig } from 'chromadb';

const schema = new Schema()
  .deleteIndex(new StringInvertedIndexConfig())              // Disable globally
  .createIndex(new StringInvertedIndexConfig(), "category")  // Enable for category
  .createIndex(new StringInvertedIndexConfig(), "tags")      // Enable for tags
  .deleteIndex(new IntInvertedIndexConfig());                // Disable int indexing
```
</CodeGroup>
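Chaining works because each method hands back the object it was called on. The sketch below illustrates the pattern with a hypothetical `ToySchema` class. It is not Chroma's Schema: for brevity it takes an index name string where the real methods take a config object, and it only records on/off decisions.

```python
from typing import Dict, Optional, Tuple


class ToySchema:
    """Hypothetical stand-in for Schema that only records on/off decisions."""

    def __init__(self) -> None:
        # (index name, key) -> enabled; a key of None means "all keys".
        self.settings: Dict[Tuple[str, Optional[str]], bool] = {}

    def create_index(self, index: str, key: Optional[str] = None) -> "ToySchema":
        self.settings[(index, key)] = True
        return self  # returning self is what makes chaining possible

    def delete_index(self, index: str, key: Optional[str] = None) -> "ToySchema":
        self.settings[(index, key)] = False
        return self


# Chained form
chained = (
    ToySchema()
    .delete_index("string_inverted")
    .create_index("string_inverted", key="category")
)

# Equivalent step-by-step form
stepwise = ToySchema()
stepwise.delete_index("string_inverted")
stepwise.create_index("string_inverted", key="category")

print(chained.settings == stepwise.settings)  # True
```

Both forms produce the same configuration, so chaining is a readability choice and does not change behavior.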
Pass the configured schema to create_collection() or get_or_create_collection():
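<CodeGroup>
```python Python
# Create the collection with the schema, or fetch it if it already exists
collection = client.get_or_create_collection(
    name="my_collection",
    schema=schema
)
```

```typescript TypeScript
// Create collection with schema
const collection = await client.createCollection({
  name: "my_collection",
  schema: schema
});

// Or use getOrCreateCollection (a different variable name avoids redeclaring `collection`)
const sameCollection = await client.getOrCreateCollection({
  name: "my_collection",
  schema: schema
});
```
</CodeGroup>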
Schema configuration is automatically saved with the collection. When you retrieve a collection with get_collection() or get_or_create_collection(), the schema is loaded automatically. You don't need to provide the schema again.