import { Callout } from '/snippets/callout.mdx';
Sparse vectors are high-dimensional vectors with mostly zero values, designed for keyword-based retrieval. Unlike dense embeddings, which capture semantic meaning, sparse vectors excel at:
Sparse vectors use models like SPLADE that assign importance weights to specific tokens, making them complementary to dense semantic embeddings.
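To make the "mostly zero" idea concrete, here is a minimal sketch of how a sparse vector can be represented and compared. It assumes a plain `{dimension_index: weight}` dictionary; the helper `sparse_dot`, the indices, and the weights are all made up for illustration and are not taken from SPLADE or from Chroma's internal storage format.

```python
# Hypothetical illustration: only non-zero dimensions are stored,
# as {dimension_index: weight}. All indices and weights are invented.

def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product over the dimensions that both sparse vectors share."""
    # Iterate over the smaller vector so the work scales with its size
    if len(a) > len(b):
        a, b = b, a
    return sum((w * b[i] for i, w in a.items() if i in b), 0.0)

query = {1012: 1.5, 2204: 0.5}               # two weighted query tokens
doc_a = {1012: 2.0, 2204: 1.0, 3391: 0.5}    # shares both query tokens
doc_b = {4410: 1.0, 5127: 2.0}               # shares no query tokens

print(sparse_dot(query, doc_a))  # 1.5*2.0 + 0.5*1.0 = 3.5
print(sparse_dot(query, doc_b))  # 0.0
```

A document scores above zero only when it shares weighted tokens with the query, which is why sparse retrieval behaves like keyword matching.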
To use sparse vectors, add a sparse vector index to your schema. The key parameter is the name of the metadata field where sparse embeddings will be stored; you can choose any name you like:
<CodeGroup>
```python Python
from chromadb import Schema, SparseVectorIndexConfig, K
from chromadb.utils.embedding_functions import ChromaCloudSpladeEmbeddingFunction

schema = Schema()

# Add sparse vector index for keyword-based search
# "sparse_embedding" is just a metadata key name - use any name you prefer
sparse_ef = ChromaCloudSpladeEmbeddingFunction()
schema.create_index(
    config=SparseVectorIndexConfig(
        source_key=K.DOCUMENT,
        embedding_function=sparse_ef
    ),
    key="sparse_embedding"
)
```

```typescript TypeScript
import { Schema, SparseVectorIndexConfig, K } from 'chromadb';
import { ChromaCloudSpladeEmbeddingFunction } from '@chroma-core/chroma-cloud-splade';

const schema = new Schema();

// Add sparse vector index for keyword-based search
// "sparse_embedding" is just a metadata key name - use any name you prefer
const sparseEf = new ChromaCloudSpladeEmbeddingFunction({
  apiKeyEnvVar: "CHROMA_API_KEY"
});
schema.createIndex(
  new SparseVectorIndexConfig({
    sourceKey: K.DOCUMENT,
    embeddingFunction: sparseEf
  }),
  "sparse_embedding"
);
```
</CodeGroup>
<CodeGroup>
```python Python
import chromadb

# Connect to Chroma Cloud and create a collection that uses the schema
client = chromadb.CloudClient(
    tenant="your-tenant",
    database="your-database",
    api_key="your-api-key"
)

collection = client.create_collection(
    name="hybrid_search_collection",
    schema=schema
)
```

```typescript TypeScript
import { CloudClient } from 'chromadb';

const client = new CloudClient({
  tenant: "your-tenant",
  database: "your-database",
  apiKey: "your-api-key"
});

const collection = await client.createCollection({
  name: "hybrid_search_collection",
  schema: schema
});
```
</CodeGroup>
When you add documents, sparse embeddings are automatically generated from the source key:
<CodeGroup>
```python Python
collection.add(
    ids=["doc1", "doc2", "doc3"],
    documents=[
        "The quick brown fox jumps over the lazy dog",
        "A fast auburn fox leaps over a sleepy canine",
        "Machine learning is a subset of artificial intelligence"
    ],
    metadatas=[
        {"category": "animals"},
        {"category": "animals"},
        {"category": "technology"}
    ]
)

# Sparse embeddings for "sparse_embedding" are generated automatically
# from the documents (source_key=K.DOCUMENT)
```

```typescript TypeScript
await collection.add({
  ids: ["doc1", "doc2", "doc3"],
  documents: [
    "The quick brown fox jumps over the lazy dog",
    "A fast auburn fox leaps over a sleepy canine",
    "Machine learning is a subset of artificial intelligence"
  ],
  metadatas: [
    { category: "animals" },
    { category: "animals" },
    { category: "technology" }
  ]
});

// Sparse embeddings for "sparse_embedding" are generated automatically
// from the documents (sourceKey: K.DOCUMENT)
```
</CodeGroup>
Once configured, you can search using sparse vectors alone or combine them with dense embeddings for hybrid search.
Use sparse vectors for keyword-based retrieval:
<CodeGroup>
```python Python
from chromadb import Search, K, Knn

# Search using sparse embeddings only
sparse_rank = Knn(query="fox animal", key="sparse_embedding")

# Build and execute search
search = (Search()
    .rank(sparse_rank)
    .limit(10)
    .select(K.DOCUMENT, K.SCORE))

results = collection.search(search)

# Process results
for row in results.rows()[0]:
    print(f"Score: {row['score']:.3f} - {row['document']}")
```

```typescript TypeScript
import { Search, K, Knn } from 'chromadb';

// Search using sparse embeddings only
const sparseRank = Knn({ query: "fox animal", key: "sparse_embedding" });

// Build and execute search
const search = new Search()
  .rank(sparseRank)
  .limit(10)
  .select(K.DOCUMENT, K.SCORE);

const results = await collection.search(search);

// Process results
for (const row of results.rows()[0]) {
  console.log(`Score: ${row.score.toFixed(3)} - ${row.document}`);
}
```
</CodeGroup>
Hybrid search combines dense semantic embeddings with sparse keyword embeddings for improved retrieval quality. By merging results from both approaches using Reciprocal Rank Fusion (RRF), you often achieve better results than either approach alone.
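The idea behind the fusion step can be sketched in plain Python. This is a generic weighted RRF illustration, not Chroma's implementation: the helper name `rrf_fuse` is hypothetical, ranks are assumed to start at 1, and a document missing from one list is assumed to contribute nothing for that list. The library's `Rrf` may differ in rank offset, score sign, or handling of missing documents.

```python
# Generic weighted Reciprocal Rank Fusion sketch (assumptions noted above).

def rrf_fuse(rankings: list[list[str]], weights: list[float], k: int = 60):
    """Fuse ranked lists of document ids (best first) into one ranking.

    Each list adds weight / (k + rank) to a document's fused score.
    """
    scores: dict[str, float] = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Higher fused score means a better combined position
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

dense_ranking = ["doc2", "doc1", "doc3"]   # hypothetical dense results
sparse_ranking = ["doc1", "doc3"]          # hypothetical sparse results

fused = rrf_fuse([dense_ranking, sparse_ranking], weights=[0.7, 0.3])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
# Order: doc1, doc3, doc2
```

In this toy input, `doc2` ranks first in the dense list but is absent from the sparse list, so documents that appear in both lists overtake it. Because only ranks enter the formula, the dense and sparse scores do not need to be on comparable scales.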
Use RRF to merge dense and sparse search results:
<CodeGroup>
```python Python
from chromadb import Search, K, Knn, Rrf

# Create RRF ranking combining dense and sparse embeddings
hybrid_rank = Rrf(
    ranks=[
        Knn(query="fox animal", return_rank=True),  # Dense semantic search
        Knn(query="fox animal", key="sparse_embedding", return_rank=True)  # Sparse keyword search
    ],
    weights=[0.7, 0.3],  # 70% semantic, 30% keyword
    k=60
)

# Build and execute search
search = (Search()
    .rank(hybrid_rank)
    .limit(10)
    .select(K.DOCUMENT, K.SCORE))

results = collection.search(search)

# Process results
for row in results.rows()[0]:
    print(f"Score: {row['score']:.3f} - {row['document']}")
```

```typescript TypeScript
import { Search, K, Knn, Rrf } from 'chromadb';

// Create RRF ranking combining dense and sparse embeddings
const hybridRank = Rrf({
  ranks: [
    Knn({ query: "fox animal", returnRank: true }),  // Dense semantic search
    Knn({ query: "fox animal", key: "sparse_embedding", returnRank: true })  // Sparse keyword search
  ],
  weights: [0.7, 0.3],  // 70% semantic, 30% keyword
  k: 60
});

// Build and execute search
const search = new Search()
  .rank(hybridRank)
  .limit(10)
  .select(K.DOCUMENT, K.SCORE);

const results = await collection.search(search);

// Process results
for (const row of results.rows()[0]) {
  console.log(`Score: ${row.score.toFixed(3)} - ${row.document}`);
}
```
</CodeGroup>