stores/couchbase/README.md
A Mastra vector store implementation for Couchbase, enabling powerful vector similarity search capabilities using the official Couchbase Node.js SDK (v4+). Leverages Couchbase Server's built-in Vector Search feature (available in version 7.6.4+).
@mastra/core.
username, password) with permissions to (Docs):
kv role usually covers this).
search_admin role on the relevant bucket/scope).

npm install @mastra/couchbase
# or using pnpm
pnpm add @mastra/couchbase
# or using yarn
yarn add @mastra/couchbase
Let's set up @mastra/couchbase to store and search vectors in your Couchbase cluster.
Step 1: Connect to Your Cluster
Instantiate CouchbaseVector with your cluster details.
import { CouchbaseVector } from '@mastra/couchbase';

// Use couchbases:// for Capella/TLS, couchbase:// for local/non-TLS.
// ssl=no_verify disables TLS certificate verification; use it only for development.
const connectionString = 'couchbases://your_cluster_host?ssl=no_verify';
const username = 'your_couchbase_user';
const password = 'your_couchbase_password';
const bucketName = 'your_vector_bucket';
const scopeName = '_default'; // Or your custom scope name
const collectionName = 'vector_data'; // Or your custom collection name

const vectorStore = new CouchbaseVector({
  connectionString,
  username,
  password,
  bucketName,
  scopeName,
  collectionName,
});
console.log('CouchbaseVector instance created.');
Note: The actual connection to Couchbase happens lazily upon the first operation.
Step 2: Create a Vector Search Index
Define and create a Search Index specifically for vector search on your collection.
const indexName = 'my_vector_search_index';
const vectorDimension = 1536; // Example: OpenAI embedding dimension
try {
  await vectorStore.createIndex({
    indexName,
    dimension: vectorDimension,
    metric: 'cosine', // Or 'euclidean', 'dotproduct'
  });
  console.log(`Search index '${indexName}' created or updated successfully.`);
} catch (error) {
  console.error(`Failed to create index '${indexName}':`, error);
}
Note: Index creation in Couchbase is asynchronous. It might take a short while for the index to become fully built and queryable.
Best practice: Before querying, make sure the index is ready, either with a simple delay (await new Promise(resolve => setTimeout(resolve, 2000));) or, more robustly, by polling the index status.
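The polling approach mentioned above can be sketched as a small generic helper. This is a minimal sketch, not part of @mastra/couchbase: waitUntil and its intervalMs/timeoutMs options are hypothetical names, and the readiness probe is supplied by the caller because this README does not describe a library call that reports index build status.

```typescript
// Hypothetical helper (not provided by @mastra/couchbase): repeatedly runs a
// caller-supplied readiness probe until it returns true or the timeout expires.
async function waitUntil(
  check: () => Promise<boolean>,
  { intervalMs = 500, timeoutMs = 30_000 }: { intervalMs?: number; timeoutMs?: number } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    // A probe that throws is treated as "not ready yet" instead of aborting.
    const ready = await check().catch(() => false);
    if (ready) return;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
}

// Possible usage (assumption: a query against an unready index rejects):
// await waitUntil(async () => {
//   await vectorStore.query({ indexName, queryVector, topK: 1 });
//   return true;
// });
```

Compared with a fixed delay, polling returns as soon as the probe succeeds and fails loudly if the index never becomes usable within the timeout.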
Step 3: Add Your Vectors (Upsert Documents)
Store your vectors and metadata as documents in the designated Couchbase collection.
const vectors = [
  Array(vectorDimension).fill(0.1), // Replace with your actual vectors
  Array(vectorDimension).fill(0.2),
];
const metadata = [
  { source: 'doc1.txt', page: 1, category: 'finance' },
  { source: 'doc2.pdf', page: 5, text: 'This is the text content.', category: 'tech' }, // Example with text
];

try {
  // IDs will be auto-generated UUIDs if not provided
  const ids = await vectorStore.upsert({
    indexName, // Required for dimension validation if tracked
    vectors,
    metadata,
    // ids: ['custom_id_1', 'custom_id_2'], // Optionally provide your own IDs
  });
  console.log('Upserted documents with IDs:', ids);
} catch (error) {
  console.error('Failed to upsert vectors:', error);
}
Note: For large vector batches, Couchbase may need time to process and index all documents. Consider waiting before you query newly inserted vectors; for smaller batches, a simple delay (await new Promise(resolve => setTimeout(resolve, 1000));) can suffice.
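Because this README describes dimension checking during upsert as happening only when the dimension is tracked, a caller-side check before upsert can catch mismatches early. The following is a minimal sketch; assertDimensions is a hypothetical helper, not a function exported by @mastra/couchbase.

```typescript
// Hypothetical pre-flight check: throws if any vector's length differs from
// the dimension the Search index was created with.
function assertDimensions(vectors: number[][], expected: number): void {
  vectors.forEach((vector, i) => {
    if (vector.length !== expected) {
      throw new Error(
        `Vector at position ${i} has length ${vector.length}, expected ${expected}`,
      );
    }
  });
}
```

Calling assertDimensions(vectors, vectorDimension) just before vectorStore.upsert(...) would surface a bad embedding at write time rather than at query time.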
Document structure in Couchbase will resemble:
Document ID: <generated_or_provided_id>
{
  "embedding": [0.1, ...],
  "metadata": { "source": "doc1.txt", "page": 1, "category": "finance" }
}
Document ID: <generated_or_provided_id>
{
  "embedding": [0.2, ...],
  "metadata": { "source": "doc2.pdf", "page": 5, "text": "...", "category": "tech" },
  "content": "This is the text content." // 'content' field added if metadata.text exists
}
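The mapping from a vector and its metadata to the stored document shape shown above can be sketched as follows. This illustrates the described structure only and is not the library's actual code; VectorDocument and toDocument are hypothetical names, and treating only string values of metadata.text as content is an assumption.

```typescript
// Hypothetical model of the stored document shape described above.
interface VectorDocument {
  embedding: number[];
  metadata: Record<string, unknown>;
  content?: string;
}

// Builds a document: the vector goes to "embedding", metadata to "metadata",
// and metadata.text (if it is a string) is copied to a top-level "content".
function toDocument(
  embedding: number[],
  metadata: Record<string, unknown> = {},
): VectorDocument {
  const doc: VectorDocument = { embedding, metadata };
  const text = metadata['text'];
  if (typeof text === 'string') {
    doc.content = text;
  }
  return doc;
}
```

Keeping content at the top level matters because, per this README, it is the content field (not metadata.text) that the generated Search index maps as text.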
Step 4: Find Similar Vectors (Query the Index)
Use the Search Index to find documents with vectors similar to your query vector.
const queryVector = Array(vectorDimension).fill(0.15); // Your query vector
const k = 5; // Number of nearest neighbors to retrieve
try {
  const results = await vectorStore.query({
    indexName,
    queryVector,
    topK: k,
  });
  console.log(`Found ${results.length} similar results:`, results);
} catch (error) {
  console.error('Failed to query vectors:', error);
}
Note: Metadata filters and the includeVector option are not yet supported in query().
Results format:
[
  {
    id: string, // Document ID
    score: number, // Similarity score (higher is better for cosine/dotproduct, lower for euclidean)
    metadata: Record<string, any> // Fields stored in the index (typically includes 'metadata', 'content')
  },
  // ... more results
]
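Since query() does not yet accept a filter, filtering has to happen on the client after results come back. The sketch below shows one way to do that; QueryResult mirrors the results format above, filterResults is a hypothetical helper, and the exact nesting of fields inside each result's metadata property should be checked against real output before writing a predicate.

```typescript
// Shape of one query result, following the results format above.
interface QueryResult {
  id: string;
  score: number;
  metadata?: Record<string, any>;
}

// Hypothetical client-side filter: keeps results matching the predicate,
// preserving their original (score) order, and optionally caps the count.
function filterResults<T extends QueryResult>(
  results: T[],
  predicate: (result: T) => boolean,
  limit: number = results.length,
): T[] {
  return results.filter(predicate).slice(0, limit);
}
```

In practice it may help to request a larger topK than you need, then pass the desired count as limit, so that enough results survive the filter.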
Step 5: Manage Indexes
List, inspect, or delete your vector search indexes.
try {
  // List all Search Indexes in the cluster (may include non-vector indexes)
  const indexes = await vectorStore.listIndexes();
  console.log('Available search indexes:', indexes);

  // Get details about each index (loop variable renamed so it does not shadow indexName)
  for (const name of indexes) {
    const stats = await vectorStore.describeIndex({ indexName: name });
    console.log(`Stats for index '${name}':`, stats);
  }

  // Delete the index when no longer needed
  await vectorStore.deleteIndex({ indexName });
  console.log(`Search index '${indexName}' deleted.`);
} catch (error) {
  console.error('Failed to manage indexes:', error);
}
Note: Deleting an index does NOT delete the vectors stored in the associated Couchbase collection.
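This README describes listIndexes() as returning fully qualified names (e.g., bucket.scope.index), while describeIndex and deleteIndex are described as taking the short name. If that applies in your environment, a small helper can strip the qualifier. This is a sketch under that assumption; shortIndexName is a hypothetical function, not part of the library.

```typescript
// Hypothetical helper: returns the part after the last dot, so
// "bucket.scope.index" becomes "index". Names without a dot are returned as-is.
function shortIndexName(qualifiedName: string): string {
  const lastDot = qualifiedName.lastIndexOf('.');
  return lastDot === -1 ? qualifiedName : qualifiedName.slice(lastDot + 1);
}
```

Splitting on the last dot rather than the first keeps the helper correct even when the bucket name itself contains a period.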
The metric parameter in createIndex and describeIndex uses Mastra terms. These map to Couchbase index definitions as follows:
cosine → cosine
euclidean → l2_norm
dotproduct → dot_product

The createIndex method constructs a Couchbase Search Index definition tailored for vector search. It indexes the embedding field (as type vector) and the content field (as type text), targeting documents within the specified scopeName.collectionName. It enables store and docvalues for these fields. For fine-grained control over the index definition (e.g., different analyzers, type mappings), use the Couchbase SDK or UI directly.

Vectors are stored in the embedding field.
Metadata is stored in the metadata field.
If metadata.text exists, it's copied to the content field.
query results currently return stored fields like metadata and content in the metadata property of the result object, but not the embedding field itself.

CouchbaseVector Methods
constructor(cnn_string, username, password, bucketName, scopeName, collectionName): Creates a new instance and prepares the connection promise.
getCollection(): (Primarily internal) Establishes the connection lazily and gets the Couchbase Collection object.
createIndex({ indexName, dimension, metric? }): Creates or updates a Couchbase Search Index configured for vector search on the collection.
upsert({ indexName, vectors, metadata?, ids? }): Upserts documents containing vectors and metadata into the Couchbase collection. Returns the document IDs used.
query({ indexName, queryVector, topK?, filter?, includeVector? }): Queries the specified Search Index for similar vectors using Couchbase Vector Search. Note: filter and includeVector options are not currently supported.
updateVector({ indexName, id, update }): Updates a specific vector entry by its ID with new vector data and/or metadata. Note: Filter-based updates are not yet implemented.
deleteVector({ indexName, id }): Deletes a single vector by its ID.
deleteVectors({ indexName, ids }): Deletes multiple vectors by their IDs. Note: Filter-based deletion is not yet implemented.
listIndexes(): Lists the names of all Search Indexes in the cluster. Returns fully qualified names (e.g., bucket.scope.index).
describeIndex({ indexName }): Gets the configured dimension, metric (Mastra name), and document count for a specific Search Index (using its short name).
deleteIndex({ indexName }): Deletes a Search Index (using its short name).
disconnect(): Closes the Couchbase client connection. Should be called when done using the store.

cnn_string: Couchbase connection string (e.g., couchbases://host?ssl=no_verify, couchbase://localhost). See Couchbase SDK Docs for all options.
username: Couchbase user with necessary permissions (see Prerequisites).
password: Password for the Couchbase user.
bucketName: Name of the target Couchbase Bucket.
scopeName: Name of the target Scope within the Bucket.
collectionName: Name of the target Collection within the Scope.

The library uses the wanDevelopment configuration profile when connecting via the Couchbase SDK. This profile adjusts certain timeouts to suit development and some cloud environments. For production tuning, consider modifying the library or managing the SDK connection externally.

The createIndex method defines and creates/updates a Couchbase Search index configured for vector search. Index creation in Couchbase is asynchronous; allow a short time after creation before querying, especially on larger datasets.

Vectors are stored in a field named "embedding". Metadata is stored in a field named "metadata". If metadata contains a text property, its value is also copied to a top-level "content" field in the document, which is indexed by the Search index created by this library.

The upsert operation adds/modifies documents directly in the Collection. It does not depend on the Search index existing at the time of upsert. You can insert data before or after creating the index. Couchbase allows multiple Search indexes over the same Collection data.

The library can track the dimension specified in a createIndex call within the same CouchbaseVector instance. If tracked, it performs a basic length check during upsert. Without a tracked dimension, no check is performed during upsert. Errors related to dimension mismatches will typically occur only during the query operation against that specific index.

The filter parameter in the query method is not yet supported by this library. Filtering must be done client-side after retrieving results or by using the Couchbase SDK's Search capabilities directly for more complex queries.

The includeVector: true option in the query method is not yet supported. To retrieve the vector embedding, you must fetch the full document using its ID (returned in the query results) via the Couchbase SDK's Key-Value operations (collection.get(id)).

The describeIndex method currently returns -1 for the count of indexed documents. Use Couchbase tools (UI, CLI, SQL++ query on the collection, Search API) for accurate index statistics.

We truly appreciate your interest in this project! This project is community-maintained, which means it's not officially supported by our support team.
If you need help, have found a bug, or want to contribute improvements, the best place to do that is right here, by opening a GitHub issue. Our support portal is unable to assist with requests related to this project, so we kindly ask that all inquiries stay within GitHub.
Your collaboration helps us all move forward together — thank you!