ai/VECTORDB_CONFIGURATION.md
This document describes the configuration format for vector databases in Conductor.
Conductor supports multiple vector database providers and lets you configure multiple named instances of each type.
Vector databases are configured using a list-based approach under `conductor.vectordb.instances`:

    conductor:
      vectordb:
        instances:
          - name: "instance-name" # Unique identifier for this instance
            type: "database-type" # Type: postgres, mongodb, or pinecone
            <type-specific-config>: # Configuration block for the database type
              # ... type-specific properties
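The format above implies a few checks that can be run before deploying a configuration: each `name` is unique, `type` is one of the three listed values, and the block named after the type is present. The following is a minimal sketch of such a check, not Conductor's own validation logic. It assumes the list under `conductor.vectordb.instances` has already been parsed into Python dictionaries; `validate_instances` and `ALLOWED_TYPES` are hypothetical names introduced for illustration.

```python
# Type identifiers listed in the configuration format above.
ALLOWED_TYPES = {"postgres", "mongodb", "pinecone"}


def validate_instances(instances):
    """Return a list of problems; an empty list means the entries look consistent."""
    problems = []
    seen_names = set()
    for position, entry in enumerate(instances):
        name = entry.get("name")
        db_type = entry.get("type")
        if not name:
            problems.append(f"entry {position}: missing 'name'")
        elif name in seen_names:
            problems.append(f"entry {position}: duplicate name '{name}'")
        else:
            seen_names.add(name)
        if db_type not in ALLOWED_TYPES:
            problems.append(f"entry {position}: unknown type '{db_type}'")
        elif not isinstance(entry.get(db_type), dict):
            # The type-specific block must be keyed by the type identifier.
            problems.append(f"entry {position}: missing '{db_type}' configuration block")
    return problems


# Hypothetical input: the second entry reuses a name and lacks its pinecone block.
instances = [
    {
        "name": "postgres-main",
        "type": "postgres",
        "postgres": {"datasourceURL": "jdbc:postgresql://localhost:5432/vectors"},
    },
    {"name": "postgres-main", "type": "pinecone"},
]
for problem in validate_instances(instances):
    print(problem)
```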
PostgreSQL example:

    conductor:
      vectordb:
        instances:
          - name: "postgres-main"
            type: "postgres"
            postgres:
              datasourceURL: "jdbc:postgresql://localhost:5432/vectors"
              user: "conductor"
              password: "secret"
              dimensions: 1536
              connectionPoolSize: 10
              indexingMethod: "hnsw" # Options: hnsw, ivfflat
              distanceMetric: "cosine" # Options: l2, cosine, inner_product
              tablePrefix: "conductor"
Multiple PostgreSQL instances:

    conductor:
      vectordb:
        instances:
          - name: "postgres-prod"
            type: "postgres"
            postgres:
              datasourceURL: "jdbc:postgresql://prod-db:5432/vectors"
              user: "conductor"
              password: "prod-secret"
              dimensions: 1536
          - name: "postgres-dev"
            type: "postgres"
            postgres:
              datasourceURL: "jdbc:postgresql://dev-db:5432/vectors"
              user: "conductor"
              password: "dev-secret"
              dimensions: 768
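The two instances above are configured with different `dimensions` (1536 and 768), so an embedding produced for one may not fit the other. A rough sketch of a pre-flight check follows. It assumes the instance list is already parsed into dictionaries and uses 256 as the fallback because that is the documented default for `dimensions`; `expected_dimensions` and `check_embedding` are hypothetical helpers, not part of Conductor.

```python
DEFAULT_DIMENSIONS = 256  # documented default for the postgres 'dimensions' property


def expected_dimensions(instances, instance_name):
    """Look up the configured vector size for a named postgres instance."""
    for entry in instances:
        if entry["name"] == instance_name:
            return entry["postgres"].get("dimensions", DEFAULT_DIMENSIONS)
    raise KeyError(f"no instance named '{instance_name}'")


def check_embedding(instances, instance_name, embedding):
    """Raise ValueError when the embedding length differs from the configured dimensions."""
    expected = expected_dimensions(instances, instance_name)
    if len(embedding) != expected:
        raise ValueError(
            f"{instance_name} expects {expected} dimensions, got {len(embedding)}"
        )


instances = [
    {"name": "postgres-prod", "type": "postgres", "postgres": {"dimensions": 1536}},
    {"name": "postgres-dev", "type": "postgres", "postgres": {"dimensions": 768}},
]

check_embedding(instances, "postgres-dev", [0.0] * 768)  # matches, no error
try:
    check_embedding(instances, "postgres-prod", [0.0] * 768)
except ValueError as error:
    print(error)
```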
MongoDB example:

    conductor:
      vectordb:
        instances:
          - name: "mongodb-embeddings"
            type: "mongodb"
            mongodb:
              connectionString: "mongodb+srv://user:[email protected]/"
              database: "conductor"
              collection: "embeddings"
              numCandidates: 100
Pinecone example:

    conductor:
      vectordb:
        instances:
          - name: "pinecone-search"
            type: "pinecone"
            pinecone:
              apiKey: "your-pinecone-api-key"
Instances of different types can be combined in the same list:

    conductor:
      vectordb:
        instances:
          - name: "postgres-prod"
            type: "postgres"
            postgres:
              datasourceURL: "jdbc:postgresql://prod:5432/vectors"
              user: "conductor"
              password: "secret"
              dimensions: 1536
          - name: "pinecone-embeddings"
            type: "pinecone"
            pinecone:
              apiKey: "pk-xxx"
          - name: "mongodb-cache"
            type: "mongodb"
            mongodb:
              connectionString: "mongodb://localhost:27017"
              database: "conductor"
When using vector database tasks in your workflows, reference the instance by its configured name:

    {
      "name": "store_embeddings",
      "taskReferenceName": "store_embeddings_ref",
      "type": "LLM_STORE_EMBEDDINGS",
      "inputParameters": {
        "vectorDB": "postgres-prod",
        "index": "documents",
        "namespace": "my_namespace",
        "embeddings": "${embedding_task.output.embeddings}",
        "metadata": {
          "documentId": "${workflow.input.docId}"
        }
      }
    }
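Because tasks refer to an instance only by name, a typo in `vectorDB` surfaces at runtime. One possible way to catch it earlier is to scan a workflow definition for names that are not configured. The sketch below assumes the workflow is a parsed JSON object with a top-level `tasks` list; it does not descend into nested tasks, and it skips `${...}` expressions because those are only resolved at runtime. `unresolved_vector_dbs` is a hypothetical helper.

```python
def unresolved_vector_dbs(workflow, configured_names):
    """Return (taskReferenceName, vectorDB) pairs that name no configured instance."""
    missing = []
    for task in workflow.get("tasks", []):
        vector_db = task.get("inputParameters", {}).get("vectorDB")
        if vector_db is None:
            continue  # task does not use a vector database
        if isinstance(vector_db, str) and vector_db.startswith("${"):
            continue  # dynamic expression, cannot be checked statically
        if vector_db not in configured_names:
            missing.append((task.get("taskReferenceName"), vector_db))
    return missing


# Hypothetical workflow with one valid and one misspelled instance name.
workflow = {
    "name": "index_documents",
    "tasks": [
        {
            "name": "store_embeddings",
            "taskReferenceName": "store_embeddings_ref",
            "type": "LLM_STORE_EMBEDDINGS",
            "inputParameters": {"vectorDB": "postgres-prod", "index": "documents"},
        },
        {
            "name": "store_more",
            "taskReferenceName": "store_more_ref",
            "type": "LLM_STORE_EMBEDDINGS",
            "inputParameters": {"vectorDB": "postgres-staging", "index": "documents"},
        },
    ],
}
configured = {"postgres-prod", "pinecone-embeddings", "mongodb-cache"}
print(unresolved_vector_dbs(workflow, configured))
```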
PostgreSQL properties:

| Property | Type | Default | Description |
|---|---|---|---|
| datasourceURL | String | Required | JDBC connection URL |
| user | String | Required | Database username |
| password | String | Required | Database password |
| dimensions | Integer | 256 | Vector dimensions |
| connectionPoolSize | Integer | 5 | Connection pool size |
| indexingMethod | String | "hnsw" | Index method (hnsw or ivfflat) |
| distanceMetric | String | "l2" | Distance metric (l2, cosine, inner_product) |
| invertedListCount | Integer | 100 | IVFFlat index parameter |
| tablePrefix | String | null | Prefix for table names |
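To see which values apply when optional properties are omitted, the defaults from the PostgreSQL table can be merged with a partial configuration block. This is only an illustration of the documented defaults, not how Conductor resolves them internally; `effective_postgres_config` and the two constants are names made up for this sketch.

```python
# Required properties and defaults as documented for the postgres block.
POSTGRES_REQUIRED = ("datasourceURL", "user", "password")
POSTGRES_DEFAULTS = {
    "dimensions": 256,
    "connectionPoolSize": 5,
    "indexingMethod": "hnsw",
    "distanceMetric": "l2",
    "invertedListCount": 100,
    "tablePrefix": None,
}


def effective_postgres_config(block):
    """Fill in documented defaults; raise if a required property is absent."""
    missing = [key for key in POSTGRES_REQUIRED if key not in block]
    if missing:
        raise ValueError(f"missing required properties: {', '.join(missing)}")
    # Values given explicitly in the block override the defaults.
    return {**POSTGRES_DEFAULTS, **block}


config = effective_postgres_config(
    {
        "datasourceURL": "jdbc:postgresql://localhost:5432/vectors",
        "user": "conductor",
        "password": "secret",
        "dimensions": 1536,
        "indexingMethod": "ivfflat",
    }
)
print(config["distanceMetric"], config["connectionPoolSize"], config["invertedListCount"])
```

Note that `distanceMetric` falls back to `l2`, so a configuration intended for cosine similarity has to set it explicitly.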

MongoDB properties:

| Property | Type | Default | Description |
|---|---|---|---|
| connectionString | String | Required | MongoDB connection string |
| database | String | Required | Database name |
| collection | String | Optional | Collection name |
| numCandidates | Integer | Optional | Vector search parameter |

Pinecone properties:

| Property | Type | Default | Description |
|---|---|---|---|
| apiKey | String | Required | Pinecone API key |

Legacy format:

    conductor:
      vectordb:
        postgres:
          datasourceURL: "jdbc:postgresql://localhost:5432/vectors"
          user: "conductor"
          password: "secret"

Equivalent list-based format:

    conductor:
      vectordb:
        instances:
          - name: "pgvectordb" # Use old type name for backward compatibility
            type: "postgres"
            postgres:
              datasourceURL: "jdbc:postgresql://localhost:5432/vectors"
              user: "conductor"
              password: "secret"
Note: The type identifiers have been simplified:

- pgvectordb → postgres
- mongovectordb → mongodb
- pineconedb → pinecone

However, for backward compatibility, you can still reference instances using the old type names if you name your instance accordingly.
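The migration can be expressed as a small transformation: each legacy block becomes one list entry whose `name` is the old type name, so existing workflows keep resolving. The sketch below works on already-parsed dictionaries. Only the legacy `postgres` key appears in this document; that MongoDB and Pinecone used `mongodb` and `pinecone` keys in the legacy format is an assumption, and `migrate_vectordb` is a hypothetical helper.

```python
# Old type name (still usable as an instance name) keyed by the new type identifier.
LEGACY_NAMES = {
    "postgres": "pgvectordb",
    "mongodb": "mongovectordb",
    "pinecone": "pineconedb",
}


def migrate_vectordb(legacy):
    """Convert a legacy 'conductor.vectordb' mapping into the list-based form."""
    instances = []
    for db_type, legacy_name in LEGACY_NAMES.items():
        block = legacy.get(db_type)  # assumed legacy key; only 'postgres' is documented
        if block is not None:
            instances.append({"name": legacy_name, "type": db_type, db_type: dict(block)})
    return {"instances": instances}


legacy = {
    "postgres": {
        "datasourceURL": "jdbc:postgresql://localhost:5432/vectors",
        "user": "conductor",
        "password": "secret",
    }
}
print(migrate_vectordb(legacy))
```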
- Use descriptive names: Choose instance names that clearly indicate their purpose (e.g., postgres-prod, pinecone-embeddings-search).
- Separate environments: Use different instances for different environments to avoid accidental data mixing.
- Optimize dimensions: Configure dimensions to match your embedding model to avoid runtime errors.
- Connection pooling: Adjust connectionPoolSize based on your workload and database capacity.
- Index selection:
  - hnsw for better query performance (default)
  - ivfflat for faster indexing with slightly lower query performance
- Distance metrics:
  - cosine for normalized embeddings
  - l2 (Euclidean) for absolute distances
  - inner_product for dot product similarity

If you see an error like "Vector DB instance not found: xyz", check that the vectorDB value in the task matches the name of a configured instance.

For PostgreSQL instances, make sure the pgvector extension is enabled in the target database:

    CREATE EXTENSION vector;
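As a rough illustration of the `distanceMetric` options (`l2`, `cosine`, `inner_product`), the three measures can be computed by hand on small vectors. The example vectors are made up, and the database computes these internally; the point is only that the metrics can rank the same candidates differently when vectors are not normalized.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the two directions, ignoring vector length."""
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norms


query = [1.0, 0.0]
same_direction_longer = [3.0, 0.0]  # same direction as the query, three times as long
nearby_but_rotated = [0.8, 0.6]     # unit length, pointing in a different direction

for label, candidate in [
    ("longer", same_direction_longer),
    ("rotated", nearby_but_rotated),
]:
    print(
        label,
        round(l2_distance(query, candidate), 3),
        round(cosine_similarity(query, candidate), 3),
        round(inner_product(query, candidate), 3),
    )
```

Here cosine similarity prefers the longer vector because it points the same way as the query, while l2 prefers the rotated one because it is closer in absolute terms.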