YugabyteDB is a distributed SQL database that provides PostgreSQL compatibility with horizontal scalability and high availability across multiple regions. YugabyteDB's native vector search capabilities with the pgvector extension make it an excellent choice for storing and querying vector embeddings in distributed environments.
:::note
YugabyteDB support is part of `langchain4j-community` and is available from version `1.15.1-beta25` onward.
:::
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-community-yugabytedb</artifactId>
<version>1.15.1-beta25</version>
</dependency>
The YugabyteDB integration provides three main classes:

- `YugabyteDBEmbeddingStore`: The main entry point for storing and searching vector embeddings. It implements LangChain4j's `EmbeddingStore` interface.
- `YugabyteDBEngine`: Manages the database connection and connection pooling using HikariCP.
- `YugabyteDBSchema`: Defines the database schema configuration.

Here is how to create a YugabyteDBEmbeddingStore instance:
YugabyteDBEmbeddingStore store = YugabyteDBEmbeddingStore.builder()
.<builderParameters>
.build();
Here, `<builderParameters>` must include `engine` and `dimension`; all other parameters are optional.

The `YugabyteDBEngine` builder accepts the following parameters:

| Parameter | Description | Default Value | Required/Optional |
|---|---|---|---|
| `host` | Hostname of the YugabyteDB server | localhost | Required if using engine builder |
| `port` | Port number of the YugabyteDB server | 5433 | Required if using engine builder |
| `database` | Name of the database to connect to | yugabyte | Required if using engine builder |
| `username` | Username for database authentication | yugabyte | Required if using engine builder |
| `password` | Password for database authentication | "" (empty) | Required if using engine builder |
| `schema` | Database schema name | public | Optional |
| `usePostgreSQLDriver` | Use the PostgreSQL JDBC driver instead of the YugabyteDB Smart Driver | false | Optional |
| `useSsl` | Enable SSL/TLS for the database connection | false | Optional |
| `sslMode` | SSL mode configuration | disable | Optional |
| `maxPoolSize` | Maximum number of connections in the pool | 10 | Optional |
| `minPoolSize` | Minimum number of idle connections in the pool | 5 | Optional |
| `connectionTimeout` | Connection timeout in milliseconds | 10000 | Optional |
| `idleTimeout` | Idle timeout in milliseconds | 300000 | Optional |
| `maxLifetime` | Maximum lifetime of a connection in milliseconds | 900000 | Optional |
| `applicationName` | Application name for connection identification | langchain4j-yugabytedb | Optional |

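
For orientation, the host, port, database, driver, and SSL settings correspond to the parts of a JDBC connection URL. The sketch below assembles such a URL by hand. It assumes the `jdbc:postgresql://` scheme of the PostgreSQL JDBC driver and the `jdbc:yugabytedb://` scheme of the YugabyteDB Smart Driver; the class `JdbcUrlSketch` and its `build` method are hypothetical, and the URL that `YugabyteDBEngine` composes internally may differ.

```java
// Hypothetical helper for illustration; not part of langchain4j-community-yugabytedb.
final class JdbcUrlSketch {

    // Combines the connection settings into a JDBC URL string.
    static String build(String host, int port, String database,
                        boolean usePostgreSQLDriver, String sslMode) {
        // Assumption: the driver choice only changes the URL scheme.
        String scheme = usePostgreSQLDriver ? "jdbc:postgresql://" : "jdbc:yugabytedb://";
        return scheme + host + ":" + port + "/" + database + "?sslmode=" + sslMode;
    }

    public static void main(String[] args) {
        // Table defaults with the Smart Driver (usePostgreSQLDriver = false).
        System.out.println(build("localhost", 5433, "yugabyte", false, "disable"));
        // Same settings with the PostgreSQL JDBC driver.
        System.out.println(build("localhost", 5433, "yugabyte", true, "disable"));
    }
}
```
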

The `YugabyteDBEmbeddingStore` builder accepts the following parameters:

| Parameter | Description | Default Value | Required/Optional |
|---|---|---|---|
| `engine` | The `YugabyteDBEngine` instance for database connections | None | Required |
| `dimension` | The dimensionality of the embedding vectors. This must match the embedding model being used. Use `embeddingModel.dimension()` to set it dynamically. | None | Required |
| `tableName` | The name of the database table used for storing embeddings | langchain4j_embeddings | Optional |
| `schemaName` | Database schema name | public | Optional |
| `idColumn` | Name of the ID column | id | Optional |
| `contentColumn` | Name of the content/text column | content | Optional |
| `embeddingColumn` | Name of the embedding vector column | embedding | Optional |
| `metadataColumn` | Name of the metadata column | metadata | Optional |
| `metricType` | Distance metric for similarity search: COSINE, EUCLIDEAN, or DOT_PRODUCT | COSINE | Optional |
| `vectorIndex` | Vector index configuration (see Index Configuration below) | HNSWIndex with default settings | Optional |
| `createTableIfNotExists` | Specifies whether to automatically create the embeddings table | true | Optional |
| `metadataStorageConfig` | Configuration object for handling metadata associated with embeddings. Supports three storage modes: COMBINED_JSONB (dynamic metadata stored in JSONB format for optimized querying; recommended), COMBINED_JSON (dynamic metadata stored as JSON), and COLUMN_PER_KEY (static metadata whose keys are known in advance). | COMBINED_JSONB | Optional |

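
To make the three `metricType` options more concrete, the sketch below computes the dot product, Euclidean distance, and cosine similarity of two small vectors in plain Java. It is independent of the library: the class `MetricSketch` and its methods are hypothetical, and how the store converts these raw values into a match score is not shown here.

```java
// Hypothetical helper for illustration; not part of langchain4j-community-yugabytedb.
final class MetricSketch {

    // Dot product: sum of element-wise products.
    static double dot(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Euclidean (L2) distance: 0 means the vectors are identical.
    static double euclidean(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: 1 = same direction, 0 = orthogonal, -1 = opposite.
    static double cosine(float[] a, float[] b) {
        double norms = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
        if (norms == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot(a, b) / norms;
    }

    // Vectors compared under any metric must share the same dimension.
    private static void requireSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "Dimension mismatch: " + a.length + " vs " + b.length);
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {4f, 5f, 6f};
        System.out.println("dot       = " + dot(a, b));
        System.out.println("euclidean = " + euclidean(a, b));
        System.out.println("cosine    = " + cosine(a, b));
    }
}
```

Cosine similarity ignores vector length, while Euclidean distance and dot product are sensitive to it, so the right choice depends on whether your embedding model produces normalized vectors.
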

The `HNSWIndex` builder accepts the following parameters:

| Parameter | Description | Default Value | Required/Optional |
|---|---|---|---|
| `m` | Maximum number of connections per layer. Higher values improve recall but use more memory. | 16 | Optional |
| `efConstruction` | Size of the dynamic candidate list during construction. Higher values improve index quality but slow down the build. | 64 | Optional |
| `metricType` | Distance metric: COSINE, EUCLIDEAN, or DOT_PRODUCT | COSINE | Optional |
| `name` | Custom index name | Auto-generated | Optional |

Use new NoIndex() for sequential scan without an index. Best for small datasets (< 10,000 vectors) or when exact results are required.
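
Conceptually, a sequential scan compares the query against every stored vector and keeps the best matches, which is why it is exact but scales linearly with the number of rows. The following plain-Java sketch illustrates that idea; it is not the SQL the store issues, and `ExactScanSketch` and `topK` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of an exact (brute-force) nearest-neighbour scan.
final class ExactScanSketch {

    // Squared Euclidean distance; the square root is not needed for ranking.
    static double squaredDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the indices of the k stored vectors closest to the query, nearest first.
    static List<Integer> topK(float[][] stored, float[] query, int k) {
        double[] distances = new double[stored.length];
        List<Integer> indices = new ArrayList<>();
        // One distance computation per stored vector: nothing is skipped.
        for (int i = 0; i < stored.length; i++) {
            distances[i] = squaredDistance(stored[i], query);
            indices.add(i);
        }
        indices.sort(Comparator.comparingDouble((Integer i) -> distances[i]));
        int limit = Math.min(Math.max(k, 0), indices.size());
        return new ArrayList<>(indices.subList(0, limit));
    }

    public static void main(String[] args) {
        float[][] stored = {{0f, 0f}, {1f, 1f}, {5f, 5f}};
        float[] query = {0.9f, 0.9f};
        System.out.println("Nearest indices: " + topK(stored, query, 2));
    }
}
```

An approximate index such as HNSW avoids visiting every vector, trading a small loss of recall for much lower query latency on large tables.
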
// Create engine first
YugabyteDBEngine engine = YugabyteDBEngine.builder()
.host("localhost")
.port(5433)
.database("yugabyte")
.username("yugabyte")
.password("")
.usePostgreSQLDriver(true) // Use PostgreSQL JDBC driver
.build();
// Minimal configuration
YugabyteDBEmbeddingStore store = YugabyteDBEmbeddingStore.builder()
.engine(engine)
.dimension(384)
.build();
// Custom configuration
YugabyteDBEmbeddingStore store = YugabyteDBEmbeddingStore.builder()
.engine(engine)
.dimension(768)
.tableName("my_embeddings")
.metricType(MetricType.EUCLIDEAN)
.build();
For more control over connection settings, configure the connection pool, timeouts, SSL, and driver on YugabyteDBEngine:
// Create engine with custom settings
YugabyteDBEngine engine = YugabyteDBEngine.builder()
.host("localhost")
.port(5433)
.database("yugabyte")
.username("yugabyte")
.password("")
.maxPoolSize(20)
.minPoolSize(5)
.connectionTimeout("30000")
.idleTimeout("300000")
.maxLifetime("900000")
.useSsl(false)
.usePostgreSQLDriver(false) // Use YugabyteDB Smart Driver
.build();
// Use engine in embedding store
YugabyteDBEmbeddingStore store = YugabyteDBEmbeddingStore.builder()
.engine(engine)
.dimension(384)
.tableName("embeddings")
.build();
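
The timeout values in the engine example are millisecond counts passed as strings. If you prefer to reason in seconds or minutes, a small helper based on `java.time.Duration` can produce those strings. This is only a sketch: `TimeoutSketch.millis` is a hypothetical name, and it assumes the builder methods accept string arguments as in the example above.

```java
import java.time.Duration;

// Hypothetical helper for illustration; not part of langchain4j-community-yugabytedb.
final class TimeoutSketch {

    // Converts a non-negative duration to its millisecond count as a string.
    static String millis(Duration duration) {
        if (duration.isNegative()) {
            throw new IllegalArgumentException("Timeout must not be negative: " + duration);
        }
        return String.valueOf(duration.toMillis());
    }

    public static void main(String[] args) {
        System.out.println("connectionTimeout = " + millis(Duration.ofSeconds(30)));
        System.out.println("idleTimeout       = " + millis(Duration.ofMinutes(5)));
        System.out.println("maxLifetime       = " + millis(Duration.ofMinutes(15)));
    }
}
```

With such a helper, a call like `.connectionTimeout(TimeoutSketch.millis(Duration.ofSeconds(30)))` would be equivalent to the literal `"30000"` used in the example.
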
YugabyteDB supports different vector index types for similarity search optimization:
// Create engine
YugabyteDBEngine engine = YugabyteDBEngine.builder()
.host("localhost")
.port(5433)
.database("yugabyte")
.username("yugabyte")
.password("")
.build();
// HNSW index with custom parameters
HNSWIndex hnswIndex = HNSWIndex.builder()
.m(16) // Maximum connections per layer
.efConstruction(64) // Construction quality
.metricType(MetricType.COSINE)
.name("my_hnsw_index")
.build();
YugabyteDBEmbeddingStore store = YugabyteDBEmbeddingStore.builder()
.engine(engine)
.dimension(384)
.vectorIndex(hnswIndex)
.build();
// Create engine
YugabyteDBEngine engine = YugabyteDBEngine.builder()
.host("localhost")
.port(5433)
.database("yugabyte")
.username("yugabyte")
.password("")
.build();
// No index for exact search (slower but exact)
YugabyteDBEmbeddingStore store = YugabyteDBEmbeddingStore.builder()
.engine(engine)
.dimension(384)
.vectorIndex(new NoIndex()) // Sequential scan
.build();
// Create engine first
YugabyteDBEngine engine = YugabyteDBEngine.builder()
.host("localhost")
.port(5433)
.database("yugabyte")
.username("yugabyte")
.password("")
.build();
// Create embedding store
YugabyteDBEmbeddingStore store = YugabyteDBEmbeddingStore.builder()
.engine(engine)
.dimension(384)
.build();
// Add embeddings
TextSegment segment1 = TextSegment.from("YugabyteDB is a distributed SQL database");
Embedding embedding1 = embeddingModel.embed(segment1).content();
String id1 = store.add(embedding1, segment1);
TextSegment segment2 = TextSegment.from("PostgreSQL compatibility with horizontal scalability");
Embedding embedding2 = embeddingModel.embed(segment2).content();
String id2 = store.add(embedding2, segment2);
// Search embeddings
Embedding queryEmbedding = embeddingModel.embed("What is YugabyteDB?").content();
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.maxResults(5)
.minScore(0.7)
.build();
List<EmbeddingMatch<TextSegment>> matches = store.search(request).matches();
matches.forEach(match -> {
System.out.println("Score: " + match.score());
System.out.println("Text: " + match.embedded().text());
});
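
Note that `minScore(0.7)` filters on the match score, not on the raw cosine similarity. LangChain4j's `RelevanceScore.fromCosineSimilarity` maps a cosine similarity in [-1, 1] to a score in [0, 1] as `(cos + 1) / 2`. Assuming this store reports scores on that scale (worth verifying for the metric type you use), the sketch below reproduces the mapping in plain Java to show which cosine similarity a given threshold corresponds to. `ScoreSketch` is a hypothetical class.

```java
// Hypothetical helper for illustration; mirrors the (cos + 1) / 2 relevance mapping.
final class ScoreSketch {

    // Maps cosine similarity in [-1, 1] to a relevance score in [0, 1].
    static double relevanceFromCosine(double cosineSimilarity) {
        return (cosineSimilarity + 1.0) / 2.0;
    }

    // Inverse mapping: the cosine similarity that corresponds to a given score.
    static double cosineFromRelevance(double score) {
        return 2.0 * score - 1.0;
    }

    public static void main(String[] args) {
        double minScore = 0.7;
        System.out.println("minScore " + minScore
                + " corresponds to cosine similarity >= " + cosineFromRelevance(minScore));
    }
}
```

Under that assumption, a threshold of 0.7 admits matches whose cosine similarity is roughly 0.4 or higher, which is more permissive than it may first appear.
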
YugabyteDB supports different metadata storage modes:
// Create engine
YugabyteDBEngine engine = YugabyteDBEngine.builder()
.host("localhost")
.port(5433)
.database("yugabyte")
.username("yugabyte")
.password("")
.build();
// JSONB storage (recommended for PostgreSQL compatibility)
MetadataStorageConfig jsonbConfig = MetadataStorageConfig.builder()
.storageMode(MetadataStorageMode.COMBINED_JSONB)
.build();
// JSON storage
MetadataStorageConfig jsonConfig = MetadataStorageConfig.builder()
.storageMode(MetadataStorageMode.COMBINED_JSON)
.build();
// Column-per-key storage
MetadataStorageConfig columnConfig = MetadataStorageConfig.builder()
.storageMode(MetadataStorageMode.COLUMN_PER_KEY)
.build();
YugabyteDBEmbeddingStore store = YugabyteDBEmbeddingStore.builder()
.engine(engine)
.dimension(384)
.metadataStorageConfig(jsonbConfig)
.build();
The integration supports both the PostgreSQL JDBC driver and the YugabyteDB Smart Driver:
// PostgreSQL JDBC Driver (standard SQL compatibility)
YugabyteDBEngine postgresEngine = YugabyteDBEngine.builder()
.host("localhost")
.port(5433)
.database("yugabyte")
.username("yugabyte")
.password("")
.usePostgreSQLDriver(true)
.build();
YugabyteDBEmbeddingStore postgresStore = YugabyteDBEmbeddingStore.builder()
.engine(postgresEngine)
.dimension(384)
.build();
// YugabyteDB Smart Driver (advanced distributed features)
YugabyteDBEngine smartEngine = YugabyteDBEngine.builder()
.host("localhost")
.port(5433)
.database("yugabyte")
.username("yugabyte")
.password("")
.usePostgreSQLDriver(false) // Default: use Smart Driver
.build();
YugabyteDBEmbeddingStore smartStore = YugabyteDBEmbeddingStore.builder()
.engine(smartEngine)
.dimension(384)
.build();

- `m` (default: 16): Maximum connections per layer
- `efConstruction` (default: 64): Construction quality
- Requires the pgvector extension to be enabled for vector operations
- Index parameters (`m`, `efConstruction`) affect both performance and memory usage
- Adjust `maxPoolSize` and `minPoolSize` based on your workload