ArcadeDB is a multi-model NoSQL database that supports graph, document, key-value, time-series, and vector data. It provides a built-in LSM_VECTOR index (powered by JVector/HNSW) for high-performance approximate nearest neighbor (ANN) vector search.
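The langchain4j integration supports two operation modes: remote mode, which connects to a running ArcadeDB server, and embedded mode, which runs ArcadeDB inside the same JVM.
Add the following Maven dependency to your project:
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-community-arcadedb</artifactId>
    <version>${latest version here}</version>
</dependency>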
Note: This is a community integration module. You may need to add the langchain4j-community repository to your project configuration.
Both modes are provided by the ArcadeDBEmbeddingStore class.
Remote mode connects to a running ArcadeDB server. See Running ArcadeDB with Docker to start one locally.
EmbeddingStore<TextSegment> embeddingStore = ArcadeDBEmbeddingStore.builder()
        .host("localhost")
        .port(2480)
        .databaseName("my_database")
        .username("root")
        .password("playwithdata")
        .dimension(384) // Must match your embedding model's dimension
        .build();
To create the database automatically when it does not exist yet, also set createDatabase(true):
EmbeddingStore<TextSegment> embeddingStore = ArcadeDBEmbeddingStore.builder()
        .host("localhost")
        .port(2480)
        .databaseName("my_database")
        .username("root")
        .password("playwithdata")
        .dimension(384)
        .createDatabase(true) // Create database if it doesn't exist
        .build();
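Because the dimension passed to the builder must match the embedding model's output size, it can help to check vectors before storing them. The following is a minimal sketch, not part of langchain4j: `DimensionGuard` and `requireDimension` are hypothetical names, and the helper works on a raw `float[]` such as the one returned by `Embedding.vector()`.

```java
/** Hypothetical helper that fails fast when a vector's length differs from the store's dimension. */
final class DimensionGuard {

    private DimensionGuard() {}

    /** Returns the vector unchanged if its length equals expectedDimension, otherwise throws. */
    static float[] requireDimension(float[] vector, int expectedDimension) {
        if (vector == null) {
            throw new IllegalArgumentException("vector must not be null");
        }
        if (vector.length != expectedDimension) {
            throw new IllegalArgumentException("Embedding has " + vector.length
                    + " dimensions but the store expects " + expectedDimension);
        }
        return vector;
    }

    public static void main(String[] args) {
        // A vector of the configured size passes through untouched.
        float[] ok = requireDimension(new float[384], 384);
        System.out.println("accepted vector of length " + ok.length);

        // A vector from a different model (here 768 dimensions) is rejected.
        try {
            requireDimension(new float[768], 384);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Calling such a guard before `embeddingStore.add(...)` surfaces a model/store mismatch with a clear message instead of a failure inside the database.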
Embedded mode runs ArcadeDB inside the same JVM. No server is needed — just provide a path on the local filesystem where the database should be stored. The database is created automatically if it does not already exist.
Always call close() when you are finished to release resources.
ArcadeDBEmbeddingStore embeddingStore = ArcadeDBEmbeddingStore.embeddedBuilder()
        .databasePath("/path/to/my-database")
        .dimension(384) // Must match your embedding model's dimension
        .build();
// ... use the store ...
embeddingStore.close();
Use try-finally (or try-with-resources via a wrapper) to ensure close() is always called:
ArcadeDBEmbeddingStore embeddingStore = ArcadeDBEmbeddingStore.embeddedBuilder()
        .databasePath("/path/to/my-database")
        .dimension(384)
        .build();
try {
    // ... use the store ...
} finally {
    embeddingStore.close();
}
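The "try-with-resources via a wrapper" option mentioned above could look roughly like the sketch below. `Closing` is a hypothetical adapter, not part of langchain4j or ArcadeDB, and `FakeStore` is only a stand-in so the example runs without a database.

```java
import java.util.Objects;
import java.util.function.Consumer;

/** Hypothetical adapter that pairs a resource with its close action for use in try-with-resources. */
final class Closing<T> implements AutoCloseable {

    /** Stand-in for a store with a close() method; not the real ArcadeDBEmbeddingStore. */
    static final class FakeStore {
        boolean closed;

        void close() {
            closed = true;
        }
    }

    private final T resource;
    private final Consumer<? super T> closer;

    private Closing(T resource, Consumer<? super T> closer) {
        this.resource = Objects.requireNonNull(resource, "resource");
        this.closer = Objects.requireNonNull(closer, "closer");
    }

    static <T> Closing<T> of(T resource, Consumer<? super T> closer) {
        return new Closing<>(resource, closer);
    }

    T get() {
        return resource;
    }

    /** Runs the close action; declared without checked exceptions so callers need no catch block. */
    @Override
    public void close() {
        closer.accept(resource);
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        try (Closing<FakeStore> handle = Closing.of(store, FakeStore::close)) {
            System.out.println("inside block, closed = " + handle.get().closed);
        }
        System.out.println("after block, closed = " + store.closed);
    }
}
```

With the real store this would presumably be used as `Closing.of(store, ArcadeDBEmbeddingStore::close)`, assuming `close()` declares no checked exceptions.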
The search API is identical in both modes:
// Add a text segment with its embedding
TextSegment segment = TextSegment.from("Hello, world!", Metadata.from("source", "example"));
Embedding embedding = embeddingModel.embed(segment).content();
embeddingStore.add(embedding, segment);
// Embed the query text (embeddingModel is any configured EmbeddingModel)
Embedding queryEmbedding = embeddingModel.embed("Hello").content();
// Search for similar embeddings
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .maxResults(5)
        .minScore(0.7)
        .build();
List<EmbeddingMatch<TextSegment>> matches = embeddingStore.search(request).matches();
Remote mode, with every builder option shown:
EmbeddingStore<TextSegment> embeddingStore = ArcadeDBEmbeddingStore.builder()
        .host("localhost")              // Required: ArcadeDB server hostname
        .port(2480)                     // Default: 2480 (HTTP port)
        .databaseName("my_database")    // Required: database name
        .username("root")               // Required: username
        .password("playwithdata")       // Required: password
        .typeName("EmbeddingDocument")  // Default: "EmbeddingDocument" (vertex type name)
        .dimension(384)                 // Required: embedding vector dimension
        .similarityFunction("COSINE")   // Default: "COSINE" (similarity metric)
        .maxConnections(16)             // Default: 16 (HNSW graph connections per node)
        .beamWidth(100)                 // Default: 100 (HNSW search beam width)
        .createDatabase(false)          // Default: false (auto-create the database)
        .metadataPrefix("meta_")        // Default: "meta_" (prefix for metadata properties)
        .build();
Embedded mode, with every builder option shown:
ArcadeDBEmbeddingStore embeddingStore = ArcadeDBEmbeddingStore.embeddedBuilder()
        .databasePath("/path/to/my-database") // Required: local filesystem path for the database
        .typeName("EmbeddingDocument")        // Default: "EmbeddingDocument" (vertex type name)
        .dimension(384)                       // Default: 384 (embedding vector dimension)
        .maxConnections(16)                   // Default: 16 (HNSW graph connections per node)
        .beamWidth(100)                       // Default: 100 (HNSW search beam width)
        .metadataPrefix("")                   // Default: "" (no prefix for metadata properties)
        .build();
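To make the similarityFunction choices in the remote builder more concrete, here is a small plain-Java sketch of the three metrics it names (COSINE, EUCLIDEAN, SQUARED_EUCLIDEAN). It illustrates the mathematics only; it is not ArcadeDB's implementation, and the class and method names are made up for this example.

```java
/** Illustrative implementations of the three metrics; not ArcadeDB's own code. */
final class SimilarityMetrics {

    private SimilarityMetrics() {}

    /** Cosine similarity: depends on direction only, not on vector length. NaN for a zero vector. */
    static double cosine(float[] a, float[] b) {
        checkSameLength(a, b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Squared Euclidean distance: sum of squared component differences, no square root. */
    static double squaredEuclidean(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Euclidean distance: the square root of the squared distance. */
    static double euclidean(float[] a, float[] b) {
        return Math.sqrt(squaredEuclidean(a, b));
    }

    private static void checkSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] sameDirection = {2f, 0f}; // same direction as query, twice the length
        float[] orthogonal = {0f, 1f};    // at a right angle to query

        System.out.println("cosine(query, sameDirection)    = " + cosine(query, sameDirection));
        System.out.println("cosine(query, orthogonal)       = " + cosine(query, orthogonal));
        System.out.println("euclidean(query, sameDirection) = " + euclidean(query, sameDirection));
        System.out.println("euclidean(query, orthogonal)    = " + euclidean(query, orthogonal));
        System.out.println("squared(query, orthogonal)      = " + squaredEuclidean(query, orthogonal));
    }
}
```

Since the square root is monotonic, ranking neighbors by squared distance gives the same order as ranking by Euclidean distance while skipping one `sqrt` per comparison, which is the usual reason the squared variant is faster.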
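Shared parameters (both modes): typeName, dimension, maxConnections, beamWidth, and metadataPrefix, with the defaults shown in the builder examples above.
Remote-only parameters: host, port, databaseName, username, password, similarityFunction, and createDatabase.
similarityFunction accepts one of:
COSINE: Cosine similarity; best for normalized vectors (default)
EUCLIDEAN: Euclidean distance
SQUARED_EUCLIDEAN: Squared Euclidean distance; faster than EUCLIDEAN
createDatabase: set to true to automatically create the database if it does not exist.
Embedded-only parameters: databasePath, plus an option to pass a com.arcadedb.database.Database instance directly instead of a path.
ArcadeDB supports filtering search results by metadata. Filters are applied after the vector index lookup.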
// Filter by a single metadata value
Filter filter = new IsEqualTo("source", "wikipedia");
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .maxResults(5)
        .filter(filter)
        .build();
List<EmbeddingMatch<TextSegment>> matches = embeddingStore.search(request).matches();
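Because filters are applied after the vector index lookup, a filtered search may return fewer matches than maxResults even when more matching entries exist. The sketch below illustrates this general post-filtering pattern with plain Java collections. It is an assumption-laden model, not ArcadeDB's actual code: `Candidate` and `searchThenFilter` are hypothetical, and whether the store fetches extra candidates to compensate is not stated in this document.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Illustrative model of "vector lookup first, metadata filter second". */
final class PostFilterSketch {

    /** Simplified stand-in for a stored entry: id, similarity score, metadata. */
    static final class Candidate {
        final String id;
        final double score;
        final Map<String, String> metadata;

        Candidate(String id, double score, Map<String, String> metadata) {
            this.id = id;
            this.score = score;
            this.metadata = metadata;
        }
    }

    private PostFilterSketch() {}

    /** Takes the k best candidates by score, then drops those that fail the filter. */
    static List<Candidate> searchThenFilter(List<Candidate> all, int k, Predicate<Candidate> filter) {
        Comparator<Candidate> byScoreDescending =
                Comparator.comparingDouble((Candidate c) -> c.score).reversed();
        return all.stream()
                .sorted(byScoreDescending)
                .limit(k)       // the "index lookup" step keeps only the top k
                .filter(filter) // the metadata filter runs on those k only
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Candidate> all = List.of(
                new Candidate("a", 0.95, Map.of("source", "wikipedia")),
                new Candidate("b", 0.90, Map.of("source", "blog")),
                new Candidate("c", 0.85, Map.of("source", "blog")),
                new Candidate("d", 0.80, Map.of("source", "wikipedia")));

        List<Candidate> hits = searchThenFilter(
                all, 2, c -> "wikipedia".equals(c.metadata.get("source")));

        // Two wikipedia entries exist, but only "a" is among the top 2 by score.
        System.out.println("matches returned: " + hits.size());
    }
}
```

If a filtered query returns too few results, one option is to raise maxResults so that more candidates reach the filter.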
Comparison operators:
IsEqualTo, IsNotEqualTo
IsGreaterThan, IsGreaterThanOrEqualTo
IsLessThan, IsLessThanOrEqualTo
IsIn, IsNotIn
Logical operators:
And, Or, Not
Embeddings can be removed by ID, by metadata filter, or all at once:
// Remove by list of IDs
embeddingStore.removeAll(List.of("id1", "id2"));
// Remove by metadata filter
embeddingStore.removeAll(new IsEqualTo("source", "old-source"));
// Remove all embeddings
embeddingStore.removeAll();
Limitations:
Double.MIN_VALUE (4.9E-324) underflows to 0.0 and cannot be stored precisely.
Other filter types (for example ContainsString) are not supported; only the metadata filter types listed above are available.
Embedded mode does not expose the similarityFunction option; the index uses its default metric.
Running ArcadeDB with Docker: a running server is required for remote mode. The quickest way to get started:
docker run -d \
    --name arcadedb \
    -p 2480:2480 \
    -e JAVA_OPTS="-Darcadedb.server.rootPassword=playwithdata" \
    arcadedata/arcadedb:latest
Then connect your store:
EmbeddingStore<TextSegment> embeddingStore = ArcadeDBEmbeddingStore.builder()
        .host("localhost")
        .port(2480)
        .databaseName("embeddings")
        .username("root")
        .password("playwithdata")
        .dimension(384)
        .createDatabase(true)
        .build();
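A freshly started container may need a moment before it accepts connections, so building the store immediately after `docker run` could fail. The sketch below polls the HTTP port using only the JDK; `PortWaiter` and its methods are hypothetical helpers, and the retry counts are arbitrary. An open port only shows that something is listening, not that the server has finished starting.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper that waits until a TCP port accepts connections. */
final class PortWaiter {

    private PortWaiter() {}

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Tries up to `attempts` times, sleeping delayMillis between failed attempts. */
    static boolean waitForPort(String host, int port, int attempts, long delayMillis)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            if (isPortOpen(host, port, 1000)) {
                return true;
            }
            if (i < attempts - 1) {
                Thread.sleep(delayMillis);
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // 2480 is the HTTP port published by the docker command above.
        boolean up = waitForPort("localhost", 2480, 30, 1000);
        System.out.println(up ? "Port 2480 is reachable" : "Gave up waiting for port 2480");
    }
}
```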