The DataHub Java SDK V2 provides a modern, type-safe interface for interacting with DataHub's metadata platform. Built on top of the existing DataHub infrastructure, SDK V2 offers an intuitive, fluent API for creating and managing metadata entities.
SDK V2 represents a significant evolution from the V1 emitter-based approach. It leverages Java's type system with fluent builders that guide you toward constructing valid entities, with no manual URN construction or aspect wiring:
```java
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_database.my_schema.my_table")
    .env("PROD")
    .description("User profile dataset")
    .build();
```
Perform create, read, update, and delete operations with a clean, intuitive API:
```java
client.entities().upsert(dataset);           // Create or update
client.entities().update(dataset);           // Update with patches
Dataset loaded = client.entities().get(urn); // Read from server
```
Make incremental metadata changes without fetching or replacing entire aspects:
```java
dataset.addTag("pii")
    .addOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER)
    .addCustomProperty("team", "data-engineering");
client.entities().update(dataset); // Applies only the changes
```
Efficiently fetch entity aspects on-demand with built-in TTL-based caching, reducing unnecessary network calls.
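The caching idea can be illustrated with a minimal, self-contained sketch. This is an illustration of TTL-based caching in general, not the SDK's internal implementation; the `TtlCache` class and its methods are hypothetical:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Minimal TTL cache: a value is re-fetched only after its entry expires. */
final class TtlCache<K, V> {
    private record Entry<V>(V value, Instant expiresAt) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Duration ttl;

    TtlCache(Duration ttl) {
        this.ttl = ttl;
    }

    /** Returns the cached value if still fresh; otherwise calls the loader. */
    V get(K key, Function<K, V> loader) {
        Entry<V> cached = entries.get(key);
        if (cached != null && Instant.now().isBefore(cached.expiresAt())) {
            return cached.value(); // fresh hit: no network call
        }
        V value = loader.apply(key); // miss or expired: fetch from server
        entries.put(key, new Entry<>(value, Instant.now().plus(ttl)));
        return value;
    }
}
```

With a scheme like this, repeated reads of the same aspect within the TTL window are served locally, and only the first read (or a read after expiry) goes over the network.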
Supports both interactive SDK mode and high-throughput ingestion mode to fit your use case.
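The difference between the two modes can be sketched as a batching policy: interactive use sends each change immediately, while high-throughput ingestion buffers changes and flushes them in batches. The `BatchingEmitter` below is a hypothetical illustration of that trade-off, not a real SDK class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of the two emission styles: send-immediately vs. batch-and-flush. */
final class BatchingEmitter {
    private final List<String> buffer = new ArrayList<>();
    private final int batchSize;
    private final Consumer<List<String>> transport; // e.g. one HTTP POST per batch

    BatchingEmitter(int batchSize, Consumer<List<String>> transport) {
        this.batchSize = batchSize;
        this.transport = transport;
    }

    /** Interactive mode behaves like batchSize == 1: every event is sent at once. */
    void emit(String event) {
        buffer.add(event);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends any buffered events; call before shutdown in ingestion mode. */
    void flush() {
        if (!buffer.isEmpty()) {
            transport.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

Batching amortizes per-request overhead across many metadata changes, which is why a separate high-throughput mode exists at all.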
Add the DataHub client library to your project:
Gradle:

```groovy
dependencies {
    implementation 'io.acryl:datahub-client:__version__'
}
```
Maven:

```xml
<dependency>
    <groupId>io.acryl</groupId>
    <artifactId>datahub-client</artifactId>
    <version>__version__</version>
</dependency>
```
Note: Check the Maven repository for the latest version.
Here's a complete example of creating a dataset with metadata using SDK V2:
```java
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Dataset;
import com.linkedin.common.OwnershipType;

// Create the client
DataHubClientV2 client = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .token("your-access-token") // Optional: used for authentication
    .build();

// Build a dataset with metadata
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("analytics.public.user_events")
    .env("PROD")
    .description("User interaction events")
    .displayName("User Events")
    .build();

// Add tags and owners
dataset.addTag("pii")
    .addTag("analytics")
    .addOwner("urn:li:corpuser:datateam", OwnershipType.TECHNICAL_OWNER)
    .addCustomProperty("retention", "90_days");

// Upsert to DataHub
client.entities().upsert(dataset);
System.out.println("Created dataset: " + dataset.getUrn());

// Close the client when done
client.close();
```
SDK V2 provides entity classes for the major DataHub entity types. Each entity offers a fluent builder and methods for managing its metadata.
The DataHubClientV2 provides centralized access to operations:
- `entities()` - CRUD operations for entities
- `testConnection()` - Verify connectivity to the DataHub server

Instead of replacing entire aspects, SDK V2 uses patch operations to make surgical updates to specific metadata fields. This is more efficient and reduces the risk of overwriting concurrent changes.
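Conceptually, a patch-based update can be modeled as a small list of operations applied to a copy of the aspect, so fields the patch never mentions survive untouched. The `AspectPatch` class below is a simplified sketch of that idea, not the SDK's actual patch classes or wire format:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Conceptual model of a patch: only the recorded operations are applied. */
final class AspectPatch {
    sealed interface Op permits Add, Remove {}
    record Add(String field, Object value) implements Op {}
    record Remove(String field) implements Op {}

    private final List<Op> ops = new ArrayList<>();

    AspectPatch add(String field, Object value) {
        ops.add(new Add(field, value));
        return this;
    }

    AspectPatch remove(String field) {
        ops.add(new Remove(field));
        return this;
    }

    /** Applies the operations to a copy of the aspect; other fields survive. */
    Map<String, Object> applyTo(Map<String, Object> aspect) {
        Map<String, Object> result = new LinkedHashMap<>(aspect);
        for (Op op : ops) {
            if (op instanceof Add a) {
                result.put(a.field(), a.value());
            } else if (op instanceof Remove r) {
                result.remove(r.field());
            }
        }
        return result;
    }
}
```

Because the patch carries only deltas, two writers touching different fields of the same aspect no longer clobber each other the way full-aspect replacement would.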
Explore the detailed guides for working with SDK V2.
Complete, runnable examples are available in the examples directory.
| Feature | V1 (RestEmitter) | V2 (DataHubClientV2) |
|---|---|---|
| Entity Creation | Manual MCP construction | Fluent entity builders |
| Type Safety | Low - manual aspect wiring | High - compile-time validation |
| URN Management | Manual string construction | Automatic from builder |
| Updates | Replace entire aspects | Patch-based incremental updates |
| API Style | Low-level emitter | High-level CRUD operations |
| Learning Curve | Steep - requires MCP knowledge | Gentle - intuitive builders |
See the detailed migration guide for help transitioning from V1 to V2.
For questions, issues, or contributions, reach out through the DataHub community channels.