The DataHub Java SDK V2 provides a modern, type-safe interface for interacting with DataHub's metadata platform. Built on top of the existing DataHub infrastructure, SDK V2 offers an intuitive, fluent API for creating and managing metadata entities.
SDK V2 represents a significant evolution from the V1 emitter-based approach. It leverages Java's type system with fluent builders that guide you toward constructing valid entities, with no manual URN construction or aspect wiring:
```java
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_database.my_schema.my_table")
    .env("PROD")
    .description("User profile dataset")
    .build();
```
Perform create, read, update, and delete operations with a clean, intuitive API:
```java
client.entities().upsert(dataset);           // Create or update
client.entities().update(dataset);           // Update with patches
Dataset loaded = client.entities().get(urn); // Read from server
```
Make incremental metadata changes without fetching or replacing entire aspects:
```java
dataset.addTag("pii")
    .addOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER)
    .addCustomProperty("team", "data-engineering");
client.entities().update(dataset); // Applies only the changes
```
Efficiently fetch entity aspects on-demand with built-in TTL-based caching, reducing unnecessary network calls.
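The caching idea can be illustrated with a minimal, self-contained sketch. This is an illustration of TTL-based caching in general, not the SDK's internal implementation; the `TtlCache` class and its methods are hypothetical:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Minimal TTL cache: a value is re-fetched only after its entry expires. */
final class TtlCache<K, V> {
    private record Entry<V>(V value, Instant expiresAt) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Duration ttl;

    TtlCache(Duration ttl) {
        this.ttl = ttl;
    }

    /** Returns the cached value if still fresh; otherwise calls the loader. */
    V get(K key, Function<K, V> loader) {
        Entry<V> cached = entries.get(key);
        if (cached != null && Instant.now().isBefore(cached.expiresAt())) {
            return cached.value(); // fresh hit: no network call
        }
        V value = loader.apply(key); // miss or expired: fetch from server
        entries.put(key, new Entry<>(value, Instant.now().plus(ttl)));
        return value;
    }
}
```

With a scheme like this, repeated reads of the same aspect within the TTL window are served locally, and only the first read (or a read after expiry) goes over the network.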
Supports both interactive SDK mode and high-throughput ingestion mode to fit your use case.
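The difference between the two modes can be sketched as a batching policy: interactive use sends each change immediately, while high-throughput ingestion buffers changes and flushes them in batches. The `BatchingEmitter` below is a hypothetical illustration of that trade-off, not a real SDK class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of the two emission styles: send-immediately vs. batch-and-flush. */
final class BatchingEmitter {
    private final List<String> buffer = new ArrayList<>();
    private final int batchSize;
    private final Consumer<List<String>> transport; // e.g. one HTTP POST per batch

    BatchingEmitter(int batchSize, Consumer<List<String>> transport) {
        this.batchSize = batchSize;
        this.transport = transport;
    }

    /** Interactive mode behaves like batchSize == 1: every event is sent at once. */
    void emit(String event) {
        buffer.add(event);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends any buffered events; call before shutdown in ingestion mode. */
    void flush() {
        if (!buffer.isEmpty()) {
            transport.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

Batching amortizes per-request overhead across many metadata changes, which is why a separate high-throughput mode exists at all.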
Add the DataHub client library to your project:
Gradle:

```groovy
dependencies {
    implementation 'io.acryl:datahub-client:__version__'
}
```
Maven:

```xml
<dependency>
    <groupId>io.acryl</groupId>
    <artifactId>datahub-client</artifactId>
    <version>__version__</version>
</dependency>
```
Note: Check the Maven repository for the latest version.
Here's a complete example of creating a dataset with metadata using SDK V2:
```java
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Dataset;
import com.linkedin.common.OwnershipType;

// Create the client
DataHubClientV2 client = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .token("your-access-token") // Optional: used for authentication
    .build();

// Build a dataset with metadata
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("analytics.public.user_events")
    .env("PROD")
    .description("User interaction events")
    .displayName("User Events")
    .build();

// Add tags and owners
dataset.addTag("pii")
    .addTag("analytics")
    .addOwner("urn:li:corpuser:datateam", OwnershipType.TECHNICAL_OWNER)
    .addCustomProperty("retention", "90_days");

// Upsert to DataHub
client.entities().upsert(dataset);
System.out.println("Created dataset: " + dataset.getUrn());

// Close the client when done
client.close();
```
SDK V2 provides entity classes for the major DataHub entity types. Each entity offers a fluent builder and methods for managing its metadata.
The DataHubClientV2 provides centralized access to operations:
- `entities()` - CRUD operations for entities
- `testConnection()` - Verify connectivity to the DataHub server

Instead of replacing entire aspects, SDK V2 uses patch operations to make surgical updates to specific metadata fields. This is more efficient and reduces the risk of overwriting concurrent changes.
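Conceptually, a patch-based update can be modeled as a small list of operations applied to a copy of the aspect, so fields the patch never mentions survive untouched. The `AspectPatch` class below is a simplified sketch of that idea, not the SDK's actual patch classes or wire format:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Conceptual model of a patch: only the recorded operations are applied. */
final class AspectPatch {
    sealed interface Op permits Add, Remove {}
    record Add(String field, Object value) implements Op {}
    record Remove(String field) implements Op {}

    private final List<Op> ops = new ArrayList<>();

    AspectPatch add(String field, Object value) {
        ops.add(new Add(field, value));
        return this;
    }

    AspectPatch remove(String field) {
        ops.add(new Remove(field));
        return this;
    }

    /** Applies the operations to a copy of the aspect; other fields survive. */
    Map<String, Object> applyTo(Map<String, Object> aspect) {
        Map<String, Object> result = new LinkedHashMap<>(aspect);
        for (Op op : ops) {
            if (op instanceof Add a) {
                result.put(a.field(), a.value());
            } else if (op instanceof Remove r) {
                result.remove(r.field());
            }
        }
        return result;
    }
}
```

Because the patch carries only deltas, two writers touching different fields of the same aspect no longer clobber each other the way full-aspect replacement would.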
Explore the detailed guides for working with SDK V2.
Complete, runnable examples are available in the examples directory.
| Feature | V1 (RestEmitter) | V2 (DataHubClientV2) |
|---|---|---|
| Entity Creation | Manual MCP construction | Fluent entity builders |
| Type Safety | Low - manual aspect wiring | High - compile-time validation |
| URN Management | Manual string construction | Automatic from builder |
| Updates | Replace entire aspects | Patch-based incremental updates |
| API Style | Low-level emitter | High-level CRUD operations |
| Learning Curve | Steep - requires MCP knowledge | Gentle - intuitive builders |
See the detailed migration guide for help transitioning from V1 to V2.
For questions, issues, or contributions, reach out through the DataHub community channels.