This guide walks you through setting up and using the DataHub Java SDK V2 to interact with DataHub's metadata platform.
Add the DataHub client library to your project's build configuration.
Add to your `build.gradle`:

```groovy
dependencies {
    implementation 'io.acryl:datahub-client:__version__'
}
```
Add to your `pom.xml`:

```xml
<dependency>
    <groupId>io.acryl</groupId>
    <artifactId>datahub-client</artifactId>
    <version>__version__</version>
</dependency>
```
Tip: Find the latest version on Maven Central.
The DataHubClientV2 is your entry point to all SDK operations. Create one by specifying your DataHub server URL:
```java
import datahub.client.v2.DataHubClientV2;

DataHubClientV2 client = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .build();
```
For DataHub Cloud or secured instances, provide a personal access token:
```java
DataHubClientV2 client = DataHubClientV2.builder()
    .server("https://your-instance.acryl.io")
    .token("your-personal-access-token")
    .build();
```
How to get a token: In the DataHub UI, go to Settings → Access Tokens → Generate Personal Access Token.
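Rather than hard-coding the token in source, you may prefer to read it from the environment. The sketch below is one way to do that; `DATAHUB_TOKEN` is just a variable name chosen for this example, not a name the SDK requires:

```java
public class TokenFromEnv {
    // Read the access token from an environment variable so it never
    // lands in source control. DATAHUB_TOKEN is an example name chosen
    // here, not an SDK convention.
    static String resolveToken() {
        String token = System.getenv("DATAHUB_TOKEN");
        return token != null ? token : "";
    }

    public static void main(String[] args) {
        String token = resolveToken();
        System.out.println(token.isEmpty() ? "no token set" : "token loaded");
    }
}
```

Pass the resolved value to `.token(...)` when building the client.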
Verify your client can reach the DataHub server:
```java
try {
    boolean connected = client.testConnection();
    if (connected) {
        System.out.println("Successfully connected to DataHub!");
    } else {
        System.out.println("Failed to connect to DataHub");
    }
} catch (Exception e) {
    System.err.println("Connection error: " + e.getMessage());
}
```
Let's create a dataset with some metadata.
```java
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Dataset;
import com.linkedin.common.OwnershipType;
```
Use the fluent builder to construct a dataset:
```java
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("analytics.public.user_events")
    .env("PROD")
    .description("User interaction events from web and mobile")
    .displayName("User Events")
    .build();
```
Breaking down the builder:
- `platform` - Data platform identifier (e.g., `"snowflake"`, `"bigquery"`, `"postgres"`)
- `name` - Fully qualified dataset name (`database.schema.table` or similar)
- `env` - Environment (`PROD`, `DEV`, `STAGING`, etc.)
- `description` - Human-readable description of the dataset
- `displayName` - Friendly name shown in the DataHub UI

Enrich the dataset with tags, owners, and custom properties:
```java
dataset.addTag("pii")
    .addTag("analytics")
    .addOwner("urn:li:corpuser:john_doe", OwnershipType.TECHNICAL_OWNER)
    .addCustomProperty("retention_days", "90")
    .addCustomProperty("team", "data-engineering");
```
Send the dataset to DataHub:
```java
try {
    client.entities().upsert(dataset);
    System.out.println("Successfully created dataset: " + dataset.getUrn());
} catch (IOException | ExecutionException | InterruptedException e) {
    System.err.println("Failed to create dataset: " + e.getMessage());
}
```
Here's a complete, runnable example:
```java
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Dataset;
import com.linkedin.common.OwnershipType;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class DataHubQuickStart {
    public static void main(String[] args) {
        // Create client
        DataHubClientV2 client = DataHubClientV2.builder()
            .server("http://localhost:8080")
            .token("your-token-here") // Optional
            .build();

        try {
            // Test connection
            if (!client.testConnection()) {
                System.err.println("Cannot connect to DataHub");
                return;
            }

            // Build dataset
            Dataset dataset = Dataset.builder()
                .platform("snowflake")
                .name("analytics.public.user_events")
                .env("PROD")
                .description("User interaction events")
                .displayName("User Events")
                .build();

            // Add metadata
            dataset.addTag("pii")
                .addTag("analytics")
                .addOwner("urn:li:corpuser:datateam", OwnershipType.TECHNICAL_OWNER)
                .addCustomProperty("retention_days", "90");

            // Upsert to DataHub
            client.entities().upsert(dataset);
            System.out.println("Created dataset: " + dataset.getUrn());
        } catch (IOException | ExecutionException | InterruptedException e) {
            e.printStackTrace();
        } finally {
            try {
                client.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
```
For more complete examples, see the Dataset Entity Guide.
Load an existing entity from DataHub:
```java
import com.linkedin.common.urn.DatasetUrn;

DatasetUrn urn = new DatasetUrn(
    "snowflake",
    "analytics.public.user_events",
    "PROD"
);

try {
    Dataset loaded = client.entities().get(urn);
    if (loaded != null) {
        System.out.println("Dataset description: " + loaded.getDescription());
        System.out.println("Is read-only: " + loaded.isReadOnly()); // true
    }
} catch (IOException | ExecutionException | InterruptedException e) {
    e.printStackTrace();
}
```
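For reference, the three constructor arguments above together make up DataHub's standard string form for a dataset URN. The sketch below just assembles that string by hand to show the mapping; in practice the SDK derives it for you (as in `dataset.getUrn()` earlier):

```java
public class DatasetUrnDemo {
    // Assemble DataHub's standard dataset URN from platform, name, and
    // environment. Illustrative only -- the SDK computes this itself.
    static String datasetUrn(String platform, String name, String env) {
        return String.format("urn:li:dataset:(urn:li:dataPlatform:%s,%s,%s)",
                platform, name, env);
    }

    public static void main(String[] args) {
        System.out.println(
                datasetUrn("snowflake", "analytics.public.user_events", "PROD"));
        // urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.public.user_events,PROD)
    }
}
```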
Important: Entities fetched from the server are read-only by default. Additional aspects are lazy-loaded on demand.
When you fetch an entity from DataHub, it's immutable to prevent accidental modifications:
```java
Dataset dataset = client.entities().get(urn);

// Reading works fine
String description = dataset.getDescription();
List<String> tags = dataset.getTags();

// But mutation throws ReadOnlyEntityException
// dataset.addTag("pii"); // ERROR: Cannot mutate read-only entity!
```
Why? Immutability-by-default makes mutation intent explicit, prevents accidental changes when passing entities between functions, and enables safe entity sharing.
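To make the idea concrete, here is a small self-contained sketch of the same guard pattern. This is not the SDK's actual implementation, just an illustration: a read-only flag blocks mutation, and a `mutable()` call hands back a writable copy:

```java
import java.util.ArrayList;
import java.util.List;

public class ReadOnlyGuardDemo {
    // Toy entity illustrating the read-only guard, not the SDK's internals.
    static class Entity {
        private final boolean readOnly;
        private final List<String> tags;

        Entity(List<String> tags, boolean readOnly) {
            this.tags = new ArrayList<>(tags);
            this.readOnly = readOnly;
        }

        // Return a writable copy; the original stays read-only.
        Entity mutable() {
            return new Entity(tags, false);
        }

        void addTag(String tag) {
            if (readOnly) {
                throw new IllegalStateException("Cannot mutate read-only entity");
            }
            tags.add(tag);
        }

        List<String> getTags() {
            return List.copyOf(tags);
        }
    }

    public static void main(String[] args) {
        Entity fetched = new Entity(List.of("analytics"), true);
        try {
            fetched.addTag("pii"); // blocked on the read-only instance
        } catch (IllegalStateException e) {
            System.out.println("blocked: " + e.getMessage());
        }

        Entity copy = fetched.mutable();
        copy.addTag("pii"); // succeeds on the writable copy
        System.out.println(copy.getTags());
    }
}
```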
To modify a fetched entity, create a mutable copy first:
```java
// 1. Load existing dataset (read-only)
Dataset dataset = client.entities().get(urn);

// 2. Get mutable copy
Dataset mutable = dataset.mutable();

// 3. Add new tags and owners (patch operations)
mutable.addTag("gdpr")
    .addOwner("urn:li:corpuser:new_owner", OwnershipType.TECHNICAL_OWNER);

// 4. Apply patches to DataHub
client.entities().update(mutable);
```
The update() method sends only the changes (patches) to DataHub, not the full entity. This is more efficient and safer for concurrent updates.
Understanding when entities are mutable vs read-only:
Builder-created entities - Mutable from creation:
```java
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_table")
    .build();

dataset.isMutable(); // true - can mutate immediately
dataset.addTag("test"); // Works without .mutable()
```
Server-fetched entities - Read-only by default:
```java
Dataset dataset = client.entities().get(urn);
dataset.isReadOnly(); // true
// dataset.addTag("test"); // ERROR!

Dataset mutable = dataset.mutable(); // Get writable copy
mutable.addTag("test"); // Now works
```
See the Patch Operations Guide for details.
SDK V2 provides two methods for persisting entities:

- `upsert(entity)` - writes the entity, creating it if it does not exist:

  ```java
  client.entities().upsert(dataset);
  ```

- `update(entity)` - applies patch operations to an existing entity:

  ```java
  client.entities().update(dataset);
  ```
SDK V2 supports multiple entity types beyond datasets:
```java
import datahub.client.v2.entity.Chart;

Chart chart = Chart.builder()
    .tool("looker")
    .id("my_sales_chart")
    .title("Sales Performance by Region")
    .description("Monthly sales broken down by geographic region")
    .build();

client.entities().upsert(chart);
```
See the Chart Entity Guide for details.
Coming soon! Dashboard entity support is planned for a future release.
Customize the client for your environment:
```java
DataHubClientV2 client = DataHubClientV2.builder()
    .server("https://your-instance.acryl.io")
    .token("your-access-token")
    // Configure operation mode
    .operationMode(DataHubClientConfigV2.OperationMode.SDK) // or INGESTION
    // Customize underlying REST emitter
    .restEmitterConfig(config -> config
        .timeoutSec(30)
        .maxRetries(5)
        .retryIntervalSec(2)
    )
    .build();
```
SDK V2 supports two operation modes:
```java
// SDK mode (default) - interactive use
DataHubClientV2 sdkClient = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .operationMode(DataHubClientConfigV2.OperationMode.SDK)
    .build();

// Ingestion mode - ETL pipelines
DataHubClientV2 ingestionClient = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .operationMode(DataHubClientConfigV2.OperationMode.INGESTION)
    .build();
```
See DataHubClientV2 Configuration for all available options.
Handle errors gracefully:
```java
try {
    client.entities().upsert(dataset);
} catch (IOException e) {
    // Network or serialization errors
    System.err.println("I/O error: " + e.getMessage());
} catch (ExecutionException e) {
    // Server-side errors
    System.err.println("Server error: " + e.getCause().getMessage());
} catch (InterruptedException e) {
    // Operation interrupted; restore the interrupt flag
    Thread.currentThread().interrupt();
}
```
Always close the client when done to release resources:
```java
try (DataHubClientV2 client = DataHubClientV2.builder()
        .server("http://localhost:8080")
        .build()) {
    // Use client here
    client.entities().upsert(dataset);
} // Client automatically closed
```
Or close explicitly:
```java
try {
    // Use client
} finally {
    client.close();
}
```
Now that you've created your first entity, explore more advanced topics:
Or check out complete examples in the entity guides: