Patch Operations Guide

metadata-integration/java/docs/sdk-v2/patch-operations.md


SDK V2 uses patch-based updates for efficient, surgical modifications to metadata. This guide explains how patches work and when to use them.

What Are Patches?

Patches are incremental updates that modify specific fields without replacing entire aspects. Instead of sending the full datasetProperties aspect, a patch sends only the changes.

Patch vs Full Update

Full Update (V1 Style):

java
// Fetch entire aspect
DatasetProperties props = getDatasetProperties(urn);

// Modify one field
props.setDescription("New description");

// Send entire aspect back (overwrites everything)
sendAspect(urn, props);

Patch Update (V2 Style):

java
// Send only the change
dataset.setDescription("New description");
client.entities().update(dataset);
// Sends JSON Patch: { "op": "add", "path": "/description", "value": "New description" }

Benefits of Patches

  1. Efficiency - Only the changed fields are sent over the network
  2. Concurrency Safety - Less risk of overwriting concurrent changes
  3. Atomicity - Each patch is applied as a single atomic change
  4. Bandwidth - Smaller payloads than a full aspect
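
The concurrency benefit is easiest to see with two writers that start from the same state. The sketch below is a simplified in-memory model, not DataHub code: LostUpdateSketch and its methods are hypothetical names, and a Set of strings stands in for a server-side tags aspect.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical in-memory stand-in for a server-side tags aspect; not DataHub code. */
class LostUpdateSketch {

    /** Full update: each writer sends back its whole (possibly stale) copy. */
    static Set<String> fullReplace(Set<String> server, String tagFromA, String tagFromB) {
        Set<String> copyA = new LinkedHashSet<>(server); // writer A reads the aspect
        Set<String> copyB = new LinkedHashSet<>(server); // writer B reads the same state
        copyA.add(tagFromA);
        copyB.add(tagFromB);
        Set<String> result = copyA; // A writes its full copy
        result = copyB;             // B overwrites with its stale copy, so A's tag is lost
        return result;
    }

    /** Patch update: each writer sends only its change, applied to the current state. */
    static Set<String> patch(Set<String> server, String tagFromA, String tagFromB) {
        Set<String> result = new LinkedHashSet<>(server);
        result.add(tagFromA); // A's patch
        result.add(tagFromB); // B's patch
        return result;
    }
}
```

Starting from a single tag "pii", fullReplace keeps only the second writer's addition, while patch keeps both.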

How Patches Work in SDK V2

Patch Accumulation Pattern

Entities accumulate patches in a pending list until save:

java
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_table")
    .build();

// Each method creates a patch MCP and adds to pendingPatches list
dataset.addTag("pii");              // Patch 1
dataset.addTag("sensitive");        // Patch 2
dataset.addOwner("user", OwnershipType.TECHNICAL_OWNER);  // Patch 3

// Check pending patches
System.out.println("Pending patches: " + dataset.getPendingPatches().size());
// Output: Pending patches: 3

// Emit all pending patches in one update() call
client.entities().update(dataset);

// Patches cleared after emission
System.out.println("Pending patches: " + dataset.getPendingPatches().size());
// Output: Pending patches: 0
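
The accumulate-then-flush behavior shown above can be modeled without the SDK. This is a minimal sketch under stated assumptions: PendingPatchSketch, flush, and the string-based patch representation are hypothetical and only illustrate the pattern, not the actual Entity implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of the accumulate-then-flush pattern; not an SDK class. */
class PendingPatchSketch {
    private final List<String> pending = new ArrayList<>();

    /** Records a change locally and returns this so calls can be chained. */
    PendingPatchSketch addTag(String tag) {
        pending.add("add /tags/" + tag);
        return this;
    }

    /** Read-only view of what has been queued so far. */
    List<String> getPendingPatches() {
        return Collections.unmodifiableList(pending);
    }

    /** Hands every queued patch to the sink in insertion order, then clears the queue. */
    int flush(List<String> sink) {
        int sent = pending.size();
        sink.addAll(pending);
        pending.clear();
        return sent;
    }
}
```

Nothing leaves the object until flush is called, which mirrors why changes are not visible on the server before update().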

Under the Hood

java
// From Dataset.java
public Dataset addTag(@Nonnull String tagUrn) {
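    // Create patch using existing patch builder.
    // `tag` is the TagUrn derived from tagUrn (the conversion is omitted in this excerpt);
    // the second argument is the optional tag context.
    GlobalTagsPatchBuilder patch = new GlobalTagsPatchBuilder()
        .urn(getUrn())
        .addTag(tag, null);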

    // Add to pending patches list
    addPatchMcp(patch.build());

    return this;
}

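When the entity is sent to the server, EntityClient emits the pending patches. The excerpt below shows upsert(), which falls back to full aspects when nothing is pending: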

java
// From EntityClient.java
public void upsert(Entity entity) {
    if (entity.hasPendingPatches()) {
        // Emit patches
        for (MetadataChangeProposal patchMcp : entity.getPendingPatches()) {
            emitter.emit(patchMcp, null);
        }
        entity.clearPendingPatches();
    } else {
        // No patches, emit full aspects
        for (MetadataChangeProposalWrapper mcp : entity.toMCPs()) {
            emitter.emit(mcp);
        }
    }
}
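
One consequence of the branch in this excerpt is that, when patches are pending, the cached full aspects are not emitted in the same call. A minimal sketch of that selection logic, using plain strings in place of MCPs; EmitDispatchSketch and selectForEmission are hypothetical names, not SDK APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the patches-first dispatch in the excerpt; not SDK code. */
class EmitDispatchSketch {

    /** Pending patches take precedence; full aspects are only the fallback. */
    static List<String> selectForEmission(List<String> pendingPatches, List<String> fullAspects) {
        if (!pendingPatches.isEmpty()) {
            List<String> out = new ArrayList<>(pendingPatches);
            pendingPatches.clear(); // patches are single-use once selected
            return out;
        }
        return new ArrayList<>(fullAspects);
    }
}
```

Calling selectForEmission twice with the same lists returns the patches first and the full aspects second, because the first call empties the pending list.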

Reusing Existing Patch Builders

SDK V2 reuses the existing patch builders from the datahub.client.patch package:

Available Patch Builders

| Builder | Purpose | Example |
| --- | --- | --- |
| OwnershipPatchBuilder | Add/remove owners | addOwner(), removeOwner() |
| GlobalTagsPatchBuilder | Add/remove tags | addTag(), removeTag() |
| GlossaryTermsPatchBuilder | Add/remove terms | addTerm(), removeTerm() |
| DomainsPatchBuilder | Set/remove domain | setDomain(), removeDomain() |
| DatasetPropertiesPatchBuilder | Update properties | setDescription(), addCustomProperty() |
| EditableDatasetPropertiesPatchBuilder | Update editable properties | setEditableDescription() |

Why Reuse?

  • Battle-tested - Used by Python SDK V2 in production
  • Correctness - Complex JSON Patch logic already validated
  • Consistency - Same semantics across language SDKs
  • Maintainability - Single implementation to maintain

When to Use Patches

Use Patches For:

Incremental changes to existing entities

java
Dataset dataset = client.entities().get(urn);
dataset.addTag("new-tag");
client.entities().update(dataset);  // Patch

Adding metadata to entities

java
dataset.addOwner("urn:li:corpuser:new_owner", OwnershipType.TECHNICAL_OWNER);
dataset.addCustomProperty("updated_at", String.valueOf(System.currentTimeMillis()));
client.entities().update(dataset);  // Multiple patches

Surgical updates without full entity knowledge

java
// Don't need to fetch entire entity
dataset.addTag("gdpr");
client.entities().update(dataset);  // Just adds tag

Use Full Upsert For:

Creating new entities

java
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_table")
    .description("New dataset")
    .build();

client.entities().upsert(dataset);  // Full upsert

Replacing entire aspects

java
// Set complete schema
SchemaMetadata schema = buildCompleteSchema();
dataset.setSchema(schema);
client.entities().upsert(dataset);  // Sends full schema aspect

Builder-provided metadata

java
Dataset dataset = Dataset.builder()
    .platform("postgres")
    .name("my_table")
    .description("Description from builder")
    .build();

// Builder populates aspectCache with full aspects
client.entities().upsert(dataset);  // Sends cached aspects

Patch Operations by Entity

Dataset Patches

Ownership:

java
dataset.addOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER);
dataset.removeOwner("urn:li:corpuser:jane");

Tags:

java
dataset.addTag("pii");
dataset.removeTag("deprecated");

Glossary Terms:

java
dataset.addTerm("urn:li:glossaryTerm:CustomerData");
dataset.removeTerm("urn:li:glossaryTerm:OldTerm");

Domain:

java
dataset.setDomain("urn:li:domain:Marketing");
dataset.removeDomain();

Properties:

java
dataset.addCustomProperty("team", "data-eng");
dataset.removeCustomProperty("old_property");
dataset.setDescription("New description");

Chart Patches

Chart supports the same patch operations as Dataset:

java
chart.addOwner("urn:li:corpuser:analyst", OwnershipType.TECHNICAL_OWNER);
chart.addTag("visualization");
chart.addTerm("urn:li:glossaryTerm:SalesMetrics");
chart.setDomain("urn:li:domain:BusinessIntelligence");

See Chart Entity Guide for complete details.

Advanced: Manual Patch Construction

For advanced use cases, construct patches directly:

java
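import com.linkedin.common.OwnershipType;
import com.linkedin.common.urn.Urn;
import com.linkedin.metadata.aspect.patch.builder.OwnershipPatchBuilder;
import com.linkedin.mxe.MetadataChangeProposal;

// Manual patch construction
// Note: Urn.createFromString throws the checked URISyntaxException
OwnershipPatchBuilder patchBuilder = new OwnershipPatchBuilder()
    .urn(dataset.getUrn())
    .addOwner(
        Urn.createFromString("urn:li:corpuser:alice"),
        OwnershipType.DATA_STEWARD
    );

MetadataChangeProposal patch = patchBuilder.build();

// Option 1: add to the entity's pending patches (emitted on the next update())
dataset.addPatchMcp(patch);

// Option 2: emit directly instead (choose one option, or the patch is sent twice)
emitter.emit(patch, null);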

Patch vs Upsert Decision Tree

New entity from builder?
├─ Yes → Use upsert() (sends cached aspects)
└─ No → Loaded from server or reference?
    ├─ Yes → Making incremental changes?
    │   ├─ Yes → Use update() (sends patches)
    │   └─ No → Replacing entire aspect?
    │       └─ Yes → Use upsert() (sends full aspect)
    └─ No → Just adding tags/owners/etc?
        └─ Yes → Use update() (sends patches)
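
Read as code, the tree reduces to two questions, since both "not from builder" branches end in update() unless an entire aspect is being replaced. The sketch below encodes that reading; PatchOrUpsertSketch and its enum are hypothetical and exist only to make the decision explicit.

```java
/** Hypothetical encoding of the decision tree above; not part of the SDK. */
class PatchOrUpsertSketch {

    enum Operation { UPSERT, UPDATE }

    /**
     * @param newFromBuilder        entity was just created with a builder
     * @param replacingEntireAspect caller intends to overwrite a whole aspect
     */
    static Operation choose(boolean newFromBuilder, boolean replacingEntireAspect) {
        if (newFromBuilder) {
            return Operation.UPSERT; // sends the aspects cached by the builder
        }
        // Loaded or referenced entity: patch unless a whole aspect is replaced
        return replacingEntireAspect ? Operation.UPSERT : Operation.UPDATE;
    }
}
```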

Pending Patches Management

Check for Pending Patches

java
if (dataset.hasPendingPatches()) {
    System.out.println("Entity has pending patches");
}

Get Pending Patches

java
List<MetadataChangeProposal> patches = dataset.getPendingPatches();
for (MetadataChangeProposal patch : patches) {
    System.out.println("Patch for aspect: " + patch.getAspectName());
}

Clear Pending Patches

java
// Manually clear without emitting
dataset.clearPendingPatches();

Batch Multiple Changes

java
// Accumulate many patches
dataset.addTag("tag1")
       .addTag("tag2")
       .addTag("tag3")
       .addOwner("user1", OwnershipType.TECHNICAL_OWNER)
       .addOwner("user2", OwnershipType.DATA_STEWARD)
       .addCustomProperty("key1", "value1")
       .addCustomProperty("key2", "value2");

// All 7 patches emitted in single update() call
client.entities().update(dataset);

Performance Considerations

Network Efficiency

java
// Inefficient: 3 separate network calls
dataset.addTag("tag1");
client.entities().update(dataset);
dataset.addTag("tag2");
client.entities().update(dataset);
dataset.addTag("tag3");
client.entities().update(dataset);

// Efficient: 1 update() call carrying 3 patches
dataset.addTag("tag1")
       .addTag("tag2")
       .addTag("tag3");
client.entities().update(dataset);

Payload Size

Full upsert (datasetProperties):

  • ~2-5 KB for typical dataset aspect

Patch (add tag):

  • ~200-300 bytes for single tag patch

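For 10 tags: the patches total roughly 2-3 KB (10 × 200-300 bytes), versus about 2-5 KB for one full upsert, so the size advantage narrows as patches accumulate.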
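
The arithmetic behind these estimates can be made explicit. The helper below is a back-of-the-envelope sketch using the approximate sizes quoted above as inputs; PayloadEstimateSketch is a hypothetical name and the figures are estimates, not measurements.

```java
/** Hypothetical helper for rough payload arithmetic; inputs are estimates, not measurements. */
class PayloadEstimateSketch {

    /** Total bytes for a number of equally sized patches. */
    static long totalPatchBytes(int patchCount, int bytesPerPatch) {
        return (long) patchCount * bytesPerPatch;
    }

    /** Smallest number of patches whose combined size reaches the full-aspect size. */
    static int breakEvenPatchCount(int fullAspectBytes, int bytesPerPatch) {
        return (fullAspectBytes + bytesPerPatch - 1) / bytesPerPatch; // ceiling division
    }
}
```

With 300-byte patches and a 5,000-byte aspect, 17 patches are needed before the patch total reaches the size of one full upsert.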

JSON Patch Format

Patches use JSON Patch (RFC 6902) format:

Add operation:

json
{
  "op": "add",
  "path": "/tags/urn:li:tag:pii",
  "value": {
    "tag": "urn:li:tag:pii"
  }
}

Remove operation:

json
{
  "op": "remove",
  "path": "/tags/urn:li:tag:deprecated"
}

SDK V2 abstracts this complexity - you work with Java methods, not JSON.
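
For readers who want to see what is being abstracted, the sketch below builds a tag path and applies add and remove operations to a flat map. It is a simplified illustration, not the SDK's or the server's implementation: JsonPatchSketch is a hypothetical class, the map keyed by path stands in for a real JSON document, and removing a missing path is silently ignored here. The segment escaping follows JSON Pointer (RFC 6901), which JSON Patch paths use.

```java
import java.util.Map;

/** Hypothetical, simplified model of JSON Patch add/remove; not the SDK implementation. */
class JsonPatchSketch {

    /** Escapes one path segment per RFC 6901: "~" becomes "~0", then "/" becomes "~1". */
    static String escapeSegment(String segment) {
        return segment.replace("~", "~0").replace("/", "~1");
    }

    /** Builds the path for a tag entry, for example /tags/urn:li:tag:pii. */
    static String tagPath(String tagUrn) {
        return "/tags/" + escapeSegment(tagUrn);
    }

    /** Applies a single add or remove operation to a map keyed by path. */
    static void apply(Map<String, String> aspect, String op, String path, String value) {
        switch (op) {
            case "add":
                aspect.put(path, value);
                break;
            case "remove":
                aspect.remove(path); // simplification: a missing path is ignored
                break;
            default:
                throw new IllegalArgumentException("Unsupported op: " + op);
        }
    }
}
```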

Troubleshooting

Patches Not Applied

Issue: Changes not visible in DataHub

Solutions:

  • Verify update() was called (patches don't emit automatically)
  • Check for errors in emission response
  • Ensure entity is bound to client

Concurrent Updates

Issue: Patches conflict with concurrent changes

Solutions:

  • Patches are generally safe for concurrent updates
  • Each patch is atomic
  • For complex scenarios, load entity first to get latest state

Patch Cleared Unexpectedly

Issue: Pending patches disappear

Reason: upsert() or update() clears patches after emission

Solution: This is expected behavior - patches are one-time use
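
If a record of what was sent is needed (for logging or auditing, say), one option is to copy the pending list before emission. A minimal sketch, assuming the list returned by getPendingPatches() may be emptied once patches are emitted; PatchSnapshotSketch is a hypothetical helper, not an SDK class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that keeps a copy of pending patches; not an SDK class. */
class PatchSnapshotSketch {

    /** Returns an independent, read-only copy that survives clearing of the original list. */
    static <T> List<T> snapshot(List<T> pendingPatches) {
        return Collections.unmodifiableList(new ArrayList<>(pendingPatches));
    }
}
```

Taking the snapshot before calling update() keeps the copied entries available after the entity's own list has been cleared.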

Next Steps

API Reference

Key classes:

  • Entity.java - Patch accumulation
  • EntityClient.java - Patch emission
  • datahub.client.patch.* - Patch builders