A Data Platform Instance represents a specific deployment or instance of a data platform. While a dataPlatform represents a technology type (e.g., MySQL, Snowflake, BigQuery), a dataPlatformInstance represents a particular running instance of that platform (e.g., "production-mysql-cluster", "dev-snowflake-account", "analytics-bigquery-project").
This entity is crucial for organizations that run multiple instances of the same platform technology across different environments, regions, or organizational units. It enables DataHub to distinguish between assets from different platform instances and provides a way to organize and manage platform-level metadata and credentials.
Data Platform Instances are identified by two components:
- Platform: the URN of the data platform type (e.g., `urn:li:dataPlatform:snowflake`)
- Instance ID: a string that identifies the specific instance

The complete URN follows the pattern:

```
urn:li:dataPlatformInstance:(urn:li:dataPlatform:<platform>,<instance_id>)
```

Examples:

```
urn:li:dataPlatformInstance:(urn:li:dataPlatform:mysql,production-mysql-01)
urn:li:dataPlatformInstance:(urn:li:dataPlatform:snowflake,acme-prod-account)
urn:li:dataPlatformInstance:(urn:li:dataPlatform:bigquery,analytics-project)
urn:li:dataPlatformInstance:(urn:li:dataPlatform:iceberg,data-lake-warehouse)
```
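Because the URN is only the platform URN and the instance ID wrapped in a fixed template, building and splitting it is plain string handling. The sketch below uses hypothetical helpers (`make_instance_urn`, `parse_instance_urn`) and assumes the instance ID contains no parentheses; the DataHub Python SDK has its own URN utilities, which are the better choice in real code.

```python
import re
from typing import Tuple

# Matches urn:li:dataPlatformInstance:(urn:li:dataPlatform:<platform>,<instance_id>)
# Assumption: instance IDs never contain parentheses.
_INSTANCE_URN_RE = re.compile(
    r"^urn:li:dataPlatformInstance:\(urn:li:dataPlatform:([^,()]+),(.+)\)$"
)


def make_instance_urn(platform: str, instance_id: str) -> str:
    """Build a platform instance URN from a platform name and an instance ID."""
    return f"urn:li:dataPlatformInstance:(urn:li:dataPlatform:{platform},{instance_id})"


def parse_instance_urn(urn: str) -> Tuple[str, str]:
    """Split a platform instance URN back into (platform, instance_id)."""
    match = _INSTANCE_URN_RE.match(urn)
    if match is None:
        raise ValueError(f"not a dataPlatformInstance URN: {urn}")
    return match.group(1), match.group(2)


urn = make_instance_urn("snowflake", "acme-prod-account")
print(urn)
print(parse_instance_urn(urn))
```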
The dataPlatformInstanceProperties aspect contains descriptive metadata about the platform instance:
This aspect helps users understand what each platform instance represents and how it should be used.
<details>
<summary>Python SDK: Create a platform instance with properties</summary>

{{ inline /metadata-ingestion/examples/library/platform_instance_create.py show_path_as_comment }}

</details>
DataHub can serve as an Iceberg catalog, managing Iceberg tables through platform instances. The icebergWarehouseInfo aspect stores the configuration needed to manage an Iceberg warehouse:
This enables DataHub to manage Iceberg tables as a REST catalog, handling metadata operations and credential vending for data access.
The datahub iceberg CLI provides commands to create, update, list, and delete Iceberg warehouses. See the Iceberg integration documentation for details.
Like other DataHub entities, platform instances support:
These aspects enable governance and discoverability of platform instances.
<details>
<summary>Python SDK: Add metadata to a platform instance</summary>

{{ inline /metadata-ingestion/examples/library/platform_instance_add_metadata.py show_path_as_comment }}

</details>
Platform instances can be marked with status information:
This helps communicate lifecycle information about platform instances to users.
The most common way to create platform instances is through the ingestion framework, which automatically creates them when the platform_instance configuration is specified in source configs. However, you can also create them programmatically:
{{ inline /metadata-ingestion/examples/library/platform_instance_create.py show_path_as_comment }}
When ingesting metadata, the dataPlatformInstance aspect links datasets to their platform instance. This is typically done by ingestion connectors but can also be done manually:
{{ inline /metadata-ingestion/examples/library/dataset_attach_platform_instance.py show_path_as_comment }}
You can retrieve platform instance information using the REST API or GraphQL:
<details>
<summary>Python SDK: Query platform instance via REST API</summary>

{{ inline /metadata-ingestion/examples/library/platform_instance_query.py show_path_as_comment }}

</details>
```shell
curl 'http://localhost:8080/entities/urn%3Ali%3AdataPlatformInstance%3A(urn%3Ali%3AdataPlatform%3Amysql%2Cproduction-cluster)'
```
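The REST path in the curl command is the URN percent-encoded, with `:` and `,` escaped and the parentheses left as-is. A minimal Python sketch that produces the same URL, assuming GMS is reachable at `http://localhost:8080` as in the curl example; `entity_url` is a hypothetical helper:

```python
from urllib.parse import quote

# Assumption: a local GMS, matching the curl example.
GMS_BASE_URL = "http://localhost:8080"


def entity_url(urn: str, base_url: str = GMS_BASE_URL) -> str:
    """Percent-encode a URN for the /entities/<urn> path.

    ':' and ',' are escaped; '(' and ')' are kept literal to match the curl example.
    """
    return f"{base_url}/entities/{quote(urn, safe='()')}"


url = entity_url("urn:li:dataPlatformInstance:(urn:li:dataPlatform:mysql,production-cluster)")
print(url)
# Fetch with urllib.request.urlopen(url) when GMS is running.
```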
```graphql
query {
  search(
    input: {
      type: DATA_PLATFORM_INSTANCE
      query: "dataPlatform:iceberg"
      start: 0
      count: 10
    }
  ) {
    searchResults {
      entity {
        ... on DataPlatformInstance {
          urn
          platform {
            name
          }
          instanceId
          properties {
            name
            description
          }
        }
      }
    }
  }
}
```
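The same search can be sent from a script as an HTTP POST, with the search input passed as GraphQL variables instead of being inlined. This sketch only builds the request. The endpoint `http://localhost:8080/api/graphql` and the optional bearer-token header are assumptions about a typical local deployment, and `build_search_request` is a hypothetical name.

```python
import json
import urllib.request
from typing import Optional

# Assumption: GMS serves GraphQL at /api/graphql on the same host as the REST API.
GRAPHQL_URL = "http://localhost:8080/api/graphql"

SEARCH_QUERY = """
query searchInstances($input: SearchInput!) {
  search(input: $input) {
    searchResults {
      entity {
        ... on DataPlatformInstance {
          urn
          platform { name }
          instanceId
          properties { name description }
        }
      }
    }
  }
}
"""


def build_search_request(
    query: str, start: int = 0, count: int = 10, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the platform instance search."""
    body = {
        "query": SEARCH_QUERY,
        "variables": {
            "input": {
                "type": "DATA_PLATFORM_INSTANCE",
                "query": query,
                "start": start,
                "count": count,
            }
        },
    }
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Only needed when authentication is enabled (assumption: bearer token).
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_search_request("dataPlatform:iceberg")
# To send: json.load(urllib.request.urlopen(req))
```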
Platform instances are referenced by many entities through the dataPlatformInstance aspect:
This adds an organizational dimension that cuts across all data assets.
Most DataHub ingestion sources support a platform_instance configuration parameter. When specified, the connector automatically attaches the platform instance to all ingested entities:
```yaml
source:
  type: mysql
  config:
    host_port: "mysql.prod.company.com:3306"
    platform_instance: "production-mysql-cluster"
    # ... other config
```
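Since the instance URN is just the platform plus the instance ID, the URN that ingestion attaches can be worked out from the recipe alone. A minimal sketch with the recipe above expressed as a Python dict; `instance_urn_from_recipe` is hypothetical, and it assumes the source type is also the platform name, which holds for `mysql` but is not guaranteed for every connector.

```python
from typing import Any, Dict

# The recipe from the YAML example as a Python dict (other config omitted).
recipe: Dict[str, Any] = {
    "source": {
        "type": "mysql",
        "config": {
            "host_port": "mysql.prod.company.com:3306",
            "platform_instance": "production-mysql-cluster",
        },
    }
}


def instance_urn_from_recipe(recipe: Dict[str, Any]) -> str:
    """Return the platform instance URN implied by a recipe.

    Assumption: the source type doubles as the platform name.
    """
    source = recipe["source"]
    instance_id = source["config"].get("platform_instance")
    if not instance_id:
        raise ValueError("recipe does not set platform_instance")
    return f"urn:li:dataPlatformInstance:(urn:li:dataPlatform:{source['type']},{instance_id})"


print(instance_urn_from_recipe(recipe))
```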
The platform instance is then used to:
For platforms that support multiple instances, the platform instance is often incorporated into dataset names to ensure uniqueness. For example:
- Without a platform instance: `urn:li:dataset:(urn:li:dataPlatform:mysql,db.schema.table,PROD)`
- With the `prod-cluster` platform instance: `urn:li:dataset:(urn:li:dataPlatform:mysql,prod-cluster.db.schema.table,PROD)`

This ensures that tables with the same name across different instances have distinct URNs.
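The naming rule in these examples amounts to prefixing the dataset name with the instance ID. A sketch of that rule using a hypothetical `dataset_urn_with_instance` helper, which reproduces both URNs above; connectors apply their own naming, so this only illustrates the shape.

```python
from typing import Optional


def dataset_urn_with_instance(
    platform: str,
    name: str,
    env: str = "PROD",
    platform_instance: Optional[str] = None,
) -> str:
    """Build a dataset URN, prefixing the name with the platform instance if given."""
    if platform_instance:
        name = f"{platform_instance}.{name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


without_instance = dataset_urn_with_instance("mysql", "db.schema.table")
with_instance = dataset_urn_with_instance(
    "mysql", "db.schema.table", platform_instance="prod-cluster"
)
print(without_instance)
print(with_instance)
```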
When DataHub serves as an Iceberg REST catalog, platform instances represent Iceberg warehouses. Each warehouse configuration includes:
DataHub manages the lifecycle of Iceberg tables within these warehouses, handling:
See the datahub iceberg CLI commands for managing Iceberg warehouses as platform instances.
Data Platform Instances are categorized as "internal" entities in DataHub's entity registry, meaning they are primarily used for organization and metadata management rather than being primary discovery targets. Users typically interact with datasets, dashboards, and other assets rather than directly browsing platform instances.
However, platform instances are searchable and can be viewed in the DataHub UI when investigating asset organization or platform-level configurations.
Platform instances are distinct from the environment/fabric concept used in entity URNs (PROD, DEV, QA, etc.). While environment is a required part of many entity identifiers, platform instance is optional and provides a finer-grained organizational dimension.
A single platform instance typically corresponds to one environment, but you can have multiple instances within the same environment (e.g., "prod-us-west", "prod-us-east", and "prod-eu-central", all in the PROD environment).
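One way to picture this relationship is as a key where the environment is always present and the platform instance is an optional refinement. The `AssetScope` dataclass below is a hypothetical model for illustration only, not a DataHub type:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AssetScope:
    """Where an asset lives: env is always set, platform_instance is optional."""

    platform: str
    env: str  # required, e.g. PROD, DEV, QA
    platform_instance: Optional[str] = None  # optional, finer-grained


regions = ["prod-us-west", "prod-us-east", "prod-eu-central"]
scopes = [AssetScope("mysql", "PROD", region) for region in regions]

# All three share one environment but remain distinct scopes.
print({s.env for s in scopes})
print(len(set(scopes)))
```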
Platform instances are typically created implicitly during ingestion rather than being explicitly defined beforehand. When an ingestion source references a platform instance that doesn't exist, DataHub will automatically create a basic platform instance entity. You can then enrich it with additional metadata like properties, ownership, and tags.
Unlike primary entities such as datasets and dashboards, platform instances have limited search functionality in GraphQL. Searching with `type: DATA_PLATFORM_INSTANCE` is supported, but some advanced search features may not be fully implemented. The REST API provides full functionality.
Once created, a platform instance's key components (platform URN and instance ID) cannot be changed. If you need to rename an instance, you must create a new platform instance entity and migrate references from the old one.
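Migrating references amounts to finding every entity whose `dataPlatformInstance` aspect points at the old instance and preparing a replacement that points at the new one. The sketch below only builds that plan. `plan_instance_rename` and `instance_urn` are hypothetical, and the payload fields (`platform`, `instance`) are an assumption about the aspect's shape, so check the aspect schema before emitting anything.

```python
from typing import Dict


def instance_urn(platform: str, instance_id: str) -> str:
    """Build a platform instance URN (same pattern as described above)."""
    return f"urn:li:dataPlatformInstance:(urn:li:dataPlatform:{platform},{instance_id})"


def plan_instance_rename(
    platform: str, old_id: str, new_id: str, references: Dict[str, str]
) -> Dict[str, Dict[str, str]]:
    """Plan replacement dataPlatformInstance payloads for a rename.

    `references` maps entity URN -> instance URN currently in its aspect.
    Only entities pointing at the old instance are included in the plan.
    """
    old_urn = instance_urn(platform, old_id)
    new_urn = instance_urn(platform, new_id)
    platform_urn = f"urn:li:dataPlatform:{platform}"
    return {
        entity: {"platform": platform_urn, "instance": new_urn}
        for entity, current in references.items()
        if current == old_urn
    }


references = {
    "urn:li:dataset:(urn:li:dataPlatform:mysql,db.orders,PROD)": instance_urn("mysql", "prod-old"),
    "urn:li:dataset:(urn:li:dataPlatform:mysql,db.users,PROD)": instance_urn("mysql", "prod-other"),
}
plan = plan_instance_rename("mysql", "prod-old", "prod-new", references)
print(plan)
```

Entities whose URNs themselves embed the old instance ID, as in the dataset naming convention described earlier, would end up with new URNs rather than an updated aspect, so they are outside what this sketch handles.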