metadata-models/docs/entities/domain.md
Domains are curated, top-level categories for organizing data assets within an organization. They represent logical groupings that typically align with business units, departments, or functional areas. Unlike tags, which are informal labels, Domains provide a structured way to organize assets under either centralized or distributed management. A data asset can belong to only one Domain at a time.
Domains are identified by a single piece of information:
An example of a domain identifier is urn:li:domain:marketing.
For auto-generated domains, the URN might look like urn:li:domain:6289fccc-4af2-4cbb-96ed-051e7d1de93c.
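As a rough illustration of the two identifier styles above, the sketch below builds a domain URN from either a human-readable id or a randomly generated UUID. The helpers `build_domain_urn` and `domain_id_from_urn` are hypothetical names defined only for this example; the one thing taken from the text is the `urn:li:domain:` prefix.

```python
import uuid

DOMAIN_URN_PREFIX = "urn:li:domain:"


def build_domain_urn(domain_id=None):
    """Return a domain URN for domain_id, or a UUID-based one if no id is given.

    Hypothetical helper for illustration; not part of any DataHub API.
    """
    if domain_id is None:
        # Mimic the auto-generated style, e.g. urn:li:domain:6289fccc-...
        domain_id = str(uuid.uuid4())
    if domain_id.startswith(DOMAIN_URN_PREFIX):
        # Already a full URN; return it unchanged.
        return domain_id
    return DOMAIN_URN_PREFIX + domain_id


def domain_id_from_urn(urn):
    """Extract the id portion of a domain URN, raising on anything else."""
    if not urn.startswith(DOMAIN_URN_PREFIX):
        raise ValueError(f"not a domain URN: {urn!r}")
    return urn[len(DOMAIN_URN_PREFIX):]


print(build_domain_urn("marketing"))
```

Calling `build_domain_urn()` with no argument yields a fresh UUID-based URN on every call, so a readable id is usually easier to reference from recipes and scripts.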
Domain properties are stored in the domainProperties aspect and contain the core metadata about a domain:
Here is an example of creating a domain with properties:
<details> <summary>Python SDK: Create a domain</summary>

{{ inline /metadata-ingestion/examples/library/domain_create.py show_path_as_comment }}

</details>
Domains support hierarchical organization through parent-child relationships. This enables representing organizational structures with multiple levels. For example, you might have a top-level "Engineering" domain with child domains for "Data Engineering", "ML Engineering", and "Infrastructure Engineering".
<details> <summary>Python SDK: Create a nested domain</summary>

{{ inline /metadata-ingestion/examples/library/domain_create_nested.py show_path_as_comment }}

</details>
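To make the parent-child structure concrete, here is a minimal in-memory sketch of the "Engineering" example. It does not call DataHub: the `PARENTS` mapping, the URN ids, and both helper functions are assumptions made up for illustration.

```python
# Hypothetical model: child domain URN -> parent domain URN (None for top level).
PARENTS = {
    "urn:li:domain:engineering": None,
    "urn:li:domain:data-engineering": "urn:li:domain:engineering",
    "urn:li:domain:ml-engineering": "urn:li:domain:engineering",
    "urn:li:domain:infrastructure-engineering": "urn:li:domain:engineering",
}


def parent_chain(domain_urn, parents):
    """Return the ancestors of domain_urn, nearest parent first."""
    chain = []
    seen = {domain_urn}
    current = parents.get(domain_urn)
    while current is not None:
        if current in seen:
            # A domain must not be its own ancestor.
            raise ValueError(f"cycle detected at {current}")
        chain.append(current)
        seen.add(current)
        current = parents.get(current)
    return chain


def children_of(domain_urn, parents):
    """Return the direct children of domain_urn, sorted for stable output."""
    return sorted(child for child, parent in parents.items() if parent == domain_urn)


print(parent_chain("urn:li:domain:data-engineering", PARENTS))
print(children_of("urn:li:domain:engineering", PARENTS))
```

Storing only the parent reference on each child keeps a move to a different parent down to a single update, at the cost of a scan to list children.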
Like other entities in DataHub, domains can have owners assigned to them using the ownership aspect. Domain owners are typically responsible for:
Ownership types for domains follow the same patterns as other entities, such as TECHNICAL_OWNER, BUSINESS_OWNER, and DATA_STEWARD.
{{ inline /metadata-ingestion/examples/library/domain_add_owner.py show_path_as_comment }}
Domains support documentation through the institutionalMemory aspect, which allows linking to external resources such as:
{{ inline /metadata-ingestion/examples/library/domain_add_documentation.py show_path_as_comment }}
The primary purpose of domains is to organize data assets. Assets are assigned to domains using the domains aspect on the asset entity (not on the domain entity itself). This creates a relationship between the asset and the domain.
{{ inline /metadata-ingestion/examples/library/dataset_add_domain.py show_path_as_comment }}
When you assign an asset to a domain, it will:
You can query domains and their associated entities using both the REST API and GraphQL API.
curl 'http://localhost:8080/entities/urn%3Ali%3Adomain%3Amarketing' \
-H 'Authorization: Bearer <token>'
This will return the domain entity with all its aspects, including:
- domainKey: The unique identifier
- domainProperties: Name, description, parent domain
- ownership: Owners of the domain
- institutionalMemory: Links and documentation

Domains maintain relationships to all assets assigned to them. You can query these relationships to find all entities within a domain.
<details> <summary>REST API: Find all assets in a domain</summary>

curl 'http://localhost:8080/relationships?direction=INCOMING&urn=urn%3Ali%3Adomain%3Amarketing&types=AssociatedWith' \
  -H 'Authorization: Bearer <token>'
This returns all entities that have been associated with the specified domain.
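The URLs in the two curl commands carry a percent-encoded URN. The following sketch shows one way to build them with the Python standard library; the function names are hypothetical, and the host, paths, and query parameters are copied from the curl examples rather than from an API reference.

```python
from urllib.parse import quote, urlencode

GMS_BASE_URL = "http://localhost:8080"  # same host as the curl examples


def entity_url(urn, base_url=GMS_BASE_URL):
    """URL for fetching one entity; the URN is percent-encoded as a path segment."""
    return f"{base_url}/entities/{quote(urn, safe='')}"


def domain_assets_url(domain_urn, base_url=GMS_BASE_URL):
    """URL for listing entities with an incoming AssociatedWith edge to the domain."""
    query = urlencode(
        {"direction": "INCOMING", "urn": domain_urn, "types": "AssociatedWith"}
    )
    return f"{base_url}/relationships?{query}"


print(entity_url("urn:li:domain:marketing"))
print(domain_assets_url("urn:li:domain:marketing"))
```

Either URL still needs the `Authorization: Bearer <token>` header when it is actually requested.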
</details>

<details> <summary>Python SDK: Query domain from a dataset</summary>

{{ inline /metadata-ingestion/examples/library/dataset_query_domain.py show_path_as_comment }}

</details>
Once assets are assigned to domains, you can:
The domains field on assets is indexed and searchable, making it efficient to filter large datasets by domain membership.
{{ inline /metadata-ingestion/examples/library/search_filter_by_domain.py show_path_as_comment }}
Domains integrate with several key DataHub features:
Domains have relationships with:
- Data assets, through the domains aspect
- Owners (through the ownership aspect) who are responsible for managing the domain

The domain entity is supported by several GraphQL resolvers in the datahub-graphql-core module:
- CreateDomainResolver: Creates new domains
- SetDomainResolver: Assigns assets to domains
- UnsetDomainResolver: Removes assets from domains
- ListDomainsResolver: Lists all available domains
- DeleteDomainResolver: Deletes a domain
- DomainEntitiesResolver: Retrieves all entities within a domain
- ParentDomainsResolver: Resolves the parent hierarchy of a domain
- BatchSetDomainResolver: Assigns multiple assets to a domain in one operation
- MoveDomainResolver: Moves a domain to a different parent

Common usage patterns include:
During metadata ingestion, domains can be automatically assigned using the domain configuration in ingestion recipes. This allows:
See the Domains feature guide for detailed ingestion configuration examples.
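As a loose sketch of pattern-based assignment, the snippet below maps dataset names to a domain using per-domain regular expressions. The rule shape (domain id mapped to a list of "allow" regexes) and the first-match-wins behavior are assumptions for illustration only; the Domains feature guide remains the reference for the actual recipe syntax and semantics.

```python
import re

# Assumed shape for illustration only: domain id -> list of "allow" regexes
# tested against a dataset's fully qualified name.
DOMAIN_RULES = {
    "marketing": [r"marketing_db\..*"],
    "finance": [r"finance_db\.reporting\..*", r"finance_db\.ledger\..*"],
}


def domain_for_dataset(dataset_name, rules):
    """Return the first domain id whose allow pattern matches, else None."""
    for domain_id, patterns in rules.items():
        # re.match anchors at the start of the name, not the end.
        if any(re.match(pattern, dataset_name) for pattern in patterns):
            return domain_id
    return None


print(domain_for_dataset("marketing_db.public.campaigns", DOMAIN_RULES))
```

Returning at most one domain per dataset keeps the sketch consistent with the rule that an asset belongs to a single domain.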
Unlike tags and glossary terms, which support multiple assignments, an asset can belong to only one domain at a time. Assigning an asset to a new domain automatically removes it from its previous domain.
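The replace-on-assign behavior can be pictured with a toy in-memory model. The `DomainAssignments` class and the dataset URN below are hypothetical and exist only to illustrate the rule; they are not a DataHub API.

```python
class DomainAssignments:
    """Toy in-memory model of the one-domain-per-asset rule (not a DataHub API)."""

    def __init__(self):
        # asset URN -> domain URN; a plain dict enforces "at most one" by construction.
        self._domain_by_asset = {}

    def assign(self, asset_urn, domain_urn):
        """Assign the asset to a domain and return the previous domain URN, if any."""
        previous = self._domain_by_asset.get(asset_urn)
        self._domain_by_asset[asset_urn] = domain_urn
        return previous

    def domain_of(self, asset_urn):
        """Return the asset's current domain URN, or None if unassigned."""
        return self._domain_by_asset.get(asset_urn)

    def assets_in(self, domain_urn):
        """Return the sorted asset URNs currently assigned to the domain."""
        return sorted(a for a, d in self._domain_by_asset.items() if d == domain_urn)


assignments = DomainAssignments()
dataset = "urn:li:dataset:(urn:li:dataPlatform:hive,sales.orders,PROD)"  # example URN
assignments.assign(dataset, "urn:li:domain:marketing")
assignments.assign(dataset, "urn:li:domain:finance")  # replaces marketing
print(assignments.domain_of(dataset))
```

After the second call the dataset no longer appears under the marketing domain, which mirrors the automatic removal described above.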
When using bare domain names (like "Marketing") in ingestion recipes, DataHub will attempt to resolve them to provisioned domains. The resolution process checks:
urn:li:domain:Marketing

If resolution fails, ingestion fails rather than assigning assets to an unintended domain. To avoid resolution issues, use fully qualified domain URNs in ingestion configurations.
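The individual resolution checks are not spelled out in this section, so the sketch below is only one plausible reading, under stated assumptions: a full URN passes through unchanged, a bare name is first tried as a URN id, then matched against display names, and anything unresolved raises. The `provisioned` mapping stands in for a server lookup, and all names are hypothetical.

```python
DOMAIN_URN_PREFIX = "urn:li:domain:"


class DomainResolutionError(Exception):
    """Raised when a bare domain name cannot be matched to a provisioned domain."""


def resolve_domain(name_or_urn, provisioned):
    """Resolve a recipe value to a domain URN.

    provisioned maps domain URN -> display name (a stand-in for a server lookup).
    The order of checks here is an assumption, not DataHub's documented behavior.
    """
    if name_or_urn.startswith(DOMAIN_URN_PREFIX):
        # Fully qualified URNs skip resolution entirely.
        return name_or_urn
    candidate = DOMAIN_URN_PREFIX + name_or_urn
    if candidate in provisioned:
        return candidate
    matches = [urn for urn, display in provisioned.items() if display == name_or_urn]
    if len(matches) == 1:
        return matches[0]
    # No match, or an ambiguous display name: fail loudly instead of guessing.
    raise DomainResolutionError(f"could not resolve domain {name_or_urn!r}")


provisioned = {
    "urn:li:domain:Marketing": "Marketing",
    "urn:li:domain:6289fccc-4af2-4cbb-96ed-051e7d1de93c": "Finance",
}
print(resolve_domain("Finance", provisioned))
```

Failing on an unresolved or ambiguous name matches the stated behavior that ingestion stops when resolution fails.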
When organizing domains hierarchically:
Managing domains requires the "Manage Domains" platform privilege. This includes:
Individual asset assignment can also be controlled by "Edit Domain" metadata policies on specific entity types.