metadata-ingestion/src/datahub/ingestion/source/rdf/entities/domain/SPEC.md
Part of: RDF Specification
This document specifies how DataHub domains are constructed from entity IRI paths.
Domains are not extracted from RDF graphs. Instead, they are constructed from the IRI path segments of glossary terms. Domains provide hierarchical organization for business entities.
Important: Domains are not registered entities (no ENTITY_METADATA). They are built by the DomainBuilder class from existing entities.
Domains are created from the parent path segments of entity IRIs:
Entity IRI: https://bank.com/finance/accounts/customer_id
Path Segments: ['bank.com', 'finance', 'accounts', 'customer_id']
Parent Segments (for domain creation): ['bank.com', 'finance', 'accounts']
Domains Created:
- bank.com (root domain)
- finance (child of bank.com)
- accounts (child of finance, leaf domain)

Entity Assignment: Term assigned to accounts domain (most specific parent)
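The segment split above can be sketched as follows. This is a minimal illustration, not the DomainBuilder implementation: the helper names `iri_to_segments` and `parent_segments` are hypothetical, and the handling of fragments, ports, or trailing slashes is an assumption, since this document does not specify it.

```python
from typing import List
from urllib.parse import urlparse


def iri_to_segments(iri: str) -> List[str]:
    """Return the host followed by the non-empty path segments of an IRI.

    Hypothetical helper: fragments and query strings are ignored here,
    which is an assumption rather than documented behavior.
    """
    parsed = urlparse(iri)
    path_parts = [part for part in parsed.path.split("/") if part]
    return [parsed.netloc, *path_parts]


def parent_segments(iri: str) -> List[str]:
    """Drop the last segment (the entity itself) to get the domain path."""
    return iri_to_segments(iri)[:-1]


segments = iri_to_segments("https://bank.com/finance/accounts/customer_id")
parents = parent_segments("https://bank.com/finance/accounts/customer_id")
print(segments)
print(parents)
```

The last element of `parents` names the leaf domain that the term would be assigned to.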
Domains form a hierarchical tree structure:
bank.com (root)
└── finance
    └── accounts (leaf - contains entities)
Each non-root domain references its parent through parent_domain_urn; a root domain has parent_domain_urn set to None.

Rule: A domain is created when its hierarchy contains at least one glossary term.
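One way to picture the tree and the creation rule is a mapping from each domain path to its parent path, built only from the paths of glossary terms. The sketch below assumes term IRIs have already been split into tuples; `collect_domain_paths` is a hypothetical name, and the second term (`loans/loan_id`) is invented purely for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

DomainPath = Tuple[str, ...]


def collect_domain_paths(
    term_paths: Iterable[DomainPath],
) -> Dict[DomainPath, Optional[DomainPath]]:
    """Map every domain path that contains a term to its parent path.

    Root domains map to None. Paths with no term beneath them never
    appear, because only term paths are walked.
    """
    parents: Dict[DomainPath, Optional[DomainPath]] = {}
    for term_path in term_paths:
        domain_path = term_path[:-1]  # drop the term's own segment
        for depth in range(1, len(domain_path) + 1):
            prefix = domain_path[:depth]
            # An empty parent tuple means the prefix is a root domain.
            parents[prefix] = prefix[:-1] or None
    return parents


hierarchy = collect_domain_paths(
    [
        ("bank.com", "finance", "accounts", "customer_id"),
        ("bank.com", "finance", "loans", "loan_id"),  # illustrative only
    ]
)
for path, parent in hierarchy.items():
    print(path, "->", parent)
```

Shared ancestors such as `("bank.com", "finance")` appear once, however many terms sit beneath them.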
Entities are assigned to their immediate parent domain (leaf domain):
Example:
https://bank.com/finance/accounts/customer_id → Assigned to accounts domain

Domain URNs are generated from path segments:
Format: urn:li:domain:({path_segments})
Example:
('bank.com', 'finance', 'accounts') → urn:li:domain:(bank.com,finance,accounts)

Path segments are represented as tuples:
- ('bank.com',) - Root domain
- ('bank.com', 'finance') - Second-level domain
- ('bank.com', 'finance', 'accounts') - Third-level domain (leaf)

The domain name is the last segment of the path (e.g., "accounts").

Domains are created via DataHub MCPs:
Input Entities:
https://bank.com/finance/accounts/customer_id

Domains Created:
DataHubDomain(
    urn="urn:li:domain:(bank.com,finance,accounts)",
    name="accounts",
    parent_domain_urn="urn:li:domain:(bank.com,finance)",
    glossary_terms=[...],  # customer_id term
)
DataHubDomain(
    urn="urn:li:domain:(bank.com,finance)",
    name="finance",
    parent_domain_urn="urn:li:domain:(bank.com)",
    glossary_terms=[],
)
DataHubDomain(
    urn="urn:li:domain:(bank.com)",
    name="bank.com",
    parent_domain_urn=None,  # Root domain
    glossary_terms=[],
)
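
The records above can be reproduced with a short sketch that applies the URN format and the parent rule to a single term path. `DomainSketch` is a hypothetical stand-in for DataHubDomain, not its real definition, and storing the term name as a plain string in `glossary_terms` is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DomainSketch:
    """Illustrative stand-in for DataHubDomain (fields taken from the example)."""

    urn: str
    name: str
    parent_domain_urn: Optional[str]
    glossary_terms: List[str] = field(default_factory=list)


def domain_urn(path: Tuple[str, ...]) -> str:
    """Format a path tuple as urn:li:domain:({path_segments})."""
    return f"urn:li:domain:({','.join(path)})"


def build_domains(term_path: Tuple[str, ...]) -> List[DomainSketch]:
    """Build the leaf-to-root chain of domains for one term path."""
    domain_path = term_path[:-1]
    domains = []
    for depth in range(len(domain_path), 0, -1):
        path = domain_path[:depth]
        parent = path[:-1]
        is_leaf = depth == len(domain_path)
        domains.append(
            DomainSketch(
                urn=domain_urn(path),
                name=path[-1],  # domain name is the last segment
                parent_domain_urn=domain_urn(parent) if parent else None,
                # Only the leaf domain holds the term (assumed: by name).
                glossary_terms=[term_path[-1]] if is_leaf else [],
            )
        )
    return domains


domains = build_domains(("bank.com", "finance", "accounts", "customer_id"))
for domain in domains:
    print(domain)
```

The sketch handles one term at a time; merging domains shared by several terms is left out for brevity.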