metadata-models/docs/entities/glossaryNode.md
A GlossaryNode represents a hierarchical grouping or category within DataHub's Business Glossary. GlossaryNodes act as folders or containers that organize GlossaryTerms into a logical structure, making it easier to navigate and manage large business glossaries.
In practice, GlossaryNodes allow you to:
For example, you might create a GlossaryNode called "Finance" containing terms like "Revenue", "Profit", and "EBITDA", with a nested GlossaryNode "Compliance" underneath containing "SOX", "GDPR", and "CCPA" terms.
GlossaryNodes are uniquely identified by a single field: their name. This name serves as the persistent identifier for the node throughout its lifecycle.
The URN (Uniform Resource Name) for a GlossaryNode follows this pattern:
urn:li:glossaryNode:<node_name>
Where:
<node_name>: A unique string identifier for the node. This can be human-readable (e.g., "Finance") or a generated ID (e.g., "fin-category-001" or a UUID).
# Simple node name
urn:li:glossaryNode:Finance
# Hierarchical naming convention (common pattern)
urn:li:glossaryNode:Finance.Revenue
urn:li:glossaryNode:Classification
urn:li:glossaryNode:Classification.DataSensitivity
# UUID-based identifier
urn:li:glossaryNode:41516e31-0acb-fd90-76ff-fc2c98d2d1a3
# Descriptive identifier
urn:li:glossaryNode:PersonalInformation
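The URN pattern above can be sketched as a pair of small helpers. This is a minimal illustration, not DataHub's own URN class: the function names `make_glossary_node_urn` and `parse_glossary_node_urn` are hypothetical, and the only fact taken from this page is the `urn:li:glossaryNode:<node_name>` layout.

```python
# Hypothetical helpers illustrating the urn:li:glossaryNode:<node_name> pattern.
PREFIX = "urn:li:glossaryNode:"


def make_glossary_node_urn(node_name: str) -> str:
    """Build a GlossaryNode URN from a node name."""
    if not node_name:
        raise ValueError("node name must not be empty")
    return PREFIX + node_name


def parse_glossary_node_urn(urn: str) -> str:
    """Return the node name embedded in a GlossaryNode URN."""
    if not urn.startswith(PREFIX) or len(urn) == len(PREFIX):
        raise ValueError(f"not a GlossaryNode URN: {urn!r}")
    return urn[len(PREFIX):]


print(make_glossary_node_urn("Finance.Revenue"))
# The dotted name comes back as one flat string; no parent is implied by the dot.
print(parse_glossary_node_urn("urn:li:glossaryNode:Finance.Revenue"))
```

Note that parsing `Finance.Revenue` yields the whole string as the name: the dot is a naming convention only, and the actual parent is whatever `parentNode` says.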
Names may use dot notation (e.g., Finance.Revenue, Classification.PII) to indicate structure even though the name is flat.
Use the name field in glossaryNodeInfo for the display name.
The glossaryNodeInfo aspect contains the essential information about a glossary node:
Example:
{
"name": "Financial Metrics",
"definition": "Category for all financial and accounting-related business terms including revenue, costs, and profitability measures.",
"parentNode": "urn:li:glossaryNode:Finance"
}
GlossaryNodes support arbitrary nesting through the parentNode field, creating tree structures:
GlossaryNode: DataGovernance
├── GlossaryNode: Classification
│ ├── GlossaryTerm: Public
│ ├── GlossaryTerm: Internal
│ └── GlossaryTerm: Confidential
│
├── GlossaryNode: PersonalInformation
│ ├── GlossaryNode: DirectIdentifiers
│ │ ├── GlossaryTerm: Email
│ │ └── GlossaryTerm: SSN
│ └── GlossaryNode: IndirectIdentifiers
│ ├── GlossaryTerm: IPAddress
│ └── GlossaryTerm: DeviceID
│
└── GlossaryNode: Compliance
├── GlossaryTerm: GDPR
└── GlossaryTerm: CCPA
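To make the role of `parentNode` concrete, here is a minimal sketch that derives roots, children, and ancestor paths from nothing but child-to-parent references. The `PARENT_OF` mapping is a hypothetical in-memory snapshot of the node part of the tree above, and the helper names are illustrative; it assumes the references contain no cycles.

```python
from collections import defaultdict
from typing import Dict, List, Optional

N = "urn:li:glossaryNode:"

# Hypothetical snapshot: node URN -> parentNode URN (None for root nodes).
PARENT_OF: Dict[str, Optional[str]] = {
    N + "DataGovernance": None,
    N + "Classification": N + "DataGovernance",
    N + "PersonalInformation": N + "DataGovernance",
    N + "DirectIdentifiers": N + "PersonalInformation",
    N + "IndirectIdentifiers": N + "PersonalInformation",
    N + "Compliance": N + "DataGovernance",
}


def roots(parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Nodes without a parentNode sit at the root of the glossary."""
    return [urn for urn, parent in parent_of.items() if parent is None]


def children_index(parent_of: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert the child -> parent references into parent -> children lists."""
    index: Dict[str, List[str]] = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            index[parent].append(child)
    return dict(index)


def ancestors(urn: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Walk parentNode references upward, nearest ancestor first."""
    path = []
    current = parent_of.get(urn)
    while current is not None:
        path.append(current)
        current = parent_of.get(current)
    return path


print(roots(PARENT_OF))
print(ancestors(N + "DirectIdentifiers", PARENT_OF))
```

Because each node stores only a single parent reference, the whole tree can be rebuilt from these pairs alone.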
Key characteristics:
GlossaryNodes support standard ownership metadata through the ownership aspect. Ownership at the node level can represent:
Ownership is particularly powerful for GlossaryNodes because:
GlossaryNodes support the institutionalMemory aspect, allowing you to:
This is especially useful for top-level nodes representing major domains or initiatives.
{{ inline /metadata-ingestion/examples/library/glossary_node_create.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/glossary_node_create_nested.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/glossary_term_create_hierarchy.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/glossary_node_add_owner.py show_path_as_comment }}
# Fetch a GlossaryNode entity
curl -X GET 'http://localhost:8080/entities/urn%3Ali%3AglossaryNode%3AFinance' \
-H 'Authorization: Bearer <token>'
# Response includes all aspects:
# - glossaryNodeKey (identity)
# - glossaryNodeInfo (definition, name, parentNode, etc.)
# - ownership (who owns this node)
# - institutionalMemory (links to documentation)
# - etc.
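The same request can be prepared from Python using only the standard library. This is a sketch under the assumptions of the curl example above (host, port, and bearer token are placeholders); `entity_request` is a hypothetical helper name. The point it illustrates is that the URN must be percent-encoded before it is placed in the path.

```python
from urllib.parse import quote
from urllib.request import Request


def entity_request(urn: str, token: str, base_url: str = "http://localhost:8080") -> Request:
    """Prepare a GET request for an entity, percent-encoding the URN for the path."""
    # safe="" makes quote() encode ":" as well, matching the curl example.
    url = f"{base_url}/entities/{quote(urn, safe='')}"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


req = entity_request("urn:li:glossaryNode:Finance", "<token>")
print(req.full_url)
# To actually send it against a running instance:
#   import urllib.request; urllib.request.urlopen(req)
```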
query GetRootGlossaryNodes {
getRootGlossaryNodes {
nodes {
urn
properties {
name
definition
}
ownership {
owners {
owner {
... on CorpUser {
urn
username
}
}
}
}
}
}
}
query GetGlossaryNodeChildren {
glossaryNode(urn: "urn:li:glossaryNode:Finance") {
urn
properties {
name
definition
}
children {
count
relationships {
entity {
... on GlossaryNode {
urn
properties {
name
}
}
... on GlossaryTerm {
urn
properties {
name
definition
}
}
}
}
}
}
}
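Because `children` can return both nodes and terms, client code usually needs to tell them apart. The sketch below assumes a response shaped like the `GetGlossaryNodeChildren` query above, wrapped in the standard GraphQL `data` envelope. The sample payload is made up for illustration, and `split_children` is a hypothetical helper.

```python
from typing import List, Tuple


def split_children(response: dict) -> Tuple[List[str], List[str]]:
    """Return (child node names, child term names) from a children response."""
    relationships = response["data"]["glossaryNode"]["children"]["relationships"]
    nodes: List[str] = []
    terms: List[str] = []
    for rel in relationships:
        entity = rel["entity"]
        name = (entity.get("properties") or {}).get("name")
        # The URN prefix identifies the entity type of each child.
        if entity["urn"].startswith("urn:li:glossaryNode:"):
            nodes.append(name)
        elif entity["urn"].startswith("urn:li:glossaryTerm:"):
            terms.append(name)
    return nodes, terms


# Made-up sample payload, shaped like the query above.
SAMPLE = {
    "data": {
        "glossaryNode": {
            "urn": "urn:li:glossaryNode:Finance",
            "children": {
                "count": 2,
                "relationships": [
                    {"entity": {"urn": "urn:li:glossaryNode:Compliance",
                                "properties": {"name": "Compliance"}}},
                    {"entity": {"urn": "urn:li:glossaryTerm:Revenue",
                                "properties": {"name": "Revenue", "definition": "..."}}},
                ],
            },
        }
    }
}

print(split_children(SAMPLE))
```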
# business_glossary.yml
version: "1"
source: MyOrganization
owners:
users:
- datahub
nodes:
- name: DataGovernance
description: Top-level governance structure
nodes:
- name: Classification
description: Data classification categories
terms:
- name: Public
description: Publicly available data
- name: Internal
description: Internal use only
- name: Confidential
description: Restricted access data
- name: PersonalInformation
description: Personal and sensitive data categories
nodes:
- name: DirectIdentifiers
description: Direct personal identifiers
terms:
- name: Email
description: Email addresses
- name: SSN
description: Social Security Numbers
- name: IndirectIdentifiers
description: Indirect identifiers
terms:
- name: IPAddress
description: Internet Protocol addresses
- name: DeviceID
description: Device identifiers
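# Ingest using the DataHub CLI. The -c flag expects an ingestion recipe, so point a
# recipe with source type datahub-business-glossary at this file (config.file), then run:
# datahub ingest -c <recipe>.yml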
See the Business Glossary Source documentation for the full YAML format specification.
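The nesting in the YAML above maps directly onto the node hierarchy. As a rough illustration, the following sketch walks a Python dict with the same `nodes`/`terms` shape and lists every entry with its full path. The dict mirrors the example file by hand (descriptions omitted), the `walk` helper is hypothetical, and the dotted paths are for display only; they are not claimed to be the URNs the ingestion source generates.

```python
from typing import Iterator, Tuple

# Hand-written mirror of the example file's structure (descriptions omitted).
GLOSSARY = {
    "nodes": [
        {"name": "DataGovernance", "nodes": [
            {"name": "Classification", "terms": [
                {"name": "Public"}, {"name": "Internal"}, {"name": "Confidential"},
            ]},
            {"name": "PersonalInformation", "nodes": [
                {"name": "DirectIdentifiers", "terms": [{"name": "Email"}, {"name": "SSN"}]},
                {"name": "IndirectIdentifiers", "terms": [{"name": "IPAddress"}, {"name": "DeviceID"}]},
            ]},
        ]},
    ]
}


def walk(nodes: list, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str]]:
    """Yield (kind, dotted path) for every node and term, depth first."""
    for node in nodes:
        node_path = path + (node["name"],)
        yield "node", ".".join(node_path)
        for term in node.get("terms", []):
            yield "term", ".".join(node_path + (term["name"],))
        yield from walk(node.get("nodes", []), node_path)


for kind, dotted in walk(GLOSSARY["nodes"]):
    print(kind, dotted)
```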
</details>
GlossaryNodes provide organizational structure for GlossaryTerms. The relationship is established through:
Each term's glossaryTermInfo.parentNode field references its containing node
Think of this relationship as:
GlossaryNodes form a tree structure through self-referential parent-child relationships:
Each node's parent is recorded in glossaryNodeInfo.parentNode
Key operations:
getRootGlossaryNodes: Fetch all top-level nodes (no parent)
parentNodes: Navigate upward to find all ancestors
children: Navigate downward to find immediate children
Moving a node updates its parentNode reference and affects the entire subtree
The GraphQL API provides specialized operations for GlossaryNodes:
Queries:
glossaryNode(urn): Fetch a specific node with children
getRootGlossaryNodes: Get all root-level nodes
search(entity: "glossaryNode"): Search nodes by name/definition
Mutations:
createGlossaryNode: Create a new node with optional parent
updateParentNode: Move a node to a different parent
updateName: Update the display name
updateDescription: Update the definition
Resolvers:
children: Fetch immediate children (nodes and terms)
childrenCount: Count of children under this node
parentNodes: Fetch ancestor path from node to root
See the Business Glossary documentation for UI operations.
GlossaryNodes support fine-grained access control through special glossary-specific privileges:
Users with this privilege on a node can:
Use case: Department leads managing their immediate category structure
Users with this privilege on a node can:
Use case: Data governance team managing an entire domain (e.g., all PII-related terms)
Users with this platform-level privilege can:
These privileges are checked hierarchically: if you have permission on a parent node, it may also grant permissions on that node's children, depending on the privilege type.
While GlossaryNodes don't get applied to data assets directly (that's the role of GlossaryTerms), they enhance discoverability by:
Similar to GlossaryTerms, the URN identifier (name in glossaryNodeKey) is separate from the display name (name in glossaryNodeInfo):
This separation allows you to rename nodes in the UI without breaking references.
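A small sketch of why the separation matters: references are stored against the URN, so changing the display name leaves them intact. The `NodeRecord` class and the `term_parent` mapping are hypothetical stand-ins for the two aspects, not DataHub types.

```python
from dataclasses import dataclass


@dataclass
class NodeRecord:
    urn: str           # identity, derived from name in glossaryNodeKey
    display_name: str  # label shown in the UI, from name in glossaryNodeInfo


node = NodeRecord(
    urn="urn:li:glossaryNode:41516e31-0acb-fd90-76ff-fc2c98d2d1a3",
    display_name="Finance",
)

# A term points at its parent by URN, never by display name.
term_parent = {"urn:li:glossaryTerm:Revenue": node.urn}

# Renaming touches only the display name.
node.display_name = "Finance & Accounting"

# The stored reference still resolves to the same node.
print(term_parent["urn:li:glossaryTerm:Revenue"] == node.urn)
```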
The hierarchy must be a tree: each node has at most one parent, and no cycles are allowed:
If you attempt to create a circular reference, the operation will fail with a validation error.
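One way to picture the check behind that validation error is an upward walk from the proposed parent: if the walk reaches the node being moved, the new reference would close a loop. This is an illustrative sketch, not DataHub's actual validation code; `would_create_cycle` and the sample mapping are hypothetical, and the existing references are assumed to be acyclic.

```python
from typing import Dict, Optional


def would_create_cycle(
    parent_of: Dict[str, Optional[str]], node: str, new_parent: Optional[str]
) -> bool:
    """True if setting node's parent to new_parent would make node its own ancestor."""
    current = new_parent
    while current is not None:
        if current == node:
            return True
        current = parent_of.get(current)
    return False


# Hypothetical chain: Finance -> Compliance -> SOXControls (parent first).
parent_of = {"Finance": None, "Compliance": "Finance", "SOXControls": "Compliance"}

print(would_create_cycle(parent_of, "Finance", "SOXControls"))  # moving a root under its own descendant
print(would_create_cycle(parent_of, "SOXControls", "Finance"))  # moving a leaf directly under the root
```

The first call reports a cycle and the second does not; a node also cannot become its own parent, since the walk starts at the proposed parent itself.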
Nodes with no parent (parentNode is null or not set) appear at the root level of the glossary:
Current behavior (subject to change):
Best practice: Always move or reassign children before deleting a node, or use bulk operations that handle the entire subtree.
GlossaryNodes support the displayProperties aspect (added in newer versions), which provides additional UI customization:
This is an optional enhancement for organizations that want more visual control over their glossary.
Unlike GlossaryTerms, GlossaryNodes are not directly applied to data assets:
If you need to tag assets with a category, create a GlossaryTerm within that node and apply the term.
When you move a node to a new parent:
This makes reorganization efficient but requires care to avoid unintended moves.
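The effect of a move on the subtree can be sketched with a plain child-to-parent mapping. Only one reference is rewritten, yet every descendant's path to the root changes with it. The mapping and the `move_node` and `path_to_root` helpers are hypothetical; in DataHub the equivalent operation is the updateParentNode mutation.

```python
from typing import Dict, List, Optional


def move_node(parent_of: Dict[str, Optional[str]], node: str, new_parent: Optional[str]) -> None:
    """Re-parent a node by rewriting its single parent reference."""
    parent_of[node] = new_parent


def path_to_root(parent_of: Dict[str, Optional[str]], node: str) -> List[str]:
    """Return the node followed by its ancestors up to the root."""
    path = [node]
    while parent_of.get(path[-1]) is not None:
        path.append(parent_of[path[-1]])
    return path


# Hypothetical snapshot using node names from the tree earlier on this page.
parent_of = {
    "DataGovernance": None,
    "Compliance": "DataGovernance",
    "PersonalInformation": "DataGovernance",
    "DirectIdentifiers": "PersonalInformation",
}

before = path_to_root(parent_of, "DirectIdentifiers")
move_node(parent_of, "PersonalInformation", "Compliance")
after = path_to_root(parent_of, "DirectIdentifiers")

print(before)
print(after)
```

The entry for `DirectIdentifiers` is never edited, yet its ancestry now runs through `Compliance`, which is why an accidental move of a high-level node can relocate far more than intended.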