metadata-models/docs/entities/glossaryTerm.md
A GlossaryTerm represents a standardized business definition or vocabulary term that can be associated with data assets across your organization. GlossaryTerms are the fundamental building blocks of DataHub's Business Glossary feature, enabling teams to establish and maintain a shared vocabulary for describing data concepts.
In practice, GlossaryTerms allow you to:
For example, a GlossaryTerm might define "Customer Lifetime Value (CLV)" with a precise business definition, relate it to other terms like "Revenue" and "Customer", and be applied to specific dataset columns that store CLV calculations.
GlossaryTerms are uniquely identified by a single field: their name. This name serves as the persistent identifier for the term throughout its lifecycle.
The URN (Uniform Resource Name) for a GlossaryTerm follows this pattern:
urn:li:glossaryTerm:<term_name>
Where:
<term_name>: A unique string identifier for the term. This can be human-readable (e.g., "CustomerLifetimeValue") or a generated ID (e.g., "clv-001" or a UUID).
# Simple term name
urn:li:glossaryTerm:Revenue
# Hierarchical naming convention (common pattern)
urn:li:glossaryTerm:Finance.Revenue
urn:li:glossaryTerm:Classification.PII
urn:li:glossaryTerm:Classification.Confidential
# UUID-based identifier
urn:li:glossaryTerm:41516e31-0acb-fd90-76ff-fc2c98d2d1a3
# Descriptive identifier
urn:li:glossaryTerm:CustomerLifetimeValue
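Because the URN is a fixed prefix followed by the term name, it can be built and split with plain string handling. The following is a minimal sketch using only the standard library; the helper names `make_glossary_term_urn` and `parse_glossary_term_urn` are hypothetical and are not part of the DataHub SDK.

```python
GLOSSARY_TERM_PREFIX = "urn:li:glossaryTerm:"


def make_glossary_term_urn(term_name: str) -> str:
    """Build a GlossaryTerm URN from a term name (hypothetical helper)."""
    if not term_name:
        raise ValueError("term_name must be a non-empty string")
    return GLOSSARY_TERM_PREFIX + term_name


def parse_glossary_term_urn(urn: str) -> str:
    """Return the term name from a GlossaryTerm URN (hypothetical helper)."""
    if not urn.startswith(GLOSSARY_TERM_PREFIX):
        raise ValueError(f"not a glossaryTerm URN: {urn!r}")
    return urn[len(GLOSSARY_TERM_PREFIX):]


# Dots are part of the flat name; they do not create nesting in the URN.
print(make_glossary_term_urn("Classification.PII"))
print(parse_glossary_term_urn("urn:li:glossaryTerm:Finance.Revenue"))
```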
Term names often use a dotted convention (e.g., Classification.PII, Finance.Revenue) to indicate structure even though the name is flat.
Use the name field in glossaryTermInfo for the display name.
The glossaryTermInfo aspect contains the essential business information about a term:
Example:
{
  "name": "Customer Lifetime Value",
  "definition": "The total revenue a business can expect from a single customer account throughout the business relationship.",
  "termSource": "INTERNAL",
  "parentNode": "urn:li:glossaryNode:Finance"
}
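To make the shape of this payload concrete, here is a minimal sketch that models the four fields from the example as a dataclass and serializes them to JSON. The class name `GlossaryTermInfoSketch` is hypothetical, and the "INTERNAL" default is an assumption for illustration. This is a stand-in, not the SDK's aspect class.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class GlossaryTermInfoSketch:
    """Illustrative stand-in for the glossaryTermInfo fields in the example."""

    name: str
    definition: str
    termSource: str = "INTERNAL"  # default chosen for this sketch only
    parentNode: Optional[str] = None

    def to_json(self) -> str:
        # Drop unset optional fields so the payload only carries real values.
        payload = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(payload, indent=2)


info = GlossaryTermInfoSketch(
    name="Customer Lifetime Value",
    definition=(
        "The total revenue a business can expect from a single customer "
        "account throughout the business relationship."
    ),
    parentNode="urn:li:glossaryNode:Finance",
)
print(info.to_json())
```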
GlossaryTerms support several relationship types that help model the semantic connections between business concepts:
Indicates that one term is a specialized type of another term. This creates an "Is-A" hierarchy where more specific terms inherit the characteristics of broader terms.
Use case: Email IsA PersonalInformation, SocialSecurityNumber IsA PersonalInformation
Indicates that one term contains or is composed of another term. This creates a "Has-A" relationship where a complex concept consists of simpler parts.
Use case: Address HasA ZipCode, Address HasA Street, Address HasA City
Defines the allowed values for an enumerated term. Useful for controlled vocabularies where a term has a fixed set of valid values.
Use case: ColorEnum HasValues Red, Green, Blue
General-purpose relationship for terms that are semantically related but don't fit the other categories.
Use case: Revenue RelatedTo Profit, Customer RelatedTo Account
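The four relationship types can be pictured as labelled edges between terms. The sketch below is an in-memory model built from the use cases above, not DataHub's storage format; the function names are hypothetical. It shows how following IsA edges transitively yields every broader term.

```python
from collections import defaultdict

# (source term, relationship type, target term), taken from the use cases above.
RELATIONSHIPS = [
    ("Email", "IsA", "PersonalInformation"),
    ("SocialSecurityNumber", "IsA", "PersonalInformation"),
    ("Address", "HasA", "ZipCode"),
    ("Address", "HasA", "Street"),
    ("Address", "HasA", "City"),
    ("Revenue", "RelatedTo", "Profit"),
]


def build_index(edges):
    """Index edges as {relationship type: {source: [targets]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for source, rel_type, target in edges:
        index[rel_type][source].append(target)
    return index


def broader_terms(index, term):
    """Follow IsA edges transitively and return every broader term."""
    seen = []
    stack = list(index["IsA"].get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(index["IsA"].get(current, []))
    return seen


index = build_index(RELATIONSHIPS)
print(broader_terms(index, "Email"))
print(index["HasA"]["Address"])
```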
GlossaryTerms can be organized hierarchically through GlossaryNodes (term groups). The parentNode field in glossaryTermInfo establishes this relationship:
GlossaryNode: Classification
├── GlossaryTerm: Sensitive
├── GlossaryTerm: Confidential
└── GlossaryTerm: HighlyConfidential
GlossaryNode: PersonalInformation
├── GlossaryTerm: Email
├── GlossaryTerm: Address
└── GlossaryTerm: PhoneNumber
This hierarchy is visible in the DataHub UI and helps users navigate large glossaries.
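Since each term stores only a pointer to its parent node, the tree is recovered by inverting that mapping. The following sketch does this for the example hierarchy using plain names instead of URNs; `group_by_node` and `render` are hypothetical helpers.

```python
from collections import defaultdict

# term name -> parent GlossaryNode name, mirroring the example hierarchy.
PARENT_NODE = {
    "Sensitive": "Classification",
    "Confidential": "Classification",
    "HighlyConfidential": "Classification",
    "Email": "PersonalInformation",
    "Address": "PersonalInformation",
    "PhoneNumber": "PersonalInformation",
}


def group_by_node(parent_of):
    """Invert the term -> parentNode mapping into node -> [terms]."""
    groups = defaultdict(list)
    for term, node in parent_of.items():
        groups[node].append(term)
    return dict(groups)


def render(groups):
    """Render each node with its terms as an indented text tree."""
    lines = []
    for node, terms in groups.items():
        lines.append(f"GlossaryNode: {node}")
        for i, term in enumerate(terms):
            branch = "└──" if i == len(terms) - 1 else "├──"
            lines.append(f"{branch} GlossaryTerm: {term}")
    return "\n".join(lines)


print(render(group_by_node(PARENT_NODE)))
```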
GlossaryTerms become valuable when applied to actual data assets. Terms can be attached to:
When a term is applied to a data asset, it creates a TermedWith relationship, which enables:
{{ inline /metadata-ingestion/examples/library/glossary_term_create.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/glossary_term_create_with_metadata.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/glossary_term_add_relationships.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/dataset_add_term.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/dataset_add_column_term.py show_path_as_comment }}
# Fetch a GlossaryTerm entity
curl -X GET 'http://localhost:8080/entities/urn%3Ali%3AglossaryTerm%3ACustomerLifetimeValue' \
  -H 'Authorization: Bearer <token>'
# Response includes all aspects:
# - glossaryTermKey (identity)
# - glossaryTermInfo (definition, name, etc.)
# - glossaryRelatedTerms (relationships)
# - ownership (who owns this term)
# - institutionalMemory (links to documentation)
# - etc.
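The URN in the request path must be percent-encoded, which is why the colons appear as `%3A` in the curl command. As a sketch of the same request from Python using only the standard library, assuming the same local endpoint as the curl example (the helper name `entity_request` is hypothetical):

```python
from urllib.parse import quote
from urllib.request import Request

GMS_URL = "http://localhost:8080"  # same host as the curl example


def entity_request(urn: str, token: str) -> Request:
    """Build (but do not send) the GET request for a single entity."""
    # safe="" forces ':' to be percent-encoded as %3A, matching the curl URL.
    url = f"{GMS_URL}/entities/{quote(urn, safe='')}"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


req = entity_request("urn:li:glossaryTerm:CustomerLifetimeValue", "<token>")
print(req.full_url)
# To send it, pass req to urllib.request.urlopen against a running instance.
```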
# Find all datasets tagged with a specific term
curl -X POST 'http://localhost:8080/entities?action=search' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <token>' \
  -d '{
    "entity": "dataset",
    "input": "*",
    "filter": {
      "or": [
        {
          "and": [
            {
              "field": "glossaryTerms",
              "value": "urn:li:glossaryTerm:Classification.PII",
              "condition": "EQUAL"
            }
          ]
        }
      ]
    },
    "start": 0,
    "count": 10
  }'
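The filter in this request is a list of "or" clauses, each holding a list of "and" criteria. A minimal sketch that generates the same body for one or more terms follows; the helper `term_search_body` is hypothetical, and the any-of-these-terms semantics is an assumption read off the or/and nesting.

```python
import json


def term_search_body(entity: str, term_urns, start: int = 0, count: int = 10) -> dict:
    """Build a search body with one "or" clause per glossary term URN."""
    return {
        "entity": entity,
        "input": "*",
        "filter": {
            "or": [
                # One single-criterion "and" group per term.
                {"and": [{"field": "glossaryTerms", "value": urn, "condition": "EQUAL"}]}
                for urn in term_urns
            ]
        },
        "start": start,
        "count": count,
    }


body = term_search_body("dataset", ["urn:li:glossaryTerm:Classification.PII"])
print(json.dumps(body, indent=2))
```

With a single URN this produces the same JSON as the `-d` payload in the curl command.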
{{ inline /metadata-ingestion/examples/library/dataset_query_terms.py show_path_as_comment }}
# business_glossary.yml
version: "1"
source: MyOrganization
owners:
  users:
    - datahub
nodes:
  - name: Classification
    description: Data classification categories
    terms:
      - name: PII
        description: Personally Identifiable Information
      - name: Confidential
        description: Confidential business data
      - name: Public
        description: Publicly available data
  - name: Finance
    description: Financial domain terms
    terms:
      - name: Revenue
        description: Total income from business operations
      - name: Profit
        description: Financial gain after expenses
        related_terms:
          - Finance.Revenue
# Ingest using the DataHub CLI:
# datahub ingest -c business_glossary.yml
See the Business Glossary Source documentation for the full YAML format specification.
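In the YAML, `related_terms` refers to another term by its dotted path (`Finance.Revenue`). The sketch below mirrors the file as a plain Python dict to avoid a YAML dependency, derives the dotted path for every term, and checks that each `related_terms` entry resolves. The helper names are hypothetical, and placing nested groups under a `nodes` key inside a node is an assumption.

```python
# Plain-dict mirror of the YAML glossary (descriptions omitted for brevity).
GLOSSARY = {
    "nodes": [
        {
            "name": "Classification",
            "terms": [{"name": "PII"}, {"name": "Confidential"}, {"name": "Public"}],
        },
        {
            "name": "Finance",
            "terms": [
                {"name": "Revenue"},
                {"name": "Profit", "related_terms": ["Finance.Revenue"]},
            ],
        },
    ]
}


def term_paths(nodes, prefix=""):
    """Yield dotted 'Node.Term' paths, descending into any child nodes."""
    for node in nodes:
        node_path = f"{prefix}{node['name']}"
        for term in node.get("terms", []):
            yield f"{node_path}.{term['name']}"
        yield from term_paths(node.get("nodes", []), prefix=f"{node_path}.")


def dangling_references(glossary):
    """Return related_terms entries that match no known term path."""
    known = set(term_paths(glossary["nodes"]))
    missing = []

    def walk(nodes):
        for node in nodes:
            for term in node.get("terms", []):
                missing.extend(r for r in term.get("related_terms", []) if r not in known)
            walk(node.get("nodes", []))

    walk(glossary["nodes"])
    return missing


print(list(term_paths(GLOSSARY["nodes"])))
print(dangling_references(GLOSSARY))
```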
GlossaryNodes (term groups) provide hierarchical organization for GlossaryTerms. Think of GlossaryNodes as folders and GlossaryTerms as files within those folders.
parentNode in glossaryTermInfo)
GlossaryTerms can be applied to most entity types in DataHub through the glossaryTerms aspect:
Supported entities:
When you apply a term to an entity, DataHub creates:
glossaryTerms aspect on the target entity containing the term association
The GraphQL API provides rich querying and mutation capabilities for GlossaryTerms:
Queries:
Mutations:
createGlossaryTerm: Create a new term
addTerms, addTerm: Apply terms to entities
removeTerm, batchRemoveTerms: Remove terms from entities
updateParentNode: Move a term to a different parent group
See the GraphQL API documentation for detailed examples.
GlossaryTerms enhance discoverability in multiple ways:
GlossaryTerms support fine-grained access control through DataHub's policy system:
See the Business Glossary documentation for details on privileges.
The URN identifier (name in glossaryTermKey) is separate from the display name (name in glossaryTermInfo). Best practice:
When using terms from external standards (FIBO, ISO, industry glossaries):
Set termSource to "EXTERNAL"
Populate sourceRef with the standard name (e.g., "FIBO")
Provide a sourceUrl linking to the authoritative definition
Don't confuse:
Hierarchy (parentNode → GlossaryNode): Organizational structure for browsing
Semantic relationships (glossaryRelatedTerms): Meaning connections between concepts
A term can have a parentNode for organization (e.g., term "Email" under node "PersonalInformation") and also semantic relationships (e.g., "Email" IsA "PII", "Email" RelatedTo "Contact").
GlossaryTerms support the schemaMetadata aspect, which is rarely used but can be helpful for defining structured attributes on terms themselves. This is an advanced feature for when terms need to carry typed properties beyond simple custom properties.
When a GlossaryTerm is deprecated (via the deprecation aspect):