metadata-models/docs/entities/erModelRelationship.md
Entity-Relationship (ER) Model Relationships represent the connections between entities in an entity-relationship diagram, specifically modeling how dataset fields relate to each other through foreign key constraints, joins, and other referential relationships. In DataHub, these relationships capture the semantic connections between tables, enabling users to understand data structure, enforce referential integrity, and trace data lineage at the field level.
ER Model Relationships are particularly valuable for documenting database schemas, data warehouse models, and any structured data system where understanding table relationships is critical for data governance, impact analysis, and query optimization.
ER Model Relationships are uniquely identified by a single identifier:
The URN structure follows the pattern:
urn:li:erModelRelationship:<id>
urn:li:erModelRelationship:employee_to_company
urn:li:erModelRelationship:a1b2c3d4e5f6g7h8i9j0
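The URN pattern above can be sketched with two small helpers. This is only an illustration of the string format; the function names are hypothetical and are not part of the DataHub SDK.

```python
URN_PREFIX = "urn:li:erModelRelationship:"


def make_er_model_relationship_urn(relationship_id: str) -> str:
    """Build a URN of the form urn:li:erModelRelationship:<id>."""
    if not relationship_id:
        raise ValueError("relationship id must be a non-empty string")
    return f"{URN_PREFIX}{relationship_id}"


def parse_er_model_relationship_id(urn: str) -> str:
    """Return the <id> part of an ER Model Relationship URN."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError(f"not an erModelRelationship URN: {urn}")
    return urn[len(URN_PREFIX):]


urn = make_er_model_relationship_urn("employee_to_company")
relationship_id = parse_er_model_relationship_id(urn)
```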
When creating relationships through the UI or API, the ID is often generated deterministically using a hash function to ensure consistency:
The hash input combines the relationship's Destination, ERModelRelationName, and Source values.

This ensures that the same relationship between two datasets always gets the same ID, regardless of creation order.
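A minimal sketch of deterministic ID generation over the Source, Destination, and ERModelRelationName values. The text does not specify the hash function or the serialization, so MD5 over a key-sorted JSON string is an assumption made here for illustration, and the function name is hypothetical.

```python
import hashlib
import json


def deterministic_relationship_id(source_urn: str, destination_urn: str, name: str) -> str:
    """Derive a stable ID from the relationship details.

    Assumption: MD5 over key-sorted JSON. DataHub's actual scheme may differ.
    """
    details = {
        "Source": source_urn,
        "Destination": destination_urn,
        "ERModelRelationName": name,
    }
    # sort_keys gives a canonical string, so dict insertion order cannot change the hash.
    canonical = json.dumps(details, sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


first_id = deterministic_relationship_id(
    "urn:li:dataset:(urn:li:dataPlatform:mysql,Employee,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:mysql,Company,PROD)",
    "employee_to_company",
)
second_id = deterministic_relationship_id(
    "urn:li:dataset:(urn:li:dataPlatform:mysql,Employee,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:mysql,Company,PROD)",
    "employee_to_company",
)
```

Because the ID depends only on the three input values, re-ingesting the same relationship yields the same URN rather than a duplicate entity.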
ER Model Relationships capture essential metadata about how datasets connect to each other through the erModelRelationshipProperties aspect. This core aspect contains:
DataHub supports four cardinality types that describe how records in one dataset relate to records in another:
ONE_ONE: One-to-one relationship. Each record in the source dataset corresponds to exactly one record in the destination dataset.
ONE_N: One-to-many relationship. Each record in the source dataset can correspond to multiple records in the destination dataset.
N_ONE: Many-to-one relationship. Multiple records in the source dataset can correspond to one record in the destination dataset.
N_N: Many-to-many relationship. Records in both datasets can have multiple corresponding records in the other dataset.
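To make the four cardinality values concrete, the sketch below classifies a set of matched record pairs. It illustrates the definitions only; it is not a DataHub feature, and the function name and sample record IDs are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def infer_cardinality(pairs: Iterable[Tuple[str, str]]) -> str:
    """Classify (source_record, destination_record) matches as one of the four types."""
    dests_per_source = defaultdict(set)
    sources_per_dest = defaultdict(set)
    for source_record, dest_record in pairs:
        dests_per_source[source_record].add(dest_record)
        sources_per_dest[dest_record].add(source_record)

    # Does any source record match several destination records, and vice versa?
    source_fans_out = any(len(d) > 1 for d in dests_per_source.values())
    dest_fans_out = any(len(s) > 1 for s in sources_per_dest.values())

    if source_fans_out and dest_fans_out:
        return "N_N"
    if source_fans_out:
        return "ONE_N"
    if dest_fans_out:
        return "N_ONE"
    return "ONE_ONE"


# Two employees belong to the same company: many-to-one from Employee to Company.
employee_to_company = infer_cardinality([("emp1", "acme"), ("emp2", "acme")])
```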
The relationshipFieldMappings array defines which specific fields connect the two datasets. Each mapping contains:
Multiple field mappings can be specified for composite keys where the relationship depends on multiple fields.
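As a sketch of a composite-key mapping, the helper below builds one entry per field pair. The sourceField and destinationField key names follow the REST response example later in this document; the helper name and the column names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_field_mappings(pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (source_field, destination_field) pairs into mapping entries."""
    return [
        {"sourceField": source_field, "destinationField": destination_field}
        for source_field, destination_field in pairs
    ]


# Hypothetical composite key: the join depends on both columns together.
composite_key_mappings = build_field_mappings(
    [("order_id", "order_id"), ("line_number", "line_number")]
)
```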
Like other DataHub entities, ER Model Relationships support custom properties for storing additional metadata such as:
Relationships include optional timestamp information to track when they were created and last modified in the source system:
Here's a complete example showing how to create two datasets and establish a many-to-one relationship between them:
<details>
<summary>Python SDK: Create an ER Model Relationship</summary>

{{ inline /metadata-ingestion/examples/library/ermodelrelationship_create_basic.py show_path_as_comment }}

</details>
The editableERModelRelationshipProperties aspect allows users to add or modify relationship metadata through the DataHub UI without overwriting information ingested from source systems. This separation follows the same pattern used across DataHub entities.
Editable properties include:
{{ inline /metadata-ingestion/examples/library/ermodelrelationship_update_properties.py show_path_as_comment }}
ER Model Relationships support tagging and glossary term attachment just like other DataHub entities. This allows you to categorize relationships, mark them with data classification tags, or link them to business concepts.
Tags can be used to classify relationships by type, importance, or data domain:
<details>
<summary>Python SDK: Add a tag to an ER Model Relationship</summary>

{{ inline /metadata-ingestion/examples/library/ermodelrelationship_add_tag.py show_path_as_comment }}

</details>
Glossary terms connect relationships to business concepts and terminology:
<details>
<summary>Python SDK: Add a glossary term to an ER Model Relationship</summary>

{{ inline /metadata-ingestion/examples/library/ermodelrelationship_add_term.py show_path_as_comment }}

</details>
Ownership can be assigned to ER Model Relationships to indicate who is responsible for maintaining the relationship definition or who should be consulted about changes to the connected datasets.
<details>
<summary>Python SDK: Add an owner to an ER Model Relationship</summary>

{{ inline /metadata-ingestion/examples/library/ermodelrelationship_add_owner.py show_path_as_comment }}

</details>
ER Model Relationships can model sophisticated data structures including composite keys and many-to-many relationships through junction tables:
<details>
<summary>Python SDK: Create a many-to-many relationship with composite keys</summary>

{{ inline /metadata-ingestion/examples/library/ermodelrelationship_complex_many_to_many.py show_path_as_comment }}

</details>
ER Model Relationships can be queried using the standard DataHub REST API:
<details>
<summary>Fetch an ER Model Relationship</summary>

curl 'http://localhost:8080/entities/urn%3Ali%3AerModelRelationship%3Aemployee_to_company'

The response includes all aspects of the relationship:

{
  "urn": "urn:li:erModelRelationship:employee_to_company",
  "aspects": {
    "erModelRelationshipKey": {
      "id": "employee_to_company"
    },
    "erModelRelationshipProperties": {
      "name": "Employee to Company Relationship",
      "source": "urn:li:dataset:(urn:li:dataPlatform:mysql,Employee,PROD)",
      "destination": "urn:li:dataset:(urn:li:dataPlatform:mysql,Company,PROD)",
      "relationshipFieldMappings": [
        {
          "sourceField": "company_id",
          "destinationField": "id"
        }
      ],
      "cardinality": "N_ONE",
      "customProperties": {
        "constraint": "Foreign Key"
      }
    }
  }
}

</details>
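Once fetched, a response like the one above can be reduced to its join conditions. The sketch below works on a trimmed copy of the sample response; the helper names are hypothetical, and splitting the dataset URN on commas is a simplification that assumes the dataset name itself contains no comma.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "urn": "urn:li:erModelRelationship:employee_to_company",
  "aspects": {
    "erModelRelationshipProperties": {
      "name": "Employee to Company Relationship",
      "source": "urn:li:dataset:(urn:li:dataPlatform:mysql,Employee,PROD)",
      "destination": "urn:li:dataset:(urn:li:dataPlatform:mysql,Company,PROD)",
      "relationshipFieldMappings": [
        {"sourceField": "company_id", "destinationField": "id"}
      ],
      "cardinality": "N_ONE"
    }
  }
}
"""


def dataset_name(dataset_urn: str) -> str:
    """Extract the name from urn:li:dataset:(<platform>,<name>,<env>)."""
    return dataset_urn.rstrip(")").split(",")[1]


def describe_joins(entity: dict) -> list:
    """Render each field mapping as 'Source.field -> Destination.field'."""
    props = entity["aspects"]["erModelRelationshipProperties"]
    source = dataset_name(props["source"])
    destination = dataset_name(props["destination"])
    return [
        f"{source}.{m['sourceField']} -> {destination}.{m['destinationField']}"
        for m in props["relationshipFieldMappings"]
    ]


joins = describe_joins(json.loads(SAMPLE_RESPONSE))
```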
You can discover relationships connected to a specific dataset by querying the relationships API:
# Find relationships where the dataset is the source
curl 'http://localhost:8080/relationships?direction=OUTGOING&urn=urn%3Ali%3Adataset%3A(urn%3Ali%3AdataPlatform%3Amysql,Employee,PROD)&types=ermodelrelationA'
# Find relationships where the dataset is the destination
curl 'http://localhost:8080/relationships?direction=INCOMING&urn=urn%3Ali%3Adataset%3A(urn%3Ali%3AdataPlatform%3Amysql,Company,PROD)&types=ermodelrelationB'
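The query strings in these curl commands can be assembled programmatically. The sketch below uses only the Python standard library and reproduces the encoding used above (colons escaped, parentheses and commas left as-is); the function name is hypothetical, and the base URL is the local default from the examples.

```python
from urllib.parse import quote, urlencode


def relationships_url(base_url: str, dataset_urn: str, direction: str, relationship_type: str) -> str:
    """Build a /relationships query URL for one dataset and one relationship type."""
    query = urlencode(
        {"direction": direction, "urn": dataset_urn, "types": relationship_type},
        quote_via=quote,
        safe="(),",  # keep parentheses and commas unescaped, as in the curl examples
    )
    return f"{base_url}/relationships?{query}"


outgoing_url = relationships_url(
    "http://localhost:8080",
    "urn:li:dataset:(urn:li:dataPlatform:mysql,Employee,PROD)",
    "OUTGOING",
    "ermodelrelationA",
)
```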
ER Model Relationships integrate with several other DataHub entities and features:
ER Model Relationships are fundamentally connected to Dataset entities. Each relationship must reference exactly two datasets:
While the entity stores field paths as strings, these correspond to SchemaField entities within the referenced datasets. This enables:
ER Model Relationships complement but are distinct from DataHub's lineage features:
Together, these features provide a complete picture of both data structure and data flow.
The DataHub GraphQL API provides rich querying capabilities for ER Model Relationships:
erModelRelationship(urn: String!): Fetch a specific relationship

Creating and modifying ER Model Relationships requires appropriate permissions in DataHub's policy framework. Users must have edit permissions on both the source and destination datasets to create a relationship between them.
While ER Model Relationships have "source" and "destination" fields, these do not necessarily imply directionality in the traditional sense of foreign keys:
ER Model Relationships are currently separate from the datasets they connect:
ER Model Relationships reference field paths as strings, not versioned schema references:
Not all data platforms have first-class support for ER Model Relationships:
The ER Model Relationship entity may evolve to include: