The corpGroup entity represents organizational groups, teams, or departments within an enterprise. These groups can be synchronized from external identity providers like LDAP, Active Directory, or SAML/SSO systems, or created natively within DataHub. CorpGroups are essential for managing access control, ownership assignments, and organizational metadata in DataHub.
CorpGroups are uniquely identified by a single string field: the group name.
The URN structure for a corpGroup is:
urn:li:corpGroup:<encoded-group-name>
The <encoded-group-name> is a URL-encoded version of the group name that serves as a globally unique identifier within DataHub. The encoding is handled automatically by the SDK.
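If you need to see what the encoding step looks like outside the SDK, the standard library can approximate it. This is a minimal sketch, not the SDK's own encoder. It assumes plain percent-encoding via `urllib.parse.quote` with no safe characters, which reproduces the encoded LDAP DN pattern listed below. `make_corpgroup_urn` and `parse_corpgroup_urn` are hypothetical helpers.

```python
from urllib.parse import quote, unquote

URN_PREFIX = "urn:li:corpGroup:"


def make_corpgroup_urn(group_name: str) -> str:
    """Build a corpGroup URN by percent-encoding the whole group name.

    Hypothetical helper: quote(..., safe="") encodes every reserved
    character, including "=", "," and "/"; unreserved characters such as
    letters, digits and "-" pass through unchanged.
    """
    return URN_PREFIX + quote(group_name, safe="")


def parse_corpgroup_urn(urn: str) -> str:
    """Recover the original group name from a URN built as above."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError(f"not a corpGroup URN: {urn}")
    return unquote(urn[len(URN_PREFIX):])


print(make_corpgroup_urn("cn=admins,ou=groups,dc=example,dc=com"))
# urn:li:corpGroup:cn%3Dadmins%2Cou%3Dgroups%2Cdc%3Dexample%2Cdc%3Dcom
```

Simple names such as `eng-team` or an Active Directory SID contain only unreserved characters, so they come through unchanged.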
Here are some typical URN patterns for different group naming conventions:
```text
urn:li:corpGroup:eng-team
urn:li:corpGroup:data-platform
urn:li:corpGroup:cn%3Dadmins%2Cou%3Dgroups%2Cdc%3Dexample%2Cdc%3Dcom # LDAP DN
urn:li:corpGroup:S-1-5-21-123456789-123456789-123456789-1234 # Active Directory SID
urn:li:corpGroup:marketing-team
```
The name field is searchable and supports autocomplete, making it easy to find groups across DataHub.
Group information is stored in two aspects: corpGroupInfo and corpGroupEditableInfo.
The corpGroupInfo aspect stores the source-of-truth information synced from external systems.
Note: The admins, members, and groups fields in corpGroupInfo are deprecated and maintained only for backwards compatibility. Group membership is now managed through the groupMembership aspect on corpUser entities.
The corpGroupEditableInfo aspect stores information that can be edited in the DataHub UI.
When both aspects contain the same field (like description), the UI typically prioritizes the editable version for display.
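The precedence rule can be sketched as a simple fallback. This is an illustration of "prioritizes the editable version", not DataHub's UI code. It assumes each aspect is modeled as a plain dict keyed by field name and treats an empty string as "not set". `display_description` is a hypothetical helper.

```python
from typing import Optional


def display_description(info: Optional[dict], editable_info: Optional[dict]) -> Optional[str]:
    """Return the description to display, preferring the editable aspect.

    Aspects are modeled as plain dicts; an absent aspect, a missing field
    or an empty string all count as "not set".
    """
    for aspect in (editable_info, info):
        if aspect and aspect.get("description"):
            return aspect["description"]
    return None


info = {"displayName": "Engineering", "description": "Synced from LDAP"}
editable = {"description": "Owns the core platform services"}
print(display_description(info, editable))  # Owns the core platform services
print(display_description(info, None))      # Synced from LDAP
```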
Group membership is managed through the groupMembership aspect, which is attached to corpUser entities (not the group itself). This design allows for efficient queries of which groups a user belongs to.
To add a user to a group, you update the groupMembership aspect on the user entity to include the group's URN.
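The aspect holds the user's full list of groups. Adding a group is therefore a read-modify-write operation: fetch the current aspect, append the new URN, and write the whole list back. This sketch assumes each write replaces the stored aspect, so skipping the fetch would drop the user's existing memberships.

The code covers only the list manipulation. It models the aspect as a dict with a `groups` list of URN strings and leaves out fetching and writing the aspect. `add_group` and `remove_group` are hypothetical helpers.

```python
from typing import Optional


def add_group(membership: Optional[dict], group_urn: str) -> dict:
    """Return a new groupMembership aspect that also contains group_urn.

    `membership` is the user's current aspect (None if the user has none).
    Existing groups are kept, and adding the same group twice is a no-op.
    """
    groups = list((membership or {}).get("groups", []))
    if group_urn not in groups:
        groups.append(group_urn)
    return {"groups": groups}


def remove_group(membership: Optional[dict], group_urn: str) -> dict:
    """Return a new groupMembership aspect without group_urn."""
    groups = [g for g in (membership or {}).get("groups", []) if g != group_urn]
    return {"groups": groups}


current = {"groups": ["urn:li:corpGroup:data-platform"]}
updated = add_group(current, "urn:li:corpGroup:eng-team")
print(updated["groups"])
# ['urn:li:corpGroup:data-platform', 'urn:li:corpGroup:eng-team']
```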
The origin aspect tracks where a group originated: natively in DataHub (NATIVE) or in an external system (EXTERNAL).
For external groups, the externalType field can specify the source system (e.g., "LDAP", "AzureAD", "Okta").
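A minimal sketch of building an origin aspect payload as a plain dict is shown below. `EXTERNAL` and `externalType` appear on this page. `NATIVE` is assumed here as the counterpart value for groups created directly in DataHub. `make_origin` is a hypothetical helper that rejects an `externalType` on a native group.

```python
from typing import Optional

# Assumed enum values: EXTERNAL appears on this page; NATIVE is assumed
# as its counterpart for groups created directly in DataHub.
ORIGIN_TYPES = {"NATIVE", "EXTERNAL"}


def make_origin(origin_type: str, external_type: Optional[str] = None) -> dict:
    """Build an origin-aspect-shaped dict, validating the combination."""
    if origin_type not in ORIGIN_TYPES:
        raise ValueError(f"unknown origin type: {origin_type}")
    if origin_type == "NATIVE" and external_type is not None:
        raise ValueError("externalType only applies to EXTERNAL groups")
    origin = {"type": origin_type}
    if external_type is not None:
        origin["externalType"] = external_type
    return origin


print(make_origin("EXTERNAL", "Okta"))  # {'type': 'EXTERNAL', 'externalType': 'Okta'}
print(make_origin("NATIVE"))            # {'type': 'NATIVE'}
```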
Groups can have owners assigned through the standard ownership aspect. Owners are typically administrators or managers responsible for the group. Ownership types include TECHNICAL_OWNER, BUSINESS_OWNER, and others.
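The owners of a group can be pictured as a list of owner/type entries. In this sketch, the field names `owners`, `owner` and `type` are assumed to mirror the ownership aspect, and `make_ownership` is a hypothetical helper. It accepts any ownership type string, drops duplicate pairs, and only checks that each owner is a user or group URN.

```python
from typing import Iterable, Tuple

OWNER_PREFIXES = ("urn:li:corpuser:", "urn:li:corpGroup:")


def make_ownership(owners: Iterable[Tuple[str, str]]) -> dict:
    """Build an ownership-aspect-shaped dict from (owner_urn, type) pairs.

    Duplicate pairs are dropped; owner URNs must be users or groups.
    """
    entries = []
    seen = set()
    for owner_urn, owner_type in owners:
        if not owner_urn.startswith(OWNER_PREFIXES):
            raise ValueError(f"owner must be a user or group URN: {owner_urn}")
        if (owner_urn, owner_type) in seen:
            continue
        seen.add((owner_urn, owner_type))
        entries.append({"owner": owner_urn, "type": owner_type})
    return {"owners": entries}


ownership = make_ownership([
    ("urn:li:corpuser:jdoe", "TECHNICAL_OWNER"),
    ("urn:li:corpuser:asmith", "BUSINESS_OWNER"),
    ("urn:li:corpuser:jdoe", "TECHNICAL_OWNER"),  # duplicate, ignored
])
print(len(ownership["owners"]))  # 2
```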
Like other entities in DataHub, groups support:
{{ inline /metadata-ingestion/examples/library/corpgroup_create.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/corpgroup_add_members.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/corpgroup_update_info.py show_path_as_comment }}
To retrieve a group entity with all its aspects:
```shell
curl 'http://localhost:8080/entities/urn%3Ali%3AcorpGroup%3Aeng-team'
```
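The same request can be built from Python with only the standard library. The main detail is that the whole URN is percent-encoded as one path segment, exactly as in the curl command. The local GMS address mirrors the curl example. The optional Bearer token header is an assumption for deployments with authentication enabled. `entity_request` is a hypothetical helper that builds the request without sending it.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

GMS_URL = "http://localhost:8080"  # assumption: local GMS, as in the curl example


def entity_request(urn: str, gms_url: str = GMS_URL, token: Optional[str] = None) -> Request:
    """Build (but do not send) a GET request for an entity and its aspects.

    The URN is percent-encoded as a single path segment. The Bearer token
    header is an assumption for deployments with authentication enabled.
    """
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return Request(f"{gms_url}/entities/{quote(urn, safe='')}", headers=headers, method="GET")


req = entity_request("urn:li:corpGroup:eng-team")
print(req.full_url)  # http://localhost:8080/entities/urn%3Ali%3AcorpGroup%3Aeng-team
# urllib.request.urlopen(req) would send it and return the JSON response.
```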
To find all users who are members of a specific group:
```shell
curl 'http://localhost:8080/relationships?direction=INCOMING&urn=urn%3Ali%3AcorpGroup%3Aeng-team&types=IsMemberOfGroup'
```
The response will include all corpUser entities that have the group in their groupMembership aspect:
```json
{
  "start": 0,
  "count": 3,
  "relationships": [
    {
      "type": "IsMemberOfGroup",
      "entity": "urn:li:corpuser:jdoe"
    },
    {
      "type": "IsMemberOfGroup",
      "entity": "urn:li:corpuser:asmith"
    },
    {
      "type": "IsMemberOfGroup",
      "entity": "urn:li:corpuser:bwilliams"
    }
  ],
  "total": 3
}
```
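For large groups, the `start` and `total` fields in the response suggest a paging loop. This sketch assumes the relationships endpoint accepts a `start` offset. Check that against your GMS version. `fetch_page` is a hypothetical callable that performs the request and returns the decoded JSON, so the paging logic can be shown and tested without a network call. The loop advances by the number of relationships actually returned rather than trusting `count`.

```python
import json
from typing import Callable, List


def member_urns(page: dict) -> List[str]:
    """Extract member URNs from one relationships response page."""
    return [
        rel["entity"]
        for rel in page.get("relationships", [])
        if rel.get("type") == "IsMemberOfGroup"
    ]


def all_member_urns(fetch_page: Callable[[int], dict]) -> List[str]:
    """Collect members across pages using the start and total fields.

    fetch_page(start) is a hypothetical callable that issues the
    relationships request beginning at offset `start` and returns the
    decoded JSON body.
    """
    members: List[str] = []
    start = 0
    while True:
        page = fetch_page(start)
        returned = len(page.get("relationships", []))
        members.extend(member_urns(page))
        start += returned
        if returned == 0 or start >= page.get("total", 0):
            return members


sample = json.loads("""
{"start": 0, "count": 3, "total": 3, "relationships": [
  {"type": "IsMemberOfGroup", "entity": "urn:li:corpuser:jdoe"},
  {"type": "IsMemberOfGroup", "entity": "urn:li:corpuser:asmith"},
  {"type": "IsMemberOfGroup", "entity": "urn:li:corpuser:bwilliams"}
]}
""")
print(all_member_urns(lambda start: sample))
# ['urn:li:corpuser:jdoe', 'urn:li:corpuser:asmith', 'urn:li:corpuser:bwilliams']
```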
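CorpGroups are tightly integrated with corpUser entities through the groupMembership aspect. When a user is added to a group, the group's URN is added to that user's groupMembership aspect. The membership is stored only on the user, but it can be traversed in both directions: from a user to their groups, or from a group to its members through the relationships query shown above.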
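Conceptually, the group-to-members view is an inversion of the per-user membership lists. DataHub answers that query from its relationship index, not with code like this. The sketch below only illustrates the inversion over in-memory dicts. `members_by_group` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def members_by_group(memberships: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert user -> groups (the stored direction) into group -> users."""
    index: Dict[str, List[str]] = defaultdict(list)
    for user_urn, group_urns in memberships.items():
        for group_urn in group_urns:
            index[group_urn].append(user_urn)
    return dict(index)


memberships = {
    "urn:li:corpuser:jdoe": ["urn:li:corpGroup:eng-team"],
    "urn:li:corpuser:asmith": ["urn:li:corpGroup:eng-team", "urn:li:corpGroup:data-platform"],
}
print(members_by_group(memberships)["urn:li:corpGroup:eng-team"])
# ['urn:li:corpuser:jdoe', 'urn:li:corpuser:asmith']
```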
Groups can be assigned as owners of any DataHub entity (datasets, dashboards, charts, etc.) through the ownership aspect. This allows team-based ownership where all group members are considered owners.
Example ownership assignment:
```python
from datahub.metadata.urns import CorpGroupUrn

# `dataset` is a datahub.sdk.Dataset; a group can be one of its owners
dataset.add_owner(CorpGroupUrn("data-engineering"))
```
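The statement that all group members are considered owners can be made concrete by expanding group owners into users. This is a hypothetical illustration, not how DataHub evaluates ownership internally. It assumes flat membership and a precomputed group-to-members mapping. `effective_owner_users` is an illustrative helper.

```python
from typing import Dict, Iterable, List, Set

GROUP_PREFIX = "urn:li:corpGroup:"


def effective_owner_users(owner_urns: Iterable[str], group_members: Dict[str, List[str]]) -> Set[str]:
    """Expand group owners into their member users (flat membership only)."""
    users: Set[str] = set()
    for urn in owner_urns:
        if urn.startswith(GROUP_PREFIX):
            users.update(group_members.get(urn, []))
        else:
            users.add(urn)
    return users


owners = ["urn:li:corpGroup:data-engineering", "urn:li:corpuser:bwilliams"]
group_members = {"urn:li:corpGroup:data-engineering": ["urn:li:corpuser:jdoe", "urn:li:corpuser:asmith"]}
print(sorted(effective_owner_users(owners, group_members)))
# ['urn:li:corpuser:asmith', 'urn:li:corpuser:bwilliams', 'urn:li:corpuser:jdoe']
```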
While not directly stored in the corpGroup aspects, groups are a fundamental component of DataHub's RBAC (Role-Based Access Control) system. Groups can be:
DataHub provides ingestion connectors for syncing groups from external systems:
The LDAP source connector can extract groups and their memberships:
```yaml
source:
  type: ldap
  config:
    ldap_server: "ldap://ldap.example.com"
    ldap_user: "cn=admin,dc=example,dc=com"
    ldap_password: "${LDAP_PASSWORD}"
    base_dn: "ou=groups,dc=example,dc=com"
    filter: "(objectClass=groupOfNames)"
```
Groups extracted from LDAP will have the origin aspect set to EXTERNAL with externalType="LDAP".

The Azure AD source connector syncs groups from Microsoft Azure Active Directory:
```yaml
source:
  type: azure-ad
  config:
    client_id: "${AZURE_CLIENT_ID}"
    tenant_id: "${AZURE_TENANT_ID}"
    client_secret: "${AZURE_CLIENT_SECRET}"
    ingest_users: true
    ingest_groups: true
```
Azure AD groups will have the origin aspect set to EXTERNAL with externalType="AzureAD".

The corpGroup entity is fully supported in DataHub's GraphQL API. Common queries include:
```graphql
query GetGroup {
  corpGroup(urn: "urn:li:corpGroup:eng-team") {
    urn
    name
    properties {
      displayName
      description
      email
    }
    ownership {
      owners {
        owner {
          ... on CorpUser {
            urn
            username
          }
        }
      }
    }
  }
}
```
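To run the query programmatically, the usual pattern is to POST it as JSON with the URN passed as a variable instead of hard-coding it. The sketch trims the query to its core fields. The endpoint URL `http://localhost:8080/api/graphql` and the Bearer token header are assumptions about your deployment. `graphql_request` is a hypothetical helper that builds the request without sending it.

```python
import json
from typing import Optional
from urllib.request import Request

GET_GROUP = """
query GetGroup($urn: String!) {
  corpGroup(urn: $urn) {
    urn
    name
    properties { displayName description email }
  }
}
"""

# Assumption: GMS serves GraphQL at /api/graphql; adjust for your deployment.
GRAPHQL_URL = "http://localhost:8080/api/graphql"


def graphql_request(urn: str, url: str = GRAPHQL_URL, token: Optional[str] = None) -> Request:
    """Build (but do not send) a POST request running GetGroup for `urn`."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": GET_GROUP, "variables": {"urn": urn}}).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


req = graphql_request("urn:li:corpGroup:eng-team")
# urllib.request.urlopen(req) would send it; per GraphQL convention the
# group is returned under response["data"]["corpGroup"].
```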
The members, admins, and groups fields in the corpGroupInfo aspect are deprecated. These fields were originally used to store group membership directly on the group entity, but this approach had scalability and consistency issues.
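Current best practice is to manage membership through the groupMembership aspect on corpUser entities and to avoid populating the deprecated fields.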
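If older metadata still lists users in the deprecated `members` field, one way to move it over is to add the group's URN to each listed user's group list. This is a hypothetical migration sketch over in-memory dicts. Each user's list stands in for their `groupMembership.groups` field, and reading and writing the aspects is left out. It handles `members` only; the deprecated `admins` and `groups` fields are left out of this sketch.

```python
from typing import Dict, List


def migrate_legacy_members(group_urn: str, legacy_info: dict, memberships: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Copy users from the deprecated corpGroupInfo.members field into
    per-user group lists (the shape of each user's groupMembership.groups).

    Returns a new mapping; the input mapping is not modified.
    """
    updated = {user: list(groups) for user, groups in memberships.items()}
    for user_urn in legacy_info.get("members", []):
        groups = updated.setdefault(user_urn, [])
        if group_urn not in groups:
            groups.append(group_urn)
    return updated


legacy = {"members": ["urn:li:corpuser:jdoe", "urn:li:corpuser:asmith"]}
existing = {"urn:li:corpuser:jdoe": ["urn:li:corpGroup:eng-team"]}
print(migrate_legacy_members("urn:li:corpGroup:eng-team", legacy, existing))
# {'urn:li:corpuser:jdoe': ['urn:li:corpGroup:eng-team'], 'urn:li:corpuser:asmith': ['urn:li:corpGroup:eng-team']}
```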
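Groups can be created in two ways: natively within DataHub, or by synchronizing them from an external identity provider such as LDAP or Azure AD with an ingestion connector.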
External groups are typically treated as read-only in DataHub to prevent conflicts with the source system. Updates should be made in the source system (LDAP, Azure AD, etc.) and re-synchronized to DataHub.
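This page does not describe how DataHub itself enforces the read-only convention. The sketch below is a hypothetical client-side guard you could place in front of your own update scripts. It reads the origin aspect, modeled as a dict, and refuses edits to groups marked EXTERNAL. A group with no origin aspect is treated as editable. `ExternalGroupError` and `check_editable` are illustrative names.

```python
from typing import Optional


class ExternalGroupError(Exception):
    """Raised when an edit targets a group owned by an external system."""


def check_editable(group_urn: str, origin: Optional[dict]) -> None:
    """Reject direct edits to externally sourced groups.

    A group with no origin aspect is treated as editable.
    """
    if origin and origin.get("type") == "EXTERNAL":
        source = origin.get("externalType") or "an external system"
        raise ExternalGroupError(f"{group_urn} is managed in {source}; change it there and re-sync")


check_editable("urn:li:corpGroup:marketing-team", {"type": "NATIVE"})  # passes
```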
Group names are URL-encoded in URNs to handle special characters commonly found in LDAP DNs and Active Directory paths. When using the SDK, encoding is handled automatically. However, when constructing URNs manually or in API requests, ensure proper URL encoding:
```python
from datahub.metadata.urns import CorpGroupUrn

# Correct - SDK handles encoding
CorpGroupUrn("cn=admins,ou=groups,dc=example,dc=com")
# Result: urn:li:corpGroup:cn%3Dadmins%2Cou%3Dgroups%2Cdc%3Dexample%2Cdc%3Dcom

# Incorrect - manual construction without encoding
"urn:li:corpGroup:cn=admins,ou=groups,dc=example,dc=com"  # Will fail
```
While the corpGroupInfo aspect includes a deprecated groups field for nested groups, DataHub does not currently have first-class support for group hierarchies. Group membership is flat: a user is either a member of a group or not. If you need hierarchical group structures, consider: