CorpUser represents an individual user (or account) in the enterprise. These entities serve as the identity layer within DataHub, representing people who interact with data assets, own resources, belong to groups, and have roles and permissions within the organization. CorpUsers can represent LDAP users, Active Directory accounts, SSO identities, or native DataHub users.
CorpUsers are uniquely identified by a single piece of information: the username.
The URN structure for CorpUser is:

```
urn:li:corpuser:<username>
```

For example:

```
urn:li:corpuser:jdoe
urn:li:corpuser:[email protected]
urn:li:corpuser:[email protected]
```

The username is stored in the corpUserKey aspect, which is the identity aspect for this entity. The username field is marked as searchable and enables autocomplete functionality in the DataHub UI.
The username can follow various conventions depending on your organization's identity provider:
- Plain usernames: jdoe, john.doe, john_doe
- Email addresses: [email protected], [email protected]

It's important to keep username formats consistent across your DataHub deployment to ensure proper identity resolution and relationship tracking.
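To make the URN pattern concrete, here is a minimal sketch in plain Python. It builds a CorpUser URN from a username and recovers the username from a URN. The helper names make_corpuser_urn and username_from_urn are hypothetical and exist only for illustration. They are not part of the DataHub SDK, and they assume the username is used verbatim, with no case folding or other normalization.

```python
# Hypothetical helpers for illustration; not part of the DataHub SDK.
CORPUSER_PREFIX = "urn:li:corpuser:"


def make_corpuser_urn(username: str) -> str:
    """Build a CorpUser URN, using the username verbatim as the key."""
    if not username:
        raise ValueError("username must be non-empty")
    return CORPUSER_PREFIX + username


def username_from_urn(urn: str) -> str:
    """Recover the username (the corpUserKey value) from a CorpUser URN."""
    if not urn.startswith(CORPUSER_PREFIX):
        raise ValueError(f"not a CorpUser URN: {urn!r}")
    return urn[len(CORPUSER_PREFIX):]


urn = make_corpuser_urn("jdoe")
print(urn)                     # urn:li:corpuser:jdoe
print(username_from_urn(urn))  # jdoe
```

Because this sketch uses the username verbatim, jdoe and JDoe yield two different URNs. That is one concrete way inconsistent username formats split a single person into several identities.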
The core profile information about a user is stored in the corpUserInfo aspect. This is typically populated automatically by ingestion connectors from identity providers like LDAP, Active Directory, Azure AD, Okta, or other SSO systems.
Key Fields:
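- displayName: Name shown for the user in the UI
- fullName: The user's full name
- email: The user's email address
- title: The user's job title
- managerUrn: URN of the user's manager (another CorpUser)
- system: Whether the account is a system or service account rather than a person
- active: Whether the account is active (deprecated, use corpUserStatus)

The managerUrn field creates a relationship between users, enabling organizational hierarchy visualization in DataHub.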
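Each managerUrn points at another CorpUser, so following those links upward reconstructs a user's reporting chain. The sketch below assumes you have already loaded each user's corpUserInfo.managerUrn into a plain dict that maps a user URN to a manager URN (or None). reporting_chain is a hypothetical helper. It stops at a URN it has already visited, so a misconfigured cycle cannot loop forever.

```python
from typing import Dict, List, Optional


def reporting_chain(user_urn: str, manager_of: Dict[str, Optional[str]]) -> List[str]:
    """Return the managers above user_urn, nearest first.

    manager_of maps a user URN to its corpUserInfo.managerUrn (None if unset).
    The walk stops at a user with no manager or at a URN already visited.
    """
    chain: List[str] = []
    seen = {user_urn}
    current = manager_of.get(user_urn)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = manager_of.get(current)
    return chain


managers = {
    "urn:li:corpuser:jdoe": "urn:li:corpuser:asmith",
    "urn:li:corpuser:asmith": "urn:li:corpuser:bchen",
    "urn:li:corpuser:bchen": None,
}
print(reporting_chain("urn:li:corpuser:jdoe", managers))
# ['urn:li:corpuser:asmith', 'urn:li:corpuser:bchen']
```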
The corpUserEditableInfo aspect contains information that users can modify through the DataHub UI, allowing users to enrich their profiles beyond what's provided by the identity provider.
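Key Fields:
- displayName: User-specified display name, which takes precedence over the one from the identity provider
- aboutMe: A short description the user writes about themselves
- teams: Teams the user belongs to
- skills: The user's skills
- slack: The user's Slack handle
- platforms: Data platforms the user works with
- persona: The user's DataHub persona
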
The corpUserStatus aspect tracks the current status of the user account, replacing the deprecated active field in corpUserInfo.
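Key Fields:
- status: The account status, stored as a string value such as ACTIVE
- lastModified: Audit stamp recording when, and by whom, the status was last changed
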
This aspect provides more granular control over user account states compared to the simple boolean active field.
Users can be members of groups through two different aspects:
groupMembership: Represents membership in CorpGroups that may be managed within DataHub or synchronized from external systems. This creates IsMemberOfGroup relationships.
nativeGroupMembership: Represents membership in groups that are native to an external identity provider (like Active Directory groups). This creates IsMemberOfNativeGroup relationships.
Both aspects store arrays of group URNs, allowing users to belong to multiple groups simultaneously.
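If you need the full set of groups a user belongs to, you can merge the two aspects. This sketch makes two assumptions. First, the aspects have been fetched as plain dicts keyed by aspect name. Second, the URN arrays live under groups (groupMembership) and nativeGroups (nativeGroupMembership). all_group_urns is a hypothetical helper that removes duplicates while keeping first-seen order.

```python
from typing import Any, Dict, List


def all_group_urns(aspects: Dict[str, Dict[str, Any]]) -> List[str]:
    """Merge DataHub-managed and native group URNs, dropping duplicates.

    Assumes field names groups (groupMembership) and nativeGroups
    (nativeGroupMembership); missing aspects are treated as empty.
    """
    groups = aspects.get("groupMembership", {}).get("groups", [])
    native = aspects.get("nativeGroupMembership", {}).get("nativeGroups", [])
    # dict.fromkeys keeps the first occurrence of each URN, in order.
    return list(dict.fromkeys([*groups, *native]))
```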
The roleMembership aspect associates users with DataHub roles, which define their permissions and access within the platform.
Key Fields:
- roles: Array of DataHub role URNs assigned to the user

This creates IsMemberOfRole relationships and is fundamental to DataHub's role-based access control (RBAC) system.
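Conceptually, an authorization check asks whether any role in the user's roleMembership grants the required privilege. The sketch below illustrates only that lookup. The role URNs follow the urn:li:dataHubRole:<name> pattern. The role-to-privilege table and the privilege names are hypothetical placeholders, not DataHub's real definitions; DataHub manages those through its own policies.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical role-to-privilege table for illustration only; real DataHub
# privileges are defined and evaluated by DataHub's own policies.
ROLE_PRIVILEGES: Dict[str, FrozenSet[str]] = {
    "urn:li:dataHubRole:Admin": frozenset({"view", "edit", "manage_users"}),
    "urn:li:dataHubRole:Editor": frozenset({"view", "edit"}),
    "urn:li:dataHubRole:Reader": frozenset({"view"}),
}


def has_privilege(role_urns: Iterable[str], privilege: str) -> bool:
    """True if any role from the user's roleMembership grants the privilege."""
    return any(privilege in ROLE_PRIVILEGES.get(role, frozenset()) for role in role_urns)


print(has_privilege(["urn:li:dataHubRole:Editor"], "edit"))  # True
print(has_privilege(["urn:li:dataHubRole:Reader"], "edit"))  # False
```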
The corpUserCredentials aspect stores authentication information for native DataHub users (users created directly in DataHub rather than synchronized from an external identity provider).
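Key Fields:
- salt: Random salt used when hashing the user's password
- hashedPassword: The salted, hashed password
- passwordResetToken: Optional token used to reset the password
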
This aspect is only used for native authentication and is not populated for users authenticated through SSO or LDAP.
The corpUserSettings aspect stores user-specific preferences for the DataHub UI and features.
Key Fields:
- showSimplifiedHomepage: Whether to show a simplified homepage with only datasets, charts, and dashboards
- showThemeV2: Whether to use the V2 theme
- defaultView: The user's default DataHub view
- pageTemplate: The user's default page template
- dismissedAnnouncementUrns: List of announcements the user has dismissed

The origin aspect tracks where the user entity originated from, distinguishing between native DataHub users and those synchronized from external systems.
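Key Fields:
- type: Where the user originated, either NATIVE or EXTERNAL
- externalType: For external users, the kind of external system the user came from (for example, the identity provider)
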
This information is useful for understanding the source of truth for user data and managing synchronization processes.
The slackUserInfo aspect contains detailed information about a user's Slack identity, enabling rich Slack integration features within DataHub.
Key Fields:
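Like other DataHub entities, CorpUsers support common aspects such as:
- globalTags: Tags for categorization and discovery
- status: Soft deletion of the user (a removed flag)
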
These common aspects enable flexible metadata management and integration with DataHub's broader metadata framework.
The simplest way to create a CorpUser is using the high-level Python SDK:
<details>
<summary>Python SDK: Create a basic user</summary>

```python
{{ inline /metadata-ingestion/examples/library/corpuser_create_basic.py show_path_as_comment }}
```

</details>
Users are often members of groups. Here's how to create a user and assign them to groups:
<details>
<summary>Python SDK: Create user with group memberships</summary>

```python
{{ inline /metadata-ingestion/examples/library/corpuser_create_with_groups.py show_path_as_comment }}
```

</details>
To update editable profile information for an existing user:
<details>
<summary>Python SDK: Update user profile</summary>

```python
{{ inline /metadata-ingestion/examples/library/corpuser_update_profile.py show_path_as_comment }}
```

</details>
Users can be tagged for categorization and discovery:
<details>
<summary>Python SDK: Add tags to a user</summary>

```python
{{ inline /metadata-ingestion/examples/library/corpuser_add_tag.py show_path_as_comment }}
```

</details>
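The globalTags aspect stores a list of tag associations, each holding a tag URN. If you write the aspect as a whole (a full upsert rather than a patch), any tags you leave out are removed. A read-modify-write that preserves existing entries is therefore the safer pattern. This sketch shows that merge on a plain dict. add_tag is a hypothetical helper, and the payload shape ({"tags": [{"tag": ...}]}) is assumed from the GlobalTags schema.

```python
import copy
from typing import Any, Dict


def add_tag(global_tags: Dict[str, Any], tag_urn: str) -> Dict[str, Any]:
    """Return a copy of a globalTags-shaped dict with tag_urn present exactly once.

    Existing associations are kept, so upserting the result does not drop tags.
    """
    updated = copy.deepcopy(global_tags)
    tags = updated.setdefault("tags", [])
    if not any(assoc.get("tag") == tag_urn for assoc in tags):
        tags.append({"tag": tag_urn})
    return updated
```

Returning a copy leaves the aspect you read untouched. You can then compare the two and skip the write when nothing changed.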
You can fetch user information using the REST API:
<details>
<summary>REST API: Get user information</summary>

```shell
# Get a user by URN
curl -X GET "http://localhost:8080/entities/urn%3Ali%3Acorpuser%3Ajdoe" \
  -H "Authorization: Bearer <your-access-token>"

# Get specific aspects of a user
curl -X GET "http://localhost:8080/aspects/urn%3Ali%3Acorpuser%3Ajdoe?aspect=corpUserInfo&aspect=corpUserEditableInfo&aspect=groupMembership" \
  -H "Authorization: Bearer <your-access-token>"
```

</details>
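The URN in the curl examples is percent-encoded because it sits in the URL path, where characters such as : must be escaped. Below is a minimal sketch using the Python standard library. It assumes the same localhost:8080 base URL and the /entities and /aspects paths from those examples. entity_url and aspects_url are hypothetical helpers.

```python
from typing import Iterable
from urllib.parse import quote, urlencode


def entity_url(base: str, urn: str) -> str:
    """URL for fetching an entity; safe='' escapes ':' and '/' in the URN."""
    return f"{base.rstrip('/')}/entities/{quote(urn, safe='')}"


def aspects_url(base: str, urn: str, aspects: Iterable[str]) -> str:
    """URL for fetching selected aspects, repeating the aspect parameter per name."""
    query = urlencode([("aspect", name) for name in aspects])
    return f"{base.rstrip('/')}/aspects/{quote(urn, safe='')}?{query}"


print(entity_url("http://localhost:8080", "urn:li:corpuser:jdoe"))
# http://localhost:8080/entities/urn%3Ali%3Acorpuser%3Ajdoe
```

Passing safe='' matters here. By default, quote leaves / unescaped, which would be unsafe for URNs that contain slashes.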
You can search for users using the GraphQL API or search API:
<details>
<summary>GraphQL: Search for users</summary>

```graphql
query searchUsers {
  search(input: { type: CORP_USER, query: "john", start: 0, count: 10 }) {
    start
    count
    total
    searchResults {
      entity {
        ... on CorpUser {
          urn
          username
          properties {
            displayName
            email
            title
            fullName
          }
          editableProperties {
            aboutMe
            teams
            skills
            slack
          }
        }
      }
    }
  }
}
```

</details>
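You can also send the same query programmatically, as a JSON body containing query and variables. The sketch below builds such a request with the standard library but does not send it. It makes these assumptions:

- The endpoint URL (for example http://localhost:8080/api/graphql) matches your deployment.
- Your deployment accepts a bearer-token Authorization header.
- build_search_request and usernames are hypothetical helpers.

The query is trimmed to urn and username for brevity.

```python
import json
import urllib.request

# Trimmed version of the searchUsers query, parameterized with variables.
SEARCH_USERS = """
query searchUsers($text: String!, $count: Int!) {
  search(input: { type: CORP_USER, query: $text, start: 0, count: $count }) {
    total
    searchResults {
      entity {
        ... on CorpUser { urn username }
      }
    }
  }
}
"""


def build_search_request(endpoint: str, token: str, text: str, count: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the query and its variables."""
    body = json.dumps({"query": SEARCH_USERS, "variables": {"text": text, "count": count}})
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def usernames(response: dict) -> list:
    """Pull usernames out of a decoded searchUsers response."""
    return [r["entity"]["username"] for r in response["data"]["search"]["searchResults"]]
```

To send the request, pass it to urllib.request.urlopen and decode the JSON body of the response before calling usernames.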
CorpUsers have several important relationships with other DataHub entities:
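Ownership Relationships:
- Users can be listed as owners of data assets such as datasets, dashboards, and charts through those assets' ownership aspect

Group Relationships:
- Membership through the groupMembership aspect creates IsMemberOfGroup relationships (native groups use nativeGroupMembership and IsMemberOfNativeGroup)

Role Relationships:
- Role assignments through the roleMembership aspect create IsMemberOfRole relationships

Organizational Hierarchy:
- The managerUrn field in corpUserInfo creates ReportsTo relationships

Platform Usage:
- The platforms field in corpUserEditableInfo creates IsUserOf relationships

Persona Assignment:
- Users can be associated with a persona through the persona field

CorpUsers are typically synchronized from external identity providers:

LDAP/Active Directory:
- The username is typically taken from the uid or sAMAccountName attribute

SSO Providers (Okta, Azure AD, etc.):
- Profile details from the provider populate the corpUserInfo aspect
- The origin aspect tracks the SSO provider as the source

Native DataHub Users:
- Credentials are stored in the corpUserCredentials aspect
- origin.type = NATIVE

CorpUsers are central to DataHub's security model:

Authentication:
- Native users authenticate with the credentials in corpUserCredentials; SSO and LDAP users authenticate through their identity provider

Authorization (RBAC):
- Permissions are granted through the roles listed in the roleMembership aspect

Metadata Access:
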
CorpUsers can represent both human users and system/service accounts. The system field in corpUserInfo distinguishes between these:
- Human users (system: false): Actual people who interact with DataHub
- System users (system: true): Service accounts, automated processes, or system-level operations

System users should be marked appropriately to distinguish them in reports, ownership lists, and access reviews.
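When building reports or access reviews, you often want to exclude system accounts. Below is a minimal sketch. It assumes each user record is a dict holding its urn and its corpUserInfo under an info key (a hypothetical shape), and it treats a missing system field as false. human_user_urns is a hypothetical helper.

```python
from typing import Any, Dict, List


def human_user_urns(users: List[Dict[str, Any]]) -> List[str]:
    """Keep users whose corpUserInfo.system is not true (missing counts as false)."""
    return [u["urn"] for u in users if not u.get("info", {}).get("system", False)]


users = [
    {"urn": "urn:li:corpuser:jdoe", "info": {"system": False}},
    {"urn": "urn:li:corpuser:ingestion-bot", "info": {"system": True}},
    {"urn": "urn:li:corpuser:asmith", "info": {}},
]
print(human_user_urns(users))  # ['urn:li:corpuser:jdoe', 'urn:li:corpuser:asmith']
```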
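The active field in corpUserInfo is deprecated. Use the corpUserStatus aspect instead, which provides:
- A status value (such as ACTIVE) that can express more account states than a boolean
- A lastModified audit stamp recording when the status last changed
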
When working with users, prefer checking corpUserStatus.status over corpUserInfo.active.
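A sketch of that preference order in plain Python, assuming aspects are available as dicts keyed by aspect name. is_active is a hypothetical helper. It treats only the status value ACTIVE as active, which is an assumption, so check the values your deployment uses. It falls back to the deprecated corpUserInfo.active only when corpUserStatus is absent, and it returns None when neither is set.

```python
from typing import Any, Dict, Optional


def is_active(aspects: Dict[str, Dict[str, Any]]) -> Optional[bool]:
    """Prefer corpUserStatus.status; fall back to the deprecated corpUserInfo.active."""
    status = aspects.get("corpUserStatus")
    if status and "status" in status:
        # Assumption: only the value ACTIVE counts as an active account.
        return status["status"] == "ACTIVE"
    info = aspects.get("corpUserInfo")
    if info and "active" in info:
        return bool(info["active"])  # deprecated field, used only as a fallback
    return None  # neither aspect says anything about account state
```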
The username (in corpUserKey) is immutable once a user is created. If a user's username changes in the source system:
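- A new CorpUser must be created for the new username, because the username is the entity's key and cannot be edited
- Ownership, group, and role references that point at the old URN must be updated to the new URN
- The old user can then be soft-deleted using the status aspect

Plan your username strategy carefully to avoid frequent username changes.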
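References to a user are stored as URNs, so a username change means rewriting every reference from the old URN to the new one. The sketch below shows that rewrite for owner lists held in a plain dict that maps an asset identifier to its owner URNs (a hypothetical shape). It also returns a status aspect value with removed set to true, for soft-deleting the old user. rename_user_references is a hypothetical helper, and it does not cover emitting the changes to DataHub.

```python
from typing import Any, Dict, List, Tuple


def rename_user_references(
    owners_by_asset: Dict[str, List[str]], old_urn: str, new_urn: str
) -> Tuple[Dict[str, List[str]], Dict[str, Any]]:
    """Point owner lists at new_urn and build a soft-delete status for old_urn.

    owners_by_asset maps an asset identifier to its owner URNs (hypothetical shape).
    Returns the rewritten mapping and the status aspect value for the old user.
    """
    updated: Dict[str, List[str]] = {}
    for asset, owners in owners_by_asset.items():
        swapped = [new_urn if owner == old_urn else owner for owner in owners]
        # Drop a duplicate if new_urn was already listed as an owner.
        updated[asset] = list(dict.fromkeys(swapped))
    old_user_status = {"removed": True}  # status aspect: soft delete
    return updated, old_user_status
```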
Display names can appear in multiple aspects with this precedence:
1. corpUserEditableInfo.displayName (user-specified, highest priority)
2. corpUserInfo.displayName (from identity provider)
3. corpUserInfo.fullName (fallback if no display name is set)

The DataHub UI resolves these in order, showing the most specific value available.
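The precedence rules can be expressed as a small resolver. This is a sketch of the documented order, not the UI's actual code. resolve_display_name is hypothetical, and it treats empty strings like missing values. It falls back to the username when none of the three fields is set, which is an assumption beyond the rules above.

```python
from typing import Any, Dict, Optional


def resolve_display_name(
    username: str,
    editable_info: Optional[Dict[str, Any]] = None,
    info: Optional[Dict[str, Any]] = None,
) -> str:
    """Apply the documented precedence; empty strings count as unset."""
    editable_info = editable_info or {}
    info = info or {}
    for candidate in (
        editable_info.get("displayName"),  # user-specified, highest priority
        info.get("displayName"),           # from the identity provider
        info.get("fullName"),              # fallback
    ):
        if candidate:
            return candidate
    return username  # assumption: last-resort fallback not stated in the rules
```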