docs/features/feature-guides/access-roles.md
import FeatureAvailability from '@site/src/components/FeatureAvailability';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
Note: This feature is under active development and subject to significant change.
DataHub's Access Roles feature lets you ingest external roles from your source systems and associate them with your data assets in DataHub, so that users can see which roles they need in order to access a given asset.
Whereas Data Access Workflows let you create workflows for requesting and reviewing access to tables, dashboards, and other assets, this feature lets users see which roles already have access to a given asset and follow a link to an external platform to request access to the role.
This creates a unified view of access control across your data ecosystem, helping data consumers:
By integrating your external roles into DataHub, teams can reduce access request friction and ensure users have the right level of access to the data they need.
For self-hosted DataHub deployments, the Access Management feature is disabled by default. To enable it,
set the `SHOW_ACCESS_MANAGEMENT` environment variable to `true` for the `datahub-gms` service container.
For example, in your `docker/datahub-gms/docker.env`, you would configure:
```shell
SHOW_ACCESS_MANAGEMENT=true
```
If you're using DataHub Cloud, contact your DataHub Cloud Customer Success representative to enable the Access Management feature. They can enable it for your environment without any configuration changes on your part.
If the feature is configured correctly, a new "Access Management" tab appears on each dataset page.
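Access Management introduces a new entity in DataHub's metadata model called a Role. As the setup steps in this guide show, a Role is described by two aspects: `roleProperties` (name, description, type, and a request URL) and `actors` (the users associated with the role).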
This role must then be associated with datasets through a new aspect called access.
:::note Important Note
Currently, only Dataset entities support Access Management.
:::
:::caution Do not confuse `role` with `datahubrole`
The `role` entity refers to an external role definition that exists in your source systems (like Snowflake or BigQuery), while `datahubrole` manages privileges within DataHub itself (e.g., the admin role can accept proposed metadata changes).
:::
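To make the distinction concrete, the sketch below sorts URNs by entity type. It assumes that external roles use the `urn:li:role:` prefix seen in the examples in this guide and that DataHub's internal roles use a `urn:li:dataHubRole:` prefix. The second prefix and the helper name `classify_role_urn` are assumptions made for illustration, so verify them against the URNs in your own instance.

```python
# Prefix used by the external role examples in this guide.
EXTERNAL_ROLE_PREFIX = "urn:li:role:"
# Assumed prefix for DataHub's internal roles; verify in your own instance.
DATAHUB_ROLE_PREFIX = "urn:li:dataHubRole:"


def classify_role_urn(urn: str) -> str:
    """Return a label describing which kind of role a URN points at."""
    if urn.startswith(EXTERNAL_ROLE_PREFIX):
        return "external role"
    if urn.startswith(DATAHUB_ROLE_PREFIX):
        return "datahub role"
    return "not a role"


if __name__ == "__main__":
    for candidate in (
        "urn:li:role:reader",
        "urn:li:dataHubRole:Admin",
        "urn:li:corpuser:datahubuser",
    ):
        print(candidate, "->", classify_role_urn(candidate))
```

Only URNs in the first group are meaningful inside a dataset's `access` aspect.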
You can set up Access Management through either the CLI or Python API. Here's how to complete the three main steps:
Step 1: create the role by writing its `roleProperties` aspect. Using the CLI:

```shell
datahub put --urn "urn:li:role:reader" --aspect roleProperties -d - <<-EOF
{
  "name": "Snowflake Reader Role",
  "description": "Description for Snowflake Reader Role",
  "type": "READ",
  "requestUrl": "http://custom-url-for-redirection.com"
}
EOF
```
Using the Python API:

```python
import datahub.emitter.mce_builder as builder
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.metadata.schema_classes import RolePropertiesClass, ChangeTypeClass

# Create a role properties aspect
role_properties = RolePropertiesClass(
    name="Snowflake Reader Role",
    description="Description for Snowflake Reader Role",
    type="READ",
    requestUrl="http://custom-url-for-redirection.com",
)

# Create a metadata change proposal
mcp = MetadataChangeProposalWrapper(
    changeType=ChangeTypeClass.UPSERT,
    entityUrn="urn:li:role:reader",
    aspectName="roleProperties",
    aspect=role_properties,
)

# Emit the metadata
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
emitter.emit(mcp)
```
Step 2: assign users to the role by writing its `actors` aspect. Using the CLI:

```shell
datahub put --urn "urn:li:role:reader" --aspect actors -d - <<-EOF
{
  "users": [
    {"user": "urn:li:corpuser:datahubuser"}
  ]
}
EOF
```
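Using the Python API:

```python
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ActorsClass, ActorClass, ChangeTypeClass

# Create an actors aspect
# Note: if your SDK version does not export ActorClass, check
# datahub.metadata.schema_classes for the class that wraps a single role user.
actors = ActorsClass(
    users=[
        ActorClass(user="urn:li:corpuser:datahubuser")
    ]
)

# Create a metadata change proposal
mcp = MetadataChangeProposalWrapper(
    changeType=ChangeTypeClass.UPSERT,
    entityUrn="urn:li:role:reader",
    aspectName="actors",
    aspect=actors,
)

# Emit the metadata (reuse an existing emitter if you already created one)
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
emitter.emit(mcp)
```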
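Step 3: associate roles with a dataset by writing the dataset's `access` aspect. Using the CLI:

```shell
datahub put --urn "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)" --aspect access -d - <<-EOF
{
  "roles": [
    {"urn": "urn:li:role:reader"},
    {"urn": "urn:li:role:writer"}
  ]
}
EOF
```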
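Using the Python API:

```python
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AccessClass,
    ChangeTypeClass,
    RoleAssociationClass,
)

dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"

# Create an access aspect with multiple roles
access_aspect = AccessClass(
    roles=[
        RoleAssociationClass(urn="urn:li:role:reader"),
        RoleAssociationClass(urn="urn:li:role:writer"),
    ]
)

# Create a metadata change proposal
mcp = MetadataChangeProposalWrapper(
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=dataset_urn,
    aspectName="access",
    aspect=access_aspect,
)

# Emit the metadata (reuse an existing emitter if you already created one)
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
emitter.emit(mcp)
```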
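When you manage more than a handful of roles, it can help to generate the three payloads from one description per role. The sketch below uses only the Python standard library and mirrors the JSON shapes from the CLI examples in this guide. `RoleSpec` and the helper functions are hypothetical names introduced here, not part of the DataHub SDK. Writing the `access` aspect with an upsert replaces the roles stored before, so the sketch first groups all roles per dataset and builds one complete `access` payload for each.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RoleSpec:
    """Hypothetical description of one external role (not a DataHub class)."""

    role_id: str
    name: str
    description: str
    role_type: str
    request_url: str
    user_urns: List[str] = field(default_factory=list)
    dataset_urns: List[str] = field(default_factory=list)

    @property
    def urn(self) -> str:
        # Same URN pattern as the examples in this guide.
        return f"urn:li:role:{self.role_id}"


def role_properties_payload(spec: RoleSpec) -> dict:
    """Body for the roleProperties aspect (step 1)."""
    return {
        "name": spec.name,
        "description": spec.description,
        "type": spec.role_type,
        "requestUrl": spec.request_url,
    }


def actors_payload(spec: RoleSpec) -> dict:
    """Body for the actors aspect (step 2)."""
    return {"users": [{"user": user_urn} for user_urn in spec.user_urns]}


def dataset_access_payloads(specs: List[RoleSpec]) -> Dict[str, dict]:
    """Map each dataset URN to one complete access aspect body (step 3)."""
    roles_by_dataset: Dict[str, List[str]] = {}
    for spec in specs:
        for dataset_urn in spec.dataset_urns:
            role_urns = roles_by_dataset.setdefault(dataset_urn, [])
            if spec.urn not in role_urns:  # avoid listing a role twice
                role_urns.append(spec.urn)
    return {
        dataset_urn: {"roles": [{"urn": role_urn} for role_urn in role_urns]}
        for dataset_urn, role_urns in roles_by_dataset.items()
    }


if __name__ == "__main__":
    dataset = "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"
    reader = RoleSpec(
        role_id="reader",
        name="Snowflake Reader Role",
        description="Description for Snowflake Reader Role",
        role_type="READ",
        request_url="http://custom-url-for-redirection.com",
        user_urns=["urn:li:corpuser:datahubuser"],
        dataset_urns=[dataset],
    )
    print(json.dumps(role_properties_payload(reader), indent=2))
    print(json.dumps(actors_payload(reader), indent=2))
    print(json.dumps(dataset_access_payloads([reader]), indent=2))
```

Each generated body can then be sent the same way as in the steps above, for example by piping it to `datahub put ... -d -` or by building the matching aspect class in Python.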
Here are some common scenarios where integrating external roles into DataHub is valuable:
To see Access Management in action, check out our DataHub Townhall demo where we showcase how to use this feature in a real-world scenario.
Future enhancements planned for Access Management include: