The `unity-catalog` module ingests metadata from Databricks into DataHub. It is intended for production ingestion workflows, and its module-specific capabilities are documented below.
You can authenticate with Databricks using OAuth, Azure authentication, a Personal Access Token (legacy), or Databricks unified authentication:
- Option 1: OAuth
- Option 2: Azure Authentication (for Azure Databricks): note the `client_id` (Application ID) and `tenant_id` (Directory ID), and create a `client_secret`
- Option 3: Personal Access Token (PAT) (legacy)
- Option 4: Unified authentication
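As a rough sketch of where the PAT and Azure credentials end up, the Python below builds the `source` section of a recipe as a plain dictionary. It assumes the `workspace_url`, `token`, and `azure_auth` keys named on this page, and assumes `azure_auth` nests `client_id`, `tenant_id`, and `client_secret`; the helper `build_unity_catalog_source` is hypothetical, and OAuth and unified authentication are not covered. Check the source's configuration reference before relying on this exact layout.

```python
from typing import Any, Dict, Optional

# Assumed nested keys of the azure_auth block (names taken from Option 2).
AZURE_AUTH_KEYS = ("client_id", "tenant_id", "client_secret")


def build_unity_catalog_source(
    workspace_url: str,
    token: Optional[str] = None,
    azure_auth: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return a recipe `source` block using exactly one
    credential style (PAT `token` or an `azure_auth` mapping)."""
    config: Dict[str, Any] = {"workspace_url": workspace_url}

    if token is not None and azure_auth is None:
        # Personal Access Token (legacy)
        config["token"] = token
    elif azure_auth is not None and token is None:
        # Azure authentication: require all three values listed in Option 2.
        missing = [key for key in AZURE_AUTH_KEYS if key not in azure_auth]
        if missing:
            raise ValueError(f"azure_auth is missing {missing}")
        config["azure_auth"] = {key: azure_auth[key] for key in AZURE_AUTH_KEYS}
    else:
        # Neither or both credential styles were supplied.
        raise ValueError("set exactly one of token or azure_auth")

    return {"type": "unity-catalog", "config": config}


# Example: PAT authentication with placeholder values.
source = build_unity_catalog_source(
    "https://<your-workspace>.cloud.databricks.com",
    token="<personal-access-token>",
)
print(source)
```

Rejecting a configuration that sets both `token` and `azure_auth` (or neither) keeps the credential choice explicit; because the result is a plain dictionary, it can be serialized to YAML as the `source:` block of a recipe.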
To ingest your workspace's metadata, your service principal must have all of the following:

- `USE CATALOG` privilege on any catalogs you want to ingest
- `USE SCHEMA` privilege on any schemas you want to ingest
- `SELECT` privilege on any tables and views you want to ingest

To ingest the legacy `hive_metastore` catalog (`include_hive_metastore`, enabled by default), your service principal must have all of the following:

- `READ_METADATA` and `USAGE` privilege on the `hive_metastore` catalog
- `READ_METADATA` and `USAGE` privilege on the schemas you want to ingest
- `READ_METADATA` and `USAGE` privilege on the tables and views you want to ingest

To ingest notebooks, your service principal must have `CAN_READ` privileges on the folders containing the notebooks you want to ingest (see the Databricks guide).

To ingest usage statistics (`include_usage_statistics`, enabled by default), your service principal must have one of the following:

- `CAN_MANAGE` permission on any SQL Warehouses you want to ingest (see the Databricks guide)
- When `usage_data_source` is set to `SYSTEM_TABLES` or `AUTO` (the default) with `warehouse_id` configured: `SELECT` privilege on the `system.query.history` table, which improves performance with large query volumes and multi-workspace setups

The privileges required to ingest `profiling` information depend on the profiling method:

- Default SQLAlchemy profiler (`method: sqlalchemy`): `SELECT` privilege on tables and views.
- `method: ge` (requires `pip install 'acryl-datahub[profiling-ge]'`): `SELECT` privilege on all profiled tables.
- `method: analyze` with `call_analyze: true` (enabled by default): ownership or `MODIFY` privilege on any tables you want to profile. Alternatively, set `call_analyze` to `false` so that DataHub does not run `ANALYZE` itself; you will still need `SELECT` privilege on those tables to fetch the results.

In your recipe, replace `workspace_url` and either `token` (for PAT authentication) or `azure_auth` credentials (for Azure authentication) with your information from the previous steps.

If you plan to use DataHub Cloud's Freshness, Volume, or Column Assertions on Databricks, the required Unity Catalog privileges depend on which Source you select in the assertion builder:
| Source Type | Required Privilege(s) | Notes |
|---|---|---|
| Table Statistics | MODIFY (or ownership) on the target table | Runs ANALYZE TABLE ... COMPUTE STATISTICS followed by DESCRIBE TABLE EXTENDED. On Delta tables this is metadata-only (reads file-level stats from the transaction log). Tables only, not Views. Default Volume Source. |
| Information Schema | USE CATALOG + USE SCHEMA on the containing catalog/schema, plus SELECT on system.information_schema.tables | Queries the Unity Catalog information_schema.tables view. Tables only, not Views. |
| Audit Log | SELECT on system.access.audit (requires Unity Catalog system schemas to be enabled) | Reads workspace audit events. Tables only. |
| File Metadata | SELECT on the target table | Reads file-level modification time via Delta transaction log metadata. Delta tables only. |
| Query / Last Modified Column / High Watermark Column / Field Value | SELECT on the target table | Runs SQL queries against the table. Works for Tables and Views. |
| DataHub Operation / DataHub Dataset Profile | (none) | Uses DataHub metadata only, no Databricks access needed. |
In addition, the service principal used for assertion evaluation needs USE CATALOG and USE SCHEMA on the catalog and schema containing the target tables, and must be granted access to a SQL Warehouse (CAN_USE permission) to run statements.
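To turn the table into statements an administrator can review and run, the sketch below emits Unity Catalog `GRANT` statements for one assertion Source and one fully qualified target table (`catalog.schema.table`). The Source keys (`table_statistics`, `query`, and so on) and the function `assertion_grants` are hypothetical; the privileges are transcribed from the table, with `MODIFY` standing in for "MODIFY (or ownership)". It does not produce the SQL Warehouse `CAN_USE` permission, which is set as a warehouse permission rather than through `GRANT`.

```python
from typing import Dict, List, Tuple

# Hypothetical Source keys mapped to (privilege, securable type, object),
# transcribed from the table above. "<target>" means the assertion's table.
SOURCE_PRIVILEGES: Dict[str, List[Tuple[str, str, str]]] = {
    "table_statistics": [("MODIFY", "TABLE", "<target>")],
    "information_schema": [("SELECT", "TABLE", "system.information_schema.tables")],
    "audit_log": [("SELECT", "TABLE", "system.access.audit")],
    "file_metadata": [("SELECT", "TABLE", "<target>")],
    # Query / Last Modified Column / High Watermark Column / Field Value
    "query": [("SELECT", "TABLE", "<target>")],
    # DataHub Operation / DataHub Dataset Profile: no Databricks access.
    "datahub_metadata": [],
}


def assertion_grants(source: str, target: str, principal: str) -> List[str]:
    """Return GRANT statements for one assertion Source on one table."""
    privileges = SOURCE_PRIVILEGES[source]
    if not privileges:
        return []  # evaluated from DataHub metadata only

    catalog, schema, _table = target.split(".")
    who = f"`{principal}`"
    # USE CATALOG / USE SCHEMA on the target's catalog and schema.
    statements = [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {who}",
    ]
    for privilege, securable, obj in privileges:
        name = target if obj == "<target>" else obj
        statements.append(f"GRANT {privilege} ON {securable} {name} TO {who}")
    # CAN_USE on a SQL Warehouse must still be granted separately.
    return statements


for stmt in assertion_grants("table_statistics", "main.sales.orders", "my-sp-app-id"):
    print(stmt)
```

Generating the statements per Source makes it easy to grant only what the chosen assertion type needs, for example `SELECT` alone for query-based assertions instead of `MODIFY`.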