Back to Datahub

Unity Catalog Post

metadata-ingestion/docs/sources/databricks/unity-catalog_post.md

1.5.0.32.0 KB
Original Source

Capabilities

Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.

Advanced

Multiple Databricks Workspaces

If you have multiple databricks workspaces that point to the same Unity Catalog metastore, our suggestion is to use separate recipes for ingesting the workspace-specific Hive Metastore catalog and Unity Catalog metastore's information schema.

To ingest Hive metastore information schema

  • Setup one ingestion recipe per workspace
  • Use platform instance equivalent to workspace name
  • Ingest only hive_metastore catalog in the recipe using config catalogs: ["hive_metastore"]

To ingest Unity Catalog information schema

  • Disable hive metastore catalog ingestion in the recipe using config include_hive_metastore: False
  • Ideally, just ingest from one workspace
  • To ingest from both workspaces (e.g. if each workspace has different permissions and therefore restricted view of the UC metastore):
    • Use same platform instance for all workspaces using same UC metastore
    • Ingest usage from only one workspace (you lose usage from other workspace)
    • Use filters to only ingest each catalog once, but shouldn’t be necessary

Limitations

Module behavior is constrained by source APIs, permissions, and metadata exposed by the platform. Refer to capability notes for unsupported or conditional features.

Troubleshooting

No data lineage captured or missing lineage

Check that you meet the Unity Catalog lineage requirements.

Also check the Unity Catalog limitations to make sure that lineage would be expected to exist in this case.

Lineage extraction is too slow

Unity Catalog REST API requires one call per table (table lineage) and one call per column (column lineage). To improve performance, disable column lineage with include_column_lineage: false.