metadata-ingestion/docs/sources/fabric-onelake/fabric-onelake_pre.md
The fabric-onelake module ingests metadata from Fabric OneLake into DataHub. It is intended for production ingestion workflows; module-specific capabilities are documented below.
:::tip Quick Start
- Ensure the ingesting identity has Workspace.Read.All and workspace access
- Use fabric-onelake_recipe.yml as a template
- Run datahub ingest -c fabric-onelake_recipe.yml
:::
Azure Authentication
Fabric Concepts
Before running ingestion, ensure network connectivity to the source, valid authentication credentials, and read permissions for metadata APIs required by this module.
The connector supports multiple Azure authentication methods:
| Method | Best For | Configuration |
|---|---|---|
| Service Principal | Production environments | authentication_method: service_principal |
| Managed Identity | Azure-hosted deployments (VMs, AKS, App Service) | authentication_method: managed_identity |
| Azure CLI | Local development | authentication_method: cli (run az login first) |
| DefaultAzureCredential | Flexible environments | authentication_method: default |
For service principal setup, see Register an application with Microsoft Entra ID.
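As a minimal sketch of how the table above maps onto configuration, the helper below checks an authentication_method value against the four methods listed before it is placed in a recipe. The function name `build_credential_config` and the idea of passing extra keys are hypothetical illustrations, not part of the connector; only the `authentication_method` key and its four values come from the table.

```python
# Allowed values, copied from the authentication table above.
SUPPORTED_AUTH_METHODS = {"service_principal", "managed_identity", "cli", "default"}


def build_credential_config(authentication_method: str, **extra: str) -> dict:
    """Return a credential mapping, rejecting unknown authentication methods.

    Hypothetical helper: any extra keys are placeholders supplied by the
    caller and are not confirmed field names of the connector.
    """
    if authentication_method not in SUPPORTED_AUTH_METHODS:
        raise ValueError(
            f"Unsupported authentication_method {authentication_method!r}; "
            f"expected one of {sorted(SUPPORTED_AUTH_METHODS)}"
        )
    return {"authentication_method": authentication_method, **extra}
```

For local development this might be called as `build_credential_config("cli")` after running az login, while a production recipe would pass `"service_principal"`.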
The connector requires read-only access to Fabric workspaces and their contents. The authenticated identity (service principal, managed identity, or user) must have:
Workspace-Level Permissions:
API Permissions: The service principal or user must have the following Microsoft Entra API permissions:
- Workspace.Read.All (delegated) - Required to list and read workspace metadata
- Workspace.ReadWrite.All (delegated) - Provides read and write access

Token Audiences: The connector uses two different token audiences depending on the operation:

- Fabric REST API (https://api.fabric.microsoft.com): Uses Power BI API scope (https://analysis.windows.net/powerbi/api/.default) for listing workspaces, lakehouses, warehouses, and basic table metadata
- OneLake Delta Table API (https://onelake.table.fabric.microsoft.com): Uses Storage audience (https://storage.azure.com/.default) for accessing schemas and tables in schemas-enabled lakehouses

The connector automatically handles both token audiences. For schemas-enabled lakehouses, it uses the OneLake Delta Table APIs with Storage audience tokens. For schemas-disabled lakehouses, it uses the standard Fabric REST API.
OneLake Data Access Permissions: For schemas-enabled lakehouses, you may also need OneLake data access permissions:
Note: The connector automatically detects whether a lakehouse has schemas enabled and uses the appropriate API endpoint and token audience. No additional configuration is required.
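The connector does this selection for you, but the rule described above can be illustrated with a short sketch. The hosts and scopes are the ones quoted in this section; the function names `api_host_for_lakehouse` and `scope_for_url` are hypothetical and do not reflect the connector's internal code.

```python
from urllib.parse import urlparse

# Host -> token scope, as listed in the token audience description above.
SCOPE_BY_HOST = {
    "api.fabric.microsoft.com": "https://analysis.windows.net/powerbi/api/.default",
    "onelake.table.fabric.microsoft.com": "https://storage.azure.com/.default",
}


def api_host_for_lakehouse(schemas_enabled: bool) -> str:
    """Schemas-enabled lakehouses use the OneLake Delta Table API host."""
    if schemas_enabled:
        return "onelake.table.fabric.microsoft.com"
    return "api.fabric.microsoft.com"


def scope_for_url(url: str) -> str:
    """Return the token scope that matches the host of the given URL."""
    host = urlparse(url).hostname
    try:
        return SCOPE_BY_HOST[host]
    except KeyError:
        raise ValueError(f"No known token scope for host {host!r}") from None
```

A token requested with the wrong scope is typically rejected by the target API, which is why the two audiences are kept separate per endpoint.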
For detailed information on permissions, see:
For Service Principal:
- Workspace.Read.All

For Managed Identity:
Schema extraction via the SQL Analytics Endpoint requires ODBC drivers to be installed on the system.
First, install the ODBC driver manager (unixODBC) on your system:
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install -y unixodbc unixodbc-dev
RHEL/CentOS/Fedora:
# RHEL/CentOS 7/8
sudo yum install -y unixODBC unixODBC-devel
# Fedora / RHEL 9+
sudo dnf install -y unixODBC unixODBC-devel
macOS:
brew install unixodbc
Install the Microsoft ODBC Driver 18 for SQL Server (required for connecting to Fabric SQL Analytics Endpoint):
Ubuntu 20.04/22.04:
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18
RHEL/CentOS 7/8:
sudo curl -o /etc/yum.repos.d/mssql-release.repo https://packages.microsoft.com/config/rhel/$(rpm -E %{rhel})/mssql-release.repo
sudo ACCEPT_EULA=Y yum install -y msodbcsql18
RHEL 9 / Fedora:
sudo curl -o /etc/yum.repos.d/mssql-release.repo https://packages.microsoft.com/config/rhel/9/mssql-release.repo
sudo ACCEPT_EULA=Y dnf install -y msodbcsql18
macOS:
brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
brew update
HOMEBREW_ACCEPT_EULA=Y brew install msodbcsql18 mssql-tools18
Windows: Download and install the Microsoft ODBC Driver for SQL Server.
After installation, verify that the ODBC driver is available:
odbcinst -q -d
You should see ODBC Driver 18 for SQL Server in the list.
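The same check can be scripted, for example in a container health check. The sketch below assumes that `odbcinst -q -d` prints one bracketed driver name per line (such as `[ODBC Driver 18 for SQL Server]`); the function names are hypothetical.

```python
import re
from typing import List

REQUIRED_DRIVER = "ODBC Driver 18 for SQL Server"


def installed_odbc_drivers(odbcinst_output: str) -> List[str]:
    """Extract driver names from `odbcinst -q -d` output.

    Assumes each driver appears on its own line as [Driver Name].
    """
    return re.findall(r"^\[(.+)\][ \t]*$", odbcinst_output, flags=re.MULTILINE)


def has_required_driver(odbcinst_output: str, name: str = REQUIRED_DRIVER) -> bool:
    """Return True if the named driver is registered with the driver manager."""
    return name in installed_odbc_drivers(odbcinst_output)
```

To run it against the real system, the text could be captured with `subprocess.run(["odbcinst", "-q", "-d"], capture_output=True, text=True).stdout` and passed to `has_required_driver`.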
Your Azure identity must have access to query the SQL Analytics Endpoint (same permissions as accessing the endpoint via SQL tools).
The fabric-onelake extra includes sqlalchemy and pyodbc dependencies. Install them with:
pip install 'acryl-datahub[fabric-onelake]'
Note: If you encounter libodbc.so.2: cannot open shared object file errors, the ODBC driver manager (unixODBC) is missing. Install it as described in the driver manager step above.
View extraction reuses the SQL Analytics Endpoint connection from SQL Analytics Endpoint Setup — the same ODBC driver applies, and views are skipped unless sql_endpoint.enabled is true.
Reading view definitions (needed for view-to-table lineage) requires VIEW DEFINITION permission on the SQL Analytics Endpoint. The workspace Viewer role used for table ingestion is not sufficient — it grants db_datareader only, which causes INFORMATION_SCHEMA.VIEWS.VIEW_DEFINITION to return NULL. There is no workspace-level toggle for this permission; you must choose one of:
- Grant VIEW DEFINITION per Lakehouse/Warehouse (recommended for least privilege, since it keeps the identity at workspace Viewer):

GRANT VIEW DEFINITION ON DATABASE::<lakehouse_or_warehouse_name> TO [<service_principal_name>];

- Assign a higher workspace role (Contributor, Member, or Admin) on the workspaces you ingest.
If neither is acceptable for your environment, set extract_views: false to skip view ingestion. Views will still appear without definitions if you ingest them at Viewer level, but lineage will be missing.
References: Fabric Warehouse roles and permissions, Lakehouse workspace roles.
Usage extraction reads queryinsights.exec_requests_history over the SQL Analytics Endpoint. It reuses the ODBC setup from SQL Analytics Endpoint Setup, so sql_endpoint.enabled must be true — the configuration validator rejects usage.include_usage_statistics=true otherwise.
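The dependency between these options can be sketched as a pre-flight check on a recipe's config mapping. This is an illustration of the rule described here, not the connector's actual validator; the function names are hypothetical, and the fallback values used when a key is absent are assumptions rather than the connector's real defaults.

```python
def validate_feature_flags(config: dict) -> None:
    """Raise if usage statistics are requested without the SQL endpoint.

    Missing keys are treated as False here, which is an assumption of this
    sketch and not necessarily the connector's default.
    """
    sql_enabled = config.get("sql_endpoint", {}).get("enabled", False)
    include_usage = config.get("usage", {}).get("include_usage_statistics", False)
    if include_usage and not sql_enabled:
        raise ValueError(
            "usage.include_usage_statistics=true requires sql_endpoint.enabled=true"
        )


def views_will_be_extracted(config: dict) -> bool:
    """Views are skipped unless sql_endpoint.enabled is true."""
    sql_enabled = config.get("sql_endpoint", {}).get("enabled", False)
    return bool(config.get("extract_views", False) and sql_enabled)
```

Running a check like this before `datahub ingest` surfaces the misconfiguration early instead of at validation time inside the pipeline.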
Required role. Visibility into queryinsights is scoped per workspace. The ingesting identity (service principal, managed identity, or user) needs Contributor or higher on each workspace whose Lakehouses/Warehouses you want usage for. The workspace Viewer role used for table ingestion is not sufficient: per Microsoft's docs, queryinsights requires "contributor or higher permissions" on a Premium-capacity workspace, and full query text — needed for SQL parsing and column-level usage — is only exposed to Admin, Member, and Contributor roles.
Retention and latency. Fabric retains queryinsights for 30 days only — older history cannot be backfilled, so configure usage.start_time accordingly. Newly executed queries can take up to 15 minutes to appear, increasing under concurrency. System queries and queries from outside a user's context are not surfaced.
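One way to pick a sensible usage.start_time is to clamp the requested start to the 30-day retention window. The sketch below also trims the end of the window by 15 minutes to skip queries that may not have appeared yet; that trimming is a design choice of this example, not documented connector behavior, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

RETENTION = timedelta(days=30)       # queryinsights retention stated above
VISIBILITY_LAG = timedelta(minutes=15)  # typical delay before queries appear


def effective_usage_window(
    requested_start: datetime, now: Optional[datetime] = None
) -> Tuple[datetime, datetime]:
    """Clamp a requested start time to the history Fabric still retains.

    Both datetimes must be timezone-aware; mixing naive and aware values
    raises TypeError when they are compared.
    """
    now = now or datetime.now(timezone.utc)
    start = max(requested_start, now - RETENTION)
    end = now - VISIBILITY_LAG
    return start, end
```

A start time older than 30 days is silently moved forward here; a stricter variant could raise an error instead so that the gap in history is visible to the operator.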