metadata-ingestion/docs/sources/fabric-onelake/fabric-onelake_post.md
Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.
```yaml
source:
  type: fabric-onelake
  config:
    # Authentication (using service principal)
    credential:
      authentication_method: service_principal
      client_id: ${AZURE_CLIENT_ID}
      client_secret: ${AZURE_CLIENT_SECRET}
      tenant_id: ${AZURE_TENANT_ID}

    # Optional: Platform instance (use as tenant identifier)
    # platform_instance: "contoso-tenant"

    # Optional: Environment
    # env: PROD

    # Optional: Filter workspaces by name pattern
    # workspace_pattern:
    #   allow:
    #     - "prod-.*"
    #   deny:
    #     - ".*-test"

    # Optional: Filter lakehouses by name pattern
    # lakehouse_pattern:
    #   allow:
    #     - ".*"
    #   deny: []

    # Optional: Filter warehouses by name pattern
    # warehouse_pattern:
    #   allow:
    #     - ".*"
    #   deny: []

    # Optional: Filter tables by name pattern
    # table_pattern:
    #   allow:
    #     - ".*"
    #   deny: []

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```
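The `${AZURE_CLIENT_ID}`-style references are expanded from environment variables when the recipe is loaded. A minimal sketch of that substitution using Python's `os.path.expandvars` (illustrative only; the DataHub CLI performs its own expansion, and the credential value below is a placeholder):

```python
import os

# Hypothetical environment value, standing in for a real Azure credential.
os.environ["AZURE_CLIENT_ID"] = "00000000-0000-0000-0000-000000000000"

recipe_fragment = "client_id: ${AZURE_CLIENT_ID}"

# os.path.expandvars replaces ${VAR} with its environment value.
expanded = os.path.expandvars(recipe_fragment)
print(expanded)  # client_id: 00000000-0000-0000-0000-000000000000
```

Keeping secrets in environment variables (rather than inline in the recipe) lets the same recipe file be committed to version control safely.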
A full example with filtering, feature flags, and stateful ingestion:

```yaml
source:
  type: fabric-onelake
  config:
    credential:
      authentication_method: service_principal
      client_id: ${AZURE_CLIENT_ID}
      client_secret: ${AZURE_CLIENT_SECRET}
      tenant_id: ${AZURE_TENANT_ID}

    # Platform instance (represents tenant)
    platform_instance: "contoso-tenant"

    # Environment
    env: PROD

    # Filtering
    workspace_pattern:
      allow:
        - "prod-.*"
        - "shared-.*"
      deny:
        - ".*-test"
        - ".*-dev"
    lakehouse_pattern:
      allow:
        - ".*"
      deny:
        - ".*-backup"
    warehouse_pattern:
      allow:
        - ".*"
      deny: []
    table_pattern:
      allow:
        - ".*"
      deny:
        - ".*_temp"
        - ".*_backup"

    # Feature flags
    extract_lakehouses: true
    extract_warehouses: true
    extract_schemas: true # Set to false to skip schema containers

    # API timeout (seconds)
    api_timeout: 30

    # Stateful ingestion (optional)
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```
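The allow/deny patterns follow the usual DataHub semantics: a name is kept when it matches at least one allow regex and no deny regex, with patterns anchored at the start of the name. A minimal sketch of that logic (the `is_allowed` helper is illustrative, not the connector's actual code):

```python
import re

def is_allowed(name, allow, deny):
    # Kept only if some allow pattern matches and no deny pattern matches.
    if not any(re.match(p, name) for p in allow):
        return False
    return not any(re.match(p, name) for p in deny)

allow = ["prod-.*", "shared-.*"]
deny = [".*-test", ".*-dev"]

print(is_allowed("prod-sales", allow, deny))       # True
print(is_allowed("prod-sales-test", allow, deny))  # False (deny wins)
print(is_allowed("analytics", allow, deny))        # False (no allow match)
```

Note that deny patterns take precedence over allow patterns, which is why `prod-sales-test` is excluded even though it matches `prod-.*`.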
To authenticate with a managed identity (for example, when running on Azure infrastructure):

```yaml
source:
  type: fabric-onelake
  config:
    credential:
      authentication_method: managed_identity
      # For user-assigned managed identity, specify client_id
      # client_id: ${MANAGED_IDENTITY_CLIENT_ID}
    platform_instance: "contoso-tenant"
    env: PROD

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```
To authenticate with the Azure CLI (useful for local development):

```yaml
source:
  type: fabric-onelake
  config:
    credential:
      authentication_method: cli
      # Run 'az login' first
    platform_instance: "contoso-tenant"
    env: DEV

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```
Schema extraction (column metadata) is supported via the SQL Analytics Endpoint. It extracts column names, data types, nullability, and ordinal positions from tables in both Lakehouses and Warehouses, and requires ODBC drivers to be installed on the system.
First, install the ODBC driver manager (UnixODBC) on your system:

**Ubuntu/Debian:**

```shell
sudo apt-get update
sudo apt-get install -y unixodbc unixodbc-dev
```

**RHEL/CentOS/Fedora:**

```shell
# RHEL/CentOS 7/8
sudo yum install -y unixODBC unixODBC-devel

# Fedora / RHEL 9+
sudo dnf install -y unixODBC unixODBC-devel
```

**macOS:**

```shell
brew install unixodbc
```
Next, install the Microsoft ODBC Driver 18 for SQL Server (required for connecting to the Fabric SQL Analytics Endpoint):

**Ubuntu 20.04/22.04:**

```shell
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18
```

**RHEL/CentOS 7/8:**

```shell
sudo curl -o /etc/yum.repos.d/mssql-release.repo https://packages.microsoft.com/config/rhel/$(rpm -E %{rhel})/mssql-release.repo
sudo ACCEPT_EULA=Y yum install -y msodbcsql18
```

**RHEL 9 / Fedora:**

```shell
sudo curl -o /etc/yum.repos.d/mssql-release.repo https://packages.microsoft.com/config/rhel/9/mssql-release.repo
sudo ACCEPT_EULA=Y dnf install -y msodbcsql18
```

**macOS:**

```shell
brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
brew update
HOMEBREW_ACCEPT_EULA=Y brew install msodbcsql18 mssql-tools18
```

**Windows:** Download and install from Microsoft ODBC Driver for SQL Server.
After installation, verify that the ODBC driver is available:

```shell
odbcinst -q -d
```

You should see `ODBC Driver 18 for SQL Server` in the list.
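If you want to check for the driver programmatically, one option is to parse the `odbcinst -q -d` output, where each installed driver appears on its own line in square brackets. A small illustrative parser (the sample output below is hypothetical):

```python
# Sample text in the format produced by `odbcinst -q -d`.
sample_output = """\
[ODBC Driver 18 for SQL Server]
[PostgreSQL Unicode]
"""

def installed_drivers(output):
    # Each driver name appears on its own line inside square brackets.
    return [line.strip("[]") for line in output.splitlines() if line.startswith("[")]

drivers = installed_drivers(sample_output)
print(drivers)  # ['ODBC Driver 18 for SQL Server', 'PostgreSQL Unicode']
```

In a real check you would capture the command output with `subprocess.run(["odbcinst", "-q", "-d"], capture_output=True, text=True)` and feed `.stdout` to the parser.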
Your Azure identity must have access to query the SQL Analytics Endpoint (same permissions as accessing the endpoint via SQL tools).
The `fabric-onelake` extra includes the `sqlalchemy` and `pyodbc` dependencies. Install them with:

```shell
pip install 'acryl-datahub[fabric-onelake]'
```

Note: If you encounter `libodbc.so.2: cannot open shared object file` errors, ensure the ODBC driver manager is installed (step 1 above).
Schema extraction is enabled by default. You can configure it as follows:

```yaml
source:
  type: fabric-onelake
  config:
    credential:
      authentication_method: service_principal
      client_id: ${AZURE_CLIENT_ID}
      client_secret: ${AZURE_CLIENT_SECRET}
      tenant_id: ${AZURE_TENANT_ID}

    # Schema extraction configuration
    extract_schema:
      enabled: true # Enable schema extraction (default: true)
      method: sql_analytics_endpoint # Currently only this method is supported

    # SQL Analytics Endpoint configuration
    sql_endpoint:
      enabled: true # Enable SQL endpoint connection (default: true)
      # Optional: ODBC connection options
      # odbc_driver: "ODBC Driver 18 for SQL Server" # Default: "ODBC Driver 18 for SQL Server"
      # encrypt: "yes" # Enable encryption (default: "yes")
      # trust_server_certificate: "no" # Trust server certificate (default: "no")
      query_timeout: 30 # Timeout for SQL queries in seconds (default: 30)
```
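Under the hood, an ODBC connection string built from these options looks roughly like the sketch below. This is illustrative only: the endpoint host is a placeholder (the connector retrieves the real one from the Fabric API), and the exact keywords the connector emits may differ.

```python
# Hypothetical endpoint host; the real value comes from the Fabric API.
endpoint = "abc123xyz.datawarehouse.fabric.microsoft.com"
database = "my_lakehouse"

options = {
    "Driver": "{ODBC Driver 18 for SQL Server}",
    "Server": endpoint,
    "Database": database,
    "Encrypt": "yes",
    "TrustServerCertificate": "no",
}
conn_str = ";".join(f"{k}={v}" for k, v in options.items())
print(conn_str)

# The column-metadata lookup is along these lines (parameterized per table):
query = """
SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE, ORDINAL_POSITION
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?
ORDER BY ORDINAL_POSITION
"""
```

With `pyodbc`, the string would be passed to `pyodbc.connect(conn_str)` and the query executed with the schema and table names as parameters.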
The connector retrieves the SQL Analytics Endpoint URL from the Fabric API; the endpoint has the form `<unique-identifier>.datawarehouse.fabric.microsoft.com` and cannot be constructed from `workspace_id` alone. If the endpoint URL cannot be retrieved from the API, schema extraction will fail for that item. The connector queries `INFORMATION_SCHEMA.COLUMNS` to extract column metadata (required for schema extraction).

To disable schema extraction and ingest tables without column metadata:
```yaml
source:
  type: fabric-onelake
  config:
    extract_schema:
      enabled: false
```
The connector automatically handles both schemas-enabled and schemas-disabled lakehouses:

- **Schemas-enabled lakehouses**: tables are discovered via OneLake storage APIs (token scope `https://storage.azure.com/.default`).
- **Schemas-disabled lakehouses**: tables are discovered via the `/tables` endpoint, which lists all tables; this uses Power BI API scope tokens.

**Important:** All tables in DataHub will have a schema in their URN, even for schemas-disabled lakehouses. Tables without an explicit schema are normalized to use the `dbo` schema by default, which ensures a consistent URN structure across all Fabric entities.

The connector automatically detects the lakehouse type and uses the appropriate API endpoint. No configuration changes are needed.
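The `dbo` normalization described above can be sketched as follows (illustrative only, not the connector's actual code; the table names are hypothetical):

```python
DEFAULT_SCHEMA = "dbo"

def normalize_schema(table_name):
    # "sales.orders" -> ("sales", "orders"); bare "orders" -> ("dbo", "orders")
    if "." in table_name:
        schema, _, table = table_name.partition(".")
        return schema, table
    return DEFAULT_SCHEMA, table_name

print(normalize_schema("sales.orders"))  # ('sales', 'orders')
print(normalize_schema("orders"))        # ('dbo', 'orders')
```

Either way, the resulting URN always contains a schema segment, so downstream consumers can rely on a uniform `workspace.lakehouse.schema.table` shape.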
The connector supports stateful ingestion to track ingested entities and remove stale metadata. Enable it with:

```yaml
stateful_ingestion:
  enabled: true
  remove_stale_metadata: true
```
When enabled, the connector will:

- Track the entities emitted in each ingestion run
- Soft-delete metadata for entities that were ingested previously but no longer exist in Fabric
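Conceptually, stale-metadata removal compares the URNs seen in the current run against those recorded from the previous run and soft-deletes the difference. A minimal sketch (the URNs are hypothetical placeholders):

```python
# URNs recorded in the previous run's checkpoint (hypothetical values).
previous = {"urn:li:dataset:a", "urn:li:dataset:b", "urn:li:dataset:c"}
# URNs emitted during the current run.
current = {"urn:li:dataset:a", "urn:li:dataset:c"}

# Entities present before but missing now are marked as removed (soft-deleted).
stale = previous - current
print(sorted(stale))  # ['urn:li:dataset:b']
```

Soft deletion hides the entity in DataHub rather than erasing its history, so it reappears with its prior metadata if the table returns in a later run.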
Module behavior is constrained by source APIs, permissions, and metadata exposed by the platform. Refer to capability notes for unsupported or conditional features.
If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.