metadata-ingestion/docs/sources/fabric-onelake/fabric-onelake_post.md
Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.
source:
  type: fabric-onelake
  config:
    # Authentication (using service principal)
    credential:
      authentication_method: service_principal
      client_id: ${AZURE_CLIENT_ID}
      client_secret: ${AZURE_CLIENT_SECRET}
      tenant_id: ${AZURE_TENANT_ID}
    # Optional: Platform instance (use as tenant identifier)
    # platform_instance: "contoso-tenant"
    # Optional: Environment
    # env: PROD
    # Optional: Filter workspaces by name pattern
    # workspace_pattern:
    #   allow:
    #     - "prod-.*"
    #   deny:
    #     - ".*-test"
    # Optional: Filter lakehouses by name pattern
    # lakehouse_pattern:
    #   allow:
    #     - ".*"
    #   deny: []
    # Optional: Filter warehouses by name pattern
    # warehouse_pattern:
    #   allow:
    #     - ".*"
    #   deny: []
    # Optional: Filter tables by name pattern
    # table_pattern:
    #   allow:
    #     - ".*"
    #   deny: []
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"

source:
  type: fabric-onelake
  config:
    credential:
      authentication_method: service_principal
      client_id: ${AZURE_CLIENT_ID}
      client_secret: ${AZURE_CLIENT_SECRET}
      tenant_id: ${AZURE_TENANT_ID}
    # Platform instance (represents tenant)
    platform_instance: "contoso-tenant"
    # Environment
    env: PROD
    # Filtering
    workspace_pattern:
      allow:
        - "prod-.*"
        - "shared-.*"
      deny:
        - ".*-test"
        - ".*-dev"
    lakehouse_pattern:
      allow:
        - ".*"
      deny:
        - ".*-backup"
    warehouse_pattern:
      allow:
        - ".*"
      deny: []
    table_pattern:
      allow:
        - ".*"
      deny:
        - ".*_temp"
        - ".*_backup"
    view_pattern:
      allow:
        - ".*"
      deny:
        - ".*_internal"
    # Feature flags
    extract_lakehouses: true
    extract_warehouses: true
    extract_schemas: true # Set to false to skip schema containers
    extract_views: true # Requires sql_endpoint.enabled
    # API timeout (seconds)
    api_timeout: 30
    # Stateful ingestion (optional)
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
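
The allow/deny filters in the recipe above can be read as a simple predicate over names. The following is a minimal sketch of that reading, not the connector's implementation: `NamePattern` is a hypothetical stand-in, and it assumes that patterns are matched from the start of the name and that a deny match always overrides an allow match.

```python
import re
from dataclasses import dataclass, field


@dataclass
class NamePattern:
    """Hypothetical stand-in for an allow/deny pattern block in the recipe."""

    allow: list[str] = field(default_factory=lambda: [".*"])
    deny: list[str] = field(default_factory=list)

    def allowed(self, name: str) -> bool:
        # Assumption: patterns are anchored at the start of the name (re.match),
        # and a deny match always wins over an allow match.
        if any(re.match(pattern, name) for pattern in self.deny):
            return False
        return any(re.match(pattern, name) for pattern in self.allow)


# Same patterns as workspace_pattern in the recipe above.
workspace_pattern = NamePattern(
    allow=["prod-.*", "shared-.*"],
    deny=[".*-test", ".*-dev"],
)

for name in ["prod-sales", "prod-sales-test", "shared-finance", "sandbox"]:
    print(name, workspace_pattern.allowed(name))
```

Under these assumptions, `prod-sales` and `shared-finance` pass, `prod-sales-test` is dropped by the deny list, and `sandbox` is dropped because it matches no allow pattern.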

source:
  type: fabric-onelake
  config:
    credential:
      authentication_method: managed_identity
      # For user-assigned managed identity, specify client_id
      # client_id: ${MANAGED_IDENTITY_CLIENT_ID}
    platform_instance: "contoso-tenant"
    env: PROD
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"

source:
  type: fabric-onelake
  config:
    credential:
      authentication_method: cli
      # Run 'az login' first
    platform_instance: "contoso-tenant"
    env: DEV
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"

Schema extraction (column metadata) is supported via the SQL Analytics Endpoint. This feature extracts column names, data types, nullability, and ordinal positions from tables in both Lakehouses and Warehouses.
See SQL Analytics Endpoint Setup under Prerequisites for ODBC driver installation.
Schema extraction is enabled by default. You can configure it as follows:
source:
  type: fabric-onelake
  config:
    credential:
      authentication_method: service_principal
      client_id: ${AZURE_CLIENT_ID}
      client_secret: ${AZURE_CLIENT_SECRET}
      tenant_id: ${AZURE_TENANT_ID}
    # Schema extraction configuration
    extract_schema:
      enabled: true # Enable schema extraction (default: true)
      method: sql_analytics_endpoint # Currently only this method is supported
    # SQL Analytics Endpoint configuration
    sql_endpoint:
      enabled: true # Enable SQL endpoint connection (default: true)
      # Optional: ODBC connection options
      # odbc_driver: "ODBC Driver 18 for SQL Server" # Default: "ODBC Driver 18 for SQL Server"
      # encrypt: "yes" # Enable encryption (default: "yes")
      # trust_server_certificate: "no" # Trust server certificate (default: "no")
      query_timeout: 30 # Timeout for SQL queries in seconds (default: 30)
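
To show how the ODBC options in this block relate to a connection, here is a rough sketch that assembles a connection string from them. It is an illustration only: `build_connection_string` is a hypothetical helper, the server and database values are placeholders, and the string the connector actually builds (including how it injects the Azure token) may differ. `Driver`, `Server`, `Database`, `Encrypt`, and `TrustServerCertificate` are standard keywords of the Microsoft ODBC Driver for SQL Server.

```python
def build_connection_string(
    server: str,
    database: str,
    odbc_driver: str = "ODBC Driver 18 for SQL Server",
    encrypt: str = "yes",
    trust_server_certificate: str = "no",
) -> str:
    """Assemble an ODBC connection string from the sql_endpoint options."""
    parts = {
        "Driver": "{" + odbc_driver + "}",
        "Server": server,
        "Database": database,
        "Encrypt": encrypt,
        "TrustServerCertificate": trust_server_certificate,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Placeholder endpoint and database names, for illustration only.
connection_string = build_connection_string(
    server="example.datawarehouse.fabric.microsoft.com",
    database="sales_lakehouse",
)
print(connection_string)
```

Keeping the defaults in the function signature mirrors the defaults documented in the comments above, so overriding one option leaves the others untouched.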
- The SQL Analytics Endpoint hostname has the form <unique-identifier>.datawarehouse.fabric.microsoft.com and cannot be constructed from workspace_id alone.
- The connector queries INFORMATION_SCHEMA.COLUMNS to extract column metadata (required for schema extraction).
- If the endpoint URL cannot be retrieved from the API, schema extraction will fail for that item.

To disable schema extraction and ingest tables without column metadata:
source:
  type: fabric-onelake
  config:
    extract_schema:
      enabled: false
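
When schema extraction is left enabled, the rows returned from INFORMATION_SCHEMA.COLUMNS have to be grouped into per-table column lists. A minimal sketch of that grouping follows; the `ColumnRow` type and `group_columns` function are hypothetical, and the sample rows are invented for illustration. The field names follow the standard INFORMATION_SCHEMA.COLUMNS columns.

```python
from collections import defaultdict
from typing import NamedTuple


class ColumnRow(NamedTuple):
    table_schema: str
    table_name: str
    column_name: str
    data_type: str
    is_nullable: str  # INFORMATION_SCHEMA reports "YES" or "NO"
    ordinal_position: int


def group_columns(rows):
    """Group column rows by (schema, table), ordered by ordinal position."""
    tables = defaultdict(list)
    for row in sorted(rows, key=lambda r: r.ordinal_position):
        tables[(row.table_schema, row.table_name)].append(
            {
                "name": row.column_name,
                "type": row.data_type,
                "nullable": row.is_nullable == "YES",
            }
        )
    return dict(tables)


# Invented sample rows, deliberately out of order.
rows = [
    ColumnRow("dbo", "orders", "amount", "decimal", "YES", 2),
    ColumnRow("dbo", "orders", "order_id", "int", "NO", 1),
    ColumnRow("sales", "customers", "customer_id", "int", "NO", 1),
]
schemas = group_columns(rows)
print([column["name"] for column in schemas[("dbo", "orders")]])
```

Sorting by ordinal position before grouping keeps each table's columns in their declared order regardless of the order in which rows arrive.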
Views in Lakehouses and Warehouses are ingested as DataHub Dataset entities with the View subtype. Each view dataset includes:
- Schema metadata for the view's columns (queried from INFORMATION_SCHEMA.COLUMNS alongside table columns).
- The view definition (CREATE VIEW SQL), captured from INFORMATION_SCHEMA.VIEWS.

See View Extraction under Prerequisites for required ODBC setup and the VIEW DEFINITION permission needed to read view definitions.
source:
  type: fabric-onelake
  config:
    # View extraction is enabled by default. Set to false to skip views.
    extract_views: true
    # Filter views by name pattern. Format: 'schema.view' or just 'view' for default schema.
    view_pattern:
      allow:
        - ".*"
      deny:
        - ".*_internal"
    # View extraction requires the SQL Analytics Endpoint (enabled by default).
    sql_endpoint:
      enabled: true
- The connector queries INFORMATION_SCHEMA.VIEWS on the SQL Analytics Endpoint to list views and capture their definitions.
- Views are filtered by view_pattern using the schema.view_name form.
- View columns come from the same INFORMATION_SCHEMA.COLUMNS query that powers table schema extraction, so no extra queries are issued per view.

The connector extracts query usage statistics from each Lakehouse and Warehouse by reading the queryinsights.exec_requests_history view on the SQL Analytics Endpoint. Each captured query is parsed by the SQL parsing aggregator and emitted as:

- datasetUsageStatistics aspects: query counts, distinct user counts, top users, top fields, and (when enabled) top SQL queries, bucketed by the configured window.
- operation aspects: per-query operation events (insert, update, delete, etc.) when usage.include_operational_stats is enabled.

See Query Usage Statistics under Prerequisites for the required workspace role (Contributor or higher) and ODBC setup.
source:
  type: fabric-onelake
  config:
    # Usage extraction is enabled by default. Set to false to skip query usage.
    usage:
      include_usage_statistics: true
      # When true, the SQL filter excludes rows where status != 'Succeeded'
      # (canceled / failed queries are skipped at the source).
      skip_failed_queries: true
      # Optional: emit per-query operation aspects in addition to aggregated
      # datasetUsageStatistics. Defaults to true (inherited from BaseUsageConfig).
      include_operational_stats: true
      # Optional: include top SQL queries in the usage payload.
      include_top_n_queries: true
      top_n_queries: 10
      # Optional: window the connector queries from queryinsights. Defaults to
      # the standard BaseUsageConfig "last bucket" window. Fabric retains
      # queryinsights for 30 days.
      bucket_duration: DAY
      # start_time: "2026-04-01T00:00:00Z"
      # end_time: "2026-05-01T00:00:00Z"
    # Usage extraction depends on the SQL Analytics Endpoint.
    extract_schema:
      enabled: true
    sql_endpoint:
      enabled: true
All standard BaseUsageConfig fields (bucket_duration, start_time, end_time, top_n_queries, format_sql_queries, include_top_n_queries, include_operational_stats, user_email_pattern, etc.) are supported under the usage block.
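
To make the bucketing concrete, the sketch below aggregates query events into daily buckets with a query count, a distinct user count, and top users. It is a simplified illustration under stated assumptions, not the aggregator the connector uses: `bucket_daily_usage` is a hypothetical function, and the events are invented `(timestamp, user, table)` tuples.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def bucket_daily_usage(events, top_n=10):
    """Aggregate (timestamp, user, table) events into per-day, per-table stats."""
    buckets = defaultdict(Counter)
    for timestamp, user, table in events:
        # Truncate to the start of the UTC day to get the DAY bucket key.
        day = timestamp.astimezone(timezone.utc).replace(
            hour=0, minute=0, second=0, microsecond=0
        )
        buckets[(day, table)][user] += 1

    return {
        key: {
            "query_count": sum(users.values()),
            "unique_user_count": len(users),
            "top_users": users.most_common(top_n),
        }
        for key, users in buckets.items()
    }


# Invented events for illustration.
events = [
    (datetime(2026, 4, 1, 9, 0, tzinfo=timezone.utc), "ana@contoso.com", "dbo.orders"),
    (datetime(2026, 4, 1, 17, 30, tzinfo=timezone.utc), "ana@contoso.com", "dbo.orders"),
    (datetime(2026, 4, 1, 18, 0, tzinfo=timezone.utc), "li@contoso.com", "dbo.orders"),
    (datetime(2026, 4, 2, 8, 0, tzinfo=timezone.utc), "li@contoso.com", "dbo.orders"),
]
usage = bucket_daily_usage(events)
for (day, table), stats in sorted(usage.items()):
    print(day.date(), table, stats)
```

With bucket_duration set to DAY, each table gets one statistics record per calendar day, which is what the first three events (one day) and the fourth event (the next day) produce here.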
When stateful ingestion is enabled, the usage time window is checkpointed only after a successful run, so a partial or failed run won't silently skip the next window.
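
The checkpointing rule can be pictured as "advance the stored window end only on success". The sketch below illustrates that rule with a hypothetical in-memory `UsageCheckpoint` and `run_usage_window`; the connector's real checkpoint state and storage are not shown here and may be structured differently.

```python
from datetime import datetime, timedelta, timezone


class UsageCheckpoint:
    """Hypothetical in-memory checkpoint holding the last processed window end."""

    def __init__(self, last_end: datetime):
        self.last_end = last_end


def run_usage_window(checkpoint, window, extract):
    """Process one window and advance the checkpoint only if extract succeeds."""
    start = checkpoint.last_end
    end = start + window
    try:
        extract(start, end)
    except Exception:
        # Failed run: leave the checkpoint untouched so the same window
        # is retried on the next run instead of being skipped.
        return False
    checkpoint.last_end = end
    return True


def failing_extract(start, end):
    raise RuntimeError("endpoint unreachable")


checkpoint = UsageCheckpoint(datetime(2026, 4, 1, tzinfo=timezone.utc))

run_usage_window(checkpoint, timedelta(days=1), failing_extract)
print(checkpoint.last_end)  # unchanged after the failed run

run_usage_window(checkpoint, timedelta(days=1), lambda start, end: None)
print(checkpoint.last_end)  # advanced by one day after the successful run
```

Because the failed run leaves the checkpoint where it was, the next run covers the same window again rather than silently moving past it.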
The connector automatically handles both schemas-enabled and schemas-disabled lakehouses:
- Schemas-enabled lakehouses: accessed with Storage scope tokens (https://storage.azure.com/.default).
- Schemas-disabled lakehouses: listed through the /tables endpoint, which lists all tables. Tables without an explicit schema are automatically assigned to the dbo schema in DataHub. This uses Power BI API scope tokens.

Important: All tables in DataHub will have a schema in their URN, even for schemas-disabled lakehouses. Tables without an explicit schema are normalized to the dbo schema by default, which keeps the URN structure consistent across all Fabric entities.
The connector automatically detects the lakehouse type and uses the appropriate API endpoint. No configuration changes are needed.
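
The dbo normalization described above amounts to a default applied when a table has no explicit schema. A minimal sketch, assuming a hypothetical helper `qualified_table_name` that returns a `schema.table` string (the actual URN layout is not reproduced here):

```python
from typing import Optional

DEFAULT_SCHEMA = "dbo"


def qualified_table_name(table: str, schema: Optional[str] = None) -> str:
    """Return 'schema.table', falling back to the dbo schema when none is given."""
    return f"{schema or DEFAULT_SCHEMA}.{table}"


print(qualified_table_name("orders"))           # dbo.orders
print(qualified_table_name("orders", "sales"))  # sales.orders
```

Applying the default in one place means tables from schemas-enabled and schemas-disabled lakehouses end up with names of the same shape.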
The connector supports stateful ingestion to track ingested entities and remove stale metadata. Enable it with:
stateful_ingestion:
  enabled: true
  remove_stale_metadata: true
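
Conceptually, stale metadata removal compares the entities recorded in the previous run with those seen in the current run. The sketch below shows that comparison as a set difference; `find_stale_urns` is a hypothetical function and the identifiers are placeholders rather than real DataHub URNs.

```python
def find_stale_urns(previous_run: set[str], current_run: set[str]) -> set[str]:
    """Entities recorded in the last checkpoint but missing from this run."""
    return previous_run - current_run


# Placeholder identifiers, not real URNs.
previous_run = {"table:dbo.orders", "table:dbo.customers", "view:dbo.daily_sales"}
current_run = {"table:dbo.orders", "view:dbo.daily_sales", "table:dbo.returns"}

print(sorted(find_stale_urns(previous_run, current_run)))
```

Only `table:dbo.customers` is reported as stale here; the newly seen `table:dbo.returns` is simply ingested.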
When enabled, the connector will:
Module behavior is constrained by source APIs, permissions, and metadata exposed by the platform. Refer to capability notes for unsupported or conditional features.
- View extraction requires the SQL Analytics Endpoint. If sql_endpoint.enabled is false, or if the endpoint is unreachable for a given Lakehouse/Warehouse, views in that item will not be ingested.
- queryinsights retains query history for only 30 days. Older usage cannot be backfilled, regardless of the configured usage.start_time.
- Usage extraction reads queryinsights.exec_requests_history over the SQL Analytics Endpoint. If sql_endpoint.enabled is false, the configuration validator will reject usage.include_usage_statistics=true. If the endpoint is unreachable for a specific Lakehouse/Warehouse, usage for that item is skipped without failing the run.

If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.
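
The 30-day retention limit noted above means a configured usage.start_time older than the retention window cannot yield more history. The sketch below computes the earliest start time that can still return data. It is an illustration of the limit only: `effective_start` is a hypothetical helper, and whether the connector clamps the window itself is not stated here.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # queryinsights retention stated above


def effective_start(requested_start: datetime, now: datetime) -> datetime:
    """Earliest start time for which queryinsights can still hold history."""
    oldest_available = now - RETENTION
    return max(requested_start, oldest_available)


now = datetime(2026, 5, 1, tzinfo=timezone.utc)

# A start time far in the past is limited to 30 days before "now".
print(effective_start(datetime(2026, 1, 1, tzinfo=timezone.utc), now))

# A start time inside the retention window is used as requested.
print(effective_start(datetime(2026, 4, 20, tzinfo=timezone.utc), now))
```

A quick check like this before a backfill shows how much of the requested window can actually be covered.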