metadata-ingestion/docs/sources/fabric-data-factory/fabric-data-factory_pre.md
The fabric-data-factory module ingests metadata from Microsoft Fabric Data Factory into DataHub. It extracts workspaces, data pipelines, activities, and execution history, and resolves lineage from Copy activities to external datasets.
:::tip Quick Start

- Use `fabric-data-factory_recipe.yml` as a template
- Run `datahub ingest -c fabric-data-factory_recipe.yml`

:::
`platform_instance_map` for connecting to externally ingested datasets

Azure Authentication
Fabric Data Factory Concepts
The connector requires the Contributor role on each workspace, because pipeline definitions cannot be fetched without it. With only the Reader role, the connector can list workspaces and pipelines but cannot extract pipeline activities, activity run details, or lineage.
If using delegated auth (e.g., Azure CLI), the signed-in user's existing Fabric permissions apply directly. The connector requires the following delegated scopes:
- `Workspace.Read.All` or `Workspace.ReadWrite.All`: for listing workspaces and items
- `Item.ReadWrite.All` or `DataPipeline.ReadWrite.All`: for Get Item Definition, List Item Connections, and Query Activity Runs (`Item.Read.All` is not sufficient for definitions and connections)
- `Item.Read.All` or `DataPipeline.Read.All`: sufficient for List Item Job Instances (execution history)

The Azure CLI token includes the necessary Fabric API scopes by default.
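The scope requirements above can be read as a lookup from API operation to acceptable scopes. The following is a minimal sketch of that lookup, not connector code: the operation keys and the `missing_operations` helper are hypothetical names. Treating the broader `ReadWrite` scopes as also covering List Item Job Instances is an assumption.

```python
# Illustrative only: scope names come from the list above; the operation keys
# and helper name are hypothetical and not part of the connector.
READ_WRITE = {"Item.ReadWrite.All", "DataPipeline.ReadWrite.All"}

REQUIRED_SCOPES = {
    "list_workspaces_and_items": {"Workspace.Read.All", "Workspace.ReadWrite.All"},
    "get_item_definition": READ_WRITE,
    "list_item_connections": READ_WRITE,
    "query_activity_runs": READ_WRITE,
    # Assumption: the broader ReadWrite scopes also cover this read-only call.
    "list_item_job_instances": {"Item.Read.All", "DataPipeline.Read.All"} | READ_WRITE,
}


def missing_operations(granted):
    """Return operations for which none of the acceptable scopes is granted."""
    granted = set(granted)
    return sorted(
        op for op, accepted in REQUIRED_SCOPES.items() if not (accepted & granted)
    )


if __name__ == "__main__":
    # Read-only item access: definitions, connections and activity runs are blocked.
    print(missing_operations(["Workspace.Read.All", "Item.Read.All"]))
```

A check like this could run before ingestion to explain why activities or lineage are missing when only read scopes were granted.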
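Service principals and managed identities do not inherit any permissions by default. You need to add the identity to each workspace with the Contributor role, and a Fabric administrator must enable API access for service principals in the tenant settings.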
:::warning
For service principal and managed identity authentication, a Fabric administrator must enable API access for service principals in the Fabric admin portal. Without this, API calls will fail with 401 errors even if workspace permissions are correctly assigned.
:::
As of mid-2025, Microsoft split the original single tenant setting into two separate settings. Configure them as follows:
:::tip
If you are on an older tenant where the legacy single setting "Service principals can use Fabric APIs" is still visible, enable that instead. It will be automatically migrated to the two new settings.
:::
:::tip
Tenant setting changes can take up to 15 minutes to propagate. If you receive 401 errors immediately after enabling, wait and retry.
:::
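The wait-and-retry advice can be sketched as a small helper. This is an illustration under stated assumptions, not connector behavior: `UnauthorizedError` is a hypothetical stand-in for whatever your HTTP client raises on a 401, and the one-minute interval is an arbitrary choice within the 15-minute propagation window mentioned above.

```python
import time


class UnauthorizedError(Exception):
    """Hypothetical stand-in for an HTTP 401 response from the Fabric API."""


def call_with_propagation_retry(
    call, max_wait_seconds=15 * 60, interval_seconds=60, sleep=time.sleep
):
    """Retry `call` on 401 until it succeeds or the wait budget is used up."""
    waited = 0
    while True:
        try:
            return call()
        except UnauthorizedError:
            if waited >= max_wait_seconds:
                # Still unauthorized after the propagation window: give up.
                raise
            sleep(interval_seconds)
            waited += interval_seconds
```

The `sleep` parameter is injectable so the helper can be exercised without real delays. If the call still fails once the window has elapsed, the cause is more likely a missing tenant setting or workspace role than propagation lag.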
For detailed instructions, see Developer admin settings and Identity support for Fabric REST APIs.
The connector supports four authentication methods via the shared credential config block. All methods use Azure's TokenCredential interface.
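Register an application in Microsoft Entra ID and note the `client_id`, `client_secret`, and `tenant_id`. Then add the application to each workspace as a Contributor, make sure the Fabric tenant settings allow service principal API access, and configure the credential block: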
```yaml
credential:
  authentication_method: service_principal
  client_id: ${AZURE_CLIENT_ID}
  client_secret: ${AZURE_CLIENT_SECRET}
  tenant_id: ${AZURE_TENANT_ID}
```
All three fields are required when using this method.
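A pre-flight check on the credential block can catch a missing field before any API call is made. The sketch below assumes the block has already been parsed into a plain dictionary. `validate_credential` and `REQUIRED_FIELDS` are hypothetical names, and the connector's own validation may behave differently.

```python
# Illustrative only: method and field names come from this page; the helper
# itself is hypothetical and is not the connector's validation logic.
REQUIRED_FIELDS = {
    "service_principal": ("client_id", "client_secret", "tenant_id"),
    "managed_identity": (),
    "cli": (),
    "default": (),
}


def validate_credential(credential):
    """Return the authentication method, or raise ValueError if the block is incomplete."""
    method = credential.get("authentication_method")
    if method not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown authentication_method: {method!r}")
    # Empty strings count as missing, e.g. an unset environment variable.
    missing = [name for name in REQUIRED_FIELDS[method] if not credential.get(name)]
    if missing:
        raise ValueError(f"{method} requires: {', '.join(missing)}")
    return method
```

For example, a `service_principal` block without `client_secret` raises `ValueError` and names the missing field, while `{"authentication_method": "cli"}` passes.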
Use this when running DataHub ingestion on an Azure VM, AKS, App Service, or other Azure compute that supports managed identities. The managed identity must be added as a workspace Contributor in Fabric. A Fabric admin must also enable the tenant settings described in Fabric Admin Settings above — these settings govern API access for both service principals and managed identities, despite the setting name referencing only service principals.
```yaml
# System-assigned managed identity (no additional config needed)
credential:
  authentication_method: managed_identity
```
For user-assigned managed identity, provide the client ID:
```yaml
credential:
  authentication_method: managed_identity
  managed_identity_client_id: "<your-managed-identity-client-id>"
```
Uses the credentials from your local az login session. The signed-in user's existing Fabric permissions apply directly — no additional setup needed beyond workspace access.
```yaml
credential:
  authentication_method: cli
```
Run `az login` before starting ingestion. For remote servers without a browser, use `az login --use-device-code`.
Uses Azure's DefaultAzureCredential chain, which tries multiple credential sources in order: environment variables, workload identity, managed identity, shared token cache, Azure CLI, Azure PowerShell, Azure Developer CLI, and more.
```yaml
credential:
  authentication_method: default
```
You can exclude specific credential sources from the chain to speed up detection or avoid unintended auth in mixed environments:
```yaml
credential:
  authentication_method: default
  exclude_cli_credential: true # Skip Azure CLI (recommended in production)
  exclude_environment_credential: false
  exclude_managed_identity_credential: false
```
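To see how the exclusion flags change the chain, it can help to model the chain as an ordered list. This is a simplified model of the order described above, not the `azure-identity` implementation: `CHAIN` and `active_sources` are hypothetical names, the real chain may contain further sources, and treating an unset flag as `false` is an assumption.

```python
# Simplified model of the credential chain order described in the text.
# Each entry is (source name, exclusion flag from the recipe, or None if this
# page documents no flag for that source).
CHAIN = [
    ("environment variables", "exclude_environment_credential"),
    ("workload identity", None),
    ("managed identity", "exclude_managed_identity_credential"),
    ("shared token cache", None),
    ("Azure CLI", "exclude_cli_credential"),
    ("Azure PowerShell", None),
    ("Azure Developer CLI", None),
]


def active_sources(credential):
    """Return the sources still tried, in order, after applying exclusion flags."""
    # Assumption: a flag that is not set in the recipe is treated as false.
    return [
        name
        for name, flag in CHAIN
        if not (flag is not None and credential.get(flag, False))
    ]
```

With the recipe above, `active_sources` keeps every source except the Azure CLI, so a leftover `az login` session on a production host would not be picked up.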
credential block.

`az login` (or `az login --use-device-code` on remote servers).