Fabric Data Factory Post

Capabilities

Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.

Lineage Extraction

Which Activities Produce Lineage?

The connector extracts dataset-level lineage from these Fabric activity types:

| Activity Type | Lineage Behavior |
| --- | --- |
| Copy | Creates lineage from input dataset(s) to output dataset |
| InvokePipeline | Creates pipeline-to-pipeline lineage to the child pipeline |

Lineage is enabled by default (include_lineage: true).

How Lineage Resolution Works

For lineage to connect properly to datasets ingested from other sources (e.g., Snowflake, BigQuery), the connector resolves Fabric connections to DataHub platforms.

Step 1: Automatic Connection Mapping

The connector automatically maps Fabric connection types to DataHub platforms (e.g., a Snowflake connection maps to the snowflake platform). See FABRIC_CONNECTION_PLATFORM_MAP for the full list of supported mappings. Unsupported connection types fall back to using the connection type string as the platform name.
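
The lookup-with-fallback behavior described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the connector's code: the function name resolve_platform, the one-entry sample map, and the exact spelling of the connection type key are hypothetical. Only the Snowflake-to-snowflake mapping comes from the text above; the real FABRIC_CONNECTION_PLATFORM_MAP contains more entries.

```python
# Hypothetical stand-in for FABRIC_CONNECTION_PLATFORM_MAP. The real map has
# more entries; only the Snowflake mapping is taken from the documentation.
SAMPLE_CONNECTION_PLATFORM_MAP = {
    "Snowflake": "snowflake",
}


def resolve_platform(connection_type: str) -> str:
    """Return a DataHub platform name for a Fabric connection type.

    Mapped types use the table; unmapped types fall back to the
    connection type string itself, unchanged.
    """
    return SAMPLE_CONNECTION_PLATFORM_MAP.get(connection_type, connection_type)


print(resolve_platform("Snowflake"))         # mapped entry
print(resolve_platform("SomeCustomSource"))  # fallback: raw type string
```

The fallback branch is why an unmapped connection type may produce a platform name that does not line up with platforms already present in DataHub.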

Step 2: Platform Instance Mapping (for cross-recipe lineage)

If you ingest the same data sources with other DataHub connectors (e.g., Snowflake, BigQuery), the platform_instance values must match across recipes. Use platform_instance_map to map each Fabric connection name to the platform_instance used in the corresponding recipe:

```yaml
# Fabric Data Factory Recipe
source:
  type: fabric-data-factory
  config:
    credential:
      authentication_method: service_principal
      client_id: ${AZURE_CLIENT_ID}
      client_secret: ${AZURE_CLIENT_SECRET}
      tenant_id: ${AZURE_TENANT_ID}
    platform_instance_map:
      # Key: Your Fabric connection name (exact match required)
      # Value: The platform_instance from your other source recipe
      "snowflake-prod-connection": "prod_warehouse"
      "bigquery-analytics": "analytics_project"
```

```yaml
# Corresponding Snowflake Recipe (platform_instance must match)
source:
  type: snowflake
  config:
    platform_instance: "prod_warehouse" # Must match the value in platform_instance_map
    # ... other config
```

Without matching platform_instance values, the connector emits lineage to separate dataset entities rather than to the datasets you have already ingested.
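
To see why a mismatch splits the graph, it helps to look at the dataset URNs involved. The sketch below assumes DataHub's usual dataset URN layout, in which the platform instance is prepended to the dataset name. The helper names and the table name analytics.public.orders are hypothetical, and the code illustrates the idea rather than the connector's implementation.

```python
from typing import Optional

# Same mapping as in the recipe above.
platform_instance_map = {"snowflake-prod-connection": "prod_warehouse"}


def dataset_urn(platform: str, name: str, env: str = "PROD",
                platform_instance: Optional[str] = None) -> str:
    """Build a dataset URN, prefixing the name with the instance if set."""
    full_name = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{full_name},{env})"


def fabric_side_urn(connection_name: str, table: str) -> str:
    """URN the Fabric recipe would reference for a table (hypothetical helper)."""
    instance = platform_instance_map.get(connection_name)
    return dataset_urn("snowflake", table, platform_instance=instance)


table = "analytics.public.orders"  # hypothetical table name
snowflake_urn = dataset_urn("snowflake", table, platform_instance="prod_warehouse")

# Mapped connection: both recipes point at the same entity.
print(fabric_side_urn("snowflake-prod-connection", table) == snowflake_urn)
# Unmapped connection: the URN differs, so a separate entity is created.
print(fabric_side_urn("some-other-connection", table) == snowflake_urn)
```

Because URNs are compared as exact strings, any difference in the instance prefix is enough to produce a second, disconnected dataset entity.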

Execution History

Pipeline and activity runs are extracted as DataProcessInstance entities by default:

```yaml
source:
  type: fabric-data-factory
  config:
    include_execution_history: true # default
    execution_history_days: 7 # 1-90 days
```

This provides run status, duration, timestamps, invoke type, and activity-level details including error messages and retry attempts.

:::note
The Fabric API returns at most 100 recently completed runs per pipeline. For pipelines that run often, schedule ingestion frequently enough that older runs are ingested before they fall outside this 100-run window.
:::
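
As a rough way to pick an ingestion schedule around the 100-run cap, the arithmetic can be sketched as follows. This is a back-of-the-envelope estimate that assumes runs are evenly spaced and all complete; the function name is hypothetical and nothing here is part of the connector.

```python
MAX_RUNS_RETURNED = 100  # per-pipeline cap stated in the note above


def max_ingestion_interval_hours(runs_per_day: float) -> float:
    """Longest gap between ingestions (in hours) that still sees every run,
    assuming evenly spaced runs."""
    if runs_per_day <= 0:
        raise ValueError("runs_per_day must be positive")
    return MAX_RUNS_RETURNED * 24 / runs_per_day


# A pipeline triggered every 15 minutes completes 96 runs per day.
print(max_ingestion_interval_hours(96))  # 25.0
# An hourly pipeline completes 24 runs per day.
print(max_ingestion_interval_hours(24))  # 100.0
```

Under these assumptions, a pipeline that runs every 15 minutes needs ingestion at least about once a day, whatever value execution_history_days is set to.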

Advanced: Multi-Tenant Setup

When to Use platform_instance

Use the connector's platform_instance config to distinguish separate Fabric tenants when ingesting from multiple environments:

| Scenario | Risk | Solution |
| --- | --- | --- |
| Single tenant | None | Not needed |
| Multiple tenants | High (name collision risk) | Required |

```yaml
# Multi-tenant example
source:
  type: fabric-data-factory
  config:
    platform_instance: "contoso-tenant" # Prevents URN collisions
```

:::warning
Different Fabric tenants could have identically-named workspaces and pipelines. Use platform_instance to prevent entity overwrites.
:::

URN Format

Pipeline URNs follow this format:

```
urn:li:dataFlow:(fabric-data-factory,{workspace_id}.{pipeline_id},{env})
```

With platform_instance:

```
urn:li:dataFlow:(fabric-data-factory,{platform_instance}.{workspace_id}.{pipeline_id},{env})
```
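
The two formats can be expressed as one small helper. This is a minimal sketch for checking URNs by hand, not the connector's code: the function name, the placeholder IDs ws-123 and pl-456, and the default env of PROD are assumptions made for illustration.

```python
from typing import Optional


def pipeline_urn(workspace_id: str, pipeline_id: str, env: str = "PROD",
                 platform_instance: Optional[str] = None) -> str:
    """Build a pipeline (dataFlow) URN in the formats shown above."""
    parts = [workspace_id, pipeline_id]
    if platform_instance:
        # The instance, when set, becomes the first segment of the flow id.
        parts.insert(0, platform_instance)
    flow_id = ".".join(parts)
    return f"urn:li:dataFlow:(fabric-data-factory,{flow_id},{env})"


# Placeholder IDs for illustration only.
print(pipeline_urn("ws-123", "pl-456"))
print(pipeline_urn("ws-123", "pl-456", platform_instance="contoso-tenant"))
```

Comparing the two printed URNs shows that setting platform_instance changes the identity of every pipeline entity, so adding it to an existing recipe creates new entities rather than renaming the old ones.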

Limitations

  • Run history limit: The Fabric API returns at most 100 recently completed runs per pipeline. If execution_history_days covers more runs than this limit, only the most recent 100 are returned. For pipelines that run often, schedule ingestion frequently enough that older runs are ingested before they fall outside this window.
  • No Dataflow Gen2 support: Dataflow Gen2 items (standalone workspace-level items with transformation logic) are not extracted.
  • No CopyJob support: Standalone CopyJob items at the workspace level are not extracted. Only Copy activities embedded within pipelines produce lineage.
  • No trigger/schedule metadata: Pipeline triggers and schedules are not extracted.
  • ExecutePipeline not supported: The ExecutePipeline activity type is marked as legacy in Fabric and is not supported for cross-pipeline lineage.

Lineage

  • Lineage scope: Only Copy and InvokePipeline activities produce dataset or pipeline lineage. Other activity types (Lookup, Wait, ForEach, Script, etc.) are ingested as DataJobs without dataset-level lineage.
  • InvokePipeline Activity operation types: Only the InvokeFabricPipeline operation type is supported for cross-pipeline lineage. Other operation types (InvokeAdfPipeline, InvokeExternalPipeline) are not resolved and will be skipped.
  • Query-based Copy sources: When a Copy activity uses sqlReaderQuery or sqlReaderStoredProcedureName instead of a direct table reference, lineage is not extracted.
  • No column-level lineage: The connector extracts dataset-level lineage only. Column-to-column mappings from Copy activity translator configurations are not extracted.
  • No Notebook/SparkJobDefinition lineage: Notebook and SparkJobDefinition activities are ingested as DataJobs but their lineage is not resolved.
  • Connection resolution: Unmapped connection types fall back to using the connection type string as the platform name, which may not match your existing DataHub platform names. Use platform_instance_map to explicitly map connection names.
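
The lineage rules in this list can be read as a small decision function. The sketch below is an illustration only: the flattened activity dictionary, its keys type, source, and operationType, and the function name are hypothetical and do not reflect the actual Fabric pipeline JSON or the connector's code. Only the activity type names, the operation type names, and the two query properties come from the list above.

```python
from typing import Optional

# Copy source properties that indicate a query-based source.
QUERY_SOURCE_KEYS = ("sqlReaderQuery", "sqlReaderStoredProcedureName")


def lineage_kind(activity: dict) -> Optional[str]:
    """Return "dataset", "pipeline", or None for a simplified activity dict."""
    activity_type = activity.get("type")
    if activity_type == "Copy":
        source = activity.get("source", {})
        if any(key in source for key in QUERY_SOURCE_KEYS):
            return None  # query-based source: lineage is not extracted
        return "dataset"
    if activity_type == "InvokePipeline":
        if activity.get("operationType") == "InvokeFabricPipeline":
            return "pipeline"
        return None  # e.g. InvokeAdfPipeline, InvokeExternalPipeline
    return None  # Lookup, Wait, ForEach, Script, Notebook, ...


print(lineage_kind({"type": "Copy", "source": {"table": "orders"}}))
print(lineage_kind({"type": "Copy", "source": {"sqlReaderQuery": "SELECT 1"}}))
print(lineage_kind({"type": "InvokePipeline", "operationType": "InvokeAdfPipeline"}))
```

Activities for which this returns None are still ingested as DataJobs; they simply carry no dataset or pipeline lineage.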

Troubleshooting

  • 401/403 errors: Ensure the service principal has the correct Fabric API permissions and is added as a workspace member.
  • Empty results: Check that workspace_pattern and pipeline_pattern are not filtering out all items.
  • Missing lineage: Verify that include_lineage: true is set and that Fabric connections are properly configured for the pipelines. Also review the Lineage limitations section for unsupported activity types and scenarios.
  • Stale entities: Enable stateful_ingestion to automatically remove entities that no longer exist in Fabric.