PowerBI Post

metadata-ingestion/docs/sources/powerbi/powerbi_post.md

Capabilities

Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.

User and Ownership Handling

PowerBI Source supports two modes for handling user ownership:

Soft References Mode (Recommended - Default)

When ownership.create_corp_user: false (default), the PowerBI source will:

  • Extract ownership information as URN references only
  • NOT create user entities in DataHub
  • User profiles must come from your identity provider (LDAP/SCIM/Okta)

This is the recommended approach as it prevents PowerBI from overwriting user profiles from your identity provider.

yaml
ownership:
  create_corp_user: false # Default - soft references only

Full User Creation Mode (Opt-in)

When ownership.create_corp_user: true, the PowerBI source will:

  • Create user entities with displayName and email from PowerBI
  • This may overwrite existing user profiles from LDAP/Okta/SCIM

Warning: Only use this if PowerBI is your authoritative source for user information.

yaml
ownership:
  create_corp_user: true # Opt-in - creates user entities

Filtering Owners by Access Rights

You can limit which users become owners using owner_criteria. Only users with at least one of the specified access rights will be assigned as owners:

yaml
ownership:
  owner_criteria:
    - ReadWriteReshareExplore
    - Owner
    - Admin

Valid values depend on the PowerBI access right types for your resources (e.g., dataset, report, dashboard). If owner_criteria is not set or is an empty list, all users with principalType: User qualify as owners.

Lineage

This source extracts table lineage for tables present in PowerBI Datasets. As an example, consider a PowerBI Dataset SALES_REPORT that has a PostgreSQL database configured as its data source.

If the SALES_REPORT dataset has a table SALES_ANALYSIS that is backed by the SALES_ANALYSIS_VIEW view in the PostgreSQL database, then SALES_ANALYSIS_VIEW will appear as an upstream dataset for the SALES_ANALYSIS table.

You can control table lineage ingestion with the extract_lineage configuration parameter; it defaults to true.
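As a minimal sketch, lineage extraction could be disabled like this (other required options omitted):

```yaml
source:
  type: powerbi
  config:
    # extract_lineage defaults to true; set to false to skip table lineage
    extract_lineage: false
```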

PowerBI Source extracts the lineage information by parsing PowerBI M-Query expressions and from dataset data returned by the PowerBI API.

The source will attempt to extract information from ODBC connection strings in M-Query expressions to determine the database type. If the database type matches a supported platform and the source is able to extract enough information to construct a valid Dataset URN, it will extract lineage for that data source.

PowerBI Source will extract lineage for the following PowerBI data sources:

  1. Snowflake
  2. Oracle
  3. PostgreSQL
  4. Microsoft SQL Server
  5. Google BigQuery
  6. Databricks
  7. MySQL
  8. Amazon Redshift
  9. Amazon Athena

Native SQL query parsing is supported for Snowflake, Amazon Redshift, and ODBC data sources.

Athena Federated Query Platform Override

When using Amazon Athena via ODBC that queries federated data sources (e.g., Athena querying MySQL or PostgreSQL via federated connectors), the lineage URNs will default to the Athena platform. Use athena_table_platform_override to point lineage to the actual source platform instead of Athena.

Configuration:

yaml
source:
  type: powerbi
  config:
    # ... other config ...
    dsn_to_platform_name:
      MyAthenaDSN: athena
    athena_table_platform_override:
      # DSN-scoped key (takes precedence)
      "MyAthenaDSN:analytics.users": mysql
      # Global key (fallback for any DSN)
      "reporting.orders": postgres

Key format:

  • DSN-scoped: "DSN_NAME:database.table" - applies only to specific DSN
  • Global: "database.table" - applies to all DSNs

DSN-scoped keys take precedence over global keys, allowing different overrides for the same table name across different Athena data sources.

Note: This override only applies to Athena ODBC connections. For other ODBC platforms, lineage will use the platform determined from the DSN configuration.
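The precedence rule above can be sketched in Python. The overrides dictionary mirrors the example config; the function name and default are illustrative, not the actual implementation:

```python
# Illustrative sketch of DSN-scoped vs. global key precedence for
# athena_table_platform_override (names are hypothetical).
overrides = {
    "MyAthenaDSN:analytics.users": "mysql",  # DSN-scoped key
    "reporting.orders": "postgres",          # global key
}

def resolve_platform(dsn: str, table: str, default: str = "athena") -> str:
    # 1. A DSN-scoped key ("DSN:database.table") wins if present.
    scoped = overrides.get(f"{dsn}:{table}")
    if scoped:
        return scoped
    # 2. Otherwise fall back to the global key ("database.table"),
    #    and finally to the Athena platform itself.
    return overrides.get(table, default)

resolve_platform("MyAthenaDSN", "analytics.users")  # "mysql" (DSN-scoped)
resolve_platform("OtherDSN", "reporting.orders")    # "postgres" (global)
resolve_platform("OtherDSN", "analytics.users")     # "athena" (no override)
```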

For example, consider the M-Query expression shown below. The table OPERATIONS_ANALYTICS.TRANSFORMED_PROD.V_UNIT_TARGETS referenced in the native SQL query will be ingested as an upstream table.

shell
let
  Source = Value.NativeQuery(
    Snowflake.Databases(
      "sdfsd788.ws-east-2.fakecomputing.com",
      "operations_analytics_prod",
      [Role = "OPERATIONS_ANALYTICS_MEMBER"]
    ){[Name = "OPERATIONS_ANALYTICS"]}[Data],
    "select #(lf)UPPER(REPLACE(AGENT_NAME,\'-\',\'\')) AS Agent,#(lf)TIER,#(lf)UPPER(MANAGER),#(lf)TEAM_TYPE,#(lf)DATE_TARGET,#(lf)MONTHID,#(lf)TARGET_TEAM,#(lf)SELLER_EMAIL,#(lf)concat((UPPER(REPLACE(AGENT_NAME,\'-\',\'\'))), MONTHID) as AGENT_KEY,#(lf)UNIT_TARGET AS SME_Quota,#(lf)AMV_TARGET AS Revenue_Quota,#(lf)SERVICE_QUOTA,#(lf)BL_TARGET,#(lf)SOFTWARE_QUOTA as Software_Quota#(lf)#(lf)from OPERATIONS_ANALYTICS.TRANSFORMED_PROD.V_UNIT_TARGETS#(lf)#(lf)where YEAR_TARGET >= 2020#(lf)and TEAM_TYPE = \'foo\'#(lf)and TARGET_TEAM = \'bar\'",
    null,
    [EnableFolding = true]
  ),
  #"Added Conditional Column" = Table.AddColumn(
    Source,
    "Has PS Software Quota?",
    each
      if [TIER] = "Expansion (Medium)" then
        "Yes"
      else if [TIER] = "Acquisition" then
        "Yes"
      else
        "No"
  )
in
  #"Added Conditional Column"

Use the fully qualified table name in the from clause, for example dev.public.category.

M-Query Pattern Supported For Lineage Extraction

Consider an M-Query that combines two PostgreSQL tables. Such an M-Query can be written using either of the patterns below.

Pattern-1

shell
let
Source = PostgreSQL.Database("localhost", "book_store"),
book_date = Source{[Schema="public",Item="book"]}[Data],
issue_history = Source{[Schema="public",Item="issue_history"]}[Data],
combine_result  = Table.Combine({book_date, issue_history})
in
combine_result

Pattern-2

shell
let
Source = PostgreSQL.Database("localhost", "book_store"),
combine_result  = Table.Combine({Source{[Schema="public",Item="book"]}[Data], Source{[Schema="public",Item="issue_history"]}[Data]})
in
combine_result

Pattern-2 is not supported for upstream table lineage extraction because it passes nested item selectors, i.e. {Source{[Schema="public",Item="book"]}[Data], Source{[Schema="public",Item="issue_history"]}[Data]}, directly as the argument to the M-Query table function Table.Combine.

Pattern-1 is supported because it first assigns each table to a variable and then passes those variables to the M-Query table function Table.Combine.

Extract endorsements to tags

By default, extracting endorsement information to tags is disabled. The feature may be useful if your organization uses endorsements to identify content quality.

Note that the default implementation overwrites tags on the ingested entities. If you need to preserve existing tags, consider using a transformer with semantics: PATCH instead of OVERWRITE.
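A sketch of such a recipe, assuming the extract_endorsements_to_tags flag and the simple_add_dataset_tags transformer (verify both names against your DataHub version's docs):

```yaml
source:
  type: powerbi
  config:
    # Opt in to mapping endorsements to tags (disabled by default)
    extract_endorsements_to_tags: true
transformers:
  - type: "simple_add_dataset_tags"
    config:
      semantics: PATCH # merge with existing tags instead of overwriting
      tag_urns:
        - "urn:li:tag:Certified" # illustrative tag
```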

Profiling

Profiling is implemented by querying the DAX query endpoint, so the principal needs permission to query the datasets to be profiled. Usually this means the service principal should have the Contributor role for the workspace being ingested. Profiling uses column-based queries so that wide datasets can be handled without timeouts.

Note that profiling executes a fairly large number of DAX queries; for big datasets this places significant load on the PowerBI system.

The profiling_pattern setting may be used to limit profiling to a certain set of resources in PowerBI. Both allow and deny rules are matched against the pattern workspace_name.dataset_name.table_name for every table in a PowerBI Dataset. With these settings, users may limit profiling at the table, dataset, or workspace level.
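A sketch of how this might look, assuming profiling is toggled under a profiling.enabled flag (verify the exact option names for your version); the patterns are regexes matched against workspace_name.dataset_name.table_name:

```yaml
source:
  type: powerbi
  config:
    profiling:
      enabled: true
    profiling_pattern:
      allow:
        - "Finance Workspace\\.Sales\\.customers" # a single table
        - "Finance Workspace\\.Sales\\..*"        # a whole dataset
      deny:
        - ".*\\.staging_.*"                       # skip staging tables everywhere
```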

Limitations

  • Some metadata and lineage fields are only available through admin APIs or specific tenant settings.
  • Lineage quality depends on available model metadata and supported query/source patterns.
  • Large tenants with many workspaces can require longer extraction windows.

Troubleshooting

  • Authentication failures: verify tenant_id, client_id, and client_secret, and confirm the app has the required Power BI API permissions.
  • Missing workspaces/assets: check service principal access to target workspaces or enable the required admin API mode/settings.
  • Lineage gaps: confirm lineage-related config is enabled and that semantic models expose supported upstream source details.