metadata-ingestion/docs/sources/bigquery/bigquery_post.md
Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.
DataHub's BigQuery connector supports two approaches for extracting lineage and usage statistics:
`use_queries_v2: true`

Recommended for most users - Uses BigQuery's Information Schema for efficient metadata extraction.

- Extracts lineage and usage from the Information Schema (`INFORMATION_SCHEMA.JOBS*` tables)
- Multi-region support via `region_qualifiers`
- Requires the `bigquery.jobs.listAll` permission on target projects

Configuration:
source:
  type: bigquery
  config:
    use_queries_v2: true # Default
    include_queries: true # Enable query entities
    include_query_usage_statistics: true # Query popularity stats
    region_qualifiers: ["region-us", "region-eu"] # Multi-region support
The `pushdown_deny_usernames` and `pushdown_allow_usernames` options push user filtering directly into BigQuery's SQL query, reducing data transfer and improving performance for large query volumes.
When to Use:
Example Configuration:
source:
  type: bigquery
  config:
    use_queries_v2: true # Required for pushdown
    pushdown_deny_usernames:
      - "bot_%"
      - "%@%.iam.gserviceaccount.com" # Exclude service accounts
    pushdown_allow_usernames:
      - "analyst_%@example.com"
      - "data_%@example.com"
Behavior:
- Patterns use SQL `LIKE` syntax; use `usage.user_email_pattern` for client-side filtering
- Wildcards (`%` = any characters, `_` = single character)
- For example, `bot_%` matches [email protected]

Prerequisites:

- `use_queries_v2: true` must be enabled (default)

Note: These configs are independent from `usage.user_email_pattern`. The pushdown filters are applied at the SQL query level for performance, while `user_email_pattern` is applied client-side during processing.
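
Because the two filters are independent, they can appear in the same recipe. The fragment below is a minimal sketch of that combination, not a recommended setup. It assumes `usage.user_email_pattern` takes `allow`/`deny` lists of regular expressions, and the deny pattern is a hypothetical value chosen only for illustration.

```yaml
source:
  type: bigquery
  config:
    use_queries_v2: true # Required for pushdown
    # Server-side filter: SQL LIKE patterns evaluated inside the BigQuery query
    pushdown_deny_usernames:
      - "%@%.iam.gserviceaccount.com"
    # Client-side filter: applied by DataHub during processing
    # (assumed here to be regex-based allow/deny lists; the pattern is hypothetical)
    usage:
      user_email_pattern:
        deny:
          - '.*_test@example\.com'
```

Rows excluded by the pushdown filter never leave BigQuery, so the client-side pattern only sees what the pushdown filter lets through.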
`use_queries_v2: false`

Use when you need specific legacy features - Processes BigQuery audit logs for metadata extraction.

- `upstream_lineage_in_report` debugging feature

Two data source options:
source:
  type: bigquery
  config:
    use_queries_v2: false
    use_exported_bigquery_audit_metadata: false # Default

- Requires `logging.logEntries.list` and `logging.privateLogEntries.list` permissions

source:
  type: bigquery
  config:
    use_queries_v2: false
    use_exported_bigquery_audit_metadata: true
    bigquery_audit_metadata_datasets:
      - "my-project.audit_dataset"
      - "another-project.audit_logs"

- Reads the exported audit log tables (`cloudaudit_googleapis_com_data_access`)
- Only entries of type `type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata` are supported
- The `bigquery_audit_metadata_datasets` parameter accepts datasets in `$PROJECT.$DATASET` format, allowing lineage computation from multiple projects.

:::note Profiling Permission Requirement
When profiling is enabled, the `bigquery.tables.getData` permission is required. This is needed to access detailed table metadata including partition information. See the permissions section above for details.
:::
For performance reasons, only the latest partition is profiled for partitioned tables, and only the latest shard for sharded tables.
To profile a different partition, set the `partition.partition_datetime` property explicitly. Note that this partition config is applied to all partitioned tables.
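
A minimal sketch of pinning the profiled partition follows. The placement of the option under the `profiling` block as `partition_datetime`, and the timestamp value itself, are assumptions made for illustration; confirm the exact key against the connector's config reference before using it.

```yaml
source:
  type: bigquery
  config:
    profiling:
      enabled: true
      # Assumed key placement and example value: profile the partition matching
      # this datetime instead of the latest one.
      # The setting applies to every partitioned table in scope.
      partition_datetime: "2024-01-01 00:00:00"
```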
Module behavior is constrained by source APIs, permissions, and metadata exposed by the platform. Refer to capability notes for unsupported or conditional features.
If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.