docs/quick-ingestion-guides/bigquery/setup.md
To configure ingestion from BigQuery, you'll need a Service Account configured with the proper permission sets and an associated Service Account Key.
This setup guide will walk you through the steps you'll need to take via your Google Cloud Console.
If you do not have an existing Service Account and Service Account Key, please work with your BigQuery Admin to ensure you have the appropriate permissions and/or roles to continue with this setup guide.
When creating and managing new Service Accounts and Service Account Keys, we have found the following permissions and roles to be required:
- `iam.serviceAccounts.create` permission
- `serviceusage.services.enable` permission
- `resourcemanager.projects.setIamPolicy` permission
- `roles/iam.serviceAccountKeyAdmin` IAM role

:::note
Please refer to the BigQuery Permissions and IAM Roles references for details.
:::
To set up a new Service Account, follow this guide.
When you are creating a Service Account, assign the following predefined Roles:
:::note
You can always add or remove roles on Service Accounts later. Please refer to the BigQuery Manage access to projects, folders, and organizations guide for more details.
:::
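
The account creation and role assignment described above can also be scripted with the `gcloud` CLI. The sketch below only builds the command lines and does not run them. The account name `datahub-ingest` and project `my-project` are placeholders, the helper functions are hypothetical, and `roles/bigquery.metadataViewer` is used purely as an example role; substitute the roles your setup requires.

```python
import shlex
from typing import List


def service_account_email(name: str, project_id: str) -> str:
    """Email address Google Cloud gives a user-managed service account."""
    return f"{name}@{project_id}.iam.gserviceaccount.com"


def create_account_cmd(name: str, project_id: str) -> List[str]:
    """Build the gcloud command that creates a service account."""
    return [
        "gcloud", "iam", "service-accounts", "create", name,
        f"--project={project_id}",
    ]


def grant_role_cmd(name: str, project_id: str, role: str) -> List[str]:
    """Build the gcloud command that grants a project-level role."""
    member = f"serviceAccount:{service_account_email(name, project_id)}"
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member={member}",
        f"--role={role}",
    ]


if __name__ == "__main__":
    # Placeholder values; replace with your own account name and project.
    commands = [
        create_account_cmd("datahub-ingest", "my-project"),
        grant_role_cmd("datahub-ingest", "my-project", "roles/bigquery.metadataViewer"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them with your BigQuery Admin before anyone runs them.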
If you plan to use DataHub Cloud's Freshness, Volume, Column, or Custom SQL Assertions, the required permissions depend on which source type and assertion type you select.

Freshness and Volume Assertions:

| Source Type | Required Role(s) | Notes |
|---|---|---|
| Platform API | BigQuery Metadata Viewer (`roles/bigquery.metadataViewer`) | Free API call, no query costs. Subject to BigQuery API rate limits; stagger custom schedules across many assertions to avoid bursts. |
| Information Schema | BigQuery Metadata Viewer + BigQuery Data Viewer | Uses the `__TABLES__` system table. |
| Audit Log | `logging.logEntries.list` + `logging.privateLogEntries.list` | Granted via the Logs View Accessor role. |
| Query / Last Modified Column / High Watermark Column | BigQuery Data Viewer | Runs SQL queries against the table. |
| DataHub Operation / DataHub Dataset Profile | (none) | Uses DataHub metadata only; no BigQuery access needed. |

Column Assertions:

| Source Type | Required Role(s) | Notes |
|---|---|---|
| All Rows Query / Changed Rows Query | BigQuery Data Viewer | Runs SQL queries against the table to evaluate column-level conditions. |
| DataHub Dataset Profile | (none) | Uses column metrics from DataHub profiling runs. Only available for certain metric types. |

Custom SQL Assertions:

| Required Role(s) | Notes |
|---|---|
| BigQuery Job User + BigQuery Data Viewer | Executes user-defined SQL queries. Ensure the Service Account can access all tables referenced in the query. |
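
To work out what a Service Account needs when several assertions are planned, the tables above can be treated as a lookup. The following is a minimal sketch that transcribes those rows into a dictionary and returns the union of required roles and permissions. The names `REQUIRED_ACCESS` and `required_access`, and the `"Custom SQL"` key, are illustrative and not part of any DataHub API.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Transcribed from the tables above; keys are source types, except
# "Custom SQL", which is an illustrative key for Custom SQL Assertions.
REQUIRED_ACCESS: Dict[str, FrozenSet[str]] = {
    "Platform API": frozenset({"BigQuery Metadata Viewer"}),
    "Information Schema": frozenset(
        {"BigQuery Metadata Viewer", "BigQuery Data Viewer"}
    ),
    "Audit Log": frozenset(
        {"logging.logEntries.list", "logging.privateLogEntries.list"}
    ),
    "Query": frozenset({"BigQuery Data Viewer"}),
    "Last Modified Column": frozenset({"BigQuery Data Viewer"}),
    "High Watermark Column": frozenset({"BigQuery Data Viewer"}),
    "All Rows Query": frozenset({"BigQuery Data Viewer"}),
    "Changed Rows Query": frozenset({"BigQuery Data Viewer"}),
    "DataHub Operation": frozenset(),
    "DataHub Dataset Profile": frozenset(),
    "Custom SQL": frozenset({"BigQuery Job User", "BigQuery Data Viewer"}),
}


def required_access(source_types: Iterable[str]) -> Set[str]:
    """Return every role or permission needed for the given source types."""
    needed: Set[str] = set()
    for source_type in source_types:
        if source_type not in REQUIRED_ACCESS:
            raise KeyError(f"unknown source type: {source_type}")
        needed |= REQUIRED_ACCESS[source_type]
    return needed


if __name__ == "__main__":
    print(sorted(required_access(["Platform API", "Custom SQL"])))
```

Source types that rely only on DataHub metadata contribute nothing to the result, which matches the "(none)" rows in the tables.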
To filter projects based on the `project_labels` configuration, first visit cloudresourcemanager.googleapis.com and enable the Cloud Resource Manager API.
Create and download a Service Account Key. We will use this to set up authentication within DataHub.
The key file looks like this:
```json
{
  "type": "service_account",
  "project_id": "project-id-1234567",
  "private_key_id": "d0121d0000882411234e11166c6aaa23ed5d74e0",
  "private_key": "-----BEGIN PRIVATE KEY-----\nMIIyourkey\n-----END PRIVATE KEY-----",
  "client_email": "test@suppproject-id-1234567.iam.gserviceaccount.com",
  "client_id": "113545814931671546333",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/test%suppproject-id-1234567.iam.gserviceaccount.com"
}
```
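
Before pasting the key into DataHub, it can help to confirm that the downloaded file is intact. The sketch below is one possible sanity check using only the Python standard library. The function name `check_service_account_key` is hypothetical, and the set of fields treated as required is an assumption based on the sample above rather than an official schema.

```python
import json
from typing import List

# Assumption: these fields, taken from the sample key above, must be present.
REQUIRED_KEY_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
    "token_uri",
)


def check_service_account_key(raw: str) -> List[str]:
    """Return a list of problems found in key-file JSON text (empty if none)."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["top-level value is not a JSON object"]

    # Flag fields that are absent or empty.
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_KEY_FIELDS
        if not key.get(name)
    ]
    # A Service Account Key should identify itself as such.
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems
```

A typical use is to read the downloaded file with `open(path).read()`, pass the text to `check_service_account_key`, and continue only if the returned list is empty.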
Once you've confirmed all of the above in BigQuery, it's time to move on to configure the actual ingestion source within the DataHub UI.