Fivetran Post

Capabilities

Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.

Database and Schema Name Handling

The Fivetran source uses quoted identifiers for database and schema names to properly handle special characters and case-sensitive names. This follows Snowflake's quoted identifier convention, which is then transpiled to the target database dialect (Snowflake, BigQuery, or Databricks).

Important Notes:

  • Database names are automatically wrapped in double quotes (e.g., use database "my-database")
  • Schema names are automatically wrapped in double quotes (e.g., "my-schema".table_name)
  • This ensures proper handling of database and schema names containing:
    • Hyphens (e.g., my-database)
    • Spaces (e.g., my database)
    • Special characters (e.g., my.database)
    • Case-sensitive names (e.g., MyDatabase)

Migration Impact:

  • If you have database or schema names with special characters, they will now be properly quoted in SQL queries
  • This change ensures consistent behavior across all supported destination platforms
  • No configuration changes are required - the quoting is handled automatically

Case Sensitivity Considerations:

  • Important: In Snowflake, unquoted identifiers are automatically converted to uppercase when stored and resolved (e.g., mydatabase becomes MYDATABASE), while double-quoted identifiers preserve the exact case as entered (e.g., "mydatabase" stays as mydatabase). See Snowflake's identifier documentation for details.
  • Backward Compatibility: The system automatically handles backward compatibility for valid unquoted identifiers (identifiers containing only letters, numbers, and underscores). These identifiers are automatically uppercased before quoting to match Snowflake's behavior for unquoted identifiers. This means:
    • If your database/schema name is a valid unquoted identifier (e.g., fivetran_logs, MY_SCHEMA), it will be automatically uppercased to match existing Snowflake objects created without quotes
    • No configuration changes are required for standard identifiers (letters, numbers, underscores only)
  • Recommended: For best practices and to ensure consistency, maintain the exact case of your database and schema names in your configuration to match what's stored in Snowflake
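
As an illustration, here is a minimal sketch of the relevant part of a recipe, assuming a Snowflake destination and the standard fivetran_log_config layout (field names such as database and log_schema should be checked against the configuration reference for your version):

```yaml
source:
  type: fivetran
  config:
    fivetran_log_config:
      destination_platform: snowflake
      snowflake_destination_config:
        # Hyphenated name: quoted verbatim, so it resolves to "my-fivetran-db" (case preserved)
        database: my-fivetran-db
        # Valid unquoted identifier: automatically uppercased to FIVETRAN_LOGS before quoting,
        # matching Snowflake's behavior for objects created without quotes
        log_schema: fivetran_logs
```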

Snowflake destination Configuration Guide

  1. If your Fivetran platform connector destination is Snowflake, you need to provide the details of a user whose role has the correct privileges in order to fetch metadata.
  2. A Snowflake system admin can follow this guide to create a fivetran_datahub role, grant it the required privileges, and assign it to a user by executing the following Snowflake commands as a user with the ACCOUNTADMIN role or the MANAGE GRANTS privilege.
```sql
create or replace role fivetran_datahub;

// Grant access to a warehouse to run queries to view metadata
grant operate, usage on warehouse "<your-warehouse>" to role fivetran_datahub;

// Grant access to view database and schema in which your log and metadata tables exist
// Note: Database and schema names are automatically quoted, so use quoted identifiers if your names contain special characters
grant usage on DATABASE "<fivetran-log-database>" to role fivetran_datahub;
grant usage on SCHEMA "<fivetran-log-database>"."<fivetran-log-schema>" to role fivetran_datahub;

// Grant access to execute select query on schema in which your log and metadata tables exist
grant select on all tables in SCHEMA "<fivetran-log-database>"."<fivetran-log-schema>" to role fivetran_datahub;

// Grant the fivetran_datahub role to the Snowflake user.
grant role fivetran_datahub to user snowflake_user;
```
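
With the role and grants above in place, the destination section of the recipe can be filled in roughly as follows. This is a sketch: the snowflake_destination_config field names are based on the standard recipe layout and should be verified against the configuration reference.

```yaml
source:
  type: fivetran
  config:
    fivetran_log_config:
      destination_platform: snowflake
      snowflake_destination_config:
        account_id: <your-snowflake-account-id>
        warehouse: <your-warehouse>
        username: <snowflake_user>   # the user the fivetran_datahub role was granted to
        password: <password>
        role: fivetran_datahub       # role created above
        database: <fivetran-log-database>
        log_schema: <fivetran-log-schema>
```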

BigQuery destination Configuration Guide

  1. If your Fivetran platform connector destination is BigQuery, you need to set up a service account as described in the BigQuery docs and grant it the BigQuery Data Viewer and BigQuery Job User IAM roles.
  2. Create and download a service account JSON key file and provide the BigQuery connection credentials in the bigquery destination config, as sketched below.
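
A hedged sketch of how the key file might be supplied in the recipe: the credential sub-fields mirror the downloaded service account JSON key, and the bigquery_destination_config field names (including dataset) are assumptions to verify against the configuration reference.

```yaml
source:
  type: fivetran
  config:
    fivetran_log_config:
      destination_platform: bigquery
      bigquery_destination_config:
        credential:
          # Values copied from the downloaded service account JSON key file
          project_id: <project-id>
          private_key_id: <private-key-id>
          private_key: "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
          client_email: <service-account-email>
          client_id: <client-id>
        dataset: <fivetran-log-dataset>   # assumption: dataset holding the Fivetran log tables
```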

Databricks destination Configuration Guide

  1. Get your Databricks instance's workspace URL
  2. Create a Databricks Service Principal
    1. You can skip this step and use your own account to get things running quickly, but we strongly recommend creating a dedicated service principal for production use.
  3. Generate a Databricks personal access token by following these guides:
    1. Service Principals
    2. Personal Access Tokens
  4. Provision your service account: to ingest your workspace's metadata and lineage, your service principal must have all of the following:
    1. One of: metastore admin role, ownership of, or USE CATALOG privilege on any catalogs you want to ingest
    2. One of: metastore admin role, ownership of, or USE SCHEMA privilege on any schemas you want to ingest
    3. Ownership of or SELECT privilege on any tables and views you want to ingest
    4. Ownership documentation
    5. Privileges documentation
  5. Check the starter recipe below and replace workspace_url and token with your information from the previous steps.
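
As a hedged sketch of the corresponding destination block: workspace_url and token come from the steps above, while databricks_destination_config, catalog, and log_schema are assumed field names to verify against the starter recipe and configuration reference.

```yaml
source:
  type: fivetran
  config:
    fivetran_log_config:
      destination_platform: databricks
      databricks_destination_config:
        workspace_url: https://<your-workspace>.cloud.databricks.com
        token: <personal-access-token>    # or a token generated for the service principal
        catalog: <fivetran-log-catalog>   # assumption: catalog containing the Fivetran log tables
        log_schema: <fivetran-log-schema> # assumption: schema containing the Fivetran log tables
```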

Working with Platform Instances

If your Fivetran setup references multiple instances of source or destination systems, you need to configure platform instances for these systems in the Fivetran recipe to generate correct lineage edges. Refer to the Working with Platform Instances document to learn more.

When configuring the platform instance for a source system, provide the connector id as the key; for a destination system, provide the destination id as the key. When creating the connection details in the Fivetran UI, make a note of the destination Group ID of the service account, as it will need to be used in the destination_to_platform_instance configuration. For example:

(Screenshot: destination Group ID as shown in the Fivetran destination details)

In this case the configuration would be something like:

```yaml
destination_to_platform_instance:
  greyish_positive: # <-- this comes from the BigQuery destination - see screenshot
    database: <BigQuery project ID>
    env: PROD
```

Example - Multiple Postgres Source Connectors, each reading from a different Postgres instance

```yml
# Map of connector source to platform instance
sources_to_platform_instance:
  postgres_connector_id1:
    platform_instance: cloud_postgres_instance
    env: PROD

  postgres_connector_id2:
    platform_instance: local_postgres_instance
    env: DEV
```

Example - Multiple Snowflake Destinations, each writing to a different Snowflake instance

```yml
# Map of destination to platform instance
destination_to_platform_instance:
  snowflake_destination_id1:
    platform_instance: prod_snowflake_instance
    env: PROD

  snowflake_destination_id2:
    platform_instance: dev_snowflake_instance
    env: PROD
```
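
Both mappings can also be combined in a single recipe; a sketch, with the connector and destination ids being placeholders from your own Fivetran setup:

```yaml
source:
  type: fivetran
  config:
    fivetran_log_config:
      # ... destination config as shown in the guides above ...
    sources_to_platform_instance:
      postgres_connector_id1:
        platform_instance: cloud_postgres_instance
        env: PROD
    destination_to_platform_instance:
      snowflake_destination_id1:
        platform_instance: prod_snowflake_instance
        env: PROD
```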

Limitations

Module behavior is constrained by source APIs, permissions, and metadata exposed by the platform. Refer to capability notes for unsupported or conditional features.

Supported Destinations

Works only for:

  • Snowflake destination
  • BigQuery destination
  • Databricks destination

Ingestion Limits

To prevent excessive data ingestion, the following configurable limits apply per connector:

  • Sync History: Maximum of 500 sync runs per connector (default: 500, configurable via fivetran_log_config.max_jobs_per_connector)
  • Table Lineage: Maximum of 120 table lineage entries per connector (default: 120, configurable via fivetran_log_config.max_table_lineage_per_connector)
  • Column Lineage: Maximum of 1000 column lineage entries per connector (default: 1000, configurable via fivetran_log_config.max_column_lineage_per_connector)

When these limits are exceeded, only the most recent entries are ingested. Warnings will be logged during ingestion to notify you when truncation occurs.

These limits act as safety nets to prevent excessive data ingestion. You can increase them cautiously if you need to ingest more historical data or have connectors with many tables/columns. Example configuration:

```yaml
source:
  type: fivetran
  config:
    fivetran_log_config:
      # ... other config ...
      max_jobs_per_connector: 1000 # Increase sync history limit
      max_table_lineage_per_connector: 500 # Increase table lineage limit
      max_column_lineage_per_connector: 5000 # Increase column lineage limit
```

Troubleshooting

If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.