Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.
### Snowflake Shares

If you use Snowflake Shares to share data across different Snowflake accounts, and you have set up DataHub recipes to ingest metadata from all of those accounts, you may end up with multiple similar dataset entities corresponding to virtual versions of the same table in different Snowflake accounts. The DataHub Snowflake connector can automatically link such tables together through Siblings and Lineage relationships, provided you describe the share setup using the `shares` configuration in the recipe.
For example, suppose:

- `account1` (ingested as platform_instance `instance1`) owns a database `db1`. A share `X` is created in `account1` that includes database `db1` along with the schemas and tables inside it.
- `X` is shared with Snowflake account `account2` (ingested as platform_instance `instance2`).
- A database `db1_from_X` is created from inbound share `X` in `account2`. In this case, all tables and views included in share `X` will also be present in `instance2.db1_from_X`.

This relationship can be represented in the `shares` configuration section as:
```yaml
shares:
  X: # name of the share
    database: db1
    platform_instance: instance1
    consumers: # list of all databases created from share X
      - database: db1_from_X
        platform_instance: instance2
```
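If the same share is consumed in additional accounts, the `consumers` list simply grows by one entry per consuming account. A sketch (the third account's database and instance names below are illustrative, not from the original example):

```yaml
shares:
  X:
    database: db1
    platform_instance: instance1
    consumers:
      - database: db1_from_X
        platform_instance: instance2
      - database: db1_from_X_copy # hypothetical database created from X in a third account
        platform_instance: instance3
```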
If share `X` is shared with more Snowflake accounts and a database is created from share `X` in those accounts, then additional entries need to be added to the `consumers` list for share `X`, one per Snowflake account. The same `shares` config can then be copied across the recipes for all accounts.

### Lineage and Usage Extraction

DataHub supports two strategies for extracting lineage and usage information from Snowflake:
#### Query-based extraction (`use_queries_v2: true`)

The default and recommended approach uses an optimized query extraction method that:
- Emits query entities for observed SQL (`include_queries`)
- Collects per-query usage statistics (`include_query_usage_statistics`)
- Filters query history by username directly in Snowflake (`pushdown_deny_usernames`, `pushdown_allow_usernames`)
- Pushes database patterns down into `access_history` queries (`push_down_database_pattern_access_history`)
- Controls how duplicate queries are collapsed (`query_dedup_strategy`)

#### Legacy extraction (`use_queries_v2: false`)

The older approach, which will be deprecated in future versions.
Both strategies access the same Snowflake system tables (`account_usage.query_history`, `account_usage.access_history`), but the new strategy provides significant performance improvements and additional functionality.
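In a recipe, the strategy is selected with a single flag; a minimal sketch:

```yaml
source:
  type: snowflake
  config:
    # Query-based extraction (the default); set to false only if you
    # need the legacy extractors before they are removed
    use_queries_v2: true
```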
### Pattern Pushdown for Metadata Extraction

When ingesting metadata from large Snowflake environments, you can improve performance by pushing down pattern filters directly to Snowflake SQL queries using the `push_down_metadata_patterns` configuration option.
Note: This option applies only to metadata extraction queries (`information_schema.databases`, `schemata`, `tables`, `views`). For pushing down filters on lineage/usage queries (`account_usage.access_history`), use `push_down_database_pattern_access_history` instead. These two options are independent and target completely separate query paths.
```yaml
source:
  type: snowflake
  config:
    # Enable pattern pushdown for improved performance
    push_down_metadata_patterns: true

    # Your existing patterns - MUST follow Snowflake RLIKE syntax
    database_pattern:
      allow:
        - "PROD_.*" # Matches databases starting with PROD_
        - "ANALYTICS.*" # Matches databases starting with ANALYTICS
      deny:
        - ".*_TEMP$" # Excludes databases ending with _TEMP
    table_pattern:
      allow:
        - ".*" # Allow all tables
      deny:
        - ".*_BACKUP$" # Exclude tables ending with _BACKUP
```
By default, Snowflake views are fetched using `SHOW VIEWS`, which does not support SQL-level filtering. When `push_down_metadata_patterns: true`, the `view_pattern` is pushed down only if `fetch_views_from_information_schema: true` is also set. Otherwise, view filtering falls back to Python's `re.match()`, even with pushdown enabled.

This means patterns written in Snowflake RLIKE syntax (e.g., `PROD.*` for prefix matching) will still work correctly with Python filtering, since `PROD.*` is valid in both. However, patterns that rely on RLIKE's full-string matching semantics (e.g., an exact match like `PROD_DB` without `.*`) will behave differently: Python's `re.match()` treats it as a prefix match.

Recommendation: If you enable `push_down_metadata_patterns`, also enable `fetch_views_from_information_schema: true` to ensure consistent behavior for view patterns.
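A minimal recipe fragment combining the two options (the `PROD_DB` pattern below is illustrative, reusing the database name from the examples in this guide):

```yaml
source:
  type: snowflake
  config:
    push_down_metadata_patterns: true
    # Fetch views via information_schema so view_pattern is pushed down too
    fetch_views_from_information_schema: true
    view_pattern:
      allow:
        - "PROD_DB\\..*" # RLIKE syntax: escaped dot, trailing .* for full-string match
```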
When `push_down_metadata_patterns: true`, patterns are executed in Snowflake using the `RLIKE` operator instead of Python's `re.match()`. These have different matching behaviors:
| Behavior | Python re.match() | Snowflake RLIKE | Fix for Snowflake |
|---|---|---|---|
| Prefix match | 'PROD' matches 'PROD_DB' | 'PROD' does NOT match 'PROD_DB' | Use PROD.* |
| Suffix match | '.*_BACKUP' matches 'TABLE_BACKUP_V2' | Does NOT match | Use .*_BACKUP$ |
| Start anchor | '^PROD' matches 'PROD_DB' | Does NOT match | Use PROD.* |
| Alternation | 'PROD|DEV' matches 'PROD_DB' | Does NOT match | Use (PROD|DEV).* |
| Case sensitivity | Case-insensitive by default (DataHub patterns) | Case-sensitive by default | Handled automatically |
Key difference: Python's `re.match()` anchors at the start only (prefix matching), while Snowflake `RLIKE` requires a full-string match.
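This difference can be reproduced locally with Python's `re` module, where `re.fullmatch()` mirrors `RLIKE`'s full-string semantics (a sketch for experimentation; the connector itself uses `re.match()` on the Python side):

```python
import re

name = "PROD_DB"

# Python-side filtering: re.match() anchors at the start only,
# so a bare prefix pattern succeeds.
assert re.match("PROD", name)

# RLIKE-style filtering: re.fullmatch() requires the whole string
# to match, so the bare prefix fails and needs a trailing .*
assert not re.fullmatch("PROD", name)
assert re.fullmatch("PROD.*", name)

# Alternation: 'PROD|DEV' is a prefix match in Python, but under
# full-string semantics it must be grouped and extended with .*
assert re.match("PROD|DEV", name)
assert not re.fullmatch("PROD|DEV", name)
assert re.fullmatch("(PROD|DEV).*", name)
```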
| Intent | Without Pushdown (Python) | With Pushdown (Snowflake RLIKE) |
|---|---|---|
| Starts with PROD | `PROD` | `PROD.*` |
| Ends with _BACKUP | `.*_BACKUP` | `.*_BACKUP$` |
| Contains TEMP | `.*TEMP.*` | `.*TEMP.*` (same) |
| Exact match | `PROD_DB` | `PROD_DB` (same) |
| Match PROD or DEV prefix | `PROD\|DEV` | `(PROD\|DEV).*` |
Before enabling pushdown in production, test your patterns in Snowflake:
```sql
-- Test prefix matching
SELECT 'PROD_DB' RLIKE 'PROD';    -- FALSE (needs .*)
SELECT 'PROD_DB' RLIKE 'PROD.*';  -- TRUE

-- Test suffix matching
SELECT 'TABLE_BACKUP_V2' RLIKE '.*_BACKUP';  -- FALSE (needs $)
SELECT 'TABLE_BACKUP' RLIKE '.*_BACKUP$';    -- TRUE

-- Test FQN with escaped dots
SELECT 'PROD_DB.PUBLIC.TABLE' RLIKE 'PROD_DB\\.PUBLIC\\..*';  -- TRUE
```
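If you prefer to sanity-check patterns offline, `RLIKE`'s case-sensitive full-string matching can be approximated with `re.fullmatch()`; the `rlike` helper below is hypothetical, not part of the connector:

```python
import re

def rlike(value: str, pattern: str) -> bool:
    """Approximate Snowflake RLIKE: case-sensitive, full-string regex match."""
    return re.fullmatch(pattern, value) is not None

# Mirrors the SQL tests above (note the raw string instead of the
# doubled backslashes SQL needs for the escaped dots).
assert not rlike("PROD_DB", "PROD")               # needs .*
assert rlike("PROD_DB", "PROD.*")
assert not rlike("TABLE_BACKUP_V2", ".*_BACKUP")  # string does not end in _BACKUP
assert rlike("TABLE_BACKUP", ".*_BACKUP$")
assert rlike("PROD_DB.PUBLIC.TABLE", r"PROD_DB\.PUBLIC\..*")
```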
### Semantic Views

DataHub supports ingestion of Snowflake Semantic Views: business-defined views that define metrics, dimensions, and relationships for consistent data modeling and AI-powered analytics.
Semantic view ingestion is disabled by default (requires Snowflake Enterprise Edition or above). You can enable it using the following configuration options:
```yaml
# Enable semantic view ingestion (requires Enterprise Edition)
semantic_views:
  enabled: true # Default: false
  column_lineage: true # Default: false - enable column-level lineage

# Filter semantic views using regex patterns
semantic_view_pattern:
  allow:
    - "ANALYTICS_DB.PUBLIC.*"
    - "SALES_DB.*"
  deny:
    - ".*_INTERNAL"
```
Notes on semantic view ingestion:

- Tags on semantic views are extracted when `extract_tags` is enabled.
- The ingestion role needs `REFERENCES` or `SELECT` privileges on semantic views (they are treated as views in Snowflake's permission model).
- Semantic view definitions are retrieved via the `GET_DDL` function.

Module behavior is constrained by source APIs, permissions, and metadata exposed by the platform. Refer to capability notes for unsupported or conditional features.
- Dynamic tables require the `MONITOR` privilege for metadata extraction. Without this privilege, dynamic tables will not be visible to DataHub.
- Semantic views require `REFERENCES` or `SELECT` privileges for metadata extraction. Without these privileges, semantic views will not be visible to DataHub.

If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.