docs/assertions/snowflake/snowflake_dmfs.md
The DataHub Open Assertion Compiler allows you to define your Data Quality assertions in a simple YAML format, and then compile them to be executed by Snowflake Data Metric Functions. Once compiled, you'll be able to register the compiled DMFs in your Snowflake environment, and extract their results them as part of your normal ingestion process for DataHub. Results of Snowflake DMF assertions will be reported as normal Assertion Results, viewable on a historical timeline in the context of the table with which they are associated.
datahub init.Permissions required for registering DMFs
According to the latest Snowflake docs, here are the permissions the service account performing the DMF registration and ingestion must have:
| Privilege | Object | Notes |
|---|---|---|
| USAGE | Database, schema | Database and schema where snowflake DMFs will be created. This is configured in compile command described below. |
| CREATE FUNCTION | Schema | This privilege enables creating new DMF in schema configured in compile command. |
| EXECUTE DATA METRIC FUNCTION | Account | This privilege enables you to control which roles have access to server-agnostic compute resources to call the system DMF. |
| USAGE | Database, schema | These objects are the database and schema that contain the referenced table in the query. |
| OWNERSHIP | Table | This privilege enables you to associate a DMF with a referenced table. |
| USAGE | DMF | This privilege enables calling the DMF in schema configured in compile command. |
and the roles that must be granted:
| Role | Notes |
|---|---|
| SNOWFLAKE.DATA_METRIC_USER | To use System DMFs |
Permissions required for running DMFs (scheduled DMFs run with table owner's role)
Because scheduled DMFs run with the role of the table owner, the table owner must have the following privileges:
| Privilege | Object | Notes |
|---|---|---|
| USAGE | Database, schema | Database and schema where snowflake DMFs will be created. This is configured in compile command described below. |
| USAGE | DMF | This privilege enables calling the DMF in schema configured in compile power. |
| EXECUTE DATA METRIC FUNCTION | Account | This privilege enables you to control which roles have access to server-agnostic compute resources to call the system DMF. |
and the roles that must be granted:
| Role | Notes |
|---|---|
| SNOWFLAKE.DATA_METRIC_USER | To use System DMFs |
Permissions required for querying DMF results
In addition, the service account that will be executing DataHub Ingestion, and querying the DMF results, must have been granted the following system application roles:
| Role | Notes |
|---|---|
| DATA_QUALITY_MONITORING_VIEWER | Query the DMF results table |
To learn more about Snowflake DMFs and the privileges required to provision and query them, see the Snowflake documentation.
Example: Granting Permissions
-- setup permissions to <assertion-registration-role> to create DMFs and associate DMFs with table
grant usage on database "<dmf-database>" to role "<assertion-service-role>"
grant usage on schema "<dmf-database>.<dmf-schema>" to role "<assertion-service-role>"
grant create function on schema "<dmf-database>.<dmf-schema>" to role "<assertion-service-role>"
-- grant ownership + rest of permissions to <assertion-service-role>
grant role "<table-owner-role>" to role "<assertion-service-role>"
-- setup permissions for <table-owner-role> to run DMFs on schedule
grant usage on database "<dmf-database>" to role "<table-owner-role>"
grant usage on schema "<dmf-database>.<dmf-schema>" to role "<table-owner-role>"
grant usage on all functions in "<dmf-database>.<dmf-schema>" to role "<table-owner-role>"
grant usage on future functions in "<dmf-database>.<dmf-schema>" to role "<table-owner-role>"
grant database role SNOWFLAKE.DATA_METRIC_USER to role "<table-owner-role>"
grant execute data metric function on account to role "<table-owner-role>"
-- setup permissions for <datahub-role> to query DMF results
grant application role SNOWFLAKE.DATA_QUALITY_MONITORING_VIEWER to role "<datahub_role>"
The following assertion types are currently supported by the DataHub Snowflake DMF Assertion Compiler:
Note that Schema Assertions are not currently supported.
The process for declaring and running assertions backend by Snowflake DMFs consists of a few steps, which will be outlined in the following sections.
See the section Declaring Assertions in YAML below for examples of how to define assertions in YAML.
Use the DataHub CLI to register your assertions with DataHub, so they become visible in the DataHub UI:
datahub assertions upsert -f examples/library/assertions_configuration.yml
Next, we'll use the assertions compile command to generate the SQL code for the Snowflake DMFs,
which can then be registered in Snowflake.
datahub assertions compile -f examples/library/assertions_configuration.yml -p snowflake -x DMF_SCHEMA=<db>.<schema-where-DMF-should-live>
Two files will be generated as output of running this command:
dmf_definitions.sql: This file contains the SQL code for the DMFs that will be registered in Snowflake.dmf_associations.sql: This file contains the SQL code for associating the DMFs with the target tables in Snowflake.By default in a folder called target. You can use config option -o <output_folder> in compile command to write these compile artifacts in another folder.
Each of these artifacts will be important for the next steps in the process.
dmf_definitions.sql
This file stores the SQL code for the DMFs that will be registered in Snowflake, generated from your YAML assertion definitions during the compile step.
-- Example dmf_definitions.sql
-- Start of Assertion 5c32eef47bd763fece7d21c7cbf6c659
CREATE or REPLACE DATA METRIC FUNCTION
test_db.datahub_dmfs.datahub__5c32eef47bd763fece7d21c7cbf6c659 (ARGT TABLE(col_date DATE))
RETURNS NUMBER
COMMENT = 'Created via DataHub for assertion urn:li:assertion:5c32eef47bd763fece7d21c7cbf6c659 of type volume'
AS
$$
select case when metric <= 1000 then 1 else 0 end from (select count(*) as metric from TEST_DB.PUBLIC.TEST_ASSERTIONS_ALL_TIMES )
$$;
-- End of Assertion 5c32eef47bd763fece7d21c7cbf6c659
....
dmf_associations.sql
This file stores the SQL code for associating with the target table, along with scheduling the generated DMFs to run on at particular times.
-- Example dmf_associations.sql
-- Start of Assertion 5c32eef47bd763fece7d21c7cbf6c659
ALTER TABLE TEST_DB.PUBLIC.TEST_ASSERTIONS_ALL_TIMES SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';
ALTER TABLE TEST_DB.PUBLIC.TEST_ASSERTIONS_ALL_TIMES ADD DATA METRIC FUNCTION test_db.datahub_dmfs.datahub__5c32eef47bd763fece7d21c7cbf6c659 ON (col_date);
-- End of Assertion 5c32eef47bd763fece7d21c7cbf6c659
....
Next, you'll need to run the generated SQL from the files output in Step 3 in Snowflake.
You can achieve this either by running the SQL files directly in the Snowflake UI, or by using the SnowSQL CLI tool:
snowsql -f dmf_definitions.sql
snowsql -f dmf_associations.sql
::: NOTE Scheduling Data Metric Function on table incurs Serverless Credit Usage in Snowflake. Refer Billing and Pricing for more details. Please ensure you DROP Data Metric Function created via dmf_associations.sql if the assertion is no longer in use. :::
Once you've registered the DMFs, they will be automatically executed, either when the target table is updated or on a fixed schedule.
To report the results of the generated Data Quality assertions back into DataHub, you'll need to run the DataHub ingestion process with a special configuration
flag: include_assertion_results: true:
# Your DataHub Snowflake Recipe
source:
type: snowflake
config:
# ...
include_assertion_results: True
# ...
During ingestion we will query for the latest DMF results stored in Snowflake, convert them into DataHub Assertion Results, and report the results back into DataHub during your ingestion process either via CLI or the UI visible as normal assertions.
datahub ingest -c snowflake.yml
In addition to DataHub-created DMFs, you can also ingest results from your own custom Snowflake Data Metric Functions. "External" here means DMFs that were created directly in Snowflake without using DataHub's assertion compiler - they exist outside of DataHub's management.
You might want to ingest external DMFs if:
To ingest external DMFs, add the include_externally_managed_dmfs flag to your Snowflake recipe:
source:
type: snowflake
config:
# ... connection config ...
# Enable assertion results ingestion (required)
include_assertion_results: true
# Enable external DMF ingestion (new)
include_externally_managed_dmfs: true
# Time window for assertion results
start_time: "-7 days"
Both flags must be enabled for external DMF ingestion to work.
External DMFs must return 1 for SUCCESS and 0 for FAILURE.
DataHub interprets the VALUE column from Snowflake's DATA_QUALITY_MONITORING_RESULTS table as:
VALUE = 1 → Assertion PASSEDVALUE = 0 → Assertion FAILEDThis is because DataHub cannot interpret arbitrary return values (e.g., "100 null rows" - is that good or bad?). You must build the pass/fail logic into your DMF.
::: warning What if my DMF returns other values? If your DMF returns values other than 0 or 1, DataHub will mark the assertion result as ERROR:
VALUE = 1 → PASSEDVALUE = 0 → FAILEDVALUE != 0 and VALUE != 1 (e.g., 5, 100, -1) → ERRORThe ERROR state indicates that the DMF is not configured correctly for DataHub ingestion. You can identify these cases by:
DMF 'my_dmf' returned invalid value 100. Expected 1 (pass) or 0 (fail). Marking as ERROR.WRONG - Returns raw count (DataHub can't interpret this):
CREATE DATA METRIC FUNCTION my_null_check(ARGT TABLE(col VARCHAR))
RETURNS NUMBER AS
$$
SELECT COUNT(*) FROM ARGT WHERE col IS NULL
$$;
-- Returns: 0, 5, 100, etc. - DataHub can't determine pass/fail!
CORRECT - Returns 1 (pass) or 0 (fail):
CREATE DATA METRIC FUNCTION my_null_check(ARGT TABLE(col VARCHAR))
RETURNS NUMBER AS
$$
SELECT CASE WHEN COUNT(*) = 0 THEN 1 ELSE 0 END
FROM ARGT WHERE col IS NULL
$$;
-- Returns: 1 if no nulls (pass), 0 if has nulls (fail)
CORRECT - With threshold:
CREATE DATA METRIC FUNCTION my_null_check_threshold(ARGT TABLE(col VARCHAR))
RETURNS NUMBER AS
$$
SELECT CASE WHEN COUNT(*) <= 10 THEN 1 ELSE 0 END
FROM ARGT WHERE col IS NULL
$$;
-- Returns: 1 if ≤10 nulls (pass), 0 if >10 nulls (fail)
| Aspect | DataHub-Created DMFs | External DMFs |
|---|---|---|
| Naming | Prefixed with datahub__ | Any name |
| Definition | Created via datahub assertions compile | Created manually in Snowflake |
| Assertion Type | Based on YAML definition (Freshness, Volume, etc.) | CUSTOM |
| Source | NATIVE (defined in DataHub) | EXTERNAL |
| URN Generation | Extracted from DMF name (datahub__<guid>) | Generated from Snowflake's REFERENCE_ID |
External DMFs appear in DataHub with:
snowflake_dmf_name: The DMF function namesnowflake_reference_id: Snowflake's unique identifier for the DMF-table bindingsnowflake_dmf_columns: Comma-separated list of columns the DMF operates onYou can view external DMF assertions in the Quality tab of the associated dataset in the DataHub UI. They will show pass/fail history alongside any DataHub-created assertions.
Coming soon!