Snowflake Agent Setup for DataHub

datahub-agent-context/src/datahub_agent_context/snowflake/README.md


This module provides tools to generate and deploy Snowflake UDFs (User-Defined Functions) that enable Snowflake Cortex Intelligence to query DataHub metadata.

Overview

The module generates SQL scripts to create:

  1. Network rules and external access integrations for DataHub API calls
  2. Python UDFs for searching and retrieving metadata from DataHub
  3. Stored procedures for dynamic SQL execution
  4. Cortex Agent configuration with DataHub tools

Implementation

UDFs use the datahub-agent-context package wrapper methods that abstract away GraphQL/REST API complexity.

Usage

Basic Usage

```bash
python -m datahub.ai.snowflake.snowflake \
  --sf-account YOUR_ACCOUNT \
  --sf-user YOUR_USER \
  --sf-role YOUR_ROLE \
  --sf-warehouse YOUR_WAREHOUSE \
  --sf-database YOUR_DATABASE \
  --sf-schema YOUR_SCHEMA \
  --datahub-url https://your-datahub.acryl.io \
  --datahub-token YOUR_TOKEN \
  --enable-mutations
```

Direct Execution

Add --execute and --sf-password to automatically run the generated scripts:

```bash
python -m datahub.ai.snowflake.snowflake \
  --sf-account YOUR_ACCOUNT \
  --sf-user YOUR_USER \
  --sf-password YOUR_PASSWORD \
  --sf-role YOUR_ROLE \
  --sf-warehouse YOUR_WAREHOUSE \
  --sf-database YOUR_DATABASE \
  --sf-schema YOUR_SCHEMA \
  --datahub-url https://your-datahub.acryl.io \
  --datahub-token YOUR_TOKEN \
  --enable-mutations \
  --execute
```

Direct Execution with SSO

Use --sf-authenticator externalbrowser together with --execute for SSO authentication (no password required):

```bash
python -m datahub.ai.snowflake.snowflake \
  --sf-account YOUR_ACCOUNT \
  --sf-user YOUR_USER \
  --sf-authenticator externalbrowser \
  --sf-role YOUR_ROLE \
  --sf-warehouse YOUR_WAREHOUSE \
  --sf-database YOUR_DATABASE \
  --sf-schema YOUR_SCHEMA \
  --datahub-url https://your-datahub.acryl.io \
  --datahub-token YOUR_TOKEN \
  --enable-mutations \
  --execute
```

This opens your browser for SSO authentication, which suits organizations using SAML, Okta, or other identity providers configured with Snowflake.

Options

| Option | Description | Default |
| --- | --- | --- |
| --sf-account | Snowflake account identifier | Required |
| --sf-user | Snowflake user name | Required |
| --sf-role | Snowflake role | Required |
| --sf-warehouse | Snowflake warehouse name | Required |
| --sf-database | Snowflake database name | Required |
| --sf-schema | Snowflake schema name | Required |
| --datahub-url | DataHub instance URL | Required |
| --datahub-token | DataHub Personal Access Token | Required |
| --agent-name | Agent name in Snowflake | DATAHUB_SQL_AGENT |
| --agent-display-name | Agent display name in UI | DataHub SQL Assistant |
| --agent-color | Agent color in UI | blue |
| --output-dir | Output directory for SQL files | ./snowflake_setup |
| --enable-mutations | Include mutation/write tools (tags, descriptions, owners, etc.) | True (enabled) |
| --no-enable-mutations | Disable mutation tools (read-only mode with 9 UDFs instead of 20) | N/A |
| --execute | Execute scripts directly | False |
| --sf-password | Snowflake password (required if --execute is used with the snowflake authenticator) | None |
| --sf-authenticator | Authentication method: snowflake (password), externalbrowser (SSO), or oauth (token-based) | snowflake |
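
To make the defaults and the paired --enable-mutations / --no-enable-mutations flags concrete, here is a minimal sketch of a parser that mirrors the options listed above. It is an illustration built on the standard-library `argparse`, not the module's actual code; the function names `build_parser` and `parse` and the `prog` name are hypothetical, and the real CLI may be implemented differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the module's real code)."""
    p = argparse.ArgumentParser(prog="snowflake-agent-setup")  # prog name is made up
    # Required Snowflake and DataHub settings.
    for flag in ("--sf-account", "--sf-user", "--sf-role", "--sf-warehouse",
                 "--sf-database", "--sf-schema", "--datahub-url", "--datahub-token"):
        p.add_argument(flag, required=True)
    # Optional settings with the documented defaults.
    p.add_argument("--agent-name", default="DATAHUB_SQL_AGENT")
    p.add_argument("--agent-display-name", default="DataHub SQL Assistant")
    p.add_argument("--agent-color", default="blue")
    p.add_argument("--output-dir", default="./snowflake_setup")
    # BooleanOptionalAction (Python 3.9+) registers both --enable-mutations
    # and --no-enable-mutations for a single destination.
    p.add_argument("--enable-mutations", action=argparse.BooleanOptionalAction,
                   default=True)
    p.add_argument("--execute", action="store_true")
    p.add_argument("--sf-password", default=None)
    p.add_argument("--sf-authenticator", default="snowflake",
                   choices=["snowflake", "externalbrowser", "oauth"])
    return p


def parse(argv):
    """Parse argv and enforce the documented password rule."""
    p = build_parser()
    args = p.parse_args(argv)
    # Documented rule: a password is required when --execute is used
    # with the default snowflake authenticator.
    if args.execute and args.sf_authenticator == "snowflake" and not args.sf_password:
        p.error("--sf-password is required with --execute and the snowflake authenticator")
    return args
```

With this sketch, omitting both mutation flags leaves `enable_mutations` set to `True`, and passing --no-enable-mutations sets it to `False`, which matches the defaults listed above.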

Authentication Methods

Three authentication methods are supported when using --execute:

1. Password Authentication (Default)

Standard username/password authentication:

```bash
--execute --sf-password YOUR_PASSWORD
```

2. SSO Authentication (External Browser)

Browser-based SSO authentication (SAML, Okta, Azure AD, etc.):

```bash
--execute --sf-authenticator externalbrowser
```

When using this method:

  • A browser window will automatically open for authentication
  • No password is required on the command line
  • Ideal for organizations with federated identity providers
  • Your Snowflake account must be configured for SSO

3. OAuth Authentication

Token-based OAuth authentication:

```bash
--execute --sf-authenticator oauth
```

Note: OAuth authentication requires additional token configuration in your environment.
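
The three methods differ mainly in which credential accompanies the authenticator value. The sketch below shows one way the choice could map onto keyword arguments for `snowflake.connector.connect`, whose `account`, `user`, `password`, `authenticator`, and `token` parameters are part of the Snowflake Python connector. The helper name `connection_kwargs` is hypothetical, and how this module actually builds its connection (including where it reads the OAuth token from) is an assumption, not something stated here.

```python
def connection_kwargs(account, user, authenticator="snowflake",
                      password=None, token=None):
    """Build connect() keyword arguments for one authentication method (a sketch)."""
    kwargs = {"account": account, "user": user, "authenticator": authenticator}
    if authenticator == "snowflake":
        # Password authentication: the password must be supplied.
        if not password:
            raise ValueError("password authentication requires a password")
        kwargs["password"] = password
    elif authenticator == "externalbrowser":
        # SSO: no secret is passed; the connector opens a browser window.
        pass
    elif authenticator == "oauth":
        # OAuth: an access token obtained elsewhere must be supplied.
        if not token:
            raise ValueError("oauth authentication requires a token")
        kwargs["token"] = token
    else:
        raise ValueError(f"unsupported authenticator: {authenticator!r}")
    return kwargs


# Usage, assuming snowflake-connector-python is installed:
#   import snowflake.connector
#   conn = snowflake.connector.connect(**connection_kwargs("acct", "me", "externalbrowser"))
```

Keeping the mapping in a plain function makes the rule "password only for the snowflake authenticator" easy to check without opening a connection.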

Read-Only vs Read+Write Modes

By default, the generator creates 20 UDFs including both read and write operations:

  • Read-only tools (9 UDFs): Search, retrieve metadata, lineage, queries, documents
  • Mutation tools (11 UDFs): Add/remove tags, update descriptions, manage owners, domains, glossary terms, and structured properties

Use --no-enable-mutations to generate only the 9 read-only UDFs for environments where metadata modifications should be restricted.

Generated Files

  1. 00_configuration.sql - Configuration variables and secrets
  2. 01_network_rules.sql - Network rules and external access integration
  3. 02_datahub_udfs.sql - DataHub API UDFs (9 read-only or 20 read+write depending on --enable-mutations)
  4. 03_stored_procedure.sql - Dynamic SQL execution procedure
  5. 04_cortex_agent.sql - Cortex Agent definition

Manual Execution

If not using --execute, run the generated SQL files in order:

```sql
-- 1. Set up configuration and secrets
@00_configuration.sql;

-- 2. Create network rules
@01_network_rules.sql;

-- 3. Create DataHub UDFs
@02_datahub_udfs.sql;

-- 4. Create stored procedure
@03_stored_procedure.sql;

-- 5. Create Cortex Agent
@04_cortex_agent.sql;
```
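
Because the numeric prefixes encode the required order, the manual run can also be scripted. The following is a minimal sketch, assuming the files sit in the default ./snowflake_setup directory and that the `snowsql` CLI is installed with a connection already configured. The helper names are hypothetical, and any other client that can run a SQL file can be substituted through the `run_file` argument.

```python
import subprocess
from pathlib import Path


def ordered_sql_files(output_dir="./snowflake_setup"):
    """Return the generated .sql files sorted by their numeric prefix (00_, 01_, ...)."""
    return sorted(Path(output_dir).glob("[0-9][0-9]_*.sql"))


def run_with_snowsql(path):
    # Assumption: snowsql is on PATH and its connection is configured
    # (for example in ~/.snowsql/config). -f runs the statements in a file;
    # exit_on_error=true makes snowsql stop and return non-zero on a SQL error.
    subprocess.run(["snowsql", "-o", "exit_on_error=true", "-f", str(path)],
                   check=True)


def run_all(output_dir="./snowflake_setup", run_file=run_with_snowsql):
    """Run each generated script in order; stops if run_file raises."""
    files = ordered_sql_files(output_dir)
    for path in files:
        run_file(path)
    return [p.name for p in files]
```

Calling `run_all()` executes 00_configuration.sql through 04_cortex_agent.sql in sequence, so a failure in an early step (such as the network rules) prevents later scripts from running against a half-configured schema.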