The presto module ingests metadata from Presto into DataHub. It is intended for production ingestion workflows, and its module-specific capabilities are documented below.
:::info Presto vs. Presto-on-Hive
There are two different ways to ingest Presto metadata into DataHub, depending on your use case:
Option 1: Presto Connector (This Source)
Use when: You want to connect directly to Presto to extract metadata from all catalogs (not just Hive).
Capabilities:
Configuration:
source:
  type: presto # ← This connector
  config:
    host_port: presto-coordinator.company.com:8080
    username: datahub_user
    password: ${PRESTO_PASSWORD}
Option 2: Hive Metastore Connector with Presto Mode
Use when: You want to ingest Presto views that use the Hive metastore and need storage lineage.
Capabilities:
Configuration:
source:
  type: hive-metastore # ← Use this for storage lineage
  config:
    host_port: metastore-db.company.com:5432
    database: metastore
    scheme: "postgresql+psycopg2"
    mode: presto # ← Set mode to 'presto'
    # Enable storage lineage
    emit_storage_lineage: true
    hive_storage_lineage_direction: upstream
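The choice between the two options can be sketched as a small Python helper that returns the matching source block. The function name build_source_config and its parameter are hypothetical, chosen only for illustration; the dictionary keys mirror the two YAML recipes in this section, and the hosts are the same placeholders.

```python
from typing import Any, Dict


def build_source_config(need_storage_lineage: bool) -> Dict[str, Any]:
    """Hypothetical helper: pick the connector that matches the use case.

    Returns the Hive metastore connector in Presto mode when storage
    lineage is required, otherwise the direct Presto connector, which
    reads metadata from all catalogs.
    """
    if need_storage_lineage:
        # Option 2: read the Hive metastore database directly, in Presto mode.
        return {
            "type": "hive-metastore",
            "config": {
                "host_port": "metastore-db.company.com:5432",
                "database": "metastore",
                "scheme": "postgresql+psycopg2",
                "mode": "presto",
                "emit_storage_lineage": True,
                "hive_storage_lineage_direction": "upstream",
            },
        }
    # Option 1: connect to the Presto coordinator and read every catalog.
    return {
        "type": "presto",
        "config": {
            "host_port": "presto-coordinator.company.com:8080",
            "username": "datahub_user",
            # Placeholder copied from the YAML recipe; supply the real secret at run time.
            "password": "${PRESTO_PASSWORD}",
        },
    }


if __name__ == "__main__":
    import json

    print(json.dumps({"source": build_source_config(True)}, indent=2))
```

Keeping the decision in one function makes it explicit that storage lineage is the deciding factor; everything else about the two recipes is fixed configuration.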
For complete details, see:

:::
Network Access: Access to the Presto coordinator on port 8080 (or 443 for HTTPS)
User Account: A Presto user with permission to query metadata
Dependencies: Install the DataHub Presto plugin, which provides PyHive connectivity:
pip install 'acryl-datahub[presto]'
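Before the first run, a short Python preflight can confirm the two prerequisites that are easy to check from the ingestion host: that the coordinator port accepts TCP connections, and that the pyhive module (the PyHive connectivity installed by the command above) is importable. This is a sketch with hypothetical helper names; a successful TCP connect does not prove that authentication or permissions are correct.

```python
import importlib.util
import socket


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def module_is_installed(name: str) -> bool:
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Placeholder coordinator address from the examples in this guide.
    print("coordinator reachable:", port_is_reachable("presto-coordinator.company.com", 8080))
    print("pyhive installed:", module_is_installed("pyhive"))
```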
The Presto user account used by DataHub needs minimal permissions:
-- Presto uses catalog-level permissions
-- The user needs SELECT access to system information tables
-- This is typically granted by default to all users
Recommendation: Use a read-only service account with access to all catalogs you want to ingest.
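To confirm that the service account can actually see the catalogs and tables it should ingest, the two metadata queries below can be run through any Python DB-API cursor: SHOW CATALOGS, and a per-catalog read of information_schema.tables. The helper names are hypothetical, and the optional connect() helper assumes pyhive's presto.connect() accepting host, port and username keyword arguments; the query helpers themselves do not depend on pyhive.

```python
from typing import Any, List, Tuple


def list_catalogs(cursor: Any) -> List[str]:
    """Return the catalog names visible to the connected user."""
    cursor.execute("SHOW CATALOGS")
    return [row[0] for row in cursor.fetchall()]


def list_tables(cursor: Any, catalog: str) -> List[Tuple[str, str]]:
    """Return (schema, table) pairs from one catalog's information_schema."""
    # Quote the catalog as an identifier, doubling any embedded quotes.
    quoted = '"' + catalog.replace('"', '""') + '"'
    cursor.execute(
        f"SELECT table_schema, table_name FROM {quoted}.information_schema.tables"
    )
    return [(row[0], row[1]) for row in cursor.fetchall()]


def connect(host: str, port: int, username: str) -> Any:
    """Open a Presto DB-API connection (assumes the pyhive package is installed)."""
    from pyhive import presto  # imported lazily so the helpers work without pyhive

    return presto.connect(host=host, port=port, username=username)


if __name__ == "__main__":
    cur = connect("presto.company.com", 8080, "datahub_user").cursor()
    for catalog in list_catalogs(cur):
        print(catalog, len(list_tables(cur, catalog)))
```

If a catalog you expect is missing from the output, the account lacks access to it and DataHub will not ingest it either.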
Username and password authentication is the most common method:
source:
  type: presto
  config:
    host_port: presto.company.com:8080
    username: datahub_user
    password: ${PRESTO_PASSWORD}
    database: hive # Optional: default catalog
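The username/password recipe above can also be built in Python, which is convenient when ingestion is scheduled from code. This sketch reads the password from the PRESTO_PASSWORD environment variable, matching the ${PRESTO_PASSWORD} placeholder, and hands the recipe to DataHub's Pipeline.create(). The helper names, the datahub-rest sink and its server URL are assumptions added for completeness; adjust them to your deployment.

```python
import os
from typing import Any, Dict, Optional


def build_presto_recipe(
    host_port: str,
    username: str,
    database: Optional[str] = None,
    server: str = "http://localhost:8080",  # assumed DataHub GMS address
) -> Dict[str, Any]:
    """Build a Presto recipe dict equivalent to the YAML recipe."""
    config: Dict[str, Any] = {
        "host_port": host_port,
        "username": username,
        # Raises KeyError if PRESTO_PASSWORD is unset, so a missing secret fails fast.
        "password": os.environ["PRESTO_PASSWORD"],
    }
    if database is not None:
        config["database"] = database  # optional default catalog
    return {
        "source": {"type": "presto", "config": config},
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run(recipe: Dict[str, Any]) -> None:
    """Run the recipe with DataHub's programmatic pipeline API."""
    from datahub.ingestion.run.pipeline import Pipeline  # requires acryl-datahub

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


if __name__ == "__main__":
    run(build_presto_recipe("presto.company.com:8080", "datahub_user", database="hive"))
```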
For LDAP-based authentication:
source:
  type: presto
  config:
    host_port: presto.company.com:8080
    username: datahub_user
    password: ${LDAP_PASSWORD}
    database: hive
For secure connections over HTTPS:
source:
  type: presto
  config:
    host_port: presto.company.com:443
    username: datahub_user
    password: ${PRESTO_PASSWORD}
    database: hive
    options:
      connect_args:
        protocol: https
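When switching to HTTPS, an untrusted or mismatched certificate is a common cause of connection failures. The sketch below uses only Python's standard ssl module to attempt a TLS handshake with the coordinator and verify its certificate against the system trust store. The function name is hypothetical, and it checks transport security only, not Presto authentication.

```python
import socket
import ssl


def tls_handshake_ok(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if host:port completes a TLS handshake with a valid certificate."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # ssl.SSLError (including certificate verification failures) subclasses
        # OSError, so connection errors, timeouts and TLS errors all land here.
        return False


if __name__ == "__main__":
    print(tls_handshake_ok("presto.company.com", 443))
```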
For Kerberos-secured Presto clusters:
source:
  type: presto
  config:
    host_port: presto.company.com:8080
    database: hive
    options:
      connect_args:
        auth: KERBEROS
        kerberos_service_name: presto
Requirements:
A valid Kerberos ticket (run kinit before running ingestion)
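Because the Kerberos setup relies on an existing ticket, a preflight check can save a failed run. This sketch assumes the MIT Kerberos klist command is on PATH: klist -s prints nothing and exits with status 0 only when the credentials cache can be read and is not expired. The function name is hypothetical.

```python
import shutil
import subprocess


def has_kerberos_ticket() -> bool:
    """Return True if MIT Kerberos `klist -s` reports a usable ticket cache."""
    if shutil.which("klist") is None:
        return False  # Kerberos client tools are not installed
    # klist -s is silent; exit status 0 means a readable, unexpired cache.
    result = subprocess.run(["klist", "-s"], check=False)
    return result.returncode == 0


if __name__ == "__main__":
    if not has_kerberos_ticket():
        raise SystemExit("No valid Kerberos ticket found; run kinit first.")
```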