metadata-ingestion/docs/sources/matillion-dpc/matillion-dpc_pre.md
The matillion-dpc module ingests metadata from Matillion Data Productivity Cloud (DPC) into DataHub. It extracts pipelines, streaming pipelines, projects, environments, execution history, and table and column-level lineage via the Matillion OpenLineage API.
The connector uses OAuth2 client credentials and automatically handles token generation and refresh.
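The generate-and-refresh behaviour described above can be pictured roughly as follows. This is a minimal sketch, not the connector's actual code: `TokenCache`, the injected `fetch_token` callable, and the 60-second refresh margin are all hypothetical names and values chosen for illustration. The only thing assumed about the token response is that it carries the standard OAuth2 `access_token` and `expires_in` fields.

```python
import time
from typing import Any, Callable, Dict, Optional


class TokenCache:
    """Caches an OAuth2 access token and refreshes it shortly before expiry.

    Hypothetical helper: `fetch_token` stands in for whatever performs the
    client-credentials request and returns the parsed JSON token response.
    """

    def __init__(
        self,
        fetch_token: Callable[[], Dict[str, Any]],
        margin_seconds: float = 60.0,  # assumed safety margin, not from the docs
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._margin = margin_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, requesting a new one if the cached one is stale."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            self._expires_at = now + float(payload["expires_in"])
        return self._token
```

Injecting `fetch_token` and `clock` keeps the sketch free of any guessed token URL and makes the refresh logic easy to exercise without network access.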
For detailed instructions, see Matillion API Authentication.
The API credentials must have an Account Role with Read permissions to:
- `/v1/projects`
- `/v1/environments`
- `/v1/pipelines`
- `/v1/schedules`
- `/v1/lineage/events`
- `/v1/pipeline-executions` (optional)
- `/v1/streaming-pipelines` (optional)

If using an account role other than Super Admin, grant project and environment-level roles as needed.
See Matillion RBAC documentation for details.
The connector automatically extracts:
- Table and column-level lineage from OpenLineage events (`/v1/lineage/events`)
- Execution history (`/v1/pipeline-executions`), emitted as DataProcessInstance entities

Optional: Map OpenLineage namespace URIs to DataHub platform instances so that lineage connects to the right datasets. If no mapping is configured, the connector extracts the platform type from the URI (e.g., postgresql://... → postgres) and uses the default environment (PROD).
When to use: Configure this when you need lineage to connect to existing datasets with platform instances.
Example namespaces: postgresql://host:5432, snowflake://account.snowflakecomputing.com, bigquery://project
```yaml
namespace_to_platform_instance:
  "postgresql://prod-db.us-east-1.rds.amazonaws.com:5432":
    platform_instance: postgres_prod
    env: PROD
    database: analytics
    schema: public
  "snowflake://prod-account.snowflakecomputing.com":
    platform_instance: snowflake_prod
    env: PROD
    convert_urns_to_lowercase: true
```
Platform instances must match those used when ingesting the source data platforms.
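
The resolution order described in this section (explicit mapping first, URI-derived platform with the PROD default otherwise) might look roughly like the sketch below. It is illustrative only and not taken from the connector: `resolve_namespace`, `ResolvedNamespace`, and `SCHEME_TO_PLATFORM` are hypothetical names, exact-string matching of the namespace key is an assumption, and only the postgresql → postgres entry comes from the text above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional
from urllib.parse import urlparse

# Only "postgresql" -> "postgres" is stated in the documentation;
# the remaining entries are assumptions made for this sketch.
SCHEME_TO_PLATFORM = {
    "postgresql": "postgres",
    "snowflake": "snowflake",
    "bigquery": "bigquery",
}


@dataclass(frozen=True)
class ResolvedNamespace:
    platform: str
    platform_instance: Optional[str]
    env: str


def resolve_namespace(
    namespace: str, mapping: Dict[str, Dict[str, Any]]
) -> ResolvedNamespace:
    """Resolve an OpenLineage namespace URI to platform, instance, and env."""
    scheme = urlparse(namespace).scheme
    # Unknown schemes fall through unchanged (an assumption of this sketch).
    platform = SCHEME_TO_PLATFORM.get(scheme, scheme)

    entry = mapping.get(namespace)  # assumed: exact match on the configured key
    if entry is None:
        # No mapping configured: no platform instance, default environment.
        return ResolvedNamespace(platform, None, "PROD")
    return ResolvedNamespace(
        platform,
        entry.get("platform_instance"),
        entry.get("env", "PROD"),
    )
```

With the example configuration above, `snowflake://prod-account.snowflakecomputing.com` would resolve to the `snowflake_prod` instance, while an unmapped namespace such as `postgresql://host:5432` would resolve to the `postgres` platform with no instance and the PROD environment.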