metadata-ingestion/docs/sources/matillion-dpc/matillion-dpc_post.md
Optional configuration to map OpenLineage namespace URIs to DataHub platform information. Without this mapping, the connector derives the platform type from each URI and uses the default environment.
Fields:
platform_instance: Platform instance identifier (must match source ingestion)
database / schema: Defaults for incomplete dataset names from OpenLineage
  database.schema.table
  schema.table
convert_urns_to_lowercase: Normalize URNs to lowercase (use true for Snowflake)
env: Environment tag (PROD, DEV, etc.)

Fallback behavior: Unmapped namespaces extract platform type from the URI (e.g., postgresql://... → postgres) without platform instance assignment.
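The mapping and fallback rules above can be sketched in a few lines of Python. This is an illustration, not the connector's code: `resolve_namespace`, `SCHEME_ALIASES`, the exact-match lookup, and the `"PROD"` default environment are all assumptions made for the example. Only the `postgresql` to `postgres` alias comes from the text.

```python
from urllib.parse import urlparse

# Hypothetical alias table; only postgresql -> postgres is taken from the docs.
SCHEME_ALIASES = {"postgresql": "postgres"}


def resolve_namespace(namespace, mappings, default_env="PROD"):
    """Return (platform, platform_instance, env) for an OpenLineage namespace URI.

    `mappings` is keyed by namespace URI (exact match is an assumption here).
    """
    scheme = urlparse(namespace).scheme.lower()
    platform = SCHEME_ALIASES.get(scheme, scheme)
    mapped = mappings.get(namespace)
    if mapped is None:
        # Fallback: platform type from the URI, no platform instance assigned.
        return platform, None, default_env
    return platform, mapped.get("platform_instance"), mapped.get("env", default_env)


mappings = {
    "snowflake://prod-account": {"platform_instance": "snowflake_prod", "env": "PROD"},
}
print(resolve_namespace("snowflake://prod-account", mappings))
print(resolve_namespace("postgresql://db-host:5432", mappings))
```

The second call has no mapping entry, so it returns the platform derived from the URI scheme and `None` for the platform instance.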
Enable parse_sql_for_lineage: true to parse SQL queries from OpenLineage events for additional column-level lineage.
Requirements:
Snowflake: Use convert_urns_to_lowercase: true in namespace mapping
BigQuery: 3-tier naming (project.dataset.table). Set database: project-id, schema: dataset-name
MySQL / 2-tier: 2-tier naming (schema.table). Set schema only
Postgres / Redshift: 3-tier naming (database.schema.table). Set both database and schema
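To make the tier requirements concrete, here is a minimal sketch of how `database` and `schema` defaults could complete an incomplete dataset name. The `qualify_name` helper and its parameters are hypothetical, and the connector's actual logic may differ (for example, in how it handles quoted identifiers that contain dots).

```python
def qualify_name(name, database=None, schema=None, tiers=3, lowercase=False):
    """Prefix `name` with the configured defaults until it has `tiers` parts."""
    # 3-tier platforms need database + schema; 2-tier platforms need schema only.
    defaults = [database, schema] if tiers == 3 else [schema]
    parts = name.split(".")
    missing = max(tiers - len(parts), 0)
    prefix = defaults[:missing]
    if all(prefix):  # only complete the name when every needed default is set
        parts = prefix + parts
    full = ".".join(parts)
    return full.lower() if lowercase else full


# Snowflake-style: 3-tier, lowercased
print(qualify_name("PUBLIC.ORDERS", database="ANALYTICS", schema="PUBLIC", lowercase=True))
# BigQuery-style: project.dataset.table
print(qualify_name("events", database="project-id", schema="dataset-name"))
# MySQL-style: 2-tier, schema only
print(qualify_name("orders", schema="sales", tiers=2))
```

A name that is already fully qualified passes through unchanged, and a name that cannot be completed (because a required default is unset) is left as-is rather than partially prefixed.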
The connector supports flexible regex-based filtering to control what metadata is ingested.
project_patterns:
  allow: ["^prod-.*", "^staging-.*"]
  deny: [".*-deprecated$"]
environment_patterns:
  allow: ["^production$", "^staging$"]
  deny: ["^sandbox.*"]
pipeline_patterns:
  allow: [".*"]
  deny: ["^test_.*", ".*_backup$"]
streaming_pipeline_patterns:
  allow: ["^cdc_.*"]
  deny: [".*_test$"]
All patterns are case-insensitive by default and support full regex syntax. Deny patterns take precedence over allow patterns.
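The allow/deny semantics can be illustrated with a short standalone sketch. `is_allowed` is a hypothetical helper, not the connector's implementation. It assumes patterns are matched with `re.match`, which anchors at the start of the string.

```python
import re


def is_allowed(name, allow=(".*",), deny=(), ignore_case=True):
    """Return True if `name` matches an allow pattern and no deny pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    # Deny is checked first, so it wins even when an allow pattern also matches.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


project_allow = ["^prod-.*", "^staging-.*"]
project_deny = [".*-deprecated$"]

print(is_allowed("prod-sales", project_allow, project_deny))             # True
print(is_allowed("prod-sales-deprecated", project_allow, project_deny))  # False
print(is_allowed("dev-sales", project_allow, project_deny))              # False
```

With the default case-insensitive matching, a pipeline named `TEST_load` is rejected by the deny pattern `^test_.*` in the same way as `test_load`.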
The connector automatically detects and tracks when pipelines call other pipelines (via "Run Pipeline" components). This creates step-level dependency relationships in DataHub.
No configuration is needed: this feature is automatic when execution history is ingested.
The connector can discover pipelines from two sources:
Published pipelines (/published-pipelines API)
Unpublished pipelines, discovered from executions (/pipeline-executions API)

By default, both types are ingested. To ingest only published pipelines:
include_unpublished_pipelines: false
This is useful when:
Processing OpenLineage event messages
Enable parse_sql_for_lineage: true (requires DataHub graph connection).
Adjust start_time to query further back in time if needed
Narrow start_time (e.g., only the last 7 days instead of 30)
Use project_patterns to filter projects
Use environment_patterns to filter environments
Use pipeline_patterns to filter pipelines
Use streaming_pipeline_patterns to filter streaming pipelines
Disable include_streaming_pipelines if not needed
Increase api_config.request_timeout_sec if needed