README

metadata-ingestion/docs/sources/matillion-dpc/README.md

Overview

Matillion Data Productivity Cloud (DPC) is a cloud-native data integration platform for building, orchestrating, and monitoring data pipelines. Learn more in the official Matillion documentation.

The DataHub integration for Matillion DPC ingests pipelines, streaming pipelines, projects, and environments as DataHub entities. It captures table- and column-level lineage via the Matillion OpenLineage API, pipeline execution history as operational metadata, and child pipeline dependency relationships for end-to-end orchestration visibility.

Concept Mapping

| Source Concept | DataHub Concept | Notes |
| --- | --- | --- |
| Project | Container | Top-level grouping of pipelines within a Matillion account. |
| Environment | Container | Deployment environment within a project (e.g. Production, Staging). |
| Pipeline | DataFlow | An orchestration pipeline that transforms or moves data. |
| Pipeline Component / Step | DataJob | An individual step within a pipeline. |
| Streaming Pipeline | DataFlow | A CDC or streaming pipeline, emitted with `pipeline_type=streaming`. |
| Pipeline Execution | DataProcessInstance | A single run of a pipeline, including status and timing. |
| OpenLineage table reference | Dataset | Upstream or downstream dataset referenced via OpenLineage events. |
| Table/column lineage edge | Lineage edge | Extracted from OpenLineage events; column-level via SQL parsing. |
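
The mapping above can be read as a simple lookup from Matillion concepts to DataHub entity types. The following is a minimal illustrative sketch, not the connector's actual implementation. The platform id `matillion-dpc`, the default `PROD` cluster value, and the helper names `data_flow_urn` and `data_job_urn` are assumptions made for this example; only the general DataFlow and DataJob URN shapes follow DataHub's standard format.

```python
# Concept mapping from the table above, expressed as a plain lookup.
CONCEPT_TO_ENTITY = {
    "Project": "Container",
    "Environment": "Container",
    "Pipeline": "DataFlow",
    "Pipeline Component / Step": "DataJob",
    "Streaming Pipeline": "DataFlow",
    "Pipeline Execution": "DataProcessInstance",
    "OpenLineage table reference": "Dataset",
    "Table/column lineage edge": "Lineage edge",
}

# Assumption: the platform id used in URNs. The real connector may differ.
PLATFORM = "matillion-dpc"


def data_flow_urn(pipeline_id: str, cluster: str = "PROD") -> str:
    """Build a DataFlow URN for a pipeline (hypothetical helper)."""
    return f"urn:li:dataFlow:({PLATFORM},{pipeline_id},{cluster})"


def data_job_urn(flow_urn: str, step_id: str) -> str:
    """Build a DataJob URN for a step nested under its pipeline's DataFlow."""
    return f"urn:li:dataJob:({flow_urn},{step_id})"


if __name__ == "__main__":
    flow = data_flow_urn("my_pipeline")  # "my_pipeline" is a made-up id
    print(CONCEPT_TO_ENTITY["Pipeline"], flow)
    print(CONCEPT_TO_ENTITY["Pipeline Component / Step"], data_job_urn(flow, "step_1"))
```

Nesting the DataFlow URN inside the DataJob URN reflects how a step always belongs to exactly one pipeline, which is why pipelines and their components map to DataFlow and DataJob respectively.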