README

metadata-ingestion/docs/sources/dlt/README.md


Overview

dlt (data load tool) is an open-source Python ELT library for building data pipelines that load data from REST APIs, databases, and other sources into destinations like Postgres, BigQuery, Snowflake, and DuckDB.

The DataHub integration for dlt reads pipeline metadata from dlt's local state directory (~/.dlt/pipelines/) and emits DataFlow, DataJob, and lineage entities to DataHub. The connector also supports per-run history (DataProcessInstance) when the dlt package is installed and destination credentials are available, plus stateful deletion detection.
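As a rough illustration of the local state directory, the sketch below lists the pipelines found on disk. It assumes each immediate subdirectory of `~/.dlt/pipelines/` corresponds to one pipeline; the helper name `list_local_pipelines` is hypothetical and is not part of dlt or DataHub.

```python
from pathlib import Path


def list_local_pipelines(pipelines_dir: Path) -> list[str]:
    """Return the sorted names of subdirectories under pipelines_dir.

    Assumption: one subdirectory per dlt pipeline. Plain files are ignored,
    and a missing directory yields an empty list.
    """
    if not pipelines_dir.is_dir():
        return []
    return sorted(p.name for p in pipelines_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Default location named in this README.
    for name in list_local_pipelines(Path.home() / ".dlt" / "pipelines"):
        print(name)
```

An empty result usually means no pipeline has run on this machine yet, or that the pipelines were run with a different working directory.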

Concept Mapping

| dlt | DataHub |
| --- | --- |
| Pipeline (pipeline_name) | DataFlow |
| Resource / destination Table | DataJob |
| Destination table | Dataset (DataJob output) |
| User-configured upstream | Dataset (DataJob input) |
| _dlt_loads row | DataProcessInstance |
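To make the first two rows of the mapping concrete, here is a minimal sketch of what the resulting DataFlow and DataJob URNs could look like. The URN layouts follow DataHub's general format; the orchestrator value `dlt`, the cluster `prod`, and the helper names are assumptions for illustration, not confirmed connector behavior. The DataHub Python SDK offers comparable helpers (`make_data_flow_urn`, `make_data_job_urn`) in `datahub.emitter.mce_builder`.

```python
def data_flow_urn(pipeline_name: str, orchestrator: str = "dlt", cluster: str = "prod") -> str:
    """Build a DataFlow URN for a dlt pipeline (orchestrator and cluster are assumed values)."""
    return f"urn:li:dataFlow:({orchestrator},{pipeline_name},{cluster})"


def data_job_urn(flow_urn: str, resource_name: str) -> str:
    """Build a DataJob URN nested under its parent DataFlow URN."""
    return f"urn:li:dataJob:({flow_urn},{resource_name})"


# Hypothetical pipeline "github_issues" with a resource that loads an "issues" table.
flow = data_flow_urn("github_issues")
job = data_job_urn(flow, "issues")
print(flow)
print(job)
```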

Destination tables are mapped to Dataset URNs that match the destination platform's own DataHub connector (Postgres, BigQuery, etc.), enabling lineage stitching when both connectors run.
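Lineage stitching depends on both connectors producing the same Dataset URN string for a given table. The sketch below shows the general shape of a DataHub Dataset URN; the helper name, the example table name, and the `PROD` environment are assumptions for illustration. The DataHub Python SDK provides a comparable `make_dataset_urn` helper in `datahub.emitter.mce_builder`.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a Dataset URN in DataHub's standard layout."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Hypothetical Postgres table, as the dlt connector might reference it.
from_dlt = dataset_urn("postgres", "analytics.public.orders")

# The same table, as a Postgres connector would need to reference it.
from_postgres = dataset_urn("postgres", "analytics.public.orders")

# Identical URNs resolve to one Dataset entity, so lineage attaches to it.
print(from_dlt == from_postgres)
```

If the platform, the fully qualified table name, or the environment differs between the two connectors, the URNs will not match and the lineage edge will point to a separate Dataset entity.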