README

metadata-ingestion/docs/sources/postgres/README.md

Overview

PostgreSQL (Postgres) is an open-source relational database used to store and query operational or analytical data. Learn more in the official Postgres documentation.

The DataHub integration for Postgres covers core metadata entities such as datasets/tables/views, schema fields, and containers. It also captures table- and column-level lineage, data profiling, and stateful deletion detection.

Concept Mapping

| PostgreSQL Concept | DataHub Entity (Subtype) | Notes |
| --- | --- | --- |
| Database | Container (DATABASE) | Top-level namespace. Multiple databases can be ingested in one run. |
| Schema | Container (SCHEMA) | Nested under its Database container. |
| Table | Dataset (TABLE) | Includes regular, foreign, temporary, and unlogged tables. |
| View / Materialized View | Dataset (VIEW) | View definition is captured. No subtype distinction between standard and materialized views. |
| Stored Procedure | DataJob (STORED PROCEDURE) | Grouped into a DataFlow (PROCEDURES CONTAINER) per schema. Extracted via `pg_proc`. |
| Column / field | SchemaField | Column type, nullability, and comments (as descriptions) are extracted. |
| Query executor (`pg_stat_statements`) | CorpUser | User attribution for lineage and usage statistics. Requires `pg_stat_statements` extension. |
| View / query lineage | Lineage edges | View-based lineage via `pg_depend`; query-based lineage via `pg_stat_statements` (requires PostgreSQL 13+ and `pg_read_all_stats` role). |
| Query operations and usage | DatasetUsageStatistics, Operation | From `pg_stat_statements`. Requires `include_query_lineage` and `include_usage_statistics` to be enabled. |
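
The last rows of the mapping depend on two source options, `include_query_lineage` and `include_usage_statistics`. As a minimal sketch of how a recipe enabling them might be assembled in Python, the helper below builds a recipe dictionary. The function name `build_postgres_recipe`, the connection values, and the local `datahub-rest` sink address are illustrative assumptions, and the check that usage statistics also need query lineage is one reading of the Notes column rather than confirmed connector behavior.

```python
from typing import Any, Dict


def build_postgres_recipe(
    host_port: str,
    database: str,
    username: str,
    password: str,
    include_query_lineage: bool = False,
    include_usage_statistics: bool = False,
) -> Dict[str, Any]:
    """Build a DataHub recipe dict for the Postgres source (illustrative sketch)."""
    # Assumption based on the Notes column: usage statistics come from the same
    # pg_stat_statements data as query lineage, so both flags must be enabled.
    if include_usage_statistics and not include_query_lineage:
        raise ValueError(
            "include_usage_statistics also needs include_query_lineage enabled"
        )
    return {
        "source": {
            "type": "postgres",
            "config": {
                "host_port": host_port,
                "database": database,
                "username": username,
                "password": password,
                "include_query_lineage": include_query_lineage,
                "include_usage_statistics": include_usage_statistics,
            },
        },
        # Assumed sink: a local DataHub instance reachable over REST.
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }


# Placeholder connection values; replace them with your own.
recipe = build_postgres_recipe(
    host_port="localhost:5432",
    database="appdb",
    username="datahub",
    password="example-password",
    include_query_lineage=True,
    include_usage_statistics=True,
)
```

With the `acryl-datahub` package installed, a dictionary like this can typically be passed to `Pipeline.create(recipe)` from `datahub.ingestion.run.pipeline` and executed with `.run()`. The query-based features still need the `pg_stat_statements` extension and the `pg_read_all_stats` role on the database side, as noted in the table.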