README

metadata-ingestion/docs/sources/redshift/README.md

Overview

Amazon Redshift is a cloud data warehouse used to store and query analytical or operational data. Learn more in the official Redshift documentation.

The DataHub integration for Redshift covers core metadata entities such as datasets/tables/views, schema fields, and containers. It also captures table- and column-level lineage, usage statistics, data profiling, ownership, and stateful deletion detection.

Concept Mapping

| Redshift Concept | DataHub Entity (Subtype) | Notes |
| --- | --- | --- |
| Cluster / Account | Platform Instance | Top-level scope; all URNs include the configured platform instance. |
| Database | Container (`DATABASE`) | Top-level namespace. Shared databases (datashares) are also ingested. |
| Schema | Container (`SCHEMA`) | Nested under its Database container. External schemas (Glue, Hive, PostgreSQL) include external platform metadata. |
| Table | Dataset (`TABLE`) | Regular, foreign, and external tables (Redshift Spectrum). Custom properties include `dist_style` and `table_type`. |
| View | Dataset (`VIEW`) | View definition is captured. |
| Materialized View | Dataset (`VIEW`) | `materialized=true` in `ViewProperties`. |
| Late Binding View | Dataset (`VIEW`) | Columns extracted from `pg_get_late_binding_view_cols()`. |
| External Table (Spectrum) | Dataset (`TABLE`) | Includes location, input/output format, and SerDe parameters. |
| Column / Field | SchemaField | `dist_key` and `sort_key` columns are tagged accordingly. |
| User (schema / table owner) | CorpUser | Extracted when `extract_ownership` is enabled. |
| Table / column lineage | Lineage edges | From `STL_SCAN`, view dependencies, `COPY`/`UNLOAD` commands, `ALTER TABLE RENAME`, and datashare references. |
| Query operations and usage | DatasetUsageStatistics, Operation | From `STL_SCAN` (provisioned) or `SYS_QUERY_DETAIL` (serverless). |
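
To make the first row of the mapping concrete, the sketch below shows how a Redshift table could translate into a DataHub dataset URN once a platform instance is configured. It is an illustration only: `RedshiftTable` and `dataset_urn` are hypothetical helpers, not part of the connector, and the `instance.database.schema.table` naming and the `PROD` default environment are assumptions based on DataHub's general dataset URN format rather than something this page states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RedshiftTable:
    """Hypothetical holder for the three name parts of a Redshift table."""
    database: str
    schema: str
    table: str


def dataset_urn(
    t: RedshiftTable,
    platform_instance: Optional[str] = None,
    env: str = "PROD",  # assumed default environment
) -> str:
    """Build a dataset URN string for the 'redshift' platform.

    Assumes the dataset name is the dot-joined path
    [platform_instance.]database.schema.table.
    """
    parts = [t.database, t.schema, t.table]
    if platform_instance:
        # The platform instance scopes every URN, so it leads the name.
        parts.insert(0, platform_instance)
    name = ".".join(parts)
    return f"urn:li:dataset:(urn:li:dataPlatform:redshift,{name},{env})"


if __name__ == "__main__":
    orders = RedshiftTable(database="dev", schema="public", table="orders")
    print(dataset_urn(orders, platform_instance="prod_cluster"))
```

Under these assumptions, two clusters that both contain `dev.public.orders` yield distinct URNs as long as each is ingested with its own platform instance value.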