README

metadata-ingestion/docs/sources/glue/README.md

Overview

AWS Glue is a serverless data integration service that provides a central metadata catalog (the Glue Data Catalog) and ETL jobs for analytical and operational data. Learn more in the official Glue documentation.

The DataHub integration for Glue covers core metadata entities such as datasets/tables/views, schema fields, and containers. Depending on module capabilities, it can also capture features such as lineage, usage, profiling, ownership, tags, and stateful deletion detection.

:::tip
If you also have files in S3 that you'd like to ingest, we recommend you use Glue's built-in data catalog. See here for a quick guide on how to set up a crawler on Glue and ingest the outputs with DataHub.
:::
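As a rough illustration of how this integration is typically configured, the sketch below builds an ingestion recipe for the Glue source as a plain Python dictionary. The helper name `build_glue_recipe`, the region, and the server URL are placeholders chosen for this example, and the config keys shown (`aws_region`, `extract_transforms`) are assumed to match the installed DataHub version; check the source's config reference before relying on them.

```python
import json


def build_glue_recipe(aws_region: str, datahub_server: str) -> dict:
    """Return a DataHub ingestion recipe for the Glue source as a plain dict.

    Hypothetical helper for illustration; it is not part of DataHub.
    """
    return {
        "source": {
            "type": "glue",
            "config": {
                # Region of the Glue Data Catalog to read from (placeholder value).
                "aws_region": aws_region,
                # Assumed option: also emit Glue jobs and their transforms.
                "extract_transforms": True,
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": datahub_server},
        },
    }


recipe = build_glue_recipe("us-east-1", "http://localhost:8080")
print(json.dumps(recipe, indent=2))

# With the `acryl-datahub[glue]` package installed, the same dict can be run as:
#   from datahub.ingestion.run.pipeline import Pipeline
#   pipeline = Pipeline.create(recipe)
#   pipeline.run()
#   pipeline.raise_from_status()
```

Keeping the recipe as a dictionary makes it easy to inspect or serialize to YAML for use with the `datahub ingest` CLI.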

Concept Mapping

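| Source Concept       | DataHub Concept | Notes              |
| -------------------- | --------------- | ------------------ |
| `"glue"`             | Data Platform   |                    |
| Glue Database        | Container       | Subtype `Database` |
| Glue Table           | Dataset         | Subtype `Table`    |
| Glue Job             | Data Flow       |                    |
| Glue Job Transform   | Data Job        |                    |
| Glue Job Data source | Dataset         |                    |
| Glue Job Data sink   | Dataset         |                    |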
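
To make the mapping above more concrete, here is a minimal sketch of how Glue objects might translate into DataHub URNs. It uses plain string formatting rather than the DataHub SDK. The function names, the `database.table` naming scheme, the `PROD` default, and the transform identifier are assumptions made for illustration; the connector's actual naming can differ (for example, when a platform instance is configured).

```python
GLUE_PLATFORM = "glue"  # the "glue" Data Platform from the mapping


def dataset_urn(database: str, table: str, env: str = "PROD") -> str:
    """Glue Table -> Dataset (assumes the dataset name is `database.table`)."""
    name = f"{database}.{table}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{GLUE_PLATFORM},{name},{env})"


def data_flow_urn(job_name: str, cluster: str = "PROD") -> str:
    """Glue Job -> Data Flow."""
    return f"urn:li:dataFlow:({GLUE_PLATFORM},{job_name},{cluster})"


def data_job_urn(job_name: str, transform_id: str, cluster: str = "PROD") -> str:
    """Glue Job Transform -> Data Job, nested under its parent Data Flow.

    `transform_id` is a hypothetical identifier for one node of the job.
    """
    return f"urn:li:dataJob:({data_flow_urn(job_name, cluster)},{transform_id})"


print(dataset_urn("sales", "orders"))
# urn:li:dataset:(urn:li:dataPlatform:glue,sales.orders,PROD)
print(data_job_urn("nightly_etl", "ApplyMapping-1"))
# urn:li:dataJob:(urn:li:dataFlow:(glue,nightly_etl,PROD),ApplyMapping-1)
```

Glue Job data sources and sinks would use `dataset_urn` as well, since they map to the same Dataset concept as Glue tables.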