README

metadata-ingestion/docs/sources/excel/README.md

Overview

Microsoft Excel is a spreadsheet application; in DataHub, its workbook files are treated as a file-based data source. Learn more in the official Excel documentation.

The DataHub integration for Excel covers file-based metadata entities such as datasets, paths, and containers. Depending on the capabilities of the ingestion module, it can also support features such as lineage, usage, profiling, ownership, tags, and stateful deletion detection.

Concept Mapping

| Excel Entity | DataHub Entity | Description |
| --- | --- | --- |
| Excel Worksheet | Dataset | Each worksheet becomes a dataset with URN pattern: `urn:li:dataset:(urn:li:dataPlatform:excel,{path}/[{filename}]{sheet_name},PROD)` |
| File/Directory Structure | Container | Directory hierarchy creates containers with obfuscated URNs for organizing datasets |

:::info Excel workbook

The Excel workbook file itself does not become a separate DataHub entity; only the individual worksheets within it are ingested as datasets.

:::
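
To make the worksheet URN pattern from the Concept Mapping section concrete, here is a minimal sketch that fills in the `{path}`, `{filename}`, and `{sheet_name}` placeholders by plain string formatting. The function name `worksheet_urn` and the sample path, file, and sheet names are hypothetical and used only for illustration. The actual connector may normalize paths or names differently, so treat this as a reading aid rather than the source's implementation.

```python
def worksheet_urn(path: str, filename: str, sheet_name: str, env: str = "PROD") -> str:
    """Fill in the documented URN pattern for one worksheet (illustrative helper)."""
    # Dataset name portion of the pattern: {path}/[{filename}]{sheet_name}
    dataset_name = f"{path}/[{filename}]{sheet_name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:excel,{dataset_name},{env})"


# Hypothetical workbook with two worksheets: each sheet yields its own dataset URN,
# and no URN is produced for the workbook file itself.
sheets = ["Q1", "Q2"]
urns = [worksheet_urn("reports/2024", "sales.xlsx", sheet) for sheet in sheets]

for urn in urns:
    print(urn)
```

Under these assumptions, a workbook with two worksheets produces two dataset URNs that differ only in the trailing sheet name.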