README

metadata-ingestion/docs/sources/airbyte/README.md

Overview

Airbyte is an open-source data integration platform that syncs data from sources to destinations through configurable connections. It supports hundreds of pre-built connectors and lets you build custom ones.

This integration extracts metadata from Airbyte to give DataHub visibility into your data pipelines — including connections, sources, destinations, streams, and job execution history. It captures lineage between source and destination datasets at both the table and column level.

Concept Mapping

The table below shows how Airbyte entities and concepts map to their corresponding DataHub entities:

| Source Concept | DataHub Concept | Notes |
| --- | --- | --- |
| Workspace | DataFlow | Top-level container for Airbyte resources |
| Connection | DataFlow | Represents an Airbyte connection between source and destination |
| Source | Dataset | Source datasets are mapped to DataHub datasets |
| Destination | Dataset | Destination datasets are mapped to DataHub datasets |
| Stream | DataJob | Each stream is represented as a DataJob within the Connection DataFlow |
| Connection Job | DataProcessInstance | Execution information for a connection run |
| Source Schema | SchemaMetadata | Schema information from source datasets |
| Column Mapping | FineGrainedLineage | Column-level lineage between source and destination |
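
As a rough illustration of the mapping above, the following Python sketch encodes the table as a lookup and shows how a stream's DataJob could nest under its connection's DataFlow. The URN layouts follow DataHub's general `dataFlow` and `dataJob` URN formats. The helper names, the `airbyte` orchestrator value, and the example IDs are assumptions made for illustration, not the identifiers this source necessarily emits.

```python
from typing import Dict, Optional

# Airbyte concept -> DataHub concept, copied from the concept mapping table.
CONCEPT_MAPPING: Dict[str, str] = {
    "Workspace": "DataFlow",
    "Connection": "DataFlow",
    "Source": "Dataset",
    "Destination": "Dataset",
    "Stream": "DataJob",
    "Connection Job": "DataProcessInstance",
    "Source Schema": "SchemaMetadata",
    "Column Mapping": "FineGrainedLineage",
}


def datahub_concept(airbyte_concept: str) -> Optional[str]:
    """Return the DataHub concept for an Airbyte concept, or None if unmapped."""
    return CONCEPT_MAPPING.get(airbyte_concept)


def data_flow_urn(flow_id: str, cluster: str = "PROD") -> str:
    """Build a DataFlow URN. The 'airbyte' orchestrator name is an assumption."""
    return f"urn:li:dataFlow:(airbyte,{flow_id},{cluster})"


def data_job_urn(flow_urn: str, job_id: str) -> str:
    """Build a DataJob URN nested under its parent DataFlow URN."""
    return f"urn:li:dataJob:({flow_urn},{job_id})"


# Hypothetical IDs: a connection acting as the DataFlow, a stream as its DataJob.
flow = data_flow_urn("my-connection-id")
job = data_job_urn(flow, "users")
```

Here a stream named `users` becomes a DataJob whose URN embeds the URN of the connection's DataFlow, which mirrors the "Stream within the Connection DataFlow" row of the table.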