Data Lineage

Data lineage describes the lifecycle of data: where it originates, how it moves, what characteristics it has, and how its quality changes along the way. In data engineering, it tracks the journey of data through every step of a pipeline, from raw input to model output. Lineage supports transparency and compliance, and it makes data-related bugs easier to trace back to their source. By giving a clear picture of data sources, transformations, and dependencies, it also aids audits, governance, and the reproduction of machine learning models.
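To make the idea of sources, transformations, and dependencies concrete, here is a minimal sketch of a lineage record kept as a directed graph. It is an illustration only, not how any particular lineage tool works: the `LineageGraph` class, its methods, and the dataset and transformation names are all hypothetical.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage store (hypothetical): remembers which inputs and which
    transformation produced each dataset."""

    def __init__(self):
        self.parents = defaultdict(set)  # dataset -> direct upstream datasets
        self.produced_by = {}            # dataset -> transformation name

    def record(self, output, inputs, transformation):
        """Register that `transformation` turned `inputs` into `output`."""
        self.parents[output].update(inputs)
        self.produced_by[output] = transformation

    def upstream(self, dataset):
        """Every dataset that `dataset` depends on, directly or indirectly."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def downstream(self, dataset):
        """Every recorded dataset that would be affected by a change to `dataset`."""
        return {d for d in list(self.parents) if dataset in self.upstream(d)}


# Example pipeline with made-up names: raw input -> cleaned data -> features -> model.
lineage = LineageGraph()
lineage.record("clean_orders", ["raw_orders"], "drop_duplicates")
lineage.record("order_features", ["clean_orders", "raw_customers"], "join_and_aggregate")
lineage.record("churn_model", ["order_features"], "train_model")

print(sorted(lineage.upstream("churn_model")))
# ['clean_orders', 'order_features', 'raw_customers', 'raw_orders']
print(sorted(lineage.downstream("raw_orders")))
# ['churn_model', 'clean_orders', 'order_features']
```

Walking upstream answers audit and reproducibility questions ("which sources fed this model?"), while walking downstream supports debugging ("what is affected if `raw_orders` is wrong?").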

Visit the following resources to learn more: