Data Quality

Ensuring data quality involves validating the accuracy, completeness, consistency, and reliability of the data collected from each source. Whether the data comes from one source or several makes little difference: the only extra work with multiple sources is homogenizing them into a single final schema, which includes deduplication and normalization.
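As a rough illustration of record-level checks, the Python sketch below validates one record for completeness (required fields present), validity (a numeric, non-negative amount and parseable dates), and cross-field consistency (an order cannot ship before it is placed). The `orders` fields (`order_id`, `amount`, `order_date`, `ship_date`) and the rules applied to them are hypothetical; in practice such rules usually come from a data contract or a dedicated validation tool.

```python
from datetime import date

# Hypothetical rules for an "orders" feed; the field names are illustrative only.
REQUIRED_FIELDS = ("order_id", "amount", "order_date")


def parse_date(value):
    """Parse an ISO-8601 date string, returning None if it is missing or malformed."""
    try:
        return date.fromisoformat(value)
    except (TypeError, ValueError):
        return None


def validate_record(record: dict) -> list[str]:
    """Return the data quality problems found in a single record."""
    problems = []

    # Completeness: required fields must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")

    # Accuracy/validity: amount must be a non-negative number.
    amount = record.get("amount")
    if amount not in (None, ""):
        try:
            if float(amount) < 0:
                problems.append("negative amount")
        except (TypeError, ValueError):
            problems.append("amount is not numeric")

    # Validity: a non-empty order_date must parse as a real calendar date.
    order_date = parse_date(record.get("order_date"))
    if record.get("order_date") not in (None, "") and order_date is None:
        problems.append("malformed order_date")

    # Consistency: an order cannot ship before it was placed.
    ship_date = parse_date(record.get("ship_date"))
    if order_date and ship_date and ship_date < order_date:
        problems.append("ship_date before order_date")

    return problems


print(validate_record({"order_id": "A1", "amount": "-5",
                       "order_date": "2024-03-10", "ship_date": "2024-03-01"}))
# ['negative amount', 'ship_date before order_date']
```

Returning a list of problems instead of raising on the first failure lets a pipeline quarantine bad records and report every issue at once.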

When combining multiple sources, this homogenization step typically includes verifying the credibility of each source, standardizing formats (such as date/time or currency), aligning schemas, and profiling the data to detect anomalies, duplicates, or mismatches before integrating it for analysis.
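A minimal Python sketch of these integration steps follows. It assumes two hypothetical sources, `crm` and `web`, whose column names, date formats, and currency units differ; it maps both onto one shared schema, standardizes dates to ISO-8601 and spend to dollars, profiles the combined rows for duplicate IDs and negative amounts, and then deduplicates them. The mappings, formats, and field names are illustrative assumptions, and checking source credibility is left out because it depends on context the code cannot see.

```python
from collections import Counter
from datetime import datetime
from decimal import Decimal

# Hypothetical column mappings: each source's field names -> one shared schema.
SCHEMA_MAP = {
    "crm": {"CustomerID": "customer_id", "SignupDate": "signup_date", "Spend": "spend_usd"},
    "web": {"cust_id": "customer_id", "created": "signup_date", "spend_cents": "spend_usd"},
}
# Hypothetical date formats used by each source.
DATE_FORMATS = {"crm": "%m/%d/%Y", "web": "%Y-%m-%d"}


def standardize(source: str, row: dict) -> dict:
    """Align one row to the shared schema and normalize its values."""
    mapping = SCHEMA_MAP[source]
    out = {mapping[key]: value for key, value in row.items() if key in mapping}
    # Normalize IDs to trimmed strings so "42 " and 42 compare equal.
    out["customer_id"] = str(out["customer_id"]).strip()
    # Parse the source-specific date format and emit ISO-8601 (YYYY-MM-DD).
    out["signup_date"] = datetime.strptime(out["signup_date"], DATE_FORMATS[source]).date().isoformat()
    # Store spend as Decimal dollars; the web source reports cents.
    spend = Decimal(str(out["spend_usd"]))
    out["spend_usd"] = spend / 100 if source == "web" else spend
    return out


def profile(records):
    """Summarize duplicate keys and out-of-range values before integration."""
    counts = Counter(r["customer_id"] for r in records)
    return {
        "duplicate_ids": sorted(k for k, n in counts.items() if n > 1),
        "negative_spend": [r["customer_id"] for r in records if r["spend_usd"] < 0],
    }


def deduplicate(records):
    """Keep the first record seen for each customer_id."""
    seen, unique = set(), []
    for r in records:
        if r["customer_id"] not in seen:
            seen.add(r["customer_id"])
            unique.append(r)
    return unique


crm_rows = [{"CustomerID": "42 ", "SignupDate": "03/05/2024", "Spend": "19.99"}]
web_rows = [
    {"cust_id": 42, "created": "2024-03-05", "spend_cents": 1999},
    {"cust_id": 7, "created": "2024-01-15", "spend_cents": -500},
]
rows = [standardize("crm", r) for r in crm_rows] + [standardize("web", r) for r in web_rows]
print(profile(rows))
# {'duplicate_ids': ['42'], 'negative_spend': ['7']}
print(len(deduplicate(rows)))
# 2
```

Only after standardization do the two records for customer 42 become identical, which is why alignment and normalization come before deduplication. Keeping the first record is the simplest deduplication policy; a pipeline might instead prefer the most recent record or the most trusted source.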