Augmented Data Lineage for Data Scientists and Beyond

Nazar Labunets
Ataccama
May 28, 2020

Data lineage is a highly sought-after capability for modern data management and data governance teams. By now, it has become a critical feature of data catalogs and metadata management solutions, offering a wide range of benefits and applications. These include regulatory compliance, impact analysis, and a faster understanding of the enterprise data landscape.

Typically, data lineage is associated with technical roles, such as ETL developers and data engineers. However, when data lineage is enriched with business metadata, it can become a particularly useful and practical capability for business users and analytical roles, such as data scientists.

In this post, we’ll introduce the concept of augmented data lineage as a tool for business users. We will explore how business and analytical roles within enterprises can use it to find data and perform root cause analyses faster while avoiding corporate red tape.

What is Augmented Data Lineage?

Augmented data lineage is “regular” data lineage enriched with information from a data catalog: metadata such as real-time data quality, business terms & categories, and anomalies detected in data loads.
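
To make the definition concrete, here is a minimal Python sketch of the idea. It assumes that lineage is stored as a simple mapping from each data set to its upstream sources and that the catalog exposes per-data-set metadata. The data set names, the `CatalogEntry` fields, and the `augment` helper are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogEntry:
    """Hypothetical catalog metadata for one data set."""
    business_terms: list = field(default_factory=list)
    quality_score: Optional[float] = None  # assumed scale: 0.0 (bad) to 1.0 (good)
    anomaly_detected: bool = False          # anomaly flagged in the latest data load


# "Regular" lineage: each data set maps to the data sets it is loaded from.
lineage = {
    "sales_report": ["sales_mart"],
    "sales_mart": ["crm_customers", "erp_orders"],
    "crm_customers": [],
    "erp_orders": [],
}

# Metadata held in the data catalog (illustrative values only).
catalog = {
    "sales_mart": CatalogEntry(["Revenue"], 0.97),
    "crm_customers": CatalogEntry(["Customer", "PII"], 0.99),
    "erp_orders": CatalogEntry(["Order"], 0.62, anomaly_detected=True),
}


def augment(lineage, catalog):
    """Attach catalog metadata to every node of the lineage graph.

    Data sets missing from the catalog get an empty CatalogEntry.
    """
    return {
        name: {"upstream": list(upstream), "meta": catalog.get(name, CatalogEntry())}
        for name, upstream in lineage.items()
    }


augmented = augment(lineage, catalog)
```

In this sketch, `augmented["erp_orders"]["meta"]` carries the low quality score and the anomaly flag right next to the lineage edges, which is the information a visual lineage view would overlay on the graph.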

Enhanced with this information, data lineage can speed up the process of locating the right data or support analytical activities, such as root cause analysis or data quality analysis. The visual presentation of augmented data lineage alone makes a big difference in a user’s ability to draw conclusions, as opposed to just viewing a list of data sets on the catalog’s search results page.

Data lineage enhanced with business terms

This enriched data lineage can help answer many questions that are typically addressed with a data catalog search query or by consulting standard data lineage:

  • Is this the best data I can use for my data science project or analytic assignment?
  • Has this report been generated from valid and timely data?
  • Why does a metric in a report contain an unexpectedly large or small value?
  • Which data sets contain PII data, and in which systems do they originate?

Let’s examine how these questions can be answered by using augmented data lineage.
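
As a rough illustration of the last two questions, the sketch below walks an augmented lineage graph upstream. It is only one possible approach under stated assumptions: the graph layout, the data set names, the `PII` term, the quality scores, and the 0.8 threshold are all made up for the example.

```python
# Hypothetical augmented lineage: upstream links plus catalog metadata per data set.
augmented = {
    "churn_report": {"upstream": ["customer_mart"], "system": "BI",
                     "terms": [], "quality": 0.71},
    "customer_mart": {"upstream": ["crm_contacts", "web_events"], "system": "DWH",
                      "terms": ["PII"], "quality": 0.70},
    "crm_contacts": {"upstream": [], "system": "CRM",
                     "terms": ["PII"], "quality": 0.99},
    "web_events": {"upstream": [], "system": "Web analytics",
                   "terms": [], "quality": 0.55},
}


def upstream_of(graph, start):
    """Return every data set that feeds `start`, directly or indirectly."""
    seen, stack = [], list(graph[start]["upstream"])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(graph[node]["upstream"])
    return seen


def pii_origins(graph, start):
    """Map each PII-tagged source data set (no upstream) to its system."""
    return {
        name: graph[name]["system"]
        for name in upstream_of(graph, start)
        if "PII" in graph[name]["terms"] and not graph[name]["upstream"]
    }


def root_cause_candidates(graph, start, threshold=0.8):
    """Low-quality upstream data sets that have no low-quality input themselves."""
    low = {n for n in upstream_of(graph, start) if graph[n]["quality"] < threshold}
    return sorted(n for n in low if not any(u in low for u in graph[n]["upstream"]))


print(pii_origins(augmented, "churn_report"))            # {'crm_contacts': 'CRM'}
print(root_cause_candidates(augmented, "churn_report"))  # ['web_events']
```

Here the PII in the report traces back to the CRM system, while the suspicious metric points to `web_events`: `customer_mart` also scores low, but it inherits the problem from its input, so it is not reported as a root cause candidate.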

To learn about specific use cases of augmented data lineage for data scientists and data stewards, read the rest of this article at ataccama.com.
