ETL — High Quality Data Pipelines
As you are probably aware, "data pipeline" is a rather broad term. At its core, it is a collection of tasks that move, transform, or serve data. A simple example is loading a .txt file into a table or another file; at the other extreme, it can be as complex as real-time aggregations with machine learning scoring on top.
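To make the simple end of that spectrum concrete, here is a minimal sketch of the ".txt file into a table" case. The file name, delimiter, table, and columns are all hypothetical placeholders, not part of any specific project:

```python
import csv
import sqlite3

def load_txt_to_table(path: str = "events.txt", db: str = "warehouse.db") -> None:
    """Load a delimited .txt file into a SQLite table (illustrative only)."""
    conn = sqlite3.connect(db)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (event_id TEXT, event_ts TEXT, payload TEXT)"
    )
    with open(path, newline="") as f:
        rows = list(csv.reader(f, delimiter="|"))  # assumes pipe-delimited lines with 3 fields
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load_txt_to_table()
```

Even this trivial pipeline already raises the questions the rest of this article is about: what happens on reload, how bad rows are handled, and how you know the load actually worked.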
The data tools ecosystem has grown significantly over the 16 years I have worked with it. Yet I still regularly find myself in situations where the cost of data infrastructure and tooling is prohibitively high:
- Projects are complicated and time-consuming.
- Even resolving fundamental data quality issues is difficult.
- Maintaining confidence in data assets is extremely hard.
- In many organizations, decision makers' data literacy is inadequate.
- Stakeholders expect miracles, or simply expect things to work.
- Very few people can describe the characteristics, calculations, metrics, insights, and their implications.
- Everyone makes empty promises and postpones the inevitable reality of confronting and resolving hard problems.
When building a data pipeline that brings data from heterogeneous sources into your landing areas, some very common concerns come to mind, such as restartability…
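As one way to picture restartability, here is a hedged sketch of a checkpoint-based approach: the pipeline records which batches have completed, so a rerun after a failure skips work that already succeeded. The checkpoint file, batch identifiers, and `process` step are hypothetical, not a reference to any particular tool:

```python
import json
import os

CHECKPOINT = "pipeline_checkpoint.json"  # hypothetical state file

def load_checkpoint() -> set:
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return set(json.load(f))
    return set()

def save_checkpoint(done: set) -> None:
    with open(CHECKPOINT, "w") as f:
        json.dump(sorted(done), f)

def process(batch: str) -> None:
    print(f"processing {batch}")  # placeholder for the actual extract/transform/load step

def run_pipeline(batches: list) -> None:
    done = load_checkpoint()
    for batch in batches:
        if batch in done:
            continue  # already processed on a previous run; skipped on restart
        process(batch)
        done.add(batch)
        save_checkpoint(done)  # persist progress so a failure can resume from here

if __name__ == "__main__":
    run_pipeline(["2024-01-01", "2024-01-02", "2024-01-03"])
```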