
ETL — High Quality Data Pipelines

Ryan Arjun
4 min read · Mar 21, 2023


As you are probably aware, "data pipeline" is a rather broad term. It is a collection of tasks that move, transform, or deliver data. A simple example is loading a .txt file into a table or file; it can also be as complex as real-time data aggregation with machine learning scoring and more.
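As a rough sketch of the simple case above, here is what loading a delimited .txt file into a table might look like in Python. The file name, delimiter, and table schema are assumptions for illustration only, not something the article specifies.

```python
import csv
import sqlite3

def load_txt_to_table(txt_path: str, db_path: str = "landing.db") -> int:
    """Load a pipe-delimited .txt file into a SQLite table (illustrative only)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id TEXT, name TEXT, city TEXT)"
    )
    with open(txt_path, newline="") as f:
        reader = csv.reader(f, delimiter="|")  # assumes pipe-delimited rows
        rows = [row for row in reader if len(row) == 3]  # skip malformed lines
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    conn.close()
    return count
```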

In my experience over the last 16 years, the data tools ecosystem has grown significantly. Yet I still find myself in situations where the cost of data infrastructure and tooling is prohibitively high, and the same problems keep recurring:

  • Projects are complicated and time-consuming.
  • Even resolving fundamental data quality concerns is quite difficult.
  • It is extremely difficult to maintain confidence in data assets.
  • In many cases, decision makers’ data literacy is inadequate.
  • Stakeholders expect miracles and assume things will simply work.
  • Very few people can describe the data’s characteristics, calculations, measurements, insights, and implications.
  • Everyone makes empty promises and postpones the inevitable work of confronting and resolving hard problems.

When building a data pipeline to bring data from heterogeneous sources into your data landing areas, some very common concerns come to mind, such as restart-ability…
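To illustrate the restart-ability concern, here is a minimal checkpoint-based sketch in Python. The checkpoint file name, batch IDs, and the load_batch helper are assumptions for illustration; a real pipeline would typically persist its watermark in the target database or in an orchestration tool rather than a local JSON file.

```python
import json
import os

CHECKPOINT_FILE = "pipeline_checkpoint.json"  # hypothetical checkpoint store

def load_batch(batch_id: str) -> None:
    # Placeholder for the real extract-and-load work for one source batch.
    print(f"loading batch {batch_id}")

def read_checkpoint() -> str | None:
    # On restart, resume after the last batch that was committed successfully.
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f).get("last_loaded_batch")
    return None

def write_checkpoint(batch_id: str) -> None:
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"last_loaded_batch": batch_id}, f)

def run_pipeline(batches: list[str]) -> None:
    done = read_checkpoint()
    start = batches.index(done) + 1 if done in batches else 0
    for batch_id in batches[start:]:
        load_batch(batch_id)        # do the work for this batch
        write_checkpoint(batch_id)  # record progress only after success

run_pipeline(["2023-03-19", "2023-03-20", "2023-03-21"])
```

If the process fails partway through, rerunning it skips batches that were already committed and resumes from the first unfinished one, which is the core of restart-ability.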


Written by Ryan Arjun

BI Specialist || Azure || AWS || GCP — SQL|Python|PySpark — Talend, Alteryx, SSIS — PowerBI, Tableau, SSRS
