Ryan Arjun
1 min read · May 24, 2023


ETL is simply: get data, do something to it, and put the data somewhere.

Extraction - get data from somewhere (the location is called the Source). The Source can be files, database tables, a pipe, etc.

Transformation - do something to the source data. E.g., if your source contains a person and a birthdate, you can "transform" the data to determine the person's age.
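That birthdate-to-age transform can be sketched in a few lines of Python (the function name and column choice are illustrative, not from any particular tool):

```python
from datetime import date

def age_from_birthdate(birthdate: date, today: date) -> int:
    """Transform step: derive a person's age from their birthdate."""
    years = today.year - birthdate.year
    # Subtract one if the birthday hasn't occurred yet this year.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years
```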

Load - take the transformed data and put it somewhere, such as a different database (or the same one). This destination is called the Target. Almost any type of data store that can serve as a Source can also serve as a Target.

Some examples:

Extract data from a data warehouse in DB2. Transform the data using a filter to only send along data that matches a certain date range. Load the filtered data into a SQL Server data mart.
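The extract and load steps in that example need vendor drivers (e.g. an IBM DB2 client and a SQL Server driver such as pyodbc), so here is just the transform in plain Python: a date-range filter over extracted rows. The `order_date` field name is a hypothetical example:

```python
from datetime import date

def filter_date_range(rows, start, end, key="order_date"):
    """Transform step: keep only rows whose date falls inside [start, end]."""
    return [r for r in rows if start <= r[key] <= end]
```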

Extract data from XML files. Transform the data so that the data is "flattened" into rows. Load the data into a comma delimited file.
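That XML-to-CSV flattening can be done entirely with Python's standard library; the record tag and field names below are assumptions you would replace with your own schema:

```python
import csv
import io
import xml.etree.ElementTree as ET

def flatten_xml_to_csv(xml_text, record_tag, fieldnames):
    """Flatten each repeated XML element into one comma-delimited row."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for rec in root.iter(record_tag):
        # Missing child elements become empty cells.
        writer.writerow({f: rec.findtext(f, default="") for f in fieldnames})
    return out.getvalue()
```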

Extract data that is coming in on a pipe or a queue, such as IBM MQ. Transform the data by parsing out key fields in the queued message. Load the parsed data into a mainframe.
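The transform in that queue example is the parsing of key fields out of the message payload. A real IBM MQ consumer would receive the message through a client library such as pymqi; the pipe-delimited layout and field names below are hypothetical:

```python
def parse_message(payload: str) -> dict:
    """Transform step: parse key fields out of a delimited queue message."""
    account, amount, currency = payload.split("|")
    return {"account": account, "amount": float(amount), "currency": currency}
```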

Some tools:

Big bucks, Enterprise class: Ab Initio, Informatica, IBM's Datastage

Less pricey: Jitterbit, Talend

Somewhere in the middle: SSIS for SQL Server

Who uses them? Primarily, companies that need to build and maintain complex data warehouses invest in an ETL tool. ETL tools are often visual design tools that let you build an ETL program graphically rather than purely through code.


Written by Ryan Arjun

BI Specialist || Azure || AWS || GCP — SQL|Python|PySpark — Talend, Alteryx, SSIS — PowerBI, Tableau, SSRS
