Python — How to read billions of records (big data file)

Ryan Arjun
3 min read · Aug 19, 2020

Data is certainly the new diamond in the rough for every organization, and digital technologies have accelerated the establishment of new business models and sources of revenue.

If you are a data engineer or Python developer who needs to read a raw data file with billions of records and load it into another data store, such as a SQL Server database or SQLite, Python can help.
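As a rough illustration of that task, here is a minimal sketch that uses only the standard library to stream a CSV file into SQLite in fixed-size batches, so the whole file never has to fit in memory. The function name `load_csv_in_batches`, the table name `records`, and the batch size are all hypothetical choices for this example, and it assumes the first row of the file holds the column names.

```python
import csv
import itertools
import sqlite3


def load_csv_in_batches(csv_path, db_path, table="records", batch_size=50_000):
    """Stream a CSV file into a SQLite table, one batch of rows at a time.

    Hypothetical helper: assumes the first CSV row is a header and that
    `table` and the header names are trusted identifiers.
    """
    conn = sqlite3.connect(db_path)
    total = 0
    try:
        with open(csv_path, newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader)  # first row holds the column names
            columns = ", ".join(f'"{name}"' for name in header)
            placeholders = ", ".join("?" for _ in header)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
            insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'
            while True:
                # Pull at most batch_size rows from the file at a time.
                batch = list(itertools.islice(reader, batch_size))
                if not batch:
                    break
                conn.executemany(insert_sql, batch)
                conn.commit()  # one transaction per batch
                total += len(batch)
    finally:
        conn.close()
    return total
```

Committing once per batch rather than once per row keeps the number of transactions small, which is usually what matters most for bulk-insert speed in SQLite.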

Introduction to the Python Libraries

Python is one of the most popular programming languages. It seems to have a tool for everything, whether it's data manipulation with Pandas, visualization with Seaborn, or deep learning with TensorFlow.

Pandas is a Python library that provides fast, flexible, and expressive data structures that make it simple and intuitive to work with structured (tabular, multidimensional, possibly heterogeneous) and time series data. It aims to be the fundamental high-level building block for practical, real-world data analysis in Python.
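For large files, one commonly used Pandas feature is the `chunksize` parameter of `pd.read_csv`, which yields the file as a sequence of smaller DataFrames instead of one huge one. The sketch below shows the idea on a tiny in-memory sample; the column names `id` and `amount` and the chunk size are made up for illustration, and a real file path would take the place of the `StringIO` object.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for a large CSV file (hypothetical columns).
sample = io.StringIO("id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n")

row_count = 0
amount_total = 0

# With chunksize set, read_csv returns an iterator of DataFrames,
# each holding at most `chunksize` rows.
for chunk in pd.read_csv(sample, chunksize=2):
    row_count += len(chunk)
    amount_total += chunk["amount"].sum()

print(row_count, amount_total)
```

Only one chunk is held in memory at a time, so peak memory depends on the chunk size rather than on the size of the file.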

SQLAlchemy is a powerful tool for developers working with databases in Python, providing a seamless and intuitive interface to work with relational databases efficiently and effectively.
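To make this concrete, here is a hedged sketch of how a SQLAlchemy engine might be combined with Pandas to append a file to a database table chunk by chunk. The helper `copy_csv_to_table`, the table name `sales`, and the sample columns are hypothetical, and the in-memory SQLite URL is only a stand-in: a SQL Server target would need its own connection URL and driver.

```python
import io

import pandas as pd
from sqlalchemy import create_engine, text


def copy_csv_to_table(source, engine, table_name, chunksize=100_000):
    """Append a CSV source to a database table one chunk at a time.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    rows = 0
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # if_exists="append" creates the table on the first chunk
        # and adds rows to it on every later chunk.
        chunk.to_sql(table_name, engine, if_exists="append", index=False)
        rows += len(chunk)
    return rows


# In-memory SQLite database used purely as a demo target.
engine = create_engine("sqlite:///:memory:")
sample = io.StringIO("id,amount\n1,10\n2,20\n3,30\n")

written = copy_csv_to_table(sample, engine, "sales", chunksize=2)

with engine.connect() as conn:
    stored = conn.execute(text("SELECT COUNT(*) FROM sales")).scalar()

print(written, stored)
```

Passing the engine to `to_sql` lets Pandas generate the `CREATE TABLE` and `INSERT` statements, so the same loop can target a different database by changing only the connection URL.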

SQLite3 is a relational database management system…


Written by Ryan Arjun

BI Specialist || Azure || AWS || GCP — SQL|Python|PySpark — Talend, Alteryx, SSIS — PowerBI, Tableau, SSRS
