How Does Iceberg Differ from BigQuery, Redshift, Postgres, and SQL Server?

Ryan Arjun
3 min read · Nov 29, 2024

In modern big data environments, iceberg modeling is a conceptual and technical approach to managing, querying, and processing large datasets efficiently. Like an iceberg, only a subset of the data is actively visible or queried at any time, while the rest stays hidden or abstracted away for performance and manageability. The approach is particularly useful for massive, evolving datasets, where it supports scalability, efficient querying, and streamlined data management.

Key Aspects of Iceberg Modeling in Big Data:

✈️ Table Abstraction (Apache Iceberg Framework)

Apache Iceberg is an open-source table format for large-scale datasets designed to work in distributed data lakes.

Visible Tip: Provides a logical abstraction of tables that users interact with through SQL or other query languages.

Hidden Part: The underlying physical data storage, partitioning, versioning, and metadata management, which are abstracted away from users but handled by the Iceberg framework.
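To make the tip-versus-hidden-part split concrete, here is a minimal toy sketch in plain Python. It is not Iceberg's API: `ToyTable`, `Snapshot`, and the file names are hypothetical, and a real table format tracks far more metadata. It only illustrates the idea that users scan one logical table while files and versions are managed underneath.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """An immutable list of the data files that make up one table version."""
    snapshot_id: int
    data_files: tuple  # file names only; a real table format tracks much more


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table: users see rows, not files."""
    files: dict = field(default_factory=dict)      # file name -> list of row dicts
    snapshots: list = field(default_factory=list)  # version history, oldest first

    def append(self, file_name, rows):
        # Write a new file, then publish a new snapshot that references it.
        self.files[file_name] = list(rows)
        previous = self.snapshots[-1].data_files if self.snapshots else ()
        self.snapshots.append(
            Snapshot(len(self.snapshots) + 1, previous + (file_name,))
        )

    def scan(self, snapshot_id=None):
        # The "visible tip": one flat list of rows, whatever the file layout.
        if not self.snapshots:
            return []
        if snapshot_id is None:
            snap = self.snapshots[-1]
        else:
            snap = self.snapshots[snapshot_id - 1]
        return [row for name in snap.data_files for row in self.files[name]]


table = ToyTable()
table.append("part-0001", [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}])
table.append("part-0002", [{"id": 3, "city": "Mumbai"}])

latest_rows = table.scan()                      # rows from both files
first_version_rows = table.scan(snapshot_id=1)  # rows as of the first version
```

Because each snapshot is an immutable list of files, reading an older version is just a matter of choosing a different snapshot, which is the kind of versioning the hidden part takes care of.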

Features include:

— Schema evolution without rewriting data.

— Partitioning optimization without requiring a specific query pattern.

— Support for ACID transactions in a distributed environment.
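The first feature, schema evolution without rewriting data, can be sketched with a toy model. Iceberg tracks columns by stable field IDs rather than by name, so a rename or an added column is a metadata change. The code below is a simplified illustration in plain Python, not Iceberg's implementation; the schema dict, `read_table`, and the sample records are all assumptions made for the example.

```python
# Toy model: data files store values keyed by a stable field ID, and the
# schema maps those IDs to the column names users see.
schema = {1: "id", 2: "name"}  # field ID -> current column name

data_files = [
    # Written under the original schema.
    [{1: 101, 2: "alice"}, {1: 102, 2: "bob"}],
]


def read_table(schema, data_files):
    """Project every stored record through the current schema."""
    rows = []
    for records in data_files:
        for record in records:
            # Fields absent from an older file read as None.
            rows.append({name: record.get(fid) for fid, name in schema.items()})
    return rows


before = read_table(schema, data_files)

# Schema evolution as metadata-only changes: rename one column, add another.
schema[2] = "full_name"
schema[3] = "country"

# New writes use the new schema; the old file is left untouched.
data_files.append([{1: 103, 2: "chen", 3: "SG"}])

after = read_table(schema, data_files)
```

After the change, rows from the old file appear with the renamed column and a `None` for the new one, even though the stored records were never modified.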

✈️ Query Optimization

Visible Tip: Users execute queries on what appears to be a unified, flat table.
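One mechanism that can sit behind such a flat view is file skipping: Iceberg keeps per-file column bounds in its metadata, so a planner can avoid opening files that cannot match a filter. The sketch below is a toy version in plain Python under that assumption. `DataFile`, `make_file`, and `query_by_id_range` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical data file plus the min/max stats kept in table metadata."""
    name: str
    rows: list  # list of (order_id, amount) tuples
    min_id: int
    max_id: int


def make_file(name, rows):
    # Compute the column bounds once, at write time.
    ids = [order_id for order_id, _ in rows]
    return DataFile(name, rows, min(ids), max(ids))


def query_by_id_range(files, low, high):
    """Return matching rows and the names of the files actually opened."""
    opened, result = [], []
    for f in files:
        # Skip files whose ID range cannot overlap the predicate.
        if f.max_id < low or f.min_id > high:
            continue
        opened.append(f.name)
        result.extend(r for r in f.rows if low <= r[0] <= high)
    return result, opened


files = [
    make_file("f1", [(1, 10.0), (2, 12.5)]),
    make_file("f2", [(3, 7.0), (4, 99.0)]),
    make_file("f3", [(5, 1.0), (6, 3.5)]),
]

rows, opened = query_by_id_range(files, 3, 4)
```

The caller asks for a range of IDs as if there were one table; only the file whose bounds overlap that range is read.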

