Data Warehouses don’t support Primary Key constraint

Ryan Arjun
4 min readNov 16, 2023

Data is the business asset for every organisation which is audited and protected. To gain in their business, it becomes very urgent for every organization to make sure that good quality data is available to everyone who needs it. As data grows in size, it becomes increasingly important to understand the many types and storage methods available.

🔑 A Primary Key constraints is typically applied in OLTP systems. On the OLAP side, I notice that Snowflake, Databricks, and GCP BigQuery do not have a mandated PK.

🏆 On the basis of business decision needs, an ideal data warehouses must be fast, easy to use, available for various users and must have to provide a lot of database-like functionality to get the wholistic views of the business trends.🌟

The data warehouse is the foundation for storing analytical data. It has a long history extending back to the 1990s and is based on organised data storage through the use of relational databases. The data warehouse, which acts as a centralised repository, safely stores huge volumes of historical data, allowing organisations to acquire important insights through sophisticated analytics.

📢 Data Warehouses don’t support PK constraint because it would kill ingestion performance and even columnar format is not fast enough for checking primary key, you’d need an in-memory index like OLTP systems: which can be prohibitively expensive.

--

--

Ryan Arjun
Ryan Arjun

Written by Ryan Arjun

BI Specialist || Azure || AWS || GCP — SQL|Python|PySpark — Talend, Alteryx, SSIS — PowerBI, Tableau, SSRS