We know that with the right technology, we can do much better than just keeping up: we can ensure flexible development, make it easier to protect our data, and access, process, and analyze data whenever it is required. With the right tools and best practices, an organization can use all of its data, making it accessible to more users and fueling better business decisions.


New technology innovations can improve modern cloud-based data lakes, data warehousing, and analytics with regard to availability, simplicity, cost, and performance, meeting current and future needs by enabling…


If you are a data engineer or Python developer and you have to read data from an Excel file with multiple sheets or tabs and save it to another data source, such as a SQL Server database or SQLite, this is easily possible in Python.

Read Multiple Excel Sheets or Tabs

There is no row limit in CSV or text file formats, but an Excel file allows only 1,048,576 rows per sheet or tab.
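Because CSV has no such ceiling, pandas can stream a CSV of any length in fixed-size chunks rather than loading it whole. A minimal sketch (the in-memory CSV here stands in for a large file on disk):

```python
import io
import pandas as pd

# A small in-memory CSV stands in for a large file on disk.
csv_data = io.StringIO("id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10)))

# chunksize=4 yields DataFrames of at most 4 rows each,
# so memory use stays bounded no matter how long the file is.
total_rows = 0
for chunk in pd.read_csv(csv_data, chunksize=4):
    total_rows += len(chunk)

print(total_rows)
```

With ten data rows and a chunk size of four, the loop sees chunks of 4, 4, and 2 rows.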

In the Python code below, we use SQLite to store the data from an Excel file with multiple sheets or tabs. As…
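One way to sketch that approach with pandas and the standard-library sqlite3 module (the function name and file paths here are illustrative, not from the original article):

```python
import sqlite3
import pandas as pd

def excel_to_sqlite(excel_path, db_path):
    # sheet_name=None returns a {sheet_name: DataFrame} dict for all tabs
    sheets = pd.read_excel(excel_path, sheet_name=None)
    with sqlite3.connect(db_path) as conn:
        for name, df in sheets.items():
            # One SQLite table per sheet, replacing any previous load
            df.to_sql(name, conn, if_exists="replace", index=False)
    return list(sheets)
```

Calling `excel_to_sqlite("report.xlsx", "report.db")` would create one SQLite table per tab and return the list of sheet names.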


A data lake can store all data, regardless of source, regardless of structure, and usually regardless of size as well. An ideal data lake also embraces nontraditional data types coming from nontraditional data sources, such as web server logs, sensor data, social network activity, text, and images. New use cases for these data types continue to be identified.

Data Reliability Challenges in Building Data Lakes

Because all data in a data lake is stored in its raw form, access can be granted quickly to anyone who needs to analyze the data. For data science, data lakes provide a convenient…


Nowadays businesses are going through unprecedented change and disruption, and data is their most important business asset. It must be audited and protected because it plays a key role in helping organisations make better decisions and draw meaningful insights from their business. While data teams are generally growing, business demand for information and insights continues to surge much faster. Data engineering is the key to unlocking all other data-driven workflows.

Data Engineering Drivers in Building Reliable Data Lakes

For your business to grow, you should always focus on increasing your customers and always follow a sound strategic choice…


Python API for Apache Spark

PySpark is the Python API for Apache Spark, whereas Apache Spark itself is an analytical processing engine for large-scale, powerful distributed data processing and machine learning applications.


The Power And Performance Of Azure SQL

We know that Azure SQL Database sits under the “Intelligent Cloud” business and is part of the Azure SQL family: an intelligent, scalable, relational database service built for the cloud.

With Azure SQL, we get the following options:

  1. Rehost: Move on-premises workloads to the cloud
  2. Refactor: Modernise applications for the cloud
  3. Rearchitect: Build new cloud-native SQL database applications
  4. Rebuild: Take advantage of the innovative capabilities of the cloud

Perfect for intermittent usage: Azure SQL Database serverless is best for scenarios where usage is intermittent and unpredictable, and we pay only for the compute resources…



If you work as a Python developer, data analyst, or data scientist for any organisation, it is very important to know how to work with DataFrames. We can add a column to a DataFrame and set its values to those returned from a function, or derive them from another column’s values, as given below -

# pandas library for data manipulation in Python
import pandas as pd

# Create a DataFrame with numeric values
df = pd.DataFrame({'Num': [5, 10, 15, 17, 22, 25, 28, 32, 36, 40, 50]})

# Display the DataFrame
df
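Continuing the idea above, a new column can be filled from a function applied to an existing column, or computed directly from another column’s values (the column names `Square` and `Double` are just examples):

```python
import pandas as pd

df = pd.DataFrame({'Num': [5, 10, 15, 17, 22, 25, 28, 32, 36, 40, 50]})

# Fill a new column from a function applied to an existing column
def square(n):
    return n * n

df['Square'] = df['Num'].apply(square)

# Or compute a new column directly from another column's values
df['Double'] = df['Num'] * 2
```

After this, the first row holds `Num=5, Square=25, Double=10`.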



A data lake is a conceptual data architecture that is not based on any specific technology. The technical implementation can therefore vary from technology to technology, which means different types of storage can be utilized, which in turn translates into varying features.


We always try to keep our database normalized and maintain table relationships for each record as far as possible. To maintain normalization, we spread our records across two or more tables and relate them, mostly through primary- and foreign-key relationships.


Example: In an organization, based on performance, some employees got an appraisal and some did not. Now the system needs to update the salary in the employee master only for those employees who got the appraisal.

Note: We can update only one table at a time by using…
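The appraisal scenario can be sketched in SQLite via Python’s sqlite3 module: a single UPDATE touches only the employee master, while a subquery pulls the hike from the related appraisal rows. Table and column names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, salary REAL);
    CREATE TABLE appraisal (emp_id INTEGER REFERENCES employee(emp_id),
                            hike_pct REAL);
    INSERT INTO employee VALUES (1, 1000), (2, 2000), (3, 3000);
    INSERT INTO appraisal VALUES (1, 10), (3, 20);
""")

# Only one table (employee) is updated; the correlated subquery
# reads the hike percentage from the appraisal table.
conn.execute("""
    UPDATE employee
    SET salary = salary * (1 + (SELECT hike_pct / 100.0
                                FROM appraisal
                                WHERE appraisal.emp_id = employee.emp_id))
    WHERE emp_id IN (SELECT emp_id FROM appraisal)
""")

salaries = dict(conn.execute("SELECT emp_id, salary FROM employee"))
```

Employee 2, who has no appraisal row, keeps the original salary; employees 1 and 3 get their respective hikes.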


If you work as a SQL Server developer, you will be responsible for the implementation, configuration, maintenance, and performance of critical SQL Server RDBMS systems, and most of the time you also have to follow an agile methodology.

One of the toughest jobs is adding columns to an existing table inside a SQL Server database with default values, without any failures. Keep the following points in mind when adding a column to an existing table-

  1. If the column is added nullable, then null will be the value used…
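The nullable-versus-default distinction can be illustrated with SQLite through Python’s sqlite3 module (the article targets SQL Server, but the behaviour sketched here is the same idea; table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO t VALUES (1)")

# Nullable column with no default: existing rows read back NULL
conn.execute("ALTER TABLE t ADD COLUMN note TEXT")

# Column added with a default: existing rows read back the default
conn.execute("ALTER TABLE t ADD COLUMN status TEXT DEFAULT 'active'")

row = conn.execute("SELECT note, status FROM t WHERE id = 1").fetchone()
```

For the pre-existing row, `note` comes back as `None` while `status` comes back as `'active'`.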

Mukesh Singh

BI Specialist || Azure || AWS || GCP — SQL|Python|R|PySpark — Talend, Alteryx, SSIS — PowerBI, SSRS expert at The Smart Cube
