Database Normalization

Ayush Kumar
4 min readMay 24, 2021

let's see what is database normalization? and why do we need it?

What is Database Normalization?

It is the process of structuring a database into normal forms to reduce data redundancy and data integrity.

The upper definition is full of strange words, let me simplify that —

So basically, normalization is the process where we divide the complex big database tables into small tables. does breaking of any big or complex table into any smaller tables will be the database normalization? the answer is No, the breaking of complex tables into just smaller tables won't help. the smaller tables must be according to normal forms i.e a few conditions or rules which we will see later in this blog. and the natural question arises why we are breaking into smaller tables? what is the need for it? why not we just keep everything in one place?

The need for database normalization?

let's discuss few scenarios -

  1. Insertion Anomaly

Let's say we have a database table to store mobile information on any e-commerce platform.

so consider a big table with lots of columns — mobileCompany, mobileModel, mobileSpecifications like dimensions/colors/weight, etc.. and in that only if we have a column to store userReview for that mobile and for that we need many more columns like customerID, customerLocation, customerReview, and other columns like avgCustomerRating, totalQuantitySold, etc.

So if we store all data in one place then there are few problems we might run into-

a) Case 1- let's say we launch a new mobile then all the users and sales-related columns will have surely have no data in it. then we have to store null value in it and there are gonna be lots of nulls value.

b) Case 2 — for existing mobile let say model X and it is sold and reviewed by around 10,000 users. then for storing the reviews of 10,000 users we also have to store all mobile specification-related columns. And that means10,000 records with the same value as mobileCompany, mobileModel, mobileSpecifications. This creates redundancy in data.

c) case 3- another very important situation let say we have a new User who just signup and hasn't purchased anything yet then where will we store the information? as there is no product information is there so a quick solution for that let's make the only user-related field like customerId the primary key. as a primary key can not be null.

Then what if we have a new product launched but haven’t gone for sales yet at like we usually see in case of a new Mobile launch. as it doesn't have any customers yet how we gonna save that record in our table because we made customerId as the primary key.

These are called Insertion anomalies.

2. Deletions Anomaly

So from the above example, let's consider few cases-

a) A new product is listed and 1 customer purchased it and also gave a positive product review and then for some reason he deleted his id so we will be deleting that customer data and then the product will which has an only entry in the database asscoiated with this customer will be deleted as well and won’t be available for sale as there is no record in the database. and that's a problem.

b) Consider a company that has launched a particular product Z and few new users in our platform have purchased it. And later after for some reason, it becomes a very controversial product and by the objection of few sections of society, the company wanted to delete that product then if we will delete that product we will lose all new customers who have only purchased the product Z and nothing else.

These are called Deletion anomalies.

3. Update Anamoly

Let's say we have one well-known seller on our platform who joined our platform 2–3 years back and that seller used to sell a variety of products and now for the business-related stuff he/she wanted to rebrand itself like XYZ to XYZ Global.

Now as he/she is with our platform for a long time and has sold goods around 100,000 times then just to update 1 thing XYX to XYZ Global we will be updating 100,000 records which definitely not an optimal thing to do.

Conclusion

This is the reason why storing all the records in one big complex table will be the not right thing to do. and that leads to problems like Insertion, deletion, update anomaly. so we must break the complex table into smaller tables into normal forms to get rid of these anomalies. and the process is called Normalization.

There are 5 types of Normalization forms-

  1. First Normal Form (1NF)
  2. Second Normal Form (2NF)
  3. Third Normal Form (3NF)
  4. Boyce-Codd Normal Form (BCNF)
  5. Fourth Normal Form (4NF)

These are the most commonly listed Normal Forms over the internet generally you will find till BCNF or 4NF. but in some advanced articles and references, you can find till 6 NF as well like in Wikipedia (https://en.wikipedia.org/wiki/Database_normalization)

And we will be dealing with all these Normalization techniques in the upcoming blog which will specifically about these normalization techniques.

Till then here are few good blogs which you can refer to-

--

--