What is Normalization in DBMS?
In a database, a large amount of data is stored across multiple tables, and the same data may be repeated (redundant) in several places. Normalization in DBMS is the process of organizing tables to eliminate this redundancy and ensure data integrity. Normalization also helps remove insert, update, and delete anomalies.
How Does Normalization Work in DBMS?
Normalization in DBMS is a technique for designing a database schema. An existing schema is modified, typically by decomposing large tables into smaller ones, to reduce data redundancy and unwanted dependencies. Normalization removes unwanted duplication along with the anomalies that it causes:
- Insert anomaly: some data cannot be inserted without also inserting unrelated data, or without filling columns with null values.
- Update anomaly: the same fact is stored in multiple rows, so updating only some of those rows leaves the data inconsistent.
- Delete anomaly: deleting a record unintentionally removes other facts that were stored only in that record.
The aim of normalization is therefore to remove redundant data and to store only related data in each table. This typically reduces the size of the database and keeps the data logically organized.
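To make the update anomaly concrete, the following minimal sketch uses Python's built-in sqlite3 module with an in-memory database. The table name, the column names, and the second Florida employee (ID 501) are hypothetical and do not come from the examples in this article. The point is that a fact repeated on several rows can end up changed on only one of them.

```python
import sqlite3

# In-memory database; the table mirrors part of an unnormalized EMPLOYEE table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_state TEXT, emp_country TEXT)"
)
# Employee 501 is a hypothetical second Florida employee, added so that the
# "Florida is in the U.S." fact is stored on two rows.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(289, "Florida", "U.S."), (501, "Florida", "U.S."), (378, "Maharashtra", "India")],
)

# Update anomaly: the country for Florida is changed on one row only.
conn.execute("UPDATE employee SET emp_country = 'United States' WHERE emp_id = 289")

# The same state now appears with two different countries.
countries = sorted(
    country
    for (country,) in conn.execute(
        "SELECT DISTINCT emp_country FROM employee WHERE emp_state = 'Florida'"
    )
)
print(countries)  # ['U.S.', 'United States']
```

Storing the state-to-country fact in a separate table, as the Third Normal Form example does, leaves only one row to update.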
Types of Normalization in DBMS
The normal forms which are used most commonly in DBMS are as below:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
1. First Normal Form
A table (relation) is in First Normal Form if it does not contain any multi-valued or composite attributes. In other words, every attribute must hold only single, atomic values.
Let us take the example of the STUDENT table as below:
Roll | Name | Subject |
19 | Rajesh | Math, Science |
23 | Supriya | History, English |
32 | Zack | Geography |
The above table is not in First Normal Form because the Subject column contains multiple values. The table below is in First Normal Form because it contains only atomic values.
Roll | Name | Subject |
19 | Rajesh | Math |
19 | Rajesh | Science |
23 | Supriya | History |
23 | Supriya | English |
32 | Zack | Geography |
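The transformation to First Normal Form can be sketched in Python, assuming the multi-valued Subject column is stored as a comma-separated string. The function name to_first_normal_form is hypothetical; it emits one row per atomic subject value and repeats Roll and Name on each row.

```python
# STUDENT rows before 1NF: the Subject column holds a comma-separated list.
student = [
    (19, "Rajesh", "Math, Science"),
    (23, "Supriya", "History, English"),
    (32, "Zack", "Geography"),
]


def to_first_normal_form(rows):
    """Split the multi-valued Subject column so that each row holds one atomic value."""
    result = []
    for roll, name, subjects in rows:
        for subject in subjects.split(","):
            result.append((roll, name, subject.strip()))
    return result


for row in to_first_normal_form(student):
    print(row)
```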
2. Second Normal Form
A table (relation) is in Second Normal Form if it is in First Normal Form and does not have any partial dependency. In other words, no non-prime attribute (an attribute that is not part of any candidate key) may depend on a proper subset of any candidate key.
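Checking for a partial dependency means asking whether some proper subset of a candidate key already determines a non-prime attribute. The minimal sketch below answers that question with attribute closures. The helper names are hypothetical, and the example assumes that the 1NF STUDENT table has the candidate key {Roll, Subject} and the functional dependency Roll -> Name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, non_prime, fds):
    """List (subset, attribute) pairs where a proper subset of key determines a non-prime attribute."""
    found = []
    for size in range(1, len(key)):  # proper subsets only
        for subset in combinations(sorted(key), size):
            determined = closure(set(subset), fds)
            for attr in sorted(non_prime & determined):
                found.append((subset, attr))
    return found


# Assumed dependency for the 1NF STUDENT table: Roll determines Name.
fds = [({"Roll"}, {"Name"})]
print(partial_dependencies({"Roll", "Subject"}, {"Name"}, fds))
# [(('Roll',), 'Name')]
```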
Let us consider the STUDENT table from the previous section:
Roll | Name | Subject |
19 | Rajesh | Math |
19 | Rajesh | Science |
23 | Supriya | History |
23 | Supriya | English |
32 | Zack | Geography |
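The candidate key of this table is {Roll, Subject}, but the non-prime attribute Name depends only on Roll, which is a proper subset of that key. Because of this partial dependency, the table needs to be broken into the two tables below to be in Second Normal Form.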
STUDENT
Roll | Name |
19 | Rajesh |
23 | Supriya |
32 | Zack |
SUBJECT_DETAIL
Roll | Subject |
19 | Math |
19 | Science |
23 | History |
23 | English |
32 | Geography |
The partial dependency Roll -> Name is removed: in ‘STUDENT’, Name is fully dependent on the primary key ‘Roll’, and ‘SUBJECT_DETAIL’ has the composite key {Roll, Subject} with no non-prime attributes.
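The Second Normal Form decomposition amounts to two projections of the 1NF table. This minimal Python sketch (variable names are illustrative) builds both projections and then joins them back on Roll to confirm that the split loses no rows.

```python
# 1NF STUDENT rows (Roll, Name, Subject) from the example above.
rows = [
    (19, "Rajesh", "Math"),
    (19, "Rajesh", "Science"),
    (23, "Supriya", "History"),
    (23, "Supriya", "English"),
    (32, "Zack", "Geography"),
]

# Project onto the two new tables; building sets drops duplicated (Roll, Name) pairs.
student = sorted({(roll, name) for roll, name, _ in rows})
subject_detail = sorted({(roll, subject) for roll, _, subject in rows})

# Natural join on Roll; Roll -> Name holds, so a dict lookup is safe.
names = dict(student)
rejoined = sorted((roll, names[roll], subject) for roll, subject in subject_detail)

print(student)  # [(19, 'Rajesh'), (23, 'Supriya'), (32, 'Zack')]
print(rejoined == sorted(rows))  # True
```

Each student's name is now stored once, no matter how many subjects that student takes.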
3. Third Normal Form
A table is in Third Normal Form if it is in Second Normal Form and no non-prime attribute is transitively dependent on a candidate key. Equivalently, for every non-trivial functional dependency A -> B, at least one of the following conditions must hold:
- A is a super key.
- B is a prime attribute, i.e. each attribute of B that is not already in A is part of some candidate key.
Let us consider the table ‘EMPLOYEE’ as below:
EMP_ID | EMP_NAME | EMP_DEPT | EMP_STATE | EMP_COUNTRY |
289 | Mike | Sales | Florida | U.S. |
378 | Sameer | Finance | Maharashtra | India |
989 | Nicki | Marketing | Texas | U.S. |
The candidate key in the above table is EMP_ID, and the set of functional dependencies is EMP_ID -> EMP_NAME, EMP_ID -> EMP_DEPT, EMP_ID -> EMP_STATE, and EMP_STATE -> EMP_COUNTRY. Because EMP_STATE is not a super key, EMP_COUNTRY is transitively dependent on EMP_ID through EMP_STATE (EMP_ID -> EMP_STATE -> EMP_COUNTRY). To transform the table into Third Normal Form, we therefore break it into the two tables below.
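The Third Normal Form condition can be checked mechanically with attribute closures. This is a minimal sketch with hypothetical function names, applied to the EMPLOYEE dependencies listed above and assuming EMP_ID is the only candidate key. A dependency is flagged when its left-hand side is not a super key and its right-hand side contains a non-prime attribute.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violates_3nf(fds, candidate_keys, all_attrs):
    """Return the supplied FDs that break 3NF (implied FDs are not enumerated)."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency
        is_superkey = closure(lhs, fds) >= all_attrs
        non_prime_rhs = (rhs - lhs) - prime
        if not is_superkey and non_prime_rhs:
            violations.append((sorted(lhs), sorted(rhs)))
    return violations


attrs = {"EMP_ID", "EMP_NAME", "EMP_DEPT", "EMP_STATE", "EMP_COUNTRY"}
fds = [
    ({"EMP_ID"}, {"EMP_NAME"}),
    ({"EMP_ID"}, {"EMP_DEPT"}),
    ({"EMP_ID"}, {"EMP_STATE"}),
    ({"EMP_STATE"}, {"EMP_COUNTRY"}),
]
print(violates_3nf(fds, [{"EMP_ID"}], attrs))
# [(['EMP_STATE'], ['EMP_COUNTRY'])]
```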
EMPLOYEE:
EMP_ID | EMP_NAME | EMP_DEPT | EMP_STATE |
289 | Mike | Sales | Florida |
378 | Sameer | Finance | Maharashtra |
989 | Nicki | Marketing | Texas |
STATE_COUNTRY:
EMP_STATE | EMP_COUNTRY |
Florida | U.S. |
Maharashtra | India |
Texas | U.S. |
EMP_STATE is the primary key of the STATE_COUNTRY table, so the transitive dependency is removed.
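As a sketch of how the two Third Normal Form tables might be declared in SQL, the example below uses Python's sqlite3 module with an in-memory database. The lowercase table and column names, the column types, and the foreign key constraint are assumptions for illustration. Joining the tables on EMP_STATE rebuilds the original EMPLOYEE rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# STATE_COUNTRY stores each state's country exactly once, keyed by emp_state.
conn.execute(
    "CREATE TABLE state_country (emp_state TEXT PRIMARY KEY, emp_country TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE employee (
           emp_id INTEGER PRIMARY KEY,
           emp_name TEXT NOT NULL,
           emp_dept TEXT NOT NULL,
           emp_state TEXT NOT NULL REFERENCES state_country(emp_state)
       )"""
)
conn.executemany(
    "INSERT INTO state_country VALUES (?, ?)",
    [("Florida", "U.S."), ("Maharashtra", "India"), ("Texas", "U.S.")],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (289, "Mike", "Sales", "Florida"),
        (378, "Sameer", "Finance", "Maharashtra"),
        (989, "Nicki", "Marketing", "Texas"),
    ],
)

# Joining on emp_state reconstructs the original EMPLOYEE rows.
rows = conn.execute(
    """SELECT e.emp_id, e.emp_name, e.emp_dept, e.emp_state, s.emp_country
       FROM employee e JOIN state_country s ON e.emp_state = s.emp_state
       ORDER BY e.emp_id"""
).fetchall()
for row in rows:
    print(row)
```

With the foreign key enabled, an employee cannot reference a state that has no row in state_country.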
4. Boyce-Codd Normal Form
A table is in Boyce-Codd Normal Form if it is in Third Normal Form and, for every non-trivial functional dependency A -> B, A is a super key of the table.
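BCNF is stricter than Third Normal Form: every non-trivial dependency must have a super key on its left-hand side, with no exception for prime attributes. The following minimal sketch (hypothetical function names) checks this with attribute closures, using the dependencies stated for the EMP_DEPT example that follows: ID -> COUNTRY and DEPARTMENT -> {DEPT_TYPE, DEPT_NO}.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(fds, all_attrs):
    """Return the supplied non-trivial FDs whose left-hand side is not a super key."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FDs never violate BCNF
        if not closure(lhs, fds) >= all_attrs:
            violations.append((sorted(lhs), sorted(rhs)))
    return violations


attrs = {"ID", "COUNTRY", "DEPARTMENT", "DEPT_TYPE", "DEPT_NO"}
fds = [
    ({"ID"}, {"COUNTRY"}),
    ({"DEPARTMENT"}, {"DEPT_TYPE", "DEPT_NO"}),
]
print(bcnf_violations(fds, attrs))
# [(['ID'], ['COUNTRY']), (['DEPARTMENT'], ['DEPT_NO', 'DEPT_TYPE'])]
```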
EMP_DEPT table:
ID | COUNTRY | DEPARTMENT | DEPT_TYPE | DEPT_NO |
9890 | India | Marketing | M098 | 045 |
11090 | US | Finance | F0567 | 023 |
12390 | India | Sales | S1002 | 012 |
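The functional dependencies for the above table are ID -> COUNTRY and DEPARTMENT -> {DEPT_TYPE, DEPT_NO}, and the candidate key is {ID, DEPARTMENT} (which assumes that an employee can work in more than one department). The table is not in BCNF because neither ID nor DEPARTMENT alone is a super key. To transform it to BCNF, we have to break it into three tables as below: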
EMP_COUNTRY:
ID | COUNTRY |
9890 | India |
11090 | US |
12390 | India |
DEPT_DETAILS:
DEPARTMENT | DEPT_TYPE | DEPT_NO |
Marketing | M098 | 045 |
Finance | F0567 | 023 |
Sales | S1002 | 012 |
EMP_DEPARTMENT_MAP:
ID | DEPARTMENT |
9890 | Marketing |
11090 | Finance |
12390 | Sales |
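The functional dependencies for the above tables are ID -> COUNTRY and DEPARTMENT -> {DEPT_TYPE, DEPT_NO}. The candidate keys for the tables EMP_COUNTRY, DEPT_DETAILS and EMP_DEPARTMENT_MAP are ID, DEPARTMENT and {ID, DEPARTMENT}, respectively. The left-hand side of every functional dependency is now a super key of its table, so all three tables are in BCNF.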
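To check that the three-way decomposition is lossless for the sample rows, the sketch below joins the tables back together in plain Python. The variable names, and storing DEPT_NO as a string to keep its leading zeros, are assumptions.

```python
# Rows of the three decomposed tables from the example above.
emp_country = [(9890, "India"), (11090, "US"), (12390, "India")]
dept_details = [
    ("Marketing", "M098", "045"),
    ("Finance", "F0567", "023"),
    ("Sales", "S1002", "012"),
]
emp_department_map = [(9890, "Marketing"), (11090, "Finance"), (12390, "Sales")]

# Index each table by its key: ID for EMP_COUNTRY, DEPARTMENT for DEPT_DETAILS.
country_by_id = dict(emp_country)
dept_by_name = {dept: (dept_type, dept_no) for dept, dept_type, dept_no in dept_details}

# Natural join through the mapping table rebuilds the original EMP_DEPT rows.
emp_dept = [
    (emp_id, country_by_id[emp_id], dept, *dept_by_name[dept])
    for emp_id, dept in emp_department_map
]
for row in emp_dept:
    print(row)
```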
Advantages
Below are the advantages of Normalization:
- Redundant data is removed.
- Data quality improves, and database design becomes more flexible.
- Data in the database is better organized overall.
- Data is consistent and logically stored in the database.
Conclusion
Normalization plays a vital role in database design. It ensures data integrity and reduces unwanted, redundant data. Despite these advantages, normalization also has drawbacks that should be kept in mind. A fully normalized schema spreads data across many tables, so queries need more joins; this can make complex business logic harder to understand and can increase development and implementation time. A designer therefore needs a sound understanding of normalization to apply it effectively.