Understanding Database Normalization

Intro

Database Normalization is a concept & design technique we follow to efficiently organize data and design proper schema, to avoid redundant & data duplication in multiple tables. It also sets up relations between the tables, to ensure data dependency and room for future additions.

If tables are not properly normalized, then may eat up extra memory space and resources, while causing insertions, updates and deletions anomalies and complex queries.

Case

Suppose we have an LMS System for the college and no normalization design was followed while making it, then we may have many difficulties, to name a few.

student_id	student_name	email	program_name	program_abbr
710002	Suraj Peth	suraj@eample.com	Bachelor of science	B.Sc.
710003	Shion Y	saion@example.com	Bachelor of science,	B.Sc.,B.Ph.
710004	Cycil L	cycil@example.com	Bachelor of science	B.Sc
710005	Deo Bobby	Deo@example.com	Bachelor of Physics	B.Ph.
710006	Albert Doo	albert@example.com	Bachelor of Commerce	B.Com.

In the above design of the student table, we have a few issues

Firstly, row number 2 of student_id 710003 has grouped values for an attribute(column) of program_name.

Secondly, we have added program-related attributes like program_name, and program_abbr which were unrelated to the student entity(table). We could have created a separate program table or entity and used its primary key to reference it here in the student's table and the rest program related attributes can be stored in the program table.

Thirdly, there is the repetition of data like program_name values, as many students have to opt for programs from a selected list, we can have avoided this repetition through separate program tables and maintaining the reference of keys.

While Inserting new students' data, every time the data will be repeated for program_name and abbr. And if someone had earlier made a mistake in program_name during its creation, then we will have to update each row of the said program_name in the current design.

Normalizations are guidelines and a way, for database schema & structure to be laid down.

Normally, the Third Normal Form is enough, however, you can extend it as you deem fit.

First Normal Form(1NF)

First Normal Form sets below a few guidelines for designing database table schema;

Define only the required and related attributes(columns) in the table.
Should have single-valued attributes/columns.
Attributes(columns) should be unique in names and the order doesn't matter.
Should also contain a column denoting as primary keys

Taking forward the previous table we used as an example,

id	student_id	student_name	email	program_name	program_abbr
1	710002	Suraj Peth	suraj@eample.com	Bachelor of science	B.Sc.
2	710003	Shion Y	aion@example.com	Bachelor of science,	B.Sc.,B.Ph.
3	710004	Cycil L	cycil@example.com	Bachelor of science	B.Sc
4	710005	Deo Bobby	Deo@example.com	Bachelor of Physics	B.Ph.
5	710006	Albert Doo	albert@example.com	Bachelor of Commerce	B.Com.

Let's re-design the structure fulfilling all the guidelines.

Looking at the above table, we can see that a student with student_id 710003 has applied for 2 programs and it is inserted in comma-separated format. As we should have single-valued attributes as per the guidelines, let's restructure it and insert the second program as a new row.

The table now looks like,

id	student_id	student_name	email	program_name	program_abbr
1	710002	Suraj Peth	suraj@eample.com	Bachelor of science	B.Sc.
2	710003	Shion Y	aion@example.com	Bachelor of science	B.Sc.
3	710003	Shion Y	aion@example.com	Bachelor of Physics	B.Ph.
4	710004	Cycil L	cycil@example.com	Bachelor of science	B.Sc
5	710005	Deo Bobby	Deo@example.com	Bachelor of Physics	B.Ph.
6	710006	Albert Doo	albert@example.com	Bachelor of Commerce	B.Com.

By doing so, although a few values are getting repeated, values for the program_name column are now atomic for each record/row.

Using the First Normal Form, data redundancy increases, as there will be many columns with the same data in multiple rows but each row as a whole will be unique.

Second Normal Form(2NF)

The transformation of 1NF to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the partially dependent attribute(s) from the relation by placing them in a new relation along with a copy of their determinant.

To further normalize and ensure non-repetition of data, we can break the above output table in First Normal Form into 3 different tables, as the single table above also contains information regarding program attributes that are not primary.

CREATE TABLE STUDENT(
    STUDENT_ID INT NOT NULL,
    STUDENT_NAME VARCHAR(100) NOT NULL,
    EMAIL VARCHAR(100),
    ACTIVE CHAR(1) NOT NULL DEFAULT('Y'),
    PRIMARY KEY(STUDENT_ID)
);

The above table will have the following recordset,

student_id	student_name	email
710002	Suraj Peth	suraj@eample.com
710003	Shion Y	aion@example.com
710003	Shion Y	aion@example.com
710004	Cycil L	cycil@example.com
710005	Deo Bobby	Deo@example.com
710006	Albert Doo	albert@example.com

CREATE TABLE PROGRAM(
    PROGRAM_ID INT NOT NULL,
    PROGRAM_NAME VARCHAR(500),
    PROGRAM_ABBR VARCHAR(50),
    ACTIVE CHAR(1) NOT NULL DEFAULT('Y'),
    PRIMARY KEY(PROGRAM_ID)
);

And the program table will have the following data,

program_id	program_name	program_abbr
5120036	Bachelor of science	B.Sc
5120042	Bachelor of Physics	B.Ph
5120040	Bachelor of Commerce	B.Com

And, finally, we will create another table for storing students and their applied Programs, as we allow students to enroll in multiple programs.

CREATE TABLE STUDENT_PROGRAM(
    STUDENT_ID INT NOT NULL,
    PROGRAM_ID INT NOT NULL,
    PRIMARY KEY(STUDENT_ID, PROGRAM_ID),
    FOREIGN KEY(STUDENT_ID) REFERENCES STUDENT(STUDENT_ID),
    FOREIGN KEY(PROGRAM_ID) REFERENCES PROGRAM(PROGRAM_ID)
);

student_id	program_id
710002	5120036
710003	5120036
710003	5120042
710004	5120036
710005	5120042
710006	5120040

Third Normal Form(3NF)

The rule for 3NF is that the table should be in Second Normal Form and we have to ensure that non-prime attributes should not be transitively dependent on the super key(primary key). It violates the rule of the third normal form.

Again taking forward the examples we used previously, let's add state and country-related attributes to the student table.

CREATE TABLE STUDENT(
    STUDENT_ID INT NOT NULL,
    STUDENT_NAME VARCHAR(100) NOT NULL,
    EMAIL VARCHAR(100),
    STATE VARCHAR(100),
    COUNTRY VARCHAR(155),
    ACTIVE CHAR(1) NOT NULL DEFAULT('Y'),
    PRIMARY KEY(STUDENT_ID)
);

student_id	student_name	email	state	country
710002	Suraj Peth	suraj@eample.com	Maharashtra	India
710003	Shion Y	aion@example.com	Karnataka	India
710003	Shion Y	aion@example.com	Karnataka	India
710004	Cycil L	cycil@example.com	Maharashtra	India
710005	Deo Bobby	Deo@example.com	Kerala	India
710006	Albert Doo	albert@example.com	Tamil Nadu	India

Here the State and Country in the above table are dependent on the student, so to normalize it as per Third Normalized Form, we can have another table storing the state & country list and referencing it with the id in the student's table. below is the normalized form,

CREATE TABLE STATE_COUNTRY(
    ID INT NOT NULL,
    STATE VARCHAR(100),
    COUNTRY VARCHAR (155),
    PRIMARY KEY(ID)
);

id	state	country
1	Maharashtra	India
2	Karnataka	India
3	Kerala	India
4	Tamil Nadu	India

CREATE TABLE STUDENT(
    STUDENT_ID INT NOT NULL,
    STUDENT_NAME VARCHAR(100) NOT NULL,
    EMAIL VARCHAR(100),
    STATE_ID INT NOT NULL,
    ACTIVE CHAR(1) NOT NULL DEFAULT('Y'),
    PRIMARY KEY(STUDENT_ID),
    FOREIGN KEY (STATE_ID) REFERENCES STATE_COUNTRY(ID)
);

student_id	student_name	email	state_id
710002	Suraj Peth	suraj@eample.com	1
710003	Shion Y	aion@example.com	2
710003	Shion Y	aion@example.com	2
710004	Cycil L	cycil@example.com	1
710005	Deo Bobby	Deo@example.com	3
710006	Albert Doo	albert@example.com	4

BCNF Normalization Form:

BCNF stands for Boyce-Codd Normal Form, which is an extension of 3NF. In BCNF, a table must not only contain only one type of data, have a unique identifier, and not have any transitive dependencies, but it must also not contain any repeating groups. This means that data in the table must not be repeated.

The benefit of BCNF is that it helps to improve data integrity. By eliminating repeating groups, it ensures that data is stored in the most efficient way possible. This helps to reduce data inconsistency and improve query performance.

Summary

Database normalization has both pros and cons. The pros include improved data integrity, reduced data redundancy, and improve query performance. The cons include increased complexity, as normalizing a database can make queries look complex due to an increase in multiple tables. It can also lead to increased storage costs, as more data may need to be stored to maintain data integrity.
As mentioned earlier in the start, Database normalization is an important process that helps to maintain the integrity of a database. It helps to reduce data redundancy, minimize data inconsistency, and improve query performance. It can be broken down into four major forms of normalization: 1NF, 2NF, 3NF, BCNF and more.
There are also pros and cons to database normalization, and it is important to understand them before implementing a database normalization strategy. Also, note that the Over-normalization of database schema also leads to an increase in complexity.
Denormalization is one of the last resorts for accelerating data retrieval in normalized databases when techniques like indexing are insufficient.

Finally, there are some tips and examples of database normalization that can be used to help achieve the best results.

Understanding Database Normalization

Intro

Case

First Normal Form(1NF)

Second Normal Form(2NF)

Third Normal Form(3NF)

BCNF Normalization Form:

Summary

More from this blog

Learning Real-World Performance Skills from SQL & PostgreSQL Interview Experiences

In-Memory OLTP for Faster SQL Server: Your Complete Guide

Integrating Azure SQL Database with External REST APIs using sp_invoke_external_rest_endpoint and Azure API Management

Implementing History Table or System Version Temporal Table to Track Data History

Handling Exceptions with Try-Catch in SQL Server

Command Palette

Intro

Case

First Normal Form(1NF)

Second Normal Form(2NF)

Third Normal Form(3NF)

BCNF Normalization Form:

Summary

More from this blog