Database.Beginner.What is normalization? Why is it important?

What is Normalization?

Normalization is the process of organizing data in a database to reduce redundancy (duplicate data) and improve data integrity.

It involves breaking a big, messy table into smaller, more focused tables and linking them through relationships (usually using primary and foreign keys).


Key Goals of Normalization:

  • Eliminate redundant data — avoid storing the same data in multiple places.
  • Ensure data dependencies make sense — data should be stored logically.
  • Protect against anomalies:
    • Insertion anomaly: Trouble inserting data without other required data.
    • Update anomaly: Changing data in one place but not another, causing inconsistency.
    • Deletion anomaly: Losing important data because of deleting other data.

Why is Normalization Important?

  1. Reduces data redundancy:
    • Saves storage space.
    • Keeps data clean and consistent.
  2. Improves data integrity:
    • Avoids conflicting information in the database.
    • Ensures changes are propagated correctly.
  3. Makes databases easier to maintain:
    • Easier to add, update, or delete data.
    • Fewer places to update if something changes.
  4. Optimizes query performance (sometimes):
    • Smaller tables mean faster queries.
    • But — sometimes over-normalization can cause too many joins and slow things down (so we balance it carefully).

Example: Why Normalize?

Imagine a table like this:

student_idstudent_namecourse_1course_2
1AliceMathHistory
2BobMathNULL

Problems:

  • What if a student takes 3 courses? We’d have to add more columns.
  • If you update Math to Advanced Math, you have to update multiple places.
  • If you delete Bob, you lose record of Math.

After Normalization:

Students Table:

student_idstudent_name
1Alice
2Bob

Courses Table:

course_idcourse_name
1Math
2History

Enrollments Table (relationship table):

student_idcourse_id
11
12
21

Now:

  • Courses are not duplicated.
  • We can add more courses easily.
  • No update anomalies — change Math name once in Courses.

In Short:

Without NormalizationWith Normalization
Redundant dataReduced redundancy
Update anomaliesEasier updates
Hard to scaleEasier to add new data
Risk of inconsistenciesImproved data integrity

DELETION ANOMALY

Let’s go back to this non-normalized table:

student_idstudent_namecourse_1course_2
1AliceMathHistory
2BobMathNULL

Now imagine you delete Bob:

DELETE FROM table WHERE student_id = 2;

Now the table looks like:

student_idstudent_namecourse_1course_2
1AliceMathHistory

➡️ Problem:

  • Suppose Bob was the only student who had Math as their course (let’s imagine Alice drops Math later).
  • When you delete Bob, you are not just deleting the student — you are also deleting the only place where “Math” was stored in the table.
  • You’ve lost track of the fact that the course “Math” even existed because courses and students are all stored together in one row.

This is called a deletion anomaly.


With Normalization

When you separate Students and Courses into different tables:

Courses Table:

course_idcourse_name
1Math
2History

Now, even if you delete Bob from the Students table:

DELETE FROM Students WHERE student_id = 2;

The Courses table still keeps the record that “Math” is a valid course — no data loss!


In simple terms:

Without normalization:

  • Some important data (like the course “Math”) exists only inside student rows.
  • Deleting the student can accidentally delete information about the course too.

With normalization:

  • Students and courses are separate.
  • Deleting a student deletes only the student, not the course information.

1. Insertion Anomaly

An insertion anomaly happens when the database design prevents you from inserting data unless you have extra, unnecessary information.


Example without normalization:

student_idstudent_namecourse_name
1AliceMath
2BobHistory

Problem:

Suppose you want to add a new course Physics that no students have enrolled in yet.

How would you do it?

You can’t insert:

INSERT INTO table (student_id, student_name, course_name)
VALUES (NULL, NULL, 'Physics');

You can’t insert a course without a student.

The database forces you to either put fake student data or leave it empty, which can break integrity.

With normalization:

Separate the tables:

Courses Table:

course_idcourse_name
1Math
2History

Now you can easily:

INSERT INTO Courses (course_id, course_name) VALUES (3, 'Physics');

✅ No need for any student — you can add courses freely.

2. Update Anomaly

An update anomaly happens when the same data is stored in multiple places and you have to update it in all of them manually — or risk inconsistency.


Example without normalization:

student_idstudent_namecourse_name
1AliceMath
2BobMath
3CharlieMath

Suppose the course “Math” is now renamed to “Advanced Mathematics.”

Problem:

You have to run:

UPDATE table SET course_name = 'Advanced Mathematics' WHERE course_name = 'Math';

But what if:

  • You forget to update one row?
  • You make a mistake in your WHERE condition?

Result: some students show “Math,” others show “Advanced Mathematics” — data inconsistency!

With normalization:

Courses Table:

course_idcourse_name
1Math

Enrollments Table:

student_idcourse_id
11
21
31

Now, to rename the course:

UPDATE Courses SET course_name = 'Advanced Mathematics' WHERE course_id = 1;

✅ Update one place, and all students automatically point to the updated course name.

No inconsistency.

You forget to update one row?

Imagine you have this non-normalized table:

student_idstudent_namecourse_name
1AliceMath
2BobMath
3CharlieMath

Now, suppose you manually update:

UPDATE table SET course_name = 'Advanced Mathematics' WHERE student_id = 1;
  • You updated only Alice‘s row.
  • You forgot to update Bob and Charlie.

Result:

student_idstudent_namecourse_name
1AliceAdvanced Mathematics
2BobMath
3CharlieMath

Now your database is inconsistent:

  • Alice has the new course name.
  • Bob and Charlie still have the old name.


You make a mistake in your WHERE condition?

Let’s say you want to update all rows where the course is “Math”:

UPDATE table SET course_name = 'Advanced Mathematics' WHERE course_name = 'Math';

But what if you accidentally write:

UPDATE table SET course_name = 'Advanced Mathematics' WHERE course_name = 'Mat';  -- typo!
  • Now the database finds zero rows matching 'Mat', so nothing changes.
  • You thought the update worked, but actually no rows got updated.

Or worse, what if you wrote:

UPDATE table SET course_name = 'Advanced Mathematics' WHERE student_id = 2;
UPDATE table SET course_name = 'Advanced Mathematics' WHERE student_id = 2;
  • Only Bob’s course name would be updated — Alice and Charlie stay unchanged.

Why is this bad?

  • Human error: You have to trust that every update touches exactly the right rows — but humans make mistakes.
  • Inconsistency: Different rows now have different values for what should be the same course.

How Normalization helps

If you had a separate Courses table:

course_idcourse_name
1Math

and students just point to course_id, you would do:

UPDATE Courses SET course_name = 'Advanced Mathematics' WHERE course_id = 1;

✅ Only one row to update.
✅ No chance to forget or miss any student.
✅ All students who point to course_id = 1 automatically see the updated course name.

Summary

Without NormalizationRiskWith NormalizationSafe!
Duplicate course names in multiple rowsYou can forget to update some rowsSingle source of truthOnly one place to update

In Short:

AnomalyProblem without normalizationSolved by normalization
InsertionCan’t insert a course without a student.Separate course table — insert freely.
UpdateMust update the same value in many rows, risking inconsistency.Update only once in a separate table.
This entry was posted in Без рубрики. Bookmark the permalink.