288

In relational database design, there is a concept of database normalization or simply normalization, which is a process of organizing columns (attributes) and tables (relations) to reduce data redundancy and improve data integrity. (as written on Wikipedia).

As most articles are somewhat technical and thus harder to understand, I'm asking for someone to write an easier to understand explanation based on examples about what 1NF, 2NF, 3NF, even 3.5NF (Boyce-Codd) mean.

Jax
  • 6,625
  • 4
  • 25
  • 37
barfoon
  • 26,527
  • 25
  • 91
  • 138

4 Answers4

451

1NF is the most basic of normal forms - each cell in a table must contain only one piece of information, and there can be no duplicate rows.

2NF and 3NF are all about being dependent on the primary key. Recall that a primary key can be made up of multiple columns. As Chris said in his response:

The data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF] (so help me Codd).

2NF

Say you have a table containing courses that are taken in a certain semester, and you have the following data:

|-----Primary Key----|               uh oh |
                                           V
CourseID | SemesterID | #Places  | Course Name  |
------------------------------------------------|
IT101    |   2009-1   | 100      | Programming  |
IT101    |   2009-2   | 100      | Programming  |
IT102    |   2009-1   | 200      | Databases    |
IT102    |   2010-1   | 150      | Databases    |
IT103    |   2009-2   | 120      | Web Design   |

This is not in 2NF, because the fourth column does not rely upon the entire key - but only a part of it. The course name is dependent on the Course's ID, but has nothing to do with which semester it's taken in. Thus, as you can see, we have duplicate information - several rows telling us that IT101 is programming, and IT102 is Databases. So we fix that by moving the course name into another table, where CourseID is the ENTIRE key.

Primary Key |

CourseID    |  Course Name |
---------------------------|
IT101       | Programming  |
IT102       | Databases    |
IT103       | Web Design   |

No redundancy!

3NF

Okay, so let's say we also add the name of the teacher of the course, and some details about them, into the RDBMS:

|-----Primary Key----|                           uh oh |
                                                       V
Course  |  Semester  |  #Places   |  TeacherID  | TeacherName  |
---------------------------------------------------------------|
IT101   |   2009-1   |  100       |  332        |  Mr Jones    |
IT101   |   2009-2   |  100       |  332        |  Mr Jones    |
IT102   |   2009-1   |  200       |  495        |  Mr Bentley  |
IT102   |   2010-1   |  150       |  332        |  Mr Jones    |
IT103   |   2009-2   |  120       |  242        |  Mrs Smith   |

Now hopefully it should be obvious that TeacherName is dependent on TeacherID - so this is not in 3NF. To fix this, we do much the same as we did in 2NF - take the TeacherName field out of this table, and put it in its own, which has TeacherID as the key.

 Primary Key |

 TeacherID   | TeacherName  |
 ---------------------------|
 332         |  Mr Jones    |
 495         |  Mr Bentley  |
 242         |  Mrs Smith   |

No redundancy!!

One important thing to remember is that if something is not in 1NF, it is not in 2NF or 3NF either. So each additional Normal Form requires everything that the lower normal forms had, plus some extra conditions, which must all be fulfilled.

Smashery
  • 54,058
  • 30
  • 96
  • 124
  • 1
    Thanks for this - right what I was looking for. In the 2NF example - how exactly is the course name dependent on the course's ID? How is the teachers name dependent on the ID? – barfoon Apr 07 '09 at 03:59
  • 9
    Just think about it in terms of the relationship between things. If I ask you "what is the name of the course with ID IT101?", you can give me an answer, right? Similarly, if I ask you "what teacher has the ID 332?" you can tell me what teacher that is. Thus, the course name is dependent on its ID. – Smashery Apr 07 '09 at 04:25
  • 9
    However, you can't go the other way - if I were to ask you "What is Mr Jones's ID?" you may not be able to give a unique answer, as there may be two Mr Joneses. So the ID is not dependent on the name - it's the name that is dependent on the ID. – Smashery Apr 07 '09 at 04:27
  • 2
    You can also think of it this way - look at the third table down (the first with TeacherName in it). What's to stop me having "Mr Jones" in the first row, but then putting "Mr Bloggs" in the second row? I _shouldn't_ be allowed to do that, because they've both got the ID of 332. – Smashery Apr 07 '09 at 04:30
  • 1
    But if you design your table so that it is not in 3NF, there's nothing to stop you from writing two different names for the one ID - and then your database is contradicting itself! – Smashery Apr 07 '09 at 04:32
  • 1
    The thing to understand for 2NF is that if you know a course's ID, you already know its name - you don't need its semester as well! That would be like saying "Okay, so your student number is 314156 - but I can't work out your name. Tell me your favourite colour" "Red" "Oh, you must be Johnny" – Smashery Apr 07 '09 at 04:37
  • 2
    A student number _uniquely_ identifies you - you don't need to know anything else. The row in the database won't say you're Johnny because your favourite colour is Red - it'll tell you you're Johnny because your student ID is 314156. So your name is dependent on the ID, and _only_ the ID. – Smashery Apr 07 '09 at 04:39
  • 1
    I see this "so help me Codd" in a lot of places, what does it mean? – instantsetsuna Aug 11 '10 at 13:58
  • @instantsetsuna - http://en.wikipedia.org/wiki/Ted_Codd – Smashery Aug 12 '10 at 02:47
  • 31
    @instantsetsuna - Full explanation: In some courts, a witness is asked if they will tell "The truth, the whole truth, and nothing but the truth, so help me God"; as God is considered to be the one with authority when it comes to knowing if you're telling the truth. In the case of databases, we can say "The data depends on the key, the whole key and nothing but the key, so help me Codd". Ted Codd is the one who came up with the idea of relational databases - things relying on keys, etc., so he would be the authority to go by in the case of relational databases. – Smashery Aug 12 '10 at 02:48
  • I understand now, Thanks for taking time to explain! :) – instantsetsuna Aug 12 '10 at 07:15
  • pls give an example of 1nf also .. – XMen Oct 16 '12 at 07:15
  • 2
    your Semester column just makes me gag, you break a very important rule by combining two columns into a single one. you need to do string manipulation to break out the year. – KM. Oct 23 '12 at 14:37
  • That inst 2NF cause 1NF says that each field must only store atomic values. Storing 2009-1 in a single field is not atomic by any means. –  Mar 22 '13 at 10:17
  • 1
    @TaylorGibb Sure it is - those values are just foreign keys into another table which I haven't shown here, called "Semesters", which has an ID (e.g. "2009-1"), a start date and an end date. The fact that these IDs happen to look like actual semester dates is purely coincidental ;-) – Smashery Mar 26 '13 at 03:24
  • 1
    @Smashery hahaha i see what you did there ;) –  Mar 26 '13 at 06:44
  • @Smashery Could you update your epic answer to add BCNF – mko Nov 04 '13 at 14:33
  • 6
    @Smashery What is the difference between 2NF and 3NF? – Zo Has Jul 22 '15 at 05:23
  • 1
    this 'The data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF] (so help me Codd)' is extremely helpful for internalizing and simplifying the basics of normalization. I take it this device is more widely known among DBAs? – Mr.Budris Dec 22 '16 at 15:56
  • great answer, helps me understand this easily. But like some others, I don't immediately see a difference between 2NF and 3NF. – Harry Mar 13 '17 at 09:41
  • Why is there no normalization in object orientation programing? – Lealo Aug 15 '17 at 20:08
  • What about something like, | From_ID, From_Name, To_ID, To_Name | I believe this to be 1NF, but not too sure why...? – Marais Rossouw Sep 02 '18 at 11:26
  • @Smashery the teacher's name doesn't depend on the primary key (Courser +Semester) at all. So why did that violate 3NF ? I feel it violates 2NF more, i.e. a table should not contain 2 different types of data. – Qiulang Jan 04 '19 at 08:08
  • If you have a huge table and are having _performance_ problems, some of the NFs may need to be violated to help with speed. For further discussion, start a new question with your specific schema and specific queryl. – Rick James May 01 '22 at 17:07
127

I've never had a good memory for exact wording, but in my database class I think the professor always said something like:

The data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF].

Chris Shaffer
  • 31,567
  • 5
  • 48
  • 61
  • 77
    ...so help me Codd. http://en.wikipedia.org/wiki/Ted_Codd – Smashery Apr 07 '09 at 02:54
  • 7
    So what is difference between `The data depends on the key [1NF], nothing but the key [3NF]` ? Please dont confuse us, as 1 sentense answer doesnt clarify the answer but confuse visitors! – Pratik Oct 04 '15 at 09:34
  • 2
    *"each cell in a table must contain only one piece of information, and there can be no duplicate rows."* - I don't see how "The data depends on the key" matches all of that. – Simon Forsberg Jul 14 '17 at 13:15
51

Here's a quick, admittedly butchered response, but in a sentence:

1NF : Your table is organized as an unordered set of data, and there are no repeating columns.

2NF: You don't repeat data in one column of your table because of another column.

3NF: Every column in your table relates only to your table's key -- you wouldn't have a column in a table that describes another column in your table which isn't the key.

For more detail, see wikipedia...

Dave Markle
  • 92,195
  • 20
  • 143
  • 169
  • 2
    As far as I can tell, 1NF's eschewing of _repeating **groups**_ does not refer to repeating columns, but rather single _columns_ that represent an arbitrary number of repeated values for the same attribute, i.e. are not atomic. I base this on e.g. (1) http://stackoverflow.com/questions/23194292/normalization-what-does-repeating-groups-mean / (2) http://stackoverflow.com/questions/26357276/1nf-repeating-groups-what-are-they – underscore_d Mar 30 '17 at 15:54
38

1NF: Only one value per column

2NF: All the non primary key columns in the table should depend on the entire primary key.

3NF: All the non primary key columns in the table should depend DIRECTLY on the entire primary key.

I have written an article in more detail over here

Arcturus
  • 2,698
  • 2
  • 18
  • 11
  • Because you didn't provide attribution. I've edited the answer to provide attribution. – Robert Harvey Oct 22 '12 at 23:01
  • Whenever possible, include the example in the answer itself. A casual user should be able to read your answer and get some value from it without having to click the blog link. The answer should be self-contained, in other words. – Robert Harvey Oct 22 '12 at 23:03
  • I will keep that in mind. Thank you :) – Arcturus Oct 22 '12 at 23:04
  • 3
    Also note that this question is two years old, and already has a highly-upvoted answer that was marked as accepted by the OP. Late answers are scrutinized more carefully by the community to assess whether or not they add genuine additional value to the OP. – Robert Harvey Oct 22 '12 at 23:05
  • @RobertHarvey. This way he can "ad" his blog... – gdoron is supporting Monica Oct 23 '12 at 07:34
  • Thank Codd I read through the end of the page. – Rafael Eyng Nov 24 '15 at 02:33
  • 3
    @Arcturus read through the article, still one of the better explanations for normalization out there. – Olian04 Mar 13 '17 at 11:26
  • To be more precise, that 3NF rule is actually BCNF. 3NF carries an additional constraint, where if there is a functional dependency X --> A, then X is not a proper subset of any key, but A is part of some key. (Source: Database Management Systems by Raghu Ramakrishnan & Gehrke). – Dheeru Mundluru Sep 13 '17 at 06:58