I have one table. How can I normalize this.
-
have you read all the different forms of normalization ? – bane19 Aug 24 '15 at 19:53
-
yaa i read it , and i think it should be 3nf but not not sure that why i am here. – sanjeet Aug 24 '15 at 19:54
-
3Any answers won't help you here in the long run. Better learn yourself, try and then come back here with doubts/problems you are facing. – bane19 Aug 24 '15 at 19:56
-
3What are the keys and dependencies in this table? What makes you think it doesn't already satisfy 3NF? – nvogel Aug 24 '15 at 20:28
-
2Please say what minimal normal form you are supposed to "normalize" toa and why this table "should be [in] 3nf" and why you are "not sure". Otherwise your question is just asking for a textbook's chapter on normalization. – philipxy Aug 25 '15 at 04:41
-
You can refer below link from SO https://stackoverflow.com/questions/246701/what-is-normalisation-or-normalization – Ashutosh B Bodake Dec 05 '18 at 06:00
-
Please [use text, not images/links, for text--including tables & ERDs.](https://meta.stackoverflow.com/q/285551/3404097) Paraphrase or quote from other text. Use images only for what cannot be expressed as text or to augment text. Images cannot be searched for or cut & pasted. Include a legend/key & explanation with an image. Make your post self-contained. – philipxy May 22 '19 at 09:10
2 Answers
Good question given your data set. Keep in mind the whole point of normalization is to reduce duplication. 3NF is often the best way to go. But given my experience I've found very little benefit to pulling a repeated value out into a different table if it will be the only value in that table. Take for example your most duplicated column, emp_type. If you were to normalize it into a separate table , it would look like this:
Emp_Type_Id | Emp_type
------------------------
1 | Manager
2 | Engineer
3 | Tech Support
And your current table would look like:
Emp_ID | Emp_Name | Salary | Emp_Type_Id | Emp_Skill
----------------------------------------------------
1 | raj | 90000 | 1 | Department
2 | ravi | 50000 | 2 | Software
3 | shyam | 70000 | 2 | Hardware
.
.
.
This is technically more normalized than before as the emp_type value is no longer repeated in your database. But it's significantly more hassle to deal with a separate table and a relationship for a single value. If emp_type included other information as well such as Valid_Salary_Range, Department_Location, etc. then it would absolutely be best to normalize it out into a separate table. But if it's only a single value you are normalizing, what's the difference between storing the value as opposed to storing an ID pointed to that value? In my opinion it's a pointless extra step.
Long story short, I would not normalize your table at all, it's already normalized to a sufficient level.
- 1,161
- 2
- 20
- 36
-
Normalization to higher NFs does not involve introducing new column names. Also this shows no understanding of normalization. – philipxy May 22 '19 at 09:12
Even before we start, I'll mention each step's rules first. Why so? Because I don't know the dependencies in your table. And so putting forth the rules and my assumptions about your data will clarify how I arrived at the final solution.
In the end, if you aren't happy with the answer, for the least you will take away the understanding of how normalization works and can go ahead with doing it by yourself on the data that only you understand the best.
The Natural Order of Database Designing
- Start with specifying the entities (these are the tables)
- Specify the needed attributes of your entities (these are columns)
- Specify the unique property of your entity (the primary key of your table). If there is none, give one yourself (also called synthetic or surrogate primary keys).
- Finally specify the relation between these entities (primary key-foreign key relation and 1-N, N-N relation etc.)
----------
Now, that the basic designing rule is in place, it is easy to see that your single table actually has four separate entities fused together. They are:
1. Employee - who has id and name
2. Role - which can be Engineer, Tech Support etc
3. Department - which can be Software, Hardware, Department (could use better word though), Voice etc.
4. Salary - which will have EmpId, DateOfChange and Amount. This is because employees have different salaries and also the salaries of same employee changes over time.
So, we are going to break this table to four tables as mentioned above. Out of the four tables, Employee table has id as the primary key, Role and Department need synthetic (maybe even auto-incrementing) keys and Salary will have {EmpId, DataOfChange} together as the primary key. This will look like:
Table Name Columns
Employee Id, Name, RoleId, DeptId
Role RoleId, RoleType
Department DeptId, DeptName
Salary EmpId, DateOfChange, Amount
All the above table can have more items. I'm trying to hit the design that has minimalistic difference from that table you have already given. Like Salary table can also have a field like ReasonOfSalaryChange which could have values like NewHiring, Promotion etc... But we hold the changes to minimal.
Hope all is ok until now. If that is so, we move on to the actual normalization that you have asked.
Common Use Normalizations
I'm mentioning common use rules because honestly I never needed BCNF, 4th and 5th NF in practice.
The rules -
1NF: simply states that all columns must have atomic values. If a column needs more than one value, create another table. Your new tables pass that test. All values in your table's columns are atomic.
2NF: requires 1NF qualification and that any non-key field should be dependent on the entire primary key. All your non-key fields (Name in Employee, RoleType in Role, DeptName in Department, Amount in Salary) depend on the respective table's primary key (Id, RoleId, DeptId and {EmpId, DateOfChange}). So the tables qualify 2NF fine.
3NF: requires 2NF qualification and that no non-key field should depend upon any other non-key field. This means that there should be no dependencies between table's columns except on the primary keys. Role, Department and Salary tables are default 3NF qualified as they have only one non-key column and it's dependent on PK. Employee table, you can verify yourself, has no non-key element that is dependent on any other non-key element. So the tables are perfectly 3NF qualified the way they currently are.
All that's left now is to point out that RoleId, DeptId and EmpId are foreign keys from the Role, Department and Employee table respectively. And that would be my final redesigned and normalized database submission to you.
- 13,303
- 8
- 53
- 73
-
1I would add that in the real world, salary probably needs to be pulled out into a salary history table with start and end dates for the salary as it was over time. This is temporal data and past values may be needed for research into problems and reporting.. – HLGEM Sep 11 '15 at 14:53
-
@HLGEM: I haven't done any database work in many years, so have lost the eye for those details. Your suggestion is a +10. Will incorporate if OP says that it's needed. Otherwise will leave the answer as is. Thanks a lot for pointing out. – displayName Sep 11 '15 at 14:59
-
-
1People forget that we don't store data just to store it. If it is to have value, we have to understand what the data will be used for not just how it gets into the database. Too many people design only to make it easy for the application to insert the data and never consider why they want that data in the first place. When you don't think ahead to why, you don't find out that most of the time it is going to be used in "What was the salary on this date?" type of queries. If you haven't stored the data that way, then you can't answer the questions the users need to ask about the data. – HLGEM Sep 11 '15 at 17:28
-
1Downvoted for perpetuating entity/table conflation. Entities are represented by _values_, tables represent predicates. – reaanb Oct 01 '15 at 05:33
-
@reaanb: Here is the first paragraph of **"What are the alternatives to down-voting?"** for you: *The up-vote privilege comes first because that's what you should focus on: pushing great content to the top. Down-voting should be reserved for extreme cases. It's not meant as a substitute for communication and editing.* – displayName Oct 01 '15 at 14:14
-
@reaanb: If you change your mind, then go ahead and update the answer and also undo your down-vote. – displayName Oct 01 '15 at 14:17
-
I agree with your comments and am inclined to undo the down-vote - the answer as a whole doesn't deserve it. However, the issue is an important one to me and part of what motivates me to participate. I'll undo the down-vote in good faith, please consider improving your answer. I'll be happy to discuss the topic if you're interested. – reaanb Oct 01 '15 at 17:34
-
It says my vote is older than 12 hours and locked in unless the answer is edited. – reaanb Oct 01 '15 at 17:36
-
"entities (these are the tables)" No. "relation between these entities (primary key-foreign key relation and 1-N, N-N relation etc.)" No. "2NF: requires 1NF qualification and that any non-key field should be dependent on the entire primary key" No, non-prime fields must not be dependent on a proper subsetof a CK. "3NF: requires 2NF qualification and that no non-key field should depend upon any other non-key field." No. Look up a correct definition yourself in a (good) textbook. "PK" No. PKs are irrelevant, CKs matter. Etc. ("dependencies in your table" Yes, relevant.) – philipxy May 22 '19 at 09:31
-
@philipxy: I carefully read your comment. It seems you were too eager to say that the answer has many flaws. – displayName May 22 '19 at 10:32
-