Partial and Transitive Dependency

stonehenge
Photo by Inja Pavlić, some rights reserved.

Partial Dependency

A Composite Primary Key is a primary key that is made of multiple columns. It’s also often to hear the term Candidate Key, which refers to the same thing.

Partial Dependency occurs when one of your non-key columns is functionally dependent on only a part of the Composite Primary Key, and not the whole thing. Let’s look at an example.

Suppose we have a table with various fitness test results for different players.

player_idtest_idpointssupervisor_nametest_type
1187/100John Stewart30m Sprint
1356/100Alicia MadisonVertical Jump Test
3290/100John StewartOne Rep Max
2271/100Isaiah GrayOne Rep Max

The primary key in this case is a Composite Primary Key that is a made up of player_id and test_id.

Both the points and the supervisor_name columns are directly dependent on the entire primary key (player_id + test_id). The test_type column, on the other hand, is only dependent on test_id0 and has nothing to do with player_id.

This is Partial Dependency. It happens when an attribute in a table depends only on a part of the primary key and not on the whole key.

Why and how to fix it?

Partial Dependency leads to duplicate entries, and if we have a lot of duplicate entries in our tables, our database will take up a lot more memory than it should. Partial Dependency, unless fixed, can also lead to insertion, deletion and updation anomalies. What do these mean?

Say we had to insert a 100 new results into the table. Many players would do the same types of tests (eg. One Rep Max), so this would lead to plenty of unnecessary duplicate entries in the test_type column. This is a typical example of an insertion anomaly.

What would happen if we wanted to change the name of a single test type? For example, we want to change the ’30m Sprint’ to the ’30m Dash’. Well, then you would have to find each row in the table that contains ’30m Sprint’ entry and manually replace each one of those with ’30m Dash’. It is obviously a lot of unnecessary hassle and we call this updation anomaly.

What if, for some reason, we decided to delete this entire table? Than, we would automatically delete all the information about the different tests as well, which is not something we want. We want the specific test type information to be completely standalone and independent of this table. We call this problem a deletion anomaly.

But fortunately, it’s really easy to remove partial dependencies from your database. In this example, all we have to do is split the original table into two new tables, like this.

player_idtest_idpointssupervisor_name
1187/100John Stewart
1356/100Alicia Madison
3290/100John Stewart
2271/100Isaiah Gray
test_idtest_type
130m Sprint
2One Rep Max
3Vertical Jump Test
4Sit-up Test

Now the test_type column if fully dependent on the primary key of that table, and nothing else. By splitting up the table we take care of all three already mentioned anomalies.

Transitive Dependency

Transitive Dependency occurs when you have a column that is functionally dependent on another column that is not part of the primary key. Lets look at an example.

Say we had a table with information on each specific football player:

player_idheightweightclub_idclub_country
1175713England
2189852France
3195904Germany
4178781Italy

The columns height, weight and club_id are directly dependent on the primary key, while club_country is not. club_country is only dependent on the club_id column, and not on the primary key, which is player_id.

This is an example of transitive dependency. It occurs when a column is functionally dependent on another column which is not a part of the primary key.

It’s really easy to take care of transitive dependency. All you have to do is split the original table into two new tables, where each column of the tables will only be dependent only on the primary key of that table.

player_idheight weightclub_id
1175713
2189852
3195904
4178781
club_idclub_nameclub_country
1InterItaly
2NantesFrance
3LiverpoolEngland
4Bayer LeverkusenGermany

The new club table only contains information on the different clubs. As you can see here, club_country is now completely dependent on the club_id column, which is the primary key in the new club table.

Just like with Partial Dependency, by doing this we take care of insertion, deletion and updation anomalies. Also, by reducing the amount of duplicate entries, we reduce the amount of memory required to store the database.

Partial and Transitive Dependencies need to be removed in order to have your database in the Second and the Third Normal Form, respectively. Now, whether you want to have your data in the Second or the Third Normal Form is a question for another topic. You can read more on that topic here.

 

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Did I miss something or make a mistake somewhere?
Let me know in the comments below.