0

I'm interested to know in which case I should create a primary key on multiple columns and why ?

To be more specific about my question, I want to know what are the advantages and disadvantages of primary key on multiple columns in terms of performance.

I did some research and found this question where someone is saying

Another good question is SHOULD a primary key be multiple columns. :) – Sonny Boy Nov 17 '09 at 17:31

Well, that's a different question, and the answer is NO, NOT EVER, AND IF YOU DO I WILL HUNT YOU DOWN AND HURT YOU. Just a personal preference, of course :-) – Stu Dec 4 '09 at 16:46

Despite the joke, there should be some good reason on why one shouldn't use primary key on multiple columns but I couldn't find any answer.


I was wondering about this because I have a concrete problem.
Consider the following table:

CREATE TABLE IF NOT EXISTS `user_travel_data` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `id_user` int(10) unsigned NOT NULL,
  `day` date NOT NULL,
  `origin` varchar(8) NOT NULL,
  `destination` varchar(8) NOT NULL,
  `data` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `ukey` (`id_user`, `day`, `origin`, `destination`)
) 

In this table, for each user I have some data for each day and for each travel from origin to destination.

I need this table because it's very easy to use WHERE clause on it. However, the numbers of line in this table are just insane. Usually, a user has ~50 origins and ~100 destinations. With 30 days filled & 200 users, there's ~30 million lines in my table. Because each user has around ~150000 lines, I had to create some tools for them to edit/delete/add lines in group.
And, because of these tools, users tends to delete old lines and create a whole bunch of new lines. This leads to a very fast increase of my primary key.
While it is not a problem now, it could be one pretty soon.

So, since I don't use the id on this table, I was wondering if I could do something like :

PRIMARY KEY (`id_user`, `day`, `origin`, `destination`)

This answer is also quite useful but didn't help with my problem

Gary Olsson
  • 101
  • 1
  • Why is a "fast increase" of a primary key a potential problem? Also see Performance of multi field primary key or contrived key – LowlyDBA - John M Nov 13 '15 at 15:41
  • @JohnM It is a potential problem because there's a limit to an int. While I can use a bigint instead it just delay the problem. When my Id reach it's max value, I'll not be able to insert any more row. – Gary Olsson Nov 13 '15 at 16:09
  • 1
    Do you really plan on having more than 9,223,372,036,854,775,807 rows in your table, even accounting for frequent deletes? There's also a UNIQUEIDENTIFIER as an option if so. – LowlyDBA - John M Nov 13 '15 at 16:12
  • 2
    To be honest, those two SO questions are only mildly useful. My take: if you don't plan to join other tables to this one (on the primary key), just use the natural key. If you do, use a surrogate key (bigint if int is not enough). – András Váczi Nov 13 '15 at 16:31
  • @JohnM What's better here - a single column or multicolumn primary key? while this answer give some general advice, it doesn't explain in terms of performance why having a multicolumn primary key is worse than a completly unused id. It just state Per-group serials are typically expensive, pointless effort. Yes, maybe, but isn't it better than an unused id ? If not, why ? – Gary Olsson Nov 13 '15 at 16:46
  • @dezso by natural key, do you mean my multiple column primary key ? – Gary Olsson Nov 13 '15 at 16:47
  • 1
    @GaryOlsson yes – András Váczi Nov 13 '15 at 16:48
  • A single column primary key is best. A comment: Although the PK is often made the clustered index as well, it is certainly possible to make the PK a UNIQUE index, then you can make the independent CLUSTERED INDEX include several columns, if there is truly a need for that. – RLF Nov 13 '15 at 19:33
  • 1
    Generally, an associative or bridging table is a good candidate for a compound primary key. For example, if multiple books have multiple authors, you can have a bridging table with only 2 columns: bookid and authorid. By designating both columns as a primary key, you guarantee uniqueness of the entries (that is, you never mention a book/author combination more than once). – Manngo Mar 18 '18 at 07:07

0 Answers0