Composite primary keys are allowed, but in the case below, is using one really a bad practice? Opinion on Stack Overflow seems to go both ways on this issue.

Why?


I want to store payments for orders in a separate table. The reason is that an order can have many items, which are handled in a separate table as a many-to-many relationship. Now, if I don't use a composite primary key for my payment table, I'll lose my unique PaymentID:

[PaymentId] INT IDENTITY(1,1) NOT NULL,
[OrderId] INT NOT NULL, --Also a Foreign Key--
PRIMARY KEY ([PaymentId], [OrderId])

Now, if I just remove the primary key from OrderId, I'll lose my one-to-one relationship, so many OrderIds could be associated with many PaymentIds, and I don't want that.

This seems to be why other answers on SO have concluded (mostly) that the composite key is a bad idea. If it is bad, what's the best practice then?

ΩmegaMan
JAX
  • If I understand you correctly, in this case you can just add a separate unique constraint on `OrderId`, and have `PaymentId` as the primary key. – Laurence Sep 27 '14 at 19:53
  • I didn't understand this part: "The reason is that, an order can have many items which are handled also in a separate table in the form of many to many relationship." If you have `order_id` in the `payments` table, then all you have to do is reference the `orders` table. How would you lose the unique `PaymentID`? – Surya Sep 27 '14 at 19:55
  • @Laurence: Yeah, but in that case one order can have multiple payments, which is bad, am I right? – JAX Sep 27 '14 at 19:55
  • In my opinion, having multiple payments for an order isn't bad at all. – Surya Sep 27 '14 at 19:56
  • @Surya: Please see my previous comment – JAX Sep 27 '14 at 19:56
  • @Surya: In the real world it's bad because it can't happen – JAX Sep 27 '14 at 19:56
  • What is the composite primary key you are talking about? (PaymentId,OrderId)? With FKs (PaymentId) and (OrderId)? It would be helpful if you gave proper SQL for your choices. Your question is unclear. – philipxy Sep 28 '14 at 02:58
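
A small sketch of the two options discussed in these comments, using Python's built-in sqlite3 module as a stand-in for SQL Server (so INTEGER replaces INT IDENTITY, and the PaymentId values are supplied explicitly). The helper name `payments_stored` and the single order with OrderId 1 are made up for illustration. It suggests that a composite key on (PaymentId, OrderId) does not stop one order from receiving several payments, because PaymentId is already unique on its own, whereas Laurence's UNIQUE constraint on OrderId does.

```python
import sqlite3

# Composite key from the question: (PaymentId, OrderId)
COMPOSITE_KEY = """
CREATE TABLE Payments (
    PaymentId INTEGER NOT NULL,
    OrderId   INTEGER NOT NULL REFERENCES Orders (OrderId),
    PRIMARY KEY (PaymentId, OrderId)
)"""

# Laurence's suggestion: PaymentId alone is the key, OrderId is UNIQUE
UNIQUE_ORDER = """
CREATE TABLE Payments (
    PaymentId INTEGER PRIMARY KEY,
    OrderId   INTEGER NOT NULL UNIQUE REFERENCES Orders (OrderId)
)"""


def payments_stored(payments_ddl: str, attempts: int) -> int:
    """Try to record `attempts` payments for order 1; return how many were accepted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY)")
    conn.execute(payments_ddl)
    conn.execute("INSERT INTO Orders (OrderId) VALUES (1)")
    for payment_id in range(1, attempts + 1):
        try:
            # Distinct PaymentIds, as an IDENTITY column would hand them out
            conn.execute(
                "INSERT INTO Payments (PaymentId, OrderId) VALUES (?, 1)", (payment_id,)
            )
        except sqlite3.IntegrityError:
            pass  # rejected by a constraint
    (count,) = conn.execute("SELECT COUNT(*) FROM Payments").fetchone()
    conn.close()
    return count


print(payments_stored(COMPOSITE_KEY, 3))  # 3: several payments for one order
print(payments_stored(UNIQUE_ORDER, 3))   # 1: UNIQUE (OrderId) allows only one
```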

5 Answers

There is no conclusion that composite primary keys are bad.

The best practice is to have some column or columns that uniquely identify a row. But in some tables a single column is not enough by itself to uniquely identify a row.

SQL (and the relational model) allows a composite primary key. It is a good practice in some cases. Or, another way of looking at it is that it's not a bad practice in all cases.

Some people have the opinion that every table should have an integer column that automatically generates unique values, and that should serve as the primary key. Some people also claim that this primary key column should always be called id. But those are conventions, not necessarily best practices. Conventions have some benefit, because they simplify certain decisions. But conventions are also restrictive.

You may have an order with multiple payments because some people purchase on layaway, or else they have multiple sources of payment (two credit cards, for instance), or two different people want to pay for a share of the order (I frequently go to a restaurant with a friend, and we each pay for our own meal, so the staff process half of the order on each of our credit cards).

I would design the system you describe as follows:

Products  : product_id (PK)

Orders    : order_id (PK)

LineItems : product_id is (FK) to Products
            order_id is (FK) to Orders
            (product_id, order_id) is (PK)

Payments  : order_id (FK)
            payment_id - ordinal for each order_id
            (order_id, payment_id) is (PK)

This is also related to the concept of an identifying relationship. If it's definitional that a payment exists only because an order exists, then make the order part of the primary key.

Note the LineItems table also lacks its own auto-increment, single-column primary key. A many-to-many table is a classic example of a good use of a composite primary key.
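
The design above can be sketched with Python's sqlite3 module (SQLite syntax rather than SQL Server's). The `amount` column, the `add_payment` helper, and the sample ids are hypothetical additions. `add_payment` shows one way to produce the per-order ordinal for payment_id: take the current maximum for that order and add one.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Products (product_id INTEGER PRIMARY KEY);
CREATE TABLE Orders   (order_id   INTEGER PRIMARY KEY);

-- Many-to-many junction: the pair of foreign keys is the primary key
CREATE TABLE LineItems (
    product_id INTEGER NOT NULL REFERENCES Products (product_id),
    order_id   INTEGER NOT NULL REFERENCES Orders (order_id),
    PRIMARY KEY (product_id, order_id)
);

-- Identifying relationship: payment_id is an ordinal within each order
CREATE TABLE Payments (
    order_id   INTEGER NOT NULL REFERENCES Orders (order_id),
    payment_id INTEGER NOT NULL,
    amount     NUMERIC NOT NULL,  -- hypothetical column, not in the answer
    PRIMARY KEY (order_id, payment_id)
);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    return conn


def add_payment(conn: sqlite3.Connection, order_id: int, amount: float) -> int:
    """Store a payment numbered 1, 2, 3, ... within its order; return that number."""
    with conn:  # commit on success, roll back on error
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(payment_id), 0) + 1 FROM Payments WHERE order_id = ?",
            (order_id,),
        ).fetchone()
        # If another session picked the same number first, the composite
        # primary key rejects this insert instead of storing a duplicate.
        conn.execute(
            "INSERT INTO Payments (order_id, payment_id, amount) VALUES (?, ?, ?)",
            (order_id, next_id, amount),
        )
    return next_id


conn = connect()
with conn:
    conn.execute("INSERT INTO Orders (order_id) VALUES (10)")
    conn.executemany("INSERT INTO Products (product_id) VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO LineItems (product_id, order_id) VALUES (?, 10)", [(1,), (2,)])

# Two diners splitting one order: two payments, numbered 1 and 2
print(add_payment(conn, 10, 21.50), add_payment(conn, 10, 18.00))  # 1 2

try:
    conn.execute("INSERT INTO LineItems (product_id, order_id) VALUES (1, 10)")
except sqlite3.IntegrityError as exc:
    print("duplicate line item rejected:", exc)
```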

Bill Karwin
  • "it's not a bad practice in all cases" . . . I can agree with that. – Gordon Linoff Sep 27 '14 at 19:59
  • What does a many-to-many relationship between orders and payments have to do with this question? – wvdz Sep 27 '14 at 20:02
  • @popovitsj: Nothing to do with this part of the question; I was explaining why I want things split up into different tables – JAX Sep 27 '14 at 20:06
  • Some of the opinion against declaring composite primary keys appears to be driven by the way some ORM tools work. Pierre may or may not be in this situation. – Walter Mitty Sep 28 '14 at 06:16
  • @WalterMitty, right, ORM frameworks like Ruby on Rails started out with the phrase "opinionated software" about PK design being `id` only, but this is like saying that you won't support functions with more than one argument. In versions after the first, RoR supports compound primary keys. All frameworks eventually come to the same conclusion. If anyone is still using an ORM that doesn't support compound PK's, you need to upgrade. – Bill Karwin Sep 28 '14 at 06:56
  • It's also worth pointing out that autoincrement guarantees the uniqueness of table rows, but not necessarily the singular identity of each of the subject matter entities. An operational error can result in duplicate entry of the same person, course, product, etc. – Walter Mitty Sep 28 '14 at 12:55
  • @WalterMitty, right but compound PK doesn't ensure that either. That's not usually a problem in the relational model, because it's implicit in the model that the value 1234 in one column doesn't mean the same thing as the value 1234 in a totally different column. – Bill Karwin Sep 28 '14 at 17:06
  • Would it not be better to invert the order of the primary key in LineItems, so that inserts into the index are sorted by order_id, product_id, and not the other way around? Unique-wise it's the same – nickdnk Jun 17 '17 at 18:37
  • @nickdnk, depends on how you plan to query the table. If you want to favor lookup by product most of the time, then it would be a good idea to define the primary key the way I show. If you want lookup by order to be more efficient, then change the column order as you suggest. – Bill Karwin Jun 17 '17 at 18:43
  • @BillKarwin - I mean for inserts, the performance should be better with the order id first, since you would not need to shift around rows randomly based on the product number used. It would already be in the order inserted, naturally. Adding another index (non-unique) to the product_id should solve the query problem, no? – nickdnk Jun 17 '17 at 18:45
  • @nickdnk, you're assuming the inserts are committed in perfect sequential order by order_id. If the inserts are coming in at a slow rate, this might be true. But if it's that slow, inserting in non-sequential order won't be a bottleneck. If the orders are coming in so fast that you have to optimize insert order, they're probably coming in from many concurrent threads, and they won't be perfectly sequential. – Bill Karwin Jun 17 '17 at 18:59
  • @BillKarwin - Okay. In a heavy insert situation I was considering that following the order_id insert incrementally would result in a lot more buffer pool cache hits, even though they weren't exactly in order. But I see your point. – nickdnk Jun 17 '17 at 19:01
  • @nickdnk Anyway, you're right that inserting in sequential order can be a benefit. See this blog for some explanation and clever graphical proof: https://www.percona.com/blog/2015/04/03/illustrating-primary-key-models-in-innodb-and-their-impact-on-disk-usage/ – Bill Karwin Jun 17 '17 at 19:09
  • some naming conventions here are really useful... calling a key column 'order_id_pk' or 'order_id_fk' is super useful. That way when you're writing queries or reading old code, you know when you're dealing with primary keys or foreign keys. – Scuba Steve Sep 27 '18 at 20:41
  • I believe it's harder to make payment_id ordinal for each order_id, rather than to simply make payment_id ordinal (serial). What's the point in making it more complex? @BillKarwin RoR supports composite primary keys? To which extent? It autoadds the `id` column to join tables, and to avoid that... I doubt that's practical. The price is probably too high, and the upsides are unclear. @ScubeSteve Calling primary keys `id` and foreign keys `something_id` achieves the same goal in a more concise/readable way ;) – x-yuri Jun 07 '21 at 01:02
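
The key-order trade-off discussed in these comments can be observed with Python's sqlite3 module. This is only a sketch: SQLite stores a rowid table's composite primary key as a separate unique index, unlike InnoDB, where the primary key is the clustered index, so it says nothing about the insert-order effects nickdnk raised. The `quantity` column and the index name are hypothetical. It shows that a lookup on the key's leading column can seek, a lookup on the second column cannot, and an extra non-unique index on order_id fixes that.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE LineItems (
    product_id INTEGER NOT NULL,
    order_id   INTEGER NOT NULL,
    quantity   INTEGER NOT NULL DEFAULT 1,  -- hypothetical extra column
    PRIMARY KEY (product_id, order_id)
)""")

BY_PRODUCT = "SELECT * FROM LineItems WHERE product_id = ?"
BY_ORDER = "SELECT * FROM LineItems WHERE order_id = ?"

by_product = query_plan(conn, BY_PRODUCT)     # seeks on the key's leading column
by_order_before = query_plan(conn, BY_ORDER)  # order_id is second: full scan
conn.execute("CREATE INDEX idx_line_items_order ON LineItems (order_id)")
by_order_after = query_plan(conn, BY_ORDER)   # seeks via the secondary index

for label, plan in [("by product", by_product),
                    ("by order, no index", by_order_before),
                    ("by order, with index", by_order_after)]:
    print(f"{label}: {plan}")
```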

This question is dangerously close to asking for opinions, which can generate religious wars. I am highly biased toward having auto-increasing integer primary keys in my tables (called something like TablenameId, not Id), but there is one situation where they are optional.

I think the other answers address why you want primary keys.

One very important reason is for reference purposes. In a relational database, any entity could -- in theory -- be referenced by another entity via foreign key relationships. For foreign keys, you definitely want one column to uniquely define a row. Otherwise, you have to deal with multiple columns in different tables that align with each other. This is possible, but cumbersome.
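
To illustrate the "cumbersome" part, here is a sketch with Python's sqlite3 module and a hypothetical Refunds table that references a payment identified by the composite key (order_id, payment_id): the child table has to carry both columns, and every join has to match on both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Payments (
    order_id   INTEGER NOT NULL,
    payment_id INTEGER NOT NULL,
    PRIMARY KEY (order_id, payment_id)
);

-- Hypothetical child table: it must carry BOTH key columns to point at one payment
CREATE TABLE Refunds (
    refund_id  INTEGER PRIMARY KEY,
    order_id   INTEGER NOT NULL,
    payment_id INTEGER NOT NULL,
    FOREIGN KEY (order_id, payment_id) REFERENCES Payments (order_id, payment_id)
);
""")
conn.execute("INSERT INTO Payments (order_id, payment_id) VALUES (10, 1)")
conn.execute("INSERT INTO Refunds (order_id, payment_id) VALUES (10, 1)")  # matches a payment

try:
    # Payment 2 of order 10 does not exist, so the composite foreign key rejects it
    conn.execute("INSERT INTO Refunds (order_id, payment_id) VALUES (10, 2)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# Joins also have to match on both columns
rows = conn.execute("""
    SELECT r.refund_id, p.order_id, p.payment_id
    FROM Refunds AS r
    JOIN Payments AS p
      ON p.order_id = r.order_id AND p.payment_id = r.payment_id
""").fetchall()
print(rows)  # [(1, 10, 1)]
```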

The table you are referring to is not an "entity" table; it is a "junction" table. It is a relational database construct for handling many-to-many relationships. Because it doesn't really represent an entity, no other table should need a foreign key that references it. Hence, a composite primary key is reasonable. There are some situations, such as when you are concerned about database size, where leaving out an artificial primary key is even desirable.

cezar
Gordon Linoff
  • If you could answer @philipxy's comment, that would be helpful. As I'm currently in a design phase contemplating the trade-offs. – Anish Ramaswamy Mar 22 '16 at 23:02
  • @AnishRamaswamy I think he means that if you want to link two tables together, you link them by the unique identifier. The primary key of the other table becomes the foreign key in your table. And he is saying that his preference is if that primary key is not a composite key, because he might not want to have multiple columns imported into his table, rather, he might only want one. – barlop Jan 20 '18 at 02:47
  • I think it would be helpful to comment on this old answer to add an important comment: `TableNameId` is annoyingly redundant. – Alexander Oct 22 '19 at 12:41

Disk space is cheap, so a primary key clustered on an int identity(1,1) named after a convention (like pk + table name) is a good practice. It will make queries, joins, indexes and other constraints easy to manage.

However, there's one good reason not to do that (in MS SQL Server at least): if you want to manage the physical sorting of your data in the underlying storage system.

The primary key clustered determines the physical sorting order. If you do it on an identity column, the physical sorting order is basically the insert order. However, this may not be the best, especially if you always query the table the same way. On very large tables, getting the right physical sorting order makes queries a lot faster. For example you may want the clustered index on a composite of two columns.
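
SQLite has no CLUSTERED keyword, but a WITHOUT ROWID table is a rough analogue: its rows are stored in a B-tree ordered by the primary key. The sketch below (Python's sqlite3 module; the Readings table and its data are hypothetical) clusters on a composite of two columns so that one sensor's readings over a time range come from adjacent entries of the primary key. The SQL Server DDL in the comment is shown for comparison only and is not executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rows are kept in primary-key order. SQL Server's rough counterpart would be
#   CONSTRAINT PK_Readings PRIMARY KEY CLUSTERED (sensor_id, reading_time)
conn.execute("""
CREATE TABLE Readings (
    sensor_id    INTEGER NOT NULL,
    reading_time TEXT    NOT NULL,
    value        REAL    NOT NULL,
    PRIMARY KEY (sensor_id, reading_time)
) WITHOUT ROWID""")
conn.executemany(
    "INSERT INTO Readings VALUES (?, ?, ?)",
    [(2, "2024-01-01T00:00", 1.5), (1, "2024-01-02T00:00", 2.0),
     (1, "2024-01-01T00:00", 0.5), (2, "2024-01-02T00:00", 3.0)],
)

# The query this table is organized for: one sensor over a time range
QUERY = """SELECT reading_time, value FROM Readings
           WHERE sensor_id = ? AND reading_time >= ?
           ORDER BY reading_time"""
params = (1, "2024-01-01")
plan = " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, params))
rows = conn.execute(QUERY, params).fetchall()
print(plan)  # a SEARCH using the primary key
print(rows)  # [('2024-01-01T00:00', 0.5), ('2024-01-02T00:00', 2.0)]
```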

JeromeE

Best practices are helpful at best, but blinding at worst. Going against a best practice isn't a sin. Just be sure you know what kind of trade-off you are making.

Database engines can be very complicated things. Without knowing what particular optimizations are made by a given engine, it will be difficult to determine what kinds of constructs will yield the best performance (because I assume that the issue we are talking about here is performance). Composite keys may be problematic for large tables in one kind of database, but not have any noticeable impact for another.

A useful practice I've learned is to always strive to keep my applications as simple as possible. Does using composite keys save you from having to perform lookups before insertions, or some other nuisance? Use them. If, however, you notice that using them makes your application no longer satisfy some significant performance requirement, consider a solution without them.
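
One example of the "lookup before insertion" that a composite key can save, sketched with Python's sqlite3 module: because (product_id, order_id) is the primary key, the database itself refuses a duplicate link, so the application can insert directly instead of querying first. INSERT OR IGNORE is SQLite syntax; PostgreSQL, for instance, spells it INSERT ... ON CONFLICT DO NOTHING. The `add_item` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE LineItems (
    product_id INTEGER NOT NULL,
    order_id   INTEGER NOT NULL,
    PRIMARY KEY (product_id, order_id)
)""")


def add_item(conn: sqlite3.Connection, product_id: int, order_id: int) -> bool:
    """Link a product to an order without checking for an existing row first.

    The composite primary key detects a repeat and OR IGNORE skips it;
    returns True only if a new row was stored.
    """
    before = conn.total_changes
    conn.execute(
        "INSERT OR IGNORE INTO LineItems (product_id, order_id) VALUES (?, ?)",
        (product_id, order_id),
    )
    return conn.total_changes > before


first = add_item(conn, 1, 10)   # new link
repeat = add_item(conn, 1, 10)  # already linked, nothing to do
(count,) = conn.execute("SELECT COUNT(*) FROM LineItems").fetchone()
print(first, repeat, count)     # True False 1
```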

Emanuel

If your table with a composite primary key is expected to have millions of rows, the index controlling the composite key can grow to the point where CRUD operation performance is badly degraded. In that case, it is a lot better to use a simple integer ID primary key, whose index will be compact enough, and to establish the necessary DBE constraints to maintain uniqueness.

Source:

https://www.toptal.com/database/database-design-bad-practices
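
A sketch of the alternative this answer describes, using Python's sqlite3 module and a hypothetical LineItems table: a single-column integer key, with a UNIQUE constraint keeping the (order_id, product_id) pair unique. Keep in mind that the UNIQUE constraint is itself backed by an index on both columns, so the composite index does not go away; what the surrogate changes is the key that other tables would reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE LineItems (
    line_item_id INTEGER PRIMARY KEY,  -- compact surrogate key
    product_id   INTEGER NOT NULL,
    order_id     INTEGER NOT NULL,
    CONSTRAINT uq_line_items_order_product UNIQUE (order_id, product_id)
)""")

conn.execute("INSERT INTO LineItems (product_id, order_id) VALUES (1, 10)")
try:
    # Same natural key again: the UNIQUE constraint still rejects it
    conn.execute("INSERT INTO LineItems (product_id, order_id) VALUES (1, 10)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

rows = conn.execute("SELECT line_item_id, product_id, order_id FROM LineItems").fetchall()
print(rows)  # [(1, 1, 10)]
```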

Alfonso Tienda