
I have been reading about distributed databases and the like providing weak and strong consistency guarantees, but perhaps I am short on imagination - when would you want to use a weakly consistent system?

Most examples of real use-cases - services that share state across a distributed system, or read-write services accessible to users - seem to require strong read-write consistency.

What sort of situation is useful for a weakly consistent system?

2 Answers


Concurrency

Weak consistency is only OK for applications where an insert or update does not rely on a previous select.

For example, consider this common workflow:

  1. Read a record
  2. Let a user edit it
  3. Compute a new record based on the original record + the user's changes
  4. Save the record

This sort of workflow would not work so well under weak consistency, because the record read in step #1 may be out of date, and saving the record may stomp on another change that is already in flight. If you change FIRST_NAME and someone else changes LAST_NAME, one of you may lose your changes.
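The lost update above can be sketched in a few lines of Python. The dictionary store and field names are illustrative only; the point is that both clients read before either writes, and each saves the whole record, so last-write-wins silently discards the earlier change:

```python
import copy

# A hypothetical record store; the key and field names are made up.
store = {"user:1": {"first_name": "Ann", "last_name": "Lee"}}

# Both clients read the record (step 1) before either writes.
alice_copy = copy.deepcopy(store["user:1"])
bob_copy = copy.deepcopy(store["user:1"])

# Each edits a different field (steps 2-3).
alice_copy["first_name"] = "Anna"
bob_copy["last_name"] = "Li"

# Each saves the *whole* record (step 4); last write wins.
store["user:1"] = alice_copy
store["user:1"] = bob_copy

# Bob's stale first_name has stomped Alice's change.
print(store["user:1"])  # {'first_name': 'Ann', 'last_name': 'Li'}
```

A strongly consistent system would typically prevent this with a transaction or an optimistic-concurrency version check; under weak consistency neither is available across nodes.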

On the other hand, if you are inserting records where 100% of the data come from an external source, weak consistency may be OK. Some examples:

  1. An application that crawls the web and stores page information as it is discovered
  2. An application that emits debug logs
  3. A fire-and-forget audit logging mechanism
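What these examples have in common is that every write is a blind, append-only insert. A minimal sketch of the crawler case (the function and fields are hypothetical):

```python
# Append-only log of crawled pages. Each insert depends only on
# external input, never on a prior read, so a stale read on some
# replica cannot corrupt the write.
log = []

def record_page(url, status):
    # No read-modify-write cycle here, hence nothing to race against.
    log.append({"url": url, "status": status})

record_page("https://example.com", 200)
record_page("https://example.com/about", 404)
print(len(log))  # 2
```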

Referential integrity

Weak consistency may not work so well if you rely heavily on referential integrity. For example, if you insert a record that contains foreign keys, you can't be sure that the records those keys reference will still exist by the time your updates have propagated.

This is much less of an issue if your database doesn't allow deletes. For example, suppose you need to insert a Session record with a UserID foreign key, and users can never be physically deleted, only disabled: the referenced user is then guaranteed to still exist.
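Here is a sketch of the dangling-key scenario, with two replicas simulated as plain dictionaries and replication applied by hand (all names are illustrative):

```python
# Two replicas of a users table; updates propagate asynchronously.
users_node_a = {42: {"name": "carol", "disabled": False}}
users_node_b = {42: {"name": "carol", "disabled": False}}
sessions_node_b = []

# Node A deletes the user; the delete has not reached node B yet.
del users_node_a[42]

# Node B still sees user 42 and happily inserts a session for it.
if 42 in users_node_b:
    sessions_node_b.append({"session_id": 1, "user_id": 42})

# Once the delete propagates, the session's foreign key dangles.
del users_node_b[42]
orphaned = [s for s in sessions_node_b
            if s["user_id"] not in users_node_b]
print(orphaned)  # one session row pointing at a deleted user
```

Disabling users instead of deleting them removes the `del` steps entirely, and with them the possibility of an orphaned row.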

Identities and collisions

Any surrogate keys in a weakly consistent database would have to be globally unique; either a GUID or a compound key that contains the partitioning key. Otherwise, different nodes may end up inserting the same identity value and you will have a collision when the data propagate.
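Both options can be sketched briefly. `uuid.uuid4` gives keys that are globally unique for all practical purposes, and a `(node_id, local_sequence)` tuple is one possible shape for a compound key (the node names are made up):

```python
import uuid

# Each node generates surrogate keys independently. Random UUIDs
# make cross-node collisions vanishingly unlikely; plain sequential
# integers would collide as soon as the data propagate.
node_a_keys = [uuid.uuid4() for _ in range(1000)]
node_b_keys = [uuid.uuid4() for _ in range(1000)]
assert not set(node_a_keys) & set(node_b_keys)

# Alternative: a compound key embedding the partitioning key, so
# each node's local sequence lives in its own namespace.
node_a_compound = [("node-a", i) for i in range(3)]
node_b_compound = [("node-b", i) for i in range(3)]
assert not set(node_a_compound) & set(node_b_compound)
```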

John Wu

The only use case that comes to mind is some system like Google that does fuzzy matching, has enormous performance requirements, is interested in data analysis in the aggregate, but isn't particularly concerned about losing a small number of individual data points here and there because they aren't going to matter.

Under those conditions, the loss of a small amount of data integrity is greatly outweighed by the enormous performance benefits you will receive by giving up on absolute durability of data.

Robert Harvey