In which we reveal the fundamental inequality of NoSQL, and why NoSQL folks are so negative about joins.
Discussions of NoSQL design philosophies tend to quickly focus in on the matter of consistency. “Consistency”, however, turns out to be a rather overloaded concept, and confusion often ensues.
In this post I plan to address one essential subject, while ducking various related ones as hard as I can. It’s what Werner Vogels of Amazon called read-your-writes consistency (a term to which I was actually introduced by Justin Sheehy of Basho). It’s either identical or very similar to what is sometimes called immediate consistency, and presumably also to what Amazon has recently called the “read my last write” capability of SimpleDB.
This is something every database-savvy person should know about, but most so far still don’t. I didn’t myself until a few weeks ago.
Considering the many different kinds of consistency outlined in the Werner Vogels link above or in the Wikipedia consistency models article — whose names may not always be used in, er, a wholly consistent manner — I don’t think there’s much benefit to renaming read-your-writes consistency yet again. Rather, let’s just call it RYW consistency, come up with a way to pronounce “RYW”, and have done with it. (I suggest “ree-ooh”, which evokes two syllables from the original phrase. Thoughts?)
Definition: RYW (Read-Your-Writes) consistency is achieved when the system guarantees that, once a record has been updated, any attempt to read the record will return the updated value.
Here a “record” can be a row, a key-value pair, or any similar unit of data. An “update” can be whichever of insert/append or true change the system supports.
A conventional relational DBMS will almost always feature RYW consistency. Some NoSQL systems feature tunable consistency, in which — depending on your settings — RYW consistency may or may not be assured.
The core ideas of RYW consistency, as implemented in various NoSQL systems, are:
- Let N = the number of copies of each record distributed across nodes of a parallel system.
- Let W = the number of nodes that must successfully acknowledge a write for it to be successfully committed. By definition, W <= N.
- Let R = the number of nodes that must send back the same value of a unit of data for it to be accepted as read by the system. By definition, R <= N.
- The greater N-R and N-W are, the more node or network failures you can typically tolerate without blocking work.
- **As long as R + W > N, you are assured of RYW consistency.**
That bolded part is the key point, and I suggest that you stop and convince yourself of it before reading further.
Example: Let N = 3, W = 2, and R = 2. Suppose you write a record successfully to at least two nodes out of three. Further suppose that you then poll all three of the nodes. Then the only way you can get two values that agree with each other is if at least one of them — and hence both — return the value that was correctly and successfully written to at least two nodes in the first place.
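The example can be sketched in a few lines of Python. This is a hypothetical toy model — each node holds a (version, value) pair, and the read quorum resolves conflicts by taking the highest version; the names `write` and `quorum_read` are illustrative, not from any particular NoSQL system:

```python
import itertools

N, W, R = 3, 2, 2  # copies, write quorum, read quorum

def write(nodes, version, value, acks):
    """Apply a write to the subset of nodes that acknowledged it."""
    for i in acks:
        nodes[i] = (version, value)

def quorum_read(nodes, replies):
    """Poll R nodes; resolve to the highest-versioned value seen."""
    return max(nodes[i] for i in replies)

nodes = [(0, "old")] * N
write(nodes, 1, "new", acks=[0, 1])  # W = 2 of N = 3 nodes acknowledge

# Because R + W > N, every possible read quorum of R = 2 nodes overlaps
# the write set, so each read sees at least one copy of the new value.
for replies in itertools.combinations(range(N), R):
    assert quorum_read(nodes, replies) == (1, "new")
```

The overlap is the whole trick: any R-node subset and any W-node subset of N nodes must intersect whenever R + W > N.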
In a conventional parallel DBMS, N = R = W, which is to say N-R = N-W = 0. Thus, a single hardware failure causes data operations to fail too. For some applications — e.g., highly parallel OLTP web apps — that kind of fragility is deemed unacceptable.
On the other hand, if W < N, it is possible to construct edge cases in which two or more consecutive failures cause incorrect data values to actually be returned. So you want to clean up any discrepancies quickly and bring the system back to a consistent state. That is where the idea of eventual consistency comes in, although you definitely can — and in some famous NoSQL implementations actually do — have eventual consistency in a system that is not RYW consistent.
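To see why the inequality itself matters, here is a counterexample in the same hypothetical toy model, with R dialed down so that R + W = N rather than R + W > N. A read quorum can then miss the write set entirely and return a stale value:

```python
# Illustrative toy model (not from any particular system): N = 3 copies,
# W = 2 write quorum, R = 1 read quorum, so R + W = N, not > N.
N, W, R = 3, 2, 1

nodes = [(0, "old")] * N      # each node holds a (version, value) pair
for i in [0, 1]:              # a write acknowledged by W = 2 nodes
    nodes[i] = (1, "new")

# The single-node read quorum happens to be node 2, which missed the write.
stale = max(nodes[i] for i in [2])
assert stale == (0, "old")    # RYW consistency is violated
```

This is the kind of discrepancy that eventual-consistency machinery exists to clean up.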
Much technology goes into eventual consistency, as well as into the data distribution and polling in the first place. And in tunable systems, the choices of N, R, and W — perhaps on a “table” by “table” basis — can get pretty interesting. I’m ducking all those subjects for now, however, not least because of how much I still have to learn about them.
One point I will note, however, is this — RYW consistency and table joins make for awkward companions. If you want to join two tables, each of them distributed across some kind of parallel cluster, there are only two possibilities:
- The data you need to join happens to be co-located on the same nodes.
- You’re going to have an awful lot of network traffic.
In an R = W = N scenario, co-location may be realistic. But when R < N and W < N, a join can return incorrect results even when both of the tables being joined would have been read correctly.
In our example above, we had N = 3 and R = W = 2. Single-table RYW consistency was ensured. But suppose you join two records, each of which had been written correctly to 2 out of 3 nodes — but with only 1 node being correct about both records. Then only that 1 node out of 3 will return a correct value for the join, and badness will ensue.
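That failure mode can also be sketched concretely. In this hypothetical toy setup (illustrative names and values, assuming N = 3, R = W = 2 as above), each record has been correctly written to 2 of 3 nodes, but only node 0 holds the current version of both — so a join evaluated locally per node goes wrong on a majority of nodes:

```python
N, R = 3, 2

# (version, value) copies per node; each record is current on 2 of 3 nodes,
# but only node 0 is current for BOTH records.
record_a = [(1, "A-new"), (1, "A-new"), (0, "A-old")]  # nodes 0, 1 current
record_b = [(1, "B-new"), (0, "B-old"), (1, "B-new")]  # nodes 0, 2 current

def local_join(i):
    """A join evaluated locally on node i sees only the copies it holds."""
    return (record_a[i][1], record_b[i][1])

correct = ("A-new", "B-new")
good_nodes = [i for i in range(N) if local_join(i) == correct]
assert good_nodes == [0]   # only 1 of 3 nodes returns the correct joined row
```

Reading each record through its own R = 2 quorum would still resolve correctly; it is the locally joined rows that no 2-node quorum can agree on correctly.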
Any architecture I can think of to circumvent that problem results in — you guessed it — an awful lot of network traffic.
And that, folks, is a big part of why the NoSQL folks are so negative about joins.
Related links:

- Query fault-tolerance
- Huan Liu’s skepticism as to whether RYW consistency causes a significant performance hit
- Daniel Abadi’s views on NoSQL design tradeoffs