May 1, 2010

Read-your-writes (RYW), aka immediate, consistency

In which we reveal the fundamental inequality of NoSQL, and why NoSQL folks are so negative about joins.

Discussions of NoSQL design philosophies tend to quickly focus in on the matter of consistency. “Consistency”, however, turns out to be a rather overloaded concept, and confusion often ensues.

In this post I plan to address one essential subject, while ducking various related ones as hard as I can. It’s what Werner Vogels of Amazon called read-your-writes consistency (a term to which I was actually introduced by Justin Sheehy of Basho). It’s either identical or very similar to what is sometimes called immediate consistency, and presumably also to what Amazon has recently called the “read my last write” capability of SimpleDB.

This is something every database-savvy person should know about, but most so far still don’t. I didn’t myself until a few weeks ago.

Considering the many different kinds of consistency outlined in the Werner Vogels link above or in the Wikipedia consistency models article (whose names may not always be used in, er, a wholly consistent manner), I don’t think there’s much benefit to renaming read-your-writes consistency yet again. Rather, let’s just call it RYW consistency, come up with a way to pronounce “RYW”, and have done with it. (I suggest “ree-ooh”, which evokes two syllables from the original phrase. Thoughts?)

Definition: RYW (Read-Your-Writes) consistency is achieved when the system guarantees that, once a record has been updated, any attempt to read the record will return the updated value.

Here a “record” can be a row, a key-value pair, or any similar unit of data. An “update” can be whichever of insert/append or true change the system supports.

A conventional relational DBMS will almost always feature RYW consistency. Some NoSQL systems feature tunable consistency, in which, depending on your settings, RYW consistency may or may not be assured.

The core ideas of RYW consistency, as implemented in various NoSQL systems, are:

That bolded part is the key point, and I suggest that you stop and convince yourself of it before reading further.

Example: Let N = 3, W = 2, and R = 2. Suppose you write a record successfully to at least two nodes out of three. Further suppose that you then poll all three of the nodes. Then the only way you can get two values that agree with each other is if at least one of them — and hence both — return the value that was correctly and successfully written to at least two nodes in the first place.
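The example can be checked by brute force. The sketch below is only an illustration, not any particular system's implementation. It assumes one reading of the symbols: N is the number of replicas of a record, W is the number that must hold the new value for a write to count as successful, and R is the number of polled replicas that must agree before a read is accepted. The function names and the "old"/"new" placeholders are made up for the illustration, and every replica that missed the write is assumed to still hold the same old value.

```python
from collections import Counter
from itertools import combinations


def accepted_values(replicas, r):
    """Values that at least r of the polled replicas agree on."""
    counts = Counter(replicas)
    return {value for value, count in counts.items() if count >= r}


def stale_read_impossible(n, w, r):
    """True if no successful write placement lets r replicas agree on the old value.

    A write counts as successful once at least w of the n replicas hold "new";
    every other replica is assumed to still hold "old".
    """
    for k in range(w, n + 1):
        for written in combinations(range(n), k):
            replicas = ["new" if node in written else "old" for node in range(n)]
            if "old" in accepted_values(replicas, r):
                return False
    return True


print(stale_read_impossible(3, 2, 2))  # the example: N = 3, W = 2, R = 2
print(stale_read_impossible(3, 1, 1))  # here R + W <= N
```

Under these assumptions the first call prints True and the second prints False: at most N - W replicas can be stale, so a stale value can only reach R agreeing copies when R + W is no greater than N.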

In a conventional parallel DBMS, N = R = W, which is to say N-R = N-W = 0. Thus, a single hardware failure causes data operations to fail too. For some applications — e.g., highly parallel OLTP web apps — that kind of fragility is deemed unacceptable.

On the other hand, if W < N, it is possible to construct edge cases in which two or more consecutive failures cause incorrect data values to actually be returned. So you want to clean up any discrepancies quickly and bring the system back to a consistent state. That is where the idea of eventual consistency comes in, although you definitely can (and in some famous NoSQL implementations actually do) have eventual consistency in a system that is not RYW consistent.

Much technology goes into eventual consistency, as well as into the data distribution and polling in the first place. And in tunable systems, the choices of N, R, and W — perhaps on a “table” by “table” basis — can get pretty interesting. I’m ducking all those subjects for now, however, not least because of how much I still have to learn about them.

One point I will note, however, is this — RYW consistency and table joins make for awkward companions. If you want to join two tables, each of them distributed across some kind of parallel cluster, there are only two possibilities:

In an R = W = N scenario, co-location may be realistic. But when R < N and W < N, a join can return incorrect results even when both of the tables being joined would have been read correctly.

In our example above, we had N = 3 and R = W = 2. Single-table RYW consistency was ensured. But suppose you join two records, each of which had been written correctly to 2 out of 3 nodes — but with only 1 node being correct about both records. Then only that 1 node out of 3 will return a correct value for the join, and badness will ensue.
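To make that scenario concrete, here is a small sketch with N = 3 and R = W = 2. The replica contents, the value names, and the `quorum_read` helper are all hypothetical. It also assumes that each node joins only its own local copies and that join results are accepted by the same R-way agreement rule as single-record reads, which is one possible design and not necessarily how any real system behaves.

```python
from collections import Counter


def quorum_read(values, r):
    """Return the value that at least r replicas agree on, or None if none does."""
    value, count = Counter(values).most_common(1)[0]
    return value if count >= r else None


R = 2
# Hypothetical replica contents: record A reached nodes 0 and 1, record B nodes 1 and 2.
nodes = [
    {"A": "a_new", "B": "b_old"},
    {"A": "a_new", "B": "b_new"},
    {"A": "a_old", "B": "b_new"},
]

# Single-record reads: two replicas agree on the new value in each case.
read_a = quorum_read([node["A"] for node in nodes], R)
read_b = quorum_read([node["B"] for node in nodes], R)

# Each node joins its own local copies; the results are then compared like any read.
local_joins = [(node["A"], node["B"]) for node in nodes]
joined = quorum_read(local_joins, R)

print(read_a, read_b)
print(joined)
```

Both single-record reads come back with the new values, but the three local join results all differ, so no joined result reaches two-way agreement and `joined` is None. Only node 1 produces the correct pair.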

Any architecture I can think of to circumvent that problem results in — you guessed it — an awful lot of network traffic.

And that, folks, is a big part of why the NoSQL folks are so negative about joins.

Comments

12 Responses to “Read-your-writes (RYW), aka immediate, consistency”

  1. rc on May 1st, 2010 5:11 am

    Is it just joins or does it affect all queries that return more than one row or aggregate over more than one row?

    Is it possible to do a

    select country, sum(amount)
    from sales
    group by country

    query in a R+W > N system in a “read consistent” way? And with “read consistent” I mean “read consistent” in the way Oracle defines “read consistency” without causing an enormous amount of network traffic.

    Oracle means with “read consistency” that if you fire this group-by-query at 8:12:05 PM you will get back the results of how sales was at 8:12:05 PM, even if the query has to scan millions of records and other people are modifying the sales table (with or without committing those modifications) while this long running query is running.

  2. Curt Monash on May 1st, 2010 7:52 am

    @rc,

    These systems are designed for single-record lookup, at least if you want accuracy guarantees. Once you start going after multiple records at once, R+W>N loses its power to guarantee you accurate results.

  3. RC on May 1st, 2010 8:13 am

    Ok, so if you can’t do an accurate multiple record lookup in a reasonable amount of time you can’t join because joining means looking up at least two records.

    And db’s like mongodb and couchdb circumvent this limitation (only partially but still useful) by using hierarchical records that you can load with a lot of stuff and they call those records “documents”.

    It is all becoming more clear to me. Thanks!

  4. unholyguy on May 1st, 2010 10:04 am

    I don’t think it’s so much you get inconsistent joins, but that you don’t really get database side joins at all. If you need to join two records, you would do so in the application, do two lookups, both of which presumably return correct values.

  5. Curt Monash on May 1st, 2010 5:40 pm

    @unholyguy,

    My point is that system designers have three choices:

    1. Allow incorrect joins
    2. Allow really slow joins
    3. Don’t allow joins at all

    Your point is that they choose #3.

    We’re not contradicting each other. ;)

  6. Jerry Leichter on May 1st, 2010 10:51 pm

    There’s no magic here. NoSQL systems assume that they know up front what joins will be important (or, more directly, they know what combinations of attributes/columns/however you want to describe them show up in queries of interest). They then make sure that all those combinations are actually stored together. Since queries may have partially overlapping sets of contributing columns, this generally leads to storing the same data more than once. (As a performance optimization, this is fine. But when it leads some of the NoSQL advocates to say that denormalization is somehow a good in and of itself, it’s nonsense.)

    Anyway: In getting your read advantage by writing the data multiple times, what happens to your R+W>N algorithm? Do you wait for the data to be “stable” (more than W copies finished) for all the copies? That doesn’t seem practical in these heavily-write-oriented systems. But if you *don’t* do that, you potentially lose RYW consistency *when considering two or more queries*: If the queries hit different write sets for the same data, one may have finished writing a new value while the other didn’t.

    Yes, these systems are “eventually consistent”: The writes eventually all complete. (Well, maybe. If a hardware failure makes it impossible to complete the write set for one query while another has already completed, how do we recover? Assume a write log so that we eventually write the data? What if *that* fails? With independent write sets for the same data, you can’t just fall back on “the transaction never happened”.)

    While I’m sure there are people who *are* thinking about this, too much of the NoSQL stuff is done by people who don’t think about what correctness conditions they actually have – and in fact make a *point* of not wanting to deal with the formalisms.

    — Jerry

  7. Daniel Abadi on NoSQL design tradeoffs | DBMS2 -- DataBase Management System Services on May 2nd, 2010 1:30 am

    [...] a thought-provoking post, Daniel Abadi points out NoSQL-related terminological problems similar to the ones I just railed against, and argues To me, CAP should really be PACELC — if there is a partition (P) how does the [...]

  8. La revue de presse de la presse et des blogs BI | www.LeGrandBI.com on May 2nd, 2010 2:59 pm

    [...] Par Curt Monash, DBMS2, le 01 mai 2010. In which we reveal the fundamental inequality of NoSQL, and why NoSQL folks are so negative about joins. Lire l’article [...]

  9. VoltDB finally launches | DBMS2 -- DataBase Management System Services on June 27th, 2010 4:34 am

    [...] to get around 2PC performance issues, they sounded a lot like eventual consistency. Maybe tunable RYW consistency isn’t in the cards, but at least there’s a NoSQL-like possibility with [...]

  10. Cassandra technical overview | DBMS2 -- DataBase Management System Services on July 6th, 2010 5:10 am

    [...] RYW consistency, most commonly with N = 3 and R = W = 2. [...]

  11. The Clustrix story | DBMS2 -- DataBase Management System Services on July 29th, 2010 7:18 am

    [...] Sierra is fully ACID-compliant, with no eventual consistency or RYW consistency story. The default number of copies of each datum is two, and they’re kept consistent via [...]

  12. How immediate consistency works | DBMS 2 : DataBase Management System Services on August 26th, 2012 8:55 pm

    [...] Read-your-writes (RYW) consistency [...]
