Let’s start from some reasonable premises.
- No technology category name is ever perfect.
- It’s particularly hard to describe NoSQL (Not Only SQL) accurately, given the basic confusion as to what NoSQL is all about.
- That said, it seems pretty clear that NoSQL is about making big websites (and perhaps other cloud-like installations) run and scale.
- Dwight Merriman (founder/CEO of MongoDB vendor 10gen) is heading in the right direction when he says that the unifying ideas of NoSQL are that you do away with transactions and joins. But if he’s ever said something like “NoSQL is Foo without joins and transactions,” I don’t know what Foo is.
- Actually, I do know what Foo is – Foo is what happens when lots of people want to get small amounts each of information in or out of a database at the same time. I just don’t know what Foo is called.
- Obviously, Foo is a lot like OLTP (OnLine Transaction Processing). However, it would be pretty silly for Foo to actually be OLTP, given that one of the core points of NoSQL is that you don’t have transactions.
- It not just the “T” part of OLTP that’s fried. Calling something “OnLine” only makes sense as long as offline is an option, and offline transaction processing has been obsolete for a very long time.*
*Sure, if you strain you can talk yourself into exceptions. But the point stands.
So we need a name for Foo, where Foo is what happens when lots of people want to get small amounts each of information in or out of a database at the same time. Thus, three major subcategories of more-or-less disk-based Foo are:
- No-compromises ACID-compliant relational OLTP
- Sharded MySQL
There may be some more purely memory-centric versions too, but let’s put those aside for the moment.
Absent a better idea, I can squeeze Foo into yet another four-letter acronym:
HVSP (High-Volume Simple Processing)
That’s as imperfect as any other category name, and an awkward mouthful to boot. So I’d love to hear a better one; if you have such, please share it! In the mean time, I think “HVSP” has merit because:
- The “Processing” part should be noncontroversial.
- “High-Volume” is inherent to the challenge. If RDBMS scale well enough for your use case, using something less powerful is probably silly.* Similarly, while Oracle shines at high-volume OLTP workloads, there are many cheaper DBMS that do a fine job of OLTP at lower volumes.
- “Simple” is the core principle of NoSQL systems, which drop joins and transactions as being too much foofarah. That only makes sense at all under the assumption that you have bone-simple queries and updates, so that programming around the lack of joins and transactions isn’t all that much of a burden.
- Something similar is true of sharded MySQL.
- Less obviously, “simple” is a core principle of relational OLTP as well. The point of the relational model is to cap the complexity of data operations, or more precisely to hide that complexity from programmers.
- And overloading the word “simple” a bit, it’s fair to say that if you’re reading or writing one record at a time, you’re doing something relatively simple, at least as opposed to what you do in analytic processing. The OLTP vs. OLAP distinction is preserved in this name change.
- The whole thing matches my definition above, namely “what happens when lots of people want to get small amounts each of information in or out of a database at the same time.”
*Assuming, of course, that rows-and-tables are a good metaphor for your data structure in the first place.
Systems I’m leaving out of the HVSP and hence also NoSQL categories include:
- Hadoop and other batch-oriented MapReduce. Hadoop isn’t part of NoSQL. I’m pretty sure that Cloudera CEO Mike Olson agrees with me.
- More generally, non-SQL data stores that don’t meet the HVSP criteria. Dave Kellogg stretches things when he claims that MarkLogic is a NoSQL system. (But then, that was in a post where he seemingly praised a train wreck of an article.)
But hey – what good is a categorization if it doesn’t leave some things out?