Last week I visited with James Phillips of Couchbase, Max Schireson and Eliot Horowitz of 10gen, and Todd Lipcon, Eric Sammer, and Omer Trajman of Cloudera. I guess it’s time for a NoSQL round-up post.
Views of the NoSQL market horse race are reasonably consistent, with perhaps some elements of “Where you stand depends upon where you sit.”
- As James tells it, NoSQL is simply a three-horse race between Couchbase, MongoDB, and Cassandra.
- Max would include HBase on the list.
- Further, Max pointed out that metrics such as job listings suggest MongoDB has the most development activity, and Couchbase/Membase/CouchDB perhaps have less.
- The Cloudera guys remarked on some serious HBase adopters.*
- Everybody I spoke with agreed that Riak had little current market presence, although some Basho guys could surely be found who’d disagree.
*I hope to do a separate post on HBase adoption soon. In connection with that, any info on HBase adoption by Facebook (said to be very heavy), Twitter, et al. would be much appreciated.
The reasons for using NoSQL of course are, in some order, dynamic schemas, scale-out, and open source. I find the scale-out argument somewhat bogus,* but the data-model argument is very real. Depending on whom you talk with, the most important point about dynamic schemas may actually be that they’re changeable, or it may just be that you don’t have to specify a schema at the time of initial application design. MongoDB gets particular praise as a good platform on which to throw something together quickly, although predictions as to how far the application will then scale may differ depending on whether you’re talking with, say, Max or Todd.
*It’s fair to say that NoSQL systems are more proven in scale-out than most relational DBMS. Even so, I would cringe at any line of reasoning that concluded one should adopt NoSQL because it is more mature than relational alternatives.
Finally, I was perhaps too extreme when I suggested there was no good reason for Oracle to have adopted the major key/minor key approach it took in its NoSQL offering. Todd offered a reason why that approach – which he characterized as similar to Project Voldemort’s – could make sense:
- If you have some kind of global secondary index, it’s hard to maintain that index consistently without what amounts to distributed transactions.
- If you want to avoid the overhead of those, one alternative is a column-group system such as HBase or Cassandra. Those have no indexes at all, except in the sense that a column is its own index.
- Another alternative is to load as much indexing information as you can into the key of a key-value store.
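To make the third option concrete, here is a minimal sketch assuming a toy in-memory store. The class and method names (`ToyOrderedKV`, `put`, `scan`) are hypothetical and are not the API of Oracle NoSQL, Project Voldemort, or any other product. The major key alone decides which partition a record lands on, so everything under one major key lives together. The minor key carries the indexing information (here, an order date) and is kept sorted, so a query like “this user’s orders by date” becomes a single-partition range scan instead of a lookup in a global secondary index that would need distributed transactions to stay consistent.

```python
import bisect
import zlib
from typing import Any, Iterator


class ToyOrderedKV:
    """Hypothetical sorted key-value store partitioned by major key.

    Keys are (major, minor) pairs of string tuples. Only the major key is
    hashed to choose a partition; within a partition, keys are kept in
    sorted order so that minor-key prefix scans are cheap.
    """

    def __init__(self, num_partitions: int = 4) -> None:
        # Per partition: a sorted list of (major, minor) keys plus a value map.
        self.keys: list[list[tuple[tuple[str, ...], tuple[str, ...]]]] = [
            [] for _ in range(num_partitions)
        ]
        self.values: list[dict] = [{} for _ in range(num_partitions)]

    def _partition(self, major: tuple[str, ...]) -> int:
        # crc32 rather than hash() so placement is stable across runs.
        return zlib.crc32("/".join(major).encode()) % len(self.keys)

    def put(self, major: tuple[str, ...], minor: tuple[str, ...], value: Any) -> None:
        p = self._partition(major)
        key = (major, minor)
        if key not in self.values[p]:
            bisect.insort(self.keys[p], key)  # keep the partition sorted
        self.values[p][key] = value

    def scan(
        self, major: tuple[str, ...], minor_prefix: tuple[str, ...] = ()
    ) -> Iterator[tuple[tuple[str, ...], Any]]:
        """Yield (minor, value) for keys under `major` whose minor key starts with `minor_prefix`."""
        p = self._partition(major)
        ks = self.keys[p]
        n = len(minor_prefix)
        # Matching keys are contiguous in sort order, starting here.
        i = bisect.bisect_left(ks, (major, minor_prefix))
        while i < len(ks) and ks[i][0] == major and ks[i][1][:n] == minor_prefix:
            yield ks[i][1], self.values[p][ks[i]]
            i += 1


store = ToyOrderedKV()
# Major key: the user. Minor key: index information (section, order date, order id).
store.put(("user", "alice"), ("orders", "2024-01-05", "o17"), {"total": 40})
store.put(("user", "alice"), ("orders", "2024-01-03", "o12"), {"total": 25})
store.put(("user", "alice"), ("profile",), {"city": "Boston"})
store.put(("user", "bob"), ("orders", "2024-01-04", "o15"), {"total": 10})

# Alice's orders in date order, with no separate index to keep consistent.
for minor, value in store.scan(("user", "alice"), ("orders",)):
    print(minor, value)
# prints:
# ('orders', '2024-01-03', 'o12') {'total': 25}
# ('orders', '2024-01-05', 'o17') {'total': 40}
```

The trade-off is built into the design: this only helps for queries that start from a known major key. A query such as “all orders on a given date, across every user” would still mean scanning every partition, or maintaining exactly the kind of global secondary index the approach is meant to avoid.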
I’d be interested to learn about the Couchbase and MongoDB answers to that challenge.