I’ve written some snarky things about the “NoSQL” concept – or at least the moniker. (Carl Olofson’s term “non-schematic databases” seems less bad.) Yet I’m actually favorable about the increasing use of SQL alternatives. Perhaps I should pull those thoughts together.
Relational database management systems were invented to let you use one set of data in multiple ways, including ways that are unforeseen at the time the database is built and the first applications against it are written. In almost all cases, RDBMS are the best way to manage data of that nature. The increasing diversity in kinds of RDBMS – especially on the analytic side – just strengthens the point. Also, RDBMS are more mature than most competing technologies. And so, for multiple reasons, your highest-value data often belongs in an RDBMS.
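To make the "one set of data, multiple uses" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and its rows are all hypothetical, invented purely for illustration; the point is only that the second query needs no change to how the data was stored.

```python
import sqlite3

# Hypothetical order-entry table; names and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
    [("acme", "east", 120.0), ("acme", "west", 80.0), ("bolt", "east", 50.0)],
)

# The use the first application was written for: one customer's orders.
acme_orders = conn.execute(
    "SELECT id, amount FROM orders WHERE customer = ? ORDER BY id", ("acme",)
).fetchall()

# A use nobody planned for: revenue by region, against the same table.
revenue_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(acme_orders)
print(revenue_by_region)
```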
The main reason I wrote “often” instead of “always” is that some of your highest-value data is in formats that don’t fit well into an RDBMS at all. The most obvious example is text. Text data shouldn’t be shoehorned into the relational model, and to date it often has been best to manage text entirely outside of RDBMS.
Even lower-value data often belongs in RDBMS. eBay has huge volumes of log files stored in RDBMS. Yahoo and Facebook both prefer Hadoop over traditional RDBMS – but both are also building capabilities into Hadoop that will pretty much amount to a new RDBMS.
Science provides some pretty compelling use cases for non-SQL-oriented DBMS. So does health care. But that’s not the kind of thing the NoSQL folks seem to focus on. Rather, “NoSQL” seems mainly to encompass three kinds of systems:
- Key-value stores, such as Cassandra or BigTable. So far as I can tell, a key-value store is just a substitute for a transaction-processing DBMS, inferior in most ways except scalable performance, where it can shine. As an additional benefit, a key-value store frees you from that pesky SQL programming you never learned in school. What’s more, if you can’t stabilize your schema, a key-value store lets you get some level of database programming done anyway.
- Document managers such as CouchDB or MongoDB. I haven’t figured out how those are different from low-volume distributed file systems, or why anybody should care about them.
- DBMS imitations built on top of HDFS (Hadoop Distributed File System). For the most part, I think those will wind up talking SQL.
So it seems that, at least for now, the legit part of the NoSQL movement is the distributed key-value stores. Frankly, even if transactional data is persisted in a key-value store, it should wind up in an RDBMS, whether OLTP or analytic. But even so, the big web companies seem to have demonstrated that key-value stores have very legitimate uses.
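Here is a rough sketch of that hand-off, again in Python. A plain dict stands in for the distributed key-value store, which is an assumption made for brevity (real stores such as Cassandra have their own client APIs), and the transaction records, the `txns` table, and its column names are hypothetical.

```python
import sqlite3

# Stand-in for a key-value store: a plain dict (an assumption for illustration).
# Note the second record carries an extra field; no schema had to change.
kv_store = {
    "txn:1": {"user": "ann", "amount": 30.0},
    "txn:2": {"user": "bob", "amount": 45.0, "coupon": "SPRING"},
    "txn:3": {"user": "ann", "amount": 25.0},
}

# What the key-value store is good at: fetching one record by its key.
one_txn = kv_store["txn:2"]

# Winding up in an RDBMS: load the records into a table so they can be
# queried in ways the keys were never designed for.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txns ("
    "txn_key TEXT PRIMARY KEY, username TEXT, amount REAL, coupon TEXT)"
)
conn.executemany(
    "INSERT INTO txns VALUES (?, ?, ?, ?)",
    [(k, v["user"], v["amount"], v.get("coupon")) for k, v in kv_store.items()],
)

# An ad hoc question that would require a full scan of the key-value store.
totals = conn.execute(
    "SELECT username, SUM(amount) FROM txns GROUP BY username ORDER BY username"
).fetchall()

print(one_txn)
print(totals)
```

Records missing the optional `coupon` field simply land as NULL in the table, which is one plausible way to reconcile an unstable schema with a relational one.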