December 12, 2009

The legit part of the NoSQL idea

I’ve written some snarky things about the “NoSQL” concept – or at least the moniker. (Carl Olofson’s term “non-schematic databases” seems less bad.) Yet I’m actually favorable about the increasing use of SQL alternatives. Perhaps I should pull those thoughts together.

Relational database management systems were invented to let you use one set of data in multiple ways, including ways that are unforeseen at the time the database is built and the first applications against it are written. In almost all cases, RDBMS are the best way to manage data of that nature. The increasing diversity in kinds of RDBMS – especially on the analytic side – just strengthens the point. Also, RDBMS are more mature than most competing technologies. And so, for multiple reasons, your highest-value data often belongs in an RDBMS.

The main reason I wrote “often” instead of “always” is that some of your highest-value data is in formats that don’t fit well into an RDBMS at all. The most obvious example is text. Text data shouldn’t be shoehorned into the relational model, and to date it often has been best to manage text entirely outside of RDBMS.

Even lower-value data often belongs in RDBMS. eBay has huge volumes of log files stored in RDBMS. Yahoo and Facebook both prefer Hadoop over traditional RDBMS – but both are also building capabilities into Hadoop that pretty much will amount to a new RDBMS.

Science provides some pretty compelling use cases for non-SQL-oriented DBMS. So does health care. But that’s not the kind of thing the NoSQL folks seem to focus on. Rather, “NoSQL” seems mainly to encompass three kinds of systems:

So it seems that, at least for now, the legit part of the NoSQL movement is the distributed key-value stores. Frankly, even if transactional data is persisted in a key-value store, it should wind up in an RDBMS, whether OLTP or analytic. But even so, the big web companies seem to have demonstrated that key-value stores have very legitimate uses.

Comments

20 Responses to “The legit part of the NoSQL idea”

  1. Chris Anderson on December 12th, 2009 2:47 am

    Document databases have a rich heritage. CouchDB is inspired by Lotus Notes’ replication model.

    Our sweet spot is data that needs to be available on multiple machines despite unreliable networks. This is increasingly describing more of the computing world.

    CouchDB will require you to rethink some assumptions, but our data model is so flexible that you’ll start to see it across the spectrum of applications.

  2. Dan Bikle on December 12th, 2009 3:11 am

    I like the idea of easy synchronization of data between my app running on my end-users’ phones and my app running in the cloud. CouchDB is suited for this challenge. Once CouchDB runs well in mobile browsers, I’ll be pleased.

  3. Curt Monash on December 12th, 2009 3:30 am

    20 some odd years ago I was one of the biggest cheerleaders for Lotus Notes. That said, I don’t understand what a document-oriented data model necessarily has to do with being smart about synchronizing occasionally- or unreliably-connected computers. What am I missing?

  4. Bryan McCormick on December 12th, 2009 3:51 am

    What’s with the snarky comments about not learning SQL in school?

    Do a quick survey of where real “Nosql” implementations are being created and used and you’re going to be looking at a who’s who of the top tech companies. Google is obvious, with BigTable and the original MapReduce systems, Yahoo with Hadoop, Amazon with Dynamo and all of their EC2-related systems, Microsoft/Powerset with HBase, Facebook with Cassandra. Not exactly the types of places that just hire anyone off the street.

    There seem to be a lot of people who like to rail on these new systems and claim that they’re useless. The point behind all of these systems is to create a data repository that meets the transaction, reliability, and scalability requirements of the application. I’m sure there are a lot of people out there using them unnecessarily because they want to be buzzword compliant. But when the system needs to scale beyond 10, 100, and up to 1000s of servers to meet your scalability requirements, then you just might find yourself having to look at something other than an RDBMS.

    You can do something similar to what Flickr does and partition your data like crazy, running thousands of master-master pairs. Technically they’re still running in a MySQL RDBMS, but in order to get a view of the complete dataset they resort to a search engine. Most applications’ requirements just won’t fit into this model.

    In the end, I just don’t understand the fuss. Nosql is the new EJB: overkill for most applications, but there are some that actually do need it. So let them have it. Everyone else can play for a while with the cool tech and then get back to what they were supposed to be doing.
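
    A minimal sketch of the kind of partitioning described above, assuming a hypothetical list of four shards and a simple hash-mod-N rule (this is not Flickr’s actual scheme). Each key lives on exactly one shard, so a single-key lookup touches one server, while any question about the complete dataset has to visit every shard; that is the gap the search engine fills in the Flickr example.

```python
import hashlib

# Hypothetical shard names; in practice each might be a master-master database pair.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

# In-memory dicts standing in for the per-shard databases.
stores = {name: {} for name in SHARDS}


def shard_for(key: str) -> str:
    """Pick a shard by hashing the key (simple hash-mod-N partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def put(key: str, value: str) -> None:
    """Write the value to the single shard that owns this key."""
    stores[shard_for(key)][key] = value


def get(key: str):
    """A single-key lookup touches exactly one shard; returns None if absent."""
    return stores[shard_for(key)].get(key)


def scan_all(predicate):
    """A query over the complete dataset must fan out to every shard."""
    return [v for store in stores.values() for v in store.values() if predicate(v)]
```

    The trade-off is visible in the code: `get` is cheap no matter how many shards exist, but `scan_all` grows with the number of shards.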

  5. Curt Monash on December 12th, 2009 4:24 am

    You can get into a software development career without a computer science degree. You can get a computer science degree without learning SQL.

    And both those things may be more likely for very smart people who come out of top schools than they are for more ordinary folks.

    By the way, I once was hired as a stock analyst, by a top firm, without knowing the first thing about accounting. Three years later I was ranked as the top stock analyst following the software and services industry. It’s not a perfect analogy (I quickly became pretty hardcore in the accounting area), but perhaps it’s somewhat illustrative even so. ;)

  6. Andy E on December 12th, 2009 12:50 pm

    The NoSQL movement is correct in saying you have to throw out old DBMS conventions in order to meet the scale of many new Web 2.0 and other OLTP applications today.

    You just don’t have to throw away SQL or ACID transactions.

    But the DBMS needs to do them both differently, especially ACID.

    Mike Stonebraker just published an article on the NoSQL databases here:

    http://cacm.acm.org/blogs/blog-cacm/50678-the-nosql-discussion-has-nothing-to-do-with-sql/fulltext

    And his latest company, VoltDB (www.voltDB.com) is a commercial implementation of what he describes in the article…SQL, relational DBMS, ACID transactions, scale-out/MPP, and very high-performance (e.g., 210,000 transactions per second on a 3-node cluster).

    Again…I’m talking OLTP apps; doc data stores are a different ball of wax.

  7. Chris Anderson on December 12th, 2009 1:14 pm

    @Curt CouchDB’s data model is only transactional in the scope of a single document. This is because replication can’t be expected to preserve larger transactional boundaries in a distributed (p2p) system.

    Document modeling is appropriate because the only way to get lockless operation from a system like this is with multi-version concurrency control. It’s a small step from there to wanting a richer data model than raw key/value storage.

    For more details you can read my explanation of the CouchDB storage format: http://jchrisa.net/drl/_design/sofa/_show/post/CouchDB-Implements-a-Fundamental-Algorithm
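
    A minimal sketch of lockless, revision-checked updates of the sort described above, assuming a hypothetical in-memory store (the names `DocStore`, `put`, `get` and `base_rev` are illustrative, not CouchDB’s API). Each write must name the revision it was based on; a stale revision is rejected instead of waiting on a lock, and older versions remain readable. As in the comment, the unit of atomicity is a single document.

```python
import uuid


class Conflict(Exception):
    """Raised when an update is based on a revision that is no longer current."""


class DocStore:
    """Toy in-memory document store with revision-checked (multi-version) updates.

    Every write appends a new version instead of overwriting, and a writer
    must name the revision it started from; no locks are taken.
    """

    def __init__(self):
        self._history = {}  # doc_id -> list of (revision, document), oldest first

    def get(self, doc_id, rev=None):
        """Return (revision, copy of document); the latest version unless rev is given."""
        versions = self._history[doc_id]
        if rev is None:
            rev, doc = versions[-1]
            return rev, dict(doc)
        for r, doc in versions:
            if r == rev:
                return r, dict(doc)
        raise KeyError(f"{doc_id}: no revision {rev}")

    def put(self, doc_id, doc, base_rev=None):
        """Write a new version if base_rev is the current revision (None for a new doc)."""
        versions = self._history.get(doc_id, [])
        current_rev = versions[-1][0] if versions else None
        if current_rev != base_rev:
            # Another writer got there first; the caller must re-read and retry.
            raise Conflict(f"{doc_id}: current revision is {current_rev}, not {base_rev}")
        new_rev = uuid.uuid4().hex
        self._history[doc_id] = versions + [(new_rev, dict(doc))]
        return new_rev
```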

  8. Hans on December 12th, 2009 6:30 pm

    I know next to nothing about NoSQL as an organization, but are they selling and/or promoting anything in particular?

    Prior to your posts, I assumed it was just a search keyword for people interested in data management outside of relational. More along the lines of “hey, there’s more to the picture than ACID and relational”. Although this may seem obvious, the vast majority of database users know exactly one product and its accompanying literature. Or they know SQL and maybe the various functions and usage differences between a few relational products. So they may be interested to hear about other options.

    I had thought that NoSQL was just people getting together to look at alternatives to what they’re using now. But you seem to be asking them questions as though they have some particular goal.

  9. Curt Monash on December 12th, 2009 10:29 pm

    Hans,

    In my opinion, there’s no more substance to NoSQL than what you’re suggesting. However, some people may have attempted to get publicity suggesting there was more to it. And whether or not this was their intent, it’s what they accomplished. :)

  10. Unholyguy on December 12th, 2009 11:46 pm

    It’s interesting to see a bunch of very bright people take a fresh approach to the problem of data management and challenge the status quo.

    I agree that the movement seems to lack a central theme. From my vantage point NoSQL seems to bundle several relatively unrelated concepts, and is more a reaction against traditional concepts and traditions in data management.

    It’s best to judge the value of the concepts relatively independently and forget the moniker. I think some of these themes will evolve back to something very similar to traditional approaches while others will have lasting impact.

    Some of the themes I see (note I am not saying I agree with them, I am just parroting and summarizing)

    – Hardware commoditization has reached a point where hardware scalability is becoming the only relevant factor for performance. Software licensing costs have not decreased in lockstep with hardware costs; as a result, free/open source solutions are becoming more attractive. Why give X amount of money to a software vendor when I can give X/N to a hardware vendor today?

    – Modular design, shared frameworks, and pluggable APIs will triumph over monolithic engines

    – SQL is not that great a language for querying and managing data, we can do better by supporting a wide variety of procedural languages

    – Data models / schemas are not that great of an abstraction layer for accessing data, we can do better

    – Unstructured data is becoming more and more important; less and less data is tabular in nature

    – Strict ACID compliance is becoming less important, almost more the exception than the rule

    – Traditional RDBMS approaches are too slow with regard to time to market, due to all the upfront design work and data modeling. Better to design less, get to market faster, and deal with the extra overhead later.

  11. Curt Monash on December 13th, 2009 4:20 am

    Unholyguy,

    I think that’s an excellent summary.

    CAM

  12. Joe Harris on December 14th, 2009 9:50 am

    @Unholyguy

    “SQL is not that great a language for querying and managing data” Huh? What’s your definition of data?

    SQL may be unsuitable for many things, but querying and managing data (even just semi-structured data) is not in that list. SQL offers inherent opportunities for parallelism that no other mainstream language comes close to.

    I’m all in favour of “supporting a wide variety of procedural languages” but that’s just what they are: *procedural* (i.e. PHP, Perl, Python, Ruby, etc.) They do everything *row by agonising row* and can’t even fully utilise a multicore machine. Regardless of future development in their VMs the programming structures they use will constrain a large percentage of their workloads to a single thread.

    Joe

  13. RC on December 14th, 2009 3:03 pm

    It is easier to parallelize declarative/set-based code than imperative/procedural code but it is certainly not impossible. However a function has to live up to certain constraints to be parallelizable. A reduce function for instance has to be idempotent. ( http://www.mongodb.org/display/DOCS/MapReduce )
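
    To make the constraint concrete, a small Python sketch (not MongoDB’s own JavaScript API; the function names are hypothetical). A parallel engine may run reduce on partial results and then reduce those partials again, so a reducer has to give the same answer however the values were grouped. A sum passes both checks below; a naive mean passes the idempotency check but fails once partitions of different sizes are re-reduced, and carrying (sum, count) pairs instead fixes that.

```python
def reduce_sum(values):
    """Sum of values: safe to re-apply to its own partial results."""
    return sum(values)


def reduce_mean(values):
    """Naive mean: NOT safe to re-apply to partial results of unequal size."""
    return sum(values) / len(values)


def reduce_sum_count(pairs):
    """Mean-friendly alternative: reduce (sum, count) pairs; divide only at the very end."""
    return (sum(s for s, _ in pairs), sum(c for _, c in pairs))


def is_idempotent(reducer, values):
    """Reducing an already-reduced result again must change nothing."""
    return reducer([reducer(values)]) == reducer(values)


def rereduce_ok(reducer, part_a, part_b):
    """Reducing two partitions separately, then their partials, must match reducing all at once."""
    return reducer([reducer(part_a), reducer(part_b)]) == reducer(part_a + part_b)
```

    For example, the naive mean of the partials of `[1, 2, 3]` and `[10]` is 6, while the mean of all four values is 4.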

  14. RC on December 14th, 2009 5:43 pm

    I’m not so sure about the no-schema thing when it comes to Cassandra. The manual explains that you need to think about what queries you want to support effectively ahead of time and model appropriately. You can’t add indexes later. You also need to restart the database when you want to add, remove or rename a column family.

    It sounds like a quite rigid schema to me.

  15. Andy E on December 17th, 2009 2:18 pm

    Shouldn’t the NoSQL movement consider renaming itself? Among the traditional limitations they’re trying to overcome (performance, scalability, data model flexibility, etc.), it seems to me that data access language (SQL) is the least of the problems.

    The name is catchy, but a name that positions against traditional DBMS (e.g., like what Mike Stonebraker calls the “One-Size-Fits-All” databases) would probably get broader support than just the KV vendors.

  16. RC on December 17th, 2009 3:00 pm

    @Andy E

    It is not that I disagree but what name do you have in mind?

  17. M on December 30th, 2009 12:06 am

    As a casual observer, it seems to me that NoSQL systems are really targeting a very specific problem: low-latency queries on massively parallel data sets. The overhead to support distributed joins is a deal killer for this goal, so to simplify, they just did away with all joins, and with no joins there really isn’t a relational model, so why keep SQL around?

  18. Ling Qian on January 6th, 2010 5:29 am

    NoSQL stands for “Not Only SQL”. I think a KV store is good at managing data that has only one key, which is very common on the Internet. However, to manage multiple keys together, or relations among the data, SQL is probably the choice.

    To some extent, a KV store is good for some kinds of OLTP applications (and could become more popular there?), while SQL remains the master of the data warehouse.
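
    A minimal sketch of the distinction drawn above, using a plain Python dict as a stand-in for a key-value store and the standard-library sqlite3 module for the relational side; the tables and data are hypothetical. Fetching a value by its one key is trivial either way, but a question that relates two kinds of records is a single declarative join in SQL, whereas a plain key-value store would leave that work to application code.

```python
import sqlite3

# Key-value style: one key, one value; lookup by that key is all the store offers.
kv = {"user:1": {"name": "alice"}, "user:2": {"name": "bob"}}

# Relational style: the same users plus their orders, related through user_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id), total REAL);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
""")


def order_totals_by_name():
    """A question spanning two keys (users and orders) is one join in SQL."""
    return conn.execute(
        "SELECT u.name, SUM(o.total) FROM users u "
        "JOIN orders o ON o.user_id = u.id "
        "GROUP BY u.name ORDER BY u.name"
    ).fetchall()
```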

  19. Interesting trends in database and analytic technology | DBMS2 -- DataBase Management System Services on February 1st, 2010 2:01 pm

    [...] me Friday he knows of a bunch of others as well. And that’s all before we even get into the NoSQL kind of [...]

  20. Chris Bird’s blog is brilliant, and update-in-place is increasingly passe’ | DBMS2 -- DataBase Management System Services on February 25th, 2010 1:46 am

    [...] the NoSQL guys point out, some of today’s most demanding applications have extremely simple schemas. [...]
