March 28, 2014

NoSQL vs. NewSQL vs. traditional RDBMS

I am frequently asked questions that boil down to:

The details vary with context — e.g. sometimes MySQL is a traditional RDBMS and sometimes it is a new kid — but the general class of questions keeps coming. And that’s just for short-request use cases; similar questions for analytic systems arise even more often.

My general answers start:

In particular, migration away from legacy DBMS raises many issues: 

Except for the first, those concerns can apply to new applications as well. So if you’re going to use something other than your enterprise-standard RDBMS, you need a good reason.

Commonly, the good reason to change DBMS is one or more of:

NoSQL products commonly make sense for new applications. NewSQL products, to date, have had a harder time crossing that bar. The chief reasons for the difference are, I think:

And that brings us to the 762-gigabyte gorilla — in-memory DBMS performance — which is getting all sorts of SAP-driven marketing attention as a potential reason to switch. One can of course put any database in memory, provided only that it is small enough to fit in a single server’s RAM, or else that the DBMS managing it knows how to scale out. Still, there’s a genuine category of “in-memory DBMS/in-memory DBMS features”, principally because:

But Microsoft has now launched Hekaton, about which I long ago wrote:

I lack detail, but I gather that Hekaton has some serious in-memory DBMS design features. Specifically mentioned were the absence of locking and latching.

My level of knowledge about Hekaton hasn’t improved in the interim; still, it would seem that in-memory short-request database management is not a reason to switch away from Microsoft SQL Server. Oracle has vaguely promised to get to a similar state one of these years as well.

Of course, HANA isn’t really a short-request DBMS; it’s an analytic DBMS that SAP plausibly claims is sufficiently fast and feature-rich for short-request processing as well.* It remains to be seen whether that difference in attitude will drive enough sustainable product advantages to make switching make sense.

*Most obviously, HANA is columnar. And it has various kinds of integrated analytics as well.

Comments

16 Responses to “NoSQL vs. NewSQL vs. traditional RDBMS”

  1. John on March 30th, 2014 6:05 pm

Curt, do you see or know of any use cases where NoSQL and NewSQL compete, or where the use cases are differentiated? Also, it looks like almost all NewSQL vendors are in-memory.

  2. Curt Monash on April 1st, 2014 1:59 am

    John,

First, I wouldn’t at all say that NewSQL products are in-memory. VoltDB and MemSQL are in-memory, but Clustrix, Tokutek et al. are no more “in-memory” than NoSQL systems are (and less so than Aerospike). Note that Couchbase grew out of an in-memory technology as well.

    As for competition between NoSQL and NewSQL — absolutely, most especially in the case of aggregations simple enough that NoSQL could conceivably handle them. E.g., MemSQL’s flagship deal is to a considerable extent at Couchbase’s expense, namely at Zynga.

  3. aaron on April 1st, 2014 11:00 am

    I’m not following how you are parsing the differences. As I see it:
- There are many *for purpose* database engines focusing on specific use cases of data access. You call them NewSQL if they have a SQL API and don’t front a big data stack. A better name for these may be niche vendors; I’m not sure why anyone using them would care whether they are in-memory or not. More interesting is what use cases they manage well and how they manage that.
- There are a lot of DBMS APIs, generally SQL, on top of big-data stacks. These are generally big-data vendors trying to move up the feeding chain from commodity storage/compute services, though some are distributed databases branded as big data. These are niche, but the key here is a MapReduce-like processing layer.

    The way I tend to look at these is:
    - For purpose DBMS. This includes:
    – Specialized indexes (neo4j, lucene, replicated distributed KV caches, arguably doc stores fit here, as do Oracle and SQL Server in-memory column stores)
    – OLTP specialized models (could be monolithic such as volt or distributed such as couch)
    – query specialized models (perhaps with governance and other features allowing scaling) Teradata and Netezza and Vertica and SAS, etc.
    - Scalable general purpose DBMS. Mostly RDBMS – think Oracle/Sql Server/DB2
    - Alternative traditional general purpose DBMS. MySQLs, Postgres, etc.

    My experience is that the choice for the 95% of applications doesn’t really make much of a difference, and people reasonably do what works for them, what fits into the existing stacks as you describe. What is interesting is where there is friction.

    The friction could be cost, for example if a company is attempting to compete in data driven business on price of service.

    The friction could be a programmer and change driven development march. Programmers, especially in agile environments, have different goals from data modelers. This leads to database-as-persistence designs, and dynamic schema and XML and JSON storage are artifacts of that. This optimizes for app innovation; it uses database for locking and recovery, but not much for query or data consistency.

    The glamor outliers are the scale issues. These create performance and license and stability and special case management issues….

  4. aaron on April 1st, 2014 11:03 am

    Sorry – my indentation was eaten. This should say:

    The way I tend to look at the [DBMSes] is:
    1. For purpose DBMS. This includes:
    1.1 Specialized indexes (neo4j, lucene, replicated distributed KV caches, arguably doc stores fit here, as do Oracle and SQL Server in-memory column stores)
    1.2 OLTP specialized models (could be monolithic such as volt or distributed such as couch)
1.3 Query-specialized models (perhaps with governance and other features allowing scaling): Teradata, Netezza, Vertica, SAS, etc.
2. Scalable general-purpose DBMS. Mostly RDBMS – think Oracle/SQL Server/DB2.
3. Alternative traditional general-purpose DBMS. MySQL, Postgres, etc.

  5. Curt Monash on April 2nd, 2014 12:26 pm

    Aaron,

    I find your categories and phrasing confusing. E.g., you seem to be suggesting that NuoDB is less scalable than Oracle. I think you’re using “OLTP” the way I use “short-request”.

    I’m guessing that we see the choice of which product to use in similar ways, but I’m not really sure what you’re actually saying. Sorry!

  6. Rob Klopp on April 3rd, 2014 1:28 pm

    Hi Curt,

    A question and a comment…

I very much like your posts on dynamic schemas… but I wonder: what is the trade-off? If it’s dynamic vs. static, why would anyone ever want static? There must be a downside, I guess? Any ideas? That would help with the theme of your post on when to pick what.

I agree that an in-memory DBMS is not just a DBMS with data in memory. It is a DBMS engine that is optimized around the advantages of doing no I/O. The no-latch approach you mention is a great example, but there are others, including optimizations to exploit processor cache, the ability to use vector instruction sets, and optimizations for NUMA. There will be more coming, I expect, including optimizations around nonvolatile memory.

    Rob

  7. Curt Monash on April 3rd, 2014 8:57 pm

    Hi Rob,

    Queries are easiest and most flexible against a fixed, relational schema. That’s the essence of Ted Codd’s invention.

    What we’re seeing now is people trying to invent things that have rather competitive RDBMS functionality and performance while also having rather competitive functionality and performance in dynamic schema uses — and which furthermore have all that goodness against the same data.
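To make that trade-off concrete, here is a toy sketch — plain SQLite plus Python, not any vendor’s actual hybrid engine — of serving both styles against the same rows: fixed columns for ordinary relational queries, plus a JSON blob whose fields can vary from row to row.

```python
import json
import sqlite3

# Illustrative only: one table holding both a fixed relational part
# (id, name) and a dynamic-schema JSON part (attrs) over the same rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, attrs TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Ann", json.dumps({"city": "Boston", "tags": ["vip"]})),
        (2, "Bob", json.dumps({"city": "Austin"})),  # no 'tags' field: schema varies per row
    ],
)

# Relational access: plain SQL over the fixed columns.
names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]

# Dynamic-schema access: each row's attrs may carry different fields,
# so the application interprets them at read time (schema-on-read).
vips = [
    name
    for (name, attrs) in conn.execute("SELECT name, attrs FROM users")
    if "vip" in json.loads(attrs).get("tags", [])
]
print(names, vips)
```

The relational query is simple and optimizable precisely because the schema is fixed; the JSON query pushes interpretation to read time, which is flexible but gives the engine less to work with.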

  8. John on April 6th, 2014 6:45 pm

    Curt,

NoSQL has primarily taken databases back a couple of decades. Good for puts and gets (basically single-row operations). Users have to spend time upfront designing a model which serves a particular application. Schema-on-read sounds more like a feature. Reporting and analytics obviously are not in the sweet spot of NoSQL engines. For example, Datastax provides a Hadoop node and Hive for analytics, which is questionable. Will the user experience the speed using Hive? Cassandra doesn’t have much to offer when it comes to scanning the data, and Hive on top seems like a toy.

    thoughts?
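For what it’s worth, the puts/gets-vs.-reporting point can be sketched in a few lines of Python (a toy dict standing in for a key-value store; no real NoSQL engine involved): a point read is one hash lookup, while a report with no supporting index has to scan every document.

```python
# Toy stand-in for a key-value store: a hash map of documents.
store = {}

def put(key, doc):
    """Single-row write: one hash insert."""
    store[key] = doc

def get(key):
    """Single-row read: one hash lookup -- the NoSQL sweet spot."""
    return store[key]

put("user:1", {"name": "Ann", "spend": 120})
put("user:2", {"name": "Bob", "spend": 80})

# A reporting-style query (total spend) has no index to lean on:
# it must touch every document, i.e. do a full scan.
total = sum(doc["spend"] for doc in store.values())
print(get("user:1")["name"], total)  # Ann 200
```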

  9. Curt Monash on April 6th, 2014 8:00 pm

    John,

    I write about these trade-offs often. If it’s easy to design a relational schema that will serve well, one probably should do so, and use an RDBMS. But that’s not always the case, in practice and often even in theory.

Using a hammer to drive in a screw is rarely optimal, and the mismatch is often extreme enough for that metaphor to be apt.

  10. John on April 7th, 2014 4:26 pm

Agreed. A question:

Why would a user choose Cassandra or MongoDB over MemSQL, VoltDB, or Clustrix? All three of those vendors claim to support schema-on-read via JSON datatypes, and they are blazingly fast.

    Trying to learn and educate by your wonderful blog!!

  11. Curt Monash on April 7th, 2014 7:02 pm

    JSON support is much newer in the RDBMS than in MongoDB. In most cases, it also is kludgier and less performant than non-relational alternatives that are designed to be non-relational.

    MemSQL and VoltDB aren’t designed for databases that spill onto conventional persistent storage. (Basically; there are exceptions, as for example in my recent blog about the latest MemSQL release.)

    VoltDB has a weird programming model.

    Cassandra has long had geo-distribution.

    Etc.

  12. Rob Klopp on April 10th, 2014 4:04 pm

    Thanks, Curt…

    Is there a dynamic schema implementation you particularly like? I’ll take a look…

    Rob

  13. Curt Monash on April 10th, 2014 4:12 pm

    Rob,

    I liked Akiban’s hybrid approach, but customers evidently didn’t agree, and the company was acquired.

    WibiData Kiji is another kind of best-of-both-worlds attempt.

    MarkLogic is also trying best-of-all-worlds, but I haven’t kept up with them since soon after the latest management team change.

    And of course there are the various schema-on-need efforts, but those are analytically focused, and I suspect that’s not what you’re asking about.

  14. John Held on April 10th, 2014 7:53 pm

    Very useful discussion.

Scale-out + licensing + can-live-without-joins lines up with what I’ve most seen recently in evaluations of traditional SQL versus NoSQL.

I think a more dynamic schema is also important when the rate of change is high (as it is for product hierarchies, content taxonomies, customer profile data, and many other domains).

    Great article!

  15. Charles Fox on April 23rd, 2014 10:49 pm

    Hi Curt,
    I enjoyed the article, I am writing a research paper for a graduate school class on NoSQL and this is very helpful.
    I have a question regarding using NoSQL. What criteria would you use to determine which NoSQL database to use? More specifically MongoDB vs. Cassandra. Also, regardless of the type of NoSQL, what would be the biggest challenges to a company implementing one for the first time?
    I appreciate any feedback you can provide!
    Thanks!
    Charles

  16. clive boulton on May 14th, 2014 6:58 pm

A friend on the Redmond campus told me that for many quarters Oracle, DB2, and SQL Server had been number 1, 2, and 3 by revenue, until Q1 2014, when this flipped. Perhaps this reflects NoSQL products starting to take revenue away from higher-end databases.
