I am frequently asked questions that boil down to:
- When should one use NoSQL?
- When should one use a new SQL product (NewSQL or otherwise)?
- When should one use a traditional RDBMS (most likely Oracle, DB2, or SQL Server)?
The details vary with context — e.g. sometimes MySQL is a traditional RDBMS and sometimes it is a new kid — but the general class of questions keeps coming. And that’s just for short-request use cases; similar questions for analytic systems arise even more often.
My general answers start:
- Sometimes something isn’t broken, and doesn’t need fixing.
- Sometimes something is broken, and still doesn’t need fixing. Legacy decisions that you now regret may not be worth the trouble to change.
- Sometimes — especially but not only at smaller enterprises — choices are made for you. If you operate on SaaS, plus perhaps some generic web hosting technology, the whole DBMS discussion may be moot.
In particular, migration away from legacy DBMS raises many issues:
- Feature incompatibility (especially in stored-procedure languages and/or other vendor-specific SQL).
- Your staff’s programming and administrative skill-sets.
- Your investment in DBMS-related tools.
- Your supply of hockey tickets from the vendor’s salesman.
Except for the first, those concerns can apply to new applications as well. So if you’re going to use something other than your enterprise-standard RDBMS, you need a good reason.
Commonly, the good reason to change DBMS is one or more of:
- Programming model. Increasingly often, dynamic schemas seem preferable to fixed ones. The nested data structures that come out of internet tracking are just one of the reasons.
- Performance (scale-out). DBMS written in this century often scale out better than ones written in the previous millennium. Also, DBMS with fewer features find it easier to scale than more complex ones; distributed join performance is a particular challenge.
- Geo-distribution. A special kind of scale-out is geo-distribution, which is sometimes a compliance requirement, and in other cases can be a response-time nice-to-have.
- Other stack choices. Couchbase gets a lot of its adoption from existing memcached users (although the company likes to point out that the percentage keeps dropping). HBase gets a lot of its adoption as a Hadoop add-on.
- Licensing cost. Duh.
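To make the programming-model point concrete, here is a minimal sketch of the difference between a fixed schema and a dynamic one when the data is nested. It is only an illustration: the event fields (`user_id`, `page`, `items`, `query`) and table names are invented for this example, and SQLite plus JSON text merely stands in for a relational table and a document store.

```python
import json
import sqlite3

# Hypothetical web-tracking events; all field names are invented for illustration.
events = [
    {"user_id": "u1", "page": "/home"},
    {"user_id": "u2", "page": "/cart", "items": [{"sku": "A1", "qty": 2}]},
    {"user_id": "u1", "page": "/search",
     "query": {"terms": ["dbms"], "filters": {"year": 2014}}},
]

conn = sqlite3.connect(":memory:")

# Fixed schema: columns are declared up front, so the nested "items" and
# "query" structures have nowhere to go without a schema change.
conn.execute("CREATE TABLE events_fixed (user_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO events_fixed VALUES (?, ?)",
    [(e["user_id"], e["page"]) for e in events],
)

# Dynamic schema: each event is stored as a self-describing document,
# so new or nested attributes need no DDL at all.
conn.execute("CREATE TABLE events_doc (body TEXT)")
conn.executemany(
    "INSERT INTO events_doc VALUES (?)",
    [(json.dumps(e),) for e in events],
)

# Column names of the fixed table (second field of each PRAGMA row).
fixed_columns = [row[1] for row in conn.execute("PRAGMA table_info(events_fixed)")]

# Reading documents back: optional nested fields are handled in application code.
docs = [json.loads(row[0]) for row in conn.execute("SELECT body FROM events_doc")]
skus = [item["sku"] for d in docs for item in d.get("items", [])]

print(fixed_columns)
print(skus)
```

The trade-off shown is the usual one: the fixed table rejects or drops structure it was not designed for, while the document form accepts anything and pushes the job of interpreting it onto the application.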
NoSQL products commonly make sense for new applications. NewSQL products, to date, have had a harder time crossing that bar. The chief reasons for the difference are, I think:
- Programming model!
- NoSQL products were earlier to do a good and differentiated job in scale-out.
- NoSQL products were earlier to be at least somewhat mature.
And that brings us to the 762-gigabyte gorilla, in-memory DBMS performance, which is getting all sorts of SAP-driven marketing attention as a potential reason to switch. One can of course put any database in memory, provided only that it is small enough to fit in a single server’s RAM, or else that the DBMS managing it knows how to scale out. Still, there’s a genuine category of “in-memory DBMS/in-memory DBMS features”, principally because:
- In-memory database managers can and should have a very different approach to locking and latching than ones that rely on persistent storage.
- Not all DBMS are great at scale-out.
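As a rough illustration of what "a different approach to locking" can mean, the sketch below shows optimistic, version-checked writes: a transaction never holds a lock, and its write is simply rejected if the record changed after it was read. This is a generic technique shown under stated assumptions, not a description of how any particular product works. The `VersionedStore` class and its methods are hypothetical, and a real engine would need an atomic compare-and-swap where this single-threaded toy uses a plain comparison.

```python
class VersionedStore:
    """Toy in-memory key-value store with optimistic, version-checked writes."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Returns (version, value); a missing key reads as version 0.
        return self._data.get(key, (0, None))

    def commit(self, key, expected_version, new_value):
        # Validate at commit time instead of locking at read time.
        # (A real engine would do this check-and-write atomically.)
        current_version, _ = self.read(key)
        if current_version != expected_version:
            return False  # someone else committed first; caller must retry
        self._data[key] = (current_version + 1, new_value)
        return True


store = VersionedStore()
store.commit("balance", 0, 100)

# Two transactions read the same version of the record...
version_a, balance_a = store.read("balance")
version_b, balance_b = store.read("balance")

# ...the first commit wins, and the second is rejected as stale.
ok_a = store.commit("balance", version_a, balance_a - 30)
ok_b = store.commit("balance", version_b, balance_b - 50)

# The losing transaction re-reads and retries against the new version.
if not ok_b:
    version_b, balance_b = store.read("balance")
    ok_b_retry = store.commit("balance", version_b, balance_b - 50)

print(store.read("balance"))
```

Nobody ever waits in this scheme. The cost of a conflict is a retry, which is cheap when the data is in RAM and would be much less attractive if every retry meant more disk I/O.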
But Microsoft has now launched Hekaton, about which I long ago wrote:
I lack detail, but I gather that Hekaton has some serious in-memory DBMS design features. Specifically mentioned were the absence of locking and latching.
My level of knowledge about Hekaton hasn’t improved in the interim; still, it would seem that in-memory short-request database management is not a reason to switch away from Microsoft SQL Server. Oracle has vaguely promised to get to a similar state one of these years as well.
Of course, HANA isn’t really a short-request DBMS; it’s an analytic DBMS that SAP plausibly claims is sufficiently fast and feature-rich for short-request processing as well.* It remains to be seen whether that difference in attitude will drive enough sustainable product advantages to justify switching.
*Most obviously, HANA is columnar. And it has various kinds of integrated analytics as well.