February 16, 2008

Mike Stonebraker’s DBMS taxonomy

In response to my recent five-part series on DBMS diversity, Mike Stonebraker has proposed his own taxonomy of data management technologies over on Vertica’s Database Column blog.

  1. OLTP DBMSs focused on fast, reliable transaction processing
  2. Analytic/Data Warehouse DBMSs focused on efficient load and ad-hoc query performance
  3. Science DBMSs — after all MatLab does not scale to disk-sized arrays
  4. RDF stores focused on efficiently storing semi-structured data in this format
  5. XML stores focused on semi-structured data in this format
  6. Search engines — the big players all use proprietary engines in this area
  7. Stream Processing Engines focused on real-time StreamSQL
  8. “Lean and Mean,” less-than-a-database engines focused on doing a small number of things very well (embedded databases are probably in this category)
  9. MapReduce and Hadoop — after all Google has enough “throw weight” to define a category

He goes on to say that each will be architected differently, except that — as he already convinced me back in July — RDF will be well-managed by specialty data warehouse DBMS.

I must confess that I didn’t explicitly mention array-based data stores, whether scientific ones or the remaining native MOLAP (Multi-Dimensional OnLine Analytic Processing) engines, nor did I mention the sui generis SAS Intelligence Storage relational data warehouse product. So great catch there. On the not-so-great side, I think Mike’s definitions of categories #8 and #9 are a bit fuzzy (embedded DBMS tend to be full DBMS, but MapReduce is less than a DBMS). And of course any finite list like his will make over-general assumptions (e.g., it’s not obvious the StreamSQL-based CEP vendors will blow away rule-oriented Apama) and omit edge cases.

But there’s really only one point on which we have meaningful disagreement — Mike dumps all OLTP and general-purpose relational DBMS into a single bucket. Considering that such products currently represent a large majority of the multi-billion dollar DBMS market, I think some finer distinctions are in order. At a minimum, let’s break them into two categories — high-end vs. mid-range. High-end systems have maximum robustness, whether because there’s a real application need or because it just makes their owners feel good. Mid-range systems do everything high-end systems did in the 1990s, and are a cheaper/better alternative for ever more database management tasks.
