In response to my recent five-part series on DBMS diversity, Mike Stonebraker has proposed his own taxonomy of data management technologies over on Vertica’s Database Column blog. (Edit: Some good stuff disappeared when Vertica nuked that blog.)
- OLTP DBMSs focused on fast, reliable transaction processing
- Analytic/Data Warehouse DBMSs focused on efficient load and ad-hoc query performance
- Science DBMSs — after all MatLab does not scale to disk-sized arrays
- RDF stores focused on efficiently storing semi-structured data in this format
- XML stores focused on semi-structured data in this format
- Search engines — the big players all use proprietary engines in this area
- Stream Processing Engines focused on real-time StreamSQL
- “Lean and Mean,” less-than-a-database engines focused on doing a small number of things very well (embedded databases are probably in this category)
- MapReduce and Hadoop — after all Google has enough “throw weight” to define a category
He goes on to say that each will be architected differently, except that RDF will be well managed by specialty data warehouse DBMS, as he already convinced me back in July.
I must confess that I didn’t explicitly mention array-based data stores, whether scientific ones, the remaining native MOLAP (Multi-Dimensional OnLine Analytic Processing) engines, or the sui generis SAS Intelligence Storage relational data warehouse product. So great catch there. On the not-so-great side, I think Mike’s definitions of categories #8 and #9 (the last two on his list) are a bit fuzzy: embedded DBMS tend to be full DBMS, but MapReduce is less than a DBMS. And of course any finite list like his will make over-general assumptions (e.g., it’s not obvious the StreamSQL-based CEP vendors will blow away rule-oriented Apama) and omit edge cases.
But there’s really only one point on which we have meaningful disagreement: Mike dumps all OLTP and general-purpose relational DBMS into a single bucket. Considering that such products currently represent a large majority of the multi-billion-dollar DBMS market, I think some finer distinctions are in order. At a minimum, let’s break them into two categories, high-end and mid-range. High-end systems have maximum robustness, whether because there’s a real application need or because it just makes their owners feel good. Mid-range systems do everything high-end systems did in the 1990s, and are a cheaper/better alternative for ever more database management tasks.
The series on database diversity (more links at the bottom of Part 1):
- Part 1: Database management system choices – overview
- Part 2: Database management system choices – 4 categories of relational
- Part 3: Database management system choices – relational data warehouse
- Part 4: Database management system choices – mid-range-relational
- Part 5: Database management system choices – beyond relational