This is the first in a 5-part series of posts on data management product choices. By pre-arrangement, Mike Stonebraker is responding on The Database Column, starting with his own taxonomy of DBMS types.
In the 1990s, most database management experts believed that a single general-purpose DBMS could meet substantially all needs. If you just kept adding in enough datatypes and data access methods (e.g., specialized indexes), your DBMS could eventually do a good job of meeting almost any requirement. And so, from the late 1990s into the beginning of this decade, it seemed that technology was supporting business trends, and the DBMS industry was inexorably consolidating. There was an oligopoly of high-end vendors, who sold increasingly similar super-sophisticated database management systems. Nothing else in database management seemed to matter.
Well, we were wrong. The big thing we overlooked is that database optimizations go down to the level of actual storage. It makes a huge difference how you arrange the data, and even what kinds of devices you store it on. High-end data warehouses run best on shared-nothing massively parallel processing (MPP) systems. Smaller ones may in the future do best with solid-state disks. Classic online transaction processing (OLTP) systems still do well in a shared-everything architecture.* And that’s just for the relational systems; some kinds of data shouldn’t be arranged in rows and columns at all.
*Even that may be over-generous to traditional shared-everything. Oracle RAC, high-availability wide-area replication, and the H-Store research project all suggest that shared-everything’s dominance of high-end OLTP is nearing the end of the road.
The plot thickens further. Most of these technical categories are populated by small companies, with relatively immature products – and in immaturity there is diversity. Thus, there is a broad range of viable data management products, each the best choice in at least some specific application and deployment scenarios.
Recently, one more alternative has emerged – create your own DBMS, or don’t use one at all. There are also half-and-half solutions, in which (commonly) MySQL is used to manage a variety of metadata, but media files might be left just in the file system. This do-it-yourself approach underlies much of the buzz around Amazon and Google services and technologies such as EC2, SimpleDB, and MapReduce. But in most cases, especially for enterprise uses, the best way to go is with a DBMS, or else some DBMS-like data management technology such as a search engine or complex event/stream processing tool.
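For readers who haven't seen the half-and-half pattern in practice, here is a minimal sketch of it. It is only an illustration under stated assumptions: SQLite (via Python's standard `sqlite3` module) stands in for MySQL so the example runs without a server, and the `media` table, its columns, and the helper names `open_catalog`, `store_media`, and `load_media` are all hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_catalog():
    """Create an in-memory metadata catalog (SQLite standing in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE media ("
        " name TEXT PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " content_type TEXT NOT NULL,"
        " size_bytes INTEGER NOT NULL)"
    )
    return conn


def store_media(conn, media_dir, name, payload, content_type):
    """Write the bytes to the file system; record only metadata in the database."""
    path = Path(media_dir) / name
    path.write_bytes(payload)
    conn.execute(
        "INSERT INTO media (name, path, content_type, size_bytes)"
        " VALUES (?, ?, ?, ?)",
        (name, str(path), content_type, len(payload)),
    )
    conn.commit()
    return path


def load_media(conn, name):
    """Look up the file's path in the database, then read the bytes from disk."""
    row = conn.execute(
        "SELECT path FROM media WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return Path(row[0]).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as media_dir:
        conn = open_catalog()
        store_media(conn, media_dir, "clip.bin", b"\x00\x01\x02",
                    "application/octet-stream")
        print(len(load_media(conn, "clip.bin")), "bytes read back")
```

Note what the sketch leaves out: nothing keeps the file write and the metadata insert in a single transaction, so a crash between the two can leave them out of sync. That gap is one of the trade-offs of keeping part of the data outside the DBMS.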
The database diversity series so far
- Part 1: Database management system choices – overview
- Part 2: Database management system choices – 4 categories of relational
- Part 3: Database management system choices – relational data warehouse
- Part 4: Database management system choices – mid-range-relational
- Part 5: Database management system choices – beyond relational
- Mike Stonebraker’s first response
- My first rejoinder
- Mike Stonebraker’s second response
- My second rejoinder
- My first post on H-Store
- My follow-up post on H-Store and its assumptions
- My ZDnet guest post on H-Store and its timing
- My most recent 11-category data management technology taxonomy