Mike Stonebraker has now responded to the second post in my five-part database diversity series. Takeaways and rejoinders include:
I obviously wasn’t clear when I talked about two major competitive relational challenges to Oracle et al. I was simply referring to:
- Mid-range relational DBMS and
- High-end analytic DBMS
Earlier I thought Mike was forgetting about the distinction between high-end and mid-range RDBMS. Naturally, that didn’t last long. He’s actually calling the mid-range systems “open source”, but that’s a decent first approximation to a hard-to-define category.
My real reservations about Mike’s post lie in the area of analytic DBMS. Mike points out that there are two kinds — row-based (which he thinks are destined to be obsoleted) and column-based (which he thinks are destined over time to run “the vast majority of analytic workloads”). Now, his predictions may eventually come true. But row stores dominate the specialty data warehouse DBMS market today.
What’s more, some major use cases such as data mining or on-the-fly scoring look inherently row-centric to me. Also, consider website personalization. It calls for pinpoint data lookup, integrated with analytics. Will that eventually be done by beefed-up OLTP systems? Stream processors? Column stores? Analytic row stores? None of the possibilities can yet be ruled out. Indeed, I’m not sure we can even make a good start on predicting the ultimate answer unless we first figure out what will be done in RAM, and what will continue to be driven from disk.
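To illustrate why scoring a single record looks row-centric, here is a toy Python sketch. Everything in it is an assumption made up for illustration (the field names, the weights, and the idea of counting one "fetch" per stored unit touched); it models no real product's storage engine, and it ignores compression, caching, and the RAM-versus-disk question raised above.

```python
# Hypothetical customer records; field names and weights are illustrative only.
ROWS = [
    {"customer_id": 1, "age": 30, "visits": 5, "spend": 120.0},
    {"customer_id": 2, "age": 45, "visits": 2, "spend": 40.0},
    {"customer_id": 3, "age": 27, "visits": 9, "spend": 310.0},
]
WEIGHTS = {"age": 0.1, "visits": 2.0, "spend": 0.05}


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    return {name: [r[name] for r in rows] for name in rows[0]}


def score(record):
    """Toy linear score over the weighted fields of one record."""
    return sum(WEIGHTS[f] * record[f] for f in WEIGHTS)


def score_from_rows(rows, idx):
    """Row layout: the whole record arrives in a single fetch."""
    record = rows[idx]
    return score(record), 1


def score_from_columns(columns, idx):
    """Column layout: one fetch per column the score needs."""
    record = {f: columns[f][idx] for f in WEIGHTS}
    return score(record), len(WEIGHTS)


def total_spend_from_columns(columns):
    """Aggregate scan: touches only the one column it needs."""
    return sum(columns["spend"])
```

In this toy model, scoring one customer costs one fetch from the row layout and three from the column layout, while the aggregate over `spend` touches a single column. That is the tension in miniature; how it plays out in real systems is exactly what remains unsettled.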
Speaking of assumptions, there’s a major sub-text coloring all these discussions. Stonebraker is on record claiming that a vast majority of data warehouses (he uses figures up to 99%) have or should have single-fact-table schemas. Indeed, Mike’s columnar product Vertica hasn’t yet been enhanced to handle anything but the single fact table scenario. While that certainly fits a lot of applications, it also leaves a lot out. Profitability calculations like those Kalido specializes in will have one fact table for revenue, but others for costs or margin deductions. Marketing warehouses might have one fact table each for fundamentally different kinds of customer contact (web, phone, etc.), plus one for actual transactions, plus one for external data.
This may ultimately be a distinction without a difference, in that a system well designed for 1 fact table will also do a good job on N fact tables, as long as the N tables have a shared key (e.g., customer ID) that can be used to simultaneously partition them. But it illustrates that columnar systems haven’t proved their eventual dominance quite yet. And if we’re looking at current and near-future use, row-based specialty data warehouse systems still have a huge role to play.
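The shared-key point can be made concrete with a minimal Python sketch. It assumes two hypothetical fact tables, `revenue` and `costs`, both keyed by `customer_id`, and a simple modulo placement function. The table names, the node count, and the placement rule are all invented for illustration and describe no particular vendor's design.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical number of nodes


def partition_of(customer_id):
    """Place a row by its shared key, so every fact table agrees on placement."""
    return customer_id % NUM_PARTITIONS


def co_partition(tables):
    """Split each fact table across nodes using the same key function.

    `tables` maps a table name to a list of row dicts with a 'customer_id'.
    """
    nodes = [defaultdict(list) for _ in range(NUM_PARTITIONS)]
    for name, rows in tables.items():
        for row in rows:
            nodes[partition_of(row["customer_id"])][name].append(row)
    return nodes


def local_profit(node):
    """Revenue minus cost per customer, using only rows held on this node."""
    profit = defaultdict(float)
    for r in node["revenue"]:
        profit[r["customer_id"]] += r["amount"]
    for c in node["costs"]:
        profit[c["customer_id"]] -= c["amount"]
    return dict(profit)


def total_profit(tables):
    """Combine per-node results; no customer's rows ever cross nodes."""
    result = {}
    for node in co_partition(tables):
        result.update(local_profit(node))
    return result
```

Because every row for a given customer lands on the same node regardless of which fact table it came from, each node can combine its own revenue and cost rows without moving data. That is the sense in which a design built for 1 fact table might stretch to N.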
The database diversity series so far
- Part 1: Database management system choices – overview
- Part 2: Database management system choices – 4 categories of relational
- Part 3: Database management system choices – relational data warehouse
- Part 4: Database management system choices – mid-range-relational
- Part 5: Database management system choices – beyond relational
- Mike Stonebraker’s first response
- My first rejoinder
- Mike Stonebraker’s second response
- My second rejoinder
- My first post on H-Store