In my quick reactions to the EMC/Greenplum announcement, I opined
> I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner
promising to explain what I meant later on. So here goes.
There have always been good technical reasons to tailor hardware to analytic database software. Data moves through disk controllers, networks, RAM, CPUs, and more, each with its own data rate. Getting the different kinds of parts into the right balance doesn’t completely eliminate bottlenecks – the Wonderful One-Hoss Shay is poetic fiction – but it certainly can help. (A back-of-the-envelope sketch after the list below illustrates the balancing math.) As a result, every analytic DBMS vendor of any size offers at least one of:
- A Type 0 appliance
- A Type 1 appliance
- A “recommended hardware configuration”
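To make the balancing point concrete, here’s a trivial back-of-the-envelope sketch. The per-stage throughput figures are invented purely for illustration – they’re not drawn from any vendor’s spec sheet – but the logic is the point: a scan runs no faster than its slowest stage.

```python
# Illustrative only: hypothetical per-stage throughput figures, not vendor specs.
# A query pipeline runs no faster than its slowest stage, so balancing the
# components is what removes (or at least relocates) the bottleneck.

STAGES_MB_PER_SEC = {
    "disk": 1200,        # aggregate sequential scan rate across spindles
    "controller": 2000,  # disk controller bandwidth
    "network": 1000,     # interconnect between storage and compute nodes
    "ram": 10000,        # memory bandwidth
    "cpu": 1500,         # rate at which cores can decompress/filter data
}

def bottleneck(stages):
    """Return the slowest stage, which caps end-to-end scan throughput."""
    return min(stages.items(), key=lambda kv: kv[1])

name, rate = bottleneck(STAGES_MB_PER_SEC)
print(f"Bottleneck: {name} at {rate} MB/s")
# Upgrading any other component buys nothing until this stage is widened.
```

With these made-up numbers the network is the constraint; doubling the disks or CPUs would be money wasted, which is exactly the argument for pre-balanced configurations.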
And beyond performance, appliances and pre-specified hardware configurations offer at least the possibility of easing installation, administration, and support.
There also are marketing reasons to offer an appliance or something appliance-like.
- To varying extents, Oracle, Teradata, Microsoft, IBM, Netezza, and EMC are all telling the world that your hardware should be optimized for your analytic DBMS.
- Smaller vendors such as Vertica and Aster Data also tend to cobble together some sort of appliance, in part so they don’t have to say they disagree.
- Thus, a “We don’t see any point in special hardware assembly at all” story would leave an analytic DBMS vendor pretty far out on a limb.
Finally, there are three overlapping technical trends that increase the need for storage-awareness in analytic DBMS. First and foremost is the rise of solid-state memory. For starters, I believe:
- Flash will be important for analytic DBMS soon.
- There are good technical reasons for this.
- Oracle’s marketing will make a big deal out of the flash aspects of Exadata, so other analytic DBMS vendors will need a response. And of course, if Netezza or Teradata preemptively tout their own flash-based offerings, that just adds to the pressure on everybody else to adopt flash.
- But it’s not just flash – flash, other solid-state memory, and disk will be combined in various ways.
This move to flash will require analytic DBMS vendors to be increasingly storage-aware, for at least three reasons:
- It just adds another level of complexity to their hardware-balancing challenges.
- Flash overturns some of the fundamental assumptions of modern analytic DBMS design (see the micro-benchmark sketch after this list), in particular that:
  - Sequential reads are hugely faster than random reads.
  - The worst bottleneck is at the point where data comes out of storage.
- The flash technology stack is still immature, and you have to pick your poison in how to deal with it. Vendors are making very different choices in this regard – and they do have to choose.
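To see why the first overturned assumption mattered so much, here’s a minimal micro-benchmark sketch. The file path, file size, block size, and iteration count are all placeholders, and a serious test would bypass the OS page cache (e.g., via O_DIRECT); but on rotating disk the random case typically loses by one or two orders of magnitude, while on flash the gap shrinks dramatically.

```python
# Minimal sketch comparing one sequential pass over a file against many
# small random reads. Results here are dominated by OS caching; a real
# benchmark would drop caches between runs and use direct I/O.

import os
import random
import time

PATH = "testfile.bin"   # placeholder path
BLOCK = 4096            # random-read request size
COUNT = 10_000          # number of random reads

# Create a ~256 MB test file if one isn't already present.
if not os.path.exists(PATH):
    with open(PATH, "wb") as f:
        for _ in range(256):
            f.write(os.urandom(1024 * 1024))

def sequential_read(path):
    """One sequential pass over the whole file, 1 MB at a time."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(1024 * 1024):
            pass
    return time.perf_counter() - start

def random_read(path, count=COUNT):
    """Many small reads at random offsets."""
    size = os.path.getsize(path)
    start = time.perf_counter()
    with open(path, "rb") as f:
        for _ in range(count):
            f.seek(random.randrange(0, size - BLOCK))
            f.read(BLOCK)
    return time.perf_counter() - start

print(f"sequential pass:              {sequential_read(PATH):.2f}s")
print(f"random ({COUNT} x {BLOCK}B):  {random_read(PATH):.2f}s")
```

A disk-era DBMS is engineered top to bottom around making the first number dominate; once flash makes random reads cheap, much of that engineering becomes overhead rather than advantage.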
Another trend that could naturally lead analytic DBMS vendors to be more storage-aware is their incorporation of what could be viewed as hierarchical storage/information lifecycle management (ILM) technologies: different data is stored in different ways and/or on different kinds of storage hardware. (Vendors pursuing – you guessed it – different approaches to this include Teradata, Greenplum, Vertica, and Sybase.) The more automatic that process is, the more storage-aware the DBMS will need to be; the sketch below suggests what such automation might look like.
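As a hedged illustration of what “automatic” could mean, here’s a toy temperature-based tiering policy. The tier names, thresholds, and Partition structure are all invented for this sketch; none of the vendors named above necessarily works this way.

```python
# A toy sketch of an automatic hierarchical-storage (ILM) policy: track how
# hot each partition is and migrate it to a matching tier. All names and
# thresholds are hypothetical.

from dataclasses import dataclass

@dataclass
class Partition:
    name: str
    reads_per_day: float
    tier: str = "disk"

def choose_tier(p: Partition) -> str:
    """Map access frequency to a storage tier (invented thresholds)."""
    if p.reads_per_day > 1000:
        return "flash"      # hot: serve from solid-state
    if p.reads_per_day > 10:
        return "disk"       # warm: conventional spindles
    return "archive"        # cold: slow, cheap storage

def rebalance(partitions):
    for p in partitions:
        target = choose_tier(p)
        if target != p.tier:
            print(f"migrating {p.name}: {p.tier} -> {target}")
            p.tier = target  # a real engine would move blocks, not flip a flag

rebalance([
    Partition("sales_2010_q2", reads_per_day=5000),
    Partition("sales_2008_q1", reads_per_day=2),
])
```

The storage-awareness follows directly: a policy like this can only be as good as the engine’s knowledge of what each tier actually costs to read from and write to.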
Finally, there are reasons to think that a DBMS should be split between conventional servers and smart storage. This is, of course, the Exadata strategy. Netezza’s two-processor approach, while rather different, also somewhat validates the idea. The sketch below illustrates the general division of labor.
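Here’s an illustrative sketch of that split, assuming a simple filter-and-project offload. The function names and data are hypothetical, and this shows the general idea rather than how Exadata or Netezza actually implements it.

```python
# Sketch of the "smart storage" split: the storage tier filters and projects
# before shipping rows, so far less data crosses the interconnect to the
# database servers. Names and data are hypothetical.

def storage_node_scan(rows, predicate, columns):
    """Runs on the storage tier: filter and project before shipping rows."""
    for row in rows:
        if predicate(row):
            yield {c: row[c] for c in columns}

def db_server_aggregate(filtered_rows):
    """Runs on the database server: sees only the reduced stream."""
    return sum(r["amount"] for r in filtered_rows)

rows = [
    {"region": "EMEA", "amount": 120},
    {"region": "APAC", "amount": 75},
    {"region": "EMEA", "amount": 60},
]

reduced = storage_node_scan(rows, lambda r: r["region"] == "EMEA", ["amount"])
print(db_server_aggregate(reduced))  # 180 -- only two narrow rows "shipped"
```

The appeal is the same as in the bottleneck sketch earlier: if the interconnect is the scarce resource, the cheapest fix is to send less data across it in the first place.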