Last week, Dan Weinreb tipped me off to something very cool: Mike Stonebraker and a group of MIT/Brown/Yale colleagues are calling for a complete rewrite of OLTP DBMSs. And they have a plan for how to do it, called H-Store, as described in a paper and an associated slide presentation.
On the system side, some of their most radical suggestions include:
- No disks or other persistent storage at all.
- No multi-threading.
- No locks.
- No redo logs (and perhaps not a lot of undo logs either).
Their programming wish list is equally dramatic. It includes:
- All transactions implemented via a kind of stored procedure.
- No more separate database language. SQL bad, Ruby good.
- The relational model replaced by something more hierarchical.
- Some limitations on the types of transactions allowed.
Mike and I have agreed for a while that current market-leading DBMS products aren’t optimal for much besides high-end OLTP. But now he’s saying that Oracle, SQL Server, and the like are utterly obsolete for OLTP as well.
There seem to be three main assumptions underlying the H-Store design, two of which Mike seems fairly certain of, and the third of which he regards as subject to further research. The first assumption is that there is no need any more for such a thing as a long-running transaction or cursor. Transactions aren’t held open any more for input from users at dumb terminals. Records aren’t sent down in batches to be scrolled through. Instead, transactions are fired off from web pages via internet protocols, with results being sent back upon transaction completion.
This assumption has three major consequences. First, multi-threading is no longer needed. That gets rid of huge overhead around connection pooling and b-tree consistency, to name just two areas. Second, traditional locking isn’t needed; H-Store relies on optimistic locking instead. Third, it’s a really pointless and high-overhead idea to call out to a separate data manipulation language like SQL. Instead, Mike favors programming languages that mix data manipulation into other logic, like Ruby on Rails (which will be used in the next iteration of H-Store), or the fourth-generation languages of the 1980s.
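To make the single-threaded idea concrete, here is a minimal sketch in Python. It is not H-Store code; the `SerialStore` class, the `transfer` procedure, and the account names are all hypothetical, made up for illustration. It shows only the general pattern: each transaction is a pre-registered procedure that runs start to finish on one thread, so nothing else can interleave with it and no locks are taken.

```python
from collections import deque


class SerialStore:
    """Toy in-memory store that runs whole transactions one at a time.

    Hypothetical illustration only; not the H-Store implementation.
    """

    def __init__(self):
        self.data = {}        # all state lives in RAM
        self.procedures = {}  # name -> registered transaction function
        self.queue = deque()  # pending (name, args) requests

    def register(self, name, func):
        # Transactions are known in advance, like stored procedures.
        self.procedures[name] = func

    def submit(self, name, *args):
        # A request arrives (say, from a web page) and waits its turn.
        self.queue.append((name, args))

    def run(self):
        # One thread drains the queue, so no two transactions overlap.
        results = []
        while self.queue:
            name, args = self.queue.popleft()
            working = dict(self.data)  # scratch copy; discarded on abort
            try:
                result = self.procedures[name](working, *args)
            except Exception as exc:
                results.append(("aborted", str(exc)))
                continue
            self.data = working        # commit by swapping in the copy
            results.append(("committed", result))
        return results


def transfer(data, src, dst, amount):
    # Hypothetical transaction: move money between two accounts.
    if data.get(src, 0) < amount:
        raise ValueError("insufficient funds")
    data[src] -= amount
    data[dst] = data.get(dst, 0) + amount
    return data[src]


store = SerialStore()
store.data = {"alice": 100, "bob": 20}
store.register("transfer", transfer)
store.submit("transfer", "alice", "bob", 30)
store.submit("transfer", "bob", "alice", 500)
results = store.run()
print(results)
print(store.data)
```

Copying the whole dictionary per transaction is obviously wasteful; I use it here only because it makes abort trivial. The point is that the data manipulation sits in ordinary program logic rather than in calls out to a separate language.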
The second of H-Store’s three big assumptions is that you can do without disks. Disk rotation is the big technical bottleneck of database management. While most other measures of computing performance double every two years or so, disk rotation speeds have increased only 12.5-fold in the past half century. And unlike the case of data warehousing, the nature of OLTP makes it all but impossible to be disk-centric without doing huge numbers of random disk seeks.
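To put numbers on that gap, here is a quick back-of-the-envelope calculation. The 50-year span, the two-year doubling period, and the 12.5-fold figure come from the paragraph above; the 15,000 RPM drive speed is my own illustrative assumption, not a figure from the H-Store paper.

```python
# Figures taken from the paragraph above.
years = 50
doubling_period_years = 2
disk_rotation_factor = 12.5

# Growth of a metric that doubles every two years for half a century.
doubling_factor = 2 ** (years / doubling_period_years)  # 2**25

# How far rotation speed has fallen behind such a metric.
gap = doubling_factor / disk_rotation_factor

# Assumed drive speed (illustrative, not from the source).
assumed_rpm = 15_000
seconds_per_revolution = 60 / assumed_rpm
# On average the platter must turn half a revolution to reach the data.
avg_rotational_latency_s = seconds_per_revolution / 2
# Upper bound on random accesses per second from rotation alone.
max_random_accesses_per_s = 1 / avg_rotational_latency_s

print(f"doubling factor over {years} years: {doubling_factor:,.0f}")
print(f"gap versus rotation speed: {gap:,.0f}x")
print(f"average rotational latency: {avg_rotational_latency_s * 1000:.1f} ms")
print(f"rotation-bound random accesses/s: {max_random_accesses_per_s:.0f}")
```

Under those assumptions, rotation alone caps a single drive at a few hundred random accesses per second, before seek time is even counted, which is why random-access-heavy OLTP suffers so much on disk.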
The H-Store way of getting data persistence is simply to keep multiple RAM copies of the data, widely dispersed geographically (to protect against power grid failures and physical disasters), with the (presumably identical) individual systems being as robust as the user deems fit. Clearly, that allows for hugely faster processing than anything disk-centric, even with the same data access methods (i.e., b-trees). Changing the data structure, e.g., to something like solidDB’s, should provide further speedups.
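Here is a rough sketch of what "persistence through multiple RAM copies" means in practice. Everything in it is hypothetical: the `Replica` and `ReplicatedStore` classes and the site names are mine, and real replication across a network has to deal with ordering, partial failures, and catching up a recovered node, none of which is modeled here.

```python
class Replica:
    """One in-memory copy of the data at a given site (hypothetical)."""

    def __init__(self, site):
        self.site = site
        self.data = {}     # RAM only; nothing is written to disk
        self.alive = True


class ReplicatedStore:
    """Applies every write to all live replicas; reads from any live one."""

    def __init__(self, sites):
        self.replicas = [Replica(site) for site in sites]

    def write(self, key, value):
        live = [r for r in self.replicas if r.alive]
        if not live:
            raise RuntimeError("no live replicas")
        for replica in live:
            replica.data[key] = value
        return len(live)   # number of copies that now hold the value

    def read(self, key):
        for replica in self.replicas:
            if replica.alive:
                return replica.data[key]
        raise RuntimeError("no live replicas")


store = ReplicatedStore(["site-a", "site-b", "site-c"])
copies = store.write("order:1", "shipped")

# Simulate one site losing power: its RAM copy is gone.
store.replicas[0].alive = False
store.replicas[0].data.clear()

# The data survives because the other sites still hold it.
value = store.read("order:1")
print(copies, value)
```

The data is only as durable as the chance that all sites fail at once, which is exactly the point I take up next.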
That said, I don’t think going entirely without persistent storage is a great idea. There’s no way to be sure you designed out single points of failure – if nothing else, there are killer bugs, hacks, etc. And since it’s both radical and not-obviously-safe, I don’t think the no-persistence idea is likely to gain traction in the market. Fortunately for the H-Store project, persistence can easily be added in; if storage on magnetic or optical media is desired, it would be easy for one of the H-Store nodes to provide persistent checkpointing.
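For illustration, here is about the simplest form such checkpointing could take: one node periodically dumps its in-memory state to a file and can reload it later. This is my own sketch, not anything from the H-Store paper; the `checkpoint` and `restore` functions and the JSON format are assumptions chosen for brevity.

```python
import json
import os
import tempfile


def checkpoint(data, path):
    """Write a snapshot of the in-memory state to disk (hypothetical helper).

    The snapshot goes to a temporary file first and is then renamed into
    place, so a crash mid-write leaves the previous checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())   # push the bytes to the storage device
    os.replace(tmp_path, path)  # swap the new snapshot in as one step


def restore(path):
    """Load the most recent snapshot back into memory."""
    with open(path) as f:
        return json.load(f)


state = {"alice": 70, "bob": 50}
with tempfile.TemporaryDirectory() as tmp_dir:
    snapshot_path = os.path.join(tmp_dir, "checkpoint.json")
    checkpoint(state, snapshot_path)
    recovered = restore(snapshot_path)

print(recovered)
```

A checkpoint like this only protects the state as of the last snapshot; anything committed afterward still depends on the RAM replicas.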
The third of H-Store’s three big assumptions – and the one that is called out as requiring further research – is that transactions fall primarily into certain specific categories. I want to understand that part better before making any attempt to write about it. Stay tuned.
Edit: Daniel Abadi points out, quite correctly, that I should link to the actual H-Store project home page.