I wrote yesterday about the H-Store project, the latest from the team of researchers who also brought us C-Store and its commercialization Vertica. H-Store is designed to drastically improve efficiency in OLTP database processing, in two ways. First, it puts everything in RAM. Second, it tries to gain an additional order of magnitude on in-memory performance versus today’s DBMS designs by, for example, taking a very different approach to ensuring ACID compliance.
Today I had the chance to talk with two more of the H-Store researchers, Sam Madden and Daniel Abadi. Our call focused on the part that I didn’t think I’d understood well before, namely:
- What are the database design and programming assumptions required for H-Store to work?
- How generally valid are they?
It’s too early in the research process for those questions to have fully definitive answers. Indeed, the guesses of the individual researchers seem in some cases to differ a bit. That said, here’s how I understand and evaluate the core H-Store assumptions and design issues at this point.
In H-Store, a “single-site” transaction is one that runs solely against data that has been partitioned to a single node in a grid. Most transactions are single-site. In the TPC-C benchmark, the 99% of inventory lookups that follow the warehouse/district/customer hierarchy are single-site; the other 1% are not. In a real-life application that partitions solely by customer, any single-customer transaction will be single-site.
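To make the partitioning idea concrete, here's a minimal sketch – the node count, the choice of customer ID as the partitioning key, and the hashing scheme are all my illustrative assumptions, not H-Store's actual design:

```python
# Hypothetical sketch: hash-partitioning rows by customer ID across a grid.
# Node count and key choice are illustrative, not H-Store's actual scheme.

NUM_NODES = 4

def node_for(customer_id: int) -> int:
    """Every row belonging to a given customer hashes to one node."""
    return hash(customer_id) % NUM_NODES

def is_single_site(txn_customer_ids: list[int]) -> bool:
    """A transaction is single-site iff all the rows it touches live on one node."""
    return len({node_for(c) for c in txn_customer_ids}) == 1

# A single-customer transaction is always single-site:
assert is_single_site([42, 42, 42])
```

The point of the sketch is just that the single-site property falls out of the partitioning key: any transaction confined to one customer's rows is guaranteed to land on one node, with no coordination needed.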
Many other transactions can be shoehorned into a single-site paradigm, and indeed are today. Suppose a customer wants to buy two flight segments as a unit, with the outward and return parts coming from different airlines. That’s inherently multi-site. But if you set up “escrow” pools of seat allocations reserved by various vendors – pools that don’t need to be synchronously checked with the airline before seats are sold – you can get back to a single-site paradigm. The H-Store researchers claim such hacks are already standard in high-end OLTP. I find that rather plausible. Reads can certainly be more complex – think of Amazon’s search across hundreds of thousands of used book store inventories – but updating data is a pretty separable task.
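Here's a toy illustration of the escrow hack (airline names and pool sizes invented): each vendor pre-grants the selling site a local pool of seats, so a two-segment purchase commits against local counters instead of making a synchronous cross-site call to either airline:

```python
# Hypothetical escrow sketch: each airline grants the seller a local pool
# of seats, so a two-segment purchase commits without a cross-site round trip.

escrow = {"airline_A_SFO_JFK": 10, "airline_B_JFK_SFO": 8}  # seats on hand locally

def buy_trip(outbound: str, inbound: str) -> bool:
    """Sell both segments atomically from local escrow pools, or sell neither."""
    if escrow.get(outbound, 0) > 0 and escrow.get(inbound, 0) > 0:
        escrow[outbound] -= 1
        escrow[inbound] -= 1
        return True
    return False  # pools exhausted; replenish asynchronously from the airlines

buy_trip("airline_A_SFO_JFK", "airline_B_JFK_SFO")  # succeeds while pools last
```

The multi-vendor coordination doesn't disappear, of course; it moves into the asynchronous replenishment of the pools, off the critical path of the sale.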
H-Store is focused on speeding up single-site transactions. The simplest explanation of the H-Store design philosophy is:
- Radically speed up single-site transactions.
- Don’t make multi-site transactions much slower than they are today.
Single-site transactions have predictable, very short processing times; single-threading is a great idea for them. Single-threading may not be so great for the multi-site ones, but so be it. The H-Store guys think that’s a great trade-off anyway. If the project runs into trouble, it will likely be because multi-site transactions are slowed down more than the researchers now anticipate.
You can’t get away from multi-site transactions altogether. H-Store will have a concurrency-control mechanism for multi-site transactions; it just will be an optimistic one rather than a lock-based one. As for single-threading multi-site transactions, there seem to be two schools of thought right now. One says “Hey, even if going across the network takes 1 millisecond rather than 50 microseconds, that’s no big deal. Doing everything in a single thread is fine.” The other recalls Murphy’s Laws. I’m in the Murphy school, and believe that single-site and multi-site transactions will wind up in different threads as the H-Store design evolves.
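For readers who want a picture of what “optimistic” means here, a generic validate-at-commit sketch – not H-Store’s actual protocol, and with invented data structures: read a version, do the work without holding any locks, and commit only if the version hasn’t changed underneath you, retrying otherwise:

```python
# Generic optimistic-concurrency sketch (NOT H-Store's actual protocol):
# proceed without locks, then validate read versions at commit time.

store = {"x": (0, 100)}  # key -> (version, value)

def increment_optimistic(key: str, delta: int) -> int:
    """Apply an increment with optimistic concurrency control."""
    while True:
        version, value = store[key]      # optimistic read, no lock taken
        new_value = value + delta        # do the transaction's work
        if store[key][0] == version:     # validate: nobody committed meanwhile
            store[key] = (version + 1, new_value)
            return new_value
        # validation failed: another commit intervened, so retry from scratch
```

The appeal is that uncontended transactions pay no locking overhead at all; the cost is wasted work on conflicts, which is why it fits workloads where multi-site conflicts are rare.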
In H-Store, transactions are all stored procedures. The H-Store researchers assert that in high-performance OLTP systems today, most of the transactions already are stored procedures, so H-Store’s reliance on them is no big deal. I don’t completely buy that. But it is obviously true that transactions are a key element of program modularity. So forcing them all into stored procedures doesn’t seem like a big deal – although some programmers will surely rant about it.
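To spell out what the constraint means in practice, here’s a toy sketch (the registration mechanism and names are mine, purely illustrative): all transaction logic is registered with the database ahead of time, and a client invokes a transaction by name with parameters in a single shot, never issuing ad hoc statements mid-transaction:

```python
# Toy sketch: transactions registered ahead of time as named procedures,
# invoked with parameters only -- no ad hoc SQL arriving mid-transaction.

procedures = {}

def stored_procedure(fn):
    """Register a function as a named, single-shot transaction."""
    procedures[fn.__name__] = fn
    return fn

accounts = {"alice": 100, "bob": 50}

@stored_procedure
def pay(payer: str, payee: str, amount: int) -> bool:
    if accounts[payer] < amount:
        return False          # the whole transaction aborts
    accounts[payer] -= amount
    accounts[payee] += amount
    return True

def run_transaction(name: str, *args):
    """The client supplies a procedure name and its parameters; nothing else."""
    return procedures[name](*args)

run_transaction("pay", "alice", "bob", 30)  # accounts: alice 70, bob 80
```

Because the engine sees the entire transaction up front, it can run it to completion in one thread with no mid-transaction stalls waiting on the client – which is precisely what makes the single-threaded, single-site execution model plausible.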
H-Store will probably wind up with persistent storage. I kind of see the theoretical justification for having data reside solely in a set of production RAM copies, some of which are separated by “more than the width of a hurricane.” Still, I think persistent storage is needed, for two reasons.
First, storing data solely in your run-time systems leaves you with too many single points of logical failure. If some logical error is introduced – whether via an attack or just human error – you can lose data no matter how many times it’s replicated. So it is safer to copy the data and store it in a disconnected way. Even setting logic aside, eliminating persistent storage would introduce a major fear factor – and few markets are more paranoid and safety-conscious than high-end transaction processors.
Just to be clear: I’m not saying transaction rollback or even crash recovery will ever touch disk. I’m just saying that something will get periodically persisted, probably on a checkpoint/snapshot basis.
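A minimal sketch of what that checkpoint-style persistence could look like – the file format and atomic-rename approach are my assumptions, not anything the H-Store team has described: periodically serialize the in-memory state to disk, so a disconnected copy exists even though normal transaction processing never touches disk:

```python
# Hypothetical checkpoint sketch: periodically persist a snapshot of the
# in-memory state; recovery would load the last snapshot (plus, in a real
# system, replay a log of transactions committed since then).

import json
import os
import tempfile

db = {"alice": 70, "bob": 80}  # all live data resides in RAM

def checkpoint(state: dict, path: str) -> None:
    """Write a snapshot atomically: temp file first, then rename over the target."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename: readers never see a partial snapshot

def recover(path: str) -> dict:
    """Reload the last complete snapshot after a total loss of the RAM copies."""
    with open(path) as f:
        return json.load(f)

checkpoint(db, "snapshot.json")
assert recover("snapshot.json") == db
```

The write-then-rename trick matters here: a crash mid-checkpoint leaves the previous snapshot intact, so the persisted copy is always a consistent (if slightly stale) point-in-time image.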