VoltDB and H-Store
Analysis of OLTP DBMS research project H-Store and its commercialization VoltDB. Related subjects include:
- Key H-Store researcher and VoltDB company founder Michael Stonebraker
- OLTP (OnLine Transaction Processing) database management
- Memory-centric data management
- Vertica Systems, which is incubating VoltDB
Todd Hoff (High Scalability blog) posted a lengthy examination of the case and use cases for VoltDB. That excellent post, in turn, is based on a Mike Stonebraker* webinar for VoltDB, for which the slide deck is happily available. It’s all nicely consistent with what I wrote about VoltDB last month, in connection with its launch. Read more
|Categories: In-memory DBMS, Michael Stonebraker, OLTP, Parallelization, Theory and architecture, VoltDB and H-Store||3 Comments|
VoltDB is finally launching today. As is common for companies in sectors I write about, VoltDB — or just “Volt” — has discovered the virtues of embargoes that end 12:01 am. Let’s go straight to the technical highlights:
- VoltDB is based on the H-Store technology, which I wrote about in February, 2009. Most of what I said about H-Store then applies to VoltDB today.
- VoltDB is a no-apologies ACID relational DBMS, which runs entirely in RAM.
- VoltDB has rather limited SQL. (One example: VoltDB can’t do SUMs in SQL.) However, VoltDB guy Tim Callaghan (Mark Callaghan’s lesser-known but nonetheless smart brother) asserts that if you code up the missing functionality, it’s almost as fast as if it were present in the DBMS to begin with, because there’s no added I/O from the handoff between the DBMS and the procedural code. (The data’s in RAM one way or the other.)
- VoltDB’s Big Conceptual Performance Story is that it does away with most locks, latches, logs, etc., and also most context switching.
- In particular, you’re supposed to partition your data and architect your application so that most transactions execute on a single core. When you can do that, you get VoltDB’s performance benefits. To the extent you can’t, you’re in two-phase-commit performance land. (More precisely, you’re doing 2PC for multi-core writes, which is surely a major reason that multi-core reads are a lot faster in VoltDB than multi-core writes.)
- VoltDB has a little less than one DBMS thread per core. When the data partitioning works as it should, you execute a complete transaction in that single thread. Poof. No context switching.
- A transaction in VoltDB is a Java stored procedure. (The early idea of Ruby on Rails in lieu of the Java/SQL combo didn’t hold up performance-wise.)
- Solid-state memory is not a viable alternative to RAM for VoltDB. Too slow.
- Instead, VoltDB lets you snapshot data to disk at tunable intervals. “Continuous” is one of the options, wherein a new snapshot starts being made as soon as the last one completes.
- In addition, VoltDB will also spool a kind of transaction log to the target of your choice. (Obvious choice: An analytic DBMS such as Vertica, but there’s no such connectivity partnership actually in place at this time.)
The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part). OLTP (OnLine Transaction Processing) and general purpose DBMS startups, however, have not yet done as well, with such success as there has been (MySQL, Intersystems Cache’, solidDB’s exit, etc.) generally accruing to products that originated in the 20th Century.
Nonetheless, OLTP/general-purpose data management startup activity has recently picked up, targeting what I see as some very real opportunities and needs. So as a jumping-off point for further writing, I thought it might be interesting to collect a few observations about the market in one place. These include:
- Big-brand OLTP/general-purpose DBMS have more “stickiness” than analytic DBMS.
- By number, most of an enterprise’s OLTP/general-purpose databases are low-volume and low-value.
- Most interesting new OLTP/general-purpose data management products are either MySQL-based or NoSQL.
- It’s not yet clear whether MySQL will prevail over MySQL forks, or vice-versa, or whether they will co-exist.
- The era of silicon-centric relational DBMS is coming.
- The emphasis on scale-out and reducing the cost of joins spans the NoSQL and SQL-based worlds.
- Users’ instance on “free” could be a major problem for OLTP DBMS innovation.
I shall explain. Read more
The Boston Globe article has more detail than Vertica and VoltDB have ever OKed me to put out, and some business details they’ve never given me.
|Categories: In-memory DBMS, Memory-centric data management, OLTP, Vertica Systems, VoltDB and H-Store||Leave a Comment|
Eric Lai emailed today to ask what I thought about the NoSQL folks, and especially whether I thought their ideas were useful for enterprises in general, as opposed to just Web 2.0 companies. That was the first I heard of NoSQL, which seems to be a community discussing SQL alternatives popular among the cloud/big-web-company set, such as BigTable, Hadoop, Cassandra and so on. My short answers are:
- In most cases, no.
- Most of these technologies are designed for simple, high-volume OLTP (OnLine Transaction Processing.) Most large enterprises have an established way of doing OLTP, probably via relational database management systems. Why change?
- MapReduce is an exception, in that it’s designed for analytics. MapReduce may be useful for enterprises. But where it is, it probably should be integrated into an analytic DBMS.
- There’s one big countervailing factor to all these generalities — schema flexibility.
As for the longer form, let me start by noting that there are two main kinds of reason for not liking SQL. Read more
I’ve always honored more of an NDA about the H-Store project and its commercialization than I really felt obligated to, given how freely information was being bandied about to others. I’m still doing so.
But I think I’ll at least say that the H-Store project is now named VoltDB. The VoltDB website names two individuals — Mike Stonebraker and Andy Palmer — both of whom are founders of Vertica. Job listings on the site are for field engineer and trainer, but not developer, so that suggests something about the project’s/product’s maturity level.
If you have an extreme OLTP need, you should talk to VoltDB. If you don’t have access to Mike or Andy directly, I can hook you up with a key VoltDB marketing/outreach guy. Price may not be as much of a barrier as you’d initially fear.
If anybody from VoltDB wants to be less cloak-and-daggery and say more in the comment thread, I’d be pleased.
And yes — an open-secret working name for H-Store/VoltDB was, for a while, “Horizontica.”
|Categories: In-memory DBMS, Memory-centric data management, OLTP, Vertica Systems, VoltDB and H-Store||15 Comments|
Oracle Exadata was pre-teased as “Extreme performance.” Some incorrect speculation shortly before the announcement focused on the possibility of OLTP without disk, which clearly would speed things up a lot. I interpret that in part as being wishful thinking.
The most compelling approach I’ve seen to that problem yet is H-Store, which however makes some radical architectural assumptions. One point I didn’t stress in my earlier posts, but which turned out to be a deal-breaker for one early tire-kicker, is that to use H-Store you have to be able to shoehorn each transaction into its own stored procedure. Depending on how intricate your logic is, that might make it hard to port an existing app to H-Store.
Even for new apps, it could get in the way of some things you might want to do, such as rule-based processing. And that could be a problem. A significant fraction of the highest-performance OLTP apps are customer-facing, and customer-facing apps are one of the biggest areas where rule-based processing comes into play.
Billy Newport of IBM sees a lot of similarities between his app-server-based product ObjectGrid and H-Store. In both cases, constrained tree schemas are assumed, and OLTP performance goodness ensues. A couple of points I noted on a quick skim through his blog:
- He calls out RAM consumption as a challenge for this kind of architecture.
- He points out that it’s a big advantage to have data called and used in the same address space.
Being based in RAM is obviously a huge part of the H-Store scheme. But so is having transaction execution be close to the database.
IBM now has both ObjectGrid and a memory-centric DBMS (solidDB) that they’ve been using as a front end for DBMS. Integration of the two could be pretty interesting.
|Categories: Cache, IBM and DB2, Memory-centric data management, OLTP, solidDB, Theory and architecture, VoltDB and H-Store||Leave a Comment|
I wrote yesterday about the H-Store project, the latest from the team of researchers who also brought us C-Store and its commercialization Vertica. H-Store is designed to drastically improve efficiency in OLTP database processing, in two ways. First, it puts everything in RAM. Second, it tries to gain an additional order of magnitude on in-memory performance versus today’s DBMS designs by, for example, taking a very different approach to ensuring ACID compliance.
Today I had the chance to talk with two more of the H-Store researchers, Sam Madden and Daniel Abadi. Read more
|Categories: Database diversity, In-memory DBMS, Memory-centric data management, OLTP, VoltDB and H-Store||5 Comments|
Last week, Dan Weinreb tipped me off to something very cool: Mike Stonebraker and a group of MIT/Brown/Yale colleagues are calling for a complete rewrite of OLTP DBMS. And they have a plan for how to do it, called H-Store, as per a paper and an associated slide presentation.
|Categories: Database diversity, In-memory DBMS, Memory-centric data management, Michael Stonebraker, OLTP, Theory and architecture, VoltDB and H-Store||36 Comments|