This post has a sequel.
Last week, Mike Stonebraker insulted MySQL and Facebook’s use of it, by implication advocating VoltDB instead. Kerfuffle ensued. To the extent Mike was saying that non-transparently sharded MySQL isn’t an ideal way to do things, he’s surely right. That still leaves a lot of options for massive short-request databases, however, including transparently sharded RDBMS, scale-out in-memory DBMS (whether or not VoltDB*), and various NoSQL options. If nothing else, Couchbase would seem superior to memcached plus non-transparently sharded MySQL if you were starting a project today.
*The big problem with VoltDB, last I checked, was its reliance on Java stored procedures to get work done.
Pleasantries continued in The Register, which got an amazing-sounding quote from Mike. If The Reg is to be believed — something I wouldn’t necessarily take for granted — Mike claimed that he (i.e. VoltDB) knows how to solve the distributed join performance problem.
So, it’s Stonebraker against the web. And the difference of opinion is severe. In May, at a MongoDB developer conference in San Francisco, Mongo creator Dwight Merriman told his audience there was “no way” to do distributed joins in a way that really scales. “I’m not smart enough to do distributed joins that scale horizontally, widely, and are super fast. You have to choose something else. We have no choice but to not be relational,” he said.
“You can do distributed transactions, but if you do them with no loss of generality and you do them across a thousand machines, it’s not going to be that fast.”
Stonebraker says precisely the opposite, and in typical fashion, he goes right for the jugular. “I reject what Merriman says out of hand,” he tells The Register. Merriman and his company, 10gen, declined to comment for this story. But Stonebraker says words don’t matter. As much as he likes to wield his opinions, he insists the debate will be decided elsewhere. “Let the bake-off begin,” he crows.
But when last I checked, VoltDB made nowhere near that claim, and well it shouldn’t have. In the fully general case, there’s no way to ensure super-fast distributed join performance other than by throwing lots and lots of gear at the problem. But if you do that, many alternatives are fast. More specialized cases may be a different matter, but there are many fast alternatives for those too.
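To make the general-versus-specialized distinction concrete, here is a toy sketch in Python. It assumes two in-memory tables hash-partitioned across a hypothetical `NUM_SHARDS` = 4 nodes. The names `shard_of`, `partition`, `local_hash_join`, and `distributed_join` are illustrative inventions, not VoltDB’s, MySQL’s, or MongoDB’s actual machinery. When both sides are sharded on the join key, every shard can join locally. When they aren’t, rows must first be shipped across the network, and that data movement is the cost that extra gear buys down.

```python
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical cluster size


def shard_of(key: int) -> int:
    # Hypothetical placement rule: hash-partition integer keys across shards.
    return hash(key) % NUM_SHARDS


def partition(rows, key_idx):
    """Place each row on the shard that owns rows[key_idx]."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for row in rows:
        shards[shard_of(row[key_idx])].append(row)
    return shards


def local_hash_join(left, right, l_key, r_key):
    """Ordinary single-node hash join: build on right, probe with left."""
    index = defaultdict(list)
    for r in right:
        index[r[r_key]].append(r)
    return [l + r for l in left for r in index.get(l[l_key], [])]


def distributed_join(left_shards, right_shards, l_key, r_key):
    """Join two sharded tables; return (result_rows, rows_sent_over_network).

    Assumes right_shards is already partitioned on r_key. Any left row not
    already on the shard that owns its join key is shipped there first.
    """
    regrouped = [[] for _ in range(NUM_SHARDS)]
    moved = 0
    for src, shard in enumerate(left_shards):
        for row in shard:
            dst = shard_of(row[l_key])
            if dst != src:
                moved += 1  # this row has to cross the network
            regrouped[dst].append(row)
    result = []
    for i in range(NUM_SHARDS):
        # After regrouping, each shard's join is purely local.
        result.extend(local_hash_join(regrouped[i], right_shards[i], l_key, r_key))
    return result, moved
```

For example, with an `orders` table of `(order_id, customer_id)` rows joined to `customers` on `customer_id`, sharding `orders` by `customer_id` means zero rows move, while sharding it by `order_id` forces some rows to move before the join can run. Short of replicating data, each table can be co-partitioned on only one key, so a schema that joins along several different keys can’t make all of those joins local.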
I imagine there will be use cases for which VoltDB sustains a lead as the truly fastest alternative, similarly architected competitors perhaps excepted.* But what Mike supposedly said seems quite forward-leaning when compared to technical reality.
*The canonical VoltDB use case is e-commerce in virtual goods, the point of “virtual” being that physical inventory might necessitate costlier kinds of joins.