In the first post with real “meat” (i.e., other than housekeeping) on the new Database Column blog, Mike Stonebraker makes three core claims:
1. Different DBMS should be used for different purposes. I am in violent agreement with that point, which is indeed a major theme of this blog.
2. Vertica’s software is 50X faster than anything non-columnar and 10X faster than anything columnar. Now, some of these stats surely come from the familiar syndrome of comparing the future release of your own product, as tuned by the world’s greatest experts on it (who also hope to get rich on their stock options in your company), against some well-established production release of your competitors’ products, tuned to an unknown level of excellence,* with the whole thing running test queries that you, in your impartial wisdom, deem representative of user needs. Or something like that …
*For example, were Vertica’s competitors set up with vertical partitioning?
That said, is the implicit claim that columns are 5X faster than rows for data warehouse queries fundamentally ridiculous? For data retrieval, probably not. The improvement obviously varies hugely depending on how wide a table is and what fraction of its columns a particular query needs, but 5X doesn’t sound hopelessly out of whack. When we focus on the joins themselves, however, it sounds a bit fishy to me, although I’ll confess to not knowing quite enough about current DBMS designs to judge fully. I think row-based DBMS are pretty stupid about carting around, during joins, all the columns that aren’t involved in the join criteria. But it’s much less clear how much that extra baggage actually gets in the way of efficient processing.
To look at it another way: If columns were really all THAT great, mightn’t existing bitmap capabilities — including those of Oracle et al. — be more widely used?
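The retrieval-side arithmetic behind a figure like 5X is easy to sketch. The Python below is a toy model under loud assumptions: fixed-width, uncompressed columns, a full scan, and bytes read as the only cost. It deliberately ignores compression, seeks, tuple reconstruction, and join processing, which is exactly where the argument above gets murky. The function names are hypothetical.

```python
from typing import Iterable, Sequence


def bytes_scanned_row_store(col_widths: Sequence[int], n_rows: int) -> int:
    # Toy assumption: a row store reads every column of every row it scans, needed or not.
    return n_rows * sum(col_widths)


def bytes_scanned_column_store(
    col_widths: Sequence[int], needed: Iterable[int], n_rows: int
) -> int:
    # Toy assumption: a column store reads only the columns the query references.
    return n_rows * sum(col_widths[i] for i in set(needed))


def io_advantage(
    col_widths: Sequence[int], needed: Iterable[int], n_rows: int = 1_000_000
) -> float:
    # Ratio of bytes read, row layout vs. column layout (needed must be non-empty).
    return bytes_scanned_row_store(col_widths, n_rows) / bytes_scanned_column_store(
        col_widths, needed, n_rows
    )


if __name__ == "__main__":
    print(io_advantage([8] * 20, needed=[0, 1, 2, 3]))  # 20 columns, query uses 4
    print(io_advantage([8] * 20, needed=range(20)))     # query uses every column
    print(io_advantage([8] * 100, needed=[0, 1]))       # very wide table, 2 columns
```

The ratio is just total row width divided by the width of the referenced columns, so it swings enormously with table shape: 5X for a 20-column table queried on 4 columns, 1X when every column is needed, and 50X for a 100-column table queried on 2, all on I/O volume alone.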
3. (Different) columnar systems can and should be developed that conquer the world and take over most or all data management niches. At least, that’s what he seemed to be suggesting. It’s not a totally crazy idea. After all, text indexes are essentially columnar systems, as were the stellar pre-relational “inverted list” DBMS Adabas, Datacom/DB, and Model 204. And what is abstract datatype support other than specialized handling on a column-by-column basis?
Note: The last time I suggested to Mike the idea that what the world really needed was an extensible columnar system that used different data access methods for different datatypes in different columns, he didn’t seem to have thought about the subject a whole heck of a lot. But I continue to think that’s a very interesting direction for further work.
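To make the point that text indexes and “inverted list” systems are essentially columnar a little more concrete, here is a toy Python sketch of per-column inverted lists. Each column maps every value to the ids of the rows that hold it, and a conjunctive query intersects those lists without ever touching the columns it doesn’t mention. It rests on simplifying assumptions (in-memory dicts, exact-match predicates only) and is not a description of how Adabas, Datacom/DB, or Model 204 actually laid out their data; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence

InvertedLists = Dict[str, Dict[Any, List[int]]]


def build_inverted_lists(
    rows: Sequence[Sequence[Any]], column_names: Sequence[str]
) -> InvertedLists:
    # One inverted list per column: value -> ascending list of row ids holding it.
    index: InvertedLists = {name: defaultdict(list) for name in column_names}
    for row_id, row in enumerate(rows):
        for name, value in zip(column_names, row):
            index[name][value].append(row_id)
    return index


def lookup_all(index: InvertedLists, **criteria: Any) -> List[int]:
    # Answer an AND of equality predicates by intersecting per-column row-id lists.
    # .get() on a defaultdict does not insert missing keys, so lookups leave the index unchanged.
    result = None
    for name, value in criteria.items():
        ids = set(index[name].get(value, ()))
        result = ids if result is None else result & ids
    return sorted(result) if result else []


if __name__ == "__main__":
    rows = [("NY", "retail"), ("CA", "retail"), ("NY", "wholesale"), ("NY", "retail")]
    index = build_inverted_lists(rows, ["state", "channel"])
    print(lookup_all(index, state="NY", channel="retail"))
```

A bitmap index is the same idea with each row-id list stored as a bit vector, which makes the intersection a bitwise AND.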
But can the claim be stretched so far as to suggest that columnar systems are the best way to go about OLTP? That WAS exactly what was claimed for Required Technologies, Inc.* and the “TransRelational” model. But I spent a whole lot of cycles on, as it were, debunking that tripe. More generally, I think the most natural OLTP architectures are those that keep things together that get updated and retrieved together, whether these are relational rows, “objects” in the strict OO sense of the term, or something else. So at least in its strongest form, I think this third claim reflects a bit of, er, poetic license on Mike’s part.
*Mike actually was involved with RTI. However, he does not seem to be responsible for any of the mishegas that surrounded it. Indeed, he seems to have given them some pretty sound operational advice back when they were still a viable entity.
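The OLTP argument about keeping together the things that get updated and retrieved together can be sketched by counting how many separate storage locations a single-record insert or lookup touches. In this toy Python model (hypothetical class names; each Python list stands in for a separately stored page or file), a row store writes and reads one location per record, while a naive column store touches one location per column. Real columnar engines may soften this with write buffering and batching, so treat it as an illustration of the locality argument, not a benchmark.

```python
from typing import Any, List, Sequence, Tuple


class RowStore:
    """Each record is stored contiguously: one location per record."""

    def __init__(self, n_cols: int) -> None:
        self.n_cols = n_cols
        self.rows: List[Tuple[Any, ...]] = []

    def insert(self, record: Sequence[Any]) -> int:
        if len(record) != self.n_cols:
            raise ValueError("wrong number of columns")
        self.rows.append(tuple(record))
        return 1  # storage locations written

    def fetch(self, row_id: int) -> Tuple[Tuple[Any, ...], int]:
        return self.rows[row_id], 1  # (record, storage locations read)


class ColumnStore:
    """Each column is stored separately: one location per column."""

    def __init__(self, n_cols: int) -> None:
        self.columns: List[List[Any]] = [[] for _ in range(n_cols)]

    def insert(self, record: Sequence[Any]) -> int:
        if len(record) != len(self.columns):
            raise ValueError("wrong number of columns")
        for column, value in zip(self.columns, record):
            column.append(value)
        return len(self.columns)  # storage locations written

    def fetch(self, row_id: int) -> Tuple[Tuple[Any, ...], int]:
        # Reassemble the record from every column: one read per column.
        record = tuple(column[row_id] for column in self.columns)
        return record, len(self.columns)


if __name__ == "__main__":
    order = ("o-17", "cust-4", 3, 19.99, "shipped")
    rs, cs = RowStore(5), ColumnStore(5)
    print(rs.insert(order), cs.insert(order))
    print(rs.fetch(0)[1], cs.fetch(0)[1])
```

For a five-column order record, the row store touches one location per insert or lookup and the naive column store touches five; the gap grows linearly with table width, which is the opposite of the warehouse case above.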