September 6, 2007

Three bold assertions by Mike Stonebraker

In the first “meat” post (i.e., other than housekeeping) on the new Database Column blog, Mike Stonebraker makes three core claims:

1. Different DBMS should be used for different purposes. I am in violent agreement with that point, which is indeed a major theme of this blog.

2. Vertica’s software is 50X faster than anything non-columnar and 10X faster than anything columnar. Now, some of these stats surely come from a familiar syndrome: comparing the future release of your product, as tuned by the world’s greatest experts on it (who also hope to get rich on their stock options in your company), against some well-established production release of your competitors’ products, tuned to an unknown level of excellence,* with the whole thing running test queries that you, in your impartial wisdom, deem representative of user needs. Or something like that …

*For example, were Vertica’s competitors set up with vertical partitioning?

That said, is the implicit claim that columns are 5X faster than rows for data warehouse queries fundamentally ridiculous? (The 5X is implicit because if Vertica is 50X faster than row stores but only 10X faster than other column stores, columnar architecture by itself must be worth about 50/10 = 5X.) For data retrieval, probably not. The improvement obviously varies hugely depending on how wide a table is and what fraction of its columns a particular query needs, but 5X doesn’t sound hopelessly out of whack. But when we focus on the joins themselves, well, I’ll confess to not knowing quite enough about current DBMS designs to judge fully, but it sounds a bit fishy to me. I think row-based DBMS are pretty stupid about carting around, during joins, all the columns that aren’t involved in the join criteria. But the extent to which that extra baggage actually gets in the way of efficient processing is less clear.
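To make the “depends on table width” point concrete, here is a back-of-the-envelope sketch in Python. It assumes fixed-width, uncompressed columns and a purely scan-bound query, and it ignores compression, CPU costs and joins entirely (i.e., exactly the part I’m skeptical about). The table and column names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    width_bytes: int  # assumption: fixed width, no compression


def bytes_scanned_row_store(columns, n_rows):
    """A row store reads every column of every row it scans."""
    return n_rows * sum(c.width_bytes for c in columns)


def bytes_scanned_column_store(columns, needed, n_rows):
    """A column store reads only the columns the query references."""
    return n_rows * sum(c.width_bytes for c in columns if c.name in needed)


def scan_speedup(columns, needed, n_rows=1_000_000):
    """Ratio of bytes read, row store vs. column store, for a scan-bound query."""
    return (bytes_scanned_row_store(columns, n_rows)
            / bytes_scanned_column_store(columns, needed, n_rows))


# Hypothetical fact table: 20 columns, 8 bytes each.
fact = [Column(f"c{i}", 8) for i in range(20)]

print(scan_speedup(fact, {"c0", "c1", "c2", "c3"}))      # 5.0  (4 of 20 columns)
print(scan_speedup(fact, {"c0"}))                        # 20.0 (1 of 20 columns)
print(scan_speedup(fact, {f"c{i}" for i in range(20)}))  # 1.0  (all columns)
```

In this crude model, a 5X advantage corresponds to a query that touches a fifth of the table’s bytes; it says nothing about whether joins speed up by the same factor.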

To look at it another way: If columns were really all THAT great, mightn’t existing bitmap capabilities, including those of Oracle et al., be more widely used?
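For readers who haven’t looked at one, a bitmap index is itself a column-at-a-time structure: one bit vector per distinct value of one column, with multi-column predicates resolved by bitwise AND/OR. The toy Python version below uses plain integers as uncompressed bit vectors; it is not how Oracle or anybody else actually implements the feature (real products compress their bitmaps), and the sample data is invented.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set if row i holds that value."""
    index = defaultdict(int)
    for row_id, v in enumerate(values):
        index[v] |= 1 << row_id
    return dict(index)


def rows_from_bitmap(bitmap):
    """Decode a bitmap back into a sorted list of row ids."""
    rows, i = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(i)
        bitmap >>= 1
        i += 1
    return rows


# Hypothetical sales data, one list per column.
region = ["east", "west", "east", "north", "west", "east"]
product = ["tv", "tv", "radio", "tv", "radio", "tv"]

region_idx = build_bitmap_index(region)
product_idx = build_bitmap_index(product)

# WHERE region = 'east' AND product = 'tv' becomes a single bitwise AND.
hits = region_idx["east"] & product_idx["tv"]
print(rows_from_bitmap(hits))  # [0, 5]
```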

3. (Different) columnar systems can and should be developed to conquer the world and take over most or all data management niches. At least, that’s what he seemed to be suggesting. It’s not a totally crazy idea. After all, text indexes are essentially columnar systems, as were the stellar pre-relational “inverted list” DBMS Adabas, Datacom/DB, and Model 204. And what is abstract datatype support other than specialized handling on a column-by-column basis?
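Here’s the sense in which a text index is columnar: it is organized by value (word) rather than by record, with each entry pointing to a sorted list of record IDs, which is also the basic idea behind an inverted-list DBMS. A minimal Python sketch, with made-up documents and a hypothetical helper named intersect, might look like this:

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to a sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def intersect(a, b):
    """Merge-intersect two sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


docs = [
    "Columnar DBMS for data warehousing",
    "Row stores for OLTP",
    "Columnar text indexes",
    "Inverted list DBMS such as Model 204",
]
idx = build_inverted_index(docs)
print(intersect(idx["columnar"], idx["dbms"]))  # [0]
```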

Note: The last time I suggested to Mike the idea that what the world really needed was an extensible columnar system that used different data access methods for different datatypes in different columns, he didn’t seem to have thought about the subject a whole heck of a lot. But I continue to think that’s a very interesting direction for further work.
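To illustrate what I mean by an extensible columnar system, here is a purely hypothetical Python sketch: a registry assigns each column an access method based on its datatype (a sorted structure for range queries on numbers, a hash table for equality lookups on strings), and a query ANDs together the per-column results. The class names and the type-to-method mapping are invented for illustration, not taken from any product’s design.

```python
import bisect


class SortedRangeIndex:
    """For ordered types: (value, row_id) pairs kept sorted; answers (lo, hi) ranges."""

    def __init__(self, values):
        self.pairs = sorted((v, i) for i, v in enumerate(values))
        self.keys = [v for v, _ in self.pairs]

    def lookup(self, predicate):
        lo, hi = predicate  # inclusive bounds
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_right(self.keys, hi)
        return sorted(i for _, i in self.pairs[start:end])


class HashEqualityIndex:
    """For categorical types: value -> row ids; answers equality lookups."""

    def __init__(self, values):
        self.buckets = {}
        for i, v in enumerate(values):
            self.buckets.setdefault(v, []).append(i)

    def lookup(self, predicate):
        return self.buckets.get(predicate, [])


# Hypothetical registry: which access method each column datatype gets.
ACCESS_METHODS = {int: SortedRangeIndex, float: SortedRangeIndex, str: HashEqualityIndex}


class ColumnarTable:
    def __init__(self, columns):
        # Assumes every column is non-empty and homogeneously typed.
        self.indexes = {
            name: ACCESS_METHODS[type(values[0])](values)
            for name, values in columns.items()
        }

    def select(self, **predicates):
        """AND together per-column lookups; each column uses its own access method."""
        result = None
        for name, pred in predicates.items():
            rows = set(self.indexes[name].lookup(pred))
            result = rows if result is None else result & rows
        return sorted(result or [])


table = ColumnarTable({
    "price": [9.5, 120.0, 45.0, 300.0],
    "category": ["book", "tv", "book", "tv"],
})
print(table.select(price=(40.0, 200.0), category="book"))  # [2]
```

The design choice being illustrated is simply that the access method is a per-column decision, so a new datatype could plug in its own index without touching the others.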

But can the claim be stretched so far as to suggest that columnar systems are the best way to go about OLTP? That WAS exactly what was claimed for Required Technologies, Inc.* and the “TransRelational” model. But I spent a whole lot of cycles on, as it were, debunking that tripe. More generally, I think the most natural OLTP architectures are those that keep things together that get updated and retrieved together, whether these are relational rows, “objects” in the strict OO sense of the term, or something else. So at least in its strongest form, I think this third claim reflects a bit of, er, poetic license on Mike’s part.

*Mike actually was involved with RTI. However, he does not seem to be responsible for any of the mishegas that surrounded it. Indeed, he seems to have given them some pretty sound operational advice back when they were still a viable entity.
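The OLTP intuition can be shown with a deliberately crude cost model: count how many separate storage locations a single-record insert or fetch touches. The Python sketch below is a toy under stated assumptions (one location per row in the row store, one per column in the column store, and no buffering or batching of writes, which real columnar systems may well do), and the order schema is invented.

```python
class RowStore:
    """Toy row store: each record is kept contiguously as one tuple."""

    def __init__(self, column_names):
        self.column_names = list(column_names)
        self.rows = []

    def insert(self, record):
        self.rows.append(tuple(record[c] for c in self.column_names))
        return 1  # storage locations written: just the one row

    def fetch(self, row_id):
        values = self.rows[row_id]
        return dict(zip(self.column_names, values)), 1  # (record, locations read)


class ColumnStore:
    """Toy column store: each column is kept in its own list."""

    def __init__(self, column_names):
        self.columns = {c: [] for c in column_names}

    def insert(self, record):
        for name, values in self.columns.items():
            values.append(record[name])
        return len(self.columns)  # one write per column

    def fetch(self, row_id):
        record = {name: values[row_id] for name, values in self.columns.items()}
        return record, len(self.columns)  # one read per column


# Hypothetical order-entry schema.
cols = ["order_id", "customer_id", "item", "qty", "price", "status"]
order = {"order_id": 1, "customer_id": 42, "item": "widget",
         "qty": 3, "price": 9.99, "status": "new"}

print(RowStore(cols).insert(order))     # 1
print(ColumnStore(cols).insert(order))  # 6
```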

Comments

3 Responses to “Three bold assertions by Mike Stonebraker”

  1. Subodh on September 12th, 2007 4:15 pm

    “…..some of these stats surely come from the syndrome of comparing the future release of your product, as tuned by world’s greatest experts on it who also hope to get rich on their stock options in your company, vs. some well-established production release of your competitors’ products, tuned to an unknown level of excellence,* with the whole thing running test queries that you, in your impartial wisdom, deem representative of user needs. Or something like that….”

    Such perfect words for such a common occurrence. Given the way salespeople tout and gloat about their products, I would like to answer them with these words.

  2. Pythian Group Blog » Log Buffer #62: a Carnival of the Vanities for DBAs on September 14th, 2007 1:09 pm

    [...] DBMS2, Curt Monash responded to what he saw as three bold assertions in the article, in particular, the claims Stonebraker makes for the speed of his technology, and [...]

  3. The query from hell, and other stories | DBMS2 -- DataBase Management System Services on November 15th, 2008 7:31 am

    [...] The bold assertions by Mike Stonebraker [...]
