December 20, 2008

More grist for the column vs. row mill

Daniel Abadi and Sam Madden are at it again, following up on their blog posts of six months ago arguing for the general superiority of column stores over row stores (for analytic query processing). The gist is to recite a number of bases for superiority beyond the two standard ones of less I/O and better compression; the argument seems to be based largely on Section 5 of a SIGMOD paper they wrote with Nabil Hachem.

A big part of their argument is that if you carry the processing of columnar and/or compressed data all the way through in memory, you get lots of advantages, especially because everything's smaller and hence fits better into Level 2 cache. There also is some kind of join algorithm enhancement, which seems to be based on noticing when the result falls into a range along some dimension, and perhaps on using dictionary encoding in a way that helps induce such an outcome.
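To make the "range" idea concrete, here is a minimal sketch of one generic way it can work, not the authors' actual implementation: if a column is dictionary-encoded with codes assigned in sorted value order, a range predicate on values becomes a range predicate on small integer codes, so the scan never has to decode anything. All function names here are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_sorted_dictionary(values):
    """Order-preserving dictionary encoding: codes follow sorted value order."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    codes = [code_of[v] for v in values]
    return dictionary, codes


def value_range_to_code_range(dictionary, low, high):
    """Translate an inclusive value range [low, high] into a half-open code range."""
    return bisect_left(dictionary, low), bisect_right(dictionary, high)


def scan_codes(codes, code_lo, code_hi):
    """Return row positions whose code lies in [code_lo, code_hi), without decoding."""
    return [pos for pos, c in enumerate(codes) if code_lo <= c < code_hi]


# Usage: find rows whose city is between "Boston" and "Chicago" inclusive.
dictionary, codes = build_sorted_dictionary(
    ["Boston", "Austin", "Denver", "Boston", "Chicago"])
lo, hi = value_range_to_code_range(dictionary, "Boston", "Chicago")
print(scan_codes(codes, lo, hi))
```

The same trick plausibly carries over to joins under one assumption: if a dimension table's surrogate keys are assigned in the order of the attribute being filtered, a predicate on that dimension maps to a contiguous key range, and the fact table's foreign-key column can be filtered with a simple between test rather than a hash lookup.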

The main enemy here is row-store vendors who say, in effect, "Oh, it's easy to shoehorn almost all the benefits of a column-store into a row-based system." They also take a swipe at unnamed columnar competitors to Vertica, described in terms that seemingly apply directly to ParAccel, for being insufficiently purely columnar.

Comments

2 Responses to “More grist for the column vs. row mill”

  1. Derek on December 21st, 2008 11:25 pm

    Ok I've heard the various reasonings to go with columnar databases, but I just don't see how any analysis winds up being that optimized. If you keep all the records together you only get the columnar benefits on a single row, and if you reduce each record to a series of sorted and compressed sets it seems that the number of joins required to reconstruct the records for anything but the simplest of analysis would overwhelm the I/O. I would think that at best you're comparable, or maybe a little better in some very tuned scenarios, but at worst you have something that doesn't work for sophisticated analysis at all. Is there some black magic on the join side that I'm missing here?

  2. Curt Monash on December 22nd, 2008 11:46 am

    Derek,

    Most of your objections, as best I understand them, go away if the data is stored in separate columns, but each column’s order is consistent with the same ordering of the rows. Am I missing anything?

    CAM
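The point in the reply above can be illustrated with a minimal sketch, assuming every column is stored in the same row order (the column names and helper functions are hypothetical). Under that assumption, "reconstructing" a record is just reading position i from each column, and combining predicates on different columns is a linear merge of sorted position lists rather than a key-based join.

```python
def select_positions(column, predicate):
    """Scan one column and return the (sorted) positions where predicate holds."""
    return [i for i, v in enumerate(column) if predicate(v)]


def and_positions(a, b):
    """Intersect two sorted position lists with a single linear merge."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def materialize(columns, positions):
    """Stitch rows back together by position; no join keys are needed."""
    names = list(columns)
    return [tuple(columns[n][i] for n in names) for i in positions]


# Usage: columns share one row order, so position 0 in each is the same record.
columns = {
    "region": ["east", "west", "east", "north"],
    "product": ["a", "b", "c", "a"],
    "revenue": [100, 250, 75, 300],
}
hits = and_positions(select_positions(columns["revenue"], lambda r: r > 90),
                     select_positions(columns["region"], lambda r: r == "east"))
print(materialize(columns, hits))
```

Only the columns a query touches are scanned, and full rows are assembled late, for just the qualifying positions.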
