August 5, 2008

Daniel Abadi and Sam Madden on column stores vs. indexed row stores

Daniel Abadi and Sam Madden — for whom I have the highest regard after our discussions regarding H-Store — wrote a blog post on Vertica’s behalf, arguing that column stores are far superior to fully-indexed row stores for not-very-selective queries. They link to a SIGMOD paper backing their argument up, provide some diagrams, and generally make a detailed case. As best I understand, here are some highlights:

Comments

8 Responses to “Daniel Abadi and Sam Madden on column stores vs. indexed row stores”

  1. Daniel Abadi and Sam Madden on column stores vs. indexed row stores on August 5th, 2008 9:27 pm

    [...] Original Patent Prospector [...]

  2. Daniel Abadi and Sam Madden on column stores vs. indexed row stores on August 5th, 2008 9:50 pm

    [...] Original simplerich [...]

  3. Daniel Abadi on August 6th, 2008 10:57 am

    Hi Curt,

    Thanks for the kind words. Just wanted to add a few clarifications:

    1) The column-store we used was C-Store. The reason why we chose to use C-Store over Vertica was that the paper required some additions and subtractions to the column-store code for some experiments not discussed in our blog post (we added a new join technique and made some other changes to the query executer), and it is a lot easier to add and subtract code to a research prototype than to a production ready DBMS (besides, I’m a whole lot more familiar with the C-Store codeline, having written much of it for my PhD thesis).

    Unfortunately, C-Store is a pretty slow piece of code. It only runs in single-threaded mode, and was completely unable to take advantage of the extra core on the benchmarking machine. It does not implement horizontal partitioning, which for this benchmark, would have improved performance by at least a factor of two. And the last time anyone attempted to do any serious performance optimization of the code was three years ago (seriously). A whole lot of new code has been written since then.

    Hence, C-Store is not representative of commercial column-stores (like Vertica) which do not have these shortcomings. Consequently, that 6X performance difference is much lower than what is typically observed for commercial column-stores.

    2) Keep in mind that we benchmarked single-node performance of the database code only. The DW-optimized row-stores such as DATAllegro or Greenplum tend to improve performance by adding parallelism (i.e., MPP/shared-nothing architecture). Their single-node DBMS is typically derived from an open-source DBMS (like Ingres or Postgres) and I would be surprised if they would be significantly faster running on a single machine than the commercial row-store that we used (I would actually guess they would be slower — our DBA used just about every trick in the book, including partitioning, indexing, and materialized views to get the commercial row-store to go as fast as possible).

    3) We have come through on our IOU. See: http://www.databasecolumn.com/2008/07/debunking-another-myth-columns.html, though I don’t think our experiments are exactly what you had in mind.

    Anyway, I think the bottom line message to take away from our blog post not the comparison of the row-store vs. the column-store; rather it is that an index is a very different data-structure than a column.

  4. Column stores vs. vertically-partitioned row stores | DBMS2 -- DataBase Management System Services on August 6th, 2008 6:48 pm

    [...] Abadi and Sam Madden followed up their post on column stores vs. fully-indexed row stores with one about column stores vs. vertically-partitioned row stores. Once again, the apparently [...]

  5. Bill Walters on August 6th, 2008 10:54 pm

    Can you give examples of production implementations of the columnar approach at a Fortune 100 company? If so how does the columnar approach handle mixed work loads? If we have multiple user communities using the same data submitting queries of varying complexity from simple reports to complex analytics, how would this be set up in the columnar approach? Would we need a separate column for each type of query and user? Do you have any real world examples of concurrency tests in mixed work load environments? How would the columnar approach be organized to handle customer level basket analysis with multiple count-distinct aspects in the same query. Can you discuss how the columnar approach would handle data feeds from transactional systems?

  6. Curt Monash on August 7th, 2008 2:44 am

    Bill,

    I presume you think those questions are all rhetorical, but I have in fact discussed how they handle data feeds elsewhere in this blog.

    And I addressed the concurrency point on another post where you repeated it.

    CAM

  7. Dominik Slezak on August 9th, 2008 3:01 am

    I had the great pleasure of reading the paper and attending Daniel’s presentation at SIGMOD. It’s worth mentioning that – besides the main topic discussed here – the paper describes several really cool optimizations helping column stores to achieve better performance, like “invisible join” or “late materialization” which were also considered in some earlier papers.

    Going back to “column stores versus row stores”, indeed, the key point seems to compare columns with indices as different types of structures designed to assist in different types of queries. So the question should be whether there is a way to combine advantages of those types of structures to support more mixed workloads including, for example, highly selective ad hoc queries in column stores? How about replacing conventional indices with something new, better adjusted to the columnar-like architectures?

  8. sauraj sinha on March 30th, 2009 8:21 am

    my query is behave like row
    and i want to display that as horizontal
    e.g.my query is display name like
    1. sauraj
    2 rajiv
    3.vikash
    4.vicky
    so on…………
    but i want to display that like
    1.sauraj 2.rajiv 3.vikash 4.vicky
    5.vf 6.gsdgASd so on………….

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.