Daniel Abadi and Sam Madden (for whom I have the highest regard after our discussions about H-Store) wrote a blog post on Vertica’s behalf, arguing that column stores are far superior to fully-indexed row stores for not-very-selective queries. They link to a SIGMOD paper backing up their argument, provide some diagrams, and generally make a detailed case. As best I understand, here are some highlights:
- They tested some queries (a benchmark of their own team’s devising, I think) on a column store, a normally configured row store, and a row store with every column indexed and the DBMS forced to use all the indexes. The third option performed TERRIBLY.
- The big reason the third option performed terribly is that the DBMS was forced to do huge amounts of work reconstituting each row — much more than it would have to do in a column store, let alone in ordinary row-store operation.
- They provide an IOU at the end for a follow-on post with a less self-defeating design on the row store. However, it’s not clear whether they plan to consider the case of a row store with a couple of indexes on each column (one in record order, one in sort order), which would be the most obvious way to simulate the advantages of a columnar system in a conventional row store.
- The benchmark was presumably against Oracle or some such DBMS, rather than a DW-optimized row store such as DATAllegro or Kognitio WX2 (to name two that run on similar hardware to Vertica). It showed a performance advantage of a little less than 6X for the column store (presumably Vertica or its research predecessor C-Store). It’s not obvious to me that DW-optimized row stores wouldn’t do as well or better.
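
To make the row-reconstitution point concrete, here is a minimal Python sketch. It is a toy illustration, not the benchmark from the paper, and every name in it (`make_table`, `column_store_query`, `indexed_row_store_query`, and so on) is hypothetical. It assumes a two-column table and a not-very-selective query of the form "a < a_max AND b < b_max". The column store can stitch values together by position, while the index-only row-store plan has to join the two index scans on record ID.

```python
import bisect
import random


def make_table(n_rows, seed=0):
    """Toy two-column table; values are assumptions made for illustration."""
    rng = random.Random(seed)
    return [(rng.randrange(100), rng.randrange(100)) for _ in range(n_rows)]


def column_store_query(col_a, col_b, a_max, b_max):
    # Both columns are stored in record order, so position i in col_a and
    # position i in col_b belong to the same row. Rebuilding a row is a
    # plain positional lookup.
    positions = [i for i, a in enumerate(col_a) if a < a_max]
    return [(col_a[i], col_b[i]) for i in positions if col_b[i] < b_max]


def build_index(column):
    # Sorted (value, record_id) pairs, standing in for the leaf level
    # of a single-column B-tree index.
    return sorted((value, rid) for rid, value in enumerate(column))


def index_range_scan(index, upper):
    # All index entries with value < upper, in value order (not record order).
    end = bisect.bisect_left(index, (upper, -1))
    return index[:end]


def indexed_row_store_query(index_a, index_b, a_max, b_max):
    # Each index returns entries in sort order, so matching rows must be
    # reassembled by joining on record ID. Here that is a hash join; with
    # a not-very-selective predicate the hash table covers much of the table.
    a_by_rid = {rid: value for value, rid in index_range_scan(index_a, a_max)}
    joined = []
    for b_value, rid in index_range_scan(index_b, b_max):
        if rid in a_by_rid:
            joined.append((rid, a_by_rid[rid], b_value))
    joined.sort()  # restore record order so results are comparable
    return [(a, b) for _, a, b in joined]


table = make_table(1000)
col_a = [row[0] for row in table]
col_b = [row[1] for row in table]
index_a = build_index(col_a)
index_b = build_index(col_b)

column_result = column_store_query(col_a, col_b, 80, 80)
row_result = indexed_row_store_query(index_a, index_b, 80, 80)
```

Both plans return the same rows; the difference is the extra hash table, join, and sort on the index-only path. A second index per column kept in record order, as suggested above, would in principle let the row store do the same positional lookup the column store does, which is why that configuration seems like the fairer comparison.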