When I was at SIGMOD last week, ParAccel and its SIGMOD talk were mentioned several times, always in puzzled and at least slightly unflattering terms. (Typical comment: “Why did they present a paper about that? We were doing the same thing in our company years ago.”) That doesn’t prove much per se, since most of the mentions were by competitors and/or Vertica-affiliated academics, and since my own unflattering ParAccel-related comments were rather fresh at the time.
But now Daniel Abadi has done a brilliant, detailed, speculative analysis of ParAccel’s publications. Here’s the meat, emphasis mine:
(1) Why did they configure their TPC-H application with such a high amount of disk I/O throughput capabilty when they are a column-store? (Stonebraker’s question)
(2) Why did queries spend seemingly 6X more time doing I/O than a column-store should have to do?
(3) Why are they worried about queries with thousands of joins?
(4) Why do they think TPC-H/TPC-DS queries have 42 joins?
And then a theory that answers all four questions at the same time came to me. Perhaps ParAccel directly followed my advice (see option 1) on “How to create a new column-store DBMS product in a week“. They’re not a column-store. They’re a vertically partitioned row-store (this is how column-stores were built back in the 70s before we knew any better). Each column is stored in its own separate table inside the row-store (PostgreSQL in ParAccel’s case). Queries over the original schema are then automatically rewritten into queries over the vertically partitioned schema and the row-store’s regular query execution engine can be used unmodified. But now, every attribute accessed by the query now adds an additional join to the query plan (since the vertical partitions for each column in a table have to be joined together).
This immediately explains why they are worried about queries with hundreds to thousands of joins (questions 3 and 4). But it also explains why they seem to be doing much more I/O than a native column-store. Since each vertical partition is its own table, then each tuple in a vertical partition (which contains just one value) is preceded by the row-store’s tuple header. In PostgreSQL this tuple header is on the order of 27 bytes. So if the column width is 4 bytes, then there is a factor of 7 extra space used up for the tuple header relative to actual user data. And if the implementation is super naive, they also will need an additional 4 bytes to store a tuple identifier for joining vertical partitions from the same original table with each other. This answers questions 1 and 2, as the factor of 6 worse I/O efficiency is now obvious.
It will be interesting to see whether ParAccel comments, but even it does, I wouldn’t necessarily take ParAccel’s statements as dispositive. For example — and illustrative of my view of ParAccel’s trustworthiness — I believe ParAccel’s competition who tell me that ParAccel’s claim to have won or at least tied all POCs on performance is flat-out untrue.