January 12, 2011

Mike Stonebraker on “real column stores”

Mike Stonebraker has a post up on Vertica’s blog trying to differentiate “real” from “pretend” column stores. (Edit: That post seems to have come back down, but as of 1/19 it can be found in Google Cache.) In essence, Mike argues that the One Right Way to design a column store is Vertica’s, a position that Daniel Abadi used to share but since has retreated from.

There are some good things about that post, and some not-so-good. The worst paragraph is probably

Several row-store vendors (including Oracle, Greenplum and Aster Data) now claim to be selling a column store.   Obviously, this would require a complete rewrite of a DBMS to move from Figure 1 to Figure 2.  Hence, none of the “pretenders” have actually done this.  Instead all have implemented some aspects of column stores, and then claim to be the real thing.  This blog defines what the “real enchilada” looks like, and how to tell it from the pretenders.

which I question on two levels. First, the vendors cited don’t actually claim to be selling a column store; thus, the whole premise of Mike’s post is incorrect. Second, neither those vendors nor Mike are really correct. What Mike is really doing is differentiating, in his opinion,* good column stores from bad or mediocre ones.

*That Mike’s opinion in that regard is neither (wholly) unreasonable nor (wholly) unbiased should go pretty much without saying.

A lesser oopsie is Mike’s criterion “IO-1”, which is written so confusingly that it technically seems not to be met by any of the vendors cited — including Vertica, which introduced Vertica FlexStore in mid-2009.  And while I’m at it — Aster Data nCluster definitely meets criterion IO-3; I confirmed that by asking Tasso Agyros. Mike’s “No” for Sybase IQ on his criterion CPU-5 is also pretty questionable, given that Sybase IQ operates on compressed data until “the last possible moment.”

With the minor stuff cleared away, let’s get to the heart of the matter. Mike in essence concedes that multiple competitors can get the I/O benefits of a column store, even “aggressive compression.” However, he asserts that a designed-from-the-ground-up column store also can and should have major CPU advantages over row stores or row/column hybrids, for three reasons (as I paraphase them):

Actually, I have my doubts about the competitive-comparison aspect CPU-5. I think multiple DBMS that have dictionary/token compression, for example, operate on tokenized data in memory. I’ll confess to not having a current list memorized as to who does or doesn’t, but anyhow it’s a solvable technical problem. Also, as Tasso points out, if you use a bitmapped index you’re surely operating on compressed data.

On the other hand, the goodness of CPU-5 functionality is beyond reasonable dispute. For many queries (albeit by no means all), operating on compressed data is a major advantage.

For CPU-6, things are the other way around. Vertica is probably alone in the flexibility of how it orders columns on disk. Any other system I can think of is generally restricted to two storage orders at most — e.g., some kind of universal ID/row-ID, plus a sort on the actual values of the column. But is this a significant advantage at all?

Competitors like to argue that storing even in sort-by-value order is not advantageous at all, because of the overhead at data loading time, and the questionable number of queries that benefit. That extreme seems overstated. Why would the overhead be higher than that to, for example, maintain a b-tree index? And surely queries try to pick out specific values and/or value ranges, for a significant fraction of all columns.

On the other hand, total flexibility in storage sort order might require yet more overhead, and would also be of rarer benefit. And while Vertica claims to have fixed a prior drawback to the feature — administrative complexity — in Vertica 4.0, I don’t have hard facts as to how complete the fix really is.

As for the CPU-4 inner loop point — I must confess to not knowing much about it.

Comments

24 Responses to “Mike Stonebraker on “real column stores””

  1. Daniel Lemire on January 12th, 2011 9:16 pm

    A reasonable compromise is to pick the sort order so as to maximize compression. See for example this paper:

    Reordering Columns for Smaller Indexes
    http://arxiv.org/abs/0909.1346

  2. Curt Monash on January 12th, 2011 11:38 pm

    If you have compression options such as delta, run-length encoding (RLE), prefix, etc., wouldn’t the max compression sort order usually be value sort?

  3. chet on January 13th, 2011 12:54 am

    Ultimately, does it really matter whether it is row-store or column-store (my confession, I don’t know much about column-stores)?

    We could debate what is the best architecture and theory all day long but at the end of the day, the proof is in the pudding right? I’m all for pontificating on profound (to me) matters, but it’s just that…

    chet

  4. Curt Monash on January 13th, 2011 2:20 am

    Columnar data storage has huge advantages in numerous use cases, because of the basic I/O reduction. That’s no longer much questioned.

  5. Daniel Lemire on January 13th, 2011 8:53 am

    @Curt

    Within a column, you want to sort values and that’s easy.

    In the multidimensional case (projections in C-Store/Vertica have several columns), you want lexicographic sort… which is sensitive to the order of the columns.

    Think about a date/place/vendor table… do you first sort on vendor, then place and then date… or is it date, place, vendor?

    Obviously, you get more compression on the first column you sorted, less on the second and so on.

    So… which column order is best? That’s one of the question posed by column stores.

    Meanwhile, in a row store, the order of columns is unimportant… except for compressed column-oriented indexes (such as bitmap indexes).

  6. Curt Monash on January 13th, 2011 9:59 am

    Daniel,

    Right. You can get better performance from sorting one column in the order of another one’s values. But you won’t get better compression that way. 😉

    The other wrinkle here is tokenizing multiple columns at once, and sorting on that. E.g. Rainstor (the former Clearpace).

  7. Daniel Lemire on January 13th, 2011 11:49 am

    Columnar data storage has huge advantages in numerous use cases, because of the basic I/O reduction. That’s no longer much questioned.

    True as of now, but I thought that Bruno’s analysis was convincing:

    “””we (…) show that many of the benefits of C-stores can indeed be simulated in traditional engines with no changes whatsoever. “””

    Reference:

    Teaching an Old Elephant New Tricks
    http://research.microsoft.com/apps/pubs/default.aspx?id=74156

    You can get better performance from sorting one column in the order of another one’s values. But you won’t get better compression that way.

    Consider a phone book. The entries are sorted on the last name, which makes up a very compressible column. But they are also “locally” sorted on the first name (as a secondary key). This second column will also be compressible, but less so. And, of course, the “phone number” attribute won’t be very compressible.

    Now, you might also print phone books which are sorted on the “first names”, and then on the last name.

    Which one is best? Sorting first on the last name, or on the first name? This will have an impact on your overall compression but also on the type of queries that will run fast.

    (This is true whether you use RLE, or block-wise prefix coding.)

    The other wrinkle here is tokenizing multiple columns at once, and sorting on that. E.g. Rainstor (the former Clearpace).

    In my example above, it might make sense to store last name and first name as a single meta-column. Maybe they are rarely accessed separately. So the benefit of loading “just the one column” is small.

  8. chet on January 13th, 2011 12:02 pm

    OK, column-stores reduce I/O especially in certain use cases. Always a good thing.

    So I don’t get too distracted…my point was (and I wasn’t clear), isn’t the performance the ultimate indicator? Users don’t care how it is stored on disk…they care about how it performs.

    Would this just be but one piece of an overall architecture? It’s the sum of the parts right?

  9. Marcin Zukowski on January 13th, 2011 1:35 pm

    Curt, small note about CPU-4.

    What Mike describes is exactly the idea that was implemented in MonetDB 15+ years ago: when you have a given operation, first execute it entirely on one column, then on another etc. You can call it column-at-a-time execution.

    I often explain this concept with the following story. Imagine you need to buy 100 beers for a party. A row-executor would would go to a store, take a single bottle, pay at a register, walk home, put it in the fridge, repeat 100x times. With column-at-a-time, you go to a store, take 100 beers, pay once, walk home, and you’re done.

    But I wonder if this is really what Vertica is doing. Column-at-a-time execution is known to suffer from scalability problems due to the need of materialization of intermediate results. Try carrying 100 bottles of beer at once.

    A better solution is working in a pipeline on parts of a column – this is what we presented (way before Vertica) as vector-at-a-time, or vectorized, execution and what is one of the fundaments of VectorWise. Yes, think of beer crates 🙂

    As for the non-generic nature of Mike’s post, I guess I could come up with a number of column-related aspects of a system Vertica is not doing, but I guess so could many other vendors.

    Still, it does showcase that column-approach in a DBMS can go way further than just storing columns separately, providing great benefits. No argument there.

  10. chet on January 13th, 2011 4:54 pm

    Marcin,

    Great explanation of a row-store vs. column-store.

    chet

  11. Mike Pilcher on January 13th, 2011 4:58 pm

    I commented at length on Mike’s post last week http://www.sand.com/challenging-column-store-criteria/

    I agree the debate needs to be clear on who we are comparing ourselves to and at SAND we don’t see Oracle, DB2, SQL Server as the competition; great for transaction processing, we would argue just not great for analytic purposes. We see vendors like Vertica, Infobright and now SAP’s HANA with column store technology. At SAND we are glad they finally joined the party, we have been here for a long time.

    I would like to highlight a couple of other key elements worthy of debate. We believe that software design should have a goal and so I have categorised these with the goals in mind.

    User volumes
    Like row-level locking was to on-line transaction processing, where it was required to ensure every user could get consistent access to data and ensure both data integrity and performance in highly volatile environments. At SAND we believe you need to have an approach for high concurrent user access to analytic data to spread the intelligence in the data to as many people as possible and make it usable. We believe generation based concurrency is the approach to ensure scalable performance for thousands of users. Generation based concurrency control is a lock-less concurrency control scheme well suited to the read-mostly environments of data warehousing and analytics. Unlike OLTP, conflicts are infrequent and therefore transactions proceed without incurring the overhead and loss of performance of a lock-based management scheme. GBCC results in higher transaction throughput that is predictably and highly scalable, something a conventional lock-based concurrency control scheme cannot achieve in a read-mostly environment. GBCC provides a full spectrum of transaction isolation levels enabling applications to choose the isolation mode that best balances requirements.

    At SAND we believe this lockless model is the correct model to deliver user scalability. As SAND has been in business for over 20 years we have encountered the complex problems that others in the data warehouse and analytic database market have yet to encounter. We have worked over the years to find innovative and creative ways to solve these problems. Getting a lot of users connected to SAND with great performance was one such challenge. We believe one day all analytic data warehousing databases will use GBCC. If someone has come up with other alternatives, we would be happy to hear how they deal with thousands of users while maintaining performance.

    Performance
    Performing operations on the smallest unit of work makes processing data more efficient and faster. No quantum physicist I need help tying things back to the real world to understand quantum phenomena, if I want to understand duality I need Schrödinger’s semi-dead cat (or is that semi-alive) to help me grasp the concept. In analytic databases it helps me to think about analogies to understand the power of what is going on. If we need to move a rock from New York to San Francisco, there are a lot of logistics, transportation, cost and energy expanded to do so. If we were moving a grain of sand our options are legion, including flying it from coast to coast. While the boulder may be made up of billions of grains of sand there’s only one I want, so that is the one I should move. It is simply more efficient.

    BVAT technology allows database algorithms to be expressed as operations on simple integer data rather than involving much less efficient character string data or resource-hogging floating point data. Processor execution units are at their core integer processors, and therefore superior performance is achieved when database algorithms are written as operations on integer data. SAND’s BIT vector technology provides a highly efficient means of representing and manipulating sets of integer data. The combination of set-oriented processing and integer-based algorithms enables extremely high database query performance.

    I fully admit bias, doesn’t mean we’re wrong, just biased.

  12. Randolph on January 13th, 2011 8:55 pm

    Marcin is on the ball as usual.
    People are still talking about aggressive compression as if it were the solution. Remember: disk is cheap, I/O is expensive. The best solution is not the one that compresses the most its the one that gets the answer the fastest.
    Thats the idea behind the lightweight compression in Vectorwise – minimise the time to move AND uncompress the data. Just adding zip compression to your columns may not make the system faster…

    I would love to see a performance comparison between Vertica and Vectorwise. My money os on Vectorwise.

  13. Vlad Rodionov on January 14th, 2011 2:58 pm

    “operating on compressed data is a major advantage.”

    Not really. Modern CPU can encode/decode data with several GBs per sec speed (depends on algo and implementation). The bottleneck here is definitely not in-memory decompression.

    I would say the the major advantage of any store (row or columnar) is capability of translating query into machine code on the fly and very fast. The code which can be executed by CPU directly w/o any intermediate interpretation layers. Consider it as JIT compilation for DBMS. How many DBMS have this capability?

  14. Vlad Rodionov on January 14th, 2011 3:02 pm

    @Randolph

    Can we run Vectorwise in a cluster? No, we can’t.

  15. Curt Monash on January 14th, 2011 6:33 pm

    Vlad,

    The benefit of in-memory compression is not avoiding a decompression step so much as it is fitting more data into silicon.

  16. Randolph on January 16th, 2011 5:28 am

    @vlad

    Yes we can. We have just built a MPP solution based on Vectorwise.

  17. Curt Monash on January 16th, 2011 12:01 pm

    Randolph,

    Obvious questions include:

    1. Do you have any kind of a “head node”? If so, what keeps it from being a bottleneck?

    2. Do you have much of a separate optimizer, or do you just broadcast a query to multiple Vectorwise nodes and hope for the best?

  18. Randolph on January 16th, 2011 8:29 pm

    Curt,

    We are using the MPI supercomputer API and OpenFabrics for messaging and control, which we find to be really fast with low latency (depending on the chosen fabric).

    There is no head node, we have a symmetric architecture that distributes admin operations dynamically across the cluster using normal supercomputer scheduling algorithms to achieve even load.

    Yes, deepcloud has its own optimiser that works at a fairly high level before handing SQL component queries to the Vectorwise optimisers in turn.

    Versions of the system also work with normal Ingres but the performance increase we see with Vectorwise is truly impressive.
    The combination of DeepCloud MPP and Vectorwise steps around any scaling limitations seen with high end VW systems and keeps the VW nodes running in their ‘sweet spot’.

    Syntax is near MySQL standard, rather than the ‘clunkier’ VW dialect.
    Connections are delivered to LibDrizzle or MySQLclients for forwards and backwards compatibility.

    More information is on the website: http://www.deepcloud.co
    but I’m happy to answer any other questions you may have.

    Cheers
    Randolph

  19. DJR on January 21st, 2011 2:25 am

    …because of the overhead at data loading time, and the questionable number of queries that benefit. That extreme seems overstated. Why would the overhead be higher than that to, for example, maintain a b-tree index?

    I believe storing data physically sorted is more expensive than maintaining a tree.

  20. Curt Monash on January 21st, 2011 4:04 am

    But is the ratio of costs order(1), or something bigger?

  21. Quora on January 24th, 2011 11:06 pm

    Which databases are the best at replaying historical data to backtest realtime algorithms?…

    I believe that the question, “Which databases are the best at replaying historical data to back-test real-time algorithms?” only asks 1/2 of the real question. Because of the massive amounts of data that can be involved (quote volumes can easily exce…

  22. HP to Acquire Vertica « EMA Blog Community on February 14th, 2011 11:50 am

    […] Mike Stonebraker on “real column stores” (dbms2.com) […]

  23. The curious case of the canceled column store criteria | SAND on February 15th, 2011 6:26 pm

    […] the rush of wind all the way from Massachusetts.) Given that SAND responded rather eloquently and Curt Monash and Seth Grimes both weighed in with excellent points as well, I thought they’d keep it up […]

  24. Now we know why Vertica has been so weirdly evasive | DBMS 2 : DataBase Management System Services on February 24th, 2011 2:05 am

    […] went retro in its marketing with some Mike Stonebraker column-store architetural tub-thumping — and then removed the post a few days later when it came under […]

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.