September 3, 2009

Oracle Exadata hybrid columnar compression

Oracle Database 11g Release 2 is out, and as usual I wasn’t briefed — perhaps because Oracle is more scared than its competitors are of hard questions, perhaps for some other reason entirely.* Anyhow, Oracle Database 11g Release 2 contains an Exadata-only feature called hybrid columnar compression. The Oracle Database 11g Release 2 white paper says “data is grouped, ordered, and stored one column at a time.” But Kevin Closson clarifies:

The word hybrid is important.

Rows are still used. They are stored in an object called a Compression Unit. Compression Units can span multiple blocks. Like values are stored in the compression unit with metadata that maps back to the rows.

So, “hybrid” is the word. But, none of that matters as much as the effectiveness. This form of compression is extremely effective.

That sounds a whole lot like PAX. Specifically, in Oracle’s case I would guess “hybrid columnar compression” provides the compression benefits of column stores, but not column stores’ I/O benefits, and also not any kind of in-memory compression.
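To make the PAX comparison concrete, here is a minimal Python sketch of what a “compression unit” along the lines Kevin Closson describes might look like. It is an illustration under stated assumptions, not Oracle’s actual on-disk format: the names CompressionUnit, build_unit, read_column and read_row are hypothetical, zlib stands in for whatever codecs Oracle really uses, and values are assumed to be strings containing no newlines. A batch of rows is pivoted so each column’s values sit together and are compressed separately, while a value’s position within its column is what maps it back to its row.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass


@dataclass
class CompressionUnit:
    """Hypothetical PAX-like unit: a batch of rows stored column by column.

    Illustration only; this is not Oracle's actual on-disk format.
    """

    num_rows: int
    columns: list[bytes]  # one zlib-compressed blob per column


def build_unit(rows: list[tuple[str, ...]]) -> CompressionUnit:
    """Pivot rows into columns and compress each column separately."""
    # Values of the same column end up next to each other, so repeated
    # ("like") values compress well together. Assumes no value contains "\n".
    columns = list(zip(*rows))
    blobs = [zlib.compress("\n".join(col).encode("utf-8")) for col in columns]
    return CompressionUnit(num_rows=len(rows), columns=blobs)


def read_column(unit: CompressionUnit, col: int) -> list[str]:
    """Decompress a single column of the unit."""
    return zlib.decompress(unit.columns[col]).decode("utf-8").split("\n")


def read_row(unit: CompressionUnit, row: int) -> tuple[str, ...]:
    """Rebuild one row: value number `row` of every column belongs to it."""
    # A value's position inside its column is the metadata that maps it
    # back to its row.
    return tuple(read_column(unit, c)[row] for c in range(len(unit.columns)))


if __name__ == "__main__":
    rows = [
        ("2009-09-03", "EMEA", "widget"),
        ("2009-09-03", "EMEA", "gadget"),
        ("2009-09-04", "APAC", "widget"),
    ]
    unit = build_unit(rows)
    print(read_column(unit, 1))  # ['EMEA', 'EMEA', 'APAC']
    print(read_row(unit, 2))     # ('2009-09-04', 'APAC', 'widget')
```

Because the whole unit is still stored and fetched as one piece, a layout like this saves space rather than I/O on columns a query doesn’t need, which is the distinction the post draws.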

*Actually, Oracle has indicated to me multiple times that the reason is I won’t let Oracle review what I write before I publish it. My stance is that such “review” is an extremely time-wasting courtesy, in which one spends a lot of time diplomatically explaining to a vendor that, contrary to what it hopes, one really does know the difference between marketing puffery and sober fact.  I rarely do white paper projects any more, notwithstanding that my fee for those now exceeds $2,000/page. I’m not about to go through the “review” hassle for something I write for free, about a vendor who isn’t otherwise a paying client.

Comments

14 Responses to “Oracle Exadata hybrid columnar compression”

  1. Daniel Lemire on September 3rd, 2009 1:55 pm

    I think it is an excellent guess, as there are not many alternatives between the decomposition storage model used by column stores (which, btw, dates back to at least 1979) and the classical row store model.

    Of course, in such a case, small differences in implementation can be a big deal. You can be a little closer to the row store, or a little closer to the column store. It would depend on the design philosophy. If you read the original PAX paper, it is not very specific on the implementation. This is not a fault, but it means there can be many PAX-like implementations. Oracle being what it is, we can guess that they stuck close to the row store philosophy. That is, referencing individual rows is probably nearly as easy as in a row store… maybe at the expense of compression.

    Yet, I’m not certain that a PAX-based architecture would have to compromise on compression. Indeed, you can use all the same tricks as in a column store including run-length encoding and working from sorted tables. There is a small penalty to pay, but with enough cleverness, it can be made negligible. At least in theory, you can.

    In fact, because the C-Store way introduces redundancies (through multiple projections on different sets of columns), it seems likely that a PAX-based architecture would use less storage overall.

    My own guess is that Oracle probably decided to close the performance gap with the likes of Vertica. They don’t need to be as fast, as long as they can convince their users that they are not 50x slower than Vertica. Their sales pitch might end up being that they are not quite as fast as a true column store on all queries, but that they offer more balanced performance (referencing an individual row might be faster than with a column store).

    Disclaimer: I don’t speak with any authority. Just guessing. I do claim to know the science behind it a little bit though.
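Lemire’s point about run-length encoding and sorted tables inside a PAX-style layout can be illustrated with a small Python sketch. It assumes the rows inside one unit may be freely reordered before encoding; the function names (rle, rle_decode, rle_unit) and the sample rows are hypothetical, and nothing here claims to describe Oracle’s implementation. Sorting a unit’s rows clusters equal values, so each column ends up with fewer, longer runs.

```python
from itertools import groupby


def rle(values):
    """Run-length encode a sequence as a list of (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


def rle_unit(rows, sort_key=None):
    """RLE each column of one unit, optionally sorting the unit's rows first.

    Sorting changes row order inside the unit, so anything that refers to a
    row by position must use the sorted position (a hypothetical design choice).
    """
    if sort_key is not None:
        rows = sorted(rows, key=sort_key)
    return [rle(column) for column in zip(*rows)]


if __name__ == "__main__":
    rows = [
        ("EMEA", "widget"),
        ("APAC", "gadget"),
        ("EMEA", "gadget"),
        ("APAC", "widget"),
        ("EMEA", "widget"),
    ]
    unsorted_runs = sum(len(col) for col in rle_unit(rows))
    sorted_runs = sum(len(col) for col in rle_unit(rows, sort_key=lambda r: r))
    print(unsorted_runs, sorted_runs)  # 8 6
```

On this toy unit sorting cuts the run count from 8 to 6; how much it helps on real data depends entirely on the data and on which columns the sort favors.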

  2. Greg Rahn on September 3rd, 2009 2:46 pm

    One thing is certain: when using hybrid columnar compression, the price per TB drops significantly, given 10x compression for non-archive data and 40x for archive/historical data. This new technology seems to put Oracle’s compression portfolio well ahead of Netezza’s.

  3. Curt Monash on September 3rd, 2009 3:18 pm

    If you’re really getting close-to-columnar levels of compression, that’s indeed better than Netezza’s rates.

  4. Daniel Abadi on September 3rd, 2009 7:07 pm

    Oracle’s is the third hybrid row/column storage scheme announced in the last month. It’s time for a taxonomy: http://dbmsmusings.blogspot.com/2009/09/tour-through-hybrid-columnrow-oriented.html

  5. Changing your perspective: horizontal, vertical and hybrid data models on September 4th, 2009 9:34 am

    [...] hybrids. For example, text search sometimes require full-text indexes such as suffix arrays, Oracle recently announced a row/column hybrid, and so on. Take away message: If you are stuck, try to rotate your data model. If neither the [...]

  6. Daniel Weinreb on September 4th, 2009 9:37 am

    If I understand this correctly, then I would not precisely say that they get the compression benefits of column stores. I’d say that they get compression benefits, which column stores also get. But column stores may be able to do more effective compression because they are compressing a sequence of values drawn from the same domain of data. I don’t know how big this effect is, but it might be significant for all I know.

  7. Curt Monash on September 4th, 2009 11:37 am

    Dan,

    The idea of a PAX-like scheme is that you take a certain subset of the rows (i.e., enough to fill a block, or in the case of Oracle evidently enough to fill a number of blocks) and store them VERY much as you would in a column store. Now, do you get a better compression ratio on 10s of terabytes of data than you do on megabytes or 10s of megs? Yes. But it’s my understanding that for many data sets, the difference isn’t really very much.
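One rough way to see the effect Curt describes is to count dictionary entries, assuming simple dictionary encoding (one possible scheme; which codecs Oracle actually uses isn’t stated here). The helper names and sample columns below are hypothetical. If each unit gets its own dictionary, a value that recurs across units is stored once per unit instead of once overall; for a low-cardinality column that costs a few extra entries per unit, and for an all-distinct column it costs nothing.

```python
def dictionary_entries(values):
    """Entries in a dictionary for `values`: one per distinct value."""
    return len(set(values))


def per_unit_dictionary_entries(values, unit_rows):
    """Total dictionary entries if every `unit_rows`-row unit has its own dictionary."""
    return sum(
        dictionary_entries(values[start:start + unit_rows])
        for start in range(0, len(values), unit_rows)
    )


if __name__ == "__main__":
    regions = ["AMER", "APAC", "EMEA", "LATAM"] * 250  # 1,000 rows, 4 distinct
    order_ids = [str(i) for i in range(1000)]           # 1,000 rows, all distinct

    print(dictionary_entries(regions), per_unit_dictionary_entries(regions, 100))
    # 4 40
    print(dictionary_entries(order_ids), per_unit_dictionary_entries(order_ids, 100))
    # 1000 1000
```

In the first case the 36 extra entries are small next to the 1,000 per-row codes; the overhead grows for medium-cardinality columns whose values repeat across many units.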

  8. Ankush on September 4th, 2009 1:02 pm

    >> in Oracle’s case I would guess “hybrid columnar
    >> compression” provides the compression benefits
    >> of column stores, but not column stores’ I/O
    >> benefits

    I was wondering: consider a table with 10 columns and a query Q that needs only 2 of them. With architectures like Netezza and Exadata, if the storage servers filter out the 8 unneeded columns and never send them to the DB, then the (network) I/O the DB sees should be roughly similar to what column stores see, no?

  9. Curt Monash on September 4th, 2009 1:10 pm

    Ankush,

    Correct. But in the case of Netezza and Exadata you’re paying for the equipment (and electricity) that does the initial I/O and filtering.

    Netezza’s argument is “Yeah, but that’s not so bad because FPGAs are cheap.” Even so, it’s not free.
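The trade-off in this exchange can be put in rough per-row numbers. This is a simplified, hypothetical model: fixed-width uncompressed columns, no predicate filtering, and made-up parameter names; “storage filtering” stands for the Netezza/Exadata behavior Ankush describes, where the storage tier reads full rows but ships only the needed columns.

```python
def bytes_per_row(col_widths, needed, *, columnar, storage_filtering):
    """Return (bytes read from disk, bytes sent to the DB) for one row.

    Hypothetical model: fixed-width, uncompressed columns; no predicates.
    col_widths: width in bytes of every column in the table.
    needed: indexes of the columns the query actually uses.
    columnar: True if the storage layout allows reading only needed columns.
    storage_filtering: True if storage servers drop unneeded columns
        before shipping data to the database server.
    """
    needed_bytes = sum(col_widths[i] for i in needed)
    read = needed_bytes if columnar else sum(col_widths)
    sent = needed_bytes if (columnar or storage_filtering) else read
    return read, sent


if __name__ == "__main__":
    widths = [8] * 10  # 10 columns, 8 bytes each
    query = [0, 1]     # the query touches 2 of them
    print(bytes_per_row(widths, query, columnar=False, storage_filtering=False))  # (80, 80)
    print(bytes_per_row(widths, query, columnar=False, storage_filtering=True))   # (80, 16)
    print(bytes_per_row(widths, query, columnar=True, storage_filtering=False))   # (16, 16)
```

The middle case reflects both sides: the DB sees 16 bytes per row, as it would with a column store, but the storage tier still reads all 80.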

  10. Relational databases: are they obselete? on September 16th, 2009 2:04 pm

    [...] frequently catching up to specialized engines. In particular, they are not limited to row stores. Curt Monash’s blog post on Oracle’s hybrid columnar approach makes this obvious. Nicolas Bruno, in Teaching an Old [...]

  11. Notes on the Oracle Database 11g Release 2 white paper | DBMS2 -- DataBase Management System Services on September 21st, 2009 1:14 pm

    [...] Oracle Database 11g Release 2 white paper I cited a couple of weeks ago has evidently been edited, given that a phrase I quoted last month is no longer to be found. Anyhow, [...]

  12. Oracle Exadata 2 capacity pricing | DBMS2 -- DataBase Management System Services on October 5th, 2009 8:20 am

    [...] These figures are highly sensitive to assumptions about Oracle’s hybrid columnar compression. [...]

  13. Exadata Hybrid Columnar Compression (HCC) for (storage) dummies « Dirty Cache on August 10th, 2012 10:06 am
  14. Column Stores: Teaching an Old Elephant New Tricks | Java and SQL Best Practices and Lessons Learned on August 27th, 2013 9:00 am

    [...] Exadata supports hybrid columnar compressions since 2009. This was (deliberately?) omitted by Stonebraker. It would be interesting to hear him compare [...]
