October 6, 2009

Oracle and Vertica on compression and other physical data layout features

In my recent post on Exadata pricing, I highlighted the importance of Oracle’s compression figures to the discussion, and the uncertainty about same. This led to a Twitter discussion featuring Greg Rahn* of Oracle and Dave Menninger and Omer Trajman of Vertica.  I also followed up with Omer on the phone.

*Guys like Greg Rahn and Kevin Closson are huge assets to Oracle, which is absurdly and self-defeatingly unhelpful through conventional public/analyst relations channels.

In six key tweets, Greg Rahn said:

I think the HCC 10x compression is a slideware (common) number. Personally I’ve seen it in the 12-17x range on customer data…

This was on a dimensional model. Can’t speak to the specific industry. I do believe Oracle is working on getting industry #s.

As far as I know, Exadata HCC uses a superset of compression algorithms that the commonly known column stores use…

…and it doesn’t require the compression type be in the DDL like Vertica or ParAccel. It figures out the best algo to apply.

The compression number I quoted is sizeof(uncompressed)/sizeof(hcc compressed). No indexes were used in this case.

Exadata HCC is applicable for bulk loaded (fact table) data, so a significant portion (size wise) of most DWs.

Summing up, that seems to say:

- The commonly cited 10x compression figure for Exadata Hybrid Columnar Compression is conservative; Greg has personally seen 12-17x on customer data, albeit on a dimensional model, with industry-specific figures still to come.
- HCC reputedly uses a superset of the compression algorithms the well-known columnar DBMS use, and picks the best algorithm itself rather than requiring it to be declared in DDL.
- The quoted ratios compare uncompressed to HCC-compressed data sizes, with no indexes involved.
- HCC applies to bulk-loaded data such as fact tables, and hence to a large fraction (size-wise) of most data warehouses.
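To make the ratio in Greg's fifth tweet concrete, here is a minimal sketch of how one might compute it in Oracle from the standard DBA_SEGMENTS data dictionary view; the segment names are hypothetical:

    -- Hypothetical sketch: SALES_RAW and SALES_HCC are invented segment names.
    -- Greg's figure is sizeof(uncompressed) / sizeof(HCC-compressed), no indexes.
    SELECT SUM(CASE WHEN segment_name = 'SALES_RAW' THEN bytes END)
         / SUM(CASE WHEN segment_name = 'SALES_HCC' THEN bytes END)
           AS compression_ratio
    FROM dba_segments
    WHERE segment_name IN ('SALES_RAW', 'SALES_HCC');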

Omer and I happened to have a call scheduled to discuss MapReduce yesterday evening, but wound up using most of the time to talk about Vertica’s compression and physical layout features instead. Highlights included:

Comments

14 Responses to “Oracle and Vertica on compression and other physical data layout features”

  1. Greg Rahn on October 6th, 2009 11:50 am

    “Greg, like many Vertica competitors, was wrong about Vertica requiring manual, low-level DDL.”

    @Curt

    I would disagree that I am wrong. Perhaps not 100% correct, but let me explain. With Vertica there seem to be two ways to “get” compression. The first is not to specify the type, which gives the default of “auto”, apparently either LZO or delta depending on the data type. All the other types (RLE, dictionary, etc.) need to be specified in the DDL for each column by hand, or via DDL generated by an external tool that looks at the data, the queries, or both. (See my Twitter conversation with Dave Menninger.) ParAccel also has an external tool, called Compalyze, to suggest/generate the compression type per column in the DDL.
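    For illustration, the per-column pattern described above might look like the following, using Vertica's documented CREATE PROJECTION syntax; the table, projection, and column names are invented:

        -- Illustrative sketch only; all names are invented.
        -- Explicit per-column encodings in a Vertica projection:
        CREATE PROJECTION sales_proj (
            sale_date  ENCODING RLE,  -- low-cardinality leading sort column
            store_id   ENCODING RLE,
            product_id,               -- no clause: falls back to the default ("auto")
            amount
        )
        AS SELECT sale_date, store_id, product_id, amount
           FROM sales
           ORDER BY sale_date, store_id;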

    My point was that with Exadata Hybrid Columnar Compression the user specifies only the compression level (for query low, for query high, or archive) on a table or partition, and internally the optimal encoding/compression algorithms are determined on a per-column basis, with no prior knowledge of the data or queries.
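    By way of contrast, the table-level declaration described above might look like the following, using Oracle 11.2's documented HCC syntax; the table is invented:

        -- Illustrative sketch only; the table is invented.
        -- Compression is declared once, at the table (or partition) level:
        CREATE TABLE sales (
            sale_date   DATE,
            store_id    NUMBER,
            product_id  NUMBER,
            amount      NUMBER
        )
        COMPRESS FOR QUERY HIGH;  -- other levels: QUERY LOW, ARCHIVE LOW, ARCHIVE HIGH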

    I hope that clarifies my comment.

  2. Curt Monash on October 6th, 2009 12:21 pm

    Greg,

    How do you get from Dave Menninger’s

    @GregRahn Vertica has a dozen native compression types selected automatically & adjusts on the fly based on queries.

    to your

    either LZO or delta

    ?

  3. Greg Rahn on October 6th, 2009 2:11 pm

    @Curt

    I got there via other sources of information, but I’ll defer to Dave or someone else from Vertica to clarify whether my interpretation is correct: whether the Vertica database software can indeed choose any compression type for a column, based on data or queries or both, after the physical projection has been created, and whether the “auto” mode is more sophisticated than an implicit if-this-data-type-then-that-compression-type rule applied at projection creation time.

    If you read through the FDR for Vertica’s ETL Database Load Benchmark, you can see that the projection DDL explicitly contains encoding algorithms for some of the columns (as well as segmentation). I would have to assume that this was done either by 1) a person or 2) an external tool knowing something about the data. If it were truly done automatically inside the database, then there should be no real reason to explicitly use any encoding declarations in the DDL.

  4. Curt Monash on October 6th, 2009 2:21 pm

    @Greg,

    Vertica absolutely gives a manual tuning option.

    However, as I understand from the company, it doesn’t REQUIRE manual tuning.

    Neither of those should be surprising. The devil, of course, is in the details — how much manual tuning is typically called for.

  5. Omer Trajman on October 9th, 2009 10:03 am

    Vertica’s automatic tuning does indeed recommend various encoding types, including RLE. It can run at any point: prior to, during, or after load, and again after workloads run or as data change over time.

    Manual tuning is always available and we chose to use manual tuning for the ETL benchmark.

  6. Greg Rahn on October 9th, 2009 10:48 am

    @Omer

    By “automatic tuning” you mean running the external admin tool, versus the default or “auto” choice of the compression type for a column in the DDL, correct? As I understand it, the automation is in the recommendation and the optional implementation of that recommendation (for projection choice and column encoding type). It still requires that a DBA perform those tasks, which are not necessarily difficult, but are nonetheless tasks that need to be done.

  7. Curt Monash on October 9th, 2009 4:05 pm

    @Greg,

    I happen to be on the phone w/Omer again at the moment. Short answer to your comments is that your news about Vertica seems several releases old.

  8. Greg Rahn on October 9th, 2009 6:02 pm

    @Curt

    I (and surely other readers) would be interested in the long answer. I will say that I am doubtful that my information is several releases old. IIRC it is based on version 3.0, which at this point may be several releases old, but is probably less than 6 months old.

  9. Curt Monash on October 9th, 2009 6:49 pm

    @Greg,

    Fair enough.

    On the other hand, Omer has already posted saying what Vertica does. There’s only so much you can reasonably demand of a competitor in outlining what their prior releases, by way of contrast, did not do.

  10. Greg Rahn on October 9th, 2009 7:10 pm

    @Curt

    I agree – I’m not trying to dwell on the past, so an answer on how it works in the current release (whatever number that is or will be) is acceptable.

    FWIW my info seems to agree with what Seth Grimes has in this paper:
    http://www.sybase.com/files/White_Papers/Sybase-IQ-Competitive-Assessment-070209-WP.pdf

  11. Curt Monash on October 9th, 2009 8:33 pm

    @Greg,

    Evidently, you think Vertica is lying through its teeth as to what its products do or don’t do. (This would by no means, of course, be the first time a vendor has done such a thing …)

    I have a simple suggestion, then. Vertica’s product is freely available for download. Why don’t you download it, check it out for yourself, and blog about what you discover?

  12. Curt Monash on October 9th, 2009 8:45 pm

    @Greg,

    As for Seth’s paper — I like the guy, so I won’t post anything more about it beyond my original harsh words on Twitter.

  13. Aster Data 4.0 and the evolution of “advanced analytic(s) servers” | DBMS2 -- DataBase Management System Services on October 30th, 2009 9:57 pm

    […] eschew RAID, instead doing mirroring in its own software. (Other examples of this strategy would be Vertica, Oracle Exadata/ASM, and Teradata Fallback.) Prior to nCluster 4.0, this caused a problem, in that […]

  14. Arun on November 11th, 2012 10:12 am

    I think with compression in Exadata, you can simply forget it; if you compress, then it will be slow.

    Vertica, on the other hand, uses columnar storage and, as explained above, does the job very well. You can compare the two and see that Oracle gets killed.

    Overall, for compression and query speed on the same hardware and data volume, Vertica is the fastest and Oracle the slowest in the market right now.
