October 13, 2011

Compression in Sybase ASE 15.7

Sybase recently released Adaptive Server Enterprise 15.7, which is essentially the “Make SAP happy” release. Features that were slated for a 2012 release, but which SAP wanted, were accelerated into 2011. Features that weren’t slated for 2012, but which SAP wanted, were also brought into 2011. Not coincidentally, SAP Business Suite will soon run on Sybase Adaptive Server Enterprise 15.7.

15.7 turns out to be the first release of Sybase ASE with data compression. Sybase fondly believes that it is matching DB2 and leapfrogging Oracle in compression rate with a single compression scheme, namely page-level tokenization. More precisely, SAP and Sybase seem to believe that about compression rates for actual SAP application databases, based on some degree of testing.  

While Sybase ASE is unambiguously a row store, I’d be OK with calling that “columnar compression”. However, I wouldn’t expect compression ratios as strong as, say, Vertica’s, even in scenarios where Vertica was limited to dictionary compression only.

This is the second time I’ve heard recently about token compression being done one small block or page at a time (Sybase’s options for page size are 2/4/8/16K). As I noted in connection with Teradata’s similar strategy,

One benefit versus having a more global dictionary is that, since you compress fewer items, compression tokens can each be shorter. (The length of a typical token is a lot like the log of the cardinality of the dictionary.) Another benefit is that smaller dictionaries are faster to search. The obvious offsetting drawback is that a larger and more global dictionary has the potential to compress various items that wind up being left uncompressed in this smaller-scale scheme.

I could also have added:

However, Sybase ASE does buffer data in compressed form, so it enjoys at least some benefits of in-memory compression.
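The token-length trade-off quoted above can be made concrete with a small sketch. This is only an illustration of generic dictionary (token) compression, not Sybase's or Teradata's actual implementation; the sample "pages", the fixed-width token assumption, and the helper names `token_bits` and `build_dictionary` are all hypothetical.

```python
def token_bits(cardinality: int) -> int:
    """Bits needed for a fixed-width token that addresses `cardinality` entries.

    This is ceil(log2(cardinality)), with a floor of 1 bit.
    """
    return max(1, (cardinality - 1).bit_length())


def build_dictionary(values):
    """Map each distinct value to a small integer token, in first-seen order."""
    dictionary = {}
    for value in values:
        dictionary.setdefault(value, len(dictionary))
    return dictionary


# Hypothetical column data, split into three tiny "pages" of four values each.
pages = [
    ["NY", "NY", "CA", "NY"],
    ["TX", "TX", "TX", "WA"],
    ["FL", "OR", "FL", "FL"],
]

# Page-level scheme: one small dictionary per page, so tokens stay short.
local_bits = [token_bits(len(build_dictionary(page))) for page in pages]

# Global scheme: one dictionary for the whole column, so tokens are longer,
# but a value repeated across pages is stored in the dictionary only once.
global_dictionary = build_dictionary(value for page in pages for value in page)
global_bits = token_bits(len(global_dictionary))

print("bits per token, per page:", local_bits)
print("bits per token, global:  ", global_bits)
```

In this toy data each page has only two distinct values, so a page-level token needs fewer bits than a token drawn from the six-entry global dictionary. The flip side is that each page pays for its own dictionary entries.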

Comments

5 Responses to “Compression in Sybase ASE 15.7”

  1. Vlad Rodionov on October 19th, 2011 3:22 pm

    One more drawback of local block DC:

    This scheme does not work well (or does not work at all) when data tokens are quite large and data cardinality is high. Examples: MD5-hashed values of sensitive data (such as SSNs or phone numbers), various UUIDs (which are at least 16 bytes long and often at least 32 bytes), device/hardware IDs, etc.
    From my own experience, having such a column w/o compression in a table can ruin query performance. Global DC solves this problem. Btw, is there any vendor who supports global DC?

  2. Curt Monash on October 19th, 2011 8:45 pm

    Vlad,

    How does one do dictionary compression on phone or SS numbers?

  3. Vlad Rodionov on October 19th, 2011 11:51 pm

    OK. When people anonymize data, they usually go the easy way and use collision-resistant hashing (16 – 32 bytes). Say there are 5 million telecom customers with 10-digit phone numbers, which become strings of 16 bytes (minimum) after anonymization. One needs fewer than 3 bytes to represent all customers in a database. That is a 16/3 compression ratio. Local block DC won’t work in this case unless you sort all CDRs by scrambled phone number. It won’t work because the size of a block’s dictionary will be comparable to the size of the uncompressed block itself. Global DC will work in this case.
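The arithmetic in the comment above can be checked with a short sketch. The customer count and hash width come from the comment; the 16K page size is the largest option mentioned in the post. The assumptions that hashed values land on pages uniformly at random and that a page holds nothing but hashed values are simplifications added here for illustration only.

```python
CUSTOMERS = 5_000_000    # distinct phone numbers (from the comment)
HASH_BYTES = 16          # minimum hashed-value width (from the comment)
PAGE_BYTES = 16 * 1024   # largest page size mentioned in the post

# Global dictionary: a token only has to distinguish 5 million values.
token_bytes = ((CUSTOMERS - 1).bit_length() + 7) // 8
global_ratio = HASH_BYTES / token_bytes

# Local (per-page) dictionary, ASSUMING unsorted rows, i.e. each value on a
# page is an independent uniform draw from the 5 million hashes.
values_per_page = PAGE_BYTES // HASH_BYTES
expected_distinct = CUSTOMERS * (1 - (1 - 1 / CUSTOMERS) ** values_per_page)
local_dictionary_bytes = expected_distinct * HASH_BYTES

print(f"token width: {token_bytes} bytes, global ratio: {global_ratio:.2f}")
print(f"expected distinct values per page: {expected_distinct:.1f} "
      f"of {values_per_page}")
print(f"page dictionary vs page size: {local_dictionary_bytes / PAGE_BYTES:.1%}")
```

Under these assumptions nearly every value on a page is distinct, so the page's dictionary is almost as large as the page itself, which is the point the comment makes about unsorted data.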

  4. Vlad Rodionov on October 19th, 2011 11:58 pm

    Back to your original question. I do not know how other vendors do this, which is why I asked whether you know of any (clustered) DB that supports global DC. For our internal data warehouse we use a Hadoop/HBase/RDBMS pipeline for ETL, and one of the stages of the pipeline is global dictionary compression of dimensional data (we call this process dimension normalization), but it’s a custom-built DB loader.

  5. Curt Monash on October 20th, 2011 1:33 am

    Vlad,

    I was being stupid. I was forgetting that the same phone number is used in a whole lot of different rows …
