Omer Trajman of Vertica put up a must-read blog post spelling out detailed compression numbers, based on actual field experience (which I’d guess is from a combination of production systems and POCs):
- CDR – 8:1 (87%)
- Consumer Data – 30:1 (96%)
- Marketing Analytics – 20:1 (95%)
- Network logging – 60:1 (98%)
- Switch Level SNMP – 20:1 (95%)
- Trade and Quote Exchange – 5:1 (80%)
- Trade Execution Auditing Trails – 10:1 (90%)
- Weblog and Click-stream – 10:1 (90%)
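For anyone who wants to sanity-check how the ratios and percentages above relate, here is a minimal sketch. It assumes the percentage is simply the space saved, 1 - 1/ratio, and that the figures are truncated rather than rounded (8:1 works out to 87.5%, listed as 87%). The function name `percent_saved` is my own, not anything from Omer's post.

```python
def percent_saved(ratio: int) -> int:
    """Space saved by an N:1 compression ratio, as a whole percent.

    Uses integer arithmetic and truncates, which is an assumption
    about how the listed percentages were derived.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 100 * (ratio - 1) // ratio


# Ratios taken from the list above.
ratios = {
    "CDR": 8,
    "Consumer Data": 30,
    "Marketing Analytics": 20,
    "Network logging": 60,
    "Switch Level SNMP": 20,
    "Trade and Quote Exchange": 5,
    "Trade Execution Auditing Trails": 10,
    "Weblog and Click-stream": 10,
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}:1 ({percent_saved(ratio)}%)")
```

Note how quickly the percentage saturates: going from 20:1 to 60:1 triples the ratio but moves the percentage only from 95% to 98%, which is why the ratio is usually the more informative number.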
It’s clear what Omer means by most of those categories from reading the post, but I’m a little fuzzy on what “Consumer Data” or “Marketing Analytics” comprise in his taxonomy. Anyhow, Omer’s post is a huge improvement over my recent one, which was based on a conversation with Omer and featured far less accurate and complete compression numbers.
Omer goes on to claim that trickle-feed data is harder for rival systems to compress than it is for Vertica, and more generally that Vertica’s compression is typically severalfold better than that of competing row-based systems.