Vertica finally spells out its compression claims
Omer Trajman of Vertica put up a must-read blog post spelling out detailed compression numbers, based on actual field experience (which I’d guess is from a combination of production systems and POCs):
- CDR - 8:1 (87%)
- Consumer Data - 30:1 (96%)
- Marketing Analytics - 20:1 (95%)
- Network logging - 60:1 (98%)
- Switch Level SNMP - 20:1 (95%)
- Trade and Quote Exchange - 5:1 (80%)
- Trade Execution Auditing Trails - 10:1 (90%)
- Weblog and Click-stream - 10:1 (90%)
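For anyone who wants to check how the ratios map to the percentages, here is a minimal sketch of the arithmetic. It assumes the percentage is the space saved, (N - 1) / N for an N:1 ratio, and that the post rounds down, which is what the listed figures appear to do (8:1 is 87.5%, shown as 87%). The function name is my own and does not come from Omer's post.

```python
import math

# Ratios (N:1) as listed above. Names and values are copied from the list;
# the helper below is illustrative only, not anything Vertica publishes.
RATIOS = {
    "CDR": 8,
    "Consumer Data": 30,
    "Marketing Analytics": 20,
    "Network logging": 60,
    "Switch Level SNMP": 20,
    "Trade and Quote Exchange": 5,
    "Trade Execution Auditing Trails": 10,
    "Weblog and Click-stream": 10,
}


def space_savings_percent(ratio):
    """Return the percent of space saved at ratio:1 compression."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 100 * (ratio - 1) / ratio


for name, ratio in RATIOS.items():
    pct = space_savings_percent(ratio)
    # Assumption: the listed percentages are rounded down to a whole number.
    print(f"{name}: {ratio}:1 -> {math.floor(pct)}% (unrounded {pct:.1f}%)")
```

Note how flat the percentage scale gets at the high end: going from 20:1 to 60:1 triples the ratio but moves the savings figure only from 95% to 98%, so the ratios are the more informative numbers to compare.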
It’s clear what Omer means by most of those categories from reading the post, but I’m a little fuzzy on what “Consumer Data” or “Marketing Analytics” comprise in his taxonomy. Anyhow, Omer’s post is a huge improvement over my recent one, which was based on a conversation with Omer and featured some far less accurate or complete compression numbers.
Omer goes on to claim that trickle-feed data is harder for rival systems to compress than it is for Vertica, and generally to claim that Vertica’s compression is typically severalfold better than that of competitive row-based systems.
Comments
3 Responses to “Vertica finally spells out its compression claims”
[...] whose details I forget for now — they could only do 2.5X. Edit: Vertica has now posted much more accurate versions of those numbers. Infobright’s 30X compression reference at TradeDoubler seems to be for a clickstream-type [...]
[...] Each of the customers cited below received “half” an Oracle Database Machine. As I previously noted, an Oracle Database Machine holds either 14.0 or 46.2 terabytes of uncompressed data. This suggests the 220 TB customer listed below — LGR Telecommunications — got compression of a little under 10:1 for a CDR (Call Detail Record) database. By comparison, Vertica claims 8:1 compression on CDRs. [...]
[...] work underway that’s getting 20X compression in call detail records, versus the 8X that Vertica claims. I’ll post more about that [...]