May 14, 2009

Facebook’s experiences with compression

One little topic didn’t make it into my long post on Facebook’s Hadoop/Hive-based data warehouse: Compression. The story seems to be:

Comments

  1. Ivan Novick on May 14th, 2009 1:12 pm

    Also be careful about the memory usage of bzip2, which may be prohibitively high on large data sets.

    With gzip there are 9 different compression levels; level 6 seems to give the best balance between CPU cost and data size. 6X compression seems about right for web log data.

  2. Ashish Thusoo on May 18th, 2009 12:53 am

    Ivan,

    Memory usage is definitely an issue, and it typically means we would not be able to run as many map/reduce slots in the cluster.

    We are targeting this mostly for archival at this point, and there the latency requirements for compressing or decompressing the data are not that high yet.
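
To make the gzip-level tradeoff discussed in the comments concrete, here is a minimal Python sketch (not from the original post) that compresses a few megabytes of synthetic web-log text at each gzip level and with bzip2, printing compressed size, ratio, and elapsed time. The sample log line and data size are illustrative assumptions; real ratios such as the 6X figure depend on the actual log content.

```python
import bz2
import gzip
import time

# Hypothetical sample: repeat a fake web-log line to get a few megabytes of
# compressible input, standing in for real log files (an assumption here).
sample_line = b'127.0.0.1 - - [14/May/2009:13:12:00 -0700] "GET /index.php HTTP/1.1" 200 5123\n'
data = sample_line * 50_000

def report(label, compressed, elapsed):
    # Print compressed size, compression ratio versus the original, and time taken.
    ratio = len(data) / len(compressed)
    print(f"{label:12s} {len(compressed):>10,d} bytes  {ratio:5.1f}x  {elapsed:6.3f}s")

print(f"original     {len(data):>10,d} bytes")

# gzip at each of its 9 levels; level 6 is the commonly cited sweet spot.
for level in range(1, 10):
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=level)
    report(f"gzip -{level}", compressed, time.perf_counter() - start)

# bzip2 typically compresses tighter but is slower and uses more memory,
# which matters when many map/reduce tasks run on each node.
start = time.perf_counter()
compressed = bz2.compress(data, 9)
report("bzip2 -9", compressed, time.perf_counter() - start)
```

On repetitive log-like input, gzip levels 7 through 9 usually shrink the output only slightly while costing noticeably more CPU, which is consistent with level 6 being the usual balance point; bzip2's tighter compression comes with the memory and speed costs noted above.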
