March 24, 2007

Mike Stonebraker explains column-store data compression

The following is by Mike Stonebraker, CTO of Vertica Systems, copyright 2007, as part of our ongoing discussion of data compression. My comments are in a separate post.

Row Store Compression versus Column Store Compression

I Introduction

This short note discusses three aspects of space requirements, namely:

structural space requirements

index space requirements

attribute space requirements.

II Structural Space Requirements

In this section we discuss slotted pages, headers, round-up, and empty space.

All row stores that we are familiar with use a slotted page structure. Hence, records are identified by a (page_number, slot_number) pair. At a known location on the page there is a slot table, which in turn points to the actual record. The advantage of a slotted page structure is that a record can be moved on the page (say, when an update makes it wider) without changing the record id or moving the record to a different page. However, a slotted page architecture costs a level of indirection and the resulting extra space.
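The indirection just described can be sketched in a few lines. This is an illustrative toy, not any particular vendor's page format; the class, field names, and append-only free-space handling are all assumptions made for the sketch:

```python
# Toy slotted page: record ids are (page_number, slot_number), and the
# slot table adds a level of indirection between slot number and the
# record's byte offset, so a record can move on the page without its
# id changing. Names and layout are illustrative only.
class SlottedPage:
    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.slots = []    # slot_number -> byte offset of the record
        self.free = 0      # next free byte (simplified: no space reuse)

    def insert(self, record: bytes) -> int:
        offset = self.free
        self.data[offset:offset + len(record)] = record
        self.free += len(record)
        self.slots.append(offset)     # the extra indirection costs space
        return len(self.slots) - 1    # slot_number part of the record id

    def move(self, slot_number: int, record: bytes) -> None:
        # Rewrite a record elsewhere on the page (e.g. it got wider);
        # its id is unchanged because only the slot table entry moves.
        offset = self.free
        self.data[offset:offset + len(record)] = record
        self.free += len(record)
        self.slots[slot_number] = offset

    def read(self, slot_number: int, length: int) -> bytes:
        offset = self.slots[slot_number]   # one extra hop per access
        return bytes(self.data[offset:offset + length])
```

A column store that stores each attribute densely, in block order, needs neither the slot table nor the per-record hop.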

Vertica does not use a slotted page structure, and hence does not pay this cost.

In addition, every row store we are familiar with has a “header” on the front of each record. Typical items in this header include:

An immutable tuple identifier (to identify a record if it moves to a different page)

A bit map indicating the non-null fields in the record (so nulls do not have to be stored)

A pointer to the first byte of each variable length field (variable length records are exactly that – variable length)

Although the length of this header depends on the number of variable length fields, typical values are 16-32 bytes. To discover what this overhead is for your favorite DBMS, try the following test.

Generate 1 million numbers from the Fibonacci sequence, and store them as 4 byte integers in a one-column table. Without compression, the data occupies 4 Mbytes. Any amount over this value is structural overhead. For example, a DBMS with a 16 byte header would occupy 20 Mbytes.
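The arithmetic behind the test can be sketched as follows. One caveat not in the note: Fibonacci values overflow 4 bytes after the 47th term, so the sketch wraps them modulo 2^32 purely to keep every value 4 bytes wide.

```python
# Overhead test sketch: 1 million 4-byte integers should occupy
# 4 Mbytes; anything beyond that is structural overhead.
N = 1_000_000

fibs = []
a, b = 0, 1
for _ in range(N):
    fibs.append(a)
    a, b = b, (a + b) % 2**32   # wrap so every value fits in 4 bytes

raw_bytes = 4 * N                          # pure attribute data: 4 Mbytes
with_16_byte_header = raw_bytes + 16 * N   # 20 Mbytes, as in the example
```

Loading the same million values into a real DBMS and comparing the on-disk table size against `raw_bytes` gives the structural overhead directly.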

Vertica has no header and no structural overhead. Moreover, since it automatically applies data compression, the storage cost of this table is way under 4 Mbytes.

Most row stores we are familiar with pad entries to byte or word boundaries. Hence, there is empty space left inside a tuple. The reason for this decision is that the CPU cost of shifting unaligned data in main memory during processing was considered to outweigh the cost of the wasted space in the tuple.

Obviously a column store does not have tuples and does not pay this cost.

Lastly, there is inevitably empty space left on a storage block (typically 4K in current systems). This results from many causes, including:

Failure to reuse the space for deleted records

Failure to reuse space from records which are updated to become shorter

The inability to pack the page completely full

The roughly 1/3 of each page that is empty in B-tree structures

In contrast, Vertica packs every 64K block full.

In summary, there is substantial structural overhead in a row store which need not be paid by a column store. Although a row store can be changed to lower the amount of structural overhead, this requires extensive modifications, and will not be accomplished by merely compressing the attribute data.

III Indexing

Most major row stores support B-tree indexing, which can be either clustered (tuples are approximately in index order) or unclustered (tuples are not approximately in index order). Indexes are dense, i.e. there is one index entry for each record. The typical implementation stores (key_value, record_id) pairs in an auxiliary structure. A sophisticated implementation (which is not the norm at the present time) would “delta compress” the key values. Because the keys are in sorted order, each key_value shares a prefix with the previous key_value, and that prefix does not need to be stored. However, the record_id is typically 8 bytes and is not easily compressed. Moreover, it is difficult to pack index pages tightly because of insertions and deletions.
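The delta (prefix) compression of sorted keys mentioned above can be sketched like this. Encoding each key as a (shared_prefix_length, suffix) pair is one common choice, not any specific vendor's format:

```python
# Sketch of prefix ("delta") compression of sorted keys: each entry
# stores only how many leading characters it shares with the previous
# key, plus the remaining suffix. The pair encoding is illustrative.
def prefix_compress(keys):
    entries, prev = [], ""
    for k in keys:
        p = 0
        while p < min(len(prev), len(k)) and prev[p] == k[p]:
            p += 1
        entries.append((p, k[p:]))   # (shared_prefix_length, suffix)
        prev = k
    return entries

def prefix_decompress(entries):
    keys, prev = [], ""
    for p, suffix in entries:
        k = prev[:p] + suffix
        keys.append(k)
        prev = k
    return keys
```

For the sorted keys ["data", "database", "datum", "dog"], the second entry becomes (4, "base"): the shared prefix "data" is never stored again. Note that nothing comparable helps the fixed-width record_ids.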

Hence, the cost of indexing is significant. Moreover, data compression (to be discussed next) does nothing to help the space requirements of indexes.

In contrast Vertica stores materialized views, not indexes. Every materialized view is in some sorted order and is indexed. However, the index is sparse (one index entry per 64K block). Moreover, record_ids are never stored, because of the Vertica architecture. Hence, indexing overhead is dramatically reduced, compared to a row store.
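A sparse index of the kind described, one entry per storage block rather than one per record, might look like the following sketch. The block contents and sizes are invented for illustration; this is the general technique, not Vertica's actual structure:

```python
import bisect

# Toy sparse index: the data is sorted and stored in blocks; the index
# holds one entry per block (the block's first key), not one per record.
blocks = [
    [10, 12, 17, 23],
    [31, 40, 44, 58],
    [60, 71, 85, 90],
]
sparse_index = [b[0] for b in blocks]   # one entry per block

def lookup(key):
    # Binary-search the per-block entries, then scan a single block.
    i = bisect.bisect_right(sparse_index, key) - 1
    return i >= 0 and key in blocks[i]
```

With 64K-record blocks, the index is tens of thousands of times smaller than a dense index over the same data, and no record_ids appear anywhere.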

Of course, there are other kinds of indexes (hashing, bit maps, etc.); we have used B-trees just as an example, and one would have to compare the specifics of each vendor’s implementation. The point is that attribute compression does not necessarily do anything for indexes.

IV Data Compression

A column store tends to be more effective at data compression than a row store. The standard wisdom in most row stores is to use block compression. Hence, a storage block is compressed using a single technique (say Lempel-Ziv or dictionary). The technique chosen then compresses all the attributes in all the columns which occur on the block. In contrast, Vertica compresses a storage block that only contains one attribute. Hence, it can use a different compression scheme for each attribute. Obviously a compression scheme that is type-specific will beat an implementation that is “one size fits all”.
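To make the “type-specific beats one size fits all” point concrete, here is a toy comparison on a sorted, low-cardinality column: run-length encoding collapses it to a handful of (value, run_length) pairs, while a generic byte-level compressor such as zlib still emits a full compressed stream. The column data is invented for illustration:

```python
import zlib

# A sorted, low-cardinality column (e.g. a state code); invented data.
column = ["CA"] * 40000 + ["NY"] * 35000 + ["TX"] * 25000

def rle(values):
    # Type-specific scheme: collapse each run into [value, run_length].
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

runs = rle(column)                                  # just 3 runs
generic = zlib.compress("".join(column).encode())   # "one size fits all"
```

The run-length representation is three (value, count) pairs, while the generic compressor, effective as it is on repetitive input, still produces a stream hundreds of times larger. A block that mixes 50 different attributes gives the generic scheme even less regularity to exploit.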

It is possible for a row store to use a type-specific compression scheme. However, if there are 50 attributes in a record, then it must remember the state for 50 type-specific implementations, and complexity increases significantly.

In addition, all row stores we are familiar with decompress each storage block on access, so that the query executor processes uncompressed tuples. In contrast, the Vertica executor processes compressed tuples. This results in better L2 cache locality, less main memory copying and generally much better performance.
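A flavor of what executing on compressed data buys can be shown with a run-length-encoded column: an aggregate touches one entry per run instead of one per row, and nothing is ever decompressed. This illustrates the general technique only, not Vertica's actual executor; the data is invented:

```python
# (value, run_length) pairs for a run-length-encoded column of
# 100,000 rows; invented data for illustration.
runs = [(100, 40000), (250, 35000), (975, 25000)]

def sum_rle(runs):
    # Three multiply-adds instead of 100,000 additions.
    return sum(value * length for value, length in runs)

def count_where_rle(runs, predicate):
    # Evaluate the predicate once per run, not once per row.
    return sum(length for value, length in runs if predicate(value))

total = sum_rle(runs)                                # 37,125,000
matches = count_where_rle(runs, lambda v: v > 200)   # 60,000
```

Because the operator's working set is three pairs rather than 100,000 values, it also fits comfortably in the L2 cache, which is the locality benefit mentioned above.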

Of course, any row store implementation can rewrite their executor to run on compressed data. However, this is a rewrite – and a lot of work.

V Summary

The aim of compression is superior overall performance. The best solution is a “from the ground up” architecture that simultaneously deals with:

1) Minimizing structural overhead

2) Minimizing index overhead

3) Sophisticated data compression

4) An executor that runs on compressed data

A row store that performs a “one size fits all” version of 3) is only solving part of the problem, and will be dramatically beaten by a system such as Vertica that does all four. For example, in a head-to-head, apples-to-apples comparison (both systems are storing exactly the same objects) against one of the very popular row stores with their “one size fits all” compression scheme turned on, Vertica required less than ½ of the space of the other system. It also ran more than an order of magnitude faster.


7 Responses to “Mike Stonebraker explains column-store data compression”

  1. DBMS2 — DataBase Management System Services » Blog Archive » Mike Stonebraker on data compression — comments on March 24th, 2007 1:19 am

    [...] In my opinion, the key part of Mike Stonebraker’s fascinating note on data compression was (emphasis mine): [...]

  2. DBMS2 — DataBase Management System Services » Blog Archive » Will database compression change the hardware game? on March 24th, 2007 4:43 am

    [...] I’ve recently made a lot of posts about database compression. 3X or more compression is rapidly becoming standard; 5X+ is coming soon as processor power increases; 10X or more is not unrealistic. True, this applies mainly to data warehouses, but that’s where the big database growth is happening. And new kinds of data — geospatial, telemetry, document, video, whatever — are highly compressible as well. [...]

  3. Mark on March 25th, 2007 7:06 am

    Teradata is a row store, yet it compresses by column domain (dictionary compression) and does not do block decompression on access. Access to the field is more efficient with compression because the CPU savings for IO is greater than the indirection to the value dictionary. There is a slight CPU overhead on the update side to search the value dictionary for the index to put in the row header, but this is minor in comparison to the overall win in space savings and scan rate.

  4. Database compression coming to the fore | DBMS2 -- DataBase Management System Services on August 8th, 2008 12:57 am

    [...] compression, which can be a major part of their value proposition. Most notable, perhaps, is a short paper Mike Stonebraker wrote for this blog — before he and his fellow researchers started their own blog — on column-stores’ [...]

  5. Infology.Ru » Blog Archive » Data compression in DBMS comes to the fore on October 5th, 2008 5:08 pm

    [...] to customers. Most notable, probably, is the short article that Mike Stonebraker …, about those advantages in the area of compression that [...]

  6. Huh? on April 12th, 2011 10:31 am

    You talk about efficient storage, but it sounds like a recipe for disaster if you have to recover from a failure. Of course, things like haywire hardware never happen, and things are always properly written to and read from disk… yikes. Storage is a commodity. Use it to build in fault tolerance.

  7. Characteristics of column databases | Alex's personal blog on November 7th, 2011 9:32 am
