March 24, 2007

Will database compression change the hardware game?

I’ve recently made a lot of posts about database compression. 3X or more compression is rapidly becoming standard; 5X+ is coming soon as processor power increases; 10X or more is not unrealistic. True, this applies mainly to data warehouses, but that’s where the big database growth is happening. And new kinds of data — geospatial, telemetry, document, video, whatever — are highly compressible as well.

This trend suggests a few interesting possibilities for hardware, semiconductors, and storage.

  1. The growth in demand for storage might actually slow. That said, I frankly think it’s more likely that Parkinson’s Law of Data will continue to hold: Data expands to fill the space available. E.g., video and other media have near-infinite potential to consume storage; it’s just a question of resolution and fidelity.
  2. Solid-state (aka semiconductor or flash) persistent storage might become practical sooner than we think. If you really can fit a terabyte of data onto 100 gigs of flash, that’s a pretty affordable alternative. And by the way — if that happens, a lot of what I’ve been saying about random vs. sequential reads might be irrelevant.
  3. Similarly, memory-centric data management is more affordable when compression is aggressive. That’s a key point of schemes such as SAP’s or QlikTech’s. Who needs flash? Just put it in RAM, persisting it to disk only for backup.
  4. There’s a use for faster processors. Compression isn’t free. What you save on disk space and I/O you pay for at the CPU level. Those 5X+ compression levels do depend on faster processors, at least for the row store vendors. (A back-of-envelope sketch of the tradeoff follows this list.)
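
Here is that back-of-envelope sketch, in Python. Every number in it (the compression ratio, the single-disk scan rate, the per-core decompression throughput) is an illustrative assumption, not a measurement of any particular product.

```python
# Back-of-envelope arithmetic for scanning a compressed vs. uncompressed table.
# All inputs are illustrative assumptions, not measurements.

raw_table_gb = 1000.0        # a 1 TB fact table (assumption)
compression_ratio = 10.0     # the "terabyte onto 100 gigs" scenario (assumption)
disk_scan_mb_s = 80.0        # sequential read rate of one spindle (assumption)
decompress_mb_s = 400.0      # decompression throughput of one core (assumption)

compressed_gb = raw_table_gb / compression_ratio

# I/O time to pull the table off one disk, uncompressed vs. compressed.
scan_uncompressed_min = raw_table_gb * 1024 / disk_scan_mb_s / 60
scan_compressed_min = compressed_gb * 1024 / disk_scan_mb_s / 60

# CPU time (one core) to expand the compressed pages back into raw bytes.
decompress_min = raw_table_gb * 1024 / decompress_mb_s / 60

print(f"{raw_table_gb:.0f} GB raw -> {compressed_gb:.0f} GB on disk")
print(f"full scan, uncompressed: {scan_uncompressed_min:.0f} min of I/O")
print(f"full scan, compressed:   {scan_compressed_min:.0f} min of I/O "
      f"+ {decompress_min:.0f} min of CPU on one core")
```

With those made-up numbers the I/O saved dwarfs the CPU spent, and the CPU side is exactly the part that keeps getting cheaper, which is the point of item 4.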

Comments

6 Responses to “Will database compression change the hardware game?”

  1. David Kanter on March 24th, 2007 8:38 pm

    Hi Curt,

    Just a quick comment regarding flash.

    Flash doesn’t actually support the bandwidth required for data warehousing. While flash can support orders of magnitude more IOPS than disk, the bandwidth of each I/O is fairly small. The bandwidth provided is basically (size of each read) * (number of IOPS). In comparison, disks provide few IOPS, but under the right circumstances the read bandwidth becomes very large.
    Most flash devices are not going to get anywhere close to the 70-100 MB/s that a disk can achieve (reading data that has high spatial locality). This means flash is largely unsuitable for the underlying data in a warehouse.

    Flash makes sense for OLTP (actually Texas Memory Systems has devices targeted specifically at OLTP). The only issue there is that frequent writes will eventually wear down the flash…if you’re sustaining thousands of writes/sec, it may prove to be an unwise choice.

    DK
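
To make the bandwidth arithmetic in David’s comment concrete, here is a small sketch. The IOPS figures, request size, and disk rates are illustrative guesses in the same ballpark as the numbers quoted in the thread, not specs for any particular device.

```python
# Effective read bandwidth ~= (size of each read) * (number of IOPS).
# All figures are illustrative, not vendor specifications.

def effective_mb_s(read_size_kb: float, iops: float) -> float:
    """Sustained MB/s when each request reads read_size_kb kilobytes
    and the device sustains iops requests per second."""
    return read_size_kb * iops / 1024

flash_random = effective_mb_s(read_size_kb=4, iops=15_000)     # ~59 MB/s
disk_random = effective_mb_s(read_size_kb=4, iops=150)         # ~0.6 MB/s
disk_sequential = 80.0  # MB/s streaming a large scan, limited by the platters

print(f"flash, 4 KB random reads: {flash_random:.1f} MB/s")
print(f"disk,  4 KB random reads: {disk_random:.1f} MB/s")
print(f"disk,  sequential scan:   {disk_sequential:.1f} MB/s")
```

Flash wins by a couple of orders of magnitude on random reads, but a big sequential warehouse scan keeps a disk near its streaming rate, which is why the IOPS advantage buys little for the underlying warehouse data.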

  2. Curt Monash on March 25th, 2007 10:03 am

    Thanks, David!

    But is that a locked-in aspect of the technology, or can we expect it to improve over time?

  3. David Kanter on March 25th, 2007 6:58 pm

    So, I believe the throughput/bandwidth limitation of solid-state memory is here to stay for a while. Just to satisfy our mathematical curiosity, here are some stats on flash drives:
    http://www.bitmicro.com/products_storage_devices.php

    Most of them look like they provide up to 70 MB/s – half the bandwidth of a 15K RPM drive – but ridiculous numbers of IOPS. Most disks top out at around 150 IOPS; these provide up to 10x that.

    The other side of the equation is the tolerance for writes. While flash memory will never be up to par for heavy write workloads, it’s possible that future types of solid-state memory could tolerate 10x more writes. That might be sufficient for many transactional workloads.

    Also, it’s probably good to keep in mind that the MTTF numbers that disk vendors provide are likely not very accurate. I think it would be interesting to compare the actual time to failure for a set of solid-state and traditional disk drives.

    DK
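
And here is a rough sketch of the write-endurance arithmetic behind that caution. The cycle count, write-amplification factor, and write rate below are assumed round numbers for illustration, not figures for any real device.

```python
# Rough flash-lifetime arithmetic under a steady transactional write load.
# All inputs are assumed round numbers, not figures for a real device.

capacity_gb = 100.0          # device capacity
erase_cycles = 10_000        # program/erase cycles per cell (assumption)
write_amplification = 5.0    # internal writes per logical write (assumption)
write_rate_mb_s = 8.0        # e.g. 2,000 transactions/sec * 4 KB each

logical_writable_gb = capacity_gb * erase_cycles / write_amplification
lifetime_s = logical_writable_gb * 1024 / write_rate_mb_s
lifetime_years = lifetime_s / (3600 * 24 * 365)

print(f"logical bytes writable before wear-out: {logical_writable_gb / 1024:.0f} TB")
print(f"at {write_rate_mb_s:.0f} MB/s of writes: about {lifetime_years:.1f} years")
```

Under those assumptions the device wears out in well under a year of sustained heavy writes; with more durable cells, better wear leveling, or a lighter write rate the answer moves by an order of magnitude or more, which is why the verdict depends so heavily on the workload.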

  4. Jay Jakosky on March 26th, 2007 4:05 am

    I have used QlikView for almost 6 years now and the compression is unbelievable. I routinely get over 20x compression and average 1-2 bytes per field value.
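
For readers wondering how 1-2 bytes per field value is even possible, here is a toy sketch of dictionary (token) encoding, the general idea usually credited for that kind of ratio: each distinct value is stored once, and each occurrence becomes a small integer code. This is an illustration of the principle, not QlikTech’s actual implementation.

```python
# Toy dictionary encoding: store each distinct value once, then represent
# every occurrence of it as a small integer index into that dictionary.
import math

column = ["Ohio", "Texas", "Ohio", "Ohio", "Utah", "Texas"] * 1_000_000

dictionary = sorted(set(column))                    # distinct values, stored once
code_of = {value: i for i, value in enumerate(dictionary)}
codes = [code_of[value] for value in column]        # one small integer per row

bits_per_code = max(1, math.ceil(math.log2(len(dictionary))))
raw_bytes = sum(len(value) for value in column)     # naive text storage
encoded_bytes = len(codes) * bits_per_code / 8 + sum(len(v) for v in dictionary)

print(f"{len(dictionary)} distinct values -> {bits_per_code} bits per occurrence")
print(f"raw: {raw_bytes / 1e6:.1f} MB, encoded: {encoded_bytes / 1e6:.1f} MB "
      f"(about {raw_bytes / encoded_bytes:.0f}x smaller)")
```

With a low-cardinality column the per-occurrence cost is a couple of bits rather than bytes; layering run-length and similar encodings on top is how 20x-plus ratios show up on real data.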

  5. On Seaside and State - Cataga on March 26th, 2007 10:25 pm

    […] While I love this work I always seem to come away with a couple of nagging issues. One being the amount of memory required to store state on the server for many requests, which over time will become less of an issue. […]

  6. Zizaco on August 29th, 2007 9:23 am

    That increase will be very good for games…
    Maybe late, but it will be.
