I’ve recently made a lot of posts about database compression. 3X or more compression is rapidly becoming standard; 5X+ is coming soon as processor power increases; 10X or more is not unrealistic. True, this applies mainly to data warehouses, but that’s where the big database growth is happening. And new kinds of data — geospatial, telemetry, document, video, whatever — are highly compressible as well.
This trend suggests a few interesting possibilities for hardware, semiconductors, and storage.
- The growth in demand for storage might actually slow. That said, I frankly think it’s more likely that Parkinson’s Law of Data will continue to hold: Data expands to fill the space available. E.g., video and other media have near-infinite potential to consume storage; it’s just a question of resolution and fidelity.
- Solid-state (aka semiconductor or flash) persistent storage might become practical sooner than we think. If you really can fit a terabyte of data onto 100 gigs of flash, that’s a pretty affordable alternative. And by the way — if that happens, a lot of what I’ve been saying about random vs. sequential reads might be irrelevant.
- Similarly, memory-centric data management is more affordable when compression is aggressive. That’s a key point of schemes such as SAP’s or QlikTech’s. Who needs flash? Just put the data in RAM, persisting to disk only for backup and recovery.
- There’s a use for faster processors. Compression isn’t free: what you save in disk space and I/O, you pay for in CPU cycles. Those 5X+ compression levels do depend on faster processors, at least for the row-store vendors. (A rough sketch of the trade-off follows this list.)
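To make that last trade-off concrete, here’s a minimal, hypothetical sketch in Python. It uses general-purpose zlib on fabricated, repetitive rows rather than the columnar, dictionary, or run-length encodings real analytic DBMSs use, so the absolute numbers don’t mean much; the point is simply that squeezing out a higher compression ratio costs more CPU time, which is where faster processors come in.

```python
import time
import zlib

# Toy illustration of the CPU-vs-storage trade-off discussed above.
# NOTE: real analytic DBMSs use columnar/dictionary/run-length encoding,
# not general-purpose zlib; this only shows that chasing a higher ratio
# burns more CPU.

# Fabricated "warehouse-like" rows: low-cardinality, repetitive columns
# compress well, which is why 5X-10X ratios are plausible on real data.
row = b"2009-01-15|STORE_0042|SKU_000137|qty=3|price=19.99\n"
data = row * 200_000  # roughly 10 MB of raw input

for level in (1, 6, 9):  # zlib levels: fast ... thorough
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    print(f"level {level}: {ratio:6.1f}X compression, "
          f"{elapsed * 1000:6.1f} ms of CPU for {len(data) / 1e6:.0f} MB")
```

On data this repetitive, even the fastest setting compresses dramatically; on real mixed data the ratios are lower and the CPU cost per byte saved is correspondingly more noticeable.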