Clearpace is a UK-based startup in a similar market to what SAND Technology has gotten into – DBMS archiving, with a strong focus on compression and general cost-effectiveness. Clearpace launched its product NParchive a couple of quarters ago, and says it now has 25 people and $1 million or so in revenue. Clearpace NParchive technical highlights include:
- NParchive takes a multi-version concurrency control approach. Data is never updated in place; new information is just appended. Clearpace is careful to “time-proof” the data, keeping track of, and allowing the unwinding of, for example, changes in schema or table structure.
- Data is stored in very large blocks – the default is 1 million rows. Currently any change to actual data values – as opposed to just database design changes – requires rewriting a whole block, but a redo log is on the roadmap.
- NParchive has four different approaches to compression, which can be used in series. Clearpace says that if any two of the four work well on a particular data set, 20X compression is realistic. If all four work well, 50-100X can be achieved. Presumably, not all have to be turned on for any particular database.
- Three of NParchive’s approaches to compression are pretty standard – tokenization, a “collection of cheap, standard compression algorithms” (including delta, which often works well), and ZLIB.
- The fourth part of the NParchive compression story has something to do with representing records as trees, and noticing when patterns are repeated and deduping them. I’m still fuzzy on how that all works. (Edit: I subsequently posted an explanation of that part.)
- Clearpace believes NParchive’s query performance is competitive with Oracle’s but not, say, Netezza’s. (And yes, that’s a meaningful assertion, even if you believe that all Oracle performance problems are solely due to poor implementation practices.)
- Clearpace says that no database administration is ever needed. Everything happens automagically – or as they say nowadays, “autonomically.”
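For readers who want the "compression in series" idea made concrete, here is a minimal Python sketch of three standard stages chained together: tokenization (dictionary encoding), delta encoding, and a general-purpose byte compressor. To be clear, this is not Clearpace's code, and I have no idea how NParchive actually implements any of it. The function names are hypothetical, the fixed 8-byte integer layout is an assumption made purely for illustration, and Python's `zlib` module is standing in for whatever byte-level compressor the product really uses.

```python
import struct
import zlib


def tokenize(values):
    """Dictionary-encode a column: each distinct value gets a small integer token.

    Returns (dictionary, tokens), where dictionary[token] recovers the value.
    """
    token_of = {}
    tokens = []
    for v in values:
        # setdefault assigns the next free token only on first sight of v
        tokens.append(token_of.setdefault(v, len(token_of)))
    # dicts preserve insertion order, so this list is indexed by token
    return list(token_of), tokens


def delta_encode(numbers):
    """Keep the first value, then store successive differences."""
    if not numbers:
        return []
    return [numbers[0]] + [cur - prev for prev, cur in zip(numbers, numbers[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def compress_ints(ints, use_delta=True):
    """Optional delta stage, then pack as 8-byte signed ints, then zlib."""
    if use_delta:
        ints = delta_encode(ints)
    raw = struct.pack(f"<{len(ints)}q", *ints)
    return zlib.compress(raw, 9)


def decompress_ints(blob, count, use_delta=True):
    """Undo compress_ints; count is the number of integers stored."""
    ints = list(struct.unpack(f"<{count}q", zlib.decompress(blob)))
    return delta_decode(ints) if use_delta else ints


# Illustrative (made-up) columns: regular timestamps and a low-cardinality status
timestamps = [1_200_000_000 + 60 * i for i in range(10_000)]
statuses = ["OK", "OK", "FAIL", "OK"] * 2_500

ts_blob = compress_ints(timestamps, use_delta=True)      # delta, then zlib
dictionary, tokens = tokenize(statuses)                   # tokenize ...
st_blob = compress_ints(tokens, use_delta=False)          # ... then zlib

print(len(timestamps) * 8, "->", len(ts_blob), "bytes for timestamps")
print(sum(len(s) for s in statuses), "->", len(st_blob), "bytes for statuses")
```

The point of the sketch is only that each stage is optional and feeds the next one, which is consistent with the claim that not every technique has to be switched on for every database. Actual ratios depend entirely on the data, so don't read the printed numbers as evidence for or against Clearpace's 20X or 50-100X figures.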
According to Clearpace CEO John Bantleman, NParchive use cases include:
- Archiving data warehouses
- Archiving log files and similar kinds of data that never made it into a data warehouse
- Storing – and making available for query – data from decommissioned old applications
If I understood a couple of actual OEM stories correctly, we can also add to the list the archiving of transaction processing databases. Buzzphrases mentioned included information lifecycle management (ILM) and disaster recovery.
And then I coined a database archiving buzzphrase of my own …