Two similar companies reached out to me recently – SAND Technology and Clearpace. Their current market focus is somewhat different: Clearpace talks mainly of archiving, and sells first and foremost into the compliance market, while SAND has the most traction providing “near-line” storage for SAP databases.* But both stories boil down to pretty much the same thing: Cheap, trustworthy data storage with good-enough query capabilities. E.g., I think both companies would agree the following is a not-too-misleading first-approximation characterization of their respective products:
- Fully functional relational DBMS.
- Claims of fast query performance, but that’s not how they’re sold.
- Huge compression.
- Careful attention to time-stamping and auditability.
*Actually, SAND has two products, one of which really is sold as a DBMS, competing with Sybase IQ or Netezza. But I’m talking about the other one, which is the current main focus of SAND’s sales efforts.
When Clearpace CEO John Bantleman and I chatted last week, he spoke of such uses as:
- Cheap compliance with data-retention regulations
- Keeping data accessible even though the application that created it has been decommissioned
- Cheap duplication for disaster recovery
He also invoked the buzzphrase “information lifecycle management” (ILM).
When I pointed out that all of this could be construed as being aspects of “information preservation,” John enthusiastically agreed. Yesterday I bounced that phrase off SAND’s marketing chief Linda Arens, and she liked it too.
And that makes perfect sense. What do “archives” and “archivists” do in the classical senses of the terms? First and foremost, they preserve information. They don’t feel they’ve done their job well if it’s too difficult to access, but utter ease-of-use is not their top concern.
Digression: I actually spent a day once with a university archivist (retired). She came to my house to check out a portrait of one of my Monasch ancestors and to rummage through my 19th Century family photos. Australian readers — and WW1 history buffs — will have little trouble guessing which university she was from.
So far, so good. But why use a specialty product for the purpose of information preservation, when you can instead just dump everything into your data warehouse environment? Well, the vast majority of large enterprises do just that, getting by without specialized technology from SAND, Clearpace, or any close competitor. And of course data warehouse technology is getting cheaper very quickly. So not every enterprise will need what SAND and Clearpace have to offer.
But every enterprise does need to think about a comprehensive information preservation strategy. Too often ILM puts the cart before the horse, focusing on throwing stuff away more than on keeping it. Notwithstanding the excessive popularity of some inherently shady legal tricks (“Let’s make sure to destroy the evidence before somebody can think of ordering us to preserve it”), and also notwithstanding some legitimate rules about privacy, preserving information is almost always better than losing it, whether accidentally or on purpose.
So I’d like to propose a deceptively simple exercise for any enterprise, really of any size. Inventory all the sources of potentially valuable information that are already being tracked in your enterprise. Then make a matching list of the preservation strategies for each. Some of those strategies will be very good. Others will fall into that ever-popular category “not ideal, but also not bad enough to bother fixing.” Then see which kinds of information are covered by neither a good preservation strategy nor one that’s good enough. And think about whether you should move all those into one or two* information preservation environments of last resort.
*Two = one for tabular data + one for documents and media
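For anybody who wants to run the exercise in something more structured than a spreadsheet, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the `Source` record, the three-level `Rating`, the two `kind` labels, and the sample inventory entries. None of it reflects how SAND, Clearpace, or any other product works. It just lists the sources, rates each one's preservation strategy, and groups whatever is left uncovered by the kind of last-resort environment it would need.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    """Hypothetical three-level rating of a preservation strategy."""
    GOOD = "very good"
    GOOD_ENOUGH = "not ideal, but not bad enough to bother fixing"
    INADEQUATE = "neither good nor good enough"


@dataclass
class Source:
    """One entry in the inventory (all field names are illustrative)."""
    name: str
    kind: str       # "tabular" or "documents_media"
    strategy: str   # free-text description; empty if there is none
    rating: Rating


def last_resort_candidates(inventory):
    """Group sources with no adequate strategy by kind of data."""
    gaps = defaultdict(list)
    for src in inventory:
        if src.rating is Rating.INADEQUATE:
            gaps[src.kind].append(src.name)
    return dict(gaps)


# Made-up sample inventory, purely for illustration.
inventory = [
    Source("order history", "tabular", "kept in the data warehouse", Rating.GOOD),
    Source("web logs", "tabular", "periodic backups", Rating.GOOD_ENOUGH),
    Source("decommissioned HR app", "tabular", "", Rating.INADEQUATE),
    Source("scanned contracts", "documents_media", "", Rating.INADEQUATE),
]

candidates = last_resort_candidates(inventory)
for kind, names in candidates.items():
    print(f"{kind}: {', '.join(names)}")
```

Each key in the result corresponds to one candidate environment of last resort, so with the two kinds assumed here you end up with at most two.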