April 4, 2010

The retention of everything

I’d like to reemphasize a point I’ve been making for a while about data retention:

As costs go down, the wisdom of keeping detailed data goes up. I’d go so far as to say that every piece of data generated by a human being should be preserved and kept online, legal and privacy considerations permitting.* Most forms of capital-, labor-, and/or location-based competitive advantage are being commoditized and/or globalized away. But information remains a unique corporate asset. Don’t discard it lightly.

*Unless there’s an explicit law mandating data destruction, legal considerations should permit. The idea “Let’s destroy something of irreplaceable value today, against the possibility we might be brought to judgment tomorrow” is both morally and pragmatically weird. Privacy, however, may be a different matter.

That applies to the structured/tabular kinds of data I tend to focus on in this blog. It applies even more to anything that’s like a document (or email, instant message, whatever) somebody has taken the trouble to place into words.  A top document-oriented archiving analyst (and my good friend), David Ferris, quite agrees. As David puts it:

I think we’ll end up archiving everything, except egregious garbage like spam:

  • It’s too hard to get users to conform to policy.
  • Automated methods of capturing a human-understandable policy, for example “tax records,” are too hard to implement through automatic filters. The filters are too inaccurate.
  • It’s impractical to get users to classify everything, and automatic classification is too crude.
  • You never know what you might want later. Stuff you think you won’t want now may end up being very useful.
  • The cost of storage is trivial when looked at on a per-user basis.

In particular, I think information destruction is a crude instrument for the protection of privacy, wasteful at best, and likely to be vigorously resisted by governments and large businesses.  For example:

Besides, archiving technologies are getting ever more cost-effective.


3 Responses to “The retention of everything”

  1. Data-based snooping — a huge threat to liberty that we’re all helping make worse | DBMS2 -- DataBase Management System Services on April 6th, 2010 3:58 pm

    […] More on data retention […]

  2. Barfo Rama on April 7th, 2010 7:50 pm

    If I were a kid these days, I’d probably want control of the dumbass things I’ve posted, when I get older.

    “This will go on your permanent record” was a joke at one time, now it is a post-Orwellian threat.

  3. Blunks » Is the Expectation of Privacy Reasonable? on May 15th, 2010 6:37 pm

    […] Looming over this conversation (and supporting Zuckerberg’s stance, if not validating his opinion) is the technical reality that the capabilities of the network permit the centralized aggregation of multiple identities. One of the strongest sentiments that I have seen reacting against complaints of Facebook’s changes is that users should never publish anything on the internet that they wouldn’t want to be made public. We have known for years that privacy is something of an illusion on the internet, and with the rise of big data analytics, even the notion of ‘privacy through obscurity’ is becoming quaint. […]
