From time to time, I hear of regulatory requirements to retain, analyze, and/or protect data in various ways. It’s hard to get a comprehensive picture of these, as they vary by both industry and jurisdiction; so I generally let such compliance issues slide. Still, perhaps I should use one post to pull together what is surely a very partial list.
Most such compliance requirements have one of two emphases: Either you need to keep your customers’ data safe against misuse, or else you’re supposed to supply information to government authorities. From a data management and analysis standpoint, the former area mainly boils down to:
- Information security. This can include access control, encryption, masking, auditing, and more.
- Keeping data in an approved geographical area. (E.g., its country of origin.) This seems to be one of the three big drivers for multi-data-center processing (along with latency and disaster recovery), and hence is an influence upon numerous users’ choices in areas such as clustering and replication.
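To make the masking item a bit more concrete, here is a minimal Python sketch of two common techniques: display masking and keyed pseudonymization. The function names, the choice of HMAC-SHA256, and the sample key are all illustrative assumptions on my part, not a description of any particular product or regulation.

```python
import hashlib
import hmac


def mask_card_number(card_number: str) -> str:
    """Hide all but the last four digits (hypothetical helper for display masking)."""
    digits = [c for c in card_number if c.isdigit()]
    if len(digits) <= 4:
        # Too short to reveal anything safely; mask every digit.
        return "*" * len(digits)
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a value with a keyed hash, so equal inputs still match for joins.

    Assumption: the key is held separately from the data, e.g. in a key manager.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"example-key-not-for-production"  # placeholder key, for illustration only
    print(mask_card_number("4111 1111 1111 1111"))
    print(pseudonymize("alice@example.com", key))
```

Masking of this kind is for display; the pseudonymized value is the one you could still join or count on without exposing the underlying identifier.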
Supplying information to the authorities, however, has numerous aspects.
First, there are many purposes for the data retention and analysis, including but by no means limited to:
- Financial reporting (all industries)
- Facilitating discovery in case you’re ever sued (all industries)
- Anti-discrimination (especially financial services, but also labor law in any industry)
- Safety and environmental review (many bricks-and-mortar industries, most notably pharma)
- Rate setting (industries with regulated prices, such as insurance and utilities)
- Financial risk evaluation, e.g. Basel III (financial services)
- Ratting out your customers (especially commercial banking and internet)
Second, there are a variety of technical issues supporting the authorities-informing side of compliance, such as:
- Keeping a whole lot of data, cost effectively and …
- … using those archives, since you have them anyway. While it’s not a big focus area for me, I’ve written a number of posts about archiving and information preservation.
- Making your document data sufficiently searchable by somebody who’s suing you. I posted about text e-discovery a couple of times in 2008.
- Getting your regulatory reports together quickly. Examples include:
  - Closing the books promptly after quarter end. A lot of software technology goes into that.
  - Doing risk analysis promptly, even though it’s computationally very demanding. Risk analysis keeps coming up as an application area for scale-out analytic technologies.
- Packaging up data nicely for regulators. The classic example of this is pharmaceutical regulatory filings, which is the application that fueled Documentum’s growth 20 years ago.
- Ensuring that the data is accurate and hasn’t been tampered with. Some of that, again, is a matter of information security. But also important are the highly overlapping areas of data lineage and data provenance.
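One simple way to make tampering detectable is a hash chain, in which each stored entry includes a hash of the entry before it. The Python sketch below is only an illustration under my own assumptions: the names `append_record` and `verify_log`, the JSON serialization, and the use of SHA-256 are choices made for the example, and a real system would also need to keep the most recent hash somewhere an attacker can't rewrite.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for an empty log


def _entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(log: list, record: dict) -> None:
    """Append a record, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, record),
    })


def verify_log(log: list) -> bool:
    """Recompute every hash in order; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_record(log, {"account": "A-1", "amount": 100})
    append_record(log, {"account": "A-2", "amount": 250})
    print(verify_log(log))           # chain is intact
    log[0]["record"]["amount"] = 999
    print(verify_log(log))           # the edit is detected
```

This covers only the "hasn't been tampered with" half. Lineage and provenance, which are about where the data came from and how it was transformed, need metadata tracking of their own.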
Combining all that, and more, I’d say that a considerable fraction of data management and analysis efforts are devoted to meeting legal and regulatory obligations.