In a post about the recent JPMorgan Chase database outage, I suggested that JPMorgan Chase’s user profile database was over-engineered, in that various web surfing data was stored in a fully ACID-compliant manner when it didn’t really need to be. I’ve since gotten private communication expressing vehement agreement, and telling of the opposite choice being made in other major web-facing transactional systems.
What’s going on is this:
- ACID-compliant transaction integrity commonly costs more in terms of DBMS licenses and many other components of TCO (Total Cost of Ownership) than less rigorous approaches.
- Worse, it can actually hurt application uptime, by forcing your system to pull in its horns and stop functioning in the face of failures that a non-transactional system might smoothly work around.
- Other flavors of “complexity can be a bad thing” apply as well.
Thus, transaction integrity can be more trouble than it’s worth.
In essence, of course, that’s half of the classic NoSQL claim; the other half is that the same may be said of joins.
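To make the uptime point concrete, here is a minimal sketch of the two write styles side by side. Everything in it is an assumption for illustration: the `accounts` table, the `transfer` and `record_page_view` names, and the `write` callable standing in for whatever non-transactional store you might use. It uses Python's built-in `sqlite3` module only because that is easy to run; it is not meant to describe any particular production system.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Strict path: both balance updates commit together or not at all.

    Assumes a hypothetical table accounts(id, balance).
    """
    with conn:  # sqlite3 commits on success and rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Raising here undoes the debit above; the caller sees a failure.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )


def record_page_view(write, user_id, url):
    """Lenient path: a lost page view is tolerable, so a failed write is dropped.

    `write` is a hypothetical callable that stores one event and may raise.
    Returns True if the event was stored, False if it was discarded.
    """
    try:
        write((user_id, url))
        return True
    except Exception:
        # The request carries on; personalization is just slightly staler.
        return False
```

The strict path refuses to proceed when anything goes wrong, which is what you want for money. The lenient path keeps serving users through a storage hiccup, at the price of occasionally missing data.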
So when should you go for ACID-compliant transaction integrity, and when shouldn’t you bother? Every situation is different, but here’s a set of considerations to start you off.
- Is there a regulatory requirement for ACID-compliant transaction integrity? If so, your decision has already been made.
- Is your enterprise small enough that everything should just go in a single instance of one DBMS? If so, never mind whether your chosen DBMS is suboptimal for some particular application.
- Is money changing hands? How about goods and services? It is usually a bad idea to make mistakes about money, so if money is being spent, you probably want fully ACID-compliant transaction integrity. That’s true even if what’s being sold are virtual goods, or of course if the transactions are purely financial (as at, say, a bank).
- Is the information you’re recording doomed to inaccuracy anyway? Such situations can arise in the capture of user-contributed marketing data, or when you record information produced by lots of individually cheap machines. In such cases, you probably have a way to deal smoothly with missing data, so why pay a lot to avoid a very small number of additional errors?
- Is the information being recorded for the purpose of probabilistic analysis, such as data mining/predictive analytics? Then a few errors probably aren’t a big enough deal to worry about. (This case overlaps hugely with the prior one – the cases where you accept dirty data are generally the ones where you’re looking for trends rather than individual little facts.)
- Are you looking for rare, isolated “black swan” events? If so, it might be unsettling to accept the possibility of losing any data at all.
- How much does it cost you to be wrong? If an error would cause you to lose your reputation, then it might be worth great investment to avoid such a mistake. If an error just leads you to show a web page with slightly inferior personalization, it may not be such a big deal.
- How much does it save you to skip transaction integrity? In many cases, a standard ACID-compliant relational approach is fastest and cheapest, even if the data is of a kind you can afford to lose.
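The considerations above can be roughed out as a checklist in code. This is only a sketch under stated assumptions: the `Workload` fields, the `recommend_acid` name, and above all the order in which the questions are checked are mine, chosen for illustration. Every situation is different, so treat the result as a conversation starter rather than a verdict.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical yes/no answers to the questions in the list above."""
    regulated: bool = False                  # regulatory requirement for ACID
    single_small_dbms: bool = False          # everything fits in one DBMS instance
    money_or_goods: bool = False             # money, goods or services change hands
    inherently_noisy: bool = False           # data is doomed to inaccuracy anyway
    probabilistic_analysis: bool = False     # recorded for trends, not single facts
    black_swan_hunting: bool = False         # looking for rare, isolated events
    high_cost_of_error: bool = False         # a mistake would be expensive
    meaningful_savings_without_acid: bool = False  # skipping ACID really saves something


def recommend_acid(w):
    """Return (use_acid, reason). The precedence of the checks is an assumption."""
    if w.regulated:
        return True, "regulatory requirement"
    if w.single_small_dbms:
        # Assumes the single instance is a conventional ACID-compliant DBMS.
        return True, "everything lives in one DBMS anyway"
    if w.money_or_goods:
        return True, "money or goods change hands"
    if w.black_swan_hunting:
        return True, "any lost record might be the one that matters"
    if w.high_cost_of_error:
        return True, "errors are too expensive"
    if not w.meaningful_savings_without_acid:
        return True, "skipping ACID saves little"
    if w.inherently_noisy or w.probabilistic_analysis:
        return False, "the data tolerates a few errors"
    return True, "no strong reason to skip ACID"
```

For example, `recommend_acid(Workload(inherently_noisy=True, meaningful_savings_without_acid=True))` comes back with `False`, while an empty `Workload()` defaults to ACID because nothing suggests that skipping it would save anything.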
Bottom line: Should you build your applications on top of an ACID-compliant DBMS? Usually – but not always.