There are plenty of viable alternatives to relational database management systems. For short-request processing, both document stores and fully object-oriented DBMS can make sense. Text search engines have an important role to play. E. F. “Ted” Codd himself once suggested that relational DBMS weren’t best for analytics.* Analysis of machine-generated log data doesn’t always have a naturally relational aspect. And I could go on with more examples yet.
*Actually, he didn’t admit that what he was advocating was a different kind of DBMS, namely a MOLAP one — but he was. And he was wrong anyway about the necessity for MOLAP. But let’s overlook those details.
Nonetheless, relational DBMS dominate the market. As I see it, the reasons for relational dominance cluster into four areas (which of course overlap):
- Data re-use. Ted Codd’s famed original paper, “A Relational Model of Data for Large Shared Data Banks,” referred to shared data banks for a reason.
- The benefits of normalization, which include:
  - You only have to do the programming work of writing something once …
  - … and you don’t have to do the programming work of keeping multiple versions of the information consistent.
  - You only have to do the processing work of writing something once.
  - You only have to buy storage to hold each fact once.
- Separation of concerns.
  - Different people can worry about programming and “database stuff.”
  - Indeed, even performance optimization can sometimes be separated from programming (i.e., when all you have to do to get speed is implement the correct indexes).
- Maturity and momentum, as reflected in the availability of:
  - A broad variety of mature relational DBMS.
  - Vast amounts of packaged software that “talks” SQL.
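The point about performance optimization being separable from programming can be illustrated with a small sketch. It uses Python’s built-in sqlite3 module only as a convenient stand-in for a relational DBMS; the orders table, its columns, and the index name idx_orders_customer are hypothetical. The application’s query text is identical before and after the index is created, and so is its result; only the query plan chosen by the DBMS changes. The exact wording of SQLite’s plan output varies between versions.

```python
import sqlite3

# Hypothetical schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

# The application's query: note that it never mentions an index.
QUERY = "SELECT COUNT(*), SUM(total) FROM orders WHERE customer_id = ?"

def plan(sql: str) -> str:
    """Return SQLite's query plan for sql, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    # The human-readable detail is the last column in every SQLite version.
    return " | ".join(str(row[-1]) for row in rows)

before_plan = plan(QUERY)
before_result = conn.execute(QUERY, (42,)).fetchone()

# The "database stuff": an index is added without touching the query text.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

after_plan = plan(QUERY)
after_result = conn.execute(QUERY, (42,)).fetchone()

print(before_plan)  # typically a full scan of orders
print(after_plan)   # typically a search using idx_orders_customer
print(after_result)
```

Whoever owns the schema can add or drop the index; the code that issues the query does not change.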
Generally speaking, I find the reasons for sticking with relational technology compelling in cases such as:
- You’re building a low-volume, medium-complexity suite of applications that will evolve over time. This is the use case for which relational DBMS were invented, and they’re still great for it.
- Your (duplicated) data volumes would be ridiculous if you didn’t do a reasonable amount of normalization. Once you need to normalize, you need to do joins — and if you’re doing joins, you’re in relational territory.
- You simply don’t see a cost/benefit advantage to moving away from proven legacy technology. If you’re looking for an off-the-shelf answer to your needs, or if you’re inventorying your own technological shelves, relational-oriented technology has an overwhelming share.
For many enterprises, that third point alone should be decisive in a large fraction of cases.
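The normalization trade-off in the second case above can be sketched in a few lines, again using Python’s sqlite3 module as a stand-in; the customers and orders tables are hypothetical. Each customer’s city is stored once, so changing it means updating one row rather than hunting down every copy. The price is that reassembling an order together with its customer’s details requires a join.

```python
import sqlite3

# Hypothetical normalized schema: customer facts live only in customers,
# and orders refer to them by key instead of copying them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Acme', 'Boston'), (2, 'Globex', 'Denver');
    INSERT INTO orders (customer_id, total) VALUES (1, 10.0), (1, 25.0), (2, 7.5);
""")

# A changed fact is written once: one row, no duplicate copies to keep consistent.
conn.execute("UPDATE customers SET city = 'Chicago' WHERE id = 1")

# Getting the full picture back requires a join.
rows = conn.execute("""
    SELECT o.id, c.name, c.city, o.total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

for row in rows:
    print(row)
```

Both of Acme’s orders pick up the new city through the join, even though only one row was updated.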
But the advantages of relational technology are less clear when you’re doing serious engineering of path-breaking new applications, where by “serious engineering” I mean:
- The problem is big enough that you simply want the best solution, with only loose coupling needed to the rest of your technical environment.
- Long-lasting “strategic” or legacy technology is not a great concern; you’re willing to keep “rebuilding the 747 while it’s flying” if that’s what’s necessary to get the best possible result.
- You have access to sufficient quantities of sufficiently smart people.
For example:
- I recently suggested that innovative SaaS vendors could adopt object-oriented database technology.
- Major web applications are rarely very relational. Until recently, the default approach to scaling out web databases was memcached/sharded MySQL, hardly a wholehearted adoption of relational technology. Now NoSQL DBMS are vigorous competitors.*
- Analytic challenges that amount to teasing signals out of streams of data are sometimes handled non-relationally as well, although it’s often nice to be able to do a few joins to mix in information from more relationally structured data.
Not coincidentally, in a lot of those cases, throwing performance concerns “over the wall” to the database administrator isn’t going to work.
*I do expect the pendulum to swing back a bit as high-performance/highly-scalable MySQL implementations mature, but there are relatively few supporting examples to date.
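To make the memcached/sharded MySQL pattern concrete, here is a minimal sketch of its two moving parts: hash-based routing of each key to one shard, and a cache-aside read path. Plain Python dicts stand in for both memcached and the MySQL shards, and every name here (NUM_SHARDS, save_user, load_user) is hypothetical; a real deployment would use actual client libraries and would have to handle resharding and cache consistency far more carefully.

```python
import hashlib

# Stand-ins (hypothetical): dicts play the roles of memcached and of MySQL shards.
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]
cache = {}

def shard_for(user_id: int) -> int:
    """Route a key to a shard by hashing it (simple modulo sharding)."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def save_user(user_id: int, profile: dict) -> None:
    """Write to the owning shard, then invalidate any cached copy."""
    shards[shard_for(user_id)][user_id] = profile
    cache.pop(user_id, None)

def load_user(user_id: int):
    """Cache-aside read: try the cache first, fall back to the owning shard."""
    if user_id in cache:
        return cache[user_id]
    profile = shards[shard_for(user_id)].get(user_id)
    if profile is not None:
        cache[user_id] = profile
    return profile

for uid in range(10):
    save_user(uid, {"name": f"user{uid}"})
print(load_user(3))  # {'name': 'user3'}
print([len(s) for s in shards])
```

Each lookup touches a single shard, which is what makes the pattern scale; anything resembling a join across users on different shards has to be done in application code, which is why the approach is hardly relational in spirit.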
To look at it another way, it’s right to be skeptical about relational DBMS when you can defeat all of the reasons to favor them. For example:
- Data re-use may not arise when applications are self-contained and rapidly changing.
- Sometimes you don’t need to normalize your data.
- It’s not obvious that the relational approach to separation of concerns is the best one. Perhaps you’d be better off with the people who understand a specific application best being responsible for all the decisions connected with it.
- As for that maturity and momentum:
  - People don’t actually learn much SQL in school.
  - Are any of the mature relational DBMS what you really want?
  - Is any of that packaged software out there really helpful for your specific problem?
I should probably stop there. But in an appeal to authority, I’ll close instead with a quote from Codd’s own OLAP paper:
IT should never forget that technology is a means to an end, and not an end in itself. Technologies must be evaluated individually in terms of their ability to satisfy the needs of their respective users. IT should never be reluctant to use the most appropriate interface to satisfy users’ requirements. Attempting to force one technology or tool to satisfy a particular need for which another tool is more effective and efficient is like attempting to drive a screw into a wall with a hammer when a screwdriver is at hand: the screw may eventually enter the wall but at what cost?