I write a lot about whether or not to use relational DBMS. For example:
- In May I surveyed relational vs. non-relational pros and cons at some length.
- Last November I mused about when it might be OK to do without joins.
- The question is implicit in a variety of posts about, say, document-oriented or object-oriented DBMS.
Before going further in that vein, I’d like to do a quick review of what E. F. “Ted” Codd was getting at with the relational model in the first place.
The first sentence of Codd’s famous 1970 paper introducing the relational database concept reads:
Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).
In modern terms, that means “all you have to know to use the database is its logical schema; you don’t need to know anything about its physical representation.”
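As a rough illustration of that separation, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the index name are invented for the example and come from nothing in Codd's paper. The same logical query is run before and after a physical change (adding an index). The answer stays the same, and only the engine's access plan changes.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("acme", 10.0), ("globex", 25.5), ("acme", 7.25)],
)

# The application only ever sees this logical query.
QUERY = "SELECT total FROM orders WHERE customer = ? ORDER BY id"

def run(conn):
    """Return (result rows, query-plan text) for the same logical query."""
    rows = conn.execute(QUERY, ("acme",)).fetchall()
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
    plan = " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("acme",)))
    return rows, plan

rows_before, plan_before = run(conn)

# A purely physical change: the logical schema and the query text are untouched.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

rows_after, plan_after = run(conn)

print(rows_before == rows_after)  # the answer does not depend on the physical design
print(plan_before)                # a table scan (exact wording varies by SQLite version)
print(plan_after)                 # a search that uses the new index
```

The query text never mentions the index. Whether to use it is the engine's decision, which is the kind of protection Codd's sentence asks for.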
Over the next 15 years, Codd’s thinking — and his employer IBM’s technology — evolved to the point that Codd proposed 12 rules for a relational DBMS, the three most fundamental of which are:
Foundation Rule
A relational database management system must manage its stored data using only its relational capabilities.
Information Rule
All information in the database should be represented in one and only one way — as values in a table.
Guaranteed Access Rule
Each and every datum (atomic value) is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.
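The Guaranteed Access Rule can be made concrete with a small sketch, again in Python with sqlite3. The `customer` table and the `get_datum` helper are hypothetical names chosen for illustration. The point is only that three logical coordinates (table name, primary key value, column name) are enough to reach any single value, with no pointers, record offsets, or storage paths involved.

```python
import sqlite3

# Hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Ted", "San Jose")],
)

def get_datum(conn, table, pk_column, pk_value, column):
    """Fetch one atomic value from (table name, primary key value, column name)."""
    # SQL identifiers cannot be bound as parameters, so check them against
    # the catalog before interpolating them into the statement.
    known = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    if known is None:
        raise ValueError(f"unknown table: {table}")
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    if pk_column not in columns or column not in columns:
        raise ValueError("unknown column")
    row = conn.execute(
        f'SELECT "{column}" FROM "{table}" WHERE "{pk_column}" = ?', (pk_value,)
    ).fetchone()
    return None if row is None else row[0]

print(get_datum(conn, "customer", "id", 2, "city"))
```

A caller that needs Ted's city supplies `"customer"`, `2`, and `"city"`, and nothing about how or where the row is stored.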
That is, Codd was positively asserting that a database should have a fixed logical schema in tabular form. The clear implication was that programmers should be able to write anything they wanted against that schema, without database performance being unduly compromised.
Of course, things never quite worked out that way. For most of the history of tabular DBMS, the best-performing short-request and analytic DBMS have been designed quite differently from each other.* Non-relational systems — from IBM’s own IMS to various object-oriented DBMS — outperformed relational DBMS on particular applications. Designers of high-performance applications were sensitive to the database’s physical design, sometimes even going to the extreme of non-transparent sharding. But on the whole, it was generally agreed that programming against a fixed logical schema is a good thing.
*Codd acknowledged this himself by promoting multidimensional OLAP over traditional RDBMS. (I regard the multidimensional/relational divide as a distinction without a significant difference; it’s all just fixed-logical-schema tabular processing with different data manipulation languages.)
In my next post, I’ll return to the subject of why fixed schemas might not always be such a good idea after all.