E. F. “Ted” Codd taught the computing world that databases should have fixed logical schemas (which protect the user from having to know about physical database organization). But he may not have been as universally correct as he thought. Cases I’ve noted in which fixed schemas may be problematic include:
- “A bunch of apps in one, similar but not the same” (in my recent post on MongoDB).
- Out-of-control product catalogs (ditto).
- Analytic use cases in which one keeps enhancing the database with derived data.
And if marketing profile analysis is ever done correctly, that will be a huge example for the list.
So what do we call those DBMS — for example NoSQL, object-oriented, or XML-based systems — that bake the schema into the applications or the records themselves? In the MongoDB post I went with “schemaless,” but I wasn’t really comfortable with that, so I took the discussion to Twitter. Comments from Vlad Didenko (in particular), Ryan Prociuk, Merv Adrian, and Roland Bouman favored the idea that schemas in such systems are changeable or late-bound, rather than entirely absent. I quickly agreed.
I like that approach for the same reason I favor saying that databases are poly- or multi-structured (rather than un- or semi-): Every database has structure, the only question being when that structure is determined. I wouldn’t precisely equate “poly-structured” to “has a late-bound schema”; for example, I’d say that mucking with the DDL (Data Description Language) of a relational database shows that it’s a little bit poly-structured, even though it’s not at all late-bound. But the concepts are definitely related.
So what actual wording should we use here? The only alternative I see to fixed schema is “static”, and that feels like it has too much of a connotation of “unchangeable”. The simplest word I can think of for changeable/late-bound/whatever is dynamic schema; that choice also has the virtue of some traction, as per the Vlad Didenko tweet linked above. Casual googling is also supportive of “fixed” and “dynamic”, at least over whatever alternatives I came up with. So those are my choices.
For actual definitions, I’ll say:
- A (logical) schema is fixed if it is defined before a program is written, but dynamic if it is defined by the program or data itself.
- A database is fixed- or dynamic-schema depending on whether its schemas are fixed or dynamic respectively.
- A DBMS is fixed- or dynamic-schema depending on whether databases created in it tend to have fixed or dynamic schemas respectively.
Suit yourself as to what you say about relational DBMS when they also have a bit of XML, text, or whatever support.
By these definitions:
- Relational databases are fixed-schema (within the caveat above about XML or text data).
- MOLAP databases are fixed-schema.
- Pre-relational network and hierarchical DBMS (e.g. IMS) are fixed-schema.
- Most other DBMS are dynamic-schema.
What do you think? Do these definitions work for you?