This is the fifth of a five-part series on database management system choices. For the first post in the series, please click here.
Relational database management systems have three essential elements:
- Rows and columns. Theoretically, rows and columns may be inessential to the relational model. But in reality, they are built into the design of every real-world relational product. If you don’t have rows and columns, you’re not using the product to do what it was well-designed for.
- Predicate logic. Theoretically, everything can be fitted into a predicate Procrustean bed. But if you’re looking for relevancy rankings on a text search, binary logic is a highly convoluted way to get them.
- Fixed schemas. Database theorists commonly assume that databases have fixed schemas. If this means that 90%+ of all information is null or missing, they have elegant ways of dealing with that. Even so, as computing gets ever more concerned with individuals — each with his/her/its unique “profile(s)” — fixed schemas get ever harder to maintain.
If any of these three elements is missing or inappropriate, then a traditional relational database management system may not be the best choice.
More and more, it may be the case that the best logical data structure for your application isn’t entirely rows and columns. To be sure, almost every application has some alphanumeric aspects – e.g., the metadata associated with text documents, images, etc. But when you’re dealing with text, multimedia, geographic location, or nontrivial graphs, it’s a good bet that rows and columns don’t describe the most important part of your data.
The discussion of how to handle datatypes that don’t naturally fit into tables is complicated, which is why I spread it over several earlier posts. In essence, there are up to five choices for any particular datatype (standalone server, separate server integrated into RDBMS, wholly integrated into RDBMS, user-defined functions in RDBMS, or entirely outside a DBMS in simple files). Which one is best may vary greatly with your requirements for performance, transaction integrity, or query sophistication.
But datatypes aside, there’s another reason to leave the relational paradigm, and here I’m saying something much more controversial. To wit, I assert:
Even when data is alphanumeric, it may not belong in a rigid schema.
Clearly, there only are three reasonable opinions on the matter:
- Absolutely all alphanumeric data belongs in fixed schemas. To say otherwise is fuzzy thinking.
- In a few extreme or trivial cases, a fixed schema may miss the point. But in almost all cases that matter, you should use a fixed schema.
- In a significant (and growing) minority of cases, fixed schemas are counterproductive.
Here’s why I take Stance #3.
There are basically two kinds of new application — those that rely on old data and databases, and those that manage essentially new information. I’m talking about the latter kind. Those are often the ones that are highest value and most interesting, and certainly are the ones most apt to require new database management systems. Examples of this new data include:
- User profiles and web surfing behavior.
- Data gathered via innovative marketing campaigns.
- Location information collected via various kinds of new devices.
In many of these cases, the information you can obtain varies from one subject to the next, because it’s based on their consent, lifestyle, or use of certain devices. Or it varies from one marketing campaign to the next. Or it varies from one country to the next, due to data privacy laws. Or it varies from one quarter to the next, due to ever-advancing technology. That kind of variability is only going to increase. And is it does, fixed schemas will – at least for some applications – seem increasingly quaint.
So how does one do database management without fixed schemas? To date, not very well. But XML databases are getting better, text search vendors are getting more serious about providing DBMS-like programmability, and object-oriented DBMS aren’t quite dead yet.
More and more, fluid-schema databases will seem both natural and necessary.
At least, that’s how I think things will play out.
The series on database diversity (more links at the bottom of Part 1):
- Part 1: Database management system choices – overview
- Part 2: Database management system choices – 4 categories of relational
- Part 3: Database management system choices – relational data warehouse
- Part 4: Database management system choices – mid-range-relational
- Part 5: Database management system choices – beyond relational