February 15, 2008

Database management system choices – beyond relational

This is the fifth of a five-part series on database management system choices. For the first post in the series, please click here.

Relational database management systems have three essential elements:

  1. Rows and columns. Theoretically, rows and columns may be inessential to the relational model. But in reality, they are built into the design of every real-world relational product. If you don’t have rows and columns, you’re not using the product to do what it was well-designed for.
  2. Predicate logic. Theoretically, everything can be fitted into a predicate Procrustean bed. But if you’re looking for relevancy rankings on a text search, binary logic is a highly convoluted way to get them.
  3. Fixed schemas. Database theorists commonly assume that databases have fixed schemas. If this means that 90%+ of all information is null or missing, they have elegant ways of dealing with that. Even so, as computing gets ever more concerned with individuals — each with his/her/its unique “profile(s)” — fixed schemas get ever harder to maintain.

If any of these three elements is missing or inappropriate, then a traditional relational database management system may not be the best choice.

More and more, it may be the case that the best logical data structure for your application isn’t entirely rows and columns. To be sure, almost every application has some alphanumeric aspects – e.g., the metadata associated with text documents, images, etc. But when you’re dealing with text, multimedia, geographic location, or nontrivial graphs, it’s a good bet that rows and columns don’t describe the most important part of your data.

The discussion of how to handle datatypes that don’t naturally fit into tables is complicated, which is why I spread it over several earlier posts. In essence, there are up to five choices for any particular datatype (standalone server, separate server integrated into RDBMS, wholly integrated into RDBMS, user-defined functions in RDBMS, or entirely outside a DBMS in simple files). Which one is best may vary greatly with your requirements for performance, transaction integrity, or query sophistication.

But datatypes aside, there’s another reason to leave the relational paradigm, and here I’m saying something much more controversial. To wit, I assert:

Even when data is alphanumeric, it may not belong in a rigid schema.

Clearly, there only are three reasonable opinions on the matter:

  1. Absolutely all alphanumeric data belongs in fixed schemas. To say otherwise is fuzzy thinking.
  2. In a few extreme or trivial cases, a fixed schema may miss the point. But in almost all cases that matter, you should use a fixed schema.
  3. In a significant (and growing) minority of cases, fixed schemas are counterproductive.

Here’s why I take Stance #3.

There are basically two kinds of new application — those that rely on old data and databases, and those that manage essentially new information. I’m talking about the latter kind. Those are often the ones that are highest value and most interesting, and certainly are the ones most apt to require new database management systems. Examples of this new data include:

In many of these cases, the information you can obtain varies from one subject to the next, because it’s based on their consent, lifestyle, or use of certain devices. Or it varies from one marketing campaign to the next. Or it varies from one country to the next, due to data privacy laws. Or it varies from one quarter to the next, due to ever-advancing technology. That kind of variability is only going to increase. And is it does, fixed schemas will – at least for some applications – seem increasingly quaint.

So how does one do database management without fixed schemas? To date, not very well. But XML databases are getting better, text search vendors are getting more serious about providing DBMS-like programmability, and object-oriented DBMS aren’t quite dead yet.

More and more, fluid-schema databases will seem both natural and necessary.

At least, that’s how I think things will play out.

The series on database diversity (more links at the bottom of Part 1):

Comments

2 Responses to “Database management system choices – beyond relational”

  1. Daniel Weinreb on February 16th, 2008 10:00 am

    I completely agree.

    One way you can do this with an RDBMS is to let the relational schema merely be a meta-schema. For example, represent RDF as triples, and have one relational table with three columns, and put everything in there.

    There are, of course, disadvantages to this approach. It means that what you think of as the schema is not actually represented by the RDBMS. That means you can’t easily take advantage of the RDBMS’s integrity features, and it’s harder to control physical layout, just to mention two.

    You can actually modify the schema of an RDBMS dynamically with DDL statements, but it’s not the “sweet spot”. I have to admit that doing this in ObjectStore (the object-oriented DBMS that I co-designed), which we called “schema evolution”, was also not a piece of cake; it wasn’t designed for that. However, the XML database that we built on top of ObjectStore did provide that kind of flexibility.

  2. Mike Stonebraker’s DBMS taxonomy | DBMS 2 : DataBase Management System Services on July 8th, 2012 4:06 am

    […] 5: Database management system choices – beyond relational Categories: Data types, Database diversity, Michael Stonebraker, Mid-range, OLTP, RDF and […]

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.