November 11, 2006

Federation in the MySQL empire

Mårten Mickos, CEO of MySQL, recently gave a speech speculating about a big federated “database in the sky,” providing all sorts of Web 2.0 benefits. Apparently, the idea isn’t at all fleshed out yet. Even so, I have a nagging suspicion he’s pointing in somewhat the wrong direction.

That’s because I think federating relational databases is a generically bad idea. You can federate sets of services, and you can generate services from relational databases – and that’s where DBMS2 (DataBase Management System Services) got its name. This is a superior approach to direct database federation, for two main reasons. (By “direct federation,” I mean some sort of structure in which there’s a giant virtual database whose schema more or less directly incorporates much of the schema of each individual database.)

First, there are going to be various complexities about latency, authorization, and so on. It’s tough to accommodate them all without a bit of procedural code. An SOA provides a much more natural hook for that than you get through grand-cosmic-schema-federation.

Second, schema alignment is hard. In the SOA approach, you need only do exactly the alignment that is necessary for your application needs. In direct federation, however, there’s an almost irresistible temptation to generalize the problem beyond any manageable level of complexity.
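
To make the two points concrete, here is a minimal Python sketch, assuming two hypothetical databases (a CRM and a billing system) that name and format the customer's email differently. Every table, column, and function name below is invented for illustration and comes from no real product. The service aligns only the one field the application needs, and the authorization check shows the kind of procedural hook that a federated schema has no obvious place for.

```python
import sqlite3


def make_crm():
    # Hypothetical CRM database: customers keyed by email.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (email TEXT PRIMARY KEY, full_name TEXT)")
    db.execute("INSERT INTO customers VALUES ('ann@example.com', 'Ann Lee')")
    return db


def make_billing():
    # Hypothetical billing database: different column names, inconsistent case.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE invoices (cust_mail TEXT, amount_cents INTEGER, paid INTEGER)"
    )
    db.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [("ANN@EXAMPLE.COM", 5000, 0), ("ann@example.com", 2500, 1)],
    )
    return db


def customer_balance(crm, billing, email, caller_is_authorized):
    """Service call that joins the two systems on email and nothing else."""
    # Procedural hook: authorization is checked in code, not in a schema.
    if not caller_is_authorized:
        raise PermissionError("caller may not read balances")

    # The only alignment done: compare emails case-insensitively.
    key = email.lower()
    row = crm.execute(
        "SELECT full_name FROM customers WHERE lower(email) = ?", (key,)
    ).fetchone()
    if row is None:
        return None

    (owed,) = billing.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM invoices "
        "WHERE lower(cust_mail) = ? AND paid = 0",
        (key,),
    ).fetchone()
    return {"name": row[0], "owed_cents": owed}
```

Neither database had to adopt the other's schema; the rest of each schema stays private to its own system.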

Bottom line: It’s a great idea, and a real need, but please make sure the database blinders are off before you try to solve it.

Comments

8 Responses to “Federation in the MySQL empire”

  1. Greg on December 20th, 2006 5:25 pm

    I’m confused. I’ve been reading some of your past articles, since I’m researching methods to implement revision tracking on data in a relational database (not change management to the DB itself; that’s easy).

    Not that data-level revision control isn’t easy, but there are so many options that I’m at a stand-still looking for a ‘good’ (or the ‘best’?) one.

    Anyway, you seem to have a grudge against relational data modeling in several areas. In every case, however, most of your issues seem to relate to designers’ errors in the use of relational models, not the concept itself.

    You seem to advocate SOA because “you need only do the alignment necessary”, but you do not like direct federation because of the “temptation” to do everything at once. That’s not really fair, now is it?

  2. Curt Monash on December 20th, 2006 5:49 pm

    Let’s separate my views on relational data modeling, the relational data model, and commercial relational database management products. Those are three different things.

    1. Relational data modeling has uses, and should be done. However, it’s not nearly as valuable as proponents claim.

    A typical discussion with the application developers I most respect might go:

    “Is relational data modeling good for anything any more?”
    “Not much.”
    “Do you do it?”
    “Yes.”

    2. The relational data model is a clever theoretical construct. It is not as universally applicable as proponents claim, however. For example, modeling text search via predicates is, depending on how you look at it, either a bad idea or an utterly incomplete one.

    3. Most pros and cons of the relational model are more or less equally applicable to today’s commercial products. However, that raises a related question, namely which of today’s commercial products one should use. I believe that one should use a combination of them, and that the idea of consolidation into a single database is greatly overblown.

    You haven’t said much about your specific needs, but I’m guessing that you’re trying to solve too big a problem. The more rigorous and disciplined you want to be about data modeling, the more data you have to leave out. I’m sure you’re not trying to model every e-mail or bookmark file on everybody’s PC. Well, there may also be some rows-and-columns stuff you should leave out as well. Just build walled cities where data is all neat and tidy, with clean interfaces (trade roads) between them. If the policies and procedures are different in each of the cities, that’s not necessarily a bad thing.

    CAM

  3. Greg on December 20th, 2006 11:47 pm

    Honestly, I just wanted to make the point about the means by which you set about making your point. As much as I’d like to debate with you about relational data modeling, I agree with your concept (I think) on schema alignment with external components (be they services, applications, or anything else that can talk to a database); I just think the faults you describe lie with the user, not with the tool.

    I also agree that specific alignment approaches such as SOA are superior to ‘database federation’ for almost any business case I can conceive. So, as much as I’d like to argue (just ’cause I like to argue), I’m not going to do so.

    Regarding my needs: I just need to track various ‘revisions’ of a record over time. Each revision gets a revision number, and no revision is ever deleted for any reason. Changing a usage name (to pick a data element at random) creates a new revision. I’m just trying to think of the best way to model it for purposes of efficiency, performance, integrity, and historical data.

    Overall, pretty good site. I think I’ll check back, time to time.

  4. Curt Monash on December 21st, 2006 4:46 am

    I think the canonical approach would simply be to timestamp the records. I think Chris Date proposed a clean relational way of doing this, but I’ll confess to not knowing exactly what it is.

    I presume what you’re REALLY looking for is a view that has three revision-related columns — First_Valid_Date, Last_Valid_Date, and Is_Still_Valid — except that you want to be clean about the natural redundancy. In that case, I really would recommend looking up what Date has to say about that and seeing whether you judge it to be worth the trouble.

    Commercial RDBMS also have various kinds of snapshot/timestamp/rollback features. Indeed, there’s a boom in them because they’re relevant to compliance.

    Thanks for the kind words,

    CAM

  5. Greg on December 21st, 2006 12:21 pm

    Timestamping is definitely a part of my solution. I need valid from/to dates in order to keep a valid history. The database is not the difficult part of the job; that will be the application layer responsible for protecting the existing data and populating new versions.

    “Valid/invalid” will not be sufficient; I need more status options than that. I won’t bore you with the details.

    Sorry to ‘jack a comment thread with my issues. But thanks for the reference to Chris Date. I’ve been looking for some decent online reference material on his work; I’m not buying anything until I know whether or not there is value in it.

  6. Greg on December 21st, 2006 12:26 pm

    Also, I must restate that I do believe in the relational data model for most business cases. Not all, but most. I’ve seen other models used in the analysis phase of so many projects, and the justification is usually along the lines of ‘relational modeling is dated’ or ‘object modeling/xml/[insert buzzword] is the wave of the future’.

    I prefer to base my decisions on technical merits, not marketing trends. And I have seen relational modeling work in almost every type of data storage exercise. Yes, I read Codd’s work. But I do not subscribe to a ‘pure’ relational approach. I do not subscribe to a ‘pure’ anything; again, I base my decisions on technical considerations, which are based in reality.

    If I find a perfect approach that works in its pure form, I’ll be sure to let the world know. Right after I find a way to patent it.

  7. Karen Lopez on January 17th, 2008 6:45 pm

    I have to agree that trying to force data integration via *syntactical* alignment is futile, at best. It forces undue coupling of systems that in the real world need to change independently of each other, or at best at widely different rates.

    However, semantically aligning systems via services or any other method allows systems to change, to a great degree, at differing rates and locations.

    I see federated systems working where there is a high level of governance and a high level of common management control. But even then I’d rather see services used over forcing technical objects to be the same.

  8. Curt Monash on January 17th, 2008 8:52 pm

    In other words, Karen, you agree? 🙂

    CAM
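
The timestamped-revision approach discussed in comments 3 through 5 can be sketched roughly as follows. This is one plain way to do it in Python with SQLite, not Chris Date's formulation, which the thread does not spell out. The table `record_revisions`, the view `revision_history`, and the function `add_revision` are hypothetical names; only the three view columns (First_Valid_Date, Last_Valid_Date, Is_Still_Valid) come from the discussion above.

```python
import sqlite3


def open_store():
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        -- Revisions are only ever inserted or closed, never deleted.
        CREATE TABLE record_revisions (
            record_id   INTEGER NOT NULL,
            revision_no INTEGER NOT NULL,
            usage_name  TEXT    NOT NULL,
            valid_from  TEXT    NOT NULL,
            valid_to    TEXT,              -- NULL means still current
            PRIMARY KEY (record_id, revision_no)
        );

        -- Is_Still_Valid is derived in the view, so it is not stored twice.
        CREATE VIEW revision_history AS
        SELECT record_id,
               revision_no,
               usage_name,
               valid_from       AS First_Valid_Date,
               valid_to         AS Last_Valid_Date,
               valid_to IS NULL AS Is_Still_Valid
        FROM record_revisions;
        """
    )
    return db


def add_revision(db, record_id, usage_name, as_of):
    """Close the current revision (if any) and append a new one."""
    with db:  # commit on success, roll back on error
        (last,) = db.execute(
            "SELECT COALESCE(MAX(revision_no), 0) FROM record_revisions "
            "WHERE record_id = ?",
            (record_id,),
        ).fetchone()
        db.execute(
            "UPDATE record_revisions SET valid_to = ? "
            "WHERE record_id = ? AND valid_to IS NULL",
            (as_of, record_id),
        )
        db.execute(
            "INSERT INTO record_revisions VALUES (?, ?, ?, ?, NULL)",
            (record_id, last + 1, usage_name, as_of),
        )
    return last + 1
```

Calling `add_revision(db, 1, "widget", "2006-12-20")` and then `add_revision(db, 1, "gadget", "2006-12-21")` leaves two rows for record 1, with only the second still valid. A case that needs more than valid/invalid, as comment 5 describes, could presumably add a status column to the same table.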
