October 5, 2008

Schema flexibility and XML data management

Conor O’Mahony, marketing manager for IBM’s DB2 pureXML, talks a lot about one of my favorite hobbyhorses — schema flexibility — as a reason to use an XML data model. In a number of industries he sees use cases based around ongoing change in the information being managed:

Conor also thinks market evidence shows that XML’s schema flexibility is important for data interchange. For example, hospitals (especially in the US) have disparate medical records and billing systems, which can make information interchange a chore.

The second suggestion is probably the less controversial of the two. After all, everybody knows that data is very commonly exchanged in XML formats. So if it gets persisted in XML format somewhere along the way, even relational purists shouldn’t much mind, as long as it eventually gets into what they regard as a more properly structured database. (Besides — if the data is going on long, challenging, multi-stage journeys, then nobody should much blame it if it indeed wants to stop along the way somewhere and rest. :) )

In the first group of examples, there’s usually also a kind of cooperation between native XML and other kinds of database managers. Before those users had access to XML, they were getting by just fine using other database technology. So XML can be used in conjunction with other systems, not as a complete replacement. Even so, it’s reasonable to consider scenarios in which XML is the primary data model of record, and relational/tabular copies of the information are secondary.

For example, an income tax authority wants to store your tax form in its entirety, so that it can check both your truthfulness and your arithmetic. This is most naturally done in XML, although for many years it’s been done in relational or pre-relational technologies. The authority also wants to extract a limited amount of information from each taxpayer’s form for all sorts of aggregation and administrative purposes; that’s best done in a relational database. But the part that belongs in XML is the most fundamental.
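To make the "check your arithmetic" part concrete, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the element names (`taxForm`, `income`, `deductions`, `declaredTaxableIncome`), the sample figures, and the rule that taxable income equals income minus deductions. No real tax form is laid out this way, and no real authority's validation is this simple.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical tax form; the structure and numbers are invented for illustration.
FORM_XML = """
<taxForm taxpayerId="T-1001" year="2007">
  <income>
    <item kind="wages">52000.00</item>
    <item kind="interest">310.50</item>
  </income>
  <deductions>
    <item kind="mortgageInterest">8200.00</item>
  </deductions>
  <declaredTaxableIncome>44110.50</declaredTaxableIncome>
</taxForm>
"""


def check_arithmetic(form_xml):
    """Return True if the declared total matches the line items.

    Assumes (hypothetically) that taxable income = income - deductions.
    The form may carry any number of line items of any kind.
    """
    root = ET.fromstring(form_xml)
    income = sum(Decimal(e.text) for e in root.findall("./income/item"))
    deductions = sum(Decimal(e.text) for e in root.findall("./deductions/item"))
    declared = Decimal(root.findtext("declaredTaxableIncome"))
    return income - deductions == declared
```

The check walks whatever line items happen to be present, so a form with an extra income category needs no schema change. That is the kind of flexibility at issue here.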

As another example, the core information of a derivatives transaction is:

Here the majority of the basic record fits best in XML. The minority that fits best in a relational system is small enough that a good XML DBMS can probably handle it as well. Neither the superior OLTP performance nor the data integrity safeguards of a relational DBMS are needed for the purchase/sale information. They are needed for the general account management – but again, that’s a relatively secondary or (no pun intended!) derivative part of the overall database.

So what we’re coming up with here is a strategy along the lines of:

  1. Use XML for your system of record.
  2. Spawn transactions in your relational/tabular data stores right away.
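The two steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: a plain dictionary stands in for a native XML DBMS, an in-memory SQLite database stands in for the relational store, and the table and element names (`trade_summary`, `account`, `notional`) are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in for a native XML DBMS: document id -> complete XML text.
xml_store = {}


def open_relational_store():
    """Create an in-memory relational store holding only summary fields."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trade_summary ("
        "doc_id TEXT PRIMARY KEY, account TEXT NOT NULL, notional REAL NOT NULL)"
    )
    return conn


def record_trade(conn, doc_id, trade_xml):
    """Store the full document, then spawn its relational row right away."""
    # Step 1: the XML document is the system of record and is kept whole.
    xml_store[doc_id] = trade_xml

    # Step 2: extract the few tabular fields and insert them transactionally.
    root = ET.fromstring(trade_xml)
    account = root.findtext("account")
    notional = float(root.findtext("notional"))
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO trade_summary (doc_id, account, notional) VALUES (?, ?, ?)",
            (doc_id, account, notional),
        )
```

Note that the two writes in this sketch are not atomic with each other. A real deployment would need some way to reconcile the XML store and the relational store if the second step fails.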

And by the way, while I haven’t dwelled on this – those relational/tabular data stores could be data warehouses instead of or in addition to transactional systems.

Obviously, there are two major classes of objections to this strategy (when it is contrasted with a traditional relational approach):

Well, we’ll see. So far the customer uptake for the native XML approach is small but non-zero. And thus the issue is far from being decided.

Comments

3 Responses to “Schema flexibility and XML data management”

  1. Overview of IBM DB2 pureXML | DBMS2 -- DataBase Management System Services on October 5th, 2008 8:54 am

    […] noted above, I am putting up separate posts on standards-based data interchange and schema flexibility. Share: These icons link to social bookmarking sites where readers can share and discover new […]

  2. Conor O'Mahony on October 5th, 2008 8:48 pm

    Hi Curt,

    Since we talked, I compiled a set of thoughts on the topic at Flexible Schemas: When to Persist Data in XML Instead of Relational. There is a class of situations where managing data using traditional SQL types is cumbersome, and where taking advantage of flexible XML schemas for all or part of the data makes life easier.

  3. Curt Monash on October 5th, 2008 9:38 pm

    Hi Conor,

    Good link!

    Which of those examples are hybrid-relational? You say explicitly that the energy one is. But IIRC from our talk, the tax and/or telecom ones are too.

    Best,

    CAM
