December 14, 2005

Reasons to use native XML

From a DevX article on Microsoft’s SQL Server 2005

Depending on your situation, XML can also be the best choice for storing even highly structured data. Here are a few practical reasons to consider storing data in a field of type XML:

* Repeated shredding or publishing—On-demand transformations carry a performance penalty. If you have to shred or publish the same document over and over again, consider storing it natively as XML. You can always expose it to relational consumers with an XML view.
* Rapidly changing data structures—When modeled correctly, XML lives up to its name: It’s extensible. Developers can add new pieces of data—even new hierarchies—to a schema without compromising existing software. Extensibility is an extra advantage when prototyping, or when working with rapidly changing problem domains such as bioinformatics.
* Atomic data—Sometimes, you’ll have XML data that’s never consumed except as a whole. Think of this as logical atomicity—if you never access the parts individually, you might as well store it in one big chunk.
* Debugging—Especially for new releases, it can be a good idea to tuck away a copy of your XML imports. The data may be redundant, but keeping the original makes tracking down problems a whole lot easier.

Nothing there to disagree with too heavily, although I can think of some other reasons that might rank higher yet.
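
To make the "store it natively, expose it relationally" idea concrete, here is a minimal SQL Server 2005 sketch. The table, column, and element names (OrderImports, OrderDoc, order, item) are hypothetical, not taken from the DevX article:

    -- Hypothetical table: keep each imported document as-is in a native xml column
    CREATE TABLE OrderImports (
        OrderImportID int IDENTITY PRIMARY KEY,
        ReceivedAt    datetime NOT NULL DEFAULT GETDATE(),
        OrderDoc      xml NOT NULL  -- SQL Server 2005 xml data type
    );

    -- Store the document whole; no shredding on the way in
    INSERT INTO OrderImports (OrderDoc)
    VALUES ('<order id="1"><item sku="A12" qty="3"/><item sku="C4" qty="1"/></order>');

    -- Expose it to relational consumers on demand via the xml type's query methods
    SELECT OrderDoc.value('(/order/@id)[1]', 'int') AS OrderId,
           i.item.value('@sku', 'varchar(20)')      AS Sku,
           i.item.value('@qty', 'int')              AS Qty
    FROM   OrderImports
    CROSS APPLY OrderDoc.nodes('/order/item') AS i(item);

And because the document is stored whole, republishing it is simply a SELECT of OrderDoc, with no composition or shredding step.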

Comments

4 Responses to “Reasons to use native XML”

  1. Eric on December 15th, 2005 8:36 am

    Good topic – I’d like to hear your additional reasons, but here are my responses to the list you presented:

    * Repeated shredding or publishing—On-demand transformations carry a performance penalty. If you have to shred or publish the same document over and over again, consider storing it natively as XML.

    This is odd to me. If you’re repeatedly shredding it, doesn’t that imply that you’re repeatedly extracting data from XML into another form, and therefore that you’d want to just keep it in that other form? For publishing, yes… but then again, this sounds like something that fits into a generic caching strategy, whether the output is HTML or XML or something else.

    * Rapidly changing data structures—When modeled correctly, XML lives up to its name: It’s extensible. Developers can add new pieces of data—even new hierarchies—to a schema without compromising existing software.

    Agreed to some extent, but how is this different from relations, or from adding SQL tables? This talks about schema modification, not just adding new data to an XML document.
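
    To make the comparison concrete (reusing the hypothetical OrderImports table from the sketch above; the giftWrap element is equally made up), a new piece of data can simply ride along inside newer documents with no DDL at all, whereas the purely relational route needs a schema change before the first such row arrives:

        -- A new element just rides along in newer documents; no ALTER TABLE needed
        INSERT INTO OrderImports (OrderDoc)
        VALUES ('<order id="2"><item sku="B7" qty="1"/><giftWrap color="red"/></order>');

        -- Existing queries keep working; new code can pick up the element where present
        SELECT OrderDoc.value('(/order/giftWrap/@color)[1]', 'varchar(10)') AS GiftWrapColor
        FROM   OrderImports;

        -- The purely relational equivalent would need DDL up front, e.g.:
        -- ALTER TABLE Orders ADD GiftWrapColor varchar(10) NULL;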

    * Atomic data—Sometimes, you’ll have XML data that’s never consumed except as a whole. Think of this as logical atomicity—if you never access the parts individually, you might as well store it in one big chunk.

    Agreed – this is just a data type in its own right, although with XML, the only operators you can really use on it (despite its “atomicity”) are generic XML ones (which violate its atomicity). Other type-definition languages have much stronger support for associating operations with types.

    * Debugging—Especially for new releases, it can be a good idea to tuck away a copy of your XML imports. The data may be redundant, but keeping the original makes tracking down problems a whole lot easier.

    Agreed, although this is a good principle regardless of the form of input (XML, HTML forms, text files, URIs, etc. etc.)

    – Eric

  2. Curt Monash on December 15th, 2005 2:15 pm

    Eric,

    1. Please be careful about giving a handwave and saying “Oh, general caching should take care of that.” Caching (like cost-based optimization, for that matter) is generally very simple and stupid in commercial DBMS products.

    Some of my articles and posts on “memory-centric data management” address that point a little more, and I have an extensive white paper in the works on the memory-centric subject.

    2. As for the repeated shredding point — if the database is taking in a lot of information that it almost never uses, I can see where storing it natively could make a lot of sense. Otherwise, your criticism seems spot on.

    3. The schema variability point is hard to address in a quick note like this one. That’s because it relies on an imprecise, empirical claim along the lines of “there is or will be a significantly large set of applications in which the cost of keeping schemas updated in the conventional manner is unacceptably large.” I imagine there’s no way you’ll ever accept that claim without a persuasive set of examples (one or two examples wouldn’t suffice). We should agree to disagree for now.

  3. Eric on December 20th, 2005 9:57 am

    Please be careful about giving a handwave and saying “Oh, general caching should take care of that.” Caching (like cost-based optimization, for that matter) is generally very simple and stupid in commercial DBMS products.

    I agree, but it doesn’t have to be that way. And I’m not saying caching is simple. But the caching needs for XML aren’t (as far as I’ve ever seen) different from those for other requests, like HTML pages.

    … if the database is taking in a lot of information that it almost never uses, I can see where storing it natively could make a lot of sense.

    Yes, in that case the XML value is just that – a value, something atomic to be referred to in toto. Certainly, as time goes on, if “blah” comes to acquire information that’s critical to queries, it can be extracted and queried directly, without the eccentricities of repeated XPath (and the attendant parsing).
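
    In SQL Server 2005, one way to do that extraction without abandoning the stored document is to promote the value: wrap the XPath in a scalar UDF, expose it as a computed column, and index that column. A minimal sketch, once more against the hypothetical OrderImports table used above:

        -- Wrap the XPath in a schema-bound scalar function
        CREATE FUNCTION dbo.GetOrderId (@doc xml)
        RETURNS int
        WITH SCHEMABINDING
        AS
        BEGIN
            RETURN @doc.value('(/order/@id)[1]', 'int');
        END;
        GO

        -- Surface it as a computed column and index it, so routine queries can use
        -- the index instead of re-parsing the XML
        ALTER TABLE OrderImports ADD OrderId AS dbo.GetOrderId(OrderDoc);

        CREATE INDEX IX_OrderImports_OrderId ON OrderImports (OrderId);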

  4. Curt Monash on December 20th, 2005 1:06 pm

    Consider the appropriate implementations for short, tabular records to be at one end of a spectrum.

    Consider the appropriate implementations for BLOBs/CLOBs with some indexing looking inside them to be at the other end of this spectrum.

    Then native XML could be said to be around the midpoint of the spectrum.

    The question is whether there is a substantial body of apps for which neither endpoint of the spectrum is good enough.

    Early users of the native XML products provide persuasive evidence that there are some such apps at this time. And given how relatively immature native XML still is compared with older technologies, I think it is likely that the range of apps for which native XML implementations are a good idea will grow.

    Please note, however, that in most cases you can close your eyes, ignore the implementation, and access the data in SQL. So the discussions about physical and logical implementations can be at least partially separated. I know that you and I have a sharp disagreement on the logical implementations, one that’s not going to get resolved in my favor unless you some day become convinced of the usefulness of variable schemas and/or the importance of sophisticated document processing.

    But you might want to reconsider your views on the physical side.
