October 5, 2008

Overview of IBM DB2 pureXML

On August 29, I had a great call with IBM about DB2 pureXML; Conor O’Mahony and Qi Jin did most of the talking on the IBM side. I’m finally getting around to writing it up now. (The world of tabular data warehousing has kept me just a wee bit busy …)

As I write it up, I see a considerable number of holes, but that’s the way it seems to go when researching XML storage. I’m also writing up a September call from which I finally figured out (I think) the essence of how MarkLogic Server works, but only after five months of trying. It turns out that MarkLogic works rather differently from DB2 pureXML. Not coincidentally, IBM and Mark Logic focus on rather different use cases for native XML storage.

What I understand so far about the basic DB2 pureXML architecture goes like this:

A big part of IBM’s XML business strategy is to support various (typically vertical market) XML standards. IBM has implemented support for these standards and made it freely downloadable. What does “support” mean? It surely starts with a DTD (Document Type Definition), and apparently also includes mappings to generic web services interfaces. There turn out to be a lot of these standards, so I’m listing some in a separate post.

More generally, it seems that the sales and uses for IBM pureXML are concentrated in two main (overlapping) cases:

Experience teaches me that schema flexibility is a subject that can attract considerable flames, in the general vein of “Omigod! The relational model is perfect because it’s mathematically proven to ensure referential integrity!!” So I’ll split out the main discussion of that into yet another separate post, and keep going.

IBM actually breaks out the pureXML use cases into four main groups:

  1. Transactional. This comprises the transactional logging of information that just happens to be XML, such as in financial services.
  2. Forms-oriented. This comprises, for example, the tax authority use case.
  3. Service bus acceleration. That’s a fancy phrase to cover both the standards-based interchanges and the other EAI-related uses.
  4. Event-driven data warehousing. This one was kind of blurry to me. What I think it means is that if you have transactional data in XML, and you want to use it in near-real-time business intelligence, DB2 pureXML can help you with that.

#1, #3, and #4 seem to fit into my “When XML was going to be used anyway” category. Part of “Schema flexibility” matches #2; I’m not clear on where in IBM’s four buckets the rest of schema flexibility goes.
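
To make #4 a little more concrete, here is a minimal sketch, in plain Python rather than DB2, of what I take that use case to involve: transactional records that already arrive as XML get flattened into rows and rolled up for reporting. The trade documents, the element names, and the `shred` and `notional_by_symbol` helpers are all made up for illustration; in DB2 itself the flattening would presumably happen inside the database via SQL/XML or XQuery rather than in application code.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML transaction records; element and attribute names are invented.
TRADES = [
    '<trade id="t1"><symbol>IBM</symbol><qty>100</qty><price>98.50</price></trade>',
    '<trade id="t2"><symbol>ORCL</symbol><qty>250</qty><price>17.20</price>'
    '<note>late fill</note></trade>',
    '<trade id="t3"><symbol>IBM</symbol><qty>40</qty><price>99.00</price></trade>',
]


def shred(doc):
    """Turn one XML trade document into a flat (id, symbol, qty, price) row."""
    root = ET.fromstring(doc)
    return (
        root.get("id"),
        root.findtext("symbol"),
        int(root.findtext("qty")),
        float(root.findtext("price")),
    )


def notional_by_symbol(docs):
    """Sum qty * price per symbol, as a BI-style rollup over the shredded rows."""
    totals = {}
    for _trade_id, symbol, qty, price in map(shred, docs):
        totals[symbol] = totals.get(symbol, 0.0) + qty * price
    return totals


rows = [shred(d) for d in TRADES]
totals = notional_by_symbol(TRADES)
print(rows)
print(totals)
```

Note that the second document carries a `<note>` element the others lack; the rollup simply ignores it, which is the kind of tolerance for varying document shapes that makes XML attractive in these cases.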

Finally, I asked directly in what areas there were significant numbers of DB2 pureXML customers. IBM offered two examples. One was financial services in general — especially in North America, notwithstanding the importance of the UNIFI standard in Europe. The other was health care data interchange outside the United States — especially in China, where regional and national centers are being established to more closely oversee local hospitals.


7 Responses to “Overview of IBM DB2 pureXML”

  1. Vertical market XML standards | DBMS2 -- DataBase Management System Services on October 5th, 2008 8:43 am

    […] the most important or successful IBM pureXML-supported standards, in terms of downloads and other evidence of customer interest, […]

  2. Schema flexibility and XML data management | DBMS2 -- DataBase Management System Services on October 5th, 2008 8:53 am

    […] O’Mahony, marketing manager for IBM’s DB2 pureXML talks a lot about one of my favorite hobbyhorses — schema flexibility — as a reason to […]

  3. Conor O'Mahony on October 6th, 2008 2:00 pm

    Hi Curt,

    Answers to your questions above…

    IBM did not use DB2’s general datatype extensibility framework because we decided not to simply “extend” our existing infrastructure to support XML like many vendors did. Instead, we spent 5 years developing XML-specific infrastructure from the ground up.

    IBM essentially has traditional relational infrastructure and XML infrastructure in the physical layer. It then has a unified runtime execution layer above this which, for the most part, “hides” the physical storage considerations. This unified execution runtime layer provides the data manipulation and retrieval languages. (I say “for the most part” because there are certain physical storage settings that you want to be able to configure.) All the management tooling, including backup/restore, utilities, and high availability, is supported for XML data.

    DB2’s parser handles both SQL (and its extensions for XML) as well as native XQuery, and translates them into a common set of instructions. Of course, this is different to some relational vendors who translate XQuery into SQL.

  4. Curt Monash on October 6th, 2008 5:00 pm

    Conor, I’m still not getting it. (And I suspect this is a question for one of your highly techie colleagues.)

    When one uses an object-relational/extensible DBMS’s extensibility, one is indeed still banging data into rows somewhere. That’s why Oracle and DB2 put text into BLOBs/CLOBs, for example.

    But one can index however one likes, no? (Again, consider the text example.)

    I guess what I’m missing is this — where does DB2’s datatype extensibility fail? What unacceptable overhead does it impose in the case of XML?

    I’ve gotten the message that I used to overrate datatype extensibility like Oracle’s, DB2’s, and Informix/Illustra’s. What I haven’t figured out yet, however, is WHY I was wrong.



  5. Marco Gralike on April 30th, 2009 8:01 pm

    “I’ve gotten the message that I used to overrate datatype extensibility like Oracle’s, DB2’s, and Informix/Illustra’s. What I haven’t figured out yet, however, is WHY I was wrong.”

    LOL, maybe you just weren’t wrong…?

  6. Native XML engine short list | DBMS 2 : DataBase Management System Services on August 23rd, 2015 9:39 pm

    […] been implying that the short list for native XML database engine vendors should be MarkLogic, IBM, and maybe Microsoft, on the theory that Progress and Intersystems tried the market and pulled […]

  7. Abstract datatypes and extensible RDBMS | Software Memories on December 12th, 2015 6:55 am

    […] Notwithstanding what I wrote above — and to my surprise when I learned it — IBM did not rely on its general extensibility framework for XML support. […]
