Analysis of IBM’s native XML database management option, DB2 pureXML.
There’s a growing trend for DBMS to beef up their support for multiple data manipulation languages (DMLs) or APIs — and there’s a special boom in JSON support, MongoDB-compatible or otherwise. So I talked earlier tonight with IBM’s Bobbie Cochrane about how JSON is managed in DB2.
For starters, let’s note that there are at least four strategies IBM could have used.
- Store JSON in a BLOB (Binary Large OBject) or similar existing datatype. That’s what IBM actually chose.
- Store JSON in a custom datatype, using the datatype extensibility features DB2 has had since the 1990s. IBM is not doing this, and doesn’t see a need to at this time.
- Use DB2 pureXML, along with some kind of JSON/XML translator. DB2 managed JSON this way in the past, via UDFs (User-Defined Functions), but that implementation is superseded by the new BLOB-based approach, which offers better performance in ingest and query alike.
- Shred — to use a term from XML days — JSON into a bunch of relational columns. IBM experimented with this approach, but ultimately rejected it. In dismissing shredding, Bobbie also disdained any immediate support for schema-on-need.
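To make the contrast between the first and last strategies concrete, here is a minimal sketch in Python. It uses SQLite purely as a stand-in, not DB2, and every table and function name in it (`docs_blob`, `docs_shredded`, `store_as_blob`, `shred`, and so on) is hypothetical. It says nothing about how IBM actually implemented either approach; it only illustrates the general idea of keeping a document whole versus flattening it into columns.

```python
import json
import sqlite3


def store_as_blob(conn, doc):
    """BLOB strategy: keep the whole JSON document as one opaque binary value."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs_blob (id INTEGER PRIMARY KEY, body BLOB)"
    )
    cur = conn.execute(
        "INSERT INTO docs_blob (body) VALUES (?)",
        (json.dumps(doc).encode("utf-8"),),
    )
    return cur.lastrowid


def load_from_blob(conn, doc_id):
    """Read the binary value back and parse it into a Python object."""
    row = conn.execute(
        "SELECT body FROM docs_blob WHERE id = ?", (doc_id,)
    ).fetchone()
    return json.loads(row[0].decode("utf-8"))


def shred(doc, prefix=""):
    """Shredding strategy: flatten nested objects into column-name/value pairs."""
    columns = {}
    for key, value in doc.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            columns.update(shred(value, name))
        elif isinstance(value, list):
            # Arrays have no natural single column; keep them as JSON text here.
            columns[name] = json.dumps(value)
        else:
            columns[name] = value
    return columns


def store_shredded(conn, doc):
    """Create one relational column per flattened field, then insert a row."""
    columns = shred(doc)
    names = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS docs_shredded ({names})")
    conn.execute(
        f"INSERT INTO docs_shredded ({names}) VALUES ({placeholders})",
        list(columns.values()),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    doc = {"name": "Ada", "address": {"city": "London", "zip": "N1"}}
    doc_id = store_as_blob(conn, doc)
    print(load_from_blob(conn, doc_id))
    store_shredded(conn, doc)
    print(conn.execute("SELECT address_city FROM docs_shredded").fetchone())
```

Note the tradeoff the sketch exposes: the BLOB table accepts any document shape without a schema change, while the shredded table's columns are fixed by the first document inserted, so a later document with a new field would need an `ALTER TABLE`.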
IBM’s technology choices are of course influenced by its use case focus. It’s reasonable to divide MongoDB use cases into two large buckets:
- Hardcore internet and/or machine-generated data, for example from a website.
- Enterprise data aggregation, for example a “360-degree customer view.”
IBM’s DB2 JSON features are targeted at the latter bucket. Also, I suspect that IBM is generally looking for a way to please users who enjoy working on and with their MongoDB skills.
Categories: Data models and architecture, IBM and DB2, MongoDB and 10gen, NoSQL, pureXML, Structured documents
Conor O’Mahony, marketing manager for IBM’s DB2 pureXML, talks a lot about one of my favorite hobbyhorses — schema flexibility — as a reason to use an XML data model. In a number of industries he sees use cases based around ongoing change in the information being managed:
- Tax authorities change their rules and forms every year, but don’t want to do total rewrites of their electronic submission and processing software.
- The financial services industry keeps inventing new products, which don’t just have different terms and conditions, but may also have different kinds of terms and conditions.
- The same, to some extent, goes for the travel industry, which also keeps adding different kinds of offers and destinations.
- The energy industry keeps adding new kinds of highly complex equipment it has to manage.
Conor also thinks market evidence shows that XML’s schema flexibility is important for data interchange.
Categories: Data models and architecture, EAI, EII, ETL, ELT, ETLT, IBM and DB2, pureXML, Structured documents
Tracking the alphabet soup of vertical market XML standards is hard. So as a starting point, I’m splitting a list I got from IBM into a standalone post.
Categories: Application areas, EAI, EII, ETL, ELT, ETLT, IBM and DB2, pureXML, Structured documents
On August 29, I had a great call with IBM about DB2 pureXML (most of the IBM side of the talking was done by Conor O’Mahony and Qi Jin). I’m finally getting around to writing it up now. (The world of tabular data warehousing has kept me just a wee bit busy …)
As I write it, I see there are a considerable number of holes, but that’s the way it seems to go when researching XML storage. I’m also writing up a September call from which I finally figured out (I think) the essence of how MarkLogic Server works – but only after five months of trying. It turns out that MarkLogic works rather differently from DB2 pureXML. Not coincidentally, IBM and Mark Logic focus on rather different use cases for native XML storage.
What I understand so far about the basic DB2 pureXML architecture goes like this: