Native XML
Analysis of data management technology optimized for XML data. Related subjects include:
- Text data management
- Object-oriented database management
- Mark Logic
- IBM’s DB2 pureXML
- (in Text Technologies) More extensive coverage of text search
IBM’s Oracle emulation strategy reconsidered
I’ve now had a chance to talk with IBM about its recently-announced Oracle emulation strategy for DB2. (This is for DB2 9.7, which I gather has been quasi-announced in April, will be re-announced in May, and will be re-re-announced as being in general availability in June.)
Key points include:
- This really is more like Oracle emulation than it is transparency, a term I carelessly used before.
- IBM’s Oracle emulation effort is focused on two technological goals:
- Making it easy for an Oracle application to be ported to DB2.
- Making it easy for an Oracle developer to develop for DB2.
- The initial target market for DB2’s Oracle emulation is ISVs (Independent Software Vendors) much more than it is enterprises. IBM suggested there were a couple hundred early adopters, and those are primarily in the ISV area.
Because of Oracle’s market share, many ISVs focus on Oracle as the underlying database management system for their applications, whether or not they actually resell it along with their own software. IBM proposed three reasons why such ISVs might want to support DB2:
| Categories: Data types, Emulation, transparency, portability, EnterpriseDB and Postgres Plus, GIS and geospatial, IBM and DB2, Market share, Native XML, Oracle, Pricing, Text | 6 Comments |
Schema flexibility and XML data management
Conor O’Mahony, marketing manager for IBM’s DB2 pureXML talks a lot about one of my favorite hobbyhorses — schema flexibility — as a reason to use an XML data model. In a number of industries he sees use cases based around ongoing change in the information being managed:
- Tax authorities change their rules and forms every year, but don’t want to do total rewrites of their electronic submission and processing software.
- The financial services industry keeps inventing new products, which don’t just have different terms and conditions, but may also have different kinds of terms and conditions.
- The same, to some extent, goes for the travel industry, which also keeps adding different and different kinds of destinations.
- The energy industry keeps adding new kinds of highly complex equipment it has to manage.
Conor also thinks market evidence shows that XML’s schema flexibility is important for data interchange.
| Categories: Data models and architecture, EAI, EII, ETL, ELT, ETLT, IBM and DB2, Native XML, pureXML | 3 Comments |
Vertical market XML standards
Tracking the alphabet soup of vertical market XML standards is hard. So as a starting point, I’m splitting a list I got from IBM into a standalone post.
Among the most important or successful IBM pureXML-supported standards, in terms of downloads and other evidence of customer interest, are:
| Categories: Application areas, EAI, EII, ETL, ELT, ETLT, IBM and DB2, Native XML, pureXML | 2 Comments |
Overview of IBM DB2 pureXML
On August 29, I had a great call with IBM about DB2 pureXML (most of the IBM side of the talking was done by Conor O’Mahony and Qi Jin). I’m finally getting around to writing it up now. (The world of tabular data warehousing has kept me just a wee bit busy …)
As I write it, I see there are a considerable number of holes, but that’s the way it seems to go when researching XML storage. I’m also writing up a September call from which I finally figured out (I think) the essence of how MarkLogic Server works – but only after five months of trying. It turns out that MarkLogic works rather differently from DB2 pureXML. Not coincidentally, IBM and Mark Logic focus on rather different use cases for native XML storage.
What I understand so far about the basic DB2 pureXML architecture goes like this:
| Categories: EAI, EII, ETL, ELT, ETLT, IBM and DB2, Native XML, pureXML | 5 Comments |
MarkLogic architecture deep dive
While I previously posted in great detail about how MarkLogic Server is an ACID-compliant XML-oriented DBMS with integrated text search that indexes everything in real time and executes range queries fairly quickly, I didn’t have a good feel for how all those apparently contradictory characteristics fit into a single product. But I finally had a call with Mark Logic Director of Engineering Ron Avnur, and think I have a better grasp of the MarkLogic architecture and story.
Ron described MarkLogic Server as a DBMS for trees.
| Categories: Mark Logic, Native XML, Text | 3 Comments |
Who is doing what in XML data management these days?
A comment thread to a post on a different subject has opened up a discussion of XML storage. Frankly, I haven’t kept up with my briefings on the subject, in part because XML support hasn’t proved to be very important yet to the big DBMS vendors, somewhat to my surprise. When last I looked, the situation wasn’t much different from what it was back in November, 2005. Unless I’ve missed something (and please tell me if I have!), here’s what’s going on: Read more
| Categories: IBM and DB2, Intersystems and Cache', Mark Logic, Microsoft and SQL*Server, Native XML, Oracle | 7 Comments |
The Mark Logic story in XML database management
Mark Logic* has an interesting, complex story. They sell a technology stack based on an XML DBMS with text search designed in from the get go. They usually want to be known as a “content” technology provider rather than a DBMS vendor, but not quite always.
*Note: Product name = MarkLogic, company name = Mark Logic.
I’ve agreed to do a white paper and webcast for Mark Logic (sponsored, of course). But before I start serious work on those, I want to blog based on what I know. As always, feedback is warmly encouraged.
Some of the big differences between MarkLogic and other DBMS are:
-
MarkLogic’s primary DML/DDL (Data Manipulation/Description Language) is XQuery. Indeed, Mark Logic is in many ways the chief standard-bearer for pure XQuery, as opposed to SQL/XQuery hybrids.
-
MarkLogic’s XML processing is much faster than many alternatives. A client told me last year that – in an application that had nothing to do with MarkLogic’s traditional strength of text search – MarkLogic’s performance beat IBM DB2/Viper’s by “an order of magnitude.” And I think they were using the phrase correctly (i.e., 10X or so).
-
MarkLogic indexes all kinds of entities and facts, automagically, without any schema-prebuilding. (Nor, I gather, do they depend on individual documents carrying proper DTDs.) So there actually isn’t a lot of DDL. (Mark Logic claims in one test MarkLogic had more or less 0 DDL, vs. 20,000 lines in DB2/Viper.) What MarkLogic indexes includes, as Mark Logic puts it:
- Every word
- Every piece of structure
- Every parent-child relationship
- Every value.
-
As opposed to most extended-relational DBMS, MarkLogic indexes all kinds of information in a single, tightly integrated index. Mark Logic claims this is part of the reason for MarkLogic’s good performance, and asserts that competitors’ lack of full integration often causes overhead and/or gets in the way of optimal query plans. (For example, Mark Logic claims that Microsoft SQL Server’s optimizer is so FUBARed that it always does the text part of a search first.) Interestingly, Intersystems’ object-oriented Cache’ does pretty much the same thing.
-
MarkLogic is proud of its text search extensions to XQuery. I’ve neglected to ask how that relates to the XQuery standards process. (For example, text search wasn’t integrated into the SQL standard until SQL3.)
Other architectural highlights include:
| Categories: Data types, IBM and DB2, Mark Logic, Native XML | 2 Comments |
XML versus sparse columns in variable schemas
Simon Sabin makes an interesting point: If you can have 30,000 columns in a table without sparsity management blowing up, you can handle entities with lots of different kinds of attributes. (And in SQL Server you can now do just that.) The example he uses is products — different products can have different sets of possible colors, different kinds of sizes, and so on. An example I’ve used in the past is marketing information — different prospects can reveal different kinds of information, which may have been gathered via non-comparable marketing programs.
I’ve suggested this kind of variability as a reason to actually go XML — you’re constantly adding not just new information, but new kinds of information, so your fixed schema is never up to date. But I haven’t detected many actual application designers who agree with me …
| Categories: MySQL, Native XML, Theory and architecture | 3 Comments |
Microsoft SQL Server Data Services
As usual, Microsoft forgot to brief me, but Mary Jo Foley reports on Microsoft SQL Server Data Services. A look at the official site clarifies that this database-in-a-cloud offering uses “Microsoft SQL Server as a data storage node.” However, there seems to be a software layer on top of SQL Server providing scale-out and appropriate management.
In addition to the more-than-SQL-Server layer, there seems to be a less-than-SQL-Server aspect as well. In a particular, Microsoft SQL Server Data Services boasts “Support for simple types: string, numeric, datetime, boolean.” XML is the “primary wire format,” and hints dropped about the schema philosophy sound XMLish too.
Interestingly, Foley reports that Microsoft plans to offer an on-premises version of Microsoft SQL Server Data Services as well.
| Categories: Cloud computing, Microsoft and SQL*Server, Native XML | Leave a Comment |
CouchDB — lazy database design taken to excess?
I’ve run into a research/alpha/whatever project called CouchDB a couple of times now. It’s yet another “Who needs relational databases? Who needs schemas?” kind of idea. Rather, CouchDB is for taking random documents and banging them into databases, then calculating views on the fly as needed. It’s REST-friendly. Lucene and a web server are built in.
Damien Katz seems to be the driving force behind CouchDB, and his discussion of document-oriented development seems to be a good starting point. Read more
| Categories: CouchDB, Data models and architecture, Database diversity, Native XML, Theory and architecture | 4 Comments |
Who is actually using native XML?
Question of the day #2
Who is actually using native XML?
Mark Logic is having a fine time using its native XML engine for custom publishing. One outfit I know of is using a native XML for something like web analytics, but is driving me crazy by never coming through on permission to divulge details. There’s a bit of native XML use out there supporting the insurance industry’s ACORD standard.
And after that I quickly run out of examples of native XML use. Read more
| Categories: Data types, IBM and DB2, Mark Logic, Microsoft and SQL*Server, Native XML, Oracle | 3 Comments |
The 4 main approaches to datatype extensibility
Based on a variety of conversations – including some of the flames about my recent confession that mid-range DBMS aren’t suitable for everything — it seems as if a quick primer may be in order on the subject of datatype support. So here goes.
“Database management” usually deals with numeric or alphabetical data – i.e., the kind of stuff that goes nicely into tables. It commonly has a natural one-dimensional sort order, which is very useful for sort/merge joins, b-tree indexes, and the like. This kind of tabular data is what relational database management systems were invented for.
But ever more, there are important datatypes beyond character strings, numbers and dates. Leaving out generic BLOBs and CLOBs (Binary/Character Large OBjects), the big four surely are:
- Text. Text search is a huge business on the web, and a separate big business in enterprises. And text doesn’t fit well into the relational paradigm at all.
- Geospatial. Information about locations on the earth’s surface is essentially two-dimensional. Some geospatial apps use three dimensions.
- Object. There are two main reasons for using object datatypes. First, the data can have complex internal structures. Second, it can comprise a variety of simpler types. Object structures are well-suited for engineering and medical applications.
- XML. A great deal of XML is, at its heart, either relational/tabular data or text documents. Still, there are a variety of applications for which the most natural datatype truly is XML.
Numerous other datatypes are important as well, with the top runners-up probably being images, sound, video, time series (even though they’re numeric, they benefit from special handling).
Four major ways have evolved to manage data of non-tabular datatype, either on their own or within an essentially relational data management environment.
| Categories: Data types, GIS and geospatial, Native XML, Object, Text | 10 Comments |
Status of Software AG’s Tamino
Since I was researching Software AG anyway, I took the opportunity to ask about Software AG’s native XML DBMS Tamino, which certainly has some fans. Jim Fowler, Software AG’s Director of Market Development, Enterprise Transaction Systems, was kind enough to write up the following for me:
As you know, when Tamino was released in the late 1990s it was one of the first – if not the first – commercially available native XML database. We now have several hundred Tamino customers worldwide, and Software AG is fully committed to supporting our customers.
At the same time, we recognize that XML has matured and evolved in many different directions during the past decade;
| Categories: Data types, Native XML, Software AG | Leave a Comment |
Native XML performance, and Philip Howard on recent IBM DBMS announcements
Philip Howard went to at least one conference this month I didn’t, namely IBM’s, and wrote up some highlights. As usual, he seems to have been favorably impressed.
In one note, he says that IBM is claiming a 2-5X XML performance improvement. This is a good step, since one of my clients who evaluated such engines dismissed IBM early on for being an order of magnitude too slow. That client ultimately chose Marklogic, with Cache’ having been the only other choice to make the short list.
Speaking of IBM, I flew back from the Business Objects conference next to a guy who supports IMS. He told me that IBM has bragged of an actual new customer win for IMS within the past couple of years (a large bank in China). Read more
| Categories: IBM and DB2, Intersystems and Cache', Mark Logic, Native XML | Leave a Comment |
The Netezza Developer Network
Netezza has officially announced the Netezza Developer Network. Associated with that is a set of technical capabilities, which basically boil down to programming user-defined functions or other capabilities straight onto the Netezza nodes (aka SPUs). And this is specifically onto the FPGAs, not the PowerPC processors. In C. Technically, I think what this boils down to is: Read more
| Categories: Data types, Data warehouse appliances, Data warehousing, GIS and geospatial, Native XML, Netezza, PostgreSQL, SAS Institute | 8 Comments |
Applications for not-so-low-latency CEP
The highest-profile applications for complex event/stream processing are probably the ones that require super-low latency, especially in financial trading. However, as I already noted in writing about StreamBase and Truviso, there are plenty of other CEP apps with less extreme latency requirements.
Commonly, these are data reduction apps – i.e., there’s a gushing stream of inputs, and the CEP engine filters and “enhances” it, so that only a small, modified subset is sent forward. In other cases, disk-based systems could do the job perfectly well from a performance standpoint, but the pattern matching and filtering requirements are just a better fit for the CEP paradigm.
Read more
| Categories: Aleri and Coral8, Complex event processing (CEP), IBM and DB2, Memory-centric data management, Native XML, StreamBase | 1 Comment |
The Coral8 story
Complex event/stream processing vendor Coral8 raised its hand and offered a briefing – non-technical, alas, but at least it was a start. Here are some of the highlights: Read more
| Categories: Aleri and Coral8, Application areas, Complex event processing (CEP), Investment research and trading, Memory-centric data management, Native XML, StreamBase | Leave a Comment |
Nonstandard data management software — beyond the Bowling Alley?
I just finished a short Monash Letter on markets for nonstandard data management software. Of course, the whole thing is available only to Monash Advantage members, but here are some salient points:
- When new kinds of data are managed, new kinds of data management are used. More precisely, the old ways are tried first — but once they fail new technologies are tried out.
- Up through the “Bowling Alley,” markets for nonstandard data management technology commonly follow the classic Geoffrey Moore pattern. However, they rarely experience a “Tornado” or mass adoption.
- I think this is apt to change. My three strongest candidates are native XML, RDF, and memory-centric event/stream processing used for data reduction (as opposed to sub-millisecond latency, which I do think will continue to be a niche requirement).
| Categories: Complex event processing (CEP), Memory-centric data management, Native XML, RDF and graphs | Leave a Comment |
Native XML engine short list
I’ve been implying that the short list for native XML database engine vendors should be Mark Logic, IBM, and maybe Microsoft, on the theory that Progress and Intersystems tried the market and pulled back. Well, add Intersystems to the list, and not necessarily in last place. They’ve long had a very fast nonrelational engine in Cache’. Perhaps building Ensemble on it has induced them to sharpen up the XML capabilities again.
Anyhow, while I’m not at liberty to explain more of my reasoning (i.e., to disclose my evidence) — Cache’ should be taken seriously as an XML DBMS alternative … even if I never can seem to get a proper DBMS briefing from them (which is far from entirely being their fault).
| Categories: IBM and DB2, Intersystems and Cache', Mark Logic, Microsoft and SQL*Server, Native XML, Progress, Apama, and DataDirect | 1 Comment |
Lessons from EnterpriseDB
I had a nice conversation yesterday with Jim Mlodgenski of EnterpriseDB, covering both generalities and EnterpriseDB-specific stuff. Many of the generalities were predictable, and none were terribly shocking. Even so, I am dressed as Captain Obvious, and shall repeat a few of the ones I found interesting below:
| Categories: EnterpriseDB and Postgres Plus, Mid-range, Native XML, OLTP, Open source, Theory and architecture | 2 Comments |
Mark Logic and the MarkLogic Server
I’ve been interested in the Mark Logic story from the first time CEO Dave Kellogg told me about it. Basically, Mark Logic sells an XML-based DBMS optimized for text search, called MarkLogic Server. For obvious reasons, they don’t want to position it as a DBMS; hence they call it an “XML content server” instead. I posted about their marketing and application focus over on Text Technologies. In this post, I’ll dive a little deeper into the core technology.
| Categories: Mark Logic, Native XML | 3 Comments |
Philip Howard likes Viper
Philip Howard likes DB2’s Viper release. Truth be told, Philip Howard seems to like most products, whether they deserve it or not. But in this case, I think his analysis is spot-on.
| Categories: IBM and DB2, Native XML, OLTP | Leave a Comment |
DBMS2 at IBM
I had a chat a couple of weeks ago with Bob Picciano, who runs servers (i.e., DBMS) for IBM. I came away feeling that, while they don’t use that name, they’re well down the DBMS2 path. By no means is this SAP’s level of commitment; after all, they have to cater to traditional technology strategies as well. But they definitely seem to be getting there.
Why do I say that? Well, in no particular order:
- They have a huge commitment to a data integration business, with an increasing XML focus.
- Their favorite buzzword these days is “information-intensive,” which seems to amount to semi-composite apps that may talk in part to unstructured/semi-structured data.
- They’re serious about their XML data server.
- Unprompted – well, OK, he’s clearly read my stuff, but other than that it was unprompted – Bob referred to one of the key benefits (real and perceived) of XML storage as being “schema flexibility.”
- By accident or design, IBM has a multi-server, horses-for-courses DBMS strategy: DB2 in two important flavors, XML server, Multivalue/Pick (that’s growing, by the way), and so on.
The big piece of a DBMS2 strategy that IBM seems to be lacking is a data-oriented services repository. IBM has had disasters in the past with over-grand repository plans, so they’re treading cautiously this time around. There also might be an organizational issue; DBMS and integration technology sit in separate divisions, and I doubt it’s yet appreciated throughout IBM how central data is to an SOA strategy.
But that not-so-minor detail aside, IBM definitely seems to be developing a DBMS2-like technology vision.
| Categories: EAI, EII, ETL, ELT, ETLT, IBM and DB2, Native XML, OLTP | Leave a Comment |
Marklogic’s experiences — from the warhorse’s mouth!
Another subject I meant to blog about is what all I’ve learned from Mark Logic about customer uses for XML.
Well, I have a great workaround for that one. Mark Logic CEO Dave Kellogg has revved up what I think is the most interesting vendor-exec blog I’ve seen. So if you’re interested in search/publishing-style uses for native XML, I strongly encourage you to go browse his blog. (And he writes about a lot of other interesting stuff as well.)
| Categories: Mark Logic, Native XML | 1 Comment |
IBM’s definition of native XML
IBM’s recent press release on Viper says:
Viper is expected to be the only database product able to seamlessly manage both conventional relational data and pure XML data without requiring the XML data to be reformatted or placed into a large object within the database.
That, so far as I know, is true, at least among major products.
I’m willing to apply the “native” label to Microsoft’s implementation anyway, because conceptually there’s little or no necessary performance difference between their approach and IBM’s. (Dang. I thought I posted more details on that months ago. I need to remedy the lack soon.)
As for Oracle — well, right now Oracle has a bit of a competitive problem …
| Categories: IBM and DB2, Microsoft and SQL*Server, Native XML, Oracle | 1 Comment |
