The Mark Logic story in XML database management
Mark Logic* has an interesting, complex story. They sell a technology stack based on an XML DBMS with text search designed in from the get-go. They usually, but not quite always, want to be known as a “content” technology provider rather than a DBMS vendor.
*Note: Product name = MarkLogic, company name = Mark Logic.
I’ve agreed to do a white paper and webcast for Mark Logic (sponsored, of course). But before I start serious work on those, I want to blog based on what I know. As always, feedback is warmly encouraged.
Some of the big differences between MarkLogic and other DBMS are:
- MarkLogic’s primary DML/DDL (Data Manipulation/Definition Language) is XQuery. Indeed, Mark Logic is in many ways the chief standard-bearer for pure XQuery, as opposed to SQL/XQuery hybrids.
- MarkLogic’s XML processing is much faster than many alternatives. A client told me last year that, in an application that had nothing to do with MarkLogic’s traditional strength of text search, MarkLogic’s performance beat IBM DB2/Viper’s by “an order of magnitude.” And I think they were using the phrase correctly (i.e., 10X or so).
- MarkLogic indexes all kinds of entities and facts, automagically, without any schema-prebuilding. (Nor, I gather, do they depend on individual documents carrying proper DTDs.) So there actually isn’t a lot of DDL. (Mark Logic claims that in one test MarkLogic needed more or less zero lines of DDL, vs. 20,000 in DB2/Viper.) What MarkLogic indexes includes, as Mark Logic puts it:
  - Every word
  - Every piece of structure
  - Every parent-child relationship
  - Every value
- Unlike most extended-relational DBMS, MarkLogic indexes all kinds of information in a single, tightly integrated index. Mark Logic claims this is part of the reason for MarkLogic’s good performance, and asserts that competitors’ lack of full integration often causes overhead and/or gets in the way of optimal query plans. (For example, Mark Logic claims that Microsoft SQL Server’s optimizer is so FUBARed that it always does the text part of a search first.) Interestingly, InterSystems’ object-oriented Caché does pretty much the same thing.
- Mark Logic is proud of its text search extensions to XQuery. I’ve neglected to ask how they relate to the XQuery standards process. (For example, text search wasn’t integrated into the SQL standard until SQL3.)
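To make the "single, tightly integrated index" idea concrete, here is a minimal Python sketch of one inverted index that holds all four kinds of entries listed above (words, structure, parent-child relationships, values). It is an illustration only, not MarkLogic's actual index design, which isn't described here; the term format and the names `index_document` and `search` are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_document(index, doc_id, xml_text):
    """Add one XML document to a single shared inverted index.

    Hypothetical term shapes (an assumption for this sketch):
      ("word", w), ("element", tag),
      ("parent-child", parent_tag, child_tag), ("value", tag, text)
    Only text directly inside an element is indexed; tail text is ignored.
    """
    root = ET.fromstring(xml_text)

    def visit(elem):
        index[("element", elem.tag)].add(doc_id)
        text = (elem.text or "").strip()
        if text:
            index[("value", elem.tag, text)].add(doc_id)
            for word in text.lower().split():
                index[("word", word)].add(doc_id)
        for child in elem:
            index[("parent-child", elem.tag, child.tag)].add(doc_id)
            visit(child)

    visit(root)


def search(index, *terms):
    """Return ids of documents matching every term, by intersecting postings."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


index = defaultdict(set)
index_document(
    index, "a",
    "<article><title>XML databases</title><author>Smith</author></article>",
)
index_document(index, "b", "<memo><title>Relational databases</title></memo>")

# A text condition and a structural condition resolved against the same index.
hits = search(index, ("word", "databases"), ("parent-child", "article", "title"))
print(hits)
```

Because the word and structure terms live in the same index, neither condition has to be evaluated "first" by a separate engine; the query is just an intersection of posting sets.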
Other architectural highlights include:
- MarkLogic uses timestamps and appends for updates, rather than updates-in-place, much like Netezza or Illustra. Cleanup is done in the background. As long as your volume of changes (as opposed to inserts or reads) is sufficiently low, this can be more efficient than traditional approaches. Timestamping also makes it easy to write certain application functionality in publishing (“go live” times for content are a current use) and compliance (a possible future one).
- MarkLogic is ACID-compliant. Thus, you can read data as soon as it’s inserted, without a separate re-indexing step. Other native XML systems may not have that property (e.g., Mark Logic asserts that DB2/Viper doesn’t).
- Mark Logic claims MarkLogic has relatively efficient (optional) range indexes. (This was in response to a question; details are secret.) Inverted-list DBMS like ADABAS and Model 204 have been doing decently efficient range queries for 30 years, so this claim is both credible and not terribly important.
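The timestamp-and-append approach to updates can be sketched in a few lines of Python. This is a generic illustration of the technique under simplifying assumptions (a logical clock, an in-memory list, a manually triggered cleanup), not MarkLogic's storage engine; `AppendOnlyStore`, `Version`, and their methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    key: str
    value: str
    created: int                   # logical timestamp when this version was appended
    deleted: Optional[int] = None  # logical timestamp when it was superseded


class AppendOnlyStore:
    """Updates append a new version; old versions are only marked as superseded."""

    def __init__(self):
        self.clock = 0
        self.versions = []

    def write(self, key, value):
        self.clock += 1
        for v in self.versions:
            if v.key == key and v.deleted is None:
                v.deleted = self.clock  # stamp the old version, keep its value
        self.versions.append(Version(key, value, self.clock))
        return self.clock

    def read(self, key, as_of=None):
        """Return the value visible at timestamp as_of (default: now)."""
        ts = self.clock if as_of is None else as_of
        for v in self.versions:
            if v.key == key and v.created <= ts and (v.deleted is None or v.deleted > ts):
                return v.value
        return None

    def cleanup(self, oldest_needed):
        """Background-style cleanup: drop versions no reader at or after
        oldest_needed could still see."""
        self.versions = [
            v for v in self.versions
            if v.deleted is None or v.deleted > oldest_needed
        ]


store = AppendOnlyStore()
t1 = store.write("headline", "Draft")
t2 = store.write("headline", "Final")

current = store.read("headline")            # latest version
earlier = store.read("headline", as_of=t1)  # what a reader saw at t1

store.cleanup(oldest_needed=t2)             # the superseded draft is reclaimed
```

Reading "as of" a timestamp is what makes features like scheduled "go live" times cheap to build, and the cost of the approach shows up in `cleanup`: the more versions get superseded, the more work the background pass has to do.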
Related links:
- A companion post over on Text Technologies takes a text search view of MarkLogic.
- One of the leading sites on text analytics and general enterprise software marketing, Dave Kellogg’s Mark Logic CEO Blog.