October 5, 2008

MarkLogic architecture deep dive

I’ve previously posted in great detail about how MarkLogic Server is an ACID-compliant, XML-oriented DBMS with integrated text search that indexes everything in real time and executes range queries fairly quickly. Even so, I didn’t have a good feel for how all those apparently contradictory characteristics fit into a single product. But I finally had a call with Mark Logic Director of Engineering Ron Avnur, and I think I now have a better grasp of the MarkLogic architecture and story.

Ron described MarkLogic Server as a DBMS for trees. That is, MarkLogic is designed for an XML data model that’s all about nodes and relationships, but not necessarily for XML data per se. The fundamental paradigm is thus searches for nodes and/or trees, for example:

Also important are aggregates and, although Mark Logic rarely mentions them unless prompted, perhaps range queries as well.

So let’s start with some basics about Mark Logic Server’s indexing and storage strategy:

In addition to those indexes, which comprise what Mark Logic calls the “Universal Index,” there are scalar indexes. These are columnar, where a column can cover either a single element name (i.e., attribute name) or a set of (presumably related) element names. Two copies of each column are kept, one sorted by TreeID and one by value. Mark Logic believes these suffice to give good performance on aggregations and range lookups.
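To make the two-copies idea concrete, here is a minimal Python sketch of a column kept in both sort orders. It is an illustration under my own assumptions, not Mark Logic's implementation: the `ScalarIndex` class, its method names, and the sample tree IDs and prices are all hypothetical.

```python
from bisect import bisect_left, bisect_right


class ScalarIndex:
    """Toy columnar index over (tree_id, value) pairs, kept in two sort orders.

    Hypothetical illustration only; not MarkLogic's actual data structure.
    """

    def __init__(self, pairs):
        # Copy 1: sorted by tree ID, for fetching the values of a known tree.
        self.by_tree = sorted(pairs)
        # Copy 2: sorted by value, for range lookups and range aggregates.
        self.by_value = sorted((value, tree_id) for tree_id, value in pairs)
        # Parallel key lists so bisect can binary-search each copy.
        self._tree_keys = [tree_id for tree_id, _ in self.by_tree]
        self._value_keys = [value for value, _ in self.by_value]

    def values_for_tree(self, tree_id):
        """All values stored for one tree, via the tree-ordered copy."""
        lo = bisect_left(self._tree_keys, tree_id)
        hi = bisect_right(self._tree_keys, tree_id)
        return [value for _, value in self.by_tree[lo:hi]]

    def trees_in_range(self, low, high):
        """Tree IDs whose value lies in [low, high], via the value-ordered copy."""
        lo = bisect_left(self._value_keys, low)
        hi = bisect_right(self._value_keys, high)
        return [tree_id for _, tree_id in self.by_value[lo:hi]]

    def sum_in_range(self, low, high):
        """Aggregate over a value range by scanning one contiguous slice."""
        lo = bisect_left(self._value_keys, low)
        hi = bisect_right(self._value_keys, high)
        return sum(self._value_keys[lo:hi])


# Hypothetical data: (tree ID, price) pairs for one "price" column.
prices = ScalarIndex([(101, 25.0), (102, 40.0), (103, 15.0), (104, 40.0)])
in_range = prices.trees_in_range(20.0, 40.0)
total = prices.sum_in_range(20.0, 40.0)
```

The point of the sketch is that a range lookup or a range aggregate becomes two binary searches plus a scan of one contiguous slice of the value-ordered copy, while the tree-ordered copy answers "what is this tree's value?" the same way.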

With all those columns and term lists, the question naturally arises: What about the MarkLogic update strategy? Highlights include:

Why not some kind of intelligent horizontal partitioning, whether range-based or otherwise? Well, we’ve finally gotten to a MarkLogic weakness. Join performance is not a MarkLogic long suit or Mark Logic priority. Indeed, Mark Logic insists that XML data is inherently denormalized, with joins (complex or otherwise) therefore rarely arising.

For many of today’s use cases, that’s probably true. For example, when I heard this I quickly started thinking “What if a book publisher changes name? That information is in a lot of individual book records.” But the fact is, people want author/publisher information that’s accurate as of the time of release, not updated for subsequent publishing company mergers and the like.
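The trade-off can be sketched in a few lines of Python. This is only an illustration with made-up records (the titles, the publisher names, and the `rename_publisher` helper are all hypothetical), showing what denormalization implies if you ever did want the rename applied everywhere.

```python
# Hypothetical denormalized book records: the publisher name is stored
# inside each record rather than looked up through a join.
books = [
    {"title": "Book A", "year": 1999, "publisher": "Acme Press"},
    {"title": "Book B", "year": 2003, "publisher": "Acme Press"},
    {"title": "Book C", "year": 2005, "publisher": "Other House"},
]


def rename_publisher(records, old_name, new_name):
    """Return updated copies of the records plus the number of records touched.

    With denormalized data, a rename has to visit every record that
    embeds the old name; there is no single publisher row to change.
    """
    updated = []
    touched = 0
    for record in records:
        if record["publisher"] == old_name:
            record = {**record, "publisher": new_name}
            touched += 1
        updated.append(record)
    return updated, touched


renamed, touched = rename_publisher(books, "Acme Press", "Acme Global")
```

Leaving `books` untouched corresponds to the "accurate as of the time of release" behavior described above. Applying the rename costs one update per affected record, which is cheap here but grows with the number of records that embed the old name.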

For other use cases, however, joins may be more important. For example:
