November 29, 2010

MarkLogic and its document DBMS

This post has been long in the writing for several reasons, the biggest being that I stopped working for almost a month due to family issues. Please forgive its particularly choppy writing style; having waited this long already, I now lack the patience to further clean it up.

MarkLogic:

I think it’s time to do a catch-up post about MarkLogic. 🙂

As a practical matter, most MarkLogic users fall into the overlapping areas of “content publishing” and “search.” However, in saying that I should note that from a search standpoint, MarkLogic is both less and more than a standard state-of-the-art text search engine. What I mean is:

However, MarkLogic does not do relational JOINs.

A couple of specific notes on MarkLogic’s search capabilities:

One exception to MarkLogic’s historical search/publishing focus is OEM OpenConnect, a then-client to whom I recommended MarkLogic, and who found MarkLogic to have great performance sorting through what amounts to graphs or trees of log events. Other exceptions may be found in the financial services market, where MarkLogic speaks of one customer that stores the complex information defining derivatives contracts as XML documents.

There are multiple aspects to the conjunction of MarkLogic and ETL/ELT/ELTL (Extract/Transform/Load/Transform):

MarkLogic scales out via a form of node heterogeneity, in that there are two types of nodes — evaluator and data. All MarkLogic evaluator nodes talk to all MarkLogic data nodes, and hence vice-versa. There is no third kind of “head” node, so I presume any evaluator node can act as a head for any particular query. The whole thing sounds fairly Exadata-like, if we ignore the fact that MarkLogic probably got there before Oracle Exadata shipped. Documents are distributed among data nodes via a hashing mechanism.

Note: The choice of hash key presumably doesn’t matter as much as it does in a relational DBMS, since MarkLogic has no concept of join, hash join, or having a join be accelerated by the fact that data is pre-hashed on the join key.
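To make the evaluator/data split a bit more concrete, here is a minimal Python sketch of the general pattern: documents are hashed to data nodes, and any evaluator can scatter a query to every data node and gather the results. Everything in it is an assumption for illustration. The class names, the use of the document URI as the hash key, the MD5-based hash, and the substring "search" are all hypothetical, and none of it is taken from MarkLogic's actual implementation.

```python
import hashlib


def assign_node(doc_uri: str, num_data_nodes: int) -> int:
    """Pick a data node for a document by hashing its URI (assumed hash key)."""
    digest = hashlib.md5(doc_uri.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_data_nodes


class DataNode:
    """Hypothetical data node: stores documents and answers local searches."""

    def __init__(self):
        self.docs = {}

    def store(self, uri, text):
        self.docs[uri] = text

    def search(self, term):
        # Stand-in for a real index lookup: plain substring match.
        return [uri for uri, text in self.docs.items() if term in text]


class Evaluator:
    """Hypothetical evaluator node: talks to every data node."""

    def __init__(self, data_nodes):
        self.data_nodes = data_nodes

    def insert(self, uri, text):
        # Route the document to exactly one data node via the hash.
        idx = assign_node(uri, len(self.data_nodes))
        self.data_nodes[idx].store(uri, text)

    def query(self, term):
        # Scatter the query to all data nodes, then gather and merge.
        hits = []
        for node in self.data_nodes:
            hits.extend(node.search(term))
        return sorted(hits)


# Two evaluators share the same three data nodes; neither is a special "head".
data_nodes = [DataNode() for _ in range(3)]
ev_a = Evaluator(data_nodes)
ev_b = Evaluator(data_nodes)

ev_a.insert("/contracts/1.xml", "<contract type='swap'/>")
ev_a.insert("/contracts/2.xml", "<contract type='option'/>")
ev_b.insert("/contracts/3.xml", "<contract type='swap'/>")

print(ev_a.query("swap"))
print(ev_b.query("swap"))
```

Because every query in this sketch fans out to all data nodes, where a given document lands affects load balance but not correctness. That is consistent with the point that the choice of hash key matters less when there are no joins to co-locate data for.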

One focus area for MarkLogic 4.2 was revamping the MarkLogic availability story. In particular:

Comments

7 Responses to “MarkLogic and its document DBMS”

  1. Document-oriented DBMS truly without joins | DBMS 2 : DataBase Management System Services on November 29th, 2010 4:55 am

    […] I talked with MarkLogic’s Ken Chestnut about MarkLogic 4.2, I was surprised to learn that MarkLogic really, truly doesn’t do anything like a join. […]

  2. Data that is derived, augmented, enhanced, adjusted, or cooked | DBMS 2 : DataBase Management System Services on November 30th, 2010 1:30 am

    […] No way are you going to redo the whole process each time you do a query. Not coincidentally, MarkLogic — a huge fraction of whose business to date is for text-oriented uses — thinks heavily […]

  3. MarkLogic and its document DBMS | IT Information Technology on December 1st, 2010 10:21 am

    […] […]

  4. Dave Kellogg on December 4th, 2010 1:34 pm

    Curt,

    Glad you’re back on the scene and thanks for the write-up. If the customer slide is the one I’m thinking of (as you know I didn’t do the briefing with you), then we asked it be removed because it also contains an axis-less chart of company revenues which could be used (by the anal equipped with a good ruler) to calculate growth rates and such.

    The removal request presumably had nothing to do with the second chart which was about customer growth and the number of customers.

    Best,
    Dave

  5. Layering of database technology & DBMS with multiple DMLs | DBMS 2 : DataBase Management System Services on September 24th, 2013 4:40 am

    […] in a predicate-pushdown “storage” layer — most famously Oracle Exadata, but also MarkLogic, InfiniDB, and […]

  6. Couchbase 4.0 and related subjects | DBMS 2 : DataBase Management System Services on October 15th, 2015 11:18 am

    […] Other examples include Oracle Exadata, MySQL, MongoDB (now that it has pluggable storage engines), MarkLogic, and of course the whole worlds of Hadoop and […]

  7. Neil Bradley on April 22nd, 2016 4:30 am

    MarkLogic CAN do joins. I am not sure about SQL joins, but it certainly does XQuery joins. I don’t know where the idea that it couldn’t do joins came from.

    Admittedly, XQuery joins are not as efficient as SQL joins, but then document databases in general discourage the normalisation approach that requires so many joins in typical SQL-based systems.
