August 26, 2006

Mark Logic and the MarkLogic Server

I’ve been interested in the Mark Logic story from the first time CEO Dave Kellogg told me about it. Basically, Mark Logic sells an XML-based DBMS optimized for text search, called MarkLogic Server. For obvious reasons, they don’t want to position it as a DBMS; hence they call it an “XML content server” instead. I posted about their marketing and application focus over on Text Technologies. In this post, I’ll dive a little deeper into the core technology.

In essence, the MarkLogic technology is search-engine-plus. So let’s review how a conventional search engine works. Its core index amounts logically to a sparse matrix, whose “rows” refer to documents, and whose columns refer to words (or phrases, or n-grams). The matrix cell entries aren’t single values; rather, they’re lists of positions for the word’s possibly multiple appearances in the document. Otherwise, however, the whole thing looks a lot like an inverted-list or bitmapped index from the conventional DBMS world.
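To make the sparse-matrix picture concrete, here is a minimal Python sketch of a positional inverted index. It illustrates the general search-engine technique only, not MarkLogic's or any vendor's implementation. The function names, the sample documents, and the whitespace tokenizer are all assumptions made for the example.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}.

    Logically this is the sparse matrix: terms are columns, documents
    are rows, and only non-empty cells (position lists) are stored.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive tokenizer (an assumption): lowercase, split on whitespace.
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def phrase_hits(index, first, second):
    """Return ids of documents where `second` directly follows `first`."""
    hits = []
    for doc_id, positions in index.get(first, {}).items():
        following = set(index.get(second, {}).get(doc_id, []))
        if any(p + 1 in following for p in positions):
            hits.append(doc_id)
    return sorted(hits)

# Hypothetical sample documents.
docs = {
    "d1": "xml content server for text search",
    "d2": "text search engine with relevancy features",
    "d3": "search the text of xml documents",
}
index = build_positional_index(docs)
```

Storing positions rather than a single flag per cell is what lets `phrase_hits(index, "text", "search")` tell documents containing the phrase apart from documents that merely contain both words.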

The “plus” is where XML comes in. In principle but not in practice, I am told, the rows are not just documents, but any document subset (chapter, heading, paragraph, whatever) marked up in XML. I didn’t ask exactly what that meant, but I strongly guess that the index is on documents, and then when a document is hit its segments are drilled down into via a different but well-integrated data access method. I think there’s also something to do with XML in the columns, but I didn’t really grasp that part at all.
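My guess about drill-down can be sketched in Python too. This is purely an illustration of the guess, not a description of how MarkLogic actually works. The element names (`book`, `chapter`, `para`), the sample document, and the function name are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document; assume a document-level index already matched it.
doc_xml = """<book>
  <chapter><title>Indexing</title>
    <para>The index maps words to documents.</para>
    <para>Positions support phrase search.</para>
  </chapter>
  <chapter><title>Retrieval</title>
    <para>Fragments can be returned instead of whole documents.</para>
  </chapter>
</book>"""

def matching_fragments(xml_text, term, tag="para"):
    """Return the text of each `tag` element that contains `term` as a word."""
    root = ET.fromstring(xml_text)
    term = term.lower()
    fragments = []
    for element in root.iter(tag):
        text = "".join(element.itertext()).strip()
        # Split into word tokens so trailing punctuation does not block a match.
        if term in re.findall(r"\w+", text.lower()):
            fragments.append(text)
    return fragments
```

Here `matching_fragments(doc_xml, "phrase")` returns only the one paragraph that mentions the word, rather than the whole book.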

So how different are MarkLogic’s capabilities from those of conventional RDBMS or search engines? Well, MarkLogic does a lot less than the RDBMS do, obviously, so a better question would be: What capabilities does MarkLogic offer that conventional competitors don’t?

Full-spectrum DBMS, like those from Oracle, Microsoft, and IBM, offer some sort of integrated SQL, XML, and text-search DML. Thus, it’s hard for Mark Logic to beat the big guys in search functionality, although the company plausibly claims that somehow things work out more slickly with their technology than with the generalists’. And that’s even before considering performance, although since Mark Logic doesn’t talk about performance much, I imagine it’s not a particularly strong suit of theirs.

Where MarkLogic is more differentiated, at least vis-à-vis Oracle, is in retrieval. Oracle can index and retrieve entire XML documents in CLOBs, no problemo. But suppose you only want to retrieve a single paragraph of a document. Uh, that can be quite problematic …

As for MarkLogic vs. search engines – well, search engines generally aren’t too smart about XML. On the other hand, they have lots of performance and fine-grained relevancy-enhancing features that MarkLogic may not match. At least, FAST does.

And by the way, I wish they’d decide once and for all whether or not there’s a space between “Mark” and “Logic.”


3 Responses to “Mark Logic and the MarkLogic Server”

  1. Text Technologies»Blog Archive » Mark Logic and the custom publishing business on August 26th, 2006 5:52 am

    […] For more on Mark Logic, and more insight about the industry in general, see CEO Dave Kellogg’s blog. For a technical discussion of MarkLogic, see my write-up. • • • […]

  2. Jeff Graber on June 12th, 2009 2:54 pm

    Thanks for your review. Have you updated your thoughts to 2009?

  3. Curt Monash on June 12th, 2009 4:42 pm

    Just to 2008. Follow the links and see. 🙂
