November 29, 2010

Document-oriented DBMS without joins

When I talked with MarkLogic’s Ken Chestnut about MarkLogic 4.2, I was surprised to learn that MarkLogic really, truly doesn’t do anything like a join. Unlike some other non-SQL DBMS, MarkLogic has no SQL interface, no ODBC or JDBC. Nothing, nada. (MarkLogic has a Java interface for Xquery, but not for anything like SQL.)

Since MarkLogic and other XML DBMS are used in applications for brokerage trades and the like, I used that area as my example for a challenge question: What happens when one brokerage firm buys another? (Similar challenges could be made about medical records or consumer profiling.) The answer was that you just have to update or augment each existing record with the new firm’s information. And by the way, if you choose to augment, then you have the new and old information side-by-side, both of which could conceivably come in handy.

Document-oriented NoSQL DBMS such as CouchDB and MongoDB face similar challenges, of course. I didn’t pursue the matter in depth in either case, but:

I’m not totally sure what I think about joinlessness, but one way of looking at it could be:

Putting all that together, I’m inclined to think that for many applications, it’s OK to denormalize, or to have such a simple schema that normalization is moot. But even so, I’d be a lot more comfortable if a DBMS offered at least some way of doing a join.

All this raises a related question: What are transactions like in document-oriented DBMS? I’ve never pushed the point with MarkLogic, but when they talk of their ACID compliance they sound as if they are using the phrase in the usual way. MongoDB only lets you do transactions in single documents. I’ve never asked the question about CouchDB, but I do note with interest CouchDB’s “crash-only” architecture, which boils down to:

Comments

8 Responses to “Document-oriented DBMS without joins”

  1. Michael Blakeley on November 29th, 2010 2:25 pm

    I’m not sure what normalization has to do with ACID. Why wouldn’t a documented-oriented database implement ACID, at least as an option?

    For MarkLogic Server, my understanding is that transactions are ACID and look much like RDBMS transactions. If an update touches 100 documents, they all update at the same timestamp. The XPath, full-text, and range index entries that point to those documents also update in sync. In short, documents act much like RDBMS rows would.

  2. Why do we need database joins? on November 29th, 2010 7:27 pm

    […] Monash has a reasonable post where he points out that we need joins because we normalize. Furthermore, he offers reasons for why […]

  3. Curt Monash on November 29th, 2010 7:59 pm

    One of the stereotypical cases for transactional semantics is when you have orders and line items and so on, and they’re all in separate tables. That makes the most sense in an architecture where you can also have joins.

    An even more stereotypical case is one in which you credit one account and debit another, and those updates had better either both go through or neither. The tie to joins there isn’t as strong, but stylistically it makes one think of the kind of short-row OLTP where joins are likely to also come into play.

    That’s the kind of thing I was thinking of.

  4. J. Andrew Rogers on November 29th, 2010 8:29 pm

    There is only one common use case for joins I can think of where denormalization is not an option: transitive closure type queries (e.g. non-trivial graph search and traversal), which are typically implemented using recursive joins.

    Denormalization in these cases usually has a prohibitively expensive space complexity such that it is only an option for very “shallow” queries.

  5. tzahi on December 1st, 2010 8:24 am

    I second – Join is critical in some search-type applications.
    This is a dirty little secret that search engine vendors hide when they boast their ability to index database content. You have to pre-join all the tables that belong to a searchable object (aka de-normalization or flattening) in order to index them.
    I have seen intelligence applications that tried to flatten huge objects (> 20,000 rows) each time a row was updated. The poor indexing machines were working most of the time on few large, highly volatile objects. Lots of hardware was thrown at the problem with little help.
    The only once that have Join with text-search are Attivio ( AFAIK). That is the reason we have chosen them for huge intelligence-type application.
    FAST search had Join ambitions, but I think they gave it up when they became part of SharePoint.

  6. Notes on document-oriented NoSQL | DBMS 2 : DataBase Management System Services on February 7th, 2011 4:32 pm

    […] I previously noted, one downside to today’s document-oriented DBMS is that you can’t do joins. Let me now add that I think joins will be added to document DBMS in the future. Plausibility […]

  7. One database to rule them all? | DBMS 2 : DataBase Management System Services on February 28th, 2013 10:05 pm

    […] have indexes that at least support relational-like SELECTs. JOINs can be more problematic, but MarkLogic finally has them. Tokutek even offers a 3rd-party indexing option for […]

  8. Readings in Database Systems | DBMS 2 : DataBase Management System Services on December 10th, 2015 7:27 am

    […] with MarkLogic as the focus, in 2010 I was skeptical about document stores not offering joins. MarkLogic has since […]

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.