When people talk about document-oriented NoSQL or some similar term, they usually mean something like:
Or, if they really mean “the essence of whatever it is that CouchDB and MongoDB have in common,” well, that’s pretty much the same thing as what I said in the first place.
Of the various questions that might arise, three of the more definitional ones are:
- Why JSON rather than XML?
- What’s with this fluidity between the terms “document” and “object”?
- Are you serious about the lack of joins?
Let me take a crack at each.
Like XML, JSON is a data-interchange format that has been repurposed as a data persistence model. JSON is evidently beating out XML in web applications, for reasons including:
- XML is more verbose than JSON and typically slower to parse. (Whether this matters or not is of course use-case-dependent.)
- Like SQL, XML requires what some web programmers regard as too much formalism and up-front specification.
- JSON is regarded as being more suited to straightforwardly fielded data, while XML is regarded as being more suited to “mixed content” — e.g., real text documents.
- In general, XML feels “enterprisey” to developers who don’t like that feel.
So, in essence:
- The reasons JSON beats XML for web application data interchange have some applicability to web application data storage as well.
- There’s ever more JSON around, at the expense of XML.
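To make the verbosity and "fielded data" points a bit more concrete, here is a minimal sketch that serializes one small record both ways using only Python's standard library. The record and the element name `user` are made up for illustration, and a three-field toy proves nothing about real workloads; it just shows the shape of the difference.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented purely for illustration.
record = {"name": "Ada", "city": "London", "visits": 3}

# JSON: the dict maps directly onto the format, and the integer stays an integer.
as_json = json.dumps(record)

# XML: one element per field; every value has to be rendered as text.
root = ET.Element("user")  # "user" is an assumed element name
for key, value in record.items():
    child = ET.SubElement(root, key)
    child.text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(len(as_json), as_json)
print(len(as_xml), as_xml)

# Round-tripping: JSON gives back the original types, XML gives back strings.
print(json.loads(as_json)["visits"] + 1)
print(ET.fromstring(as_xml).find("visits").text)
```

The XML form carries each field name twice (opening and closing tag) and needs a schema or convention to say that `visits` is a number. On the other hand, nothing in the JSON form helps with text that mixes markup into running prose, which is where XML is usually said to win.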
But truth be told, I don’t think XML and JSON actually go head to head against each other on the DBMS side very often at all. E.g., Dwight Merriman (the 10gen/MongoDB guy) told me he never, ever competes against MarkLogic, and I found that very credible.*
*Proof point: Dwight was clueless about MarkLogic specifics in a way he never would be if they were any kind of competitive consideration for him.
Note that the one area where (almost) everybody agrees XML wins is for what one might call “real” documents. By way of contrast, JSON is best suited for stringing data attributes and values together. So the “documents” that JSON models can indeed just as reasonably be called “objects.”
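A small sketch of why the two words blur together: once a JSON "document" is parsed, what the program holds is just a nested object of attributes and values. The document below is hypothetical, and the example uses only Python's standard `json` module rather than any particular DBMS's API.

```python
import json

# Hypothetical JSON "document", invented for illustration.
doc_text = '{"title": "Post", "tags": ["nosql", "json"], "author": {"name": "Ann"}}'

# Parsing yields ordinary in-memory objects: a dict holding a list and another dict.
doc = json.loads(doc_text)

print(type(doc).__name__)
print(doc["author"]["name"])
print(doc["tags"][1])

# Serializing and parsing again returns an equal object, so the "document"
# and the "object" are two views of the same attribute/value structure.
round_tripped = json.loads(json.dumps(doc))
print(round_tripped == doc)
```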
That said, JSON-based DBMS are not what one would normally call object-oriented DBMS; for an example of those, consider InterSystems Caché. And just to close the loop on confusion: Caché can also be used as an XML DBMS.
As I previously noted, one downside to today’s document-oriented DBMS is that you can’t do joins. Let me now add that I think joins will be added to document DBMS in the future. Plausibility arguments for this opinion include:
- MarkLogic — the XML database gold standard — sells to enterprises, and enterprises like joins.
- The alternative to joins in CouchDB and MongoDB is in essence MapReduce. Well, Hive proves that you can do joins on top of MapReduce if you want to. (So, for that matter, does Aster Data nCluster; Aster says its SQL parallelism is built on top of MapReduce.)
- InterSystems quite happily put SQL on top of an object-oriented DBMS, Caché. And Caché is so similar to an XML DBMS that it is in fact sometimes used as one.
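For readers wondering what "joins on top of MapReduce" can look like, here is a rough sketch of a generic reduce-side inner join, written as plain Python functions rather than against any real MapReduce framework. It is not a description of how Hive, Aster Data nCluster, CouchDB, or MongoDB actually do it; the `users` and `orders` collections, their field names, and the function names are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical document collections, invented for illustration.
users = [
    {"user_id": 1, "name": "Ann"},
    {"user_id": 2, "name": "Bob"},
]
orders = [
    {"order_id": 10, "user_id": 1, "total": 25},
    {"order_id": 11, "user_id": 1, "total": 7},
    {"order_id": 12, "user_id": 3, "total": 4},
]

def map_phase(users, orders):
    """Emit (join key, tagged record) pairs from both collections."""
    for user in users:
        yield user["user_id"], ("user", user)
    for order in orders:
        yield order["user_id"], ("order", order)

def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """For each key, pair every user record with every order record."""
    for key in sorted(groups):
        tagged = groups[key]
        matched_users = [rec for tag, rec in tagged if tag == "user"]
        matched_orders = [rec for tag, rec in tagged if tag == "order"]
        for user in matched_users:
            for order in matched_orders:
                yield {
                    "name": user["name"],
                    "order_id": order["order_id"],
                    "total": order["total"],
                }

joined = list(reduce_phase(shuffle(map_phase(users, orders))))
for row in joined:
    print(row)
```

Keys that appear on only one side (Bob has no orders, and order 12 has no matching user) drop out, which is inner-join behavior. The point is only that the join logic fits comfortably into the map, shuffle, and reduce steps; whether a given document DBMS chooses to expose it that way is a separate question.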
But that is indeed a future. For discussion of the current state of affairs, I refer you to my earlier post on the subject of joinlessness linked above.