I caught up with my clients at MongoDB to discuss the recent MongoDB 2.6, along with some new statements of direction. The biggest takeaway is that the MongoDB product, along with the associated MMS (MongoDB Management Service), is growing up. Aspects include:
- An actual automation and management user interface, as opposed to the current management style, which is almost entirely via scripts (except for the monitoring UI).
  - That’s scheduled for public beta in May, and general availability later this year.
  - It will include some kind of integrated provisioning with VMware, OpenStack, et al.
  - One goal is to let you apply database changes, software upgrades, etc. without taking the cluster down.
- A reasonable backup strategy.
  - A snapshot copy is made of the database.
  - A copy of the log is streamed somewhere.
  - Periodically (the default seems to be every 6 hours), the log is applied to create a new current snapshot.
  - For point-in-time recovery, you take the last snapshot prior to the point, and roll forward to the desired point.
- A reasonable locking strategy!
  - Document-level locking is all-but-promised for MongoDB 2.8.
  - That means what it sounds like. (I mention this because sometimes an XML database winds up being one big document, which leads to confusing conversations about what’s going on.)
- Security. My eyes glaze over at the details, but several major buzzwords have been checked off.
- A general code rewrite to allow for (more) rapid addition of future features.
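To make the backup scheme in the list above concrete, here is a minimal Python sketch of snapshot-plus-log point-in-time recovery. It is a toy model, not MongoDB's or MMS's implementation: the `BackupStore` class, its method names, and the integer timestamps are all assumptions made for illustration, and it restores from the last snapshot at or before the requested point.

```python
from bisect import bisect_right


class BackupStore:
    """Toy model (hypothetical): periodic snapshots plus a streamed write log."""

    def __init__(self, initial_state, start_time=0):
        # Snapshots are (timestamp, state copy) pairs, kept in time order.
        self.snapshots = [(start_time, dict(initial_state))]
        # Log entries are (timestamp, key, value), appended in time order.
        self.log = []

    def stream(self, ts, key, value):
        """Record one write in the streamed log."""
        self.log.append((ts, key, value))

    def _replay(self, state, after_ts, up_to_ts):
        """Apply log entries with after_ts < ts <= up_to_ts to `state`."""
        for ts, key, value in self.log:
            if after_ts < ts <= up_to_ts:
                state[key] = value
        return state

    def roll_snapshot(self, now):
        """Periodic job: apply the log to the latest snapshot to create a new one."""
        last_ts, last_state = self.snapshots[-1]
        self.snapshots.append((now, self._replay(dict(last_state), last_ts, now)))

    def restore(self, point):
        """Take the last snapshot at or before `point`, then roll forward to it."""
        times = [ts for ts, _ in self.snapshots]
        i = bisect_right(times, point) - 1
        if i < 0:
            raise ValueError("no snapshot at or before the requested point")
        snap_ts, snap_state = self.snapshots[i]
        return self._replay(dict(snap_state), snap_ts, point)


store = BackupStore({"a": 1})
store.stream(1, "a", 2)
store.stream(3, "b", 5)
store.roll_snapshot(6)      # stands in for the periodic (e.g. 6-hourly) job
store.stream(7, "a", 9)
state_at_2 = store.restore(2)
state_at_8 = store.restore(8)
```

The point of keeping periodic snapshots is that a restore only has to replay the log entries between the chosen snapshot and the target time, rather than the whole history.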
Of course, when a DBMS vendor rewrites its code, that’s a multi-year process. (I think of it at Oracle as spanning 6 years and 2 main-number releases.) With that caveat, the MongoDB rewrite story is something like:
- Updating has been reworked. Most of the benefits are coming later.
- Query optimization and execution have been reworked. Most of the benefits are coming later, except that …
  - … you can now directly filter on multiple indexes in one query; previously you could only simulate doing that by pre-building a compound index.
  - One of those future benefits is more index types, for example R-trees or inverted lists.
- Concurrency improvements are down the road.
- So are rewrites of the storage layer, including the introduction of compression.
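As a rough illustration of what filtering on multiple indexes in one query means, the sketch below builds two single-field indexes and intersects their hits to answer a two-field query. This is plain Python over an in-memory dict, not MongoDB's query planner; the function names and the sample documents are assumptions for illustration only.

```python
from collections import defaultdict


def build_index(docs, field):
    """Map each value of `field` to the set of document ids that have it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if field in doc:
            index[doc[field]].add(doc_id)
    return index


def find_by_intersection(docs, indexes, criteria):
    """Answer an equality query on several fields by intersecting per-field index hits."""
    id_sets = [indexes[field].get(value, set()) for field, value in criteria.items()]
    matching = set.intersection(*id_sets) if id_sets else set(docs)
    return [docs[i] for i in sorted(matching)]


# Hypothetical sample data.
docs = {
    1: {"city": "Boston", "status": "active"},
    2: {"city": "Boston", "status": "closed"},
    3: {"city": "Austin", "status": "active"},
}
# Two independent single-field indexes; no compound (city, status) index is built.
indexes = {field: build_index(docs, field) for field in ("city", "status")}
result = find_by_intersection(docs, indexes, {"city": "Boston", "status": "active"})
```

The alternative mentioned above, a pre-built compound index, would answer the same query with one lookup, but has to be created in advance for each field combination you expect to query.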
Also, you can now straightforwardly transform data in a MongoDB database and write it into new datasets, something that evidently wasn’t easy to do before.
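I wasn't given the mechanism in detail, but one feature that fits this description is the `$out` stage that MongoDB 2.6 added to the aggregation pipeline, which writes a pipeline's results into a collection. The following is a sketch under that assumption. The collection name, field names, and helper functions are hypothetical, and `run_rollup` expects any object with an `aggregate(pipeline)` method, such as a PyMongo collection.

```python
def build_rollup_pipeline(target_collection):
    """Return a pipeline that totals shipped orders per customer and
    writes the result to `target_collection` via a final $out stage."""
    return [
        {"$match": {"status": "shipped"}},                                    # filter
        {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},   # transform
        {"$out": target_collection},                                          # write out
    ]


def run_rollup(source_collection, target_collection="customer_totals"):
    """Run the pipeline on an object exposing aggregate(pipeline)."""
    return source_collection.aggregate(build_rollup_pipeline(target_collection))


pipeline = build_rollup_pipeline("customer_totals")
```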
One thing that MongoDB is not doing is offering any ODBC/JDBC or other SQL interfaces. Rather, there’s some other API (I don’t know the details) whereby business intelligence tools or other systems can extract views, and a few BI vendors evidently are doing just that. In particular, MicroStrategy and QlikView were named, as well as a couple of open source usual-suspects.
As of 2.6, MongoDB seems to have a basic integrated text search capability, although it does not rise to the level of search functionality that was in Oracle 7.3.2. In particular:
- 15 Western languages are supported with stopwords, tokenization, etc.
- Search predicates can be mixed into MongoDB queries.
- The search language isn’t very rich; for example, it lacks WHERE NEAR semantics.
- You can’t tweak the lexicon yourself.
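To show what mixing a search predicate into an ordinary query can look like, here is a small sketch that builds a filter document using the `$text` query operator (with `$search` and `$language`), which is my understanding of the 2.6 syntax. The helper name and the `city` field are hypothetical, and running such a query against a real server would also require a text index on the collection.

```python
def text_query(search_terms, language=None, **field_filters):
    """Build a find() filter combining a $text predicate with ordinary field predicates."""
    text_clause = {"$search": search_terms}
    if language is not None:
        text_clause["$language"] = language
    query = {"$text": text_clause}
    query.update(field_filters)   # ordinary equality predicates sit alongside $text
    return query


# Hypothetical example: documents mentioning "coffee" or "shop", restricted to one city.
query = text_query("coffee shop", language="english", city="Boston")
```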
Some business and pricing notes:
- Two big differences between the paid and free versions of MongoDB (the product line) are:
  - Management tools.
    - Well, actually, you can get the management tools for free, but only on a SaaS basis from MongoDB (the company).
    - If you want them on premises or in your part of the cloud, you need to pay.
  - If you want MongoDB (the company) to maintain your backups for you, you need to pay.
- Customer counts include:
  - At least 1,000 or so subscribers (counting by organization).
  - Over 500 (additional?) customers for remote backup.
  - 30 of the Fortune 100.
And finally, MongoDB did something many companies should, which is aggregate user success stories for which they may not be allowed to publish full details. Tidbits include:
- Over 100 organizations run clusters with more than 100 nodes. Some clusters exceed 1,000 nodes.
- Many clusters deliver hundreds of thousands of operations per second (combined read and write).
- MongoDB clusters routinely store hundreds of terabytes, and some store multiple petabytes of data. Over 150 clusters exceed 1 billion documents in size. Many manage more than 100 billion documents.