The MongoDB story
Along with CouchDB/Couchbase, MongoDB was one of the top examples I had in mind when I wrote about document-oriented NoSQL. Developed by 10gen, MongoDB is an open-source, schema-free DBMS, so it is suitable for very quick development cycles. Accordingly, a lot of MongoDB users build small things quickly. But MongoDB has heftier uses as well, and naturally I’m focused more on those.
MongoDB’s data model is based on BSON, which seems to be JSON-on-steroids. In particular:
- You just bang things into single BSON objects managed by MongoDB; there is nothing like a foreign key to relate objects. However …
- … there are fields, datatypes, and so on within MongoDB BSON objects. Fields can be indexed.
- There’s a multi-value/nested-data-structure flavor to MongoDB; for example, a BSON object might store multiple addresses in an array.
- You can’t do joins in MongoDB. Instead, you are encouraged to put what might be related records in a relational database into a single MongoDB object. If that doesn’t suffice, then use client-side logic to do the equivalent of joins. If that doesn’t suffice either, you’re not looking at a good MongoDB use case.
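To make the embedding advice concrete, here is a sketch in plain JavaScript, not actual MongoDB API calls: a post and its comments, two tables in a relational design, collapse into one BSON-style document, with client-side logic standing in for a join when data does stay separate. All names and data are invented for illustration.

```javascript
// One document holds what a relational schema would split across tables:
// no foreign keys; the comments simply live inside the post.
const post = {
  _id: 1,
  title: "The MongoDB story",
  comments: [ // nested array: the "pre-joined" data
    { author: "Vlad", text: "Map/Reduce performance matters." },
    { author: "Dwight", text: "We are working on it." }
  ]
};

// When related data stays in a separate collection, the client
// does the equivalent of the join itself.
const authors = [
  { name: "Vlad", country: "RU" },
  { name: "Dwight", country: "US" }
];

function clientSideJoin(post, authors) {
  const byName = new Map(authors.map(a => [a.name, a]));
  return post.comments.map(c => ({
    ...c,
    country: byName.get(c.author)?.country // lookup-join equivalent
  }));
}

console.log(clientSideJoin(post, authors));
```

If the document can't hold everything and client-side joins like this get painful, that is the article's third case: probably not a good MongoDB fit.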
MongoDB has integrated MapReduce. Natural uses include:
- The usual kinds of transformations one might do via MapReduce.
- Aggregations one might otherwise do in SQL (e.g. GROUP BY kinds of things are an obvious MapReduce fit).
Improved aggregation/MapReduce performance is a roadmap item.
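For intuition on the GROUP BY-style use, here is a sketch in plain JavaScript that mirrors the shape of MongoDB's map/reduce: a map function emits a key/value pair per document and a reduce function folds the values for each key. The orders data and field names are invented; this is not MongoDB's implementation.

```javascript
const orders = [
  { customer: "a", amount: 10 },
  { customer: "b", amount: 5 },
  { customer: "a", amount: 7 }
];

// Equivalent of: SELECT customer, SUM(amount) ... GROUP BY customer
const map = doc => [doc.customer, doc.amount];
const reduce = (key, values) => values.reduce((sum, v) => sum + v, 0);

function mapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    const [key, value] = map(doc); // map phase: emit (key, value)
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  // reduce phase: fold each key's values into one result document
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: reduce(key, values)
  }));
}

console.log(mapReduce(orders, map, reduce));
// [ { _id: 'a', value: 17 }, { _id: 'b', value: 5 } ]
```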
However, Dwight said MongoDB has excellent performance in simple real-time reporting, for example updating a counter 10,000 times per second. When I asked him why, reasons included:
- A memory-mapped storage architecture.
- Deferred writes – a write might take a couple of seconds to actually persist.
- Optimism – you don’t have to wait for an acknowledgement when you write something to the database.
- “Upsert in place” – update in place without separately checking whether you’re doing an update or an insert.
- General lack of overhead.
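The “upsert in place” point can be sketched in plain JavaScript: one call that increments a matching record if it exists and inserts it if it does not, so the caller never branches on update-versus-insert. The toy in-memory collection and the counter example (echoing the counter use case above) are invented for illustration.

```javascript
// Toy in-memory "collection" keyed by _id, to illustrate upsert semantics.
const collection = new Map();

function upsert(query, update) {
  const existing = collection.get(query._id);
  if (existing) {
    // Match found: update in place, e.g. { $inc: { hits: 1 } }.
    for (const [field, by] of Object.entries(update.$inc ?? {})) {
      existing[field] = (existing[field] ?? 0) + by;
    }
  } else {
    // No match: insert a new document seeded from the query.
    collection.set(query._id, { ...query, ...(update.$inc ?? {}) });
  }
}

// A real-time counter: the first call inserts, later calls increment.
upsert({ _id: "page:/home" }, { $inc: { hits: 1 } });
upsert({ _id: "page:/home" }, { $inc: { hits: 1 } });
console.log(collection.get("page:/home").hits); // 2
```

The point of the idiom is that the hot path is a single operation per counter bump, with no read-before-write round trip by the application.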
Inspired in part by Schooner’s internal benchmarks, I’ve come to think that, apples-to-apples, even the simplest key-value store will have a less-than-3X single-node performance advantage over well-implemented MySQL (read, write, or blended). The 10gen guys don’t dispute that. However, they point out that a single MongoDB request can process the equivalent of many relational rows, opening the possibility of much greater performance gains than that. In particular, there seem to be some Drupal implementations enjoying huge MongoDB-based speed-ups.
On a heavy-duty server (8-12 cores, 16-64 gigabytes of RAM), MongoDB can apparently do 20,000-30,000 writes or 100,000 reads per second. Improved concurrency and mixed read/write performance is coming, although obviously 10gen would think that MongoDB does pretty well in those areas already. The largest known MongoDB system does about 1 million reads/second. I would imagine that those figures require the database to fit into RAM, given:
- MongoDB’s memory-mapped architecture.
- This post mortem of the MongoDB Foursquare outage.
I’ve gotten conflicting signals as to whether there are any multi-hundred-node MongoDB deployments. But there are “lots” of 20-40 node ones, as well as lots under 10 nodes. Note that 1 million reads/second naively sounds as if it could be achieved on 10 MongoDB nodes — but I’d guess that’s not really the configuration. 🙂
MongoDB’s scale-out story starts with transparent sharding, although I think transparent sharding was a new feature in MongoDB 1.6 last summer. So far as I understand, what MongoDB sacrifices under the CAP Theorem is the A — partitions happen, and when they do you might not be able to write to the shard you want to. A MongoDB shard can have multiple nodes, in a master-slave set-up, and there’s some flexibility as to how consistency among a shard’s nodes is handled.
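The transparent-sharding idea can be sketched as a routing layer: a shard key maps each document to one shard, and a partition makes that one shard unwritable while the others carry on. This is plain JavaScript, not the actual mongos router; the shard count, hash, and field names are invented.

```javascript
// Minimal sketch of shard-key routing: the document's key picks a shard.
const NUM_SHARDS = 4;

function hashCode(s) {
  let h = 0;
  for (const ch of s) h = (h * 31 + ch.codePointAt(0)) >>> 0;
  return h;
}

// The router's job: shard key -> shard number, transparent to the client.
function shardFor(shardKey) {
  return hashCode(shardKey) % NUM_SHARDS;
}

const downShards = new Set(); // shards partitioned away from the client

function write(doc) {
  const shard = shardFor(doc.userId);
  if (downShards.has(shard)) {
    // During a partition, writes to that one shard fail;
    // the remaining shards keep accepting writes.
    throw new Error(`shard ${shard} unreachable`);
  }
  return shard; // in real life: send the write to that shard's master
}

const s = write({ userId: "alice", visits: 1 }); // routed normally
downShards.add(s); // simulate a partition cutting off alice's shard
```

That is the "might not be able to write to the shard you want to" behavior in miniature: availability is lost for the affected key range, not for the cluster as a whole.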
MongoDB functionality futures include full-text search, and extensions to the general query language.
Comments
10 Responses to “The MongoDB story”
Dear 10gen,
1. Poor Map/Reduce performance is a show-stopper for many potential users who want to use MongoDB for more than simple counter-style workloads.
2. Do not spend resources on your own full-text search engine. Make use of 3rd-party search engines: Solr, Elasticsearch, etc.
Yes, you are right: when the read rate per second is super high, generally things are in RAM. If/when we have to do a disk seek, the same limiters apply to us as to everyone else. That said, there are some production MongoDB users who do many thousands of random I/Os per second with SSDs right now. Also, in theory, table scans are pretty fast…
@Vlad we are working on #1 right now, I hear you.
[…] detail, please see the slides linked above. I shall now do a separate post that is actually about MongoDB. Categories: Market share, MongoDB and 10gen Subscribe to our complete […]
I see you feature quite a number of database management systems. I really appreciate the business implications (advantages, disadvantages) you mention in your articles. Another open-source DBMS you might find of interest is CUBRID. I’m not gonna start marketing its benefits here, but it might be worth a look.
Observations on coming to MongoDB from my long background in other forms of data management.
I am only a couple of weeks into training with 10gen’s class.
It is a mind-blowingly different model. Most of my training has been in normalization: (relatively) normalized models, reduction of redundancy (memory, both disk and RAM, being expensive), and data consistency (removal of anomalies).
The syntax is – at least for this old fogey – quite difficult. Once I realized it was QBE-like (remembering the DB2 QBE from the mainframe days), it started to make more sense. Think Lisp meets QBE and imagine what the children would be like. I am wearing out the parentheses (of all kinds) on my keyboard. However, once you start to realize and get comfortable with data being functions, it does start to make sense. So an (obvious?!) field name like “name” and an (obvious?!) operator like $gt use quite similar functional expressions to express query constraints.
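The QBE-like flavor described above can be sketched with a toy matcher in plain JavaScript: the query is itself a document, with operators like $gt expressed as nested fields. The matcher below handles only equality and $gt, and is an illustration, not MongoDB's implementation.

```javascript
// Toy evaluator for Mongo-style query documents: the query is data,
// e.g. { age: { $gt: 30 }, name: "Ada" } - query-by-example style.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (cond !== null && typeof cond === "object") {
      // Operator document, e.g. { $gt: 30 }; only $gt handled here.
      return Object.entries(cond).every(([op, operand]) => {
        if (op === "$gt") return doc[field] > operand;
        throw new Error(`unsupported operator ${op}`);
      });
    }
    return doc[field] === cond; // bare value means equality, QBE style
  });
}

const people = [
  { name: "Ada", age: 36 },
  { name: "Alan", age: 29 }
];
const over30 = people.filter(p => matches(p, { age: { $gt: 30 } }));
console.log(over30.map(p => p.name)); // [ 'Ada' ]
```

Note how the constraint is data shaped like the documents it selects, which is exactly the "data being functions" sensation the comment describes.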
I still (and maybe it is because it is early) think in terms of joins and views – just that the embedded items in a document are pre-built or materialized joins. The update anomalies are interesting to me. Interesting because they bring into bas-relief those things that are composition-type associations and those that are relationship-type associations. In the relational model those are treated the same. In MongoDB they can be imagineered differently.
So, I will persevere with having my head bent by this. I am certainly glad of the learning opportunity.
I will post more if I succeed in having insights!
[…] Chris Bird is learning MongoDB. As is common for Chris, his comments are both amusing and […]
As with all languages, and make no mistake there is a language for accessing MongoDB, there are some aspects that are just plain unpleasant. Experts can ignore them, but novices will trip over them. There are some cases, using the Mongo Shell that are what I refer to in this blog post as “syntactic sucralose”. You eat it when you have to make something palatable, but it leaves a bitterness. Here’s the piece.
http://randomtechnologythoughts.blogspot.com/2012/11/syntactic-sucralose.html
[…] of this shift is one of nomenclature. For example, as relational database expert Chris Bird points out, the syntax in NoSQL Land differs greatly from SQL, and may require some mental gymnastics for new […]
Hi.
How can I add a function to the MongoDB source code (to the DBMS)?
Thank you.
Good luck.
Bye.