Along with CouchDB/Couchbase, MongoDB was one of the top examples I had in mind when I wrote about document-oriented NoSQL. Created by 10gen, MongoDB is an open-source, schemaless DBMS, which makes it well suited to very quick development cycles. Accordingly, there are a lot of MongoDB users who build small things quickly. But MongoDB has heftier uses as well, and naturally I’m focused more on those.
MongoDB’s data model is based on BSON (Binary JSON), which amounts to JSON-on-steroids. In particular:
- You just bang things into single BSON objects managed by MongoDB; there is nothing like a foreign key to relate objects. However …
- … there are fields, datatypes, and so on within MongoDB BSON objects. Fields can be indexed (only _id is indexed automatically).
- There’s a multi-value/nested-data-structure flavor to MongoDB; for example, a BSON object might store multiple addresses in an array.
- You can’t do joins in MongoDB. Instead, you are encouraged to put what would be related records in a relational database into a single MongoDB object (see the sketch after this list). If that doesn’t suffice, use client-side logic to do the equivalent of joins. If that doesn’t suffice either, you’re not looking at a good MongoDB use case.
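To make that concrete, here’s a minimal sketch in Python with the pymongo driver. The database, collection, and field names are all hypothetical, invented for illustration; the point is that one document embeds what a relational schema would spread across two tables, and that secondary indexes are declared per field.

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient("localhost", 27017)
db = client.crm  # hypothetical database name

# One BSON document holds what a relational schema might split across
# a customers table and a separate addresses table.
db.customers.insert_one({
    "name": "Acme Corp",
    "segment": "enterprise",
    "addresses": [  # nested data: an array of sub-documents
        {"type": "billing", "city": "Boston"},
        {"type": "shipping", "city": "Chicago"},
    ],
})

# Only _id is indexed automatically; other fields are indexable on request.
db.customers.create_index([("segment", ASCENDING)])

# No join needed: one lookup returns the customer plus all its addresses.
doc = db.customers.find_one({"name": "Acme Corp"})
print(doc["addresses"])
```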
MongoDB has integrated MapReduce. Natural uses include:
- The usual kinds of transformations one might do via MapReduce.
- Aggregations one might otherwise do in SQL (e.g., GROUP BY kinds of things are an obvious MapReduce fit; see the sketch below).
Improved aggregation/MapReduce performance is a roadmap item.
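As an illustration of the GROUP BY point above, here’s a pymongo sketch that runs MongoDB’s server-side mapReduce command. The orders collection and its region/amount fields are made up; the map and reduce functions are JavaScript that executes on the server. It’s roughly SELECT region, SUM(amount) FROM orders GROUP BY region:

```python
from pymongo import MongoClient
from bson.code import Code

client = MongoClient()
db = client.analytics  # hypothetical database name

# JavaScript functions shipped to the server: map runs per document,
# reduce runs per group of emitted values.
mapper = Code("function () { emit(this.region, this.amount); }")
reducer = Code("function (key, values) { return Array.sum(values); }")

# Run the mapReduce command; out={'inline': 1} returns results
# directly instead of writing them to an output collection.
result = db.command("mapReduce", "orders",
                    map=mapper, reduce=reducer, out={"inline": 1})
for row in result["results"]:
    print(row["_id"], row["value"])  # region, total amount
```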
However, Dwight Merriman said MongoDB has excellent performance at simple real-time reporting, for example updating a counter 10,000 times per second. When I asked him why, his reasons included:
- Memory-mapped storage (data files are mapped straight into RAM).
- Deferred writes — a write might take a couple of seconds to actually persist.
- Optimism — you don’t have to wait for an acknowledgement if you write something to the database.
- “Upsert in place” – update in place without first checking whether you’re doing an update or an insert (see the sketch after this list).
- General lack of overhead.
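Putting a couple of those together, here’s a sketch of the fast-counter case: an unacknowledged write (the “optimism” point) combined with an upsert that increments in place. Again, the database, collection, and key names are hypothetical:

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient()
db = client.metrics  # hypothetical database name

# w=0 asks for no acknowledgement: the client fires the write and
# moves on, which is the "optimism" described above.
counters = db.get_collection("counters",
                             write_concern=WriteConcern(w=0))

# Upsert in place: one operation increments the counter document if it
# exists, or creates it if it doesn't; no read-before-write.
counters.update_one(
    {"_id": "page:/home"},   # hypothetical counter key
    {"$inc": {"hits": 1}},
    upsert=True,
)
```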
Inspired in part by Schooner’s internal benchmarks, I’ve come to think that, apples-to-apples, even the simplest key-value store will have less than a 3X single-node performance advantage over well-implemented MySQL. (Read, write, or blended.) The 10gen guys don’t dispute that. However, they point out that a single MongoDB request can process the equivalent of many relational rows, opening the possibility of much greater performance gains. In particular, there seem to be some Drupal implementations enjoying huge MongoDB-based speed-ups.
On a heavy-duty server (8-12 cores, 16-64 gigabytes of RAM), MongoDB can apparently do 20,000-30,000 writes or 100,000 reads per second. Improved concurrency and mixed read/write performance are coming, although obviously 10gen would think that MongoDB does pretty well in those areas already. The largest known MongoDB system does about 1 million reads/second. I would imagine that those figures require the database to fit into RAM, given:
- MongoDB’s memory-mapped architecture.
- This post mortem of the MongoDB Foursquare outage.
I’ve gotten conflicting signals as to whether there are any multi-hundred-node MongoDB deployments. But there are “lots” of 20-40 node ones, as well as lots under 10 nodes. Note that 1 million reads/second naively sounds as if it could be achieved on 10 MongoDB nodes (10 nodes × 100,000 reads/second each) — but I’d guess that’s not really the configuration.
MongoDB’s scale-out story starts with transparent sharding, although I think transparent sharding was a new feature in MongoDB 1.6 last summer. So far as I understand, what MongoDB sacrifices in the CAP Theorem is the A rather than the P — partitions happen, and when they do you might not be able to write to the shard you want to; availability suffers while consistency is preserved. A MongoDB shard can have multiple nodes, in a master-slave set-up, and there’s some flexibility as to how intra-shard consistency is handled.
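For what that set-up looks like from a client, here’s a sketch using MongoDB’s enableSharding and shardCollection admin commands via pymongo. The host, database, collection, and shard key are hypothetical:

```python
from pymongo import MongoClient

# Connect to a mongos query router fronting the sharded cluster
# (hostname is hypothetical).
client = MongoClient("mongos.example.com", 27017)

# Mark a database as sharded, then shard one of its collections
# on a chosen shard key.
client.admin.command("enableSharding", "appdb")
client.admin.command("shardCollection", "appdb.events",
                     key={"user_id": 1})

# From here on, mongos routes each read or write to the shard that
# owns the relevant user_id range; application code is unchanged.
```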
MongoDB functionality futures include full-text search and extensions to the general query language.