I spoke with Eliot Horowitz and Max Schireson of 10gen last month about MongoDB users and use cases. The biggest clusters they came up with weren’t much over 100 nodes, but clusters an order of magnitude bigger were under development. The 100-node one we talked the most about had 33 replica sets, each with about 100 gigabytes of data, so that’s in the 3-4 terabyte range total. In general, the largest MongoDB databases are 20-30 TB; I’d guess those really do use the bulk of available disk space.
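As a back-of-the-envelope check on the 100-node figures, here is a minimal sketch. The even split of nodes across replica sets is an assumption for illustration, not something 10gen stated, and the variable names are made up.

```python
# Rough sizing for the ~100-node cluster described above.
replica_sets = 33
gb_per_replica_set = 100   # "about 100 gigabytes" per replica set
nodes = 100                # approximate node count

# Data volume across all replica sets, in decimal terabytes.
total_tb = replica_sets * gb_per_replica_set / 1000

# Assumption: nodes are spread evenly across replica sets.
nodes_per_set = nodes / replica_sets

print(f"total data: ~{total_tb:.1f} TB")
print(f"nodes per replica set: ~{nodes_per_set:.1f}")
```

That lands at roughly 3.3 TB, consistent with the 3-4 terabyte range, and about three nodes per replica set under the even-split assumption.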
10gen recommends solid-state storage in many cases. In some cases solid-state lets you get away with fewer total nodes. 10gen also likes Flashcache (Facebook-developed technology to put a flash cache in front of hard disks). But the 100-node example mentioned above uses spinning disk.
Use cases 10gen is proud of include:
- Lots of user profile maintenance, including at online ad companies. This includes full user ad impression data. (I’ve argued for a while that user profile information belongs in something like a NoSQL database.)
- A big-name web company that wants to inspect every packet that enters their network, and replaced Splunk with MongoDB for performance reasons.
- A big-name photo/video site whose metadata is all in MongoDB. (That’s the kind of thing that often makes for good MarkLogic use cases.)
But actually, the reason we had the call was to review cases where MongoDB’s schemaless nature was significant. Examples of those included:
- A couple of top examples were of the kind “A bunch of apps, similar but not the same.” For MTV, it’s a single content management system for a bunch of websites. For Disney Playdom, it’s different schemas for every game.
- For a wireless telco, the issue was a product catalog in which devices and service plans called for very different schemas, and which the telco felt had thus become unmanageable in Oracle.
- For Craigslist, the issue wasn’t programming so much as performance — ALTER TABLE operations took months in MySQL, and that’s not a typo, although I’ll confess to not understanding why this was the case.
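To make the product-catalog case concrete, here is a minimal sketch using plain Python dictionaries as stand-ins for documents in a single collection. It does not use MongoDB itself; every field name and the `find` helper are hypothetical, chosen only to show devices and service plans with different shapes living side by side.

```python
# Hypothetical catalog documents; all field names are invented for illustration.
catalog = [
    {"_id": 1, "kind": "device", "name": "Phone A", "screen_inches": 4.0, "os": "Android"},
    {"_id": 2, "kind": "plan", "name": "Unlimited Talk", "monthly_usd": 60, "included_minutes": 900},
    {"_id": 3, "kind": "device", "name": "Tablet B", "screen_inches": 9.7},
]

def find(docs, **criteria):
    """Return documents whose fields equal every given criterion (a toy stand-in for a query)."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

devices = find(catalog, kind="device")
plans = find(catalog, kind="plan")

print([d["name"] for d in devices])
print([p["name"] for p in plans])
```

The point of the sketch is that the plan carries pricing fields the devices lack, and one device omits a field the other has, without any shared table definition having to accommodate all of them.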
The 10gen guys went on to claim that schemalessness is helpful for incremental development in general, the point being that you don’t have a database-modification step. To some extent, changes can even be rolled back more easily than if you actually changed your schemas.
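A minimal sketch of that incremental-development argument, again with plain dictionaries rather than MongoDB itself. The `nickname` field, the two app versions, and the function names are all hypothetical.

```python
# Documents written by two versions of a hypothetical app; "nickname" arrived in v2.
profiles = [
    {"_id": 1, "email": "a@example.com"},                     # written by v1
    {"_id": 2, "email": "b@example.com", "nickname": "bee"},  # written by v2
]

def display_name_v1(doc):
    # v1 code knows nothing about "nickname" and simply ignores it.
    return doc["email"]

def display_name_v2(doc):
    # v2 code tolerates older documents by falling back when the field is absent.
    return doc.get("nickname", doc["email"])

v2_names = [display_name_v2(p) for p in profiles]
v1_names = [display_name_v1(p) for p in profiles]

print(v2_names)
print(v1_names)
```

Deploying v2 needed no database-modification step, and rolling back to v1 still works against documents v2 wrote, since the extra field is just ignored. That only holds to the extent the application code is written to cope with both shapes.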