June 16, 2012

Metamarkets Druid overview

This is part of a three-post series:

My clients at Metamarkets are planning to open source part of their technology, called Druid, which is described in the Druid section of Metamarkets’ blog. The timing is a bit unclear; I know the target date under NDA, but it’s not set in stone. If you care, you can probably contact the company to get involved before the official unveiling.

I imagine that open-source Druid will be pretty bare-bones in its early days. Code was first checked in early in 2011, and Druid seems to have averaged about one full-time developer since then. What’s more, it’s not obvious that all the features I’m citing here will be open-sourced; indeed, some of the ones I’m describing probably won’t be.

In essence, Druid is a distributed analytic DBMS. Druid’s design choices are best understood when you recall that it was invented to support Metamarkets’ large-scale, RAM-speed, internet marketing/personalization SaaS (Software as a Service) offering. In particular:

Interestingly, the single-table/multi-valued choice is echoed at WibiData, which deals with similar data sets. However, WibiData’s use cases are different from Metamarkets’, and in most respects the WibiData architecture is quite different from that of Metamarkets/Druid.

As with many DBMSs, much of what’s interesting about Druid is how it organizes and chunks data. Most importantly, Druid has MVCC (Multi-Version Concurrency Control) on a segment-by-segment basis. That is, an update requires a new version of the whole segment to be written; while that happens, reads can continue on the old version unabated.

Obviously, this is better suited to streaming or batch-load scenarios than to ones with many single-row updates.
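To make the segment-level MVCC idea concrete, here is a minimal Python sketch. It is not Druid's code; the names SegmentVersion, SegmentStore, snapshot, and publish are hypothetical, and the sample rows are invented for illustration. It only shows the general pattern described above: a writer builds a complete new version of a segment and swaps it in, while a reader that already holds the old version keeps using it.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class SegmentVersion:
    """One immutable version of a segment (hypothetical structure)."""
    version: int
    rows: Tuple[dict, ...]


class SegmentStore:
    """Maps a segment id to its current version; old versions are never mutated."""

    def __init__(self) -> None:
        self._current: Dict[str, SegmentVersion] = {}
        self._write_lock = threading.Lock()  # serializes writers only

    def snapshot(self, segment_id: str) -> SegmentVersion:
        # Readers grab a reference to the current version and keep it
        # for the duration of their query.
        return self._current[segment_id]

    def publish(self, segment_id: str, rows: Iterable[dict]) -> SegmentVersion:
        # An "update" writes the whole segment again as a new version.
        new_rows = tuple(dict(r) for r in rows)
        with self._write_lock:
            old = self._current.get(segment_id)
            next_version = old.version + 1 if old else 1
            new = SegmentVersion(next_version, new_rows)
            self._current[segment_id] = new  # swap the reference
        return new


store = SegmentStore()
store.publish("2012-06-16", [{"publisher": "a", "impressions": 10}])

# A reader starts a query against version 1.
reader_view = store.snapshot("2012-06-16")

# Meanwhile a batch load rewrites the whole segment as version 2.
store.publish("2012-06-16", [
    {"publisher": "a", "impressions": 10},
    {"publisher": "b", "impressions": 5},
])

old_total = sum(r["impressions"] for r in reader_view.rows)
new_total = sum(r["impressions"] for r in store.snapshot("2012-06-16").rows)
```

The sketch also illustrates the cost noted above: changing a single row means rewriting every row in the segment, which is cheap per row in a batch load but expensive for many single-row updates.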

Other Druid specifics include:

For more on Druid, please see my post on Metamarkets’ back-end technology.

Comments

8 Responses to “Metamarkets Druid overview”

  1. Introduction to Metamarkets and Druid | DBMS 2 : DataBase Management System Services on June 16th, 2012 5:54 pm

    [...] Druid overview [...]

  2. Metamarkets open sources Druid, its in-memory database — Data | GigaOM on October 24th, 2012 9:03 am

    [...] Metamarkets runs Druid on an 800-core system running on Amazon EC2. Others have done a decent job explaining what Druid seems good for and where the tradeoffs might [...]

  3. Metamarkets open sources Druid, its in-memory database ← techtings on October 24th, 2012 9:06 am

    [...] Metamarkets runs Druid on an 800-core system running on Amazon EC2. Others have done a decent job explaining what Druid seems good for and where the tradeoffs might [...]

  4. Patrick Wendell on October 31st, 2012 12:54 am

    This is the best explanation of Druid that exists anywhere – inclusive of their Marketing material, the Strata talk, and the documentation in the code. Thanks!

  5. Curt Monash on October 31st, 2012 2:29 am

    Thanks for the kind words!

    I put a lot of effort into it, but was still frustrated by the results (mainly around the in-memory part, not Druid itself).

  6. Notes and comments — October 31, 2012 | DBMS 2 : DataBase Management System Services on November 1st, 2012 7:16 am

    [...] Metamarkets’ Druid was open-sourced. Numerous other product introductions and so on that I’ve hinted at have [...]

  7. Big Data Warehouse in the cloud « Ravi's Technology Blog on November 28th, 2012 10:04 pm

    [...] HANA but cringe at the licensing costs?  One option is to look into open source alternatives like Druid which was created by the vendor MetaMarkets.   Druid claims to provide real-time analytics using [...]

  8. Hadoop’s Successors | Christopher Berry on October 5th, 2013 11:22 am

    [...] “I would encourage you to keep an eye on Metamarkets’ Druid, which Curt Monash recently covered: http://www.dbms2.com/2012/06/16/metamarkets-druid-overview/ [...]
