This is part of a three-post series:
My clients at Metamarkets are planning to open source part of their technology, called Druid, which is described in the Druid section of Metamarkets’ blog. Exactly when this will happen is a bit unclear; I know the target date under NDA, but it’s not set in stone. But if you care, you can probably contact the company to get involved earlier than the official unveiling.
I imagine that open-source Druid will be pretty bare-bones in its early days. Code was first checked in early in 2011, and Druid seems to have averaged around 1 full-time developer since then. What’s more, it’s not obvious that all the features I’m citing here will be open-sourced; indeed, some of the ones I’m describing probably won’t be.
In essence, Druid is a distributed analytic DBMS. Druid’s design choices are best understood when you recall that it was invented to support Metamarkets’ large-scale, RAM-speed, internet marketing/personalization SaaS (Software as a Service) offering. In particular:
- Druid tries to use RAM well.
- Druid tries to stay up all the time.
- Druid has multi-valued fields. (They’re numeric, but of course you can use encoding tricks to make them effectively more general.)
- Druid’s big limitation is its assumption that there’s literally only one (denormalized) table per query; you can’t even join to dimension tables.
- SQL is a bit of an afterthought; I would expect Druid’s SQL functionality to be pretty stripped-down out of the gate.
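To make the single-table constraint concrete: since queries can’t join to dimension tables, dimension attributes have to be folded into each fact row before the data is loaded. Here is a minimal sketch of that denormalization step; all field names (`campaign_id`, `clicks`, and so on) are made up for illustration and are not from Druid itself.

```python
# Hypothetical illustration: a single-table engine like Druid sees only one
# denormalized table per query, so dimension attributes are merged into each
# fact row at ingestion time rather than joined at query time.

# A small "dimension table", keyed by campaign_id (names are invented).
campaign_dim = {
    7: {"campaign_name": "spring_sale", "channel": "email"},
    8: {"campaign_name": "retargeting", "channel": "display"},
}

def denormalize(fact_rows, dim):
    """Fold dimension attributes into each fact row before loading."""
    for row in fact_rows:
        enriched = dict(row)
        enriched.update(dim.get(row["campaign_id"], {}))
        yield enriched

facts = [
    {"timestamp": "2012-01-01T00:00:00Z", "campaign_id": 7, "clicks": 3},
    {"timestamp": "2012-01-01T00:01:00Z", "campaign_id": 8, "clicks": 1},
]

loaded = list(denormalize(facts, campaign_dim))
# Each loaded row now carries campaign_name and channel alongside the metrics.
```

The trade-off is the usual one: queries stay simple and fast, at the cost of wider rows and the need to reload data if a dimension attribute changes.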
Interestingly, the single-table/multi-valued choice is echoed at WibiData, which deals with similar data sets. However, WibiData’s use cases are different from Metamarkets’, and in most respects the WibiData architecture is quite different from that of Metamarkets/Druid.
As with many DBMSs, much of what’s interesting about Druid is how it organizes and chunks data. Most important, Druid has MVCC (Multi-Version Concurrency Control) on a segment-by-segment basis. That is, an update requires a new version of the whole segment to be written; while that happens, reads can continue unabated on the old version.
Obviously, this is more suited for streaming or batch-load scenarios than for ones with many single-row updates.
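The segment-level MVCC idea can be sketched in a few lines: a writer builds a complete new version of a segment off to the side, then swaps it in atomically, while readers keep whatever version they already hold. This is a toy illustration of the general copy-on-write pattern, not Druid’s actual implementation.

```python
import threading

class SegmentStore:
    """Toy sketch of segment-level MVCC: a writer builds a brand-new
    version of a segment, then atomically swaps it in; readers keep
    using whichever version they grabbed."""

    def __init__(self):
        self._segments = {}      # segment_id -> immutable tuple of rows
        self._lock = threading.Lock()

    def read(self, segment_id):
        # Grabbing the reference is atomic, and the tuple itself never
        # mutates, so a reader is unaffected by a concurrent rewrite.
        return self._segments.get(segment_id, ())

    def rewrite(self, segment_id, new_rows):
        new_version = tuple(new_rows)  # build the full new segment first
        with self._lock:               # then swap it in atomically
            self._segments[segment_id] = new_version

store = SegmentStore()
store.rewrite("2012-01-01/2012-01-02", [("t1", 3), ("t2", 1)])
old = store.read("2012-01-01/2012-01-02")
store.rewrite("2012-01-01/2012-01-02", [("t1", 3), ("t2", 1), ("t3", 5)])
# `old` still sees the two-row version; a fresh read sees three rows.
```

Note what this buys you: readers never block, but the unit of update is an entire segment, which is exactly why the approach favors streaming and batch loads over single-row updates.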
Other Druid specifics include:
- A Druid table must have a timestamp column.
- Druid data is stored in columns, in timestamp order.
- Druid data is commonly chunked into segments of 5-10 million rows. Data is partitioned by time and then perhaps also by some other dimension.
- There can be two sets of data storage servers, one for data that has arrived recently, the other for older data (e.g. >1 hour old). In that case, data is first persisted on one set of servers, then flushed to the other.
- Druid data is structured the same way in memory as on disk (memory mapping). More precisely, there seems to be memory mapping between generic persistent storage and virtual memory, with the operating system taking care of figuring out which parts of virtual memory need to be in actual RAM.
- Druid keeps compressed bitmap indexes on the various dimensions, on a segment-by-segment basis.
- Druid uses dictionary/token compression, with a separate dictionary for each segment. Token length is dynamic, based on column cardinality. Max length is 31 bits, which is rarely a problem, since a column within a single segment rarely has anywhere near 2^31 distinct values.
- You can have different replication factors for different segments. You can read from all replicas.
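The per-segment dictionary encoding and bitmap indexing described above can be illustrated together. This is a deliberately simplified sketch (plain Python sets stand in for compressed bitmaps, and no actual bit-width packing is done), not Druid’s on-disk format.

```python
def encode_segment(rows, column):
    """Toy per-segment dictionary encoding plus a bitmap index for one
    column, in the spirit of the description above (not Druid's format)."""
    dictionary = {}   # value -> integer token; token width would depend on cardinality
    tokens = []       # the column, stored as tokens in row order
    bitmaps = {}      # value -> set of row positions (stand-in for a compressed bitmap)
    for pos, row in enumerate(rows):
        value = row[column]
        token = dictionary.setdefault(value, len(dictionary))
        tokens.append(token)
        bitmaps.setdefault(value, set()).add(pos)
    return dictionary, tokens, bitmaps

rows = [
    {"country": "US"}, {"country": "DE"}, {"country": "US"}, {"country": "FR"},
]
dictionary, tokens, bitmaps = encode_segment(rows, "country")
# dictionary: {"US": 0, "DE": 1, "FR": 2}; tokens: [0, 1, 0, 2]
# bitmaps["US"] == {0, 2}, so a filter like country = 'US' touches only those rows
```

Because each segment gets its own dictionary and bitmaps, a segment is a self-contained unit, which fits neatly with rewriting whole segments under MVCC and replicating them independently.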
For more on Druid, please see my post on Metamarkets’ back-end technology.