Metamarkets and Druid

Discussion of Metamarkets and its to-be-open-sourced Druid technology.

October 31, 2013

Specialized business intelligence

A remarkable number of vendors are involved in what might be called “specialized business intelligence”. Some don’t want to call it that, because they think that “BI” is old and passé’, and what they do is new and better. Still, if we define BI technology as, more or less:

then BI is indeed a big part of what they’re doing.

Why would vendors want to specialize their BI technology? The main reason would be to suit it for situations in which even the best general-purpose BI options aren’t good enough. The obvious scenarios are those in which the mismatch is one or both of:

For example, in no particular order: Read more

August 14, 2013

The two sides of BI

As is the case for most important categories of technology, discussions of BI can get confused. I’ve remarked in the past that there are numerous kinds of BI, and that the very origin of the term “business intelligence” can’t even be pinned down to the nearest century. But the most fundamental confusion of all is that business intelligence technology really is two different things, which in simplest terms may be categorized as user interface (UI) and platform* technology. And so:

*I wanted to say “server” or “server-side” instead of “platform”, as I dislike the latter word. But it’s too inaccurate, for example in the case of the original Cognos PowerPlay, and also in various thin-client scenarios.

Key aspects of BI platform technology can include:

Read more

October 31, 2012

Notes and comments — October 31, 2012

Time for another catch-all post. First and saddest — one of the earliest great commenters on this blog, and a beloved figure in the Boston-area database community, was Dan Weinreb, whom I had known since some Symbolics briefings in the early 1980s. He passed away recently, much much much too young. Looking back for a couple of examples — even if you’ve never heard of him before, I see that Dan ‘s 2009 comment on Tokutek is still interesting today, and so is a post on his own blog disagreeing with some of my choices in terminology.

Otherwise, in no particular order:

1. Chris Bird is learning MongoDB. As is common for Chris, his comments are both amusing and enlightening.

2. When I relayed Cloudera’s comments on Hadoop adoption, I left out a couple of categories. One Cloudera called “mobile”; when I probed, that was about HBase, with an example being messaging apps.

The other was “phone home” — i.e., the ingest of machine-generated data from a lot of different devices. This is something that’s obviously been coming for several years — but I’m increasingly getting the sense that it’s actually arrived.

Read more

June 16, 2012

Metamarkets’ back-end technology

This is part of a three-post series:

The canonical Metamarkets batch ingest pipeline is a bit complicated.

By “get data read to be put into Druid” I mean:

That metadata is what goes into the MySQL database, which also retains data about shards that have been invalidated. (That part is needed because of the MVCC.)

By “build the data segments” I mean:

When things are being done that way, Druid may be regarded as comprising three kinds of servers: Read more

June 16, 2012

Metamarkets Druid overview

This is part of a three-post series:

My clients at Metamarkets are planning to open source part of their technology, called Druid, which is described in the Druid section of Metamarkets’ blog. The timing of when this will happen is a bit unclear; I know the target date under NDA, but it’s not set in stone. But if you care, you can probably contact the company to get involved earlier than the official unveiling.

I imagine that open-source Druid will be pretty bare-bones in its early days. Code was first checked in early in 2011, and Druid seems to have averaged around 1 full-time developer since then. What’s more, it’s not obvious that all the features I’m citing here will be open-sourced; indeed, some of the ones I’m describing probably won’t be.

In essence, Druid is a distributed analytic DBMS. Druid’s design choices are best understood when you recall that it was invented to support Metamarkets’ large-scale, RAM-speed, internet marketing/personalization SaaS (Software as a Service) offering. In particular:

Interestingly, the single-table/multi-valued choice is echoed at WibiData, which deals with similar data sets. However, WibiData’s use cases are different from Metamarkets’, and in most respects the WibiData architecture is quite different from that of Metamarkets/Druid.

Read more

June 16, 2012

Introduction to Metamarkets and Druid

I previously dropped a few hints about my clients at Metamarkets, mentioning that they:

But while they’re a joy to talk with, writing about Metamarkets has been frustrating, with many hours and pages of wasted of effort. Even so, I’m trying again, in a three-post series:

Much like Workday, Inc., Metamarkets is a SaaS (Software as a Service) company, with numerous tiers of servers and an affinity for doing things in RAM. That’s where most of the similarities end, however, as  Metamarkets is a much smaller company than Workday, doing very different things.

Metamarkets’ business is SaaS (Software as a Service) business intelligence, on large data sets, with low latency in both senses (fresh data can be queried on, and the queries happen at RAM speed). As you might imagine, Metamarkets is used by digital marketers and other kinds of internet companies, whose data typically wants to be in the cloud anyway. Approximate metrics for Metamarkets (and it may well have exceeded these by now) include 10 customers, 100,000 queries/day, 80 billion 100-byte events/month (before summarization), 20 employees, 1 popular CEO, and a metric ton of venture capital.

To understand how Metamarkets’ technology works, it probably helps to start by realizing: Read more

February 11, 2012

Applications of an analytic kind

The most straightforward approach to the applications business is:

However, this strategy is not as successful in analytics as in the transactional world, for two main reasons:

I first realized all this about a decade ago, after Henry Morris coined the term analytic applications and business intelligence companies thought it was their future. In particular, when Dave Kellogg ran marketing for Business Objects, he rattled off an argument to the effect that Business Objects had generated more analytic app revenue over the lifetime of the company than Cognos had. I retorted, with only mild hyperbole, that the lifetime numbers he was citing amounted to “a bad week for SAP”. Somewhat hoist by his own petard, Dave quickly conceded that he agreed with my skepticism, and we changed the subject accordingly.

Reasons that analytic applications are commonly less complete than the transactional kind include: Read more

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:


Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.