June 16, 2012

Introduction to Metamarkets and Druid

I previously dropped a few hints about my clients at Metamarkets, mentioning that they:

Have built vertical-market analytic platform technology.
Use a lot of Hadoop.
Throw good parties. (That’s where the background photo on my Twitter page comes from.)

But while they’re a joy to talk with, writing about Metamarkets has been frustrating, with many hours and pages of wasted effort. Even so, I’m trying again, in a three-post series:

Introduction to Metamarkets and Druid (this post)
Druid overview
Metamarkets’ back-end technology

Much like Workday, Inc., Metamarkets is a SaaS (Software as a Service) company, with numerous tiers of servers and an affinity for doing things in RAM. That’s where most of the similarities end, however, as Metamarkets is a much smaller company than Workday, doing very different things.

Metamarkets’ business is SaaS business intelligence, on large data sets, with low latency in both senses (fresh data can be queried, and the queries run at RAM speed). As you might imagine, Metamarkets is used by digital marketers and other kinds of internet companies, whose data typically wants to be in the cloud anyway. Approximate metrics for Metamarkets (and it may well have exceeded these by now) include 10 customers, 100,000 queries/day, 80 billion 100-byte events/month (before summarization), 20 employees, 1 popular CEO, and a metric ton of venture capital.
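To put those approximate figures in perspective, here is a back-of-the-envelope calculation. It is only a sketch: the 30-day month and the assumption of evenly spread load are mine, not Metamarkets’, and real traffic is surely peakier than these averages suggest.

```python
# Approximate figures quoted in the post.
EVENTS_PER_MONTH = 80_000_000_000
BYTES_PER_EVENT = 100
QUERIES_PER_DAY = 100_000

# Assumptions (not from the post): a 30-day month and evenly spread load.
DAYS_PER_MONTH = 30
SECONDS_PER_DAY = 24 * 60 * 60

raw_bytes_per_month = EVENTS_PER_MONTH * BYTES_PER_EVENT
raw_terabytes_per_month = raw_bytes_per_month / 10**12
events_per_second = EVENTS_PER_MONTH / (DAYS_PER_MONTH * SECONDS_PER_DAY)
queries_per_second = QUERIES_PER_DAY / SECONDS_PER_DAY

print(f"{raw_terabytes_per_month:.0f} TB/month of raw events")
print(f"{events_per_second:,.0f} events/second on average")
print(f"{queries_per_second:.2f} queries/second on average")
```

Under those assumptions, that works out to roughly 8 TB of raw events per month, about 31,000 events per second, and a little over 1 query per second.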

To understand how Metamarkets’ technology works, it probably helps to start by realizing:

Metamarkets has one technology stack for receiving and managing data when it is ingested in batch mode.
Metamarkets has a different, overlapping technology stack for receiving and managing data when it is ingested in streaming mode.
Metamarkets is open-sourcing part of the two stacks, called Druid.
In the Venn diagram for these three things, the intersection of no two of them is strictly contained in the third.

and further:

Metamarkets doesn’t surface all the raw data for analysis or viewing. Rather, there’s some early aggregation, with the raw data preserved off to the side in case you want to create more aggregates later on.
Metamarkets’ application is a dashboard, supporting drilldown but not, at this time, other forms of analytics. A lot of what is measured is time series and/or top lists.
Druid is in essence an analytic DBMS; indeed, it’s so strictly analytic that it isn’t suited to manage its own metadata. MySQL is used for that.
Apache Zookeeper is also assumed as part of the environment to manage Druid.
The batch pipeline relies on Hadoop.
The streaming pipeline relies on Kafka (a publish-subscribe project out of LinkedIn).
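
The “early aggregation” idea in the list above can be illustrated with a small sketch. This is not Metamarkets’ or Druid’s code; the hourly granularity, the field names (`publisher`, `country`, `revenue`) and the `roll_up` helper are all hypothetical, chosen only to show raw events being collapsed into one summary row per time bucket and dimension combination.

```python
from collections import defaultdict
from datetime import datetime


def roll_up(events, dimensions):
    """Collapse raw events into one row per (hour, dimension values)."""
    totals = defaultdict(lambda: {"count": 0, "revenue": 0.0})
    for event in events:
        # Truncate the timestamp to the hour (hypothetical granularity).
        hour = event["timestamp"].replace(minute=0, second=0, microsecond=0)
        key = (hour,) + tuple(event[d] for d in dimensions)
        totals[key]["count"] += 1
        totals[key]["revenue"] += event["revenue"]
    return dict(totals)


# Hypothetical raw events; in practice these would stay archived
# "off to the side" so that new aggregates can be built later.
raw_events = [
    {"timestamp": datetime(2012, 6, 16, 9, 5), "publisher": "a.example",
     "country": "US", "revenue": 0.25},
    {"timestamp": datetime(2012, 6, 16, 9, 40), "publisher": "a.example",
     "country": "US", "revenue": 0.50},
    {"timestamp": datetime(2012, 6, 16, 10, 1), "publisher": "b.example",
     "country": "DE", "revenue": 1.00},
]

summary = roll_up(raw_events, dimensions=("publisher", "country"))
for key, metrics in sorted(summary.items()):
    print(key, metrics)
```

Three raw events become two summary rows here. The dashboard queries only the summary, which is why the raw data has to be kept elsewhere if you later want an aggregate on a dimension that was dropped.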

The whole thing is fully multi-tenant, at least by the point that data is being stored and visualized. Metamarkets customers either live in the Amazon cloud (the smaller ones), or else used to live there and don’t mind shipping their data back there for analysis by Metamarkets. Some “not exactly Ted Codd’s tabular DBMS” features are:

Multi-valued fields (just vectors, not unlimited arrays).
A couple of fast approximate algorithms (uniques, top lists).
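
To give a flavor of what a fast approximate uniques count can look like, here is a sketch of a K-minimum-values estimator. I am not claiming this is the algorithm Druid uses; it is a well-known technique swapped in for illustration, and the function names and the choice of `k=256` are my own assumptions.

```python
import hashlib
import heapq


def hash_to_unit_interval(value):
    """Map a value to a pseudo-random float in [0, 1) via SHA-1."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_uniques(values, k=256):
    """Estimate the number of distinct values using only O(k) memory."""
    heap = []        # max-heap (negated hashes) holding the k smallest hashes
    members = set()  # hashes currently in the heap, used to skip duplicates
    for value in values:
        h = hash_to_unit_interval(value)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: count is exact
    kth_smallest = -heap[0]
    return int((k - 1) / kth_smallest)


# Hypothetical data: 10,000 distinct user IDs, each seen three times.
user_ids = [f"user-{i}" for i in range(10_000)] * 3
print(estimate_uniques(user_ids))
```

The estimate should land within several percent of 10,000 while storing only 256 hash values. That trade of a small error for bounded memory is what makes this kind of algorithm attractive in a RAM-centric system.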

One thing Metamarkets does that’s pretty much a best practice these days is roll out new code, mid-day if they like, without ever taking their system down. Why is this possible? Because the data is replicated across nodes, so you can do a rolling deployment of a node at a time without making any data unavailable. Notes on that include:

Performance could be affected, as the read load is generally balanced across all the data replicas.
Data locking is not an issue — Metamarkets doesn’t have any read locks, as Druid is an MVCC (Multi-Version Concurrency Control) system.
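
The availability argument behind rolling deployment can be sketched as a simple check. Everything here is hypothetical (the node and segment names, the `placement` mapping, both helper functions) and is meant only to illustrate the reasoning, not how Metamarkets actually orchestrates deployments.

```python
def unavailable_segments(placement, node_down):
    """Return the segments with no live replica while `node_down` restarts."""
    return sorted(
        segment
        for segment, nodes in placement.items()
        if not (set(nodes) - {node_down})
    )


def rolling_deploy_order(placement):
    """Return a one-node-at-a-time restart order, or raise if unsafe."""
    nodes = sorted({n for replicas in placement.values() for n in replicas})
    for node in nodes:
        missing = unavailable_segments(placement, node)
        if missing:
            raise RuntimeError(
                f"restarting {node} would take {missing} offline"
            )
    return nodes


# Hypothetical placement: every data segment lives on two nodes.
placement = {
    "segment-a": ["node1", "node2"],
    "segment-b": ["node2", "node3"],
    "segment-c": ["node1", "node3"],
}
print(rolling_deploy_order(placement))  # ['node1', 'node2', 'node3']
```

With at least two replicas of everything, any single node can be restarted without data becoming unavailable. The remaining replica does take on the full read load for its segments in the meantime, which is the performance caveat noted above.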

Categories: Business intelligence, Cloud computing, Data mart outsourcing, Data models and architecture, Data warehousing, Kafka and Confluent, Log analysis, Market share and customer counts, Metamarkets and Druid, Open source, Software as a Service (SaaS), Web analytics


