June 16, 2012

Metamarkets’ back-end technology

This is part of a three-post series:

The canonical Metamarkets batch ingest pipeline is a bit complicated.

By “get data ready to be put into Druid” I mean:

That metadata is what goes into the MySQL database, which also retains data about shards that have been invalidated. (That part is needed because of the MVCC.)
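To make the MVCC point concrete, here is a minimal sketch of why a metadata store would keep rows for invalidated shards rather than deleting them. It is not Druid's actual schema or code. The names `ShardMeta`, `visible_shards`, and `invalidate`, the integer version field, and the rule that the newest valid version wins are all assumptions made for illustration, and an in-memory list stands in for the MySQL table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardMeta:
    # Hypothetical metadata row; field names are illustrative assumptions.
    interval: str              # time range the shard covers
    version: int               # a higher version supersedes a lower one
    invalidated: bool = False  # invalid shards are flagged, not deleted


def visible_shards(catalog):
    """Return, per interval, the newest shard that has not been invalidated."""
    newest = {}
    for shard in catalog:
        if shard.invalidated:
            continue
        current = newest.get(shard.interval)
        if current is None or shard.version > current.version:
            newest[shard.interval] = shard
    return newest


def invalidate(catalog, interval, version):
    """Flag one shard as invalid while keeping its row in the catalog."""
    return [
        ShardMeta(s.interval, s.version, True)
        if (s.interval, s.version) == (interval, version)
        else s
        for s in catalog
    ]


catalog = [
    ShardMeta("2012-06-15", 1),
    ShardMeta("2012-06-15", 2),
    ShardMeta("2012-06-16", 1),
]

# Readers see version 2 for June 15; after it is invalidated they fall
# back to version 1, and the invalidated row is still on record.
before = visible_shards(catalog)
after = visible_shards(invalidate(catalog, "2012-06-15", 2))
```

Under these assumptions, readers never see a half-replaced shard: a new version becomes visible only once its row exists, and invalidating it makes the older version visible again without anything having been deleted.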

By “build the data segments” I mean:

When things are being done that way, Druid may be regarded as comprising three kinds of servers:

This is in addition to the aforementioned Zookeeper and MySQL.

It occurs to me that I don’t know whether that local cache is the only RAM tier, which is a pretty major point. Oh well …

The alternative is that data just streams into Druid. In that case:

Comments

6 Responses to “Metamarkets’ back-end technology”

  1. Introduction to Metamarkets and Druid | DBMS 2 : DataBase Management System Services on June 16th, 2012 5:56 pm

    [...] Metamarkets’ back-end technology [...]

  2. Metamarkets Druid overview | DBMS 2 : DataBase Management System Services on June 16th, 2012 5:56 pm

    [...] more on Druid, please see my post on Metamarkets’ back-end technology. Categories: Clustering, Columnar database management, Data models and architecture, Data [...]

  3. Eric Tschetter on June 19th, 2012 3:51 pm

    “It occurs to me that I don’t know whether that local cache is the only RAM tier, which is a pretty major point. Oh well …”

    This is probably the only RAM tier, but I must admit that I don’t quite understand what is meant by “RAM tier.”

    If “RAM tier” means an application tier that cannot do its own processing, but just stupidly stores stuff in RAM, which it can lose on process death, then yes, this is the only RAM tier. The main compute nodes will retain their data between process death and restart.

    If “RAM tier” refers to whether there is also a cache on the compute nodes, then this is the only RAM tier: the only data accessed on the compute nodes is the base table data. I.e., there is no “cache” local to a specific compute node; instead, all of the data is loaded up in main memory.

    If “RAM tier” means something else, then it might not be the only RAM tier :).

  4. Curt Monash on June 19th, 2012 4:37 pm

    Eric,

    The question is:

    Since you guys make such a large point of saying you do your processing based on in-memory data — on which tier is that data located in-memory?

  5. Eric Tschetter on June 20th, 2012 3:33 pm

    Ah, that would be the compute nodes. The local cache in the query broker is just a lazy cache of completed query results; no real processing of data is done there.

  6. Curt Monash on June 20th, 2012 7:19 pm

    Eric,

    So those compute nodes are separate from the various data management and ingest tiers I wrote about?
