July 27, 2011

Introduction to Zettaset

Zettaset is confusing, but as best I understand it:

Zettaset’s basic product pitch is that it gives you a management console that not only observes Hadoop services, but actually directs them. So your administrators are saved from having to know Hadoop; they only have to know Zettaset instead. What’s more, Zettaset solves some of Hadoop’s issues for you, such as NameNode single point of failure. Automated backup got mentioned in my discussions with Zettaset too, as did taking specific nodes in and out of service.

Also, the Zettaset folks largely come from a security background. Not coincidentally, encryption, retention assurance, and so on are in near-term (by year-end or so) product plans, with a compliance orientation. I like that idea, because of how well it fits with a subset of the big bit bucket use case.

While I was trying to sort out various uncertainties, Zettaset CEO Brian Christina sent over the comment:

Our ‘Hadoop’ story relates to traditional business requirements – monitoring, alerting, back-up and recovery, security and continuous integration. The Hadoop stack and subsequent packages have over 30 processes which are currently stand-alone products without automated safeguards. Zettaset productizes Apache Hadoop by automating those safeguards so IT can securely leverage Hadoop without a Professional Service commitment, or building/maintaining those 30 processes with in-house Hadoop expertise.

Presumably the “currently” in that refers to vanilla Apache Hadoop, not to Cloudera Enterprise.

The coolest part of the Zettaset story is probably what Zettaset does with HBase to get around the HDFS small files/file number limit problem.
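The arithmetic behind that small-files problem is easy to sketch. HDFS keeps every file's and block's metadata in NameNode RAM, so hundreds of millions of tiny files exhaust the NameNode long before they exhaust disk; packing many small objects into HBase, which persists them as a much smaller number of large HFiles, slashes the NameNode's object count. A rough back-of-envelope, where the per-object heap cost and the packing density are my assumptions, not Zettaset's figures:

```python
# Back-of-envelope: NameNode heap consumed by many small files,
# directly in HDFS vs. packed into HBase.
# ~150 bytes of heap per inode and per block object is a common
# rule of thumb (assumption, not a measured figure).

BYTES_PER_INODE = 150
BYTES_PER_BLOCK = 150

def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Approximate NameNode heap consumed by file metadata."""
    return num_files * (BYTES_PER_INODE + blocks_per_file * BYTES_PER_BLOCK)

small_files = 100_000_000                 # 100M small files (one block each)
direct = namenode_heap_bytes(small_files)

files_per_hfile = 10_000                  # assumed packing density in HBase
packed = namenode_heap_bytes(small_files // files_per_hfile)

print(f"direct: {direct / 1e9:.1f} GB of NameNode heap")   # 30.0 GB
print(f"packed: {packed / 1e6:.1f} MB of NameNode heap")   # 3.0 MB
```

The point is not the exact constants but the ratio: the NameNode's metadata burden drops by roughly the packing factor.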


4 Responses to “Introduction to Zettaset”

  1. Vlad Rodionov on July 31st, 2011 1:18 pm

    Keeping NN metadata in a distributed K/V store will result in significantly degraded overall performance. Fetching data from RAM versus from a remote host's disk? How can you compare that?


  2. Curt Monash on July 31st, 2011 2:25 pm

    But will that cost be large when compared with the cost of actually retrieving the file, which is kind of the point of a NameNode lookup?
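[Editor's note: that point can be put in rough numbers. Treating the metadata lookup as a fixed cost in front of streaming a 64 MB block, and using order-of-magnitude latency figures that are assumptions rather than anyone's benchmark:]

```python
# Rough, assumed order-of-magnitude latencies (not measured):
NN_RAM_LOOKUP_S   = 0.0005   # NameNode RPC with metadata in RAM
KV_STORE_LOOKUP_S = 0.005    # lookup against a distributed K/V store
BLOCK_READ_S      = 0.5      # streaming a 64 MB block at ~128 MB/s

# What fraction of the total read time does the lookup represent?
overhead_ram = NN_RAM_LOOKUP_S / (NN_RAM_LOOKUP_S + BLOCK_READ_S)
overhead_kv  = KV_STORE_LOOKUP_S / (KV_STORE_LOOKUP_S + BLOCK_READ_S)

print(f"lookup share, RAM NameNode: {overhead_ram:.1%}")  # ~0.1%
print(f"lookup share, K/V store:    {overhead_kv:.1%}")   # ~1.0%
```

Even a 10x slower lookup stays around one percent of the cost of actually reading the block, under these assumptions.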

  3. Eric on August 22nd, 2011 12:13 am

    According to the Google GFS II reference, the NameNode metadata is stored in Bigtable, which keeps it in memory rather than on disk. So I think keeping NN metadata in a distributed K/V store may resolve this problem if done properly.

  4. Thought this was cool: Largest MongoDB Databases and Big Data | Lisheng Yu on August 26th, 2011 7:50 pm

    […] a follow up to the MongoDB positioning itself as Big Data and development agile environment, I’ve found this bit of data on Curt Monash’s […]
