Hadapt update
I met with the Hadapt guys today. I think I can be a bit crisper than before in positioning Hadapt and its use cases, namely:
- Hadapt is additional software on a cluster that also runs fully functional Hadoop/HDFS. (Cloudera Hadoop more than straight-from-Apache Hadoop to date, but that’s not a requirement.)
- The cluster also runs a DBMS on every node, such as PostgreSQL or one of Infobright/Vectorwise.
- Hadapt’s software manages parallel SQL queries by distributing them to the DBMS living on each node (a scatter-gather pattern sketched in code after this list). Hadapt says that the resulting query performance far outshines Hive’s.
- Hadapt further says that, by exploiting the partner DBMS, its SQL functionality outpaces Hive’s as well.
- Target Hadapt use cases are centered around keeping machine-generated or other poly-structured data in Hadoop, and extracting, enhancing, or otherwise deriving some of it to live in the relational store.
- In particular, Hadapt seems like an interesting choice when you want to use that relational data as you work on other data that’s still in HDFS, or if you want to keep using the relational data in other kinds of MapReduce jobs.
- That all fits well with my thoughts about the importance of derived data.
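To make that "distribute the SQL to each node's DBMS" idea concrete, here is a minimal scatter-gather sketch in Python. To be clear, this is my own illustration, not Hadapt's implementation or API: the node list, database name, table, and query are all hypothetical, and a real system would also handle data partitioning, join pushdown, and node failures.

```python
# Conceptual sketch only -- not Hadapt's code. It illustrates the general
# scatter-gather idea: run the same partial query against the DBMS (here,
# PostgreSQL via psycopg2) living on every node, then merge the partial
# results on a coordinator. Host names, database, table, and query are
# hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor

import psycopg2

NODES = ["node1.example.com", "node2.example.com", "node3.example.com"]
PARTIAL_SQL = "SELECT user_id, COUNT(*) AS events FROM weblogs GROUP BY user_id"


def query_node(host):
    """Run the partial query against the single-node DBMS on `host`."""
    conn = psycopg2.connect(host=host, dbname="hadapt_demo", user="analyst")
    try:
        with conn.cursor() as cur:
            cur.execute(PARTIAL_SQL)
            return cur.fetchall()
    finally:
        conn.close()


def scatter_gather():
    """Fan the query out to every node in parallel and merge the partial counts."""
    totals = {}
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        for rows in pool.map(query_node, NODES):
            for user_id, events in rows:
                totals[user_id] = totals.get(user_id, 0) + events
    return totals


if __name__ == "__main__":
    for user_id, events in sorted(scatter_gather().items()):
        print(user_id, events)
```

The point is just the shape of the work: each node's relational engine does the heavy lifting locally, and the coordinator is left merging partial results rather than scanning raw HDFS files the way Hive's MapReduce plans do.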
Other evolution from what I wrote about Hadapt a few months ago includes:
- Hadapt is in beta now.
- Hadapt has added adult supervision in the form of Philip Wickline, late of Endeca.
In other news, Hadapt is our newest client.
Petabyte-scale Hadoop clusters (dozens of them)
I recently learned that there are 7 Vertica clusters with a petabyte (or more) each of user data. So I asked around about other petabyte-scale clusters. It turns out that there are several dozen such clusters (at least) running Hadoop.
Cloudera can identify 22 CDH (Cloudera Distribution [of] Hadoop) clusters holding one petabyte or more of user data each, at 16 different organizations. This does not count Facebook or Yahoo, who are huge Hadoop users but not, I gather, running CDH. Meanwhile, Eric Baldeschwieler of Hortonworks tells me that Yahoo’s latest stated figures are:
- 42,000 Hadoop nodes …
- … holding 180-200 petabytes of data.
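Dividing those figures out gives a rough sense of per-node scale. This is back-of-envelope arithmetic only; I don't know whether the 180-200 petabytes is counted before or after HDFS replication.

```python
# Back-of-envelope arithmetic from the Yahoo figures above; it ignores
# whether the 180-200 PB is measured before or after HDFS replication.
NODES = 42_000
DATA_PB = (180, 200)

for petabytes in DATA_PB:
    terabytes_per_node = petabytes * 1000 / NODES
    print(f"{petabytes} PB across {NODES:,} nodes ≈ {terabytes_per_node:.1f} TB/node")
# 180 PB -> ~4.3 TB/node; 200 PB -> ~4.8 TB/node
```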
Hadoop hardware and compression
A month ago, I posted about typical Hadoop hardware. After talking today with Eric Baldeschwieler of Hortonworks, I have an update. I also learned some things from Eric and from Brian Christian of Zettaset about Hadoop compression.
First the compression part. Eric thinks 6-10X compression is common for “curated” Hadoop data — i.e., the data that actually gets used a lot. Brian used an overall figure of 6-8X, and cited a specific customer who got 6X or a little more. By way of comparison, the kinds of data involved sound similar to those for which Vertica claimed 10-60X compression almost three years ago.
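For concreteness, here is what those ratios imply on disk. The 1 petabyte of raw curated data is purely an illustrative figure, and HDFS replication is ignored.

```python
# Illustrative arithmetic only: what the 6-10X figures above would imply for a
# hypothetical 1 petabyte of raw "curated" data, before any HDFS replication.
RAW_TB = 1000  # 1 petabyte of uncompressed data, expressed in terabytes

for ratio in (6, 8, 10):
    print(f"{ratio}X compression: {RAW_TB / ratio:.0f} TB on disk per raw petabyte")
# 6X -> ~167 TB, 8X -> 125 TB, 10X -> 100 TB
```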
Eric also made an excellent point about low-value machine-generated data. I was suggesting that, as Moore’s Law made sensor networks ever more affordable, …