June 25, 2012

Why I’m so forward-leaning about Hadoop features

In my recent series of Hadoop posts, there were several cases where I had to choose between recommending that enterprises:

I favored the more advanced features each time. Here’s why.

To a first approximation, I divide Hadoop use cases into two major buckets, only one of which I was addressing with my comments:

1. Analytic data management.* Here I favored features over reliability because they are more important, for Hadoop as for analytic RDBMS before it. When somebody complains about an analytic data store not being ready for prime time, never really working, or causing them to tear their hair out, what they usually mean is that:

Those complaints are much, much more frequent than “It crashed”. So it was for Netezza, DATAllegro, Greenplum, Aster Data, Vertica, Infobright, et al. So it also is for Hadoop. And how does one address those complaints? By performance and feature enhancements, of the kind that the Hadoop community is introducing at high speed.

*When I refer to Hadoop being used for analytic data management, I mean that a bunch of data gets dumped into it, which may be either analyzed in situ or else massaged and summarized to be forwarded to an analytic RDBMS.
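As a concrete (and entirely hypothetical) illustration of that “massaged and summarized” step, here is a minimal sketch of the reduce side of such a job: raw delimited event lines dumped into Hadoop get collapsed into compact per-key summary rows that a load job could then forward to an analytic RDBMS. The field layout (date, user, amount) is an assumption for illustration only.

```python
from collections import defaultdict

def summarize(lines):
    """Collapse raw event lines ('date\tuser\tamount') into
    per-(date, user) total-amount summary rows."""
    totals = defaultdict(float)
    for line in lines:
        date, user, amount = line.rstrip("\n").split("\t")
        totals[(date, user)] += float(amount)
    # Emit compact, sorted summary rows, ready for a bulk load
    # into the downstream analytic RDBMS.
    return [f"{d}\t{u}\t{t:.2f}" for (d, u), t in sorted(totals.items())]

raw = [
    "2012-06-25\talice\t10.50",
    "2012-06-25\talice\t4.25",
    "2012-06-25\tbob\t7.00",
]
print(summarize(raw))
# → ['2012-06-25\talice\t14.75', '2012-06-25\tbob\t7.00']
```

In practice this logic would run as a Hive query, Pig script, or MapReduce reducer over far more data; the point is just that the Hadoop side emits small summary rows rather than raw detail.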

2. HBase-led. For a short-request DBMS, I indeed take the stance “First, let’s not lose the data.” But I doubt many enterprises are using HBase in production right now unless they’re watching the community development process very closely. I.e., they’re making their own decisions, and they aren’t really who I had in mind when I was offering advice.

If I’m wrong in all this, it would be because I’m lumping too many things together in “Hadoop-based analytic data management”, and some of them do indeed require a high degree of reliability. Indeed, that’s exactly the argument Hortonworks made in some of its pushback. Namely, they think enterprises are already adopting Hadoop as part of repeatable, production ETL (Extract/Transform/Load) processes, and those processes require rather stable software. They may not be claiming that their version of Hadoop is as stable as Informatica or Teradata, but that’s the kind of environment they want to be playing in.

But you know what? In support of that kind of capability, Hortonworks wants enterprises to adopt the new and unproven HCatalog. :) I suspect they’re right to do so. And so we have another illustration of my thesis:

We’re still at the point in Hadoop use where “unquestionably stable” is a nice-to-have, not a must-have. The features themselves are still more crucial.
