June 10, 2015

Hadoop generalities

Occasionally I talk with an astute reporter — there are still a few left 🙂 — and get led toward angles I hadn’t considered before, or at least hadn’t written up. A blog post may then ensue. This is one such post.

There is a group of questions going around that includes:

To a first approximation, my responses are: 

Comments

6 Responses to “Hadoop generalities”

  1. Ranko Mosic on June 10th, 2015 10:59 am

    Hi Curt,
    Great questions, good answers.
    I’d appreciate more clarification on good use cases beyond the data dump (lake), which I find pretty weak, but quite on par with current industry DW practices.
    The comparison with the RDBMS adoption pattern is also useful. I have a problem with the Gartner hype cycle; surely not all tech follows the same path (even at a different pace).
    Are there any sources in your Software Memories blog or elsewhere that describe how RDBMS early adoption occurred (before the ’90s)?

    Thanks, Ranko.

  2. David Gruzman on June 10th, 2015 2:17 pm

    I would compare Hadoop to a new operating system. It has its VFS (named DFS) and several implementations of it. It has YARN, which defines what a Hadoop application is and manages applications’ resource allocation. There are also several virtual machines (like the JVM or CLR) – MapReduce, Spark, Tez – and there are some “native” applications like HBase.
    This operating system is indeed oriented toward data processing, and RDBMS are very popular on it.
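    As a minimal sketch of the “DFS as a VFS” point (the hostname and port below are hypothetical), the same org.apache.hadoop.fs.FileSystem interface resolves to different implementations purely from the URI scheme:

    ```java
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class VfsStyleAccess {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // hdfs:// resolves to the HDFS implementation of the abstract FileSystem...
            FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

            // ...while file:// resolves to the local filesystem, through the same interface.
            FileSystem local = FileSystem.get(URI.create("file:///"), conf);

            System.out.println(hdfs.getClass().getName());  // org.apache.hadoop.hdfs.DistributedFileSystem
            System.out.println(local.getClass().getName()); // org.apache.hadoop.fs.LocalFileSystem
        }
    }
    ```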

  3. Curt Monash on June 12th, 2015 3:41 am

    Ranko,

    I’ve probably stressed company and technology history more than adoption history, come to think of it. But I’ll say this — except for DB2, RDBMS adoption closely tracked the adoption of alternatives to IBM mainframes. Those alternatives were initially a variety of minicomputers with proprietary OSs, mainly DEC VAX/VMS, and then later UNIX-based systems, plus a couple of data warehouse appliances (mainly Teradata).

    The rise of relational data warehousing and of modern BI was in the 1990s. Indeed, early in the 1990s Ted Codd placed his bets on MOLAP rather than relational DW; he was then quickly proved to be more wrong than right.

  4. Ranko Mosic on June 12th, 2015 7:37 am

    Thanks Curt.
    Hadoop adoption will of course be different from RDBMS adoption (history doesn’t repeat, but it rhymes).
    It then looks like RDBMS adoption had a very long gestation time – from 1970 (Codd’s paper) to the ’90s take-off.
    Hadoop is already 11 years old (if we pick the Google MapReduce paper’s publication as the start date). The Internet and other factors like the Cloud might speed adoption up (compared to pre-Internet times). Finding convincing “standard corporation” use cases and other complexities might slow it down.
    It was easier with OLTP – relatively clearly defined requirements and use cases.
    But, as you and Merv said, it looks like Hadoop is now mainstream. And it looks like it will climb up from the current perceived lull.

  5. Michael McIntire on June 15th, 2015 11:27 am

    Curt, while I think it’s a good short list of fears in the market about the Apache Ecosystem, I don’t think it portends what is about to occur in the market. I think we would all be better served to observe the interactions of such fundamental change against an industry “adoption curve”. Maybe not as broad as client/server to internet, but certainly in the analytics space it is… If I look at the market in that framework, what I see is that the “early adopters” already have adopted – tech companies, for example – and there are a lot of failures there from an ROI perspective.

    But as we head up the adoption curve, those companies not yet in the Apache Ecosystem are going to write checks to exploit the time-to-market and other long-term ROI benefits of Open Source. They are also going to demand enterprise fit and accountability… And just like with every other technology, they will evaluate the overall costs – such as training and maintenance – which early adopters do not.

    My fear is that we’re in the transition phase, and all that extra engineering required to make something work outside of the early adopters – where 80% of the cost of engineering happens to be – is going to slow adoption.

  6. AlanL on July 17th, 2015 11:18 am

    I work as a business intelligence analyst at a traditional enterprise that is currently investing heavily in Hadoop infrastructure. So far I’m seeing two main patterns:

    (1) aggregation / summarisation of event data en route to the relational DWH, where there is already a clearly defined use case for the data but the volumes are too high for the DWH to handle directly: effectively an ETL pre-processor for the classic data warehouse (see the sketch after this list)

    (2) exploratory analysis of high-volume data sources that *might* turn out to contain high-value information, but where either the volumes are too high, or the potential usefulness still too vague, for a relational DWH load process to make sense.
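    As a minimal sketch of pattern (1) (the paths, field layout, and job details are hypothetical), a Spark job can roll the raw event feed up into daily counts small enough for a conventional DWH load:

    ```java
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class EventRollup {
        public static void main(String[] args) {
            JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("EventRollup"));

            // Raw events, one CSV line each, e.g. "2015-06-10,page_view,user123,..."
            sc.textFile("hdfs:///raw/events/2015-06-10/*")
              .mapToPair(line -> {
                  String[] f = line.split(",");
                  // Key on (date, event_type); count each event once.
                  return new Tuple2<>(f[0] + "," + f[1], 1L);
              })
              .reduceByKey(Long::sum)
              .map(t -> t._1() + "," + t._2())
              // The daily summary is orders of magnitude smaller than the raw
              // feed, so a normal relational DWH load can take it from here.
              .saveAsTextFile("hdfs:///summary/events/2015-06-10");

            sc.stop();
        }
    }
    ```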
