Informatica, Splunk, and IBM are all public companies, and correspondingly reticent to talk about product futures. Hence, anything I might suggest about product futures from any of them won’t be terribly detailed, and even the vague generalities are “the Good Lord willin’ an’ the creek don’ rise”.
Never let a rising creek overflow your safe harbor.
1. Hadoop can be an awesome ETL (Extract/Transform/Load) execution engine; it can handle huge jobs and perform a great variety of transformations. (Indeed, MapReduce was invented to run giant ETL jobs.) Thus, if one offers a development-plus-execution stack for ETL processes, it might seem appealing to make Hadoop an ETL execution option. And so:
- I’ve already posted that BI-plus-light-ETL vendors Pentaho and Datameer are using Hadoop in that way.
- Informatica will be using Hadoop as an execution option too.
Informatica told me about other interesting Hadoop-related plans as well, but I’m not sure my frieNDA allows me to mention them at all.
IBM, however, is standing aside. Specifically, IBM told me that it doesn’t see the point of doing the same thing, as its ETL engine — presumably derived from the old Ascential product line — is already parallel and performant enough.
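To make the "Hadoop as ETL execution engine" idea in point 1 a bit more concrete, here is a minimal sketch of an ETL job expressed in map/reduce style. It is plain Python rather than real Hadoop code, and everything in it (the `map_record`, `reduce_key`, and `run_job` names, and the three-column CSV layout) is a hypothetical illustration, not anything the vendors above showed me.

```python
from collections import defaultdict


def map_record(line):
    """Extract + transform: turn one raw CSV line into (key, value) pairs.

    Assumed (hypothetical) layout: customer,date,amount
    """
    parts = line.strip().split(",")
    if len(parts) != 3:
        return  # skip malformed rows
    customer, _date, amount = parts
    try:
        value = float(amount)
    except ValueError:
        return  # skip rows whose amount is not numeric
    yield customer.strip().lower(), value


def reduce_key(key, values):
    """Aggregate: sum all amounts seen for one customer."""
    return key, round(sum(values), 2)


def run_job(lines):
    """Single-process stand-in for the shuffle that Hadoop does across nodes."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_record(line):
            groups[key].append(value)
    # "Load" here is just returning sorted rows ready for a target table.
    return [reduce_key(key, groups[key]) for key in sorted(groups)]


raw_rows = [
    "Acme,2011-10-01,10.50",
    "acme ,2011-10-02,4.50",
    "bad row",
    "Zed,2011-10-01,abc",
    "Zed,2011-10-03,7",
]
result = run_job(raw_rows)
```

The point of the shape is that the map and reduce steps are independent per record and per key, which is what would let an engine like Hadoop spread a giant job across many machines.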
2. Last year, I suggested that Splunk and Hadoop are competitors in managing machine-generated data. That’s still true, but Splunk is also preparing a Hadoop co-opetition strategy. To a first approximation, it’s just Hadoop import/export. However, suppose you view Splunk as offering a three-layer stack, with collection at one end and visualization at the other.
Then potentially the data could flow:
Native log –> Splunk (collection) –> Hadoop –> Splunk (visualization)
I think that’s cool.
The other Splunk/Hadoop future I know of is an enhanced ability for Splunk to capture Hadoop operations data, in two ways:
- Provide some prewritten filters to extract data fields from Hadoop operations logs.
- Get at Hadoop operations data that isn’t found in logs, via operator utilities and the like.
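As a rough illustration of what a "prewritten filter" for Hadoop operations logs might look like, here is a minimal sketch that pulls fields out of a single log line with a regular expression. It assumes the common log4j-style layout of timestamp, level, source class, and message; the `LOG_LINE` pattern, the `extract_fields` helper, and the sample line are all hypothetical, and this is not Splunk's actual implementation.

```python
import re

# Assumed layout: "YYYY-MM-DD HH:MM:SS,mmm LEVEL source.Class: message"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<source>[\w.$]+): "
    r"(?P<message>.*)$"
)


def extract_fields(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line, made up for illustration.
sample = (
    "2011-10-20 12:34:56,789 INFO "
    "org.apache.hadoop.mapred.JobTracker: Job job_201110201200_0001 added"
)
fields = extract_fields(sample)
```

Lines that don't fit the pattern come back as `None`, so a caller can route them to a fallback rather than silently dropping them.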
3. I wrote about an important aspect of IBM’s “Big Insights” Hadoop story months ago, namely IBM’s general recommended data topology. Beyond that, IBM offers:
- Its own Hadoop distribution, for free, with a small amount of IBM intellectual property added.
- Proprietary closed-source software that runs on top of either IBM’s or Cloudera’s Hadoop distributions.
Unfortunately, I didn’t understand what, if anything, is interesting about IBM’s proprietary Hadoop capabilities at this time. There seem to be some Hadoop performance tweaks, and something that sounded like Datameer 1.0 (“Big Sheets”), and surely some management tools as well. But I didn’t grasp any reason to favor Big Insights over, for example, the combination of Datameer and Cloudera Enterprise.
One last note: I was surprised to learn that IBM’s Platform Computing acquisition is not involved in Big Insights. Perhaps that integration will come later on.