Notes and links, June 15, 2011
Five things:
Metaphors amok
It all started when I disputed James Kobielus’ blogged claim that Hadoop is the nucleus of the next-generation cloud EDW. Jim posted again to reiterate the claim, only this time he wrote that all EDW vendors [will soon] bring Hadoop into the heart of their architectures. (All emphasis mine.)
That did it. I tweeted, in succession:
- Actually, I vote for Hadoop as the lungs of the EDW — first place of entry for essential nutrients.
- Data integration can be the heart of the EDW, pumping stuff around. RDBMS/analytic platform can be the brain.
- iPad-based dashboards that may engender envy, but which actually are only used occasionally and briefly … well, you get the picture.*
*Woody Allen said in Sleeper that the brain was his second-favorite organ.
Of course, that body of work was quickly challenged. Responses included:
Categories: Analytic technologies, Business intelligence, Data warehousing, EAI, EII, ETL, ELT, ETLT, Fun stuff, Hadoop, Humor, MapReduce
Groupon-related thoughts on the future of advertising and e-commerce
There’s been a lot of debate about Groupon around its initial public offering, and I find the Groupon bears to be more persuasive than the Groupon bulls. That said, there’s a Groupon-optimism argument by Steve Cheney that I want to share at length, because it outlines some possibilities for the continued evolution of analytics.
Categories: Analytic technologies, Specific users
Infobright 4.0
Infobright is announcing its 4.0 release, with imminent availability. In marketing and product alike, Infobright is betting the farm on machine-generated data. This hasn’t been Infobright’s strategy from the get-go, but it is these days, with pretty good focus and commitment. While some fraction of Infobright’s customer base is in the Sybase-IQ-like data mart market — and indeed Infobright put out a customer-win press release in that market a few days ago — Infobright’s current customer targets seem to be mainly:
- Web companies, many of which are already MySQL users (see the load sketch after this list).
- Companies with telecommunications and similar log data, especially in OEM relationships.
- Trading/financial services, especially at mid-tier companies.
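Since Infobright is accessed through the standard MySQL interface, loading machine-generated data can look like ordinary MySQL work. Here is a minimal sketch in Python; the table, columns, file path, and credentials are all invented for illustration, and the local-infile flag assumes a recent version of MySQL Connector/Python:

```python
# Hypothetical sketch: bulk-loading log records into Infobright through its
# MySQL-compatible interface. Table name, columns, path, and credentials are
# made up; only the general pattern (big batch loads via LOAD DATA) reflects
# how columnar engines like Infobright prefer to ingest data.
import mysql.connector  # MySQL Connector/Python

conn = mysql.connector.connect(
    host="infobright-host", user="loader", password="secret",
    database="logs", allow_local_infile=True)  # flag needed for LOCAL INFILE
cur = conn.cursor()

# Row-at-a-time INSERTs are a poor fit for a columnar engine; batch loads
# of machine-generated files are the natural path.
cur.execute("""
    LOAD DATA LOCAL INFILE '/var/log/app/events.csv'
    INTO TABLE events
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
""")
conn.commit()
cur.close()
conn.close()
```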
Key aspects of Infobright 4.0 include:
Categories: Data warehousing, Database compression, Infobright, Investment research and trading, Log analysis, Open source, Telecommunications, Web analytics
Patent nonsense: Parallel Iron/HDFS edition
Alan Scott commented with concern about Parallel Iron’s patent lawsuit attacking HDFS (Hadoop Distributed File System), filed in — where else? — Eastern Texas. The patent in question — US 7,415,565 — seems, in essence, to cover any shared-nothing block storage that exploits a “configurable switch fabric”; indeed, it’s more oriented to OLTP (OnLine Transaction Processing) than to analytics. For example, the Background section starts:
Categories: EMC, Hadoop, MapReduce, Parallelization, Storage
Hadoop confusion from Forrester Research
Jim Kobielus started a recent post:
Most Hadoop-related inquiries from Forrester customers come to me. These have moved well beyond the “what exactly is Hadoop?” phase to the stage where the dominant query is “which vendors offer robust Hadoop solutions?”
What I tell Forrester customers is that, yes, Hadoop is real, but that it’s still quite immature.
So far, so good. But I disagree with almost everything Jim wrote after that.
Jim’s thesis seems to be that Hadoop will only be mature when a significant fraction of analytic DBMS vendors have own-branded versions of Hadoop alongside their DBMS, possibly via acquisition. Based on this, he calls for a formal, presumably vendor-driven Hadoop standardization effort, evidently for the whole Hadoop stack. He also says that
Hadoop is the nucleus of the next-generation cloud EDW, but that promise is still 3-5 years from fruition
where by “cloud” I presume Jim means first and foremost “private cloud.”
I don’t think any of that matches Hadoop’s actual strengths and weaknesses, whether now or in the 3-7 year future. My reasoning starts:
- Hadoop is well on its way to being a surviving data-storage-plus-processing system — like an analytic DBMS or DBMS-imitating data integration tool …
- … but Hadoop is best-suited for somewhat different use cases than those technologies are, and the gap won’t close as long as the others remain a moving target.
- I don’t think MapReduce is going to fail altogether; it’s too well-suited for too many use cases (see the sketch after this list).
- Hadoop (as opposed to general MapReduce) has too much momentum to fizzle, unless perhaps it is supplanted by one or more embrace-and-extend MapReduce-plus systems that do a lot more than it does.
- The way for Hadoop to avoid being a MapReduce afterthought is to evolve sufficiently quickly itself; ponderous standardization efforts are quite beside the point.
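To make the “well-suited use cases” point concrete, here is a minimal Hadoop Streaming sketch of clickstream sessionization, a classic job that MapReduce handles gracefully and SQL handles awkwardly. The field layout, script name, and session timeout are all assumptions for illustration:

```python
#!/usr/bin/env python
# Hypothetical Hadoop Streaming job: count sessions per user in a clickstream.
# Assumed input: tab-separated lines of (user_id, unix_timestamp, url).
# Invoked along the lines of:
#   hadoop jar hadoop-streaming.jar -input /logs -output /sessions \
#     -mapper 'sessionize.py map' -reducer 'sessionize.py reduce'
import sys

SESSION_GAP = 30 * 60  # 30 idle minutes end a session (arbitrary choice)

def map_phase():
    # Emit user_id as the key, so the framework groups each user's events.
    for line in sys.stdin:
        user, ts, url = line.rstrip("\n").split("\t")[:3]
        print("%s\t%s\t%s" % (user, ts, url))

def reduce_phase():
    # Streaming delivers input sorted by key. This sketch also assumes each
    # user's events arrive time-ordered; a real job would configure a
    # secondary sort to guarantee that.
    last_user, last_ts, sessions = None, 0, 0
    for line in sys.stdin:
        user, ts, _url = line.rstrip("\n").split("\t")[:3]
        ts = int(ts)
        if user != last_user:
            if last_user is not None:
                print("%s\t%d" % (last_user, sessions))
            last_user, sessions = user, 1
        elif ts - last_ts > SESSION_GAP:
            sessions += 1
        last_ts = ts
    if last_user is not None:
        print("%s\t%d" % (last_user, sessions))

if __name__ == "__main__":
    (map_phase if sys.argv[1] == "map" else reduce_phase)()
```

The particulars don’t matter; the point is that “group a huge log by key, then walk each group procedurally” is exactly the shape of problem MapReduce was built for.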
As for the rest of Jim’s claim — I see three main candidates for the “nucleus of the next-generation enterprise data warehouse,” each with better claims than Hadoop:
- Relational DBMS, much like today. (E.g., Teradata, DB2, Exadata or their successors.) This is the case in which robustness of the central data store matters most.
- Grand cosmic data integration tools. (The descendants of Informatica PowerCenter, et al.) This is the case in which the logic of data relationships can safely be separated from physical storage.
- Nothing. (The architecture could have several strong members, none of which is truly the “nucleus.”) This is the case in which new ways keep being invented to extract high value from data, outrunning what grandly centralized solutions can adapt to. I think this is the most likely case of all.
Categories: Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Hadoop, MapReduce, Theory and architecture
Dirty data, stored dirt cheap
A major driver of Hadoop adoption is the “big bit bucket” use case. Users take a whole lot of data, often machine-generated data in logs of different kinds, and dump it into one place, managed by Hadoop, at open-source pricing. Hadoop hardware doesn’t need to be that costly either. And once you get that data into Hadoop, there are a whole lot of things you can do with it.
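The pattern is simple enough to sketch. Below is a hypothetical ingest script; the paths and date layout are invented, while the CLI calls are the stock hadoop fs commands:

```python
# Hypothetical sketch of the "big bit bucket" pattern: land raw logs in HDFS
# now, decide what to do with them later. Paths and rotation scheme are made
# up; `hadoop fs` is the stock Hadoop CLI (-mkdir -p per Hadoop 2-style shells).
import subprocess
import time

day = time.strftime("%Y/%m/%d")
local_logs = "/var/log/app/access.log.1"   # yesterday's rotated log file
hdfs_dir = "/raw/weblogs/" + day           # partition the bucket by date

subprocess.check_call(["hadoop", "fs", "-mkdir", "-p", hdfs_dir])
subprocess.check_call(["hadoop", "fs", "-put", local_logs, hdfs_dir])

# No schema, no cleansing, no upfront modeling: the data just lands, on
# cheap hardware at open-source prices, ready for MapReduce, Hive, Pig,
# or whatever tool comes along later.
```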
Of course, there are various outfits who’d like to sell you not-so-cheap bit buckets. Contending technologies include Hadoop appliances (which I don’t believe in), Splunk (which in many use cases I do), and MarkLogic (ditto, but often the cases are different from Splunk’s). Cloudera and IBM, among other vendors, would also like to sell you some proprietary software to go with your standard Apache Hadoop code.
So the question arises — why would you want to spend serious money to look after your low-value data? The answer, of course, is that maybe your log data isn’t so low-value.
Categories: Hadoop, Investment research and trading, Log analysis, Splunk
Hardware for Hadoop
After I suggested that there’s little point to Hadoop appliances, it occurred to me to look into what kinds of hardware actually are used with Hadoop. So far as I can tell:
- Hadoop nodes today tend to run on fairly standard boxes.
- Hadoop nodes in the past have tended to run on boxes that were light with respect to RAM.
- The number of spindles per core on Hadoop node boxes is going up even as disks get bigger (a back-of-envelope sketch follows).
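A back-of-envelope way to see why spindles per core matters; every figure below is a made-up illustration, not a measurement:

```python
# Illustrative node-balance arithmetic for a Hadoop box. All numbers are
# assumptions for the sake of the example, not benchmarks.
cores = 8                     # e.g., a two-socket quad-core server
disks = 6                     # spindles in the node
mb_per_sec_per_disk = 75.0    # rough sequential scan rate per spindle
mb_per_sec_per_core = 50.0    # rough rate at which one core can chew data

scan_capacity = disks * mb_per_sec_per_disk  # what the disks can deliver
cpu_capacity = cores * mb_per_sec_per_core   # what the cores can absorb

print("Disks deliver %.0f MB/s; cores could absorb %.0f MB/s"
      % (scan_capacity, cpu_capacity))
print("Spindles per core: %.2f" % (disks / float(cores)))

# If cpu_capacity far exceeds scan_capacity, cores sit idle waiting on I/O.
# Adding spindles (rather than just bigger disks) restores balance, which is
# one reason spindles per core keeps climbing.
```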
Why you would want an appliance — and when you wouldn’t
Data warehouse appliances are booming. But Hadoop appliances are a non-starter.
Data warehouse and other data management appliances are on the upswing. Oracle is pushing Exadata. Teradata* is going strong, and also recently bought Aster Data. IBM bought Netezza. Greenplum and Vertica were bought by EMC and HP, respectively. All those moves are favorable for appliances.
*As far as I’m concerned, all Teradata hardware-included systems are appliances.
In essence, there are two kinds of reasons to prefer appliances over software-only offerings:
Categories: Data warehouse appliances, Hadoop, Open source
The essence of an application
Once upon a time, information technology was strictly about — well, information. And by “information” what was meant was “data”.* An application boiled down to a database design, plus a straightforward user interface, in whatever the best UI technology of the day happened to be. Things rarely worked quite as smoothly as the design-database/press-button/generate-UI propaganda would have one believe, but database design was clearly at the center of application invention.
*Not coincidentally, two of the oldest names for “IT” were data processing and management information systems.
Eventually, there came to be three views of the essence of IT:
- Data — i.e., the traditional view, still exemplified by IBM and Oracle.
- People empowerment — i.e., Microsoft-style emphasis on UI friendliness and efficiency.
- Operational workflow — i.e., SAP-style emphasis on actual business processes.
Graphical user interfaces were a major enabling technology for that evolution. Equally important, relational databases made some difficult problems easy, or at least easier, freeing application designers to pursue more advanced functionality.
Based on further technical evolution, specifically in analytic and consumer technologies, I think we should now take that list up to five. The new members I propose are:
- Investigative analytics.
- Emotional response.