On behalf of On-Demand Enterprise, née Grid Today, Dennis Barker asked me to clarify the most important benefits and features of the Greenplum and Aster Data MapReduce announcements for various constituencies (business users, programmers, DBAs, etc.). Questions like that are hard to answer simply. Here’s why.
The core benefit of MapReduce is price/performance (because it allows the cost benefits of parallelization to be applied to analyses that are hard to parallelize otherwise). Large price/performance gains commonly mix together three kinds of benefits.
1. They let you do what you did before, for less money.
2. They let you do a better version of what you did before, for similar money.
3. They let you do new things that didn’t make economic sense before, but now do.
In the case of data warehousing, the boundary between the second and third categories is pretty blurry, and that blurry boundary is where most of the action occurs. Data warehousing technology improvements are all about lowering total cost of ownership (TCO) for a given amount of data or workload of queries. The result is usually that an enterprise or business unit runs queries or other analyses against data it otherwise wouldn’t have analyzed. The single best example of this may be the telecom industry; telecom companies are buying huge amounts of low-cost data warehouse technology so as to, for the first time, analyze the full detail of multiple years of call detail records (CDRs).
So with that disclaimer, here’s my best shot at an answer:
- Programmers benefit from MapReduce because it makes parallelization much easier to program.
- Business users benefit from MapReduce because they get answers they otherwise might not.
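To make the programmer point concrete, here’s a minimal sketch of the MapReduce programming model in plain Python. It is not Greenplum’s or Aster Data’s API; the names `map_call`, `reduce_calls`, and `map_reduce`, the toy CDR format of `(caller, seconds)`, and the use of a local thread pool in place of a cluster are all my assumptions for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_call(record):
    """Map step: turn one (hypothetical) CDR into (key, value) pairs."""
    caller, seconds = record
    return [(caller, seconds)]


def reduce_calls(caller, durations):
    """Reduce step: combine every value that shares a key."""
    return caller, sum(durations)


def map_reduce(records, mapper, reducer, workers=4):
    """Toy framework: a thread pool stands in for a cluster of nodes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each record is mapped independently, so this step can be spread
        # across workers without the mapper knowing anything about it.
        mapped = pool.map(mapper, records)

        # Shuffle: group the emitted values by key.
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)

        # Each key is reduced independently, so this step parallelizes too.
        reduced = pool.map(lambda item: reducer(*item), groups.items())
        return dict(reduced)


cdrs = [("alice", 120), ("bob", 45), ("alice", 30)]
totals = map_reduce(cdrs, map_call, reduce_calls)
print(totals)
```

The programmer writes only the two small functions at the top. Splitting the work, grouping by key, and collecting results are the framework’s job, which is the sense in which parallelization gets easier to program.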
I’m still ducking detailed questions about the integration of SQL and MapReduce. Stay tuned.