Last Thursday, Greenplum and Aster Data, the two most recent of my numerous data warehouse specialist customers, each told me of the same major innovation. Both were rushing to be the first to announce it. This led to considerable tap dancing, with the upshot being that both are releasing the information tonight or tomorrow morning.
What’s going on is that Aster Data and Greenplum have both integrated MapReduce into their respective MPP shared-nothing data warehouse DBMSs. I’ll write about that at length very shortly, but ahead of the more detailed analysis, let me throw up some sound bites:
- MPP shared-nothing database managers like Greenplum or Aster Data give great performance. But sometimes you need to do even better. That’s where MapReduce comes in.
- On its own, MapReduce can do a lot of important work in data manipulation and analysis. Integrating it with SQL should just increase its applicability and power.
- Google’s internal use of MapReduce is impressive. So is Hadoop’s success. Now commercial implementations of MapReduce are getting their shots too.
- At its core, most data analysis is really pretty simple – it boils down to arithmetic, Boolean logic, sorting, and not a lot else. MapReduce can handle a significant fraction of that.
- The hardest part of data analysis is often the recognition of entities or semantic equivalences. The rest is arithmetic, Boolean logic, sorting, and so forth. MapReduce is already proven in use cases encompassing all of those areas.
- MapReduce isn’t about data management, at least not primarily. It’s about parallelism.
- MapReduce offers dramatic performance gains in analytic application areas that still need a big speed-up.
- MapReduce isn’t needed for tabular data management. That’s been efficiently parallelized in other ways. But if you want to build non-tabular structures such as text indexes or graphs, MapReduce turns out to be a big help.
- In principle, any alphanumeric data at all can be stuffed into tables. But in high-dimensional scenarios, those tables are super-sparse. That’s when MapReduce can offer big advantages by bypassing relational databases. Examples of such scenarios are found in CRM and relationship analytics.
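To make the text-index sound bite concrete, here is a minimal sketch of the MapReduce pattern applied to building an inverted index. It is an illustration only: it runs in a single Python process, the function names (`map_doc`, `shuffle`, `reduce_word`, `build_index`) and the sample documents are made up for this example, and it says nothing about how Greenplum, Aster Data, Google, or Hadoop actually expose MapReduce. In a real system, the framework would run the map and reduce calls in parallel across nodes and handle the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_doc(doc_id, text):
    """Map step: emit one (word, doc_id) pair per distinct word in a document."""
    return [(word, doc_id) for word in set(text.lower().split())]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_word(word, doc_ids):
    """Reduce step: collapse the doc ids for one word into a sorted posting list."""
    return word, sorted(doc_ids)


def build_index(docs):
    """Run map, shuffle, and reduce sequentially over a dict of {doc_id: text}."""
    mapped = chain.from_iterable(map_doc(d, t) for d, t in docs.items())
    return dict(reduce_word(w, ids) for w, ids in shuffle(mapped).items())


# Hypothetical sample documents, purely for illustration.
docs = {
    1: "MapReduce is about parallelism",
    2: "SQL is about tables",
    3: "MapReduce meets SQL",
}
index = build_index(docs)
```

Each `map_doc` call looks at one document only, and each `reduce_word` call looks at one word only, which is why the work can be spread across many machines. The result is a word-to-documents structure rather than a table of rows, the kind of non-tabular output the sound bites have in mind.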
Some of our recent links about MapReduce
- The integration of MapReduce with SQL data warehousing
- Three major applications of MapReduce
- Another application of MapReduce
- Sound bites about MapReduce
- Other links about MapReduce