Comments on: The Great MapReduce Debate

By: NoSQL Daily – Sat Nov 13 › PHP App Engine

NoSQL Daily – Sat Nov 13 › PHP App Engine — Sat, 13 Nov 2010 01:16:13 +0000

[…] The Great MapReduce Debate | DBMS2 : DataBase Management System Services […]

By: CAP equivalent for analytics? « Big Data Craft

CAP equivalent for analytics? « Big Data Craft — Sun, 10 Oct 2010 16:56:26 +0000

[…] it will make suboptimal (considering my model context not in wider sense!) great SV system. The great MapReduce debate is not for […]

By: Search Facets » MapReduce just semi-good for semi-structured data

Search Facets » MapReduce just semi-good for semi-structured data — Mon, 18 Jan 2010 21:27:47 +0000

[…] like Aster and Greenplum, followed fairly quickly by others such as Netezza and (somewhat surprisingly) […]

By: Bazy danych bez SQL « data mining à la polonaise

Bazy danych bez SQL « data mining à la polonaise — Wed, 25 Nov 2009 01:00:32 +0000

[…] data, it is worth listening to what Michael Stonebraker has to say. Especially when he takes up the defense of database management systems against various encroachments, e.g. against a return to the pre-relational […]

By: Rod Oldehoeft

Rod Oldehoeft — Wed, 27 Aug 2008 00:46:33 +0000

Check out MRNet:
http://www.paradyn.org/mrnet/
It is becoming the de facto utility for implementing scalable multicasts and reductions on high-performance technical computing systems, especially for performance analysis tools. Unlike Hadoop MapReduce, which is file-based, MRNet uses one or more tree-based overlay networks (TBONs) laid over the physical network topology of current high-end systems (IBM BG/*, Cray XT*) and, of course, clusters too. Each TBON uses a single filter function when used for reduction, so you instantiate a separate TBON for each filtering function you need. Unlike MapReduce, MRNet also implements multicast communication over the same TBONs.

It’s not a database system, either.
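To make the tree-based reduction and multicast idea above concrete, here is a minimal in-memory sketch in Python. It is not MRNet's API (MRNet is a C++ library and its real interface is not shown here); `Node`, `build_tree`, `reduce_up`, and `multicast_down` are hypothetical names invented for illustration, and the "network" is just a tree of objects in one process.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One process in a toy overlay tree (hypothetical, not an MRNet type)."""
    value: float = 0.0  # local datum; only meaningful on leaves
    children: List["Node"] = field(default_factory=list)


def build_tree(values, fanout=2):
    """Stack internal nodes over the leaves until a single root remains."""
    if not values or fanout < 2:
        raise ValueError("need at least one value and fanout >= 2")
    nodes = [Node(value=v) for v in values]
    while len(nodes) > 1:
        nodes = [Node(children=nodes[i:i + fanout])
                 for i in range(0, len(nodes), fanout)]
    return nodes[0]


def reduce_up(node: Node, filter_fn: Callable[[list], float]):
    """Reduction: every internal node applies the same filter to its children's results."""
    if not node.children:
        return node.value
    return filter_fn([reduce_up(child, filter_fn) for child in node.children])


def multicast_down(node: Node, message, deliver: Callable[[Node, object], None]):
    """Multicast: push one message from the root down to every leaf."""
    if not node.children:
        deliver(node, message)
        return
    for child in node.children:
        multicast_down(child, message, deliver)


if __name__ == "__main__":
    root = build_tree([3, 1, 4, 1, 5, 9, 2, 6], fanout=2)
    print(reduce_up(root, sum))  # one tree, summing filter
    print(reduce_up(root, max))  # same shape, different filter
```

In this toy the same tree can be reused with different filters; the one-filter-per-TBON arrangement described in the comment is modeled only by passing a single `filter_fn` per reduction.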

By: Greg Holmberg

Greg Holmberg — Wed, 09 Jul 2008 23:39:49 +0000

What is MapReduce good for? Take a look at the Mahout project at Apache: http://lucene.apache.org/mahout

Mahout implements a variety of machine learning algorithms, many of which are useful in text mining. Mahout builds on Apache Hadoop, which is an implementation of MapReduce. Both are sub-projects of the Lucene search engine. If all this Lucene code comes together at some point in the future, then Lucene will be much more than a search engine.

I’m sure Google is working on using MapReduce for some of the same algorithms, i.e., text mining is in Google’s future.

By: Google has thousands of internal data formats, mostly simple ones | DBMS2 -- DataBase Management System Services

Tue, 08 Jul 2008 18:27:09 +0000

[…] to think of it, that sounds very consistent with the idea that MapReduce solves a large fraction of Google’s data management issues. […]

By: Curt Monash

Curt Monash — Tue, 05 Feb 2008 17:19:56 +0000

Which nobody thought to tell me about. ::sigh::

Actually, I’m glad they didn’t. I’ve been coughing a lot, and wound up sleeping 14 hours yesterday to good effect. So missing the conference was probably a good thing.

CAM

By: Daniel Weinreb

Daniel Weinreb — Tue, 05 Feb 2008 12:14:46 +0000

Yesterday, at the New England Database Day conference, Prof. DeWitt gave an invited talk. It made much clearer to me how he was comparing Map/Reduce with parallel database systems.

The blog entry that we’ve all been reading is confusing. Like everyone else, my reaction was “Well, Map/Reduce never said that it was a database system, so why are you criticizing it as if it were?”

What he’s mainly comparing is the overall job scheduling strategy of Map/Reduce versus the kind of job scheduling used by parallel RDBMSs. The Map/Reduce pattern is obviously a useful tool for certain problems. His point is that a more general parallel database system can choose among many patterns, of which Map/Reduce is only one example, and therefore can be a good solution for a wider range of problems. Furthermore, you can issue a declarative query (i.e. in a query language such as SQL or relational algebra), and an automatic optimizer can choose which of those patterns to use for your particular problem, provided that you have an actual DBMS with a schema and so forth.

In this context, what the blog entry says makes a lot more sense.

I hope Prof. DeWitt writes up his talk as a paper, which would make all of this much clearer.
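As a rough illustration of the "Map/Reduce is one pattern among several" point in this comment, here is a toy Python sketch. Everything in it is an assumption made for illustration: the function names, the in-memory "partitions", and especially the cost rule in `run_count_query`, which is invented and is not how any real optimizer or Prof. DeWitt's talk describes the choice. The query being answered is roughly `SELECT key, COUNT(*) ... GROUP BY key`.

```python
from collections import Counter, defaultdict


def map_reduce(partitions, map_fn, reduce_fn):
    """Generic map / shuffle / reduce over in-memory partitions."""
    groups = defaultdict(list)
    for part in partitions:
        for record in part:
            for key, value in map_fn(record):
                groups[key].append(value)  # "shuffle": collect values by key
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def count_via_map_reduce(partitions, key_of):
    """Pattern 1: emit (key, 1) for every record, then sum per key."""
    return map_reduce(partitions,
                      lambda record: [(key_of(record), 1)],
                      lambda key, values: sum(values))


def count_via_local_preaggregation(partitions, key_of):
    """Pattern 2: count inside each partition first, then merge the small partial counts."""
    total = Counter()
    for part in partitions:
        total.update(Counter(key_of(record) for record in part))
    return dict(total)


def run_count_query(partitions, key_of, estimated_groups, small_group_limit=1000):
    """Toy 'optimizer': pick a pattern from an estimate (invented rule, for illustration only)."""
    if estimated_groups <= small_group_limit:
        plan = count_via_local_preaggregation
    else:
        plan = count_via_map_reduce
    return plan.__name__, plan(partitions, key_of)


if __name__ == "__main__":
    data = [[{"dept": "a"}, {"dept": "b"}], [{"dept": "a"}]]
    print(run_count_query(data, lambda r: r["dept"], estimated_groups=2))
```

The caller states only what it wants (counts per key); which pattern runs is decided elsewhere, which is the separation the comment attributes to a declarative query plus an optimizer.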

By: Text Technologies»Blog Archive » 19 Microsoft/Yahoo synergies that could revolutionize the Internet

Sun, 03 Feb 2008 22:04:49 +0000

[…] Ditto. (Recent discussion of Google MapReduce quantifies this processing effort a […]