<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Three big myths about MapReduce</title>
	<atom:link href="http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Mon, 25 Jan 2010 14:39:21 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Clearing up MapReduce confusion, yet again &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/comment-page-1/#comment-154190</link>
		<dc:creator>Clearing up MapReduce confusion, yet again &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Wed, 30 Dec 2009 10:51:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1135#comment-154190</guid>
		<description>[...] frustrated by a constant need &#8212; or at least urge  &#8212; to correct myths and errors about MapReduce. Let&#8217;s try one more [...]</description>
		<content:encoded><![CDATA[<p>[...] frustrated by a constant need &#8212; or at least urge  &#8212; to correct myths and errors about MapReduce. Let&#8217;s try one more [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Cubegeek</title>
		<link>http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/comment-page-1/#comment-151145</link>
		<dc:creator>Cubegeek</dc:creator>
		<pubDate>Mon, 30 Nov 2009 14:12:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1135#comment-151145</guid>
		<description>This conversation makes me wonder if anyone has plans to extend MDX to include MR functions or context. After all, this was the language designed to handle multidimensional data as a standard.</description>
		<content:encoded><![CDATA[<p>This conversation makes me wonder if anyone has plans to extend MDX to include MR functions or context. After all, this was the language designed to handle multidimensional data as a standard.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Analytics Team &#187; Blog Archive &#187; Myths about MapReduce</title>
		<link>http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/comment-page-1/#comment-148349</link>
		<dc:creator>Analytics Team &#187; Blog Archive &#187; Myths about MapReduce</dc:creator>
		<pubDate>Sat, 07 Nov 2009 22:04:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1135#comment-148349</guid>
		<description>[...] DBMS2 takes a look at these three myths about mapreduce&#8230;
* MapReduce is something very new
* MapReduce involves strict adherence to the Map-Reduce programming paradigm
* MapReduce is a single technology [...]</description>
		<content:encoded><![CDATA[<p>[...] DBMS2 takes a look at these three myths about mapreduce&#8230;</p>
<ul>
<li>MapReduce is something very new</li>
<li>MapReduce involves strict adherence to the Map-Reduce programming paradigm</li>
<li>MapReduce is a single technology</li>
</ul>
<p>[...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: &#8230;und das Leben nach SQL geht weiter&#8230;jetzt wird reduziert! &#124; PHP hates me - Der PHP Blog</title>
		<link>http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/comment-page-1/#comment-144960</link>
		<dc:creator>&#8230;und das Leben nach SQL geht weiter&#8230;jetzt wird reduziert! &#124; PHP hates me - Der PHP Blog</dc:creator>
		<pubDate>Thu, 22 Oct 2009 06:15:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1135#comment-144960</guid>
		<description>[...] Critical voices on MapReduce: Three big myths about MapReduce, DBMS2, October 18, 2009 [...]</description>
		<content:encoded><![CDATA[<p>[...] Critical voices on MapReduce: Three big myths about MapReduce, DBMS2, October 18, 2009 [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Amrith Kumar</title>
		<link>http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/comment-page-1/#comment-144719</link>
		<dc:creator>Amrith Kumar</dc:creator>
		<pubDate>Tue, 20 Oct 2009 22:17:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1135#comment-144719</guid>
		<description>Steve Wooledge,

I am flattered that you confused me for David DeWitt and Stonebraker :) 

They are the ones who are quoted as saying MapReduce wasn&#039;t something new. MapReduce is a creation of Ghemawat and Dean.

All I&#039;m saying is that recent claims by many that they’ve been “doing MapReduce all along” are not entirely true (and not entirely false either).

I&#039;m not equating MR with the MPP redistribution framework, hence my comment that reads &quot;... the simple answer is this: they have been doing something VERY MUCH LIKE MapReduce all along&quot;.

Thanks,

-amrith</description>
		<content:encoded><![CDATA[<p>Steve Wooledge,</p>
<p>I am flattered that you confused me for David DeWitt and Stonebraker :)</p>
<p>They are the ones who are quoted as saying MapReduce wasn&#8217;t something new. MapReduce is a creation of Ghemawat and Dean.</p>
<p>All I&#8217;m saying is that recent claims by many that they’ve been “doing MapReduce all along” are not entirely true (and not entirely false either).</p>
<p>I&#8217;m not equating MR with the MPP redistribution framework, hence my comment that reads &#8220;&#8230; the simple answer is this: they have been doing something VERY MUCH LIKE MapReduce all along&#8221;.</p>
<p>Thanks,</p>
<p>-amrith</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve Wooledge</title>
		<link>http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/comment-page-1/#comment-144698</link>
		<dc:creator>Steve Wooledge</dc:creator>
		<pubDate>Tue, 20 Oct 2009 18:34:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1135#comment-144698</guid>
		<description>@Jerome: Our customers write SQL-MR functions to do computations on data that would have been extremely complicated, error-prone or slow-performing if done using only SQL. Therein lies a key motivator - customers consciously choose SQL-MR for convenience as opposed to being transparently locked-in. 
    

Our SQL-MR syntax goes a long way in ensuring that the relational model is preserved. For example, MR functions consume and produce relations; MR invocations are modeled as stored-procedure invocations. This means that a customer can migrate to non-Aster nCluster installations with an effort similar to migrating their user-defined functions from one platform to another.

The best part of our SQL-MR framework is that the MR functions are implemented in open languages chosen by the customer (e.g., Java, Perl, Python, C++, C#, Ruby, etc.). This means that the actual code is not proprietary to Aster nCluster, and code snippets and sub-functions can be reused on other platforms as well. In addition, the Map Reduce programming model is widely popular, which aids portability: one first structures the code according to the Map-Reduce design, and only secondarily expresses it in SQL-MR, to the extent that one uses features unique to our platform.
  
The libraries of Aster SQL-MR functions that we provide are, of course, proprietary. They contain innovations in data structures and processing that ensure high performance of the compute functions. We&#039;ve published the source code of some of these functions; for others, we&#039;ve published the algorithms but not the source code; for the rest, we may publish neither the code nor the algorithm. In fact, we leave partners who write SQL-MR functions complete discretion over whether to publish their functions or to provide only binaries that protect their IP.

The important point to note here is that we are committed to providing an open platform in which no single function is forced upon the end user.

===

@Amrith: Whenever an innovative system becomes mainstream, there are always claims that the innovation is nothing new! We went through this in the 1990s when Java appeared on the scene as well.

We cannot equate the Map-Reduce programming framework with the internal re-distribution mechanism for tuples in MPP databases. The Map-Reduce programming framework is innovative because it provides a way of attaining parallelism for arbitrary computations. The internal MPP DB tuple re-distribution mechanisms operated one tuple at a time, using a static hash function over a statically predefined number of partitions. The mechanism could not be reused by users or database applications - in fact, it could not be reused even by stored procedures that were part of the MPP DB framework.

If you are interested, please look at the Related Work section of our VLDB 2009 conference paper. http://www.asterdata.com/resources/downloads/whitepapers/sqlmr.pdf</description>
		<content:encoded><![CDATA[<p>@Jerome: Our customers write SQL-MR functions to do computations on data that would have been extremely complicated, error-prone or slow-performing if done using only SQL. Therein lies a key motivator &#8211; customers consciously choose SQL-MR for convenience as opposed to being transparently locked-in. </p>
<p>Our SQL-MR syntax goes a long way in ensuring that the relational model is preserved. For example, MR functions consume and produce relations; MR invocations are modeled as stored-procedure invocations. This means that a customer can migrate to non-Aster nCluster installations with an effort similar to migrating their user-defined functions from one platform to another.</p>
<p>The best part of our SQL-MR framework is that the MR functions are implemented in open languages chosen by the customer (e.g., Java, Perl, Python, C++, C#, Ruby, etc.). This means that the actual code is not proprietary to Aster nCluster, and code snippets and sub-functions can be reused on other platforms as well. In addition, the Map Reduce programming model is widely popular, which aids portability: one first structures the code according to the Map-Reduce design, and only secondarily expresses it in SQL-MR, to the extent that one uses features unique to our platform.</p>
<p>The libraries of Aster SQL-MR functions that we provide are, of course, proprietary. They contain innovations in data structures and processing that ensure high performance of the compute functions. We&#8217;ve published the source code of some of these functions; for others, we&#8217;ve published the algorithms but not the source code; for the rest, we may publish neither the code nor the algorithm. In fact, we leave partners who write SQL-MR functions complete discretion over whether to publish their functions or to provide only binaries that protect their IP.</p>
<p>The important point to note here is that we are committed to providing an open platform in which no single function is forced upon the end user.</p>
<p>===</p>
<p>@Amrith: Whenever an innovative system becomes mainstream, there are always claims that the innovation is nothing new! We went through this in the 1990s when Java appeared on the scene as well.</p>
<p>We cannot equate the Map-Reduce programming framework with the internal re-distribution mechanism for tuples in MPP databases. The Map-Reduce programming framework is innovative because it provides a way of attaining parallelism for arbitrary computations. The internal MPP DB tuple re-distribution mechanisms operated one tuple at a time, using a static hash function over a statically predefined number of partitions. The mechanism could not be reused by users or database applications &#8211; in fact, it could not be reused even by stored procedures that were part of the MPP DB framework.</p>
<p>If you are interested, please look at the Related Work section of our VLDB 2009 conference paper. <a href="http://www.asterdata.com/resources/downloads/whitepapers/sqlmr.pdf" rel="nofollow">http://www.asterdata.com/resources/downloads/whitepapers/sqlmr.pdf</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Mount</title>
		<link>http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/comment-page-1/#comment-144586</link>
		<dc:creator>John Mount</dc:creator>
		<pubDate>Mon, 19 Oct 2009 17:13:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1135#comment-144586</guid>
		<description>A good point, but while Map Reduce is not new, I feel it emphasized clarity and simplicity (at least for the problem of sorting), which is probably why it is easier to market than MPI or a database. I wrote a bit on this point some time ago: http://www.win-vector.com/blog/2009/01/map-reduce-a-good-idea/</description>
		<content:encoded><![CDATA[<p>A good point, but while Map Reduce is not new, I feel it emphasized clarity and simplicity (at least for the problem of sorting), which is probably why it is easier to market than MPI or a database. I wrote a bit on this point some time ago: <a href="http://www.win-vector.com/blog/2009/01/map-reduce-a-good-idea/" rel="nofollow">http://www.win-vector.com/blog/2009/01/map-reduce-a-good-idea/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: uberVU - social comments</title>
		<link>http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/comment-page-1/#comment-144563</link>
		<dc:creator>uberVU - social comments</dc:creator>
		<pubDate>Mon, 19 Oct 2009 11:06:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1135#comment-144563</guid>
		<description>&lt;strong&gt;Social comments and analytics for this post...&lt;/strong&gt;

This post was mentioned on Twitter by jameskobielus: Read @CurtMonash on MapReduce (http://bit.ly/2pdJ1W). None of this brand new. Nor is it true standard. Vendor implementations vary widely....</description>
		<content:encoded><![CDATA[<p><strong>Social comments and analytics for this post&#8230;</strong></p>
<p>This post was mentioned on Twitter by jameskobielus: Read @CurtMonash on MapReduce (<a href="http://bit.ly/2pdJ1W" rel="nofollow">http://bit.ly/2pdJ1W</a>). None of this brand new. Nor is it true standard. Vendor implementations vary widely&#8230;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Amrith Kumar</title>
		<link>http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/comment-page-1/#comment-144540</link>
		<dc:creator>Amrith Kumar</dc:creator>
		<pubDate>Mon, 19 Oct 2009 00:38:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1135#comment-144540</guid>
		<description>Jerome&#039;s point is dead on; SQL/MR is analogous to a vendor&#039;s custom SQL extensions. That is one of my concerns about these MR extensions: they introduce vendor lock-in.

And as for the recent claim by many that they&#039;ve been &quot;doing MapReduce all along&quot;, the simple answer is this: they have been doing something VERY MUCH LIKE MapReduce all along.

MPP databases horizontally partition the data and process partitions on distinct nodes. MapReduce does not partition the data a priori; it does so at runtime. The MPP implementations that I am familiar with always partition persistent data (tables) a priori, with provisions to redistribute the data as part of query processing.

Dean &amp; Ghemawat write, &quot;The Map invocations are distributed across multiple machines by automatically partitioning the input data into a set of M splits. The input splits can be processed in parallel by different machines. Reduce invocations are distributed by partitioning the intermediate key space into R pieces using a partitioning function (e.g., hash(key) mod R). The number of partitions (R) and the partitioning function are specified by the user.&quot;

Each MPP implementation has a different name for the mechanism that performs this splitting. In effect, therefore, MapReduce is another mechanism for MPP&#039;izing the solution to a problem, and there is some merit to the claim by MPP database vendors that they&#039;ve been doing MapReduce all along.</description>
		<content:encoded><![CDATA[<p>Jerome&#8217;s point is dead on; SQL/MR is analogous to a vendor&#8217;s custom SQL extensions. That is one of my concerns about these MR extensions: they introduce vendor lock-in.</p>
<p>And as for the recent claim by many that they&#8217;ve been &#8220;doing MapReduce all along&#8221;, the simple answer is this: they have been doing something VERY MUCH LIKE MapReduce all along.</p>
<p>MPP databases horizontally partition the data and process partitions on distinct nodes. MapReduce does not partition the data a priori; it does so at runtime. The MPP implementations that I am familiar with always partition persistent data (tables) a priori, with provisions to redistribute the data as part of query processing.</p>
<p>Dean &amp; Ghemawat write, &#8220;The Map invocations are distributed across multiple machines by automatically partitioning the input data into a set of M splits. The input splits can be processed in parallel by different machines. Reduce invocations are distributed by partitioning the intermediate key space into R pieces using a partitioning function (e.g., hash(key) mod R). The number of partitions (R) and the partitioning function are specified by the user.&#8221;</p>
<p>Each MPP implementation has a different name for the mechanism that performs this splitting. In effect, therefore, MapReduce is another mechanism for MPP&#8217;izing the solution to a problem, and there is some merit to the claim by MPP database vendors that they&#8217;ve been doing MapReduce all along.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/comment-page-1/#comment-144506</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Sun, 18 Oct 2009 17:01:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1135#comment-144506</guid>
		<description>Jerome,

The Aster syntax is Aster-specific, just as if you used any other vendor&#039;s proprietary SQL extensions.

CAM</description>
		<content:encoded><![CDATA[<p>Jerome,</p>
<p>The Aster syntax is Aster-specific, just as if you used any other vendor&#8217;s proprietary SQL extensions.</p>
<p>CAM</p>
]]></content:encoded>
	</item>
</channel>
</rss>

