<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: MapReduce for data mining?  Maybe for variable-schema analytics.</title>
	<atom:link href="http://www.dbms2.com/2008/01/19/mapreduce-variable-schema-analytics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2008/01/19/mapreduce-variable-schema-analytics/</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Sat, 31 Jul 2010 00:21:55 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: More Google reliability woes &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2008/01/19/mapreduce-variable-schema-analytics/#comment-95582</link>
		<dc:creator>More Google reliability woes &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Mon, 25 Aug 2008 07:50:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2008/01/19/mapreduce-variable-schema-analytics/#comment-95582</guid>
		<description>[...] reliability issues are ever worse. As I previously pointed out, this is evidence against the notion that MapReduce is a replacement for established DBMS. [...]</description>
		<content:encoded><![CDATA[<p>[...] reliability issues are ever worse. As I previously pointed out, this is evidence against the notion that MapReduce is a replacement for established DBMS. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stuart Frost</title>
		<link>http://www.dbms2.com/2008/01/19/mapreduce-variable-schema-analytics/#comment-68632</link>
		<dc:creator>Stuart Frost</dc:creator>
		<pubDate>Tue, 22 Jan 2008 00:29:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2008/01/19/mapreduce-variable-schema-analytics/#comment-68632</guid>
		<description>This is all interesting on a number of fronts.

First of all, the critique of MapReduce by DeWitt and Stonebraker is breathtakingly arrogant. MapReduce was clearly not designed to solve the same problems as an RDBMS, so it&#039;s strange to criticize it for not having the same functionality. As for the comment that MapReduce will be difficult to scale - well, it&#039;s hard to argue with 20PB per day!

Google&#039;s benchmarks are also pretty revealing. Using 1,800 servers to grep through 1TB of data in 2.5 minutes is incredibly inefficient. Using user-defined functions (UDFs) in one of our appliances, I estimate that we&#039;d get through the same amount of work on fewer than 16 nodes - maybe as few as eight, given the likelihood of higher-than-normal compression ratios. I&#039;m not sure how fast the sort would run on our appliances, but over 800s to sort 1TB on 1,800 servers also seems very, very slow - as do the I/O rates shown on the charts.

Seems like they are just throwing an awful lot of hardware at the problem - don&#039;t tell Al Gore!

Stuart Frost
CEO, DATAllegro</description>
		<content:encoded><![CDATA[<p>This is all interesting on a number of fronts.</p>
<p>First of all, the critique of MapReduce by DeWitt and Stonebraker is breathtakingly arrogant. MapReduce was clearly not designed to solve the same problems as an RDBMS, so it&#8217;s strange to criticize it for not having the same functionality. As for the comment that MapReduce will be difficult to scale &#8211; well, it&#8217;s hard to argue with 20PB per day!</p>
<p>Google&#8217;s benchmarks are also pretty revealing. Using 1,800 servers to grep through 1TB of data in 2.5 minutes is incredibly inefficient. Using user-defined functions (UDFs) in one of our appliances, I estimate that we&#8217;d get through the same amount of work on fewer than 16 nodes &#8211; maybe as few as eight, given the likelihood of higher-than-normal compression ratios. I&#8217;m not sure how fast the sort would run on our appliances, but over 800s to sort 1TB on 1,800 servers also seems very, very slow &#8211; as do the I/O rates shown on the charts.</p>
<p>Seems like they are just throwing an awful lot of hardware at the problem &#8211; don&#8217;t tell Al Gore!</p>
<p>Stuart Frost<br />
CEO, DATAllegro</p>
]]></content:encoded>
	</item>
</channel>
</rss>
