<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Fault-tolerant queries</title>
	<atom:link href="http://www.dbms2.com/2009/09/13/fault-tolerant-queries/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2009/09/13/fault-tolerant-queries/</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 22:51:14 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
	<item>
		<title>By: RYW (Read-Your-Writes) consistency explained &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2009/09/13/fault-tolerant-queries/#comment-167077</link>
		<dc:creator>RYW (Read-Your-Writes) consistency explained &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Sat, 01 May 2010 05:18:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=884#comment-167077</guid>
		<description>[...] Query fault-tolerance [...]</description>
		<content:encoded><![CDATA[<p>[...] Query fault-tolerance [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Oracle&#8217;s version of &#8220;actually, we&#8217;ve been doing MapReduce all along too&#8221; &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2009/09/13/fault-tolerant-queries/#comment-142608</link>
		<dc:creator>Oracle&#8217;s version of &#8220;actually, we&#8217;ve been doing MapReduce all along too&#8221; &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Tue, 06 Oct 2009 10:02:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=884#comment-142608</guid>
		<description>[...] query result materialization. Presumably, then, Oracle&#8217;s quasi-MapReduce would also lack query fault-tolerance. [...]</description>
		<content:encoded><![CDATA[<p>[...] query result materialization. Presumably, then, Oracle&#8217;s quasi-MapReduce would also lack query fault-tolerance. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Amrith Kumar</title>
		<link>http://www.dbms2.com/2009/09/13/fault-tolerant-queries/#comment-142478</link>
		<dc:creator>Amrith Kumar</dc:creator>
		<pubDate>Sun, 04 Oct 2009 16:53:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=884#comment-142478</guid>
		<description>Query fault tolerance is a big deal with big databases. When the database runs on top of some RAID-X storage, a single drive failure (the most common failure) could go unnoticed. But in systems with one drive per node, a single drive failure could cause a hiccup that the database itself must handle.

At issue is the fact that while drive densities have gone up (higher capacity per drive), the failure rate per GB per year has not gone down in step.

MapReduce provides the ability to restart a node and have the MapReduce program continue to run; the same is not the case with pipelined MapReduce programs or with programs that have multiple stages, one of which is MapReduce.</description>
		<content:encoded><![CDATA[<p>Query fault tolerance is a big deal with big databases. When the database runs on top of some RAID-X storage, a single drive failure (the most common failure) could go unnoticed. But in systems with one drive per node, a single drive failure could cause a hiccup that the database itself must handle.</p>
<p>At issue is the fact that while drive densities have gone up (higher capacity per drive), the failure rate per GB per year has not gone down in step.</p>
<p>MapReduce provides the ability to restart a node and have the MapReduce program continue to run; the same is not the case with pipelined MapReduce programs or with programs that have multiple stages, one of which is MapReduce.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt</title>
		<link>http://www.dbms2.com/2009/09/13/fault-tolerant-queries/#comment-140344</link>
		<dc:creator>Matt</dc:creator>
		<pubDate>Mon, 14 Sep 2009 18:12:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=884#comment-140344</guid>
		<description>I was told by Aster that they don&#039;t have query fault tolerance, but if a query fails in flight it will automatically restart from the beginning. This is with their current 3.x version. I&#039;m not sure what the 4.x version will have.

With regard to commodity hardware: don&#039;t 95% of the vendors out there run on commodity hardware, albeit high-end commodity hardware?

When I hear &quot;commodity hardware&quot; I think of that box sitting in my garage... I think the phrase is overused...</description>
		<content:encoded><![CDATA[<p>I was told by Aster that they don&#8217;t have query fault tolerance, but if a query fails in flight it will automatically restart from the beginning. This is with their current 3.x version. I&#8217;m not sure what the 4.x version will have.</p>
<p>With regard to commodity hardware: don&#8217;t 95% of the vendors out there run on commodity hardware, albeit high-end commodity hardware?</p>
<p>When I hear &#8220;commodity hardware&#8221; I think of that box sitting in my garage&#8230; I think the phrase is overused&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Unholyguy</title>
		<link>http://www.dbms2.com/2009/09/13/fault-tolerant-queries/#comment-140280</link>
		<dc:creator>Unholyguy</dc:creator>
		<pubDate>Sun, 13 Sep 2009 19:37:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=884#comment-140280</guid>
		<description>I guess you can make an argument along those lines for storage. I&#039;m not sure there is really much differentiation in servers or networking anymore.</description>
		<content:encoded><![CDATA[<p>I guess you can make an argument along those lines for storage. I&#8217;m not sure there is really much differentiation in servers or networking anymore.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/09/13/fault-tolerant-queries/#comment-140274</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Sun, 13 Sep 2009 17:07:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=884#comment-140274</guid>
		<description>Actually, a big theme in MapReduce is &quot;true commodity&quot; hardware vs. &quot;enterprise-class non-proprietary hardware&quot;.  That&#039;s in the Google data center story even before MapReduce was popularized. It&#039;s central to Facebook&#039;s fondness for Hadoop. Etc., etc.

And one of Aster Data&#039;s messages is &quot;Unlike those other companies in the same category as us, we let you get by with true commodity hardware.&quot;</description>
		<content:encoded><![CDATA[<p>Actually, a big theme in MapReduce is &#8220;true commodity&#8221; hardware vs. &#8220;enterprise-class non-proprietary hardware&#8221;.  That&#8217;s in the Google data center story even before MapReduce was popularized. It&#8217;s central to Facebook&#8217;s fondness for Hadoop. Etc., etc.</p>
<p>And one of Aster Data&#8217;s messages is &#8220;Unlike those other companies in the same category as us, we let you get by with true commodity hardware.&#8221;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Unholyguy</title>
		<link>http://www.dbms2.com/2009/09/13/fault-tolerant-queries/#comment-140272</link>
		<dc:creator>Unholyguy</dc:creator>
		<pubDate>Sun, 13 Sep 2009 16:36:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=884#comment-140272</guid>
		<description>Good analysis. In a way, that ratio serves as the upper end of the current scalability of distributed RDBMSs. That upper end is pretty high. Nowadays it seems like multi-hour queries are becoming increasingly rare in most enterprise warehouses.

However, I would argue with this statement: &quot;Using cheap/low-end/commodity hardware — as Hadoop fans like to do — increases the (node-hours)/MTTF ratio&quot;

It&#039;s not the commodity hardware that is the issue; Teradata is mostly commodity hardware. It&#039;s not even MapReduce. I think it&#039;s the Hadoop approach to pipelining and block-based partitioning, which goes down a brute-force route rather than a clever query-optimization route.

I think the concept that machines have no identities whatsoever is pretty powerful in general, beyond the restartable-queries issue. It&#039;s made possible a lot of the EC2 work Cloudera has done.

It could also allow some interesting virtualization-like approaches to queries. For instance, it would theoretically be possible in the Hadoop architecture to copy a query in mid-execution to another cluster, or to pause it indefinitely.</description>
		<content:encoded><![CDATA[<p>Good analysis. In a way, that ratio serves as the upper end of the current scalability of distributed RDBMSs. That upper end is pretty high. Nowadays it seems like multi-hour queries are becoming increasingly rare in most enterprise warehouses.</p>
<p>However, I would argue with this statement: &#8220;Using cheap/low-end/commodity hardware — as Hadoop fans like to do — increases the (node-hours)/MTTF ratio&#8221;</p>
<p>It&#8217;s not the commodity hardware that is the issue; Teradata is mostly commodity hardware. It&#8217;s not even MapReduce. I think it&#8217;s the Hadoop approach to pipelining and block-based partitioning, which goes down a brute-force route rather than a clever query-optimization route.</p>
<p>I think the concept that machines have no identities whatsoever is pretty powerful in general, beyond the restartable-queries issue. It&#8217;s made possible a lot of the EC2 work Cloudera has done.</p>
<p>It could also allow some interesting virtualization-like approaches to queries. For instance, it would theoretically be possible in the Hadoop architecture to copy a query in mid-execution to another cluster, or to pause it indefinitely.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/09/13/fault-tolerant-queries/#comment-140267</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Sun, 13 Sep 2009 16:06:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=884#comment-140267</guid>
		<description>Todd,

See Joydeep&#039;s explanation in http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/

Short answer: One Hive job can comprise many MapReduce jobs.

I&#039;m also editing my post above for clarity.</description>
		<content:encoded><![CDATA[<p>Todd,</p>
<p>See Joydeep&#8217;s explanation in <a href="http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/" rel="nofollow">http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/</a></p>
<p>Short answer: One Hive job can comprise many MapReduce jobs.</p>
<p>I&#8217;m also editing my post above for clarity.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Todd Lipcon</title>
		<link>http://www.dbms2.com/2009/09/13/fault-tolerant-queries/#comment-140249</link>
		<dc:creator>Todd Lipcon</dc:creator>
		<pubDate>Sun, 13 Sep 2009 07:33:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=884#comment-140249</guid>
		<description>How is it that you classify Hive as not having fault tolerance? Hive&#039;s execution layer is MapReduce jobs on Hadoop, and thus has the same fault tolerance properties as Hadoop jobs in general. Failed tasks will be re-executed as necessary up to a user-configurable threshold.

-Todd</description>
		<content:encoded><![CDATA[<p>How is it that you classify Hive as not having fault tolerance? Hive&#8217;s execution layer is MapReduce jobs on Hadoop, and thus has the same fault tolerance properties as Hadoop jobs in general. Failed tasks will be re-executed as necessary up to a user-configurable threshold.</p>
<p>-Todd</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: HadoopDB &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2009/09/13/fault-tolerant-queries/#comment-140241</link>
		<dc:creator>HadoopDB &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Sun, 13 Sep 2009 04:59:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=884#comment-140241</guid>
		<description>[...] Query fault-tolerance [...]</description>
		<content:encoded><![CDATA[<p>[...] Query fault-tolerance [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

