<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How much state is saved when an MPP DBMS node fails?</title>
	<atom:link href="http://www.dbms2.com/2009/05/12/how-much-state-is-saved-when-an-mpp-dbms-node-fails/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2009/05/12/how-much-state-is-saved-when-an-mpp-dbms-node-fails/</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 13:48:12 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
	<item>
		<title>By: Shawn Fox</title>
		<link>http://www.dbms2.com/2009/05/12/how-much-state-is-saved-when-an-mpp-dbms-node-fails/#comment-122968</link>
		<dc:creator>Shawn Fox</dc:creator>
		<pubDate>Wed, 27 May 2009 18:43:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=778#comment-122968</guid>
		<description>With Netezza, when a SPU fails during query execution, all select statements that have not started returning data get restarted from the beginning.

Any data loads are killed and must be restarted manually.

Select statements which have started returning data are killed and must be restarted manually.

Fortunately failures are not very common so it is rarely an issue.</description>
		<content:encoded><![CDATA[<p>With Netezza, when a SPU fails during query execution, all select statements that have not started returning data get restarted from the beginning.</p>
<p>Any data loads are killed and must be restarted manually.</p>
<p>Select statements which have started returning data are killed and must be restarted manually.</p>
<p>Fortunately failures are not very common so it is rarely an issue.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Omer Trajman</title>
		<link>http://www.dbms2.com/2009/05/12/how-much-state-is-saved-when-an-mpp-dbms-node-fails/#comment-121600</link>
		<dc:creator>Omer Trajman</dc:creator>
		<pubDate>Fri, 15 May 2009 00:52:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=778#comment-121600</guid>
		<description>Currently, when a Vertica node fails, the cluster cancels any statements that are in flight, and the user is immediately able to re-run them.  Transactions that are in flight are preserved (i.e. we keep transaction state).  The user receives a statement-level error, similar to what they would get in the case of a lock timeout or a policy-based quota or resource rejection.  Feedback from our users is that this model makes development very simple.  They don’t need to handle node failure as a special case.  When a statement fails, they can just re-run it.
 
Of course, if the machine you are connected to fails, then the transactions that it initiated are rolled back and the user needs to connect to a new machine.  Since all nodes in a Vertica cluster are peers, users can connect to any node, and new connections are rerouted to a live node automatically when using load-balancing software.</description>
		<content:encoded><![CDATA[<p>Currently, when a Vertica node fails, the cluster cancels any statements that are in flight, and the user is immediately able to re-run them.  Transactions that are in flight are preserved (i.e. we keep transaction state).  The user receives a statement-level error, similar to what they would get in the case of a lock timeout or a policy-based quota or resource rejection.  Feedback from our users is that this model makes development very simple.  They don’t need to handle node failure as a special case.  When a statement fails, they can just re-run it.</p>
<p>Of course, if the machine you are connected to fails, then the transactions that it initiated are rolled back and the user needs to connect to a new machine.  Since all nodes in a Vertica cluster are peers, users can connect to any node, and new connections are rerouted to a live node automatically when using load-balancing software.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe Hellerstein</title>
		<link>http://www.dbms2.com/2009/05/12/how-much-state-is-saved-when-an-mpp-dbms-node-fails/#comment-121523</link>
		<dc:creator>Joe Hellerstein</dc:creator>
		<pubDate>Thu, 14 May 2009 15:01:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=778#comment-121523</guid>
		<description>Daniel:
Google&#039;s MR paper (which Hadoop folks follow closely) sez:

1) Mappers write outputs to their local disk.  (And some Map jobs are done redundantly on &gt;1 machine to mitigate &quot;stragglers&quot;)
2) Reducers fetch from mappers over the network some time later
3) Reducers write their outputs to the distributed filesystem (triply replicated)

So in terms of resource consumption (energy, disk utilization) that&#039;s a lot of I/O.

Your points are on target: some reads can be absorbed by filesystem cache, and there is overlap of CPU, disk I/O, and network I/O.  The latter only affects completion time, not resource consumption.

J</description>
		<content:encoded><![CDATA[<p>Daniel:<br />
Google&#8217;s MR paper (which Hadoop folks follow closely) sez:</p>
<p>1) Mappers write outputs to their local disk.  (And some Map jobs are done redundantly on &gt;1 machine to mitigate &#8220;stragglers&#8221;)<br />
2) Reducers fetch from mappers over the network some time later<br />
3) Reducers write their outputs to the distributed filesystem (triply replicated)</p>
<p>So in terms of resource consumption (energy, disk utilization) that&#8217;s a lot of I/O.</p>
<p>Your points are on target: some reads can be absorbed by filesystem cache, and there is overlap of CPU, disk I/O, and network I/O.  The latter only affects completion time, not resource consumption.</p>
<p>J</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Weinreb</title>
		<link>http://www.dbms2.com/2009/05/12/how-much-state-is-saved-when-an-mpp-dbms-node-fails/#comment-121516</link>
		<dc:creator>Daniel Weinreb</dc:creator>
		<pubDate>Thu, 14 May 2009 14:19:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=778#comment-121516</guid>
		<description>Joe, everything you&#039;re saying sounds right.

But why would Hadoop have to reread &quot;over and over&quot;? I thought it only reads that data if it&#039;s recovering from a crash.

Also, is it possible that the Hadoop node writing the checkpoint could be doing the disk writing and its computation in parallel, which would reduce the cost of the checkpointing to at least some, and perhaps a great, degree?</description>
		<content:encoded><![CDATA[<p>Joe, everything you&#8217;re saying sounds right.</p>
<p>But why would Hadoop have to reread &#8220;over and over&#8221;? I thought it only reads that data if it&#8217;s recovering from a crash.</p>
<p>Also, is it possible that the Hadoop node writing the checkpoint could be doing the disk writing and its computation in parallel, which would reduce the cost of the checkpointing to at least some, and perhaps a great, degree?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe Hellerstein</title>
		<link>http://www.dbms2.com/2009/05/12/how-much-state-is-saved-when-an-mpp-dbms-node-fails/#comment-121480</link>
		<dc:creator>Joe Hellerstein</dc:creator>
		<pubDate>Thu, 14 May 2009 08:41:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=778#comment-121480</guid>
		<description>Roughly speaking, most MPP databases use dataflow pipelines via &lt;a href=&quot;http://www.hpl.hp.com/personal/Goetz_Graefe/&quot; rel=&quot;nofollow&quot;&gt;Graefe&#039;s&lt;/a&gt; famous &lt;a href=&quot;http://www.informatik.uni-trier.de/~ley/db/conf/sigmod/Graefe90.html&quot; rel=&quot;nofollow&quot;&gt;Exchange&lt;/a&gt; model.  Those pipelines reflect an extremely optimistic view of reliability, and require expensive restarts of deep dataflow pipelines in the case of even a single fault.

By contrast, Hadoop (as per the Google MapReduce paper) is wildly pessimistic, checkpointing the output of &lt;i&gt;every single&lt;/i&gt; Map or Reduce stage to disks, before reading it right back in.  As a result, it&#039;s easy to construct cases where a traditional MPP DB would do no more I/O than the scan of the inputs, whereas a Hadoop job might need to write and reread stages of the pipeline over and over.  (I describe this to my undergrads as vomiting the data to disk just to immediately swallow it back up.)

The best answer probably lies either in between, or in an entirely different approach.  More on that question in &lt;a href=&quot;http://databeta.wordpress.com/2009/05/14/bigdata-node-density/&quot; rel=&quot;nofollow&quot;&gt;this post&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>Roughly speaking, most MPP databases use dataflow pipelines via <a href="http://www.hpl.hp.com/personal/Goetz_Graefe/" rel="nofollow">Graefe&#8217;s</a> famous <a href="http://www.informatik.uni-trier.de/~ley/db/conf/sigmod/Graefe90.html" rel="nofollow">Exchange</a> model.  Those pipelines reflect an extremely optimistic view of reliability, and require expensive restarts of deep dataflow pipelines in the case of even a single fault.</p>
<p>By contrast, Hadoop (as per the Google MapReduce paper) is wildly pessimistic, checkpointing the output of <i>every single</i> Map or Reduce stage to disks, before reading it right back in.  As a result, it&#8217;s easy to construct cases where a traditional MPP DB would do no more I/O than the scan of the inputs, whereas a Hadoop job might need to write and reread stages of the pipeline over and over.  (I describe this to my undergrads as vomiting the data to disk just to immediately swallow it back up.)</p>
<p>The best answer probably lies either in between, or in an entirely different approach.  More on that question in <a href="http://databeta.wordpress.com/2009/05/14/bigdata-node-density/" rel="nofollow">this post</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Serge Rielau</title>
		<link>http://www.dbms2.com/2009/05/12/how-much-state-is-saved-when-an-mpp-dbms-node-fails/#comment-121288</link>
		<dc:creator>Serge Rielau</dc:creator>
		<pubDate>Wed, 13 May 2009 11:20:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=778#comment-121288</guid>
		<description>As far as DB2 for LUW is concerned, if a query fails in a DPF environment it needs to get restarted. Whether the reason is a node failure or something else is irrelevant.
No intermediate results are rescued - with the obvious exception of a primed bufferpool of course.</description>
		<content:encoded><![CDATA[<p>As far as DB2 for LUW is concerned, if a query fails in a DPF environment it needs to get restarted. Whether the reason is a node failure or something else is irrelevant.<br />
No intermediate results are rescued &#8211; with the obvious exception of a primed bufferpool of course.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/05/12/how-much-state-is-saved-when-an-mpp-dbms-node-fails/#comment-121197</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Tue, 12 May 2009 21:04:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=778#comment-121197</guid>
		<description>LOL, Robert!</description>
		<content:encoded><![CDATA[<p>LOL, Robert!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/05/12/how-much-state-is-saved-when-an-mpp-dbms-node-fails/#comment-121196</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Tue, 12 May 2009 21:04:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=778#comment-121196</guid>
		<description>Steve Wooledge of Aster took a crack at this question over in the Facebook/Hadoop/Hive thread.</description>
		<content:encoded><![CDATA[<p>Steve Wooledge of Aster took a crack at this question over in the Facebook/Hadoop/Hive thread.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Masroor Rasheed</title>
		<link>http://www.dbms2.com/2009/05/12/how-much-state-is-saved-when-an-mpp-dbms-node-fails/#comment-121193</link>
		<dc:creator>Masroor Rasheed</dc:creator>
		<pubDate>Tue, 12 May 2009 20:45:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=778#comment-121193</guid>
		<description>If a node fails in a clique, there is a performance penalty for any query or load job. Once the node is recovered, it will rejoin the MPP.
It is important to know what level of fault tolerance is in place.</description>
		<content:encoded><![CDATA[<p>If a node fails in a clique, there is a performance penalty for any query or load job. Once the node is recovered, it will rejoin the MPP.<br />
It is important to know what level of fault tolerance is in place.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robert Morton</title>
		<link>http://www.dbms2.com/2009/05/12/how-much-state-is-saved-when-an-mpp-dbms-node-fails/#comment-121188</link>
		<dc:creator>Robert Morton</dc:creator>
		<pubDate>Tue, 12 May 2009 19:44:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=778#comment-121188</guid>
		<description>An MPP RDBMS built by Chuck Norris never experiences node failures.</description>
		<content:encoded><![CDATA[<p>An MPP RDBMS built by Chuck Norris never experiences node failures.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

