Comments on: Fault-tolerant queries

By: RYW (Read-Your-Writes) consistency explained | DBMS2 -- DataBase Management System Services

Sat, 01 May 2010 05:18:54 +0000

[…] Query fault-tolerance […]

By: Oracle’s version of “actually, we’ve been doing MapReduce all along too” | DBMS2 -- DataBase Management System Services

Tue, 06 Oct 2009 10:02:41 +0000

[…] query result materialization. Presumably, then, Oracle’s quasi-MapReduce would also lack query fault-tolerance. […]

By: Amrith Kumar

Amrith Kumar — Sun, 04 Oct 2009 16:53:39 +0000

Query Fault Tolerance is a big deal with big databases. In cases where the database is running on top of some RAID-X storage, a single drive failure (the most common failure) could go unnoticed. But, in the case of systems that have one-drive-per-node, a single drive failure could cause a hiccup that must be handled in the database.

At issue is the fact that while drive densities have gone up (higher capacity per drive), the failure rate per GB per year has not gone down in sync.

MapReduce provides the ability to restart a node and have the MapReduce program continue to run; the same is not the case with pipelined MapReduce programs or programs that have multiple stages, one of which is MapReduce.

By: Matt

Matt — Mon, 14 Sep 2009 18:12:39 +0000

I was told by Aster that they don’t have query fault tolerance, but that if a query fails in flight it will automatically restart from the beginning. This is with their current 3.x version. I’m not sure what the 4.x version will have.

With regard to commodity hardware: don’t 95% of the vendors out there run on commodity hardware, albeit high-end commodity hardware?

When I hear commodity hardware I think of that box sitting in my garage… I think the phrase is overused…

By: Unholyguy

Unholyguy — Sun, 13 Sep 2009 19:37:23 +0000

I guess you can make an argument for storage along those lines. I’m not sure there is really much differentiation in servers or networks anymore.

By: Curt Monash

Curt Monash — Sun, 13 Sep 2009 17:07:00 +0000

Actually, a big theme in MapReduce is “true commodity” hardware vs. “enterprise-class non-proprietary hardware”. That’s in the Google data center story even before MapReduce was popularized. It’s central to Facebook’s fondness for Hadoop. Etc., etc.

And one of Aster Data’s messages is “Unlike those other companies in the same category as us, we let you get by with true commodity hardware.”

By: Unholyguy

Unholyguy — Sun, 13 Sep 2009 16:36:04 +0000

Good analysis. In a way, that ratio serves as the upper end of the current scalability of distributed RDBMSs. That upper end is pretty high. Nowadays it seems like multi-hour queries are becoming more and more rare for most enterprise warehouses.

However, I do take issue with this statement: “Using cheap/low-end/commodity hardware — as Hadoop fans like to do — increases the (node-hours)/MTTF ratio.”

It’s not the commodity hardware that is the issue; Teradata is mostly commodity hardware. It’s not even MapReduce. I think it’s the Hadoop approach to pipelining and block-based partitioning, which chooses to go down a brute-force route rather than a clever query-optimization route.
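
One way to read the (node-hours)/MTTF ratio quoted above: under the simplifying assumption that nodes fail independently at a constant rate of 1/MTTF, the ratio is the expected number of node failures during a query. A minimal Python sketch follows; the function names and the example figures are hypothetical, not taken from any vendor.

```python
import math


def expected_failures(nodes: int, query_hours: float, mttf_hours: float) -> float:
    """(node-hours) / MTTF: expected node failures while the query runs.

    Assumes independent nodes with a constant failure rate of 1 / MTTF.
    """
    node_hours = nodes * query_hours
    return node_hours / mttf_hours


def prob_no_failure(nodes: int, query_hours: float, mttf_hours: float) -> float:
    """Chance the query finishes with no node failure (exponential model)."""
    return math.exp(-expected_failures(nodes, query_hours, mttf_hours))


if __name__ == "__main__":
    # Made-up figures: 100 nodes, a 2-hour query, 10,000-hour MTTF per node.
    print(expected_failures(100, 2.0, 10_000.0))
    print(prob_no_failure(100, 2.0, 10_000.0))
```

With these made-up figures the ratio is 0.02 expected failures per query. More nodes, longer queries, or a lower MTTF all push it up, whatever the reason for the lower MTTF.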

I think the concept that machines do not have identities whatsoever is pretty powerful in general, outside of the restartable queries thing. It’s made possible a lot of the EC2 work Cloudera has done.

It also could allow some interesting virtualization-like approaches to queries. For instance, it would be theoretically possible in the Hadoop architecture to copy a query in mid-execution to another cluster, or to pause it indefinitely.

By: Curt Monash

Curt Monash — Sun, 13 Sep 2009 16:06:57 +0000

Todd,

See Joydeep’s explanation in http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/

Short answer: One Hive job can comprise many MapReduce jobs.

I’m also editing my post above for clarity.

By: Todd Lipcon

Todd Lipcon — Sun, 13 Sep 2009 07:33:36 +0000

How is it that you classify Hive as not having fault tolerance? Hive’s execution layer is MapReduce jobs on Hadoop, and thus has the same fault tolerance properties as Hadoop jobs in general. Failed tasks will be re-executed as necessary up to a user-configurable threshold.

-Todd
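
The task-level re-execution described in this comment can be sketched roughly as follows. This is an illustration only and does not use Hadoop's actual API: `TaskFailed`, `run_task_with_retries`, `run_job`, and `flaky` are hypothetical names, and `max_attempts` stands in for the user-configurable threshold.

```python
from typing import Callable, List


class TaskFailed(Exception):
    """Raised when a single task attempt does not complete."""


def run_task_with_retries(task: Callable[[], str], max_attempts: int) -> str:
    """Run one task, re-executing it on failure up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task()
        except TaskFailed as err:
            last_error = err  # only this task is retried, not the whole job
    raise TaskFailed(f"gave up after {max_attempts} attempts") from last_error


def run_job(tasks: List[Callable[[], str]], max_attempts: int) -> List[str]:
    """A job is a list of tasks; each task is retried independently."""
    return [run_task_with_retries(t, max_attempts) for t in tasks]


def flaky(name: str, failures_before_success: int) -> Callable[[], str]:
    """Build a stand-in task that fails a fixed number of times, then succeeds."""
    remaining = [failures_before_success]

    def attempt() -> str:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TaskFailed(name)
        return f"{name}: done"

    return attempt


if __name__ == "__main__":
    print(run_job([flaky("map-0", 0), flaky("map-1", 2)], max_attempts=4))
```

In this sketch the retry scope is a single task within a single job. A query that spans several jobs would need something above `run_job` to decide what happens when one of those jobs gives up.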

By: HadoopDB | DBMS2 -- DataBase Management System Services

HadoopDB | DBMS2 -- DataBase Management System Services — Sun, 13 Sep 2009 04:59:42 +0000

[…] Query fault-tolerance […]