September 13, 2009

Fault-tolerant queries

MapReduce/Hadoop fans sometimes raise the question of query fault-tolerance. That is — if a node fails, does the query need to be restarted, or can it keep going? For example, Daniel Abadi et al. trumpet query fault-tolerance as one of the virtues of HadoopDB. Some of the scientists at XLDB spoke of query fault-tolerance as being a good reason to leave 100s or 1000s of terabytes of data in Hadoop-managed file systems.

When we discussed this subject a few months ago in a couple of comment threads, it seemed to be the case that:

This raises an obvious (pair of) question(s) — why and/or when would anybody ever care about query fault-tolerance? To start answering that, let’s propose a simple metric for query execution time on a grid or cluster: node-hours = (duration of query) * (number of nodes it executes on). Then, to a first approximation, it would seem:

Frankly, I’m not too clear on the use cases for which query fault-tolerance really matters. Still, as noted above, it is indeed coming up more and more. As far as I know, it wouldn’t be terribly hard for DBMS vendors to offer an option that lets DBAs specify execution plans that force intermediate query materialization, and then to use that materialization as the basis for query fault-tolerance. Perhaps doing so should be on some technical roadmaps.
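To make the node-hours metric a bit more concrete, here is a minimal sketch in Python. Only the definition of node-hours and the (node-hours)/MTTF ratio come from the discussion above. Everything else is an assumption made for illustration: the function names are hypothetical, MTTF is taken to be per node, node failures are treated as independent with a constant failure rate (an exponential model), and the sample numbers are made up.

```python
import math


def node_hours(duration_hours: float, nodes: int) -> float:
    """Node-hours as defined in the post: query duration times node count."""
    return duration_hours * nodes


def failure_ratio(duration_hours: float, nodes: int, node_mttf_hours: float) -> float:
    """The (node-hours)/MTTF ratio. MTTF is assumed here to be per node."""
    return node_hours(duration_hours, nodes) / node_mttf_hours


def prob_query_hits_failure(duration_hours: float, nodes: int,
                            node_mttf_hours: float) -> float:
    """Chance that at least one node fails while the query runs.

    Assumption (not from the post): independent nodes with a constant
    failure rate, so the probability is 1 - exp(-(node-hours)/MTTF).
    """
    return 1.0 - math.exp(-failure_ratio(duration_hours, nodes, node_mttf_hours))


if __name__ == "__main__":
    # Illustrative numbers only: a 2-hour query on 100 nodes,
    # each with an assumed MTTF of 10,000 hours.
    print(node_hours(2, 100))                        # 200 node-hours
    print(prob_query_hits_failure(2, 100, 10_000))   # roughly 0.02
```

Under those assumptions, the probability of a mid-query failure stays small as long as the ratio is small, and it climbs as queries get longer, clusters get bigger, or per-node MTTF gets shorter.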


10 Responses to “Fault-tolerant queries”

  1. HadoopDB | DBMS2 -- DataBase Management System Services on September 13th, 2009 12:59 am

    […] Query fault-tolerance […]

  2. Todd Lipcon on September 13th, 2009 3:33 am

    How is it that you classify Hive as not having fault tolerance? Hive’s execution layer is MapReduce jobs on Hadoop, and thus has the same fault tolerance properties as Hadoop jobs in general. Failed tasks will be re-executed as necessary up to a user-configurable threshold.


  3. Curt Monash on September 13th, 2009 12:06 pm


    See Joydeep’s explanation in

    Short answer: One Hive job can comprise many MapReduce jobs.

    I’m also editing my post above for clarity.

  4. Unholyguy on September 13th, 2009 12:36 pm

    Good analysis; in a way, that ratio serves as the upper end of the current scalability of distributed RDBMSs. That upper end is pretty high. Nowadays it seems like multi-hour queries are becoming more and more rare for most enterprise warehouses.

    However I do argue with this statement “Using cheap/low-end/commodity hardware — as Hadoop fans like to do — increases the (node-hours)/MTTF ratio”

    It’s not the commodity hardware that is the issue; Teradata is mostly commodity hardware. It’s not even MapReduce. I think it’s the Hadoop approach to pipelining and block-based partitioning, which chooses to go down a brute-force route rather than a clever query optimization route.

    I think the concept that machines do not have identities whatsoever is pretty powerful in general, outside of the restartable queries thing. It’s made possible a lot of the EC2 work Cloudera has done.

    It also could allow some interesting virtualization-like approaches to queries. For instance, it would be theoretically possible in the Hadoop architecture to copy a query in mid-execution to another cluster, or to pause queries indefinitely.

  5. Curt Monash on September 13th, 2009 1:07 pm

    Actually, a big theme in MapReduce is “true commodity” hardware vs. “enterprise-class non-proprietary hardware”. That’s in the Google data center story even before MapReduce was popularized. It’s central to Facebook’s fondness for Hadoop. Etc., etc.

    And one of Aster Data’s messages is “Unlike those other companies in the same category as us, we let you get by with true commodity hardware.”

  6. Unholyguy on September 13th, 2009 3:37 pm

    I guess you can make an argument for storage along those lines. I’m not sure there is really much differentiation in servers or networks anymore.

  7. Matt on September 14th, 2009 2:12 pm

    I was told by Aster that they don’t have query fault tolerance but if a query fails in flight it will automatically restart from the beginning. This is with their current 3.x version. I’m not sure what the 4.x version will have.

    With regard to commodity hardware: don’t 95% of the vendors out there run on commodity hardware, albeit high-end commodity hardware?

    When I hear commodity hardware I think of that box sitting in my garage… I think the phrase is overused…

  8. Amrith Kumar on October 4th, 2009 12:53 pm

    Query Fault Tolerance is a big deal with big databases. In cases where the database is running on top of some RAID-X storage, a single drive failure (the most common failure) could go unnoticed. But, in the case of systems that have one-drive-per-node, a single drive failure could cause a hiccup that must be handled in the database.

    At issue is the fact that while drive densities have gone up (higher capacity per drive), the failure rate per GB per year has not gone down in sync.

    MapReduce provides the ability to restart a node and have the MapReduce program continue to run; the same is not the case with pipelined MapReduce programs or programs that have multiple stages, one of which is MapReduce.

  9. Oracle’s version of “actually, we’ve been doing MapReduce all along too” | DBMS2 -- DataBase Management System Services on October 6th, 2009 6:02 am

    […] query result materialization. Presumably, then, Oracle’s quasi-MapReduce would also lack query fault-tolerance. Categories: Analytic technologies, MapReduce, Oracle, Parallelization  Subscribe to our […]

  10. RYW (Read-Your-Writes) consistency explained | DBMS2 -- DataBase Management System Services on May 1st, 2010 1:18 am

    […] Query fault-tolerance […]
