June 8, 2015

Teradata will support Presto

At the highest level:

Now let’s make that all a little more precise.

Regarding Presto (and I got most of this from Teradata):

Daniel Abadi said that Presto satisfies what he sees as some core architectural requirements for a modern parallel analytic RDBMS project: 

More precisely, this is a checklist for interactive-speed parallel SQL. Some query jobs run long enough that Dan thinks you need the fault tolerance obtained by writing intermediate results to disk, à la HadoopDB (which was of course the MapReduce-based predecessor to Hadapt).

That said, Presto is a newish database technology effort; there's lots of stuff missing from it, and there will still be lots of stuff missing from it years from now. Teradata has announced plans to contribute to Presto over, give or take, the next year, in three phases:

Absent from the specific plans disclosed to me was anything about optimization or other performance hacks, or about workload management beyond what can be gotten from YARN. I also suspect that much SQL coverage will still be lacking after Phase 3.

Teradata’s basic business model for Presto is:

And of course Presto is usurping Hive’s role wherever that makes sense in Teradata’s data connectivity story, e.g. Teradata QueryGrid.

Finally, since I was on the phone with Justin Borgman and Dan Abadi, discussing a project that involved 16 former Hadapt engineers, I asked about Hadapt’s status. That may be summarized as:

Comments

19 Responses to “Teradata will support Presto”

  1. David Gruzman on June 8th, 2015 6:23 am

    To the best of my understanding, one of the main problems of JVM-based massive data processing is the performance of serialization and deserialization. In a nutshell, efficiently reading a single record from disk into Java is challenging. This part of processing is performance-critical for reading table data, as well as for sending intermediate data between nodes in a cluster (shuffle stages).
    There is a developing extension to Java, called Unsafe, which gradually enables such things.

    I am curious how Presto achieves C-like performance. Is Java Unsafe already advanced enough?

  2. David Phillips on June 8th, 2015 11:12 am

    Presto uses Unsafe extensively via our Slice library: https://github.com/airlift/slice

    Unsafe allows reading and writing primitives (int, long, double, etc.) with a single CPU instruction, just as if you wrote it in C. The get/put calls on Unsafe are special “intrinsics” that the Java JIT compiler understands and turns into efficient native machine code. This lets us treat a byte array as if it is arbitrary memory, like in C. Slice works on data from anywhere: disk, memory, network, etc.
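
    For illustration, here is a minimal sketch of that kind of Unsafe read, as standard sun.misc.Unsafe usage against a byte array; it shows the technique described above, not Slice's actual code:

        import java.lang.reflect.Field;
        import sun.misc.Unsafe;

        // Illustrative only: read a primitive out of a byte[] as if it
        // were raw memory, the way the Slice library is described above.
        public class UnsafeRead {
            private static final Unsafe UNSAFE = loadUnsafe();
            // Offset of the first element inside a byte[] object.
            private static final long BYTE_ARRAY_BASE = UNSAFE.arrayBaseOffset(byte[].class);

            private static Unsafe loadUnsafe() {
                try {
                    // Unsafe's constructor is private; fetch the singleton via reflection.
                    Field f = Unsafe.class.getDeclaredField("theUnsafe");
                    f.setAccessible(true);
                    return (Unsafe) f.get(null);
                } catch (ReflectiveOperationException e) {
                    throw new RuntimeException(e);
                }
            }

            // The JIT treats this get as an intrinsic and compiles it
            // down to a single native load instruction.
            static long getLong(byte[] data, int index) {
                return UNSAFE.getLong(data, BYTE_ARRAY_BASE + index);
            }

            public static void main(String[] args) {
                byte[] page = new byte[64];
                page[0] = 42; // low byte first on little-endian hardware
                System.out.println(getLong(page, 0)); // prints 42 on x86
            }
        }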

    We are also careful about memory allocation. Rows are stored in columnar pages in memory, allowing efficient processing with little GC overhead. You wouldn’t call malloc in the inner loop in C, so of course we don’t do it in Java either.
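
    To give a feel for that layout, here is a hedged sketch of a columnar page: one flat primitive array per column, allocated once per page. The class and field names are invented for illustration and are not Presto's actual Page/Block types:

        // Illustrative columnar page: a hot loop like SUM scans sequential
        // memory and allocates nothing per row, so GC overhead stays small.
        class LongColumnPage {
            final long[] values;     // packed column values
            final boolean[] isNull;  // per-position null flags

            LongColumnPage(int positionCount) {
                values = new long[positionCount];
                isNull = new boolean[positionCount];
            }

            // Tight aggregation loop with no per-row allocation.
            long sum() {
                long total = 0;
                for (int i = 0; i < values.length; i++) {
                    if (!isNull[i]) {
                        total += values[i];
                    }
                }
                return total;
            }

            public static void main(String[] args) {
                LongColumnPage page = new LongColumnPage(4);
                page.values[0] = 10;
                page.values[1] = 32;
                page.isNull[2] = true; // null positions are skipped
                System.out.println(page.sum()); // prints 42
            }
        }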

  3. David Gruzman on June 9th, 2015 7:50 am

    Thank you for the answer. Can I conclude from the above that Java, to be competitive with C at data processing, must internally work in a columnar way, and even represent intermediate results as columnar?

  4. darose on June 9th, 2015 11:53 am

    I tried Presto briefly a couple of months back, but found that its lack of fault tolerance made it a non-starter for us. (See https://github.com/facebook/presto/issues/1010)

    This lack of fault tolerance is a much bigger issue than people realize. Many people run their infrastructure at cloud providers like Amazon, where you routinely experience intermittent "timeout" errors and such while trying to access your data (e.g., reading a file from Amazon S3). And the larger the data, the more likely this situation is to occur in any given query job.

    Each time this happens, it results in a failed task. In an environment with fault tolerance (e.g., Hadoop MapReduce or Spark) this isn't an issue – the task just gets retried on another node and eventually succeeds. But on Presto, it kills the whole job/query.

    Their assertion that "the user must resubmit the query" in these cases really isn't a realistic workaround: a) the query may well fail the next time it gets run, for the same reason (network timeouts), and b) more importantly, this makes Presto completely unusable for situations where there is no "user" to re-submit, such as running scripted queries.

    IMO, Spark SQL is the way forward for those who want to move beyond Hive, not Presto.

  5. David Gruzman on June 9th, 2015 2:23 pm

    Fault tolerance usually comes with a price. In the case of MapReduce, the price is big.
    In my opinion, the question is: does Presto provide significantly better performance than fault-tolerant systems?
    At the same time, big users mentioned above, like the AWS-based Netflix, are using Presto, so it cannot be that unusable in a cloud environment.

  6. Curt Monash on June 9th, 2015 7:39 pm

    Dan Abadi reminded me of something I now wish I’d stressed in my original post — you can have two different query engines over the same data, with the system automagically deciding which one to use. In this context, we can say that one is the interactively fast one — e.g. Presto — while the other is the fault-tolerant one. That all was part of the Hadapt strategy.

    And since I mentioned Dan, recall that a different form of the approach is in Vertica — data comes into the write-optimized store (WOS) and soon is flushed to the read-optimized store (ROS), but a query may hit either one, or perhaps go against both and federate its results.
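
    To make that concrete, here is a hedged sketch of the routing idea. The QueryEngine interface and the demo engines are invented for illustration; this is not Hadapt's or Vertica's actual design:

        import java.util.Collections;
        import java.util.List;

        public class DualEngineDemo {
            // Invented for illustration; not any vendor's actual API.
            interface QueryEngine {
                List<Object[]> execute(String sql) throws Exception;
            }

            // Try the fast, non-fault-tolerant engine first; if it fails
            // mid-query, rerun on the slower, checkpointed engine.
            static List<Object[]> run(QueryEngine fast, QueryEngine faultTolerant, String sql)
                    throws Exception {
                try {
                    return fast.execute(sql); // usual case: interactive speed
                } catch (Exception midQueryFailure) {
                    // Node loss, timeout, etc.: pay for the fault-tolerant
                    // path instead of surfacing the error to the user.
                    return faultTolerant.execute(sql);
                }
            }

            public static void main(String[] args) throws Exception {
                QueryEngine flaky = sql -> { throw new Exception("node lost mid-query"); };
                QueryEngine steady = sql -> Collections.singletonList(new Object[] {42L});
                List<Object[]> rows = run(flaky, steady, "SELECT 42");
                System.out.println(rows.get(0)[0]); // falls back and prints 42
            }
        }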

  7. Mark Callaghan on June 10th, 2015 5:02 pm

    I think the claim that "X is fault tolerant" is usually much more true on paper than in practice. I'd rather know what the query success rates are in practice, as there are many reasons for faults. We focus on big downtime events, thus the push to claim things are fault tolerant. We often neglect the small downtime events (GC pauses, etc.). I'd also want to measure response-time variance, because slow responses == failures for many deployments.

  8. Curt Monash on June 10th, 2015 5:50 pm

    Mark,

    Once a query is taking 10 minutes anyway, having it take 11 minutes is probably not a failure.

    However, I’ll confess to being unsure of why a query would take 1,000 or 10,000 node-minutes to execute unless they involve huge amounts of I/O. And if they do, slathering on even more I/O doesn’t obviously lead to efficiency.

    I can see an obvious exception for queries that look inside large and opaque data fields (video, log files, etc.), but I don’t get the feeling that this is the main case being considered in these discussions.

  9. Mark Callaghan on June 10th, 2015 7:52 pm

    I wasn’t aware that the common use case for Presto was multi minute queries. I thought it was multi second queries and for multi second queries why add complexity for check pointing intermediate results.

  10. Nezih Yigitbasi on June 11th, 2015 1:44 am

    We have been using Presto @ Netflix on the AWS cloud for over a year, and so far machine failures haven't really been a big issue for us — yes, once in a while one happens, but it hasn't been a big deal. David Gruzman has a point: there is usually a trade-off between fault tolerance and performance, and Presto was definitely designed and implemented for performance from the start.

    And regarding darose's comment, I don't understand why the script cannot re-submit the query on failure.
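
    A minimal sketch of what such a resubmitting script could look like, written as driver-agnostic JDBC: the connection URL form, credentials, and retry policy here are assumptions for illustration, not recommendations:

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;
        import java.sql.SQLException;
        import java.sql.Statement;

        // Hedged sketch: resubmit a failed query with exponential backoff.
        public class RetryingQuery {
            public static void main(String[] args) throws Exception {
                String url = "jdbc:presto://coordinator:8080/hive/default"; // assumed URL form
                String sql = "SELECT count(*) FROM some_table";             // placeholder query
                int maxAttempts = 3;
                long backoffMillis = 1_000;

                for (int attempt = 1; ; attempt++) {
                    try (Connection conn = DriverManager.getConnection(url, "etl", null);
                         Statement stmt = conn.createStatement();
                         ResultSet rs = stmt.executeQuery(sql)) {
                        if (rs.next()) {
                            System.out.println(rs.getLong(1));
                        }
                        break; // success: stop retrying
                    } catch (SQLException e) {
                        if (attempt == maxAttempts) {
                            throw e; // give up and surface the failure
                        }
                        Thread.sleep(backoffMillis);
                        backoffMillis *= 2; // exponential backoff between attempts
                    }
                }
            }
        }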

  11. Curt Monash on June 11th, 2015 1:48 am

    Mark,

    You’re right about the use cases. See my comment above that starts “Dan Abadi”.

  12. Mark Callaghan on June 11th, 2015 7:27 am

    Do we need to talk about availability of the query engine separately from the data store? I assume the query engines are (mostly) stateless, so keeping them going is easy. The datastore is the hard part, but isn't the datastore HA problem the same as for all other things Hadoop?

  13. Curt Monash on June 11th, 2015 10:03 am

    Mark,

    What’s the Facebook answer to that? Are their queries “big” enough that fault-tolerance is regarded as more important than Presto’s speed?

  14. Zhenxiao Luo on June 11th, 2015 1:45 pm

    As Nezih said, we (Netflix) have been using Presto on the AWS cloud for more than a year, and everything has gone pretty well.

    When Presto reads from S3, it does have retries, exponential backoff, etc., to keep the query working in the case of intermittent "timeout" errors from Amazon S3. This behavior is also configurable.

    Our (Netflix) production queries mostly complete in less than one minute. There are also some long-running queries (which we do not recommend, but we cannot stop users from loving Presto) that scan large files from S3 and complete successfully in hours.

    Presto does not have fault tolerance at the worker/task level, but adding it could be expensive. It is a choice between:
    #1 Presto could finish the query in 5 seconds, but in some rare cases you have to retry.
    #2 Hive/MapReduce could finish the query in 20 minutes; the probability of a query failure in MapReduce is lower than in Presto, but it is still possible.

    For Spark SQL, is there any production use case with a 1PB+ dataset?

  15. Reynold Xin on June 14th, 2015 1:54 am

    @Zhenxiao: Yes – there are multiple organizations running Spark SQL against 1PB+ datasets. Some of them are at Spark Summit next week.

  16. Reynold Xin on June 14th, 2015 1:56 am

    @Zhenxiao: Also I’m slightly confused by your position. If you need to process 1PB+ datasets, fault-tolerance seems to be important. If all you are processing are tiny datasets (or a tiny part of PB+ datasets using coarse-grained indexing), then fault-tolerance doesn’t matter as much.

  17. Curt Monash on June 14th, 2015 6:37 am

    Ya know — the whole thing might be clearer with a few actual numbers.

    In a very large query, what IS the likelihood that a sub-task would fail, making you wish you had fault tolerance?

    By way of comparison, how much is execution slowed by insisting on fault-tolerance?

    And by the way, just how big does a query have to be before it takes 10 minutes to run, or 1 hour, or whatever, across many nodes, if we assume it's optimized for speed rather than fault-tolerance?
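
    By way of a hedged, back-of-envelope illustration of the first question: if each of n independent sub-tasks fails with probability p, the whole query succeeds with probability (1 - p)^n. The numbers below are invented assumptions, not measurements:

        // Back-of-envelope arithmetic for the sub-task failure question.
        public class FailureOdds {
            public static void main(String[] args) {
                double p = 1e-4; // assumed chance that any one sub-task fails
                for (int n : new int[] {100, 10_000, 1_000_000}) {
                    double success = Math.pow(1 - p, n); // all n tasks must succeed
                    System.out.printf("n=%,d tasks -> query succeeds %.2f%% of runs%n",
                            n, success * 100);
                }
            }
        }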

  18. Big Analytics Roundup (June 15, 2015) | The Big Analytics Blog on June 15th, 2015 8:45 am

    […] Key analytics news from Hadoop Summit: new releases of MapR and HDP; Teradata announces support for Presto (more here). […]

  19. Zhenxiao Luo on June 15th, 2015 7:38 pm

    @Reynold Xin: Thank you for the info. Glad to know. Are there any public numbers or details about their deployment?

    I was/am talking about big datasets, 10PB+.

    From our (Netflix) experience, Presto can run SQL queries against big datasets, 10PB+, and most of the queries complete within 1 minute. These queries process big datasets, not using any index at all.

    Yes, fault tolerance could always be important. From our experience, it seems to depend on the engine: if the engine is super fast like Presto, then in the rare failure case, retrying the query is OK, since the same query against a big dataset could run for tens of minutes, if not hours, in Hive or another less efficient engine.

    Are there any numbers about the cluster size and data size for Spark SQL deployments in production?
