August 6, 2013

Hortonworks, Hadoop, Stinger and Hive

I chatted yesterday with the Hortonworks gang. The main subject was Hortonworks’ approach to SQL-on-Hadoop, commonly called Stinger, but at my request we cycled through a bunch of other topics as well. Company-specific notes include:

Our deployment and use case discussions were a little confused, because a key part of Hortonworks’ strategy is to support and encourage the idea of combining use cases and workloads on a single cluster. But I did hear:

*By the way, Teradata seems serious about pushing the UDA (Unified Data Architecture) as a core message.

Ecosystem notes, in Hortonworks’ perception, included:

I also asked specifically about OpenStack. Hortonworks is a member of the OpenStack project, contributes nontrivially to Swift and other subprojects, and sees Rackspace as an important partner. But despite all that, I think strong Hadoop/OpenStack integration is something for the indefinite future.

Hortonworks’ views about Hadoop 2.0 start from the premise that its goal is to support running a multitude of workloads on a single cluster. (See, for example, what I previously posted about Tez and YARN.) Timing notes for Hadoop 2.0 include:

Frankly, I think Cloudera’s earlier and necessarily incremental Hadoop 2 rollout was a better choice than Hortonworks’ later big bang, even though the core-mission aspect of Hadoop 2.0 is what was least ready. HDFS (Hadoop Distributed File System) performance, NameNode failover and so on were well worth having, and more than a year will have passed between when Cloudera started supporting them and when Hortonworks offers Hadoop 2.0.

Hortonworks’ approach to doing SQL-on-Hadoop can be summarized simply as “Make Hive into as good an analytic RDBMS as possible, all in open source”. Key elements include: 

Specific notes include:

As for ORC:

Finally, I asked Hortonworks what it sees as a typical or default Hadoop node these days. Happily, the answers seemed like straightforward upgrades to what Cloudera said in October 2012. Specifics included:

Comments

10 Responses to “Hortonworks, Hadoop, Stinger and Hive”

  1. Hortonworks, Hadoop, Stinger and Hive — Tech News and Analysis on August 6th, 2013 9:11 pm

    [...] Hortonworks, Hadoop, Stinger and Hive By Derrick Harris 1 min ago Aug. 6, 2013 – 6:11 PM PDT [...]

  2. sacharya on August 6th, 2013 10:10 pm

    Hadoop/OpenStack integration may be closer than you may think. Check out project Savanna:
    https://wiki.openstack.org/wiki/Savanna

    It's being actively developed in the OpenStack community, and you can get a working Hadoop cluster deployed right now, although it's nowhere near production-ready just yet.

  3. Hortonworks, Hadoop, Stinger and Hive | DBMS 2 : DataBase Management System Services | Big Data Cloud on August 7th, 2013 12:12 am

    [...] via Hortonworks, Hadoop, Stinger and Hive | DBMS 2 : DataBase Management System Services. [...]

  4. Himanshu Bari on August 7th, 2013 6:09 pm

    Thanks for the comments Curt.
    Just wanted to provide more information on the OpenStack front
    - Our OpenStack integration has received good customer and partner interest for beta testing and feedback.
    - Swift integration is part of the effort, but the crux of the value proposition for phase 1 is to enable provisioning of the complete Hortonworks Data Platform on OpenStack using templates, in a few clicks and in an easily repeatable fashion. Hortonworks is actively working with the OpenStack community on Project Savanna for this, and we have made great progress! For more information, check out the slides and video of our presentation and demo at Hadoop Summit. Links below:

    http://www.slideshare.net/Hadoop_Summit/elterman-speidel-june26455pmhall1v2
    http://www.youtube.com/watch?v=3bI1WjB-5AM&feature=youtu.be

  5. Things I keep needing to say | DBMS 2 : DataBase Management System Services on August 12th, 2013 8:25 am

    [...] The transition from Hadoop 1 to Hadoop 2 will be drastic. [...]

  6. Un verano cargado de macrodatos – resumen de noticias | BigData4Success on August 12th, 2013 7:43 pm

    [...] has given people plenty to talk about. (1, 2, 3 and 4) And it has been in the specialized media quite a bit because of the sudden departure of one of its [...]

  7. Big Data annd No SQL links | Fresh Water Perl on August 19th, 2013 6:16 am

    [...] Hadoop 2 [...]

  8. Hortonworks business notes | DBMS 2 : DataBase Management System Services on August 24th, 2013 8:47 am

    [...] Since Hortonworks a couple of times made it seem that Rackspace was an important partner, behind only Teradata and Microsoft, I finally asked why. Answers boiled down to a Rackspace Hadoop-as-a-service offering, plus joint work to improve Hadoop-on-OpenStack. [...]

  9. Cloudera Sentry and other security subjects | DBMS 2 : DataBase Management System Services on August 25th, 2013 11:39 am

    [...] have unpleasant performance consequences. From there, I segued the discussion to Accumulo. Unlike Hortonworks, Cloudera sees Accumulo demand strictly in the Federal government, where Accumulo is baked into [...]

  10. Distinctions in SQL/Hadoop integration | DBMS 2 : DataBase Management System Services on February 9th, 2014 1:51 pm

    [...] most detailed discussions of Impala and Stinger were last June and August, respectively. Categories: Cloudera, Data integration and middleware, [...]
