August 8, 2012

HCatalog — yes, it matters

To a first approximation, HCatalog is the thing that will make Hadoop play nicely with all kinds of ETLT (Extract/Transform/Load/Transform). However, HCatalog is both less and more than that:

The base use case for HCatalog is:

Major variants on that include:

I gather that most of the above is shipping today, and the rest is coming along nicely.

A key point is that you can change the file format, remap it to the virtual tables, and have your applications run unaltered. This is part of what I meant by making Hadoop “more like a DBMS.”
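As a rough illustration of that indirection (a toy sketch, not HCatalog's actual API), the Python below keeps a table name mapped to a storage format and lets the application read only by table name. `ToyCatalog`, `READERS`, `total_clicks`, and the `web_logs` table are all hypothetical names invented for this example, and the "files" are in-memory strings rather than anything in HDFS.

```python
import csv
import io
import json


def read_csv(text, columns):
    """Parse comma-separated rows into dicts keyed by column name."""
    return [dict(zip(columns, row)) for row in csv.reader(io.StringIO(text))]


def read_jsonl(text, columns):
    """Parse one JSON object per line into dicts of string values."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            obj = json.loads(line)
            rows.append({c: str(obj[c]) for c in columns})
    return rows


# Hypothetical registry of format readers; a real system would plug in
# its own storage handlers here.
READERS = {"csv": read_csv, "jsonl": read_jsonl}


class ToyCatalog:
    """Toy metadata layer: table name -> columns, format, and raw data."""

    def __init__(self):
        self._tables = {}

    def register(self, name, columns, fmt, data):
        # Re-registering a name "remaps" the virtual table to new storage.
        self._tables[name] = {"columns": columns, "format": fmt, "data": data}

    def read(self, name):
        table = self._tables[name]
        return READERS[table["format"]](table["data"], table["columns"])


def total_clicks(catalog):
    """The 'application': it knows the table name, not the file format."""
    return sum(int(row["clicks"]) for row in catalog.read("web_logs"))


catalog = ToyCatalog()
catalog.register("web_logs", ["user", "clicks"], "csv", "alice,3\nbob,5\n")
before = total_clicks(catalog)

# Change the underlying format and remap the table; total_clicks is untouched.
catalog.register(
    "web_logs",
    ["user", "clicks"],
    "jsonl",
    '{"user": "alice", "clicks": 3}\n{"user": "bob", "clicks": 5}\n',
)
after = total_clicks(catalog)
```

The point of the sketch is only that `total_clicks` never mentions a format: swapping CSV for JSON lines changes the catalog entry, not the application.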

As Informatica in particular has pointed out, more metadata is needed in at least two areas:

The lack of statistics has a clear path to being fixed, in that:

The ETL and Hadoop communities need to talk more than they already have, but basically things seem to be on track.

*That’s another part of what I mean by saying that Hadoop/Hive are becoming more like a DBMS.

The lack of history is a different matter. Information could be surfaced from Hadoop about many kinds of artifacts:

for use in many kinds of time frames:

If the data integration and compliance communities want their needs met any time soon, they may need to step up with some resources, energy, and even leadership.

In that vein, please see my companion post.

One last note — at this time, the integration of HBase into all this is less than one might think. Under development is the ability to let you make HCatalog “tables” based on HBase tables, as an alternative to basing them on HDFS files. But a manual process will be involved. And of course you could run into problems if your HBase tables have multi-valued fields.

Comments

12 Responses to “HCatalog — yes, it matters”

  1. What kinds of metadata are important anyway? | DBMS 2 : DataBase Management System Services on August 8th, 2012 7:27 am

    […] today’s post about HCatalog, I noted that the Hadoop/HCatalog community didn’t necessarily understand all the kinds of […]

  2. Hcatalog – Hadoop and Hive made more like a database? « Olipa kerran Bigdata on August 9th, 2012 2:04 am

    […] Hcatalog – Hadoop and Hive made more like a database? […]

  3. Robert Hodges on August 10th, 2012 2:50 pm

    One interesting trend is NoSQL offerings that now include SQL interfaces. Hadoop and Cassandra do this. I attended a talk at the recent Cassandra Summit 2012 where the main developer of CQL (Cassandra’s SQL interface) discussed the motivations for using SQL on Cassandra. In a nutshell he said SQL (including language bindings/APIs) is robust, well thought-out, and does what a lot of people want, so why reinvent something else?

  4. What kinds of metadata are important anyway? « Another Word For It on August 11th, 2012 7:18 pm

    […] the post: In today’s post about HCatalog, I noted that the Hadoop/HCatalog community didn’t necessarily understand all the kinds of […]

  5. Paul Johnson on August 13th, 2012 5:02 am

    The over-riding theme is that this is another example of the various “efforts to make Hadoop/Hive more like a DBMS”, which makes a lot of sense. Long may it continue.

  6. shash on January 9th, 2013 1:46 am

    So, as of now, is it possible to integrate Hcatalog with Hbase?

  7. Hadoop distributions | DBMS 2 : DataBase Management System Services on February 27th, 2013 6:41 am

    […] Hortonworks is still focused on Hadoop 1 (without YARN and so on), because that’s what’s regarded as production-ready. But Hortonworks does like HCatalog. […]

  8. Hortonworks, Hadoop, Stinger and Hive | DBMS 2 : DataBase Management System Services on August 7th, 2013 2:53 am

    […] of the problem being slowness in the metadata store. (I hope that that’s already improved in HCatalog, but I didn’t think to ask.) Hortonworks thinks 100 milliseconds would be a better […]

  9. Hem on October 30th, 2013 3:42 am

    Do IBM, Oracle and Hadapt offer HCatalog integration?

  10. Distinctions in SQL/Hadoop integration | DBMS 2 : DataBase Management System Services on February 9th, 2014 1:50 pm

    […] HCatalog? […]

  11. Spark on fire | DBMS 2 : DataBase Management System Services on April 30th, 2014 6:41 am

    […] Spark works with the Hive metastore, nee’ HCatalog. […]

  12. Notes on HCatalog, Hive and Scalable Science | :: NickBurns on July 2nd, 2015 12:09 am

    […] “One interesting trend is NoSQL offerings that now include SQL interfaces. Hadoop and Cassandra do this. I attended a talk at the recent Cassandra Summit 2012 where the main developer of CQL (Cassandra’s SQL interface) discussed the motivations for using SQL on Cassandra. In a nutshell he said SQL (including language bindings/APIs) is robust, well thought-out, and does what a lot of people want, so why reinvent something else?” Robert Hodges (2012). Source: http://www.dbms2.com/2012/08/08/hcatalog-yes-it-matters/ […]
