When I grumbled about the conference-related rush of Hadoop announcements, one example of many was Teradata Aster’s SQL-H. Still, it’s an interesting idea, and a good hook for my first shot at writing about HCatalog. Indeed, other than the Talend integration bundled into Hortonworks’ HDP 1, Teradata SQL-H is the first real use of HCatalog I’m aware of.
The Teradata SQL-H idea is:
- Register your Hadoop data to HCatalog. I’ll confess to being unclear about the details of how that works, for example in the case of data that just doesn’t fit well into flat relational tables. Stay tuned for future posts. For now, I’ll just note that:
- HCatalog is closely based on Hive’s metadata management. If you’ve run Hive against the data, HCatalog should already know about it.
- HCatalog can handle Pig and HBase data as well.
- Write SQL DDL (Data Description Language) so that your Aster cluster knows about the data.
- Write any Teradata Aster SQL/MR against that data. Some of the execution will be done on the Hadoop cluster, but pulling data back into Aster may well be necessary.
At least in theory, Teradata SQL-H lets you use a full set of analytic tools against your Hadoop data, with little limitation except price and/or performance. Teradata thinks the performance of all this can be much better than if you just use Hadoop (35X was mentioned in one particularly favorable example), but perhaps much worse than if you just copy/extract the data to an Aster cluster in the first place.
So what might the use cases be for something like SQL-H? Offhand, I’d say:
- SQL-H use cases are probably focused in areas where copying the data to Aster in advance doesn’t make a lot of sense. So presumably …
- … the Hadoop clusters involved would hold a lot more data than you’d want to pay for storing in Teradata Aster. E.g., think of cases where Hadoop is used as a big bit bucket or archival data store.
- There could be a kind of investigative workflow. First you play around with the Hadoop data via SQL-H. Then when you think you’re onto something, you set up ETL (Extract/Transform/Load) to get the data into Aster and ratchet up the effort.
By way of contrast, the whole thing makes less sense for dashboarding kinds of uses, unless the dashboard users are very patient when they want to drill down.