October 12, 2010

Vertica-Hadoop integration

DBMS/Hadoop integration is a confusing subject. My post on the Cloudera/Aster Data partnership awaits some clarification in the comment thread. A conversation with Vertica left me unsure about some Hadoop/Vertica Year 2 details as well, although I’m doing better after a follow-up call. On the plus side, we also covered some rather cool Hadoop/Vertica product futures, and those seemed easier to understand. 🙂

I say “Year 2” because Hadoop/Vertica integration has been going on since last year. Indeed, Vertica says that there are now over 25 users of the Hadoop/Vertica combination and hence Vertica’s Hadoop connector. Vertica is now introducing — for immediate GA — a new version of its Hadoop connector. So far as I understood:

Vertica’s Hadoop connector now works with Vertica 4.0.
Vertica’s Hadoop connector now works with Pig.
Vertica’s Hadoop connector can now let Vertica do aggregation, whereas in the past Hadoop would have run a bunch of Vertica queries and performed the aggregation itself. I think this technically sets up the paradox that sometimes being less parallel gives better performance, but only because the heavy lifting will already have been done, in parallel, on Vertica.
Vertica’s Hadoop connector now has smarts about how data is hash-distributed in Vertica.
Vertica’s Hadoop connector can make single calls from Hadoop or Pig to load data from Vertica, as opposed to — well, I guess as opposed to making calls to each Vertica node separately.
Vertica’s Hadoop connector now lets Hadoop write more easily to Vertica. Hadoop can write to the Vertica table of its choice — even if the table doesn’t yet exist, because in that case Vertica creates it on the fly. (Note that this capability wouldn’t have made sense before Vertica 4.0, because there wasn’t a simple CREATE TABLE capability in Vertica — manual intervention was needed to choose the table’s physical layout.)
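
To make the aggregation point concrete, here is a minimal sketch of the difference between aggregating on the client side and letting the database do it. This is not the Vertica connector's API. Python's built-in sqlite3 stands in for Vertica purely so the example runs anywhere, and the table name `clicks`, its columns, and the function names are all hypothetical.

```python
import sqlite3
from collections import defaultdict


def make_demo_db():
    """Build a tiny in-memory table; sqlite3 is only a stand-in for Vertica."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (region TEXT, amount INTEGER)")
    rows = [("east", 10), ("west", 5), ("east", 7), ("west", 1), ("north", 4)]
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
    return conn


def aggregate_in_client(conn):
    """Older pattern: pull every row out, then sum on the calling side."""
    totals = defaultdict(int)
    rows_moved = 0
    for region, amount in conn.execute("SELECT region, amount FROM clicks"):
        totals[region] += amount
        rows_moved += 1
    return dict(totals), rows_moved


def aggregate_in_database(conn):
    """Newer pattern: the database runs the GROUP BY and returns only totals."""
    cur = conn.execute("SELECT region, SUM(amount) FROM clicks GROUP BY region")
    result = cur.fetchall()
    return dict(result), len(result)


if __name__ == "__main__":
    conn = make_demo_db()
    client_totals, client_rows = aggregate_in_client(conn)
    db_totals, db_rows = aggregate_in_database(conn)
    print("same answer:", client_totals == db_totals)
    print("rows moved, client-side vs. pushed down:", client_rows, db_rows)
```

Both functions return the same totals, but the pushed-down version moves one row per group rather than one row per fact. That is presumably where the benefit comes from at real data volumes, assuming the database has already done the heavy lifting in parallel.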

In addition, inspired by a large banking customer, Vertica is announcing some cool Hadoop integration futures:

Vertica-formatted data will be stored on HDFS (Hadoop Distributed File System).
It will get there via parallel backup — i.e., you will be able to back up Vertica to HDFS.
Libraries will be exposed to let HDFS read and write the Vertica-formatted data, for purposes like ETL, long-running analytics, etc.

As for those 25+ (perhaps 27-28) Vertica/Hadoop users:

15 or more of them connect to Cloudera’s Hadoop distribution, free or paid. (Some may just use Apache Hadoop.)
Some number of them indeed do connect to Cloudera Enterprise.
Most of them are doing ETL.
Some of the ETL is of text.

Categories: Analytic technologies, Cloudera, EAI, EII, ETL, ELT, ETLT, Hadoop, MapReduce, Market share and customer counts, SQL/Hadoop integration, Text, Vertica Systems


Comments

6 Responses to “Vertica-Hadoop integration”

Vlad Rodionov on October 12th, 2010 2:45 pm

Why does someone need to load data from Vertica to perform ETL in Hadoop? Just curious.

Curt Monash on October 12th, 2010 6:11 pm

Perhaps it is to help with ETL for OTHER data that is stored in HDFS.

Vlad Rodionov on October 13th, 2010 1:48 pm

So, most users use these connectors just for loading data from HDFS into Vertica.

Curt Monash on October 13th, 2010 2:11 pm

That’s my impression, although it is EXTREMELY common for software marketers to describe their most advanced users and make those users sound more typical than they are.

The state of the art in text analytics applications | Text Technologies on January 4th, 2011 3:47 am

[…] depending on whether the text was in German, French, or Italian. Vertica recently told me of a Vertica/Hadoop customer doing something similar, except for the multilingual aspect. And the end of a 2008 […]

Some Vertica 6 features | DBMS 2 : DataBase Management System Services on July 28th, 2012 1:29 am

[…] Vertica’s old Hadoop connector, Vertica’s new Hadoop connector doesn’t require any MapReduce […]
