September 13, 2009


Despite a thoughtful heads-up from Daniel Abadi at the time of his original posting about HadoopDB, I’m just getting around to writing about it now. HadoopDB is a research project carried out by a couple of Abadi’s students. Further research is definitely planned. But it seems too early to say that HadoopDB will ever get past the “research and oh by the way the code is open sourced” stage and become a real code line — whether commercialized, open source, or both.

The basic idea of HadoopDB is to put copies of a DBMS at different nodes of a grid, and use Hadoop to parcel work among them. Major benefits when compared with massively parallel DBMS are said to be:

HadoopDB has actually been built with PostgreSQL. That version achieved performance well below that of a commercial DBMS “DBX”, where X=2. Column-store guru Abadi has repeatedly signaled his intention to try out HadoopDB with VectorWise at the nodes instead. (Recall that VectorWise is shared-everything.) It will be interesting to see how that configuration performs.
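To make the basic idea concrete, here is a toy sketch of the pattern described above: an independent single-node DBMS at each node, with a coordinating layer that hands the same query to every node and merges the partial answers. This is not HadoopDB code. It assumes in-memory SQLite databases as stand-ins for the PostgreSQL instances and a local thread pool as a stand-in for Hadoop; the `sales` table, the sample rows, and every function name are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: (region, amount) rows.
ROWS = [("east", 10), ("west", 5), ("east", 7),
        ("north", 3), ("west", 8), ("north", 1)]
NUM_NODES = 3  # assumption: three single-node databases


def make_node(rows):
    """Create one stand-alone database holding one partition of the data."""
    # check_same_thread=False lets a worker thread use this connection;
    # each connection is only ever used by one thread at a time here.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def local_query(conn):
    """Per-node step: run the same SQL over this node's own partition."""
    sql = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return conn.execute(sql).fetchall()


def run_distributed(rows, num_nodes=NUM_NODES):
    """Partition the rows, query every node, and merge the partial sums."""
    # Round-robin partitioning; a real system would hash on a key.
    partitions = [rows[i::num_nodes] for i in range(num_nodes)]
    nodes = [make_node(p) for p in partitions]

    # The thread pool plays the role of the work-parcelling layer.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_query, nodes))

    # Merge step: combine the per-node subtotals into global totals.
    totals = {}
    for partial in partials:
        for region, subtotal in partial:
            totals[region] = totals.get(region, 0) + subtotal

    for conn in nodes:
        conn.close()
    return totals
```

Calling `run_distributed(ROWS)` gives totals of 17 for east, 13 for west, and 4 for north, the same answer a single database would produce. The sketch leaves out everything that makes the real problem hard, including fault tolerance, data movement between nodes, and query planning.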

In my opinion, however, the real opportunity for HadoopDB may lie elsewhere. Rather than trying to compete with parallel relational DBMS, HadoopDB might do more good parallelizing more specialized kinds of database engines. How about, for example, a massively parallel XML manager to compete with MarkLogic? Or a massively parallel array processor other than the still-nascent SciDB? Or, even more to the point, something that parallelizes a yet-more-specialized scientific data management engine? That kind of area is where I suspect the potential for HadoopDB really lives.


5 Responses to “HadoopDB”

  1. Fault-tolerant queries | DBMS2 -- DataBase Management System Services on September 13th, 2009 1:00 am

    […] keep going? For example, Daniel Abadi et al. trumpet query fault-tolerance as one of the virtues of HadoopDB. Some of the scientists at XLDB spoke of query fault-tolerance as being a good reason to leave 100s […]

  2. Jerome Pineau on September 17th, 2009 4:44 pm

    Well I believe this recent interview is relevant here. That’s traction IMHO.

  3. VectorWise, Ingres, and MonetDB | DBMS2 -- DataBase Management System Services on September 19th, 2009 8:04 pm

    […] still tentative — are afoot to integrate VectorWise with MapReduce in Daniel Abadi’s HadoopDB […]

  4. Fun with quotes in the VectorWise press release | DBMS2 -- DataBase Management System Services on June 10th, 2010 7:38 am

    […] that HadoopDB has never been used for a production application, large-scale or otherwise. Unsurprisingly, Daniel […]

  5. Teradata bought Hadapt and Revelytix | DBMS 2 : DataBase Management System Services on July 23rd, 2014 3:18 pm

    […] HadoopDB project was started by Dan Abadi and two grad […]
