September 13, 2009

HadoopDB

Despite a thoughtful heads-up from Daniel Abadi at the time of his original posting about HadoopDB, I’m just getting around to writing about it now. HadoopDB is a research project carried out by a couple of Abadi’s students. Further research is definitely planned. But it seems too early to say that HadoopDB will ever get past the “research and oh by the way the code is open sourced” stage and become a real code line — whether commercialized, open source, or both.

The basic idea of HadoopDB is to put copies of a DBMS at different nodes of a grid, and use Hadoop to parcel out work among them. The major benefits claimed over massively parallel DBMS are:

Open/cheap/free
Query fault-tolerance
Tolerance of node degradation that isn’t an outright node failure (a related concept)
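To make the architecture concrete, here is a toy sketch of the idea, not HadoopDB's actual code. It assumes SQLite as a stand-in for the per-node PostgreSQL instances and a plain Python loop as a stand-in for Hadoop's task scheduling. The table, the query, and every function name are hypothetical. Each partition lives in its own single-node database with one replica, the same SQL runs locally on each partition, and the partial results are merged. If a node is marked as failed, only that node's task is rerun on a replica, which is roughly what query fault-tolerance means here.

```python
import sqlite3
from collections import Counter

# Hypothetical query pushed down to every node; each node sees only its partition.
LOCAL_SQL = "SELECT region, SUM(amount) FROM sales GROUP BY region"

def make_node(rows):
    """One 'node': a private single-node DBMS holding one partition.
    SQLite is an assumption here, standing in for PostgreSQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

def partition(rows, n):
    """Spread rows round-robin across n partitions."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def build_cluster(rows, n_partitions=3, n_replicas=2):
    """Each partition is stored on n_replicas independent nodes."""
    cluster = []
    for p, part in enumerate(partition(rows, n_partitions)):
        cluster.append([(f"p{p}r{r}", make_node(part)) for r in range(n_replicas)])
    return cluster

def run_task(replicas, failed):
    """'Map' step: run the local SQL on the first healthy replica of a partition.
    A failed node costs one rescheduled task, not the whole query."""
    for node_id, conn in replicas:
        if node_id in failed:
            continue  # simulated dead node; try the next replica
        return conn.execute(LOCAL_SQL).fetchall()
    raise RuntimeError("no healthy replica for this partition")

def run_query(cluster, failed=frozenset()):
    """'Reduce' step: merge the per-partition partial sums."""
    totals = Counter()
    for replicas in cluster:
        for region, subtotal in run_task(replicas, failed):
            totals[region] += subtotal
    return dict(totals)

ROWS = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
cluster = build_cluster(ROWS)
healthy_result = run_query(cluster)
degraded_result = run_query(cluster, failed={"p0r0", "p1r0"})
```

In this sketch `healthy_result` and `degraded_result` come out the same, since the two simulated failures are absorbed by the replicas. A real system would also have to deal with slow-but-alive nodes, data loading, and queries that need data shuffled between nodes, none of which is modeled here.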

HadoopDB has actually been built with PostgreSQL. That version achieved performance well below that of a commercial DBMS “DBX”, where X=2. Column-store guru Abadi has repeatedly signaled his intention to try out HadoopDB with VectorWise at the nodes instead. (Recall that VectorWise is shared-everything.) It will be interesting to see how that configuration performs.

In my opinion, however, the real opportunity for HadoopDB may lie elsewhere. Rather than trying to compete with parallel relational DBMS, HadoopDB might do more good parallelizing more specialized kinds of database engines. How about, for example, a massively parallel XML manager to compete with MarkLogic? Or a massively parallel array processor other than the still-nascent SciDB? Or, even more to the point, something that parallelizes a yet-more-specialized scientific data management engine? That kind of area is where I suspect the potential for HadoopDB really lives.

Categories: Analytic technologies, Columnar database management, Data models and architecture, Data types, Data warehousing, Database diversity, Hadoop, MapReduce, Open source, Parallelization, PostgreSQL, Scientific research, Structured documents, Theory and architecture


Comments

5 Responses to “HadoopDB”

Fault-tolerant queries | DBMS2 -- DataBase Management System Services on September 13th, 2009 1:00 am

[…] keep going? For example, Daniel Abadi et al. trumpet query fault-tolerance as one of the virtues of HadoopDB. Some of the scientists at XLDB spoke of query fault-tolerance as being a good reason to leave 100s […]

Jerome Pineau on September 17th, 2009 4:44 pm

Well I believe this recent interview is relevant here. That’s traction IMHO.

http://news.cnet.com/8301-13505_3-10355679-16.html

VectorWise, Ingres, and MonetDB | DBMS2 -- DataBase Management System Services on September 19th, 2009 8:04 pm

[…] still tentative — are afoot to integrate VectorWise with MapReduce in Daniel Abadi’s HadoopDB […]

Fun with quotes in the VectorWise press release | DBMS2 -- DataBase Management System Services on June 10th, 2010 7:38 am

[…] that HadoopDB has never been used for a production application, large-scale or otherwise. Unsurprisingly, Daniel […]

Teradata bought Hadapt and Revelytix | DBMS 2 : DataBase Management System Services on July 23rd, 2014 3:18 pm

[…] HadoopDB project was started by Dan Abadi and two grad […]
