Parallelization

Analysis of issues in parallel computing, especially parallelized database management.

January 18, 2008

The Great MapReduce Debate

Google’s highly parallel file manipulator MapReduce has gotten great attention recently, after a research paper revealed:

- MapReduce is running the core Google search engine, plus much of Google Analytics and other applications.
- MapReduce is processing 400+ petabytes of data per month.

(Niall Kennedy popularized the paper and surveyed its results.)

David DeWitt and Mike Stonebraker then launched a blistering attack on MapReduce, accusing it of disregarding almost all the lessons of database management system theory and practice. A vigorous comment thread has ensued, pointing out that MapReduce is not a DBMS and asserting it therefore shouldn’t be judged as one.

While correct, that defense raises the question – what is MapReduce good for? Proponents of MapReduce highlight two advantages:

- MapReduce makes it very easy to program data transformations, including ones to which relational structures are of little relevance.
- MapReduce runs in massively parallel mode “for free,” without extra programming.
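
To make those two claims concrete, here is a minimal single-process sketch of the map/shuffle/reduce pattern in Python. It is not Google's implementation (which is a distributed system); every function name below is hypothetical and chosen only for illustration. The point is that the programmer writes just a mapper and a reducer, and because each record is mapped independently and each key is reduced independently, a framework could in principle spread both phases across many machines without changes to that user code.

```python
from collections import defaultdict
from itertools import chain


def map_phase(records, mapper):
    # Each record is mapped on its own, so a real framework could
    # hand different records to different machines.
    return chain.from_iterable(mapper(record) for record in records)


def shuffle(pairs):
    # Group all emitted values by key, as a framework would do
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    # Each key is reduced on its own, so this step is also parallelizable.
    return {key: reducer(key, values) for key, values in groups.items()}


# The only parts a user of this sketch writes (hypothetical example):
def word_mapper(line):
    for word in line.lower().split():
        yield word, 1


def count_reducer(key, values):
    return sum(values)


def word_count(lines):
    pairs = map_phase(lines, word_mapper)
    return reduce_phase(shuffle(pairs), count_reducer)


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog"]))
```

Note that nothing here depends on a relational schema: the input is just lines of text, which is the kind of transformation the first point refers to.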

Based on those advantages, MapReduce would indeed seem to have significant uses, including: Read more

Categories: Cloud computing, MapReduce, Michael Stonebraker


March 6, 2007

Why Oracle and Microsoft will lose in VLDB data warehousing

I haven’t been as clear as I could have been in explaining why I think MPP/shared-nothing beats SMP/shared-everything. The answer is in a short white paper, currently bottlenecked at the sponsor’s end of the process. Here’s an excerpt from the latest draft:

There are two ways to make more powerful computers:

1. Use more powerful parts – processors, disk drives, etc.

2. Just use more parts of the same power.

Of the two, the more-parts strategy is much more cost-effective. Smaller* parts are much more economical, since the bigger the part, the harder and more costly it is to avoid defects, in manufacturing and initial design alike. Consequently, all high-end computers rely on some kind of parallel processing.

*As measured in terms of capacity, transistor count, etc., not physical size. Read more

Categories: Data warehouse appliances, Data warehousing, DATAllegro, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, Teradata, Theory and architecture, Vertica Systems


October 3, 2006

Vendor segmentation for data warehouse DBMS

February, 2011 edit: I’ve now commented on Gartner’s 2010 Data Warehouse Database Management System Magic Quadrant as well.

Several vendors are offering links to Gartner’s new Magic Quadrant report on data warehouse DBMS. (Edit: This is now a much better link to the 2006 MQ.) Somewhat atypically for Gartner, there’s a strict hierarchy among most of the vendors, with Teradata > IBM > Oracle > Microsoft > Sybase > Kognitio > MySQL > Sand, in each case on both axes of the matrix. The only two exceptions are Netezza and DATAllegro, which are depicted as outvisioning Microsoft somewhat even as they trail both Microsoft and Sybase in execution.

Gartner Magic Quadrants tend to annoy me, and I’m not going to critique the rankings in detail. But I do think this particular MQ is helpful in framing a vendor segmentation, namely:

- Big full-spectrum MPP/shared-nothing vendors: Teradata and IBM
- MPP/shared-nothing appliance upstarts: Netezza and DATAllegro
- Big SMP/shared-everything vendors who also are apt to be your OLTP incumbent, and who want to integrate your software stack soup-to-nuts: Oracle and Microsoft
- Niche vendors: Pretty much everybody else

Categories: Data warehouse appliances, Data warehousing, DATAllegro, IBM and DB2, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, Teradata


September 27, 2006

Oracle and Microsoft in data warehousing

Most of my recent data warehouse engine research has been with the specialists. But over the past couple of days I caught up with Oracle and Microsoft (IBM is scheduled for Friday). In at least three ways, it makes sense to lump those vendors together, and contrast them with the newer data warehouse appliance startups:

- Shared-everything architecture
- End-to-end solution story
- OLTP industrial-strengthness carried over to data warehousing

In other ways, of course, their positions are greatly different. Oracle may have a full order-of-magnitude lead on Microsoft in warehouse sizes, for example, and has a broad range of advanced features that Microsoft either hasn’t matched yet, or else just released in SQL Server 2005. Microsoft was earlier in pushing DBA ease as a major product design emphasis, although Oracle has played vigorous catch-up in Oracle10g.

Categories: Data warehouse appliances, DATAllegro, EAI, EII, ETL, ELT, ETLT, IBM and DB2, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, Teradata

