March 6, 2007

Why Oracle and Microsoft will lose in VLDB data warehousing

I haven’t been as clear as I could have been in explaining why I think MPP/shared-nothing beats SMP/shared-everything. The answer is in a short white paper, currently bottlenecked at the sponsor’s end of the process. Here’s an excerpt from the latest draft:

There are two ways to make more powerful computers:

1. Use more powerful parts – processors, disk drives, etc.

2. Just use more parts of the same power.

Of the two, the more-parts strategy is much more cost-effective. Smaller* parts are much more economical, since the bigger the part, the harder and more costly it is to avoid defects, in manufacturing and initial design alike. Consequently, all high-end computers rely on some kind of parallel processing.

*As measured in terms of capacity, transistor count, etc., not physical size.
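To make the defect argument concrete, here is a minimal Python sketch using the textbook Poisson yield model, in which the fraction of defect-free parts is exp(-A * D) for part area A and defect density D. The model choice, the function names, and every number below are assumptions for illustration, not figures from the white paper. "Area" stands in loosely for capacity, in the spirit of the footnote.

```python
import math

def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of parts that come out defect-free under the Poisson model Y = exp(-A * D)."""
    return math.exp(-area_cm2 * defects_per_cm2)

def cost_per_good_part(area_cm2: float, defects_per_cm2: float, cost_per_cm2: float) -> float:
    """Raw cost divided by yield: defective parts are paid for, then thrown away."""
    return area_cm2 * cost_per_cm2 / poisson_yield(area_cm2, defects_per_cm2)

# Hypothetical figures, chosen only to show the trend.
DEFECTS_PER_CM2 = 0.5
COST_PER_CM2 = 10.0

# Same total capacity either way: one part of area 4, or four parts of area 1.
one_big = cost_per_good_part(4.0, DEFECTS_PER_CM2, COST_PER_CM2)
four_small = 4 * cost_per_good_part(1.0, DEFECTS_PER_CM2, COST_PER_CM2)

print(f"one big part:     ${one_big:.2f}")
print(f"four small parts: ${four_small:.2f}")
```

With these made-up numbers, the single large part costs roughly four and a half times as much as the four small ones. The model ignores everything except defects (packaging, the interconnect the small parts need, design effort), so treat it as a sketch of the direction of the effect rather than its size.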

There are two main kinds of parallel processing: Shared-everything and shared-nothing. In shared-everything systems, multiple processors address a common pool of memory – RAM and disk alike. In shared-nothing systems, components are much more loosely coupled, with each processor controlling its own RAM and disk as it would in a stand-alone computer. While the two pairs of terms are not wholly equivalent, as a practical matter shared-everything systems are typically also SMP (Symmetric Multi-Processing), and SMP machines are typically shared-everything. Similarly, shared-nothing systems are inherently MPP (Massively Parallel Processing), while MPP systems are usually shared-nothing.
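As a rough illustration of the shared-nothing idea, the Python sketch below gives each node its own private slice of the rows, lets every node compute a partial aggregate over only its local data, and has a coordinator merge the small partial results. The class, function, and column names are hypothetical, and real MPP databases differ in many details; this only shows the division of labor.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row layout: (region, sales_amount).
Row = Tuple[str, float]

class Node:
    """A shared-nothing node: it owns its rows and never reads another node's storage."""

    def __init__(self) -> None:
        self.local_rows: List[Row] = []

    def partial_sums(self) -> Dict[str, float]:
        # Each node scans only its own data, so no locks or shared memory are involved.
        totals: Dict[str, float] = defaultdict(float)
        for region, amount in self.local_rows:
            totals[region] += amount
        return dict(totals)

def distribute(rows: List[Row], nodes: List[Node]) -> None:
    # Round-robin placement for simplicity; real MPP systems often hash on a distribution key.
    for i, row in enumerate(rows):
        nodes[i % len(nodes)].local_rows.append(row)

def sum_by_region(nodes: List[Node]) -> Dict[str, float]:
    # Coordinator step: only the small per-node subtotals need to be combined.
    # In a real system the partial_sums calls run concurrently on separate machines;
    # here they run one after another.
    result: Dict[str, float] = defaultdict(float)
    for node in nodes:
        for region, subtotal in node.partial_sums().items():
            result[region] += subtotal
    return dict(result)

nodes = [Node() for _ in range(3)]
distribute([("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", 1.0), ("north", 4.0)], nodes)
print(sum_by_region(nodes))
```

In a shared-everything design, by contrast, all processors would scan one common table in one common storage pool, which is exactly where the contention discussed next comes from.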

When parallel processing became common in the 1990s, shared-everything SMP won out over MPP, for one compelling reason – existing software didn’t need to be rewritten. However, SMP has major scalability problems, of at least two kinds. One is general: As each processor keeps track of what the others are doing, SMP coordination overhead grows much faster than the number of processors. The other is database-specific: Shared-everything storage bandwidth has trouble keeping up with the data flows that dozens or hundreds of processors demand. Consequently, MPP has always played a role in high-end data warehousing, primarily via Teradata*.

*Historically speaking, of course. IBM, Netezza, and DATAllegro are now important MPP players too, with Kognitio and Vertica well-positioned to join them.
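Both scalability problems can be caricatured with a few lines of arithmetic. The Python sketch below assumes, for simplicity, that every processor must stay coordinated with every other one (so the number of pairs grows quadratically), that a shared-everything system has one storage pool of fixed total bandwidth, and that each shared-nothing node brings its own disks. All function names and bandwidth figures are hypothetical.

```python
# Hypothetical figures, chosen only to illustrate the shape of the curves.
DEMAND_PER_PROC_MB_S = 100.0  # scan rate one processor can consume
SHARED_POOL_MB_S = 4000.0     # total bandwidth of a single shared storage pool
PER_NODE_MB_S = 200.0         # bandwidth of one shared-nothing node's local disks

def coordination_pairs(n: int) -> int:
    """Processor pairs to keep consistent if every processor tracks every other one."""
    return n * (n - 1) // 2

def shared_everything_supply(n: int) -> float:
    """One storage pool serves everyone, so total bandwidth does not grow with n."""
    return SHARED_POOL_MB_S

def shared_nothing_supply(n: int) -> float:
    """Each node adds its own disks, so total bandwidth grows linearly with n."""
    return PER_NODE_MB_S * n

for n in (4, 16, 64, 256):
    demand = DEMAND_PER_PROC_MB_S * n
    print(f"{n:>3} procs | {coordination_pairs(n):>5} pairs | demand {demand:>6.0f} MB/s | "
          f"shared-everything {shared_everything_supply(n):>6.0f} | "
          f"shared-nothing {shared_nothing_supply(n):>6.0f}")
```

Past 40 processors in this toy model, aggregate demand outruns the shared pool, while the shared-nothing supply keeps pace because it grows with the node count.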

I think that’s correct, but here’s my best try at a counter-argument.

Comments

7 Responses to “Why Oracle and Microsoft will lose in VLDB data warehousing”

  1. Haider R on March 6th, 2007 12:56 pm

    Curt, good writeup on the pros / cons of the parallel database architectures. One comment about your footnote:

    **Historically speaking, of course. IBM, Netezza, and DATAllegro are now important MPP players too …..

    IBM has been an MPP player for a long time with their shared-nothing parallel database (DB2) on the AIX/Unix front.

  2. Curt Monash on March 6th, 2007 1:41 pm

    I’d say there’s a rough 2:1 ratio between how long Teradata has been around and how long DB2 for open systems has been around. I’d say there’s also a rough 2:1 ratio for how long each one has been important. :)

    Seriously — do you think I’m overstating the case?

    Best,

    CAM

  3. DBMS2 — DataBase Management System Services»Blog Archive » Greenplum’s strategy on March 13th, 2007 7:07 pm

    [...] Greenplum rewrote a lot of PostgreSQL to parallelize it, in the correct belief that MPP is the best way to go for high-end data warehousing. [...]

  4. Oracle Errors on March 14th, 2007 7:23 am

    CAM,

    I think you’re right.

  5. DBMS2 — DataBase Management System Services»Blog Archive » Deal prospects for data warehouse DBMS vendors on April 11th, 2007 10:02 am

    [...] Oracle needs to buy somebody, because of its rather dire product problems at the data warehouse high end. And it’s very much in keeping with their recent behavior to do so. [...]

  6. Regclean on January 6th, 2008 11:31 pm

    > Seriously — do you think I’m overstating the case?

    I’d say that is a fair representation.

  7. Infology.Ru » Microsoft buys DATAllegro on August 14th, 2008 3:58 pm

    [...] Oracle and Microsoft are doomed in the data warehouse market unless they acquire a DBMS with an MPP/shared-nothing architecture and/or a data warehouse appliance. [...]
