I haven’t been as clear as I could have been in explaining why I think MPP/shared-nothing beats SMP/shared-everything. The answer is in a short white paper, currently bottlenecked at the sponsor’s end of the process. Here’s an excerpt from the latest draft:
There are two ways to make more powerful computers:
1. Use more powerful parts – processors, disk drives, etc.
2. Just use more parts of the same power.
Of the two, the more-parts strategy is much more cost-effective. Smaller* parts are much more economical, since the bigger the part, the harder and more costly it is to avoid defects, in manufacturing and initial design alike. Consequently, all high-end computers rely on some kind of parallel processing.
*As measured in terms of capacity, transistor count, etc., not physical size.
There are two main kinds of parallel processing: shared-everything and shared-nothing. In shared-everything systems, multiple processors address a common pool of memory – RAM and disk alike. In shared-nothing systems, the coupling of components is much looser, with each processor controlling its own RAM and disk as it would in a stand-alone computer. While the two terms are not wholly equivalent, as a practical matter shared-everything systems are typically also SMP (Symmetric Multi-Processing), and SMP machines are typically shared-everything. Similarly, shared-nothing systems are inherently MPP (Massively Parallel Processing), while MPP systems are usually shared-nothing.
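To make the shared-nothing idea concrete, here is a toy sketch (my own illustration, not anything from the white paper) of how such a system might run a grouped sum: rows are hash-partitioned so each "node" owns a disjoint slice, each node aggregates only its own data with no shared memory or storage, and the only cross-node step is merging the small per-node results at the end.

```python
# Toy shared-nothing aggregation: SELECT customer, SUM(amount) GROUP BY customer.
# "Nodes" are simulated; in a real MPP system each would be a separate machine
# with its own CPU, RAM, and disk.

NUM_NODES = 4

def partition(rows, num_nodes):
    # Hash-partition rows so each node owns a disjoint slice of the data.
    parts = [[] for _ in range(num_nodes)]
    for cust, amt in rows:
        parts[hash(cust) % num_nodes].append((cust, amt))
    return parts

def local_sum(part):
    # Each node scans and aggregates only its own partition -
    # no shared RAM or shared disk is touched.
    totals = {}
    for cust, amt in part:
        totals[cust] = totals.get(cust, 0) + amt
    return totals

def merge(local_results):
    # The only cross-node communication: combining small per-node results.
    out = {}
    for totals in local_results:
        for cust, amt in totals.items():
            out[cust] = out.get(cust, 0) + amt
    return out

rows = [(1, 10), (2, 5), (1, 7), (3, 2), (2, 1)]
result = merge(local_sum(p) for p in partition(rows, NUM_NODES))
print(result)  # {1: 17, 2: 6, 3: 2}
```

The point of the sketch is where the work happens: the expensive scan is embarrassingly parallel across nodes, and coordination is confined to a final merge whose size is proportional to the number of groups, not the number of rows.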
When parallel processing became common in the 1990s, shared-everything SMP won out over MPP, for one compelling reason – existing software didn’t need to be rewritten. However, SMP has major scalability problems, in at least two ways. One is general: since each processor must keep track of what the others are doing, SMP coordination overhead grows explosively with the number of processors. The other is more database-specific: shared-everything storage bandwidth has trouble keeping up with the data flows that dozens or hundreds of processors demand. Consequently, MPP has always played a role in high-end data warehousing, primarily via Teradata*.
*Historically speaking, of course. IBM, Netezza, and DATAllegro are now important MPP players too, with Kognitio and Vertica well-positioned to join them.
I think that’s correct, but here’s my best try at a counter-argument.