I talk with lots of vendors of MPP data warehouse DBMS. I’ve now heard enough different approaches to MPP architecture that I think it might be interesting to contrast some of the alternatives.
The base-case MPP DBMS architecture is one in which there are two kinds of nodes:
- A boss node, whose jobs include:
- Receiving and parsing queries
- Optimizing queries, determining execution plans, and sending execution plans to the nodes
- Receiving result sets and sending them back to the querier
- Worker nodes, which do their part of the query execution job and eventually ship data back to the head
In primitive forms of this architecture, there’s a “fat head” that does altogether too much aggregation and query resolution. In more mature versions, data is shipped intelligently from worker nodes to their peers, reducing or eliminating “fat head” bottlenecks.
Exceptions to the base case include Vertica and Exasol. In their systems, all nodes run identical software. At the other extreme, some vendors use dedicated nodes for particular purposes. For example, Aster Data famously has special nodes for bulk data loading and export. Greenplum has a logical split between nodes that execute queries and nodes that talk to storage, and is considering offering the option of physically separating them in a future release.
The basic tradeoffs between these schemes go something like this:
- If there are more kinds of dedicated nodes, real-time load-balancing is harder; you’re more likely to have idle capacity.
- If there are more kinds of dedicated nodes, you can optimize hardware better, by using different kinds of hardware for different kinds of nodes. Potentially, this is a bigger factor if some kinds of nodes have dedicated disks attached and some don’t.
Calpont, which hasn’t actually shipped a DBMS yet, has an interesting twist. They’re building a columnar DBMS in which the querying work is split between a kind of worker node, which does the query processing, and a storage node, which talks to disk. These nodes are not in any kind of one-to-one correspondence; any worker node can talk with any storage node. Calpont believes that in the future some of the storage node logic can migrate into storage systems themselves, in almost a Netezza-like strategy, but on more standard equipment.
The Calpont story may actually make more sense in a shared-disk storage-area-network implementation than for a fully shared-nothing MPP, but that’s a subject for a different post.