September 5, 2008

Dividing the data warehousing work among MPP nodes

I talk with lots of vendors of MPP data warehouse DBMS. I’ve now heard enough different approaches to MPP architecture that I think it might be interesting to contrast some of the alternatives.

The base-case MPP DBMS architecture is one in which there are two kinds of nodes:

A boss node, whose jobs include:
- Receiving and parsing queries
- Optimizing queries, determining execution plans, and sending execution plans to the nodes
- Receiving result sets and sending them back to the querier
Worker nodes, which do their part of the query execution job and eventually ship data back to the head

In primitive forms of this architecture, there’s a “fat head” that does altogether too much aggregation and query resolution. In more mature versions, data is shipped intelligently from worker nodes to their peers, reducing or eliminating “fat head” bottlenecks.

Exceptions to the base case include Vertica and Exasol. In their systems, all nodes run identical software. At the other extreme, some vendors use dedicated nodes for particular purposes. For example, Aster Data famously has special nodes for bulk data loading and export. Greenplum has a logical split between nodes that execute queries and nodes that talk to storage, and is considering offering the option of physically separating them in a future release.

The basic tradeoffs between these schemes go something like this:

If there are more kinds of dedicated nodes, real-time load-balancing is harder; you’re more likely to have idle capacity.
If there are more kinds of dedicated nodes, you can optimize hardware better, by using different kinds of hardware for different kinds of nodes. Potentially, this is a bigger factor if some kinds of nodes have dedicated disks attached and some don’t.

Calpont, which hasn’t actually shipped a DBMS yet, has an interesting twist. They’re building a columnar DBMS in which the querying work is split between a kind of worker node, which does the query processing, and a storage node, which talks to disk. These nodes are not in any kind of one-to-one correspondence; any worker node can talk with any storage node. Calpont believes that in the future some of the storage node logic can migrate into storage systems themselves, in almost a Netezza-like strategy, but on more standard equipment.

The Calpont story may actually make more sense in a shared-disk storage-area-network implementation than for a fully shared-nothing MPP, but that’s a subject for a different post.

Categories: Aster Data, Calpont, Exasol, Greenplum, Parallelization, Theory and architecture, Vertica Systems

Subscribe to our complete feed!

Comments

22 Responses to “Dividing the data warehousing work among MPP nodes”

Omer Trajman on September 5th, 2008 8:48 am

The biggest challenge with specialized nodes is managing failover. While a five node / 10TB Vertica cluster stays up when losing any two nodes, a comparably configured heterogeneous systems needs at least three of each type (head, working, loader, chauffeur…) to have the same kind of fail safety built in. That kind of hardware cost starts to add up fast.
Anon on September 5th, 2008 10:24 am

Vertica’s “Fault Tolerance” requires that you put all nodes on battery power UPS because they can’t take a power hit without corrupting data. In other words, their replication scheme is broken for power failure cases, and the workaround they provide is to put all nodes on battery backup.

BTW – the comment is wrong about the number of nodes that can fail – logical functions are not physical nodes.
Forced to use star schema? on September 5th, 2008 10:29 am

I’ve been told by Vertica that I have to use a star schema. I did some research and found out that other products like Teradata, Netezza and Greenplum don’t require a star schema.

Is this Vertica limitation there because they implemented parallelism differently?
Omer Trajman on September 5th, 2008 1:40 pm

Probably makes sense to clarify a few areas around Vertica’s HA capabilities…when a Vertica node goes away for any reason (network issues, h/w issues, etc.) the rest of the cluster gracefully ignores it and continues humming away. When the node returns, whether reconnected, rebooted or replaced, it recovers by validating itself against the rest of the cluster. There is no chance for data corruption and no requirement for UPS.

Schema design requirements are not an intrinsic function of how HA works. Star and snowflake schemas are popularly applied within analytic database applications, so that’s what we designed early versions of Vertica to support. The next version of Vertica (in beta now) does not require star or snowflake schemas.
Curt Monash on September 5th, 2008 2:10 pm

I was at Vertica Wednesday, and they said that the Release 1 schema limitation was just to make it easier to write a “great” optimizer.

Leaving aside the marketing fluff of “great”, that makes sense.

And by the way — while I don’t mind anonymous snarky comments, I wish you wouldn’t go so far as to pretend to be somebody different than who you are.

Thanks,

CAM
anon on September 6th, 2008 12:46 am

Let me get this straight:

– Vertica release 1 (current version) can’t handle anything other than star schema because they implemented trivial parallelism

– Authoritative-sounding documents and lectures from Vertica push star schema ‘because it’s better for DW’

– when the product is re-architected, they will change their opinion on schema to fit the product

So, which is it? Is strict star schema really better? Let’s get Vertica on record here.
Curt Monash on September 6th, 2008 1:36 am

I was over at Vertica on Wednesday and they insisted that, despite my skepticism, Mike Stonebraker seems right — the vast majority of data warehouse users are content with single fact tables.

Of course, when all you sell is a hammer, you generally only get into a sales cycle with prospects who have lots of nails.

Anyhow, would you care to elaborate on the “trivial parallelism” comment about Vertica? Nobody ever seems to spell it out credibly.

CAM
Anon on September 6th, 2008 2:33 am

If you are in a hurry and need to parallelize SQL, then you can build a parallel DB based on the following rules:

1 – Fact table that is distributed across disks and CPUs
2 – Dimension tables that are replicated on all CPUs
3 – Joins only occur between fact table and replicated dimension tables
4 – Aggregation (GROUP BY) only occurs on the column or columns that are used to distribute the fact table

These imply that dimension tables must be very small, otherwise (2) means you have a huge duplication of data as the CPU count grows.

Provided you follow these rules, you don’t have to do anything special to parallelize SQL:
– You don’t have to write an interconnect
– No optimizer changes needed except to enforce rules 1-4

This approach is sometimes referred to as a UNION ALL approach to parallelism. It’s so simple, you can implement it in a couple of months’ work. This has been done by tools like EnterpriseDB GridSQL (formerly ExtenDB), C-JDBC, Datallegro and now Vertica.

It has been amply demonstrated that this approach is extremely limited for data warehousing.

Datallegro found that they weren’t able to sell their product to anyone other than their investor, TEOCO until they re-architected and implemented a “repartition on the fly” approach to fix this problem of limited schema support. It took them two years to implement, and when they did it still was so bad that in the end they were only able to get 3 total customers limping along.

It turns out that customers doing significant data warehouse work can’t live with rules 1-4. The choice of schema needs to be flexible enough to accommodate the needs of modeling the business. It’s far too hard to bend the schema into a pure star in all but the most trivial of applications. Further, the exceptions to rules 1-4 end up causing the product to fail while trying to aggregate data on the “boss node”.

As a consequence, the much more difficult version of parallelism is required that allows for arbitrary choice of schema and aggregation and SQL. This is a non-trivial problem and it requires both a top notch architecture to begin with, excellent developers and a lot of time in testing and proof points.

Column store is trivial to build, parallelism for data warehousing SQL is not.
Curt Monash on September 6th, 2008 2:45 am

Why then do you suppose Vertica has as many paying customers as, say, Greenplum?

CAM
Anon on September 6th, 2008 3:05 am

I guess we’ll see won’t we. When does the Gartner Magic Quadrant report come out?
Curt Monash on September 6th, 2008 3:23 am

I don’t put much credence in the Gartner MQ. http://www.dbms2.com/2007/10/19/gartner-2007-magic-quadrant-for-data-warehouse-database-management-systems/

CAM
Anon on September 6th, 2008 3:32 am

Where else can we get information that digs into what “paying customer” really means for these vendors?

For instance:
– Did they pay $10 or $10,000,000?
– Did they get what they paid for?
– Are they in production?
– Are they buying more?
– How many failed?

At least Gartner digs into these details for the Magic Quadrant though it’s behind the scenes and you have to pay them to get the results.

Are there other reports that do this?
Curt Monash on September 6th, 2008 4:37 am

I think you’re abusing anonymity a bit. If you want to shill for one of my competitors, please do so in a different venue.

While I make the occasional mistake, I’m proud of the care I put into my research. I’m sure the Gartner folks feel the same way about theirs. I agree with some of their conclusions, but disagree with others; necessarily, the same is true in reverse.

Let’s leave it at that.

CAM
Curt Monash on September 6th, 2008 4:41 am

To answer your questions more directly about Vertica, I’m aware, by name (largely NDAed, of course, as most customer names I get from vendors are), of as many repeat customers at Vertica as at any other of the analytic DBMS startups (except, of course, Netezza). Since they’ve been selling product for a much shorter time than, say, Greenplum (a vendor with a comparable number of overall sales), that’s a nice proof point.

Let me hasten to point out, however, that I don’t take the time to review every customer for every vendor in detail. Not even close.

CAM
Anon on September 6th, 2008 4:59 am

Didn’t mean to shill, didn’t consider the competitive nature, apologies.

However – I think we’d all like a bit of auditing of the customers reported by all of these vendors. Easy enough for public companies like Teradata and Netezza, not so easy for these others.

Two approaches to the market – you can be like MySQL and get tens of thousands of customers for $1,000 each or you can get tens of customers for $1,000,000 each. Netezza fits into the latter, I/we don’t know where these others fit.
Log Buffer #114: A Carnival of the Vanities for DBAs on September 12th, 2008 2:23 pm

[…] 10.0 – Dividing the data warehousing work among MPP nodes Fri, Sep 5, 2008 8:48 AM I talk with lots of vendors of MPP data warehouse DBMS. I’ ve now heard enough different approaches to MPP architecture that I think it might be interesting to contrast some of the alternatives. The base-case MPP DBMS architecture is one in which there are two kinds of nodes: A boss node, whose jobs include: Receiving and parsing […] […]
Introduction to Aster Data and nCluster | DBMS2 -- DataBase Management System Services on September 19th, 2008 4:19 am

[…] are more parts of the Aster Data story I want to write about, namely node heterogeneity and MapReduce syntax, but for now I’ll stop here and post this. I’d also like to point […]
Exadata: Oracle finally answers the data warehouse challengers | DBMS2 -- DataBase Management System Services on September 24th, 2008 8:02 pm

[…] Dividing the data warehousing work among MPP nodes […]
Infology.Ru » Blog Archive » Exadata: Oracle наконец отвечает бросившим вызов в области хранилищ данных on September 26th, 2008 12:21 am

[…] Dividing the data warehousing work among MPP nodes […]
Oracle Exadata and Oracle data warehouse appliance sound bites | DBMS2 -- DataBase Management System Services on September 29th, 2008 9:19 am

[…] Oracle’s idea of splitting database processing between a couple of types of server is a smart one, and is consistent with what multiple other vendors are doing. […]
Infology.Ru » Blog Archive » Разделение нагрузки между узлами MPP-систем хранилищ данных on October 2nd, 2008 12:40 am

[…] Автор: Curt Monash Дата публикации оригинала: 2008-09-05 Перевод: Олег Кузьменко Источник: Блог Курта Монаша […]
Exadata and Oracle Database Machine parallelization clarified | DBMS2 -- DataBase Management System Services on September 3rd, 2009 9:57 am

[…] they are similarly-named HP servers using identical Intel chips. I.e., Oracle has moved into the node heterogeneity camp. By way of contrast, the usual-suspect MPP vendors — Teradata, Netezza, Greenplum, […]

Leave a Reply

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in

Dividing the data warehousing work among MPP nodes

Comments

Search our blogs and white papers

Monash Research blogs

User consulting

Vendor advisory

Monash Research highlights

Recent posts

Categories

Date archives

Admin