October 23, 2007

Vertica — just star and snowflake schemas?

One of the longest-running technotheological disputes I know of is the one pitting flat/normalized data warehouse architectures vs. cubes, stars, and snowflake schemas. Teradata, for example, is a flagwaver for the former camp; Microstrategy is firmly in the latter. (However, that doesn’t keep lots of retailers from running Microstrategy on Teradata boxes.) Attensity (a good Teradata partner) is in the former camp; text mining rival Clarabridge (sort of a Microstrategy spinoff) is in the latter. And so on.

Vertica is clearly in the star/snowflake camp as well. I asked them about this, and Vertica’s CTO Mike Stonebraker emailed a response. I’m reproducing it below, with light edits; the emphasis is also mine. Key points include:

Great question. This is something that we’ve thought a lot about and have done significant research on with large enterprise customers. … short answer is as follows:

Vertica supports star and snowflake schemas because that is the desired data structure for data warehousing. The overwhelming majority of the schemas we see are of this form, and we have highly optimized for this case.

This includes horizontally partitioning the fact table and replicating the dimension tables. This generates query plans with maximum parallelism and minimum execution time.

There are occasional non-snowflake schemas. For example, one customer insisted on running Vertica on the identical schema he was using for his current row store. To help his row-store perform better, he had split his fact table in half,* and used what we call a “barbell” schema, and he did not want to go to the effort of changing the design. Obviously, there is no penalty for “fat” fact tables in a column store. Hence the customer’s kludge was row-store specific. Nevertheless, he wanted to use his existing schema.

*Ed: I assume this means that different columns were stored in different tables, rather than that the rows were split up. Otherwise it wouldn’t really make sense.

For this reason, we are now extending the Vertica engine to support non-star/snowflake schemas, and will deliver this functionality in a few months.

In a similar vein, the overwhelming percentage of the schemas we see have an enormous fact table and (by comparison) tiny dimension tables (made even tinier by Vertica’s aggressive use of data compression). Therefore, replicating the dimension tables to improve query performance is the prudent thing to do — pay a little space to get vastly better query performance.

According to our customers, the goal is superior performance on star/snowflake schemas, and that is what Vertica delivers.


Share: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • del.icio.us
  • Digg
  • DZone
  • Mixx
  • Reddit
  • Slashdot
  • Sphinn
  • StumbleUpon
  • Technorati

Comments

One Response to “Vertica — just star and snowflake schemas?”

  1. How well do Netezza, Greenplum, Vertica and others handle 12-way joins? on November 26th, 2007 4:51 am

    [...] distributed system replicates the dimension tables on each node and partitions the fact table. Vertica says that they have customers that use more transactional models, but what does that mean for overall performance? Greenplum’s website says: “Utilizes [...]

Leave a Reply




Feed including blog about database management, data warehousing, and business intelligence Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Recent white paper

Pervasive PSQL Summit v10 Highlights

September, 2007

Recent webcast

What leading database vendors don't want you to know

Originally broadcast April 9, 2008

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.