October 23, 2007

Vertica — just star and snowflake schemas?

One of the longest-running technotheological disputes I know of is the one pitting flat/normalized data warehouse architectures vs. cubes, stars, and snowflake schemas. Teradata, for example, is a flagwaver for the former camp; Microstrategy is firmly in the latter. (However, that doesn’t keep lots of retailers from running Microstrategy on Teradata boxes.) Attensity (a good Teradata partner) is in the former camp; text mining rival Clarabridge (sort of a Microstrategy spinoff) is in the latter. And so on.

Vertica is clearly in the star/snowflake camp as well. I asked them about this, and Vertica’s CTO Mike Stonebraker emailed a response. I’m reproducing it below, with light edits; the emphasis is also mine. Key points include:

Great question. This is something that we’ve thought a lot about and have done significant research on with large enterprise customers. … short answer is as follows:

Vertica supports star and snowflake schemas because that is the desired data structure for data warehousing. The overwhelming majority of the schemas we see are of this form, and we have highly optimized for this case.

This includes horizontally partitioning the fact table and replicating the dimension tables. This generates query plans with maximum parallelism and minimum execution time.

There are occasional non-snowflake schemas. For example, one customer insisted on running Vertica on the identical schema he was using for his current row store. To help his row-store perform better, he had split his fact table in half,* and used what we call a “barbell” schema, and he did not want to go to the effort of changing the design. Obviously, there is no penalty for “fat” fact tables in a column store. Hence the customer’s kludge was row-store specific. Nevertheless, he wanted to use his existing schema.

*Ed: I assume this means that different columns were stored in different tables, rather than that the rows were split up. Otherwise it wouldn’t really make sense.

For this reason, we are now extending the Vertica engine to support non-star/snowflake schemas, and will deliver this functionality in a few months.

In a similar vein, the overwhelming percentage of the schemas we see have an enormous fact table and (by comparison) tiny dimension tables (made even tinier by Vertica’s aggressive use of data compression). Therefore, replicating the dimension tables to improve query performance is the prudent thing to do — pay a little space to get vastly better query performance.

According to our customers, the goal is superior performance on star/snowflake schemas, and that is what Vertica delivers.


Comments

5 Responses to “Vertica — just star and snowflake schemas?”

  1. How well do Netezza, Greenplum, Vertica and others handle 12-way joins? on November 26th, 2007 4:51 am

    [...] distributed system replicates the dimension tables on each node and partitions the fact table. Vertica says that they have customers that use more transactional models, but what does that mean for overall performance? Greenplum’s website says: “Utilizes [...]

  2. Infology.Ru » Blog Archive » Позиционирование комплексов для хранилищ данных и специализированных СУБД on September 4th, 2008 3:14 am

    [...] игрок по критерию U, если взять в расчёт их фокус на схемах с одной таблицей фактов. Но они уже на пути к высокому уровню по критерию [...]

  3. Infology.Ru » Blog Archive » Vertica — только схемы «звезда» и «снежинка»? on November 2nd, 2008 12:09 pm

    [...] Автор: Curt Monash Дата публикации оригинала: 2007-10-23 Источник: Блог Курта Монаша [...]

  4. Vertica 4.0 | DBMS2 -- DataBase Management System Services on February 22nd, 2010 4:19 am

    [...] has finally cleaned up all vestiges of its prior bias to star schemas. For example, Vertica concedes that its product previously would sometimes force a star execution [...]

  5. Pitfalls for Pollyannas | Strategic Messaging on July 2nd, 2014 3:30 pm

    [...] industry experience notwithstanding — most analytic RDBMS users would be content with star schemas. They were [...]

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.