September 19, 2009

Some issues in comparing analytic DBMS performance

The analytic DBMS/data warehouse appliance market is full of competitive performance claims. Sometimes, they’re completely fabricated, with no basis in fact whatsoever. But often performance-advantage claims are based on one or more head-to-head performance comparisons. That is, System A and System B are used to run the same set of queries, and some function is applied that takes the two sets of query running times as an input, and spits out a relative performance number as an output.

For example, Greg Rahn twittered to me that Oracle Exadata commonly outperforms existing Oracle installations by a factor of 50 or better, based on a “geometric mean”. What I presume he meant by that is:

Math note: Reversing the order of the second and third steps doesn’t change the outcome at all. Either way, you wind up multiplying N things together, dividing by the product of another N things, and taking the Nth root of all that.
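The identity behind that math note can be written out explicitly. This is only a sketch: the list of steps is not reproduced above, so it assumes the second step is "compute each query's old-time/new-time ratio" and the third is "take the geometric mean". The symbols $a_i$ (running time of query $i$ on the old system) and $b_i$ (running time on the new system) are notation introduced here, not the original's.

```latex
\left( \prod_{i=1}^{N} \frac{a_i}{b_i} \right)^{1/N}
  = \left( \frac{\prod_{i=1}^{N} a_i}{\prod_{i=1}^{N} b_i} \right)^{1/N}
  = \frac{\left( \prod_{i=1}^{N} a_i \right)^{1/N}}{\left( \prod_{i=1}^{N} b_i \right)^{1/N}}
```

The left-hand side is the geometric mean of the per-query ratios; the right-hand side is the ratio of the two systems' geometric-mean running times.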

Looking just at the arithmetic, a straightforward geometric-mean approach is not a terrible methodology. Theoretically, I’d prefer to just add up the running times for the whole workload on each system and divide one total by the other. But I’ve tested that change in a couple of cases, and it didn’t seem to make a big difference. And the geometric mean is certainly better than the arithmetic mean, which gives huge weight to the most extreme number(s) in the set. (SAP used to rely on the arithmetic mean in marketing BI Accelerator, getting huge results because one customer once got a better than 600X speedup on one particular query out of eight or so.)
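To make the difference between these three aggregates concrete, here is a minimal Python sketch. The timings are invented purely for illustration (eight queries, one with a 600X speedup, loosely echoing the anecdote above); they are not measurements from any real system, and the function names are hypothetical.

```python
import math


def geometric_mean(values):
    """Nth root of the product of N positive numbers, computed via logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def compare(old_times, new_times):
    """Summarize per-query speedups (old time / new time) three ways."""
    speedups = [old / new for old, new in zip(old_times, new_times)]
    return {
        "geometric_mean": geometric_mean(speedups),
        "arithmetic_mean": sum(speedups) / len(speedups),
        # Total workload time on the old system divided by the new one.
        "ratio_of_totals": sum(old_times) / sum(new_times),
    }


# Invented running times in seconds; the first query is the 600X outlier.
old_times = [600.0, 20.0, 30.0, 40.0, 50.0, 60.0, 80.0, 100.0]
new_times = [1.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]

summary = compare(old_times, new_times)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these made-up numbers the arithmetic mean comes out at 79.75, almost entirely because of the single outlier, while the geometric mean is a little under 9 and the ratio of totals is a little under 14.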

That said, there are a number of factors that can render such comparisons seriously misleading. For starters, most of these comparisons fail to consider how often each query will be run. (One advantage of my preferred approach, adding up total running time before doing any other arithmetic, is that you can easily weight queries by frequency.) Beyond that, especially when a new challenger system is compared to an old incumbent:

In addition:

And last but not least:

*I suspect that some of the most dramatic speed-ups we see are for queries that are just plain badly written. On the other hand — if you’ve been running your data warehouse software for a few years and still haven’t figured out how to write your queries for decent performance, maybe it’s somewhat too hard to use …
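On the earlier point about weighting queries by frequency: when the comparison is built from total running times, the weighting is a one-line change. The following Python sketch assumes you know roughly how many times per day each query runs; the timings, the run counts, and the function names are all hypothetical.

```python
def weighted_total(times, runs):
    """Total time spent on a workload, counting each query once per run."""
    return sum(t * r for t, r in zip(times, runs))


def weighted_speedup(old_times, new_times, runs):
    """Ratio of frequency-weighted total running times (old / new)."""
    return weighted_total(old_times, runs) / weighted_total(new_times, runs)


# Invented per-query running times in seconds.
old_times = [600.0, 20.0, 30.0]
new_times = [1.0, 10.0, 10.0]
# Invented frequencies: the dramatic 600X query runs only once a day.
runs_per_day = [1, 200, 50]

unweighted = weighted_speedup(old_times, new_times, [1, 1, 1])
weighted = weighted_speedup(old_times, new_times, runs_per_day)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

In this made-up case the unweighted ratio is about 31, but once the rarely run outlier is weighted by how often it actually executes, the ratio drops to about 2.4.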

Comments

One Response to “Some issues in comparing analytic DBMS performance”

  1. Bob Zurek on September 22nd, 2009 9:32 am

    Curt,
    Thanks very much for increasing the visibility of this highly debated topic. I’m particularly frustrated with all the claims companies make when they tout their performance numbers. Frankly I’ve never been particularly fond of industry benchmarks. But hey, that’s my personal opinion. Companies should also be interested in making sure customers are highly satisfied with the performance they are getting in their specific environments. Not just building for a benchmark.
    In my years of experience, people don’t just buy on performance, they take into account many other variables as you well know. In the meantime, we will continue to see all kinds of marketing games and offers around the performance topic. Seems like it never ends.
    Companies should always strive to improve performance in every release that they produce. However, as we all know, it is important to focus on listening to what customers and prospects want, and frankly it is not always about better performance.
