The analytic DBMS/data warehouse appliance market is full of competitive performance claims. Sometimes, they’re completely fabricated, with no basis in fact whatsoever. But often performance-advantage claims are based on one or more head-to-head performance comparisons. That is, System A and System B are used to run the same set of queries, and some function is applied that takes the two sets of query running times as inputs and spits out a relative performance number as an output.
For example, Greg Rahn twittered to me that Oracle Exadata commonly outperforms existing Oracle installations by a factor of 50 or better, based on a “geometric mean”. What I presume he meant by that is:
- At any one user installation, a number of queries were compared on new system vs. old.
- In each case, the ratio between new and old running time was taken.
- The geometric mean of all those ratios was computed.
Math note: Reversing the order of the second and third steps doesn’t change the outcome at all. Either way, you wind up multiplying N things together, dividing by the product of another N things, and taking the Nth root of all that.
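That equivalence can be sketched in a few lines of Python (all running times here are invented for illustration):

```python
import math

old = [120.0, 45.0, 300.0, 10.0]   # hypothetical running times on the old system (seconds)
new = [4.0, 9.0, 2.0, 5.0]         # hypothetical running times on the new system

n = len(old)

# Order 1: take each per-query speedup ratio, then the geometric mean of the ratios.
ratios = [o / nw for o, nw in zip(old, new)]
geo_mean_of_ratios = math.prod(ratios) ** (1 / n)

# Order 2 (steps reversed): multiply the old times together, divide by the
# product of the new times, and take the Nth root of the whole thing.
reversed_order = (math.prod(old) / math.prod(new)) ** (1 / n)

print(geo_mean_of_ratios)
print(reversed_order)  # same number, up to floating-point noise
```

Either way you get one aggregate speedup figure; the algebra is identical.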
Looking just at the arithmetic, a straightforward geometric-mean approach is not a terrible methodology. Theoretically, I’d prefer to just add up the running times for the whole workload — using each system — and divide the two aggregates. But I’ve tested that change in a couple of cases, and it didn’t seem to make a big difference. In any case, the geometric mean is better than the arithmetic mean, which gives huge weight to the most extreme number(s) in the set. (SAP used to do just that in marketing BI Accelerator, getting huge results because one customer once got a better than 600X speedup on one particular query out of eight or so.)
That said, there are a number of factors that can render such comparisons seriously misleading. For starters, most of these comparisons fail to consider how often each query will be run. (One advantage to my preferred approach — add up total running time before doing any other arithmetic — is that you can easily weight queries by frequency.) Beyond that, especially when a new challenger system is compared to an old incumbent:
- Comparisons aren’t always run on identical or comparable hardware.
- One system might be running an obsolete release of the software.
- Queries may not be equally well written and tuned on both systems.*
- The two systems may not be equally well administered and/or configured.
- A small set of queries may not be representative of a whole workload.
- Since complete workloads are rarely tested, concurrency and contention may not be properly modeled.
And last but not least:
- Marketing departments tend to take their best results and claim those are “typical.”
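The frequency-weighting point above is easy to illustrate. A sketch with two invented queries — a rarely-run report with a dramatic speedup, and a constantly-run small query with a modest one:

```python
import math

# Hypothetical (old_time_seconds, new_time_seconds, runs_per_day) triples.
queries = [
    (300.0, 10.0, 2),    # big nightly report: 30X speedup, runs rarely
    (2.0, 1.5, 5000),    # small dashboard query: 1.33X speedup, runs constantly
]

# Unweighted geometric mean of the per-query speedups.
geo = math.prod(o / nw for o, nw, _ in queries) ** (1 / len(queries))

# Frequency-weighted speedup: total old workload time over total new workload time.
old_total = sum(o * f for o, _, f in queries)
new_total = sum(nw * f for _, nw, f in queries)
weighted = old_total / new_total

print(f"geometric mean of ratios:   {geo:.2f}X")
print(f"frequency-weighted speedup: {weighted:.2f}X")
```

With these made-up numbers the geometric mean suggests a ~6X improvement, while the speedup on the actual daily workload is only about 1.4X — which is why ignoring query frequency can flatter the challenger system.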
*I suspect that some of the most dramatic speed-ups we see are for queries that are just plain badly written. On the other hand — if you’ve been running your data warehouse software for a few years and still haven’t figured out how to write your queries for decent performance, maybe it’s somewhat too hard to use …