I write about a lot of products whose core job boils down to “Make queries run fast.” Without exception, their vendors tout stories of remarkable performance gains over conventional/incumbent DBMS (reported improvement is usually at least 50-fold, and commonly 100-500X+). They further claim at least 2-3X better performance than their close competitors. In making these claims, vendors usually stress that their results come from live customer benchmarks. In few if any of the cases, I judge, are they lying outright. So what’s going on?
Multiple things, I think.
- Existing data warehouses are often badly optimized. The same technology, configured differently, often would do a much better job.
- General-purpose DBMS often require much more tuning for decent complex-query performance than specialized products do. It might have been possible to get that query-from-hell to run fast on the old system, but it wasn’t easy.
- Besides, often nobody tried very hard. The value of the query didn’t seem to justify the tuning effort.
- Specialized products often really are better for the workloads they’re specialized for.
- Different specialized products are best suited for different kinds of analytic workloads.
- Different companies send different qualities of benchmark experts to different sales cycles at different times. Smaller vendors with few active sales cycles sometimes actually send their CTOs.
- And by the way, vendors do their best to “cook the books.” If one query runs 600X faster than on the competition, and 19 queries run 2-5X faster, the claimed total speedup is apt to be in the 100X+ range, if that’s defensible by at least one definition of “average.”
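To see how much room that last point leaves, here is a sketch with invented per-query runtimes matching the pattern above (one 600X query, nineteen ~3X queries). All the numbers are hypothetical; the point is only that defensible definitions of “average speedup” on the same benchmark can span two orders of magnitude, and the vendor gets to pick which one goes in the press release.

```python
from statistics import median

# Invented per-query runtimes (seconds), chosen to mimic the pattern above:
# one query from hell that speeds up 600X, nineteen queries that speed up 3X.
old = [3600.0] + [60.0] * 19   # incumbent DBMS
new = [6.0]    + [20.0] * 19   # specialized analytic DBMS

speedups = [o / n for o, n in zip(old, new)]

arithmetic  = sum(speedups) / len(speedups)  # mean of per-query ratios
med         = median(speedups)               # the typical query's speedup
total_ratio = sum(old) / sum(new)            # ratio of total elapsed times
best_case   = max(speedups)                  # the headline query

print(f"arithmetic mean of speedups: {arithmetic:.0f}X")   # -> 33X
print(f"median speedup:              {med:.0f}X")          # -> 3X
print(f"total-runtime ratio:         {total_ratio:.0f}X")  # -> 12X
print(f"best single query:           {best_case:.0f}X")    # -> 600X
```

The typical query here improved 3X, yet “600X” and “33X average” are both literally true statements about the same run. And these invented figures are conservative; weight the mix a little more toward outlier queries, or count queries that never finished on the incumbent at all, and a 100X+ “average” becomes defensible too.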