Two different vendors recently tried to inflict benchmarks on me. Both were YCSBs, so I decided to look up what the YCSB (Yahoo! Cloud Serving Benchmark) actually is. It turns out that the YCSB:
- Was developed by — you guessed it! — Yahoo.
- Is meant to simulate workloads that fetch web pages, including the writing portions of those workloads.
- Was developed with NoSQL data managers in mind.
- Bakes in one kind of sensitivity analysis — latency vs. throughput.
- Is implemented in extensible open source code.
That actually sounds pretty good, especially the extensibility part;* it’s likely that the YCSB can be useful in a variety of product selection scenarios. Still, as recent examples show, benchmark marketing is an annoying blight upon the database industry.
*With extensibility you can test your own workloads and do your own sensitivity analyses.
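For a sense of what a do-it-yourself latency vs. throughput analysis involves, here is a minimal Python sketch. It is not YCSB's client code; `measure_latency` and its parameters are hypothetical names made up for illustration. It assumes a single-threaded client that throttles itself to a target rate and times each operation.

```python
import time

def measure_latency(op, target_ops_per_sec, n_ops):
    """Run `op` n_ops times at roughly the target rate.

    Returns (achieved ops/sec, average latency in seconds).
    Hypothetical helper for illustration; single-threaded only.
    """
    interval = 1.0 / target_ops_per_sec
    latencies = []
    start = time.perf_counter()
    for i in range(n_ops):
        # Wait until this operation's scheduled start time, if we are ahead.
        delay = (start + i * interval) - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        t0 = time.perf_counter()
        op()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return n_ops / elapsed, sum(latencies) / len(latencies)

def sweep(op, targets, n_ops=50):
    """Measure latency at each target throughput, one row per target."""
    return [(t,) + measure_latency(op, t, n_ops) for t in targets]
```

Calling `sweep(my_read, [100, 500, 1000])` with your own `my_read` function would give one (target, achieved throughput, average latency) row per target rate, which is the raw material for a latency vs. throughput curve.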
A YCSB overview page features links both to the code and to the original explanatory paper. The clearest explanation of the YCSB I found there was:
Each operation against the data store is randomly chosen to be one of:
- Insert: Insert a new record.
- Update: Update a record by replacing the value of one field.
- Read: Read a record, either one randomly chosen field or all fields.
- Scan: Scan records in order, starting at a randomly chosen record key. The number of records to scan is randomly chosen.
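To make that operation mix concrete, here is a rough Python sketch of a driver that picks one of the four operations at random for each step. The proportions in `OPERATION_MIX`, the field names, and the dict standing in for a data store are all assumptions for illustration, not values or code from YCSB itself. Keys are chosen uniformly for simplicity.

```python
import random
from collections import Counter

# Hypothetical mix; these proportions are illustrative, not YCSB defaults.
OPERATION_MIX = {"insert": 0.05, "update": 0.20, "read": 0.70, "scan": 0.05}

def next_operation(rng, mix=OPERATION_MIX):
    """Randomly choose an operation name, weighted by the mix."""
    ops = list(mix)
    return rng.choices(ops, weights=[mix[op] for op in ops], k=1)[0]

def run_workload(store, n_ops, seed=0, max_scan=10):
    """Apply n_ops random operations to `store`, a dict keyed 0..len-1.

    The store must be pre-loaded with at least one record.
    Returns a Counter of how many times each operation ran.
    """
    rng = random.Random(seed)
    counts = Counter()
    next_key = len(store)
    for _ in range(n_ops):
        op = next_operation(rng)
        counts[op] += 1
        if op == "insert":
            store[next_key] = {"field0": "x", "field1": "y"}
            next_key += 1
        elif op == "update":
            # Replace the value of one field in a randomly chosen record.
            store[rng.randrange(next_key)]["field0"] = "updated"
        elif op == "read":
            _ = store[rng.randrange(next_key)]
        else:  # scan: read a random-length run of records in key order
            start = rng.randrange(next_key)
            length = rng.randint(1, max_scan)
            _ = [store[k] for k in range(start, min(start + length, next_key))]
    return counts
```

Pre-load the dict with a handful of records, call `run_workload(store, 1000)`, and the returned counts show roughly the mix you asked for. Swapping in your own proportions is the kind of extensibility that makes the benchmark adaptable to a real workload.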
As the benchmark’s purpose already makes obvious, there’s nothing about joins, distributed transactions, or other hallmarks of OLTP (OnLine Transaction Processing).
NuoDB generated some mediocre YCSB results, then made a big fuss because NuoDB got those numbers while operating through SQL. Blech. I guess they proved that NuoDB’s SQL parsing/execution layer is better than the worst thing one can imagine an undergraduate writing as a homework project, but otherwise little substance was demonstrated.
Aerospike’s YCSB story isn’t as bad. Aerospike seems to have used the benchmark pretty much the way it was intended, and produced numbers that look better than NuoDB’s. Still, a few vertical markets aside, why does it matter how far under 10 milliseconds latency can get?* Further, Aerospike managed a 60 GB database with 30 GB of RAM per server, which is an awkward fit with its “You don’t need to put everything in RAM because we’re so fast on flash memory” positioning.
*If you really do care about that, maybe your app shouldn’t be making so many round trips.
So once again, I stand by my position that benchmark marketing is an annoying waste of everybody’s time.