My PhD was in a probability-related area of mathematics (game theory), so I tend to squirm when something is described as “random” that clearly is not. That said, a comment by Shilpa Lawande on our recent flash/temp space discussion suggests the following way of framing a key point:
- You really, really want to have multiple data streams coming out of temp space, as close to simultaneously as possible.
- The storage performance characteristics of such a workload are more reminiscent of “random” than “sequential” I/O.
If everybody else is cool with it too, I can live with that.
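To make the framing concrete, here is a minimal sketch (with made-up stream counts and block layouts) of why several concurrent "sequential" temp-space streams look random at the storage layer:

```python
# Hypothetical illustration: interleaving a few sequential streams.
# Stream counts and stride are invented for demonstration purposes.

def interleaved_accesses(num_streams, blocks_per_stream, stream_stride):
    """Round-robin one block at a time across streams laid out
    stream_stride blocks apart; return block addresses in the
    order the storage device actually sees them."""
    order = []
    for block in range(blocks_per_stream):
        for stream in range(num_streams):
            order.append(stream * stream_stride + block)
    return order

accesses = interleaved_accesses(num_streams=4, blocks_per_stream=3,
                                stream_stride=1000)
# Each stream by itself is perfectly sequential (0, 1, 2, ...),
# but the merged request order jumps by ~1000 blocks every time,
# which to a spinning disk is effectively random I/O.
jumps = [abs(b - a) for a, b in zip(accesses, accesses[1:])]
```

Every consecutive pair of requests in the merged order is far apart, even though each individual stream never skips a block.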
Meanwhile, I talked again with Tim Vincent of IBM this afternoon. Tim endorsed the temp space/Flash fit, but with a different emphasis, which upon review I find I don’t really understand. The idea is:
- Analytic DBMS processing generally stresses reads over writes.
- Temp space is an exception — read and write use of temp space is pretty balanced. (You spool data out once, you read it back in once, and that’s the end of that; next time it will be overwritten.)
My problem with that is: flash typically has lower write IOPS (I/O operations per second) than read IOPS, so being (relatively) write-intensive would, to a first approximation, seem if anything to disfavor a workload for flash.
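A back-of-envelope calculation shows the objection. The IOPS figures below are illustrative assumptions, not measurements of any real device:

```python
# Assumed device characteristics -- invented numbers for illustration only.
READ_IOPS = 30_000   # flash reads per second (assumption)
WRITE_IOPS = 10_000  # flash writes per second; typically lower (assumption)

def seconds_for(total_ios, read_fraction):
    """Time to complete total_ios I/Os at a given read/write mix,
    charging reads and writes at their respective IOPS rates."""
    reads = total_ios * read_fraction
    writes = total_ios * (1 - read_fraction)
    return reads / READ_IOPS + writes / WRITE_IOPS

read_heavy = seconds_for(1_000_000, read_fraction=0.9)  # typical analytics
balanced   = seconds_for(1_000_000, read_fraction=0.5)  # temp-space-like
# The balanced (write-heavier) mix takes noticeably longer on the
# same device, which is why write-intensity would, on its face,
# argue against putting that workload on flash.
```

Under these assumed numbers, the balanced mix takes roughly 67 seconds versus 40 for the read-heavy one, so the relative write-intensity of temp space cuts against flash rather than for it.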
On the plus side, I was reminded of something I should have noted when I wrote about DB2 compression before:
Much like Vertica, DB2 operates on compressed data all the way through, including in temp space.