August 18, 2010

More on temp space, compression, and “random” I/O

My PhD was in a probability-related area of mathematics (game theory), so I tend to squirm when something is described as “random” that clearly is not. That said, a comment by Shilpa Lawande on our recent flash/temp space discussion suggests the following way of framing a key point:

If everybody else is cool with it too, I can live with that. :)

Meanwhile, I talked again with Tim Vincent of IBM this afternoon. Tim endorsed the temp space/Flash fit, but with a different emphasis, which upon review I find I don’t really understand. The idea is:

My problem with that is: Flash typically has lower write IOPS (I/O operations per second) than read IOPS, so being (relatively) write-intensive would, to a first approximation, seem if anything to make a workload a worse fit for flash.
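
To make that first-approximation argument concrete, here is a minimal sketch in Python. The IOPS figures are hypothetical, chosen only for illustration and not taken from any vendor or from Tim's remarks. The model assumes each operation costs 1/IOPS seconds on average, and it ignores caching, queueing, and sequential-versus-random effects.

```python
def blended_iops(read_iops: float, write_iops: float, write_fraction: float) -> float:
    """Effective IOPS of one device serving a mix of reads and writes.

    Each operation is assumed to take 1/IOPS seconds on average, so the
    blended rate is the weighted harmonic mean of the two rates.
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    avg_seconds_per_op = (1.0 - write_fraction) / read_iops + write_fraction / write_iops
    return 1.0 / avg_seconds_per_op


# Hypothetical device figures, for illustration only.
FLASH_READ_IOPS = 40_000.0
FLASH_WRITE_IOPS = 10_000.0

# A read-mostly workload (10% writes) versus a temp-space-like one (50% writes).
read_heavy = blended_iops(FLASH_READ_IOPS, FLASH_WRITE_IOPS, write_fraction=0.1)
write_heavy = blended_iops(FLASH_READ_IOPS, FLASH_WRITE_IOPS, write_fraction=0.5)

print(f"10% writes: {read_heavy:,.0f} IOPS")
print(f"50% writes: {write_heavy:,.0f} IOPS")
```

Under these assumed figures, the more write-heavy mix gets fewer operations per second out of the same device, which is the sense in which write intensity would seem to count against flash rather than for it.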

On the plus side, I was reminded of something I should have noted when I wrote about DB2 compression before:

Much like Vertica, DB2 operates on compressed data all the way through, including in temp space.

Comments

6 Responses to “More on temp space, compression, and “random” I/O”

  1. DB2 workload management | DBMS2 -- DataBase Management System Services on August 18th, 2010 4:47 am

    […] By way of contrast, Tim is cautious about the common approach of just lowering a query’s priority. His concern is that a long-running query could linger even longer, creating a long-lasting bottleneck in, for example, temp space. […]

  2. Maris Darbonis on August 18th, 2010 5:27 am

    Well, for multiple data streams, maybe “concurrency” is the key word. If data needs to be fetched from several places on the storage device concurrently, that places a rather heavy load on the read heads of rotating disks. And temp space is often used by several sessions concurrently.
    It is also used for sort segments and hashes, which have random access patterns; for example, this presentation shows that SSDs are a good fit for that:
    http://www.cs.arizona.edu/~bkmoon/papers/sigmod08ssd-slides.pdf , slides 17-20.

  3. Curt Monash on August 18th, 2010 6:53 am

    Slide 19 is really interesting. Thanks!

  4. Juan Benavides on August 18th, 2010 7:59 am

    Related to the IBM (Tim) comment:
    In the early days of DB2, temp storage was mostly used for sorting output from queries like ORDER BY. In most of these queries (though not all, e.g. GROUP BY), the number of writes approaches the number of reads.

  5. Alex B on August 20th, 2010 10:47 am

    Speaking of concurrency: I haven’t noticed any degradation, but rather a 400% performance improvement.
    http://code.google.com/p/mist01/wiki/Vertica_demystified
    Is this because I’ve used relatively small datasets for this test?

  6. Introduction to Kaminario | DBMS 2 : DataBase Management System Services on December 5th, 2010 6:00 am

    […] you can choose to put just your most bottlenecking data on Kaminario K2 – the hot stuff, your temp space, your logs, […]
