September 19, 2006

Is data warehousing now all about sequential access?

A lot of evidence is pointing to a major paradigm shift in data warehouse RDBMS, along the lines of:

Old way: Assume I/O is random; lower total execution time by improving selectivity and thus lowering the amount of I/O.

New way: Drive the amount of random I/O to near zero, and do as much sequential I/O as necessary to achieve this goal.

Examples include:

The hardware logic is compelling, as long as we rely on hard disks rather than, say, flash memory. Rotation speed has only gone up 12.5-fold in the entire 50-year history of the hard drive, and currently maxes out at 15,000 RPM. At that speed a full revolution takes 4 ms, so on average the head waits half a revolution for the right sector, which puts a floor of 2 ms on average random access time. But streaming data on and off disk gets exponentially faster, in line with increases in disk density and semiconductor performance. Hence sequential data access gets ever faster, while random access does not.
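For anybody who wants to check the arithmetic, here is a rough Python sketch. The rotation speed comes from the paragraph above; the sustained transfer rate is purely an assumption I picked for illustration, not a measured or vendor-quoted figure, and seek time is ignored entirely, so the real cost of a random access is higher than what this shows.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector: half of one revolution, in ms."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in a minute
    return ms_per_revolution / 2.0


def bytes_streamed_during(latency_ms: float, transfer_mb_per_s: float) -> float:
    """Bytes a sequential read moves in the time one random access costs.

    MB/s * 1,000,000 bytes/MB * (ms / 1,000) simplifies to MB/s * 1,000 * ms.
    """
    return transfer_mb_per_s * 1_000.0 * latency_ms


# ASSUMPTION: a hypothetical sustained sequential transfer rate, for illustration only.
ASSUMED_TRANSFER_MB_PER_S = 60.0

latency_ms = avg_rotational_latency_ms(15_000)
forgone_bytes = bytes_streamed_during(latency_ms, ASSUMED_TRANSFER_MB_PER_S)

print(f"Average rotational latency at 15,000 RPM: {latency_ms} ms")
print(f"Bytes streamed in that time at the assumed rate: {forgone_bytes:,.0f}")
```

The point of the second function is that every random access has an opportunity cost measured in bytes of sequential scanning. As the transfer rate climbs and the latency floor stays put, that cost only grows, which is the hardware case for trading selectivity for sequential I/O.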

What I don’t 100% understand yet, however, is the full array of techniques used by the traditional leaders to co-opt or combat this trend. I’m looking into that; in particular, I have a call scheduled with Oracle.

I hope to write about this issue in my October Computerworld column. (My columns are typically submitted on the first Monday or Tuesday morning of the month, to appear in the following week’s edition.) Or if it slips from October, then soon thereafter. Any thoughts in the interim would be most welcome.

Comments

4 Responses to “Is data warehousing now all about sequential access?”

  1. DBMS2 — DataBase Management System Services » Blog Archive » I say “sequential”, you say … on September 20th, 2006 5:52 pm

    [...] I talked with Teradata today, and they called me on my use of the term “sequential.” Basically, if there’s any head movement for disk seeks, some computer science researchers wouldn’t call it “sequential.” I didn’t know that; I was just familiar with the less precise usage of the term in some vendors’ marketing and discussions.* OK, I’ll make up a new, more precise term instead. How about “coarse-grained”? [...]

  2. David Aldridge on September 25th, 2006 5:52 pm

    I’m absolutely behind anything that will suppress disk head latency as a factor in data warehouse performance. In fact I wrote something on the subject a little over a year ago. http://oraclesponge.wordpress.com/2005/07/25/time-slicing-of-disk-io/

    I suppose that the vendors are still having trouble grasping how inherently different data warehouses are from the small-and-random I/O model that OLTP generates.

  3. Linux 2.6 Kernel I/O Schedulers for Oracle Data Warehousing: Part I « The Oracle Sponge on September 28th, 2006 10:47 pm

    [...] This issue popped back into my head after being directed through Log Buffer #11 at Mark Rittman’s site to an article by Curt Monash titled “Is data warehousing all about sequential access?” and which matched my thoughts very well. [...]

  4. oraclesponge.com » Blog Archive » Linux 2.6 Kernel I/O Schedulers for Oracle Data Warehousing: Part I on May 20th, 2010 9:57 am

    [...] through Log Buffer #11 at Mark Rittman’s site to an article by Curt Monash titled “Is data warehousing all about sequential access?” and which matched my thoughts very [...]
