Most aspects of computer performance and capacity grow at Moore’s Law kinds of speeds. Doubling times may be anywhere from 9 months to 2 years, but in any case speeds and storage capacities grow exponentially. Not so, however, with disk rotation speeds. The very first disk drives, over 50 years ago, rotated 1,200 times per minute. Today’s top disk rotation speed is around 15,000 RPM. Indeed, while I recall seeing a reference to one at 15,600 RPM, I can’t now go back and find it. Yes, folks; disk rotational speed, over the entire history of computing, has increased by just a measly factor of 13.
Why does this matter to DBMS design? Simply put, disk rotation speed is an absolute limit on the speed of random disk-based data access. Today’s fastest disks take 4 milliseconds to rotate once. Thus, multiple heads aside, getting something from a known but random location on the disk incurs, on average, half a rotation — 2 milliseconds — of rotational latency alone, before any seeking or data transfer. And a naive data management algorithm will, for a single query, result in dozens or even hundreds of random accesses.
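The arithmetic here is simple enough to sketch in a few lines. This toy calculation covers rotational latency only, ignoring seek and transfer time, so the real numbers are worse:

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a full rotation, in milliseconds."""
    ms_per_rotation = 60_000 / rpm  # 60,000 ms in a minute
    return ms_per_rotation / 2

# A 15,000 RPM disk rotates once every 4 ms, so on average a random
# access waits 2 ms just for the platter to come around.
print(avg_rotational_latency_ms(15_000))  # 2.0

# The very first drives, at 1,200 RPM, waited an average of 25 ms.
print(avg_rotational_latency_ms(1_200))   # 25.0

# A naive query plan doing 200 random accesses pays 400 ms in
# rotational latency alone.
print(200 * avg_rotational_latency_ms(15_000))  # 400.0
```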
Thus, for a DBMS to run at acceptable speed, it needs to get data off disk not randomly, but rather a page at a time (i.e., in large blocks of predetermined size) or better yet sequentially (i.e., in continuous streams of indeterminate size). The indexes needed to assure these goals had best be sized to fit entirely in RAM. Clustering also plays an increasingly large role, so that data needed at the same time is likely to be on the same page, or at least in the same part of the disk.
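To see why page-at-a-time and sequential access matter so much, here’s a back-of-envelope comparison. The page size, per-access cost, and sequential transfer rate below are illustrative round numbers I’ve assumed, not measurements of any particular drive:

```python
# Assumed, illustrative figures -- not from any specific disk:
RANDOM_ACCESS_MS = 6.0    # ~2 ms rotational latency + ~4 ms seek per read
SEQ_MB_PER_SEC = 100.0    # sustained sequential transfer rate
PAGE_KB = 8               # typical DBMS page size
TOTAL_MB = 100            # amount of data the query needs

# Random access: one positioning delay per 8 KB page.
pages = TOTAL_MB * 1024 // PAGE_KB                # 12,800 page reads
random_seconds = pages * RANDOM_ACCESS_MS / 1000  # positioning cost alone

# Sequential access: one continuous stream, no per-page positioning.
seq_seconds = TOTAL_MB / SEQ_MB_PER_SEC

print(round(random_seconds, 1))  # 76.8 -- over a minute of head movement
print(round(seq_seconds, 1))     # 1.0  -- about a second of streaming
```

The two-orders-of-magnitude gap is the whole story: it’s why indexes must live in RAM and why clustering related data onto the same pages pays off so handsomely.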
Right there I’ve described some of the toughest ongoing challenges facing DBMS engineers. The big vendors all do a great job at meeting them (if they didn’t, they’d be out of business). Even so, some small companies find themselves able to beat the big guys, by some egregious cheating.
Data warehouse appliance vendors such as Netezza and especially Datallegro optimize their systems to stream data sequentially off of disk. In doing so, they go deeper into the operating systems, hardware, etc. than Oracle could ever allow itself to do. And the results seem pretty good. But I’ll write about that another time. Instead, I’m focusing right now on memory-centric data management; please see my other posts in that topic category.