Three months ago, I pointed out that it is hard to generalize about memory-centric database management, because there are so many different kinds. That said, there are some basic points that I’d like to record as background for any future discussion of the subject, focusing on differences between disk and RAM. And while I’m at it, I’ll throw in a few comments about flash memory as well.
This post would probably be better if I had actual numbers for the speeds of various kinds of silicon operations, but I’ll do what I can without them.
For most purposes, database speed is a function of a few kinds of numbers:
- CPU cycles consumed.
- I/O throughput.
- I/O wait time.
- Network throughput.
- Network wait time.
The amount of storage used is also important, both directly — storage hardware costs money — and because if you save storage via compression, you may get corresponding benefits in I/O. Power consumption and similar costs are usually tied to hardware efficiency; the less gear you use, the less floor space and cooling you may be able to get away with.
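To make the compression trade-off concrete, here's a minimal sketch using Python's standard zlib. The payload is hypothetical — a highly repetitive CSV-ish buffer, the way columnar data often is — so the ratio it shows is illustrative, not representative of any particular workload:

```python
import zlib

# Hypothetical, deliberately repetitive 1 MB payload (columnar data often
# compresses this well; real-world ratios vary widely).
raw = b"customer_id,region,amount\n" * 40000
compressed = zlib.compress(raw, level=6)

ratio = len(raw) / len(compressed)
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
# Every scan now moves roughly 'ratio' times fewer bytes through I/O.
print(f"compression ratio: {ratio:.1f}x")
```

The same ratio shows up twice: once as a storage-cost saving, and once as an I/O saving on every scan — which is why the relative importance of the two benefits shifts as you move between disk and RAM.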
When databases move to RAM from spinning disk, major consequences include:
- I/O wait time is reduced by many orders of magnitude.
- I/O throughput is much faster too, but not to the same extent.
- Storage is much more expensive per byte (RAM vs. disk).
- There’s a minimum average wait time before you can read data from a specific place on a disk, set by rotational latency. At 15,000 RPM or slower, it can’t be below 2 milliseconds — half a revolution, on average — even if the disk heads moved along the radius at infinite speed. In practice, the best figures are usually in the high single-digit milliseconds.
- Sequential disk access is much faster than random. Disks are capable of sending back over 100 megabytes/second. But as noted in the previous point, they max out at on the order of 100 random reads/second. So it’s hard to approach maximum throughput unless the average read brings back a megabyte or more.
- These facts apply to writes as well as to reads.
- The advantage of sequential over random I/O is vastly reduced in RAM (it’s never quite eliminated, but it’s a much smaller consideration).
- Things get interesting for data compression as you move to RAM from disk:
- One classic benefit — compression saves I/O — is much less important than with disk.
- Another classic benefit — compression saves storage costs — is much increased in importance.
- Compression benefits in the area of network traffic aren’t much affected.
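The random-vs.-sequential arithmetic above can be sketched in a few lines. The figures here are the illustrative ones from the bullets (roughly 8 ms positioning time, roughly 100 MB/second transfer), not measurements of any particular drive:

```python
# Back-of-the-envelope model of spinning-disk read throughput.
SEEK_PLUS_ROTATE_S = 0.008      # ~8 ms average positioning time (illustrative)
TRANSFER_BYTES_PER_S = 100e6    # ~100 MB/s sequential transfer rate (illustrative)

def effective_throughput(read_size_bytes):
    """Bytes/second achieved when each read pays one positioning delay."""
    time_per_read = SEEK_PLUS_ROTATE_S + read_size_bytes / TRANSFER_BYTES_PER_S
    return read_size_bytes / time_per_read

for size in (4_096, 65_536, 1_048_576, 16_777_216):
    print(f"{size:>10}-byte reads -> {effective_throughput(size) / 1e6:6.1f} MB/s")
```

Run it and the point jumps out: 4 KB random reads get you well under 1 MB/second, while megabyte-sized reads recover most of the drive's sequential bandwidth — hence the "megabyte or more" rule of thumb.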
But notwithstanding everything else, you still need a persistent-storage story. Typically, that’s just your update/transaction log. Hence in-memory write performance is actually gated by the speed at which you can stream your update log to persistent storage — unless, of course, you’re running some kind of event processing/data reduction system and truly are willing to discard most of the data that passes through.
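One common way to keep the log from gating throughput is group commit: many transactions' log records go out in one sequential write and one flush, so the per-transaction cost of durability shrinks as batches grow. Here's a minimal sketch of the idea — not any particular product's design, and the `LogWriter` class and record format are made up for illustration:

```python
import os
import tempfile

class LogWriter:
    """Toy group-commit log: buffer records, make them durable in one batch."""

    def __init__(self, path):
        self.f = open(path, "ab")
        self.buffer = []

    def append(self, record: bytes):
        # Transactions just append in memory; nothing is durable yet.
        self.buffer.append(record)

    def commit_batch(self):
        # One sequential write + one fsync covers every buffered transaction.
        self.f.write(b"".join(self.buffer))
        self.f.flush()
        os.fsync(self.f.fileno())
        n = len(self.buffer)
        self.buffer.clear()
        return n

log = LogWriter(os.path.join(tempfile.mkdtemp(), "redo.log"))
for i in range(1000):
    log.append(f"txn {i}: update ...\n".encode())
print("transactions made durable by one fsync:", log.commit_batch())
```

The log writes are purely sequential, which is exactly the access pattern spinning disks (and flash) handle best — one reason the "stream the log" design has aged so well.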
When you have to go to spinning disk, your data access methods are commonly indexes and scans, because those are the approaches that minimize the number of disk reads. But when data lives in RAM, pointer-chasing is a reasonable choice. Also, directly calculated addresses seem to be used more in memory than they are on disk. For example:
- QlikView and Neo4j both rely on direct addressing.
- Neo4j also has a lot of pointer-chasing.
- solidDB relies on tree-walking — specifically, Patricia tries.
- Workday chases references among a whole lot of different objects.
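To illustrate the distinction (with made-up data, not any of those vendors' actual formats): direct addressing finds a value by arithmetic alone, while pointer-chasing follows references object to object — cheap in RAM, ruinous on disk.

```python
import array

# Direct addressing: fixed-width values, so address = base + row_id * width.
# No search, no pointers -- just arithmetic. (Hypothetical salary column.)
salaries = array.array("q", range(0, 1_000_000, 10))

def salary_of(row_id):
    return salaries[row_id]

# Pointer-chasing: follow references node to node, as a graph store might.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

head = Node(1)
head.next = Node(2)
head.next.next = Node(3)

def chase(node, hops):
    for _ in range(hops):
        node = node.next
    return node.value

print(salary_of(42))   # arithmetic lookup
print(chase(head, 2))  # two pointer dereferences
```

On disk, every hop in `chase` could be another high-single-digit-millisecond read; in RAM, it's nanoseconds — which is why these access methods only became reasonable choices once data moved off spinning disk.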
Flash, of course, is another kind of silicon memory — persistent, and slower than RAM. Beyond that:
- You generally attach a lot more flash to one server than you would RAM. This can create bandwidth bottlenecks between the flash and the CPU. If you use PCIe, you could have issues with attaching as much flash as you want. If you use disk controllers instead, as Teradata does, you could have issues with throughput.
- Sequential writes to flash are slow, perhaps even slower than sequential writes to spinning disk.
- Random writes to flash require erasing and rewriting a whole block.
- Flash had a bad reputation for the number of times you can write to it before it wears out. But software has done a good job of mitigating the problem, e.g. via wear leveling and error-correcting codes.
- In connection with that, the cheaper but less reliable form of flash — MLC vs. SLC (Multi-/Single-Level Cell) — is becoming more acceptable for enterprise use. For example, Clustrix appliances use MLC.
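The block-granular-write point has a simple arithmetic consequence, often called write amplification. A worst-case sketch, with an illustrative erase-block size (real devices and their flash translation layers vary a great deal, and usually do much better than this naive model):

```python
# Worst-case write amplification from block-granular flash writes.
ERASE_BLOCK_BYTES = 256 * 1024   # illustrative erase-block size; an assumption

def write_amplification(update_bytes):
    """Bytes physically written per byte logically updated, in the naive
    worst case where each small random update rewrites a full block."""
    return ERASE_BLOCK_BYTES / update_bytes

for update in (128, 4_096, 65_536):
    print(f"{update:>6}-byte random update -> "
          f"{write_amplification(update):8.1f}x bytes actually written")
```

This is part of why flash controllers go to such lengths (log-structured layouts, remapping) to turn small random writes into large sequential ones, and why wear is more of a problem for random-write-heavy workloads.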
In theory, all the comments about random vs. sequential, pointers vs. indexes, and so on carry over pretty well from RAM to flash. In practice, however, data access methods used on flash seem to be pretty similar to those on spinning disk. I’m not totally sure why.