I had the chance to talk at length with Solid Information Technology tech guru Antoni Wolski about their memory-centric DBMS technology architecture. The most urgent topic was what made in-memory database managers inherently faster than disk-based ones that happened to have all the data in cache. But we didn’t really separate that subject from the general topic of how they made their memory-centric technology run fast, from its introduction in 2002 through substantial upgrades in the most recent release.
There were four main subtopics to the call:
1. Indexing structures that are very different from those of disk-based DBMS.
2. Optimizations to those indexing structures.
3. Optimizations to logging and checkpointing.
4. Miscellaneous architectural issues.
Most simply, since each disk access is slow, disk-based DBMS divide data into blocks, which are handled as units. Block size is typically 8K, though it can be larger. Now, that may make sense from the file system’s standpoint anyway. Be that as it may, the block structure is preserved even when the data is processed in RAM. One disadvantage to this is suboptimal use of on-processor cache, which is important since lookups from Level 1/Level 2 cache are vastly faster than lookups on separate RAM chips.
But the bigger point is that indexing structures are screwed up. The canonical index structure in a disk-centric OLTP RDBMS is a tree of blocks (i.e., a B-tree). The record sought is in a leaf block somewhere. There are index blocks whose entries are pointers to the correct block based on values in the index column. There are index blocks of pointers to other index blocks. And so on. One can traverse these trees in very few steps, but each step is costly, because each step involves searching within an entire block.
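To make that concrete, here is a minimal sketch of a tree-of-blocks index — hypothetical code, not SolidDB’s or any particular product’s. The point to notice is that every step down the tree means searching the keys of a whole block:

```python
import bisect

# Hypothetical sketch of a disk-style index: a tree of sorted blocks.
# Inner blocks hold separator keys plus child blocks; leaf blocks hold
# (key, record) pairs. Every step searches an entire block.

class Block:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys          # sorted keys within this block
        self.children = children  # child blocks (inner block only)
        self.records = records    # payloads (leaf block only)

def btree_lookup(block, key):
    while block.children is not None:            # descend inner blocks
        i = bisect.bisect_right(block.keys, key)
        block = block.children[i]
    i = bisect.bisect_left(block.keys, key)      # search the leaf block
    if i < len(block.keys) and block.keys[i] == key:
        return block.records[i]
    return None
```

With 8K blocks, each of those per-block searches touches a large, cache-unfriendly stretch of memory — which is exactly the cost an in-memory design tries to avoid.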
SolidDB, by way of contrast, uses a trie as its core index structure. The key value on which the record search is based is divided into chunks of bits. Each chunk leads to a tree node with a small number of choices for the next chunk. There are more steps, but each step is much cheaper.
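A bitwise trie lookup can be sketched as follows. This is an illustration, not SolidDB’s code; the 4-bit chunk size and 16-bit keys are arbitrary choices, and plain dicts stand in for the real node structures:

```python
# Hypothetical sketch of a bitwise trie: the key is split into 4-bit
# chunks, and each chunk selects one child at the current node. More
# steps than a block tree, but each step is a tiny, cheap lookup.

CHUNK_BITS = 4
KEY_BITS = 16  # assume 16-bit keys for the sketch

def trie_insert(root, key, record):
    node = root
    for shift in range(KEY_BITS - CHUNK_BITS, -1, -CHUNK_BITS):
        chunk = (key >> shift) & (2**CHUNK_BITS - 1)
        node = node.setdefault(chunk, {})  # descend, creating nodes
    node['record'] = record

def trie_lookup(root, key):
    node = root
    for shift in range(KEY_BITS - CHUNK_BITS, -1, -CHUNK_BITS):
        chunk = (key >> shift) & (2**CHUNK_BITS - 1)
        node = node.get(chunk)
        if node is None:                   # no entry with this prefix
            return None
    return node.get('record')
```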
The story gets even better if one optimizes more. Notably, what’s searched at each node is, in SolidDB, an array. And arrays are very fast to get information out of, because the precise address of an element is a calculated value, with no pointer-chasing needed at all. Arrays have another major advantage too, according to Solid – good data locality, meaning that what you’re looking for is more likely to already be in on-processor cache.
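As a rough illustration of the array idea (again hypothetical code, not SolidDB’s), each trie node can be a fixed-size array indexed directly by the next chunk of the key, so finding the child is pure address arithmetic:

```python
# Hypothetical array-node trie: the chunk value is used directly as an
# array index, so locating the child needs no hashing and no search
# within the node -- just one address calculation.

FANOUT = 16  # 2**4 children per node, for 4-bit chunks

def make_node():
    return [None] * FANOUT  # one small contiguous array per node

def array_trie_insert(root, key, record, key_bits=16, chunk_bits=4):
    node = root
    for shift in range(key_bits - chunk_bits, 0, -chunk_bits):
        chunk = (key >> shift) & (FANOUT - 1)
        if node[chunk] is None:
            node[chunk] = make_node()
        node = node[chunk]
    node[key & (FANOUT - 1)] = record    # last chunk indexes the record

def array_trie_lookup(root, key, key_bits=16, chunk_bits=4):
    node = root
    for shift in range(key_bits - chunk_bits, 0, -chunk_bits):
        node = node[(key >> shift) & (FANOUT - 1)]
        if node is None:
            return None
    return node[key & (FANOUT - 1)]
```

Because each node is one small contiguous array, a lookup touches a handful of compact memory regions rather than scattered 8K blocks, which is where the cache-proximity benefit comes from.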
What we’ve discussed up to this point are for the most part ideas long known in computer science, but implemented more recently because of Moore’s Law advances. More innovative (in Antoni’s opinion, which I tend to share) are Solid’s ideas in logging and checkpointing.
Having an in-memory DBMS is all great and wonderful. But assuming one actually intends to persist the data, there still are transaction logs to be written to disk. What’s more, the in-memory database is periodically snapshotted and checkpointed to disk. So if one isn’t careful, the drastic I/O reduction benefits of in-memory operation are apt to get lost.
SolidDB features several tactics to reduce the impact of logging and checkpointing. The most noteworthy of these are:
- A SQL extension that lets the developer toggle between strict (synchronous) and relaxed (asynchronous) durability/logging. Relaxed logging can be done in separate I/O-oriented threads that never block database processing threads.
- An approach to checkpointing that is snapshot-preserving (i.e., not fuzzy) yet nonblocking.
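The first tactic can be illustrated generically — to be clear, this sketch is not SolidDB’s implementation (which, as noted below, is queueless), and all the names are made up. Strict durability flushes the log record before the commit returns; relaxed durability hands the record to a background I/O thread and returns immediately:

```python
import io
import queue
import threading

# Hypothetical sketch of strict vs. relaxed durability. NOT SolidDB's
# actual design: a simple queue stands in for the hand-off to the
# separate I/O-oriented thread.

class Logger:
    def __init__(self, log_file):
        self.log_file = log_file
        self.pending = queue.Queue()
        threading.Thread(target=self._writer, daemon=True).start()

    def _writer(self):
        while True:                      # background I/O thread
            record = self.pending.get()
            self.log_file.write(record)
            self.log_file.flush()
            self.pending.task_done()

    def commit(self, record, durability='strict'):
        if durability == 'strict':
            self.log_file.write(record)  # synchronous: wait for the I/O
            self.log_file.flush()
        else:
            self.pending.put(record)     # asynchronous: never blocks
```

The trade-off is the usual one: relaxed durability can lose the last few transactions in a crash, in exchange for commits that never wait on disk.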
This may be a good time to mention a cool page on Solid’s website, where they list a number of conference papers they’ve given. The one on checkpointing is the one with “SIREN” in the title, second from the top.
Solid also has done a lot of work in optimizing thread synchronization (there are lots of parallel threads, so critical sections need to be as short as possible) and in query optimization (cost estimates are radically different from those in disk-centric systems). And they have the same kind of queueless architecture StreamBase does.