June 22, 2007

Memory-centric vs. conventional DBMS — a Solid difference

I had the chance to talk at length with Solid Information Technology tech guru Antoni Wolski about their memory-centric DBMS technology architecture. The most urgent topic was what made in-memory database managers inherently faster than disk-based ones that happened to have all the data in cache. But we didn’t really separate that subject from the general topic of how they made their memory-centric technology run fast, from its introduction in 2002 through substantial upgrades in the most recent release.

There were four main subtopics to the call:

1. Indexing structures that are very different from those of disk-based DBMS.
2. Optimizations to those indexing structures.
3. Optimizations to logging and checkpointing.
4. Miscellaneous architectural issues.

Most simply, since each disk access is slow, disk-based DBMS divide data into blocks, which are handled as units. Block size is typically 8K, but can be larger. That may make sense from the file system's standpoint; even so, the block structure is preserved even when the data is processed in RAM. One disadvantage is suboptimal use of on-processor cache, which matters because lookups from Level 1/Level 2 cache are vastly faster than lookups to separate RAM chips.
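The granularity mismatch is easy to quantify. Assuming the 8K block size above and a typical 64-byte cache line (my figure, not Solid's), a single block spans a great many cache lines:

```python
# Back-of-envelope: how many cache lines one disk block occupies.
# CACHE_LINE is an assumed typical value, not from the interview.
BLOCK_SIZE = 8 * 1024   # typical disk block, per the text
CACHE_LINE = 64         # common L1/L2 cache line size (assumption)

print(BLOCK_SIZE // CACHE_LINE)  # a single block spans 128 cache lines
```

So touching "one block" in RAM can mean touching far more memory than the few bytes actually needed.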

But the bigger point is that indexing structures are screwed up. The canonical index structure in a disk-centric OLTP RDBMS is a tree of blocks. The record sought is in a block somewhere. There are index blocks whose entries are pointers to the correct block based on values in the index column. There are index blocks of pointers to other index blocks. And so on. One can traverse these trees in very few steps, but each step is costly, because each step involves examining the whole block.
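The tree-of-blocks idea can be sketched as follows. This is a generic illustration of a disk-style index, not SolidDB's or any particular vendor's code; the `Block` class and `lookup` function are mine:

```python
# Hypothetical sketch of a disk-style index: a tree of fixed-size blocks.
# Each descent step searches an entire block's key array.
import bisect

class Block:
    """An index block: sorted keys plus child pointers (record values at leaves)."""
    def __init__(self, keys, children, is_leaf=False):
        self.keys = keys          # sorted separator keys
        self.children = children  # child blocks, or record values in a leaf
        self.is_leaf = is_leaf

def lookup(block, key):
    """Descend the tree; few steps, but each one examines a whole block."""
    while not block.is_leaf:
        i = bisect.bisect_right(block.keys, key)  # search the block's keys
        block = block.children[i]                 # follow the chosen pointer
    i = bisect.bisect_left(block.keys, key)
    if i < len(block.keys) and block.keys[i] == key:
        return block.children[i]
    return None
```

With 8K blocks a real index packs hundreds of keys per node, so the tree is shallow, which is exactly the trade-off the paragraph describes: few steps, each one expensive.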

SolidDB, by way of contrast, uses a core index structure called the trie. The key value on which the record search is based is divided into chunks of bits. Each chunk leads to a tree node with a small number of choices for the next chunk. There are more steps, but each step is much cheaper.
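A minimal sketch of the chunked-key idea, under my own assumptions (4-bit chunks, 32-bit keys, dict-based nodes), which are almost certainly not Solid's actual layout:

```python
# Hedged sketch of a bitwise trie: the key is consumed 4 bits at a time,
# so each node offers at most 16 choices for the next chunk.
CHUNK_BITS = 4
KEY_BITS = 32
MASK = (1 << CHUNK_BITS) - 1  # 0xF

def trie_insert(root, key, value):
    node = root
    for shift in range(KEY_BITS - CHUNK_BITS, -1, -CHUNK_BITS):
        chunk = (key >> shift) & MASK  # next 4-bit chunk of the key
        node = node.setdefault(chunk, {})
    node['value'] = value

def trie_lookup(root, key):
    node = root
    for shift in range(KEY_BITS - CHUNK_BITS, -1, -CHUNK_BITS):
        node = node.get((key >> shift) & MASK)
        if node is None:
            return None                # no record with this key
    return node.get('value')
```

A 32-bit key takes eight 4-bit steps here, versus two or three for the block tree, but each step is a tiny constant-time hop rather than a search through a block.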

The story gets even better if one optimizes more. Notably, what’s searched at each node is, in SolidDB, an array. And arrays are super-fast to get information out of, because the precise address is a calculated value with no pointers needed at all. Arrays have another major advantage too, according to Solid – great proximity of data, meaning that what you’re looking for is more likely to already be in on-processor cache.
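To make the array point concrete, here is a variant of the same sketch in which each node is a flat 16-slot array and the child's position is computed arithmetically from the chunk value. Again, the sizes and names are my assumptions for illustration:

```python
# Sketch: each trie node is a contiguous array indexed directly by the
# chunk value, so the child's slot is a calculated address - no search,
# no pointer chasing within the node, and good cache proximity.
FANOUT = 16  # 4-bit chunks -> 16 slots per node (assumed)

def make_node():
    return [None] * FANOUT  # one contiguous array per node

def array_trie_insert(root, key, value, key_bits=16):
    node = root
    for shift in range(key_bits - 4, 0, -4):
        chunk = (key >> shift) & 0xF
        if node[chunk] is None:
            node[chunk] = make_node()
        node = node[chunk]
    node[key & 0xF] = value       # final chunk indexes the leaf slot

def array_trie_lookup(root, key, key_bits=16):
    node = root
    for shift in range(key_bits - 4, 0, -4):
        node = node[(key >> shift) & 0xF]  # one array index per step
        if node is None:
            return None
    return node[key & 0xF]
```

The per-step work is a single shift, mask, and array index, and because a node's slots are contiguous, a step tends to touch memory that is already in on-processor cache.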

What we’ve discussed up to this point are for the most part ideas long known in computer science, but implemented more recently because of Moore’s Law advances. More innovative (in Antoni’s opinion, which I tend to share) are Solid’s ideas in logging and checkpointing.

Having an in-memory DBMS is all great and wonderful. But assuming one actually intends to persist the data, there still are transaction logs to be written to disk. What’s more, the in-memory database is periodically snapshotted and checkpointed to disk. So if one isn’t careful, the drastic I/O reduction benefits of in-memory operation are apt to get lost.
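The basic interplay of logging and checkpointing can be sketched generically. This is textbook write-ahead logging plus snapshotting, not SolidDB's actual algorithm (their SIREN paper, mentioned below, covers that); all class and function names here are mine:

```python
# Generic write-ahead-logging sketch: every update is appended to a log
# before being applied in memory; a checkpoint snapshots the whole store
# so that older log entries become unnecessary.
import json

class InMemoryStore:
    def __init__(self):
        self.data = {}
        self.log = []        # stands in for the on-disk transaction log

    def put(self, key, value):
        self.log.append(json.dumps({'k': key, 'v': value}))  # log first...
        self.data[key] = value                               # ...then apply

    def checkpoint(self):
        snapshot = dict(self.data)  # stands in for the disk snapshot
        self.log.clear()            # pre-snapshot log entries are obsolete
        return snapshot

def recover(snapshot, log):
    """Rebuild state from the last checkpoint plus the log tail."""
    data = dict(snapshot)
    for entry in log:
        rec = json.loads(entry)
        data[rec['k']] = rec['v']
    return data
```

Every `put` implies a disk write for the log, and every checkpoint implies a bulk write, which is why a careless implementation can give back much of the I/O savings of in-memory operation.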

SolidDB features several tactics to reduce the impact of logging and checkpointing.

This may be a good time to mention a cool page on Solid’s website, where they list a number of conference papers they’ve given. The one on checkpointing is the one with “SIREN” in the title, second from the top.

Solid also has done a lot of work in optimizing thread synchronization (there are lots of parallel threads, so critical sections need to be as short as possible) and query optimization (cost estimates are radically different from those in disk-centric systems). And they have the same kind of queueless architecture StreamBase does.
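One common way to keep critical sections short, sketched below, is to do the expensive work outside the lock and hold it only for a reference swap. This is a generic illustration, not a claim about how SolidDB's synchronization actually works:

```python
# Sketch: minimize lock hold time by preparing updates outside the
# critical section; the lock guards only a single reference assignment.
import threading

class VersionedIndex:
    def __init__(self):
        self._lock = threading.Lock()
        self._index = {}

    def bulk_update(self, new_entries):
        with self._lock:
            base = self._index          # short hold: grab current version
        merged = dict(base)
        merged.update(new_entries)      # expensive merge, no lock held
        with self._lock:
            self._index = merged        # short hold: publish new version

    def get(self, key):
        with self._lock:                # readers also hold the lock briefly
            return self._index.get(key)
```

The sketch ignores the lost-update hazard of two concurrent `bulk_update` calls; the point is only that the time spent inside the lock is tiny compared with the work done outside it.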

