I’m frequently asked to generalize in some way about in-memory or memory-centric data management. I can start:
- The desire for human real-time interactive response naturally leads to keeping data in RAM.
- Many databases will be ever cheaper to put into RAM over time, thanks to Moore’s Law. (Most) traditional databases will eventually wind up in RAM.
- However, there will be exceptions, mainly on the machine-generated side. Where data creation and RAM data storage are getting cheaper at similar rates … well, the overall cost of RAM storage may not significantly decline.
Getting more specific than that is hard, however, because:
- The possibilities for in-memory data storage are as numerous and varied as those for disk.
- The individual technologies and products for in-memory storage are much less mature than those for disk.
- Solid-state options such as flash just confuse things further.
Consider, for example, some of the in-memory data management ideas kicking around.
- In many cases there is essentially an in-memory DBMS, trying for as much ACIDity as RAM reasonably allows, then (usually) also copying data synchronously to persistent storage. These can have many different architectures. For example:
- SAP HANA is an in-memory columnar DBMS, with text indexing/inverted-list antecedents, except when it uses one of a couple of approaches to in-memory row-based data management.
- solidDB, now an IBM product, is an RDBMS that relies on Patricia tries. It is actually a hybrid memory/disk product, but optimized for in-memory operation.
- eXtremeDB is an OODBMS, but relies on B-trees.
- H-Store and its commercialization VoltDB are row-based RDBMS that make drastic assumptions about the nature of your workload, but in return get to drop much of the overhead other DBMS need.
- Oracle TimesTen is a row-based RDBMS, oriented to OLTP (OnLine Transaction Processing), which stores its data persistently via another RDBMS. (MySQL was the default choice before Oracle bought the company.)
- Oracle’s answer to SAP HANA is to take TimesTen and do analytics on it, via the Exalytics appliance.
- Some disk-based DBMS just happen to be architected in ways so that for good performance you’re going to want to keep all the data in RAM. Often, their in-memory architecture is lot like their on-disk architecture, with memory mapping for I/O. This is done in very different kinds of DBMS.
- MongoDB is one visible example. In general, scale-out web databases (whether NoSQL or MySQL) often keep all their data in RAM, whether or not that plan is baked into the DBMS architecture.
- Various analytic DBMS vendors have at time been memory-oriented. At the moment, I think:
- Exasol (columnar) isn’t quite as extreme about wanting to be in-memory as it used to be.
- ParAccel (columnar) and its memory-mapped architecture can be happily used either in-memory or on disk.
- Kognitio (row-based), which used to be portrayed as a disk-based system that’s smart about using RAM, is currently being marketed as an in-memory system.
- My last technical briefing on Applix TM1 (now an IBM Cognos product) was in September, 2005. (The product itself dates back to 1984.) At the time TM1 had an interesting sparse MOLAP (Multi-Dimensional OnLine Analytic Processing) story, the point being that the system worked hard to isolate what was actually non-zero. Loading of raw data seemed to be batch, but you could update models with derived data, and there was a transaction log for confident persistence.
- Alternatively, you can use a caching layer, typically on a separate set of servers from your DBMS, which has no responsibility for managing data persistence. For example:
- TimesTen and solidDB are used, respectively, as relational caches for Oracle and DB2.
- Peter Zencke told me years ago that SAP had a purpose-built caching layer that kept over 99% of requests from touching disk.
- The key-value store memcached is central to many of the world’s largest web sites, typically backed by a MySQL cluster.
- ScaleArc has key-value cache that stores — rather than individual records — the entire TCP string sent by an RDBMS in response to a particular SQL query.
- Some systems manage data in memory in one kind of structure, then ensure persistence via a very different structure on disk. Examples include:
- Workday’s architecture — object-oriented in RAM, MySQL (really key-value) on disk. Edit: Workday thinks “key-value” is a slightly misleading way to put it. Stay tuned for more.
- Oracle Coherence (formerly Tangosol) — object-oriented in RAM, Oracle on disk. Edit: Actually, Coherence isn’t really a write-through ORM (Object-Relational Mapper). It functions more like memcached, albeit with a very different data model.
- Couchbase — memcached (key-value) in-memory, evolving from SQLite to CouchDB on disk.
- Similarly, business intelligence suites can manage data in-memory that comes from some other kind of data store (usually an RDBMS, sometimes Hadoop or whatever). I haven’t had a lot of luck in getting details, with one exception — QlikView, which uses a simple tabular data structure.
- Stream processors — i.e. CEP engines — are a whole other sort of in-memory engine, doing something that’s a lot like data management.
And that, kiddies, is why I hesitate to generalize in too much detail about “in-memory database management.”
Despite its length, this is still a very partial list of memory-centric data management approaches. I encourage you to add other examples into the comments that I might have left out.
- I did a simpler overview of memory-centric alternatives in 2005.