May 23, 2011

Traditional databases will eventually wind up in RAM

In January, 2010, I posited that it might be helpful to view data as being divided into three categories:

I won’t now stand by every nuance in that post, which may differ slightly from those in my more recent posts about machine-generated data and poly-structured databases. But one general idea is hard to dispute:

Traditional database data (records of human transactional activity, referred to as “Human/Tabular data” above) will not grow as fast as Moore’s Law makes computer chips cheaper.

And that point has a straightforward corollary, namely:

It will become ever more affordable to put traditional database data entirely into RAM. 
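
As a rough illustration of that corollary, the sketch below compares a database that grows at a fixed yearly rate against RAM prices that halve on a fixed schedule. Every number in it (starting size, growth rate, price per terabyte, halving period) is a made-up assumption chosen only to show the shape of the argument, not a measurement.

```python
def ram_cost_over_time(initial_tb, data_growth_per_year,
                       initial_price_per_tb, price_halving_years, years):
    """Return the yearly cost of holding the whole database in RAM.

    All inputs are hypothetical planning parameters:
      initial_tb            -- database size today, in terabytes
      data_growth_per_year  -- fractional growth, e.g. 0.2 for 20% a year
      initial_price_per_tb  -- RAM price today, per terabyte
      price_halving_years   -- years for the RAM price to halve
    """
    costs = []
    for year in range(years + 1):
        # Data grows geometrically at the assumed rate.
        size_tb = initial_tb * (1 + data_growth_per_year) ** year
        # RAM price decays geometrically, halving every price_halving_years.
        price_per_tb = initial_price_per_tb * 0.5 ** (year / price_halving_years)
        costs.append(size_tb * price_per_tb)
    return costs


if __name__ == "__main__":
    # Assumed values for illustration only.
    costs = ram_cost_over_time(
        initial_tb=1.0,
        data_growth_per_year=0.2,
        initial_price_per_tb=10_000.0,
        price_halving_years=2.0,
        years=10,
    )
    for year, cost in enumerate(costs):
        print(f"year {year:2d}: {cost:10.2f}")
```

As long as the data grows more slowly than RAM gets cheaper, the cost of keeping the whole database in memory falls every year; plug in a growth rate that outpaces the price decline and it rises instead.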

Actually, there are numerous ways for OLTP, other short-request, and some analytic databases to wind up in RAM.

And here’s the kicker: Intel told me last year that CPUs are headed to 46-bit address spaces around mid-decade. Indeed, they hired me to help figure out if that was enough.* That multiplies out to 64 terabytes of RAM on a single server, chip costs permitting. So most of what we now think of as operational databases — and many of the analytic ones too — will fit in-memory, even if they run very large businesses.

*And did so without putting the discussion under any kind of NDA.
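
The 64-terabyte figure is simple arithmetic: an n-bit address can name 2^n bytes, assuming byte-addressable memory and binary terabytes (2^40 bytes each). A quick check, with function names that are mine rather than anything from Intel:

```python
TIB = 2 ** 40  # one binary terabyte (tebibyte), in bytes


def addressable_bytes(address_bits: int) -> int:
    """Bytes nameable with an address of the given width, one byte per address."""
    return 2 ** address_bits


def addressable_tib(address_bits: int) -> float:
    """Same quantity, expressed in binary terabytes."""
    return addressable_bytes(address_bits) / TIB


if __name__ == "__main__":
    # 2**46 bytes / 2**40 bytes per terabyte = 2**6 = 64 terabytes
    for bits in (32, 40, 46, 48):
        print(f"{bits}-bit address space: {addressable_tib(bits):,.4f} TiB")
```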

Likely consequences of all this include:

Of course, the same trends that make data-storing chips cheaper will make data-generating chips cheaper too. So, just as there are huge amounts of machine-generated data that you’d never pay to store in RAM, the same will still be true 10 years from now; the data volumes involved will just be a lot bigger. And thus there will still be plenty of very large analytic databases using relatively cheap forms of storage, perhaps even disk.

But OLTP and other short-request processing are likely to wind up in-memory. And the same may be true for a considerable amount of analytics, especially but not only if the analytics have a low-latency requirement.

Comments

25 Responses to “Traditional databases will eventually wind up in RAM”

  1. Mark Stacey on May 23rd, 2011 12:35 pm

    Hi

    I think this is neglecting a big change in hardware that may be coming ~ wherein RAM-speed non-volatile storage might bring an entire architecture change.

    It’s called Phase Change Memory, and Intel at least is betting big on it.
    http://en.wikipedia.org/wiki/Phase-change_memory
    http://www.technologyreview.com/computing/20148/

    These list it as a replacement for flash, but:

    http://www.micron.com/innovations/pcm.html

    There used to be a site http://www.numonyx.com/ which listed a collaboration between Intel and Micron.

    The point is, whether this is the breakthrough itself or not, I think the current state of affairs will not carry on ~ volatile RAM will be replaced by almost-as-fast NON-volatile RAM of some sort.

  2. Vlad Rodionov on May 23rd, 2011 12:56 pm

    Yep, there is another trend in the industry: new non-volatile RAM technologies are coming to the market. This makes the future prospects of in-memory data processing even more solid.

  3. Curt Monash on May 23rd, 2011 9:05 pm

    Fair point as to whether new solid-state memory developments are relevant. I’m inclined to say:

    1. No matter what happens, there will long be major speed or latency differences between various kinds of storage. The level of cache closest to the CPU will be much faster than the storage that holds most of the data. DBMS design will need to optimize around that.

    2. More important, however, is the speed or latency of the SLOWEST storage involved in low-latency response. If you can afford to make that be at RAM speeds, very different DBMS architectures seem optimal than those that are optimal in a disk-centric world.

    3. Flash memory and its speeds live in a kind of muddled middle. If it were always going to be here, it would itself trigger major DBMS redesign. But while there’s certainly been some good flash-oriented engineering (specifically at a few appliance vendors or former appliance vendors), I think there’s a lot of wait-and-see as to whether PCM, racetrack memory, or whatever else will leapfrog flash.

  4. Gary on May 24th, 2011 12:18 am

    This suggests that it is best to have human tabular data stored separately from machine-generated data. Is that driven by the preferred DBMS technology / volumes, or do you see them as being gathered for different purposes such that they naturally stay independent?

  5. Curt Monash on May 24th, 2011 12:46 am

    Gary,

    Primarily the former. There are plenty of use cases in which you’d copy a (relatively speaking) small set of human-generated data into a data warehouse that holds mainly machine-generated stuff. Conversely, you might extract/subset/derive data from a large machine-generated set to load into a warehouse operating on a human-generated-data scale (that already happens in a large fraction of web businesses). Or not just a warehouse; it could be more of an operational system. And that’s even without telling federation stories and so on.

  6. Notes from the Fusion-io S-1 filing | DBMS 2 : DataBase Management System Services on May 24th, 2011 3:54 am

    [...] skepticism about specialized storage hardware for database applications applies in part but not in whole to [...]

  7. Dan Weinreb’s blog » Blog Archive » What are “Human-Generated Data” and “In-RAM Databases”? on May 24th, 2011 9:13 am

    [...] one of the best sources is Curt Monash’s DBMS2 blog.  Recently he posted an article called Traditional Databases will eventually wind up in RAM.  I have two comments about his points from that [...]

  8. Daniel Weinreb on May 24th, 2011 9:14 am

    I was sufficiently inspired by this post to write a whole blog post about it: see http://danweinreb.org/blog/what-are-human-generated-data-and-in-ram-databases

  9. Joachim Wester on May 24th, 2011 4:54 pm

    It does not help if your secondary storage is as fast as RAM if your DBMS architecture is based on moving secondary storage (Disk) to primary storage (RAM) before operating on it.

    So even if your disk was faster than RAM, a traditional DBMS will still be much slower than a memory centric database. My claim is easy to validate; move your “favorite-db-name-goes-here” to a RAM disk and benchmark it.

  10. Joachim Wester on May 24th, 2011 5:05 pm

    A simple example. If you have CPU addressability to your database storage, you would not copy it to RAM before scanning for a value.

    While a naive conclusion would be to simply rewrite such code, the cruel reality is that the complexity of any modern DBMS (due, amongst other things, to concurrent reads and writes) makes a total rewrite a much easier path.

  11. Arik Kol on May 26th, 2011 3:59 pm

    Back to Mark Stacey’s comment, I believe that another important HW based trend is being ignored.
    DRAM and SSD [aka fast media] based storage arrays are changing the way DBMSs work. While for years DBMSs have done their best to avoid accessing the storage arrays, due to the high latencies that result from rotating disks, the new generation of fast media storage arrays enables fast access to data. Though it is still not (and never will be) as fast as accessing DRAM on a server, it enables millions of IOPS, high throughput and sub-millisecond latency from a shared resource which is persistent and highly available.
    I believe that in the near future we will see more DBMSs utilizing these new IO performance capabilities, leaving the usage of RAM-centric DBMSs only to very exotic applications (like ones used in trading floors, for example).

  12. Curt Monash on May 27th, 2011 7:26 am

    Arik,

    We’ll see. DBMS vendors will definitely adapt to the fact that persistent storage can have a BROAD range of latencies. And there surely will be a marketing (and product management) strain of “Oh, our old disk-bound stuff is plenty fast enough when you don’t actually have to be bound by disks.” And there definitely is interest in direct-attached and/or custom-appliance use of solid-state memory.

    But it may be that, despite various details being very different, from a qualitative design standpoint, running against homogeneous solid-state storage arrays is a lot like running against disk arrays.

  13. Robert Hodges on May 29th, 2011 3:29 pm

    As Daniel Weinreb discussed in his blog response to your article, the question of going to 100% in-memory gets a lot more tricky if you need to ensure data don’t vaporize on failures. “Legacy” applications tend to have kind of annoying requirements in this regard.

    Economics for capable SSDs are starting to compare very favorably to disk and can be adopted without radical changes to current DBMS architectures. In the MySQL community at least much of the performance focus has been in this area though of course more memory is not bad either.

    Finally, what do you mean by “expensive” storage systems? If that means NetApp, EMC, and the like, I would not count them out by any means. They have a lot of very useful features like snapshots, cross-site replication, de-duplication, and centralized management that will keep them relevant for anyone doing transactional processing. However, perhaps you had something else in mind?

  14. Curt Monash on May 29th, 2011 4:26 pm

    Yep, Robert, I meant EMC, NetApp, and the like. All the features you list are ones I’d like to see and indeed expect to see in the DBMS itself. (Well, the Delphix approach to database de-dupe is potentially pretty interesting, but that’s not the province of storage hardware.)

    As for “100% in-memory”, obviously you log to persistent storage.

    And finally — there’s no doubt that solid-state memory will push out many uses of disk. What I’m wondering is whether the industry will stop there, or go all the way to RAM. (It is of course possible that this distinction is somehow obviated before we ever have a chance to get to that point.)

  15. Robert Hodges on May 30th, 2011 9:57 am

    @Curt, Not that I’m trying to avoid work, but adding sophisticated disk management functions seems to add a lot of complexity to DBMS implementations. Many DBMSs in the 1990s supported functions like mirroring but dropped/de-emphasized them as it became apparent RAID hardware did a better job. Storage technology is changing very rapidly, so it does not seem like a good time for DBMSs to take on advanced storage management again.

    My $0.02 anyway. :)

  16. Joe Harris on June 2nd, 2011 7:53 am

    It’s also worth noting in this context that Kognitio’s appliance offering is moving very close to an ‘everything in RAM’ approach. The latest specs on their site quote RAM:Disk ratios of either 1:4 or 1:8.

    Interestingly, the only Kognitio customer I’ve personally spoken to told me that their actual user data was somewhat less than the total system RAM.

  17. Curt Monash on June 2nd, 2011 9:36 am

    I have great trouble keeping up with Kognitio’s shifting strategies. They so hate appliances that they circulate a letter saying Netezza shouldn’t have gone public. They sell appliances. They’re data-as-a-service. No, they sell products. It’s bewildering.

    That said, http://www.dbms2.com/2008/12/14/kognitio-and-wx-2-update/ is supportive of your view. :)

  18. Metamarkets Blog » Blog Archive » Hadoop’s Secret Shortcoming: Speed (and How to Fix It) on November 4th, 2011 6:18 am

    [...] Last but certainly not least, we share the sentiment that traditional databases will eventually end up in RAM , because RAM is two-to-three orders of magnitude faster than disk. This storage would be more [...]

  19. Nati Shalom on August 21st, 2012 7:03 pm

    Lots of useful information – Thanks for putting it together.
    I’m quite surprised that I didn’t run into it earlier.

    Anyway, I thought that this could be another useful resource:

    Memory is the New Disk for the Enterprise
    http://natishalom.typepad.com/nati_shaloms_blog/2010/03/memory-is-the-new-disk-for-the-enterprise.html

  20. Beyond Hadoop: Fast Queries from Big Data | :: Metamarkets Group :: on September 23rd, 2012 10:32 pm

    [...] We share Curt Monash’s sentiment that traditional databases will eventually end up in RAM , as memory costs continue to fall. In-memory analytics are popular because they are fast, often [...]

  21. Building data startups: Fast, big, and focused - Strata on January 10th, 2013 8:01 am

    [...] from disk to SSD, others have observed that many traditional, relational databases will soon be entirely in memory. This is particularly true for applications that require repeated, fast access to a full set of [...]

  22. What are “Human-Generated Data” and “In-RAM Databases”? | Dan Weinreb on February 25th, 2014 7:38 pm

    [...] one of the best sources is Curt Monash’s DBMS2 blog.  Recently he posted an article called Traditional Databases will eventually wind up in RAM.  I have two comments about his points from that [...]

  23. What are “Human-Generated Data” and “In-RAM Databases”? | on February 25th, 2014 9:55 pm

    [...] one of the best sources is Curt Monash’s DBMS2 blog.  Recently he posted an article called Traditional Databases will eventually wind up in RAM.  I have two comments about his points from that [...]

  24. Notes on memory-centric data management | DBMS 2 : DataBase Management System Services on March 28th, 2014 5:59 am

    [...] I maintain my opinion that traditional databases will eventually wind up in RAM. [...]

  25. http://www.sonidoinc.com/ on September 20th, 2014 2:10 am

    I’m not that much of an online reader to be honest but your site’s really nice, keep it up! I’ll go ahead and bookmark your website to come back later on. Many thanks
