In January 2010, I posited that it might be helpful to view data as being divided into three categories:
- Human/Tabular data, i.e., human-generated data that fits well into relational tables or arrays.
- Human/Nontabular data, i.e., all other data generated by humans.
- Machine-generated data.
I won’t now stand by every nuance in that post; its details may differ slightly from those in my more recent posts about machine-generated data and poly-structured databases. But one general idea is hard to dispute:
Traditional database data (records of human transactional activity, referred to as “Human/Tabular data” above) will not grow as fast as Moore’s Law makes computer chips cheaper.
And that point has a straightforward corollary, namely:
It will become ever more affordable to put traditional database data entirely into RAM.
Actually, there are numerous ways for OLTP, other short-request, and some analytic databases to wind up in RAM.
- SAP has some good ideas for how it could happen, banging transactions into what is essentially an in-memory analytic database. (I dispute SAP’s claims of transformational database technology leadership, but that doesn’t mean the underlying ideas aren’t good.) Indeed, SAP has at least two good ideas, if you count Sybase as part of SAP.
- For those who can afford the associated technology disruption, memory-centric object-oriented DBMS could be appealing.
- Web scalability best practices commonly include keeping data in RAM; that’s pretty much the point of the memcached caching layer (see the sketch after this list).
- SaaS (Software as a Service) companies — such as Workday — often bring a particular tenant’s database entirely into RAM.
- QlikView highlights the benefits of doing business intelligence in RAM.
- SAS HPA makes the argument that even “big data analytics” should sometimes be done in RAM.
- I don’t have particularly favorable opinions at this time about marketing strategies or momentum at Oracle TimesTen, IBM solidDB, or VoltDB, but those examples at least serve to illustrate that memory-centric OLTP DBMS have existed for years.
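To make the web-scalability point concrete, here is a minimal cache-aside sketch in Python. It assumes a memcached instance on localhost:11211 and the pymemcache client library; load_user_from_db() is a hypothetical stand-in for a slow, disk-backed query, not anything from any vendor mentioned above.

```python
# Minimal cache-aside sketch: serve hot data from RAM via memcached,
# falling back to the disk-based database only on a cache miss.
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))  # memcached holds all values in RAM


def load_user_from_db(user_id: int) -> bytes:
    # Hypothetical stand-in for the slow path: a query against a disk-based DBMS.
    return b"user-record-" + str(user_id).encode()


def get_user(user_id: int) -> bytes:
    key = f"user:{user_id}"
    cached = cache.get(key)  # served straight from RAM on a hit
    if cached is not None:
        return cached
    row = load_user_from_db(user_id)
    cache.set(key, row, expire=300)  # keep the hot copy in memory for 5 minutes
    return row
```

The design point is the same one the in-memory DBMS vendors make: once the working set lives in RAM, the disk-backed path becomes the rare exception rather than the rule.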
And here’s the kicker: Intel told me last year that CPUs are headed to 46-bit address spaces around mid-decade. Indeed, they hired me to help figure out if that was enough.* That multiplies out to 64 terabytes of RAM on a single server, chip costs permitting. So most of what we now think of as operational databases — and many of the analytic ones too — will fit in-memory, even if they run very large businesses.
*And did so without putting the discussion under any kind of NDA.
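The arithmetic behind that 64-terabyte figure is just powers of two:

```python
# 46-bit addressing gives 2**46 distinct byte addresses.
addressable_bytes = 2 ** 46
print(addressable_bytes)             # 70368744177664
print(addressable_bytes // 2 ** 40)  # 64, i.e., 64 terabytes (tebibytes, strictly)
```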
Likely consequences of all this include:
- Legacy apps will (eventually) be consolidated and virtualized in-memory. Their underlying databases will grow so slowly that the cost of putting them in RAM will become too low to worry about.
- Expensive storage systems will (continue to) be irrelevant to database processing. Databases that don’t fit in RAM will typically be big enough to require the attention of a lot of CPUs — and in those cases the DBMS software itself will handle all the storage tasks.
- Major OLTP DBMS vendors, such as Oracle, will need alternate in-memory code lines, because disk-centric architectures are sub-optimal for in-memory operation. Well, that’s what they have those big R&D budgets for.
- SaaS vendors and web businesses may not rely on today’s major OLTP DBMS vendors. (I was going to say “won’t” rather than “may not” until I recalled the likely M&A endgame.) Traditional enterprises may blanch at migrating away from their legacy DBMS environments, but the trade-offs are different for technology companies using DBMS as subsystems.
Of course, the same trends that make data-storing chips cheaper will make data-generating chips cheaper too. So, just as there are huge amounts of machine-generated data today that you’d never pay to store in RAM, the same will be true 10 years from now; the data volumes involved will just be a lot bigger. And thus there will still be plenty of very large analytic databases using relatively cheap forms of storage, perhaps even disk.
But OLTP and other short-request processing are likely to wind up in-memory. And the same may be true for a considerable amount of analytics, especially but not only if the analytics have a low-latency requirement.