I can think of seven major reasons not to use an analytic RDBMS. One is good; but the other six seem pretty questionable, niche circumstances excepted, especially at this time.
The good reason to not have an analytic RDBMS is that most organizations can run perfectly well on some combination of:
- SaaS (Software as a Service).
- A low-volume static website.
- A network focused on office software.
- A single cheap server, likely running a single instance of a general-purpose RDBMS.
Those enterprises, however, are generally not who I write for or about.
The six bad reasons to not have an analytic RDBMS all take the form “Can’t some other technology do the job better?”, namely:
- A data warehouse that’s just another instance of your OLTP (OnLine Transaction Processing) RDBMS. If your analytic workload is big enough to matter, it’s likely that a specialized analytic RDBMS will be more cost-effective and generally easier to deal with.
- MOLAP (Multi-Dimensional OnLine Analytic Processing). That ship has sailed … and foundered … and been towed to drydock.
- In-memory BI. QlikView, SAP HANA, Oracle Exalytics, and Platfora are just four examples of many. But few enterprises will want to confine their analytics to such data as fits affordably in RAM.
- Non-tabular* approaches to investigative analytics. There are many examples in the Hadoop world — including the recent wave of SQL add-ons to Hadoop — and some in the graph area as well. But those choices will rarely suffice for the whole job, as most enterprises will want better analytic SQL performance for (big) parts of their workloads.
- Tighter integration of analytics and OLTP (OnLine Transaction Processing). Workday worklets illustrate that business intelligence/OLTP integration is a really good idea. And it’s an idea that Oracle and SAP can be expected to push heavily, when they finally get their product acts together. But again, that’s hardly all the analytics you’re going to want to do.
- Tighter integration of analytics and other short-request processing. An example would be maintaining a casual game’s leaderboard via a NoSQL write-optimized database. Yet again, that’s hardly all the analytics a typical enterprise will want to do.
*I’ve long used “tabular” to cover both relational and MOLAP structures, the point being that in both cases you have a neat and regular schema, well-represented as a set of arrays.
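To make the leaderboard example above concrete, here's a minimal sketch of the pattern — cheap per-event writes maintaining a structure that answers top-N reads. This is illustrative plain Python, not any particular NoSQL product's API; all names are hypothetical.

```python
# Hypothetical sketch: maintaining a casual game's leaderboard alongside
# short-request event writes, rather than in an analytic RDBMS.
import heapq
from collections import defaultdict

class Leaderboard:
    """Keeps each player's best score and answers top-N queries."""
    def __init__(self):
        self.best = defaultdict(int)  # player -> best score seen so far

    def record(self, player, score):
        # Short-request write path: one cheap in-place update per game event.
        if score > self.best[player]:
            self.best[player] = score

    def top(self, n):
        # Lightweight analytic read: top-N players by best score.
        return heapq.nlargest(n, self.best.items(), key=lambda kv: kv[1])

lb = Leaderboard()
for player, score in [("ann", 120), ("bob", 95), ("ann", 80), ("cid", 130)]:
    lb.record(player, score)
print(lb.top(2))  # [('cid', 130), ('ann', 120)]
```

The point is that this kind of processing integrates naturally with the write path — but it is a narrow slice of the analytics an enterprise will want.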
What could change this picture would be a future in which:
- All your tabular business data fits into RAM.
- Also, the OLTP/analytic DBMS distinction becomes less important.
In that case, it might be reasonable to get by with:
- A single in-memory relational DBMS, handling OLTP and some analytics alike.
- Whichever additional short-request systems you need (mainly for internet-heavy uses).
- A Hadoop-based analytic data store.
I’m on record as suggesting that traditional databases will indeed wind up in RAM. But I’m more doubtful that a single in-memory DBMS will suffice for OLTP and analytics alike.
What are some key aspects of a specialized analytic RDBMS? My partial and overlapping list starts:
- Fast, high-volume analytic I/O.
- Smart query planning.
- Smart, high-volume internal data movement.
- Smart workload management.
- Good data compression, including in cache and during query execution.
- Strong analytic platform capabilities.
- Fast execution of analytic requests — standard SQL, advanced SQL, or other.
An analytic RDBMS typically:
- Is optimized for large reads, and perhaps for large temporary-table writes as well.
- Reduces I/O bottlenecks via, for example, compression, columnar storage, and/or scale-out.
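To illustrate one of those I/O-reduction techniques, here's a minimal sketch of run-length encoding, a common columnar compression scheme. It's generic Python, not any particular product's storage format; the key point is that a low-cardinality column collapses to a few (value, run) pairs, and some aggregates can run on the compressed form directly.

```python
# Minimal sketch of run-length encoding (RLE) for a columnar store:
# a sorted or low-cardinality column collapses into (value, run) pairs,
# so scans touch far fewer bytes than a row-oriented read would.
def rle_encode(column):
    runs = []
    for v in column:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return runs

def rle_count(runs, value):
    # A COUNT with a predicate can be answered on the compressed
    # representation -- no decompression on the scan path.
    return sum(n for v, n in runs if v == value)

region = ["east"] * 4 + ["west"] * 3 + ["east"] * 2
runs = rle_encode(region)
print(runs)                     # [['east', 4], ['west', 3], ['east', 2]]
print(rle_count(runs, "east"))  # 6
```

Real analytic engines combine several such encodings per column and pick among them adaptively, but the economics are the same: less data moved, and often less data decoded.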
If all the data is in RAM, these problems are indeed lessened. Also, Oracle Exadata is dedicated to the premise that, even using conventional computer parts, I/O bottlenecks can be reduced with enough hardware — and price aside, it seems to work. Still, if you talk with analytic RDBMS designers, you repeatedly hear that it’s not that simple — even getting data efficiently out of RAM is different in the analytic and OLTP cases.
Query planning/execution, data movement, and workload management go together — they’re all about getting the most work done with the least machine effort, and they all depend on determining which specific execution choices might be synergistic or anti-synergistic with each other. Taken together, they form a very tough optimization challenge, which is different in the OLTP and analytic cases. Adding in analytic platform capabilities adds yet more difficulty to the optimization problem. And so:
A fast analytic database manager is a hard thing to build; expecting it to be fast at OLTP as well may be too much to ask for.
Given that, the discussion pivots to:
OK, but can we overprovision the RAM by so much that suboptimal performance doesn’t matter?
My guess is “Not any time soon” — because efficiency is always a good thing, databases will always grow, and RAM will never be free.
Bottom line: Analytic RDBMS will likely be needed for a long time.
Related link:

- Integrating short-request and analytic processing (March, 2011)