A few years ago, I suggested that database workloads could be divided into two kinds: transactional and analytic. The advent of non-transactional NoSQL suggests we need a replacement term for “transactional” or “OLTP”, but finding one has been difficult. Numerous tries, including high-volume simple processing, online request processing, internet request processing, network request processing, short request processing, and rapid request processing, have turned out to be imperfect, as per discussion at each of those links. But then, no category name is ever perfect anyway. I’ve finally settled on short request processing, largely because I think it does a good job of preserving the analytic-vs-bang-bang-not-analytic workload distinction.
The easy part of the distinction goes roughly like this:
- Anything transactional or “OLTP” is short-request.
- Anything “OLAP” is analytic.
- Updates of small amounts of data are probably short-request, be they transactional or not.
- Retrievals of one or a few records in the ordinary course of update-intensive processing are probably short-request.
- Queries that return or aggregate large amounts of data — even in intermediate result sets — are probably analytic.
- Queries that would take a long time to run on a badly-chosen or badly-configured DBMS are probably analytic (even if they run nice and fast on your actual system).
- Analytic processes that go beyond querying or simple arithmetic are — you guessed it! — analytic.
- Anything expressed in MDX is probably analytic.
- Driving a dashboard is usually analytic.
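The distinction can be made concrete with a toy example: a keyed single-row update and lookup on one side, a scan-and-aggregate on the other. The schema and data here are purely hypothetical, just to illustrate the two workload shapes.

```python
import sqlite3

# Hypothetical toy schema, only to contrast the two workload shapes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 10.0), (2, "bob", 25.0), (3, "alice", 40.0)],
)

# Short-request: touch one row by key, in and out quickly.
conn.execute("UPDATE orders SET amount = 12.0 WHERE id = 1")
row = conn.execute("SELECT amount FROM orders WHERE id = 1").fetchone()

# Analytic: scan and aggregate a large fraction of the table.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(row)     # the single updated record
print(totals)  # a per-customer aggregate
```

On three rows both queries are instant, of course; the point is that the first touches one record by key while the second scans everything, which is what separates the categories at scale.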
Where the terminology gets more difficult is in a few areas of what one might call real-time or near-real-time analytics. My first takes are:
- If you’re updating a counter 1000s of times per second, that’s short-request, even if the counter feeds something that looks like a dashboard or report.
- Serving web pages, tracking clicks, and so on is generally short-request, except that …
- … personalizing web pages can in some cases reasonably be viewed as analytic, and hence right at the boundary between or intersection of analytic and short-request processing. The same goes for the rest of what Teradata calls active data warehousing or Aster Data used to call “frontline”.
- If complex event/stream processing feeds a dashboard for humans to look at, it’s just fast analytics. If it automatically drives transactions, then it’s also at the boundary between or intersection of analytic and short-request processing.
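The counter case above can be sketched in a few lines: each increment is a tiny short-request write, while the dashboard read is a small analytic rollup over all the counters. This in-memory version is a stand-in; a real deployment would put the counters in a DBMS or cache, but the workload shape is the same.

```python
from collections import Counter

# Stand-in for hot counters that might live in a DBMS or cache.
counters = Counter()

def record_click(page):
    # Short-request side: one tiny update per event, thousands per second.
    counters[page] += 1

def dashboard_top(n):
    # Analytic side: aggregate across all counters for a human-facing view.
    return counters.most_common(n)

for _ in range(1000):
    record_click("/home")
for _ in range(250):
    record_click("/pricing")

top = dashboard_top(2)
print(top)  # [('/home', 1000), ('/pricing', 250)]
```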
Indeed, one of my top trends to watch these days is the integration of short request and analytic processing. Several different approaches come to mind.
- You can do everything in a single instance of a general-purpose DBMS such as Oracle, DB2, Microsoft SQL Server, Sybase ASE, or MySQL. For sufficiently small enterprises with sufficiently undemanding workloads, that’s the best approach. Tokutek evidently aspires to be an improved version of the same thing.
- You can do it all in an analytic DBMS that is sufficiently strong in user concurrency, update speed, and so on. This is the sweet spot of Teradata’s market. It’s also where SAP HANA is alleged to be going.
- You can tie together DBMS optimized for short-request and analytic processing respectively (or use something like Hadoop for the analytics, whether or not it should be considered a DBMS). E.g., Membase (now Couchbase) has integration stories with Hadoop and Vertica, at a couple of clients each. I think this is a major untapped opportunity in the MySQL world, and have been raising that point with various companies for some time.
- You can graft short-request processing onto an analytic system. That’s the point of HBase.
- You can superimpose analytics on a short-request processing system. That’s the point of DataStax Brisk.
Sybase RAP, depending on how it is configured, can fit several of these models. The same could be said of Oracle (especially Exadata) or DB2.
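The tie-two-systems-together approach can be sketched as a periodic batch export from an operational store into a separate analytic store. Both stores here are toy sqlite databases and the table is hypothetical; in practice the analytic side might be Hadoop, Vertica, or the like, fed by a connector rather than a hand-rolled loop.

```python
import sqlite3

# Stand-in operational (short-request) store.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
oltp.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "click"), (2, "click"), (3, "purchase")])

# Stand-in analytic store, kept separate and optimized for big scans.
olap = sqlite3.connect(":memory:")
olap.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")

def export_batch(since_id):
    # Periodically ship rows newer than the high-water mark to the analytic side.
    rows = oltp.execute(
        "SELECT id, kind FROM events WHERE id > ?", (since_id,)
    ).fetchall()
    olap.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return rows[-1][0] if rows else since_id

high_water = export_batch(0)

# Heavy aggregation runs against the analytic copy, not the operational store.
counts = olap.execute(
    "SELECT kind, COUNT(*) FROM events GROUP BY kind ORDER BY kind"
).fetchall()
```

The design point is simply that the short-request system never sees the big scans, and the analytic system never sees the high-frequency single-row writes.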