November 5, 2012

Real-time confusion

I recently proposed a 2×2 matrix of BI use cases:

Is there an operational business process involved?
Is there a focus on root cause analysis?

Let me now introduce another 2×2 matrix of analytic scenarios:

Is there a compelling need for super-fresh data?
Who’s consuming the results — humans or machines?

My point is that there are at least three different cool things people might think about when they want their analytics to be very fast:

Fast investigative analytics — e.g., business intelligence with great query response.
Computations on very fresh data, presented to humans — e.g. “heartbeat” graphics monitoring a network.
Computations on very fresh data, presented back to a machine — e.g., a recommendation engine that makes good use of data about a user’s last few seconds of actions.

There’s also a fourth, slightly boring one that nonetheless drives a lot of important applications:

Analytics fed to machines on not-so-fresh data — e.g., call center software that doesn’t check whether the caller used the company’s website within the last minute.
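To make the 2×2 concrete, here is a minimal Python sketch that maps the two questions onto the four scenarios above. The names `Consumer` and `classify_scenario` are hypothetical, and placing fast investigative analytics in the "not-so-fresh data, human consumer" quadrant is my reading of the matrix rather than something spelled out above.

```python
from enum import Enum


class Consumer(Enum):
    HUMAN = "human"
    MACHINE = "machine"


def classify_scenario(needs_fresh_data: bool, consumer: Consumer) -> str:
    """Map the two matrix questions onto one of the four scenarios."""
    # Keys are (compelling need for super-fresh data?, who consumes the results?).
    quadrants = {
        (False, Consumer.HUMAN): "fast investigative analytics",
        (True, Consumer.HUMAN): "computations on very fresh data, presented to humans",
        (True, Consumer.MACHINE): "computations on very fresh data, presented back to a machine",
        (False, Consumer.MACHINE): "analytics fed to machines on not-so-fresh data",
    }
    return quadrants[(needs_fresh_data, consumer)]


# A recommendation engine using the last few seconds of user actions:
recommendation_case = classify_scenario(True, Consumer.MACHINE)
# Call center software that ignores the last minute of website activity:
call_center_case = classify_scenario(False, Consumer.MACHINE)
```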

Every so often, someone gets the bright idea to take whatever they offer in one of these areas and call it “real-time”. Confusion invariably ensues, for reasons including:

If there’s a human being in the loop, then at best that’s human real-time.
If super-fresh data is a nice-to-have rather than being essential to the use case, then it’s not a true real-time story at all.

Thus, I think the industry would be better off if the phrase “real-time” were never used again. Monash’s First Law of Commercial Semantics teaches why this isn’t likely; but a guy can dream, can’t he?

I write about fast analytic computation often; my recent posts on Platfora, Impala, and Teradata and Hadapt are just three of literally hundreds. So let’s focus instead on the other two cool areas. For starters:

Any one act of receiving or transmitting (a small quantity of) data can, in principle, be done in milliseconds.
The most basic point of “streaming” is to ensure that small reads or (more likely to be the problem) writes don’t bottleneck each other.
Computations on fresh data are usually simple. If you can’t do a computation quickly at all, then in particular you can’t do it quickly on fresh data. But examples of simple computations include a lot of what goes on in data manipulation, notably:
- Filters, pattern matching, and rule-checking.
- Arithmetic with small numbers of inputs.
- Running-total aggregates — i.e. “counters”.
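As a rough illustration of how simple these computations can be, here is a minimal Python sketch that applies a filter, a two-input multiplication, and running-total counters to events one at a time. The event fields (`user`, `action`, `price`, `quantity`) and the function name `process_stream` are assumptions made up for the example. They do not come from any particular product.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def process_stream(events: Iterable[Dict]) -> Iterator[Tuple[str, int, float]]:
    """Yield (user, purchase count so far, spend so far) for each purchase event."""
    purchase_count = defaultdict(int)   # running counter per user
    running_spend = defaultdict(float)  # running total per user
    for event in events:
        # Filter / rule check: ignore everything that is not a purchase.
        if event["action"] != "purchase":
            continue
        # Arithmetic with a small number of inputs.
        amount = event["price"] * event["quantity"]
        user = event["user"]
        # Running-total aggregates, i.e. counters.
        purchase_count[user] += 1
        running_spend[user] += amount
        yield user, purchase_count[user], running_spend[user]


# Hypothetical sample events, processed in arrival order.
sample_events = [
    {"user": "u1", "action": "view", "price": 0.0, "quantity": 0},
    {"user": "u1", "action": "purchase", "price": 2.5, "quantity": 2},
    {"user": "u2", "action": "purchase", "price": 10.0, "quantity": 1},
    {"user": "u1", "action": "purchase", "price": 1.0, "quantity": 3},
]
results = list(process_stream(sample_events))
```

Each event is handled with a constant amount of work and a small amount of per-user state, which is why this kind of computation can keep up with fresh data.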

Based on all that, it would seem that:

Computing on super-fresh data for human or machine consumption should be very similar problems …
… unless the machines demand much lower latency than humans would ever need. (Paradigmatic example: Algorithmic trading.)

But while that’s probably true for enabling technology, application demand (and supply) tend to be more focused. In particular:

1. The CEP/stream processing industry of course lives off of algorithmic trading — machines informing machines. They also make various efforts at super-fresh BI, but don’t necessarily get much traction outside of the investment vertical.

2. Most NoSQL and NewSQL vendors that I know — to the extent they have customers at all — have users in gaming, some sort of ad serving, and/or some sort of personalization. Usually, there are counters and/or model scoring somewhere in the story — i.e., machines informing machines. So one way or another, they’re all active in my category of “computations on very fresh data, presented back to a machine.” But I hear fewer stories from them in the area of super-fresh BI.

Similarly, WibiData and now also Continuuity are building stacks on top of Hadoop/HBase to help with machines-informing-machines.

3. Splunk makes its living from super-fresh BI. But when machines help humans to monitor other machines, at some point the distinction between wet-brain and cybernetic users gets blurry.

And I’ll pause there, continuing the discussion in a general post about the role and future of analytic RDBMS.

Related link

Integrating short-request and analytic processing (March, 2011)

Categories: Business intelligence, Games and virtual worlds, Log analysis, Predictive modeling and advanced analytics, Splunk, Streaming and complex event processing (CEP), WibiData


Comments

5 Responses to “Real-time confusion”

Thomas Bernhardt on November 5th, 2012 2:03 pm

CEP/stream processing could be broken into
CEP and stream processing platforms.

Wherein CEP products provide abstractions and/or power user language for event stream processing through windows, patterns, aggregations etc. Examples are products from Streambase, SAP, Progress, EsperTech and others.

Where stream processing platforms are programming frameworks that make it easier for general purpose streaming processing, examples being Storm, Apache S4, HStreaming and others.

The CEP industry for us, EsperTech, does not live off algorithmic trading and has found many use cases and customers outside of financial.
Curt Monash on November 5th, 2012 2:44 pm

Hi Thomas,

I’ve never talked with EsperTech that I can recall. However, StreamBase, Coral8, Apama, and Truviso all talked a good game about applications outside the financial area, then wound up mainly prospering (or not) in that sector as the bulk of their business.
Al DeLosSantos on November 6th, 2012 11:29 am

Hello Curt,

I used Apama on a trading desk in a middle tier bank. It was a good, functional tool that helped us automate several complex trading strategies. Our main challenge was meeting increased needs for trading-specific functionality (we did not have sufficient resources to develop complete solutions in-house) through the use of a general purpose CEP platform. Apama did begin to enhance their offerings for the Capital Markets. But, competitors such as Broadway Technology who understand the trading space are the vendors who appear to be gaining the most traction.

Very interesting posts on real-time technology and the need for analytic RDBMS’ and good point about the term “real-time”. The past few months “real-time” has been defined for me as the data being collected in a process data historian (InStep product eDNA).

Regards,
Al D.
Comments on the 2013 Gartner Magic Quadrant for Operational Database Management Systems | DBMS 2 : DataBase Management System Services on January 4th, 2014 1:52 pm

[…] The trends Gartner highlights are similar to those I see, although our emphasis may be different, and they may leave some important ones out. (Big omission — support for lightweight analytics integrated into operational applications, one of the more genuine forms of real-time analytics.) […]
Adversarial analytics and other topics | DBMS 2 : DataBase Management System Services on May 30th, 2016 6:15 am

[…] long-standing desire for business intelligence to operate on super-fresh data is, increasingly, making sense, as we get ever more stuff to monitor. However […]
