September 6, 2016

I’ve been an analyst for 35 years, and debates about “real-time” technology have run through my whole career. Some of those debates are by now pretty much settled. In particular:

Yes, interactive computer response is crucial. Into the 1980s, many apps were batch-only. Demand for such apps dried up. Business intelligence should occur at interactive speeds, which is a major reason that there’s a market for high-performance analytic RDBMS.

Theoretical arguments about “true” real-time vs. near-real-time are often pointless. What matters in most cases is human users’ perceptions of speed. Most of the exceptions to that rule occur when machines race other machines, for example in automated bidding (high frequency trading or otherwise) or in network security.

A big issue that does remain open is: How fresh does data need to be? My preferred summary answer is: As fresh as is needed to support the best decision-making. I think that formulation starts with several advantages:

It respects the obvious point that different use cases require different levels of data freshness.

It cautions against people who think they need fresh information but aren’t in a position to use it. (Such users have driven much bogus “real-time” demand in the past.)

It covers cases of both human and automated decision-making.

Straightforward applications of this principle include:

In “buying race” situations such as high-frequency trading, data needs to be as fresh as the other guy’s, and preferably even fresher.

Supply-chain systems generally need data that’s fresh to within a few hours; in some cases, sub-hour freshness is needed.

That’s a good standard for many desktop business intelligence scenarios as well.

Equipment-monitoring systems’ need for data freshness depends on how quickly catastrophic or cascading failures can occur or be averted. Different specific cases call for wildly different levels of data freshness. When equipment is well-instrumented with sensors, freshness requirements can be easy to meet.
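One way to make "as fresh as is needed" concrete is to write down a freshness budget per use case and check data age against it. The following is a minimal sketch, not a recommendation: the use-case names and the specific budgets (3 hours, 30 seconds) are illustrative assumptions chosen to echo the examples above, and `is_fresh_enough` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumed, illustrative budgets. Real values depend on the decisions being supported.
FRESHNESS_BUDGET = {
    "supply_chain": timedelta(hours=3),             # "fresh to within a few hours"
    "desktop_bi": timedelta(hours=3),               # same standard, per the text
    "equipment_monitoring": timedelta(seconds=30),  # varies wildly by equipment
}

def is_fresh_enough(use_case, last_updated, now=None):
    """Return True if the data's age is within the budget for this use case."""
    now = now or datetime.now(timezone.utc)
    age = now - last_updated
    return age <= FRESHNESS_BUDGET[use_case]
```

The same five-minute-old reading can pass the supply-chain budget and fail the equipment-monitoring one, which is the point: freshness is a property of the decision, not of the data store.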



E-commerce and other internet interaction scenarios can be more complicated, but it seems safe to say:

Recommenders/personalizers should take into account information from the current session.

Try very hard to give customers correct information about merchandise availability or pricing.

In meeting freshness requirements, multiple technical challenges can come into play.

Traditional batch aggregation is too slow for some analytic needs. That’s a core reason for having an analytic RDBMS.

Traditional data integration/movement pipelines can also be too slow. That’s a basis for short-request-capable data stores to also capture some analytic workloads. E.g., this is central to MemSQL’s pitch, and to some NoSQL applications as well.

Scoring models at interactive speeds is often easy. Retraining them quickly is much harder, and at this point only rarely done.

OLTP (OnLine Transaction Processing) guarantees adequate data freshness …

… except in scenarios where the transactions themselves are too slow. Questionably-consistent systems — commonly NoSQL — can usually meet performance requirements, but might have issues with the freshness of accurate data.

Older generations of streaming technology disappointed. The current generation is still maturing.
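The gap between scoring and retraining is easier to see in code. In this rough sketch the model's weights are fixed, as if learned in some earlier batch run, and only the input features are refreshed from the current session. The feature names, weights, and the logistic form are all assumptions made for illustration; they do not describe any particular recommender.

```python
import math

# Hypothetical weights, as if produced by an earlier offline training run.
WEIGHTS = {"past_purchases_in_category": 0.8, "session_views_in_category": 1.5}
BIAS = -2.0

def score(features):
    """Logistic score from fixed weights. No retraining happens here."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def score_with_session(stored_profile, session_events, category):
    """Merge stored features with a count from the current session, then score."""
    features = dict(stored_profile)
    features["session_views_in_category"] = sum(
        1 for event in session_events if event == category
    )
    return score(features)
```

Scoring is a handful of arithmetic operations per request, so feeding it fresh session data is mostly a data-pipeline problem. Changing `WEIGHTS` in response to the same fresh data is the retraining problem, and that is the part that remains hard.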

Based on all that, what technology investments should you be making, in order to meet “real-time” needs? My answers start:

Customer communications, online or telephonic as the case may be, should be based on accurate data. In particular: If your OLTP data is somehow siloed away from your phone support data, fix that immediately, if not sooner. (Fixing it 5-15 years ago would be ideal.) If your eventual consistency is so eventual that customers notice, fix it ASAP.

If you invest in predictive analytics/machine learning to support your recommenders/personalizers, then your models should at least be scored on fresh data. If your models don’t support that, reformulate them. If your data pipeline doesn’t support that, rebuild it. Actual high-speed retraining of models isn’t an immediate need. But if you’re going to have to transition to that anyway, consider doing so early and getting it over with.

Your BI should have great drilldown and exploration. Find the most active users of such functionality in your enterprise, even if — especially if! — they built some kind of departmental analytic system outside the enterprise mainstream. Ask them what, if anything, they need that they don’t have. Respond accordingly.

Whatever expensive and complex equipment you have, slather it with sensors. Spend a bit of research effort on seeing whether the resulting sensor logs can be made useful. Please note that this applies to vehicles and to fixed objects (e.g. buildings, pipelines), as well as to traditional industrial machinery. It also applies to any products you make which draw electric power.



So yes — I think “real-time” has finally become pretty real.

