I’ve been an analyst for 35 years, and debates about “real-time” technology have run through my whole career. Some of those debates are by now pretty much settled. In particular:
- Yes, interactive computer response is crucial.
- Into the 1980s, many apps were batch-only. Demand for such apps dried up.
- Business intelligence should occur at interactive speeds, which is a major reason that there’s a market for high-performance analytic RDBMS.
- Theoretical arguments about “true” real-time vs. near-real-time are often pointless.
  - What matters in most cases is human users’ perceptions of speed.
  - Most of the exceptions to that rule occur when machines race other machines, for example in automated bidding (high frequency trading or otherwise) or in network security.
A big issue that does remain open is: How fresh does data need to be? My preferred summary answer is: As fresh as is needed to support the best decision-making. I think that formulation starts with several advantages:
- It respects the obvious point that different use cases require different levels of data freshness.
- It cautions against people who think they need fresh information but aren’t in a position to use it. (Such users have driven much bogus “real-time” demand in the past.)
- It covers cases of both human and automated decision-making.
Straightforward applications of this principle include:
- In “buying race” situations such as high-frequency trading, data needs to be as fresh as the other guy’s, and preferably even fresher.
- Supply-chain systems generally need data that’s fresh to within a few hours; in some cases, sub-hour freshness is needed.
- The same standard (within a few hours, occasionally sub-hour) works for many desktop business intelligence scenarios as well.
- Equipment-monitoring systems’ need for data freshness depends on how quickly catastrophic or cascading failures can occur or be averted.
  - Different specific cases call for wildly different levels of data freshness.
  - When equipment is well-instrumented with sensors, freshness requirements can be easy to meet.
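To make "as fresh as is needed" concrete, here is a minimal sketch of a per-use-case freshness check. The use-case names, the `FRESHNESS_BUDGET` table, and every number in it are illustrative assumptions of mine, not recommendations; the point is only that each use case carries its own budget.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness budgets. The specific numbers are assumptions
# chosen for illustration; real values depend on the decision being supported.
FRESHNESS_BUDGET = {
    "supply_chain": timedelta(hours=4),
    "desktop_bi": timedelta(hours=4),
    "equipment_monitoring": timedelta(seconds=30),
}

def is_fresh_enough(use_case, last_updated, now=None):
    """Return True if the data's age is within the budget for this use case."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_updated <= FRESHNESS_BUDGET[use_case]

# The same one-hour-old data passes one check and fails another.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
loaded_at = now - timedelta(hours=1)
print(is_fresh_enough("supply_chain", loaded_at, now))
print(is_fresh_enough("equipment_monitoring", loaded_at, now))
```

A check like this is most useful when it gates a decision (or at least raises a warning) rather than just feeding a dashboard.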
E-commerce and other internet interaction scenarios can be more complicated, but it seems safe to say:
- Recommenders/personalizers should take into account information from the current session.
- Sites should try very hard to give customers correct information about merchandise availability and pricing.
In meeting freshness requirements, multiple technical challenges can come into play.
- Traditional batch aggregation is too slow for some analytic needs. That’s a core reason for having an analytic RDBMS.
- Traditional data integration/movement pipelines can also be too slow. That’s a basis for short-request-capable data stores to also capture some analytic workloads. E.g., this is central to MemSQL’s pitch, and to some NoSQL applications as well.
- Scoring models at interactive speeds is often easy. Retraining them quickly is much harder, and at this point only rarely done.
- OLTP (OnLine Transaction Processing) guarantees adequate data freshness …
- … except in scenarios where the transactions themselves are too slow. Questionably-consistent systems — commonly NoSQL — can usually meet performance requirements, but might have issues with the freshness of accurate data.
- Older generations of streaming technology disappointed. The current generation is still maturing.
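The scoring-versus-retraining distinction can be sketched in a few lines. Below, a model with fixed weights (imagine they came from an offline batch training run) is scored on features that include current-session activity. The feature names, weights, and logistic form are all hypothetical, picked only to show that fresh inputs can change the score without any retraining.

```python
import math

# Hypothetical weights from an offline (batch) training run; nothing here
# is retrained at request time.
WEIGHTS = {"past_purchases": 0.3, "session_views_in_category": 0.8}
BIAS = -2.0

def score(features):
    """Logistic score in (0, 1); features missing from the input count as 0."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

# Same customer, same model: only the freshness of the input differs.
stale_features = {"past_purchases": 2}
fresh_features = {"past_purchases": 2, "session_views_in_category": 3}

print(round(score(stale_features), 3))
print(round(score(fresh_features), 3))
```

Scoring is a cheap arithmetic pass over whatever features are at hand, so the hard part of "fresh scoring" is usually getting session data to the scorer in time, not the model itself.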
Based on all that, what technology investments should you be making, in order to meet “real-time” needs? My answers start:
- Customer communications, online or telephonic as the case may be, should be based on accurate data. In particular:
  - If your OLTP data is somehow siloed away from your phone support data, fix that immediately, if not sooner. (Fixing it 5-15 years ago would be ideal.)
  - If your eventual consistency is so eventual that customers notice, fix it ASAP.
- If you invest in predictive analytics/machine learning to support your recommenders/personalizers, then your models should at least be scored on fresh data.
  - If your models don’t support that, reformulate them.
  - If your data pipeline doesn’t support that, rebuild it.
  - Actual high-speed retraining of models isn’t an immediate need. But if you’re going to have to transition to that anyway, consider doing so early and getting it over with.
- Your BI should have great drilldown and exploration. Find the most active users of such functionality in your enterprise, even if — especially if! — they built some kind of departmental analytic system outside the enterprise mainstream. Ask them what, if anything, they need that they don’t have. Respond accordingly.
- Whatever expensive and complex equipment you have, slather it with sensors. Spend a bit of research effort on seeing whether the resulting sensor logs can be made useful.
  - Please note that this applies to vehicles and fixed objects (e.g. buildings, pipelines) as well as to traditional industrial machinery.
  - It also applies to any products you make which draw electric power.
So yes — I think “real-time” has finally become pretty real.