For a variety of reasons, I don’t plan to post my complete Enzee Universe keynote slide deck soon, if ever. But perhaps one or more of its subjects are worth spinning out in their own blog posts.
I’m going to start with analytic speed or, equivalently, analytic latency. There is, obviously, a huge industry emphasis on speed. Indeed, there’s so much emphasis that confusion often ensues. My goal in this post is not really to resolve the confusion; that would be ambitious to the max. But I’m at least trying to call attention to it, so that we can all be more careful in our discussions going forward, and perhaps contribute to a framework for those discussions as well.
Key points include:
1. There are two important senses of “latency” in analytics. One is just query response time. The other is the length of the interval between when data is captured and when it is available for analytic purposes. They’re often conflated, and indeed I shall conflate them for the remainder of this post.
2. There are many different kinds of analytic speed, which to a large extent can be viewed separately. Major areas include:
- Data exploration. In-memory OLAP is a huge trend, and QlikView is a hot BI product line.
- Budgeting/planning. In an unprecedentedly frightening economy, annual planning/forecasting cycles may well be too slow.
- Operational integration. This is probably the biggest current area of mission-critical IT advancement. Not coincidentally, it is also the mainstay of the most expensive and complex data warehousing technologies. It’s also an ongoing area of application for event/stream processing, aka CEP.
- General or deep analytics. This is what I seem to spend much of my time writing about — data warehousing price/performance, parallelized data mining, and much more.
- Data administration. Ease of data mart spin-out and administration is becoming a major concern. And of course analytic appliance and DBMS vendors have been telling ease-of-deployment, low-DBA-involvement kinds of stories at least since Netezza first came to market.
There certainly are relationships among those; e.g., a really great analytic DBMS could help speed up any and all of the last three categories. But when assessing your needs, you can go quite far viewing each of those areas separately.
3. It is indeed important to carefully assess your need for speed. Acceptable levels of analytic latency vary widely, ranging from sub-millisecond to multi-month. To illustrate, I’ve put together a list:
- Algorithmic trading – Sub-millisecond. Increasingly, that’s what’s needed, at least for query response.
- Web page – Tenths of seconds. If you want to serve up a complex web page in 2 seconds or less, you may require sub-second response time for your queries. (E.g., this is a key message from Teradata’s customer success story at Travelocity.)
- Call center – Seconds. If two humans are talking to each other on the phone, a couple-second delay in response is probably acceptable.
- Transportation – Tens of minutes. If a commercial flight is delayed, reaction to minimize the consequences often needs to be sub-hour. The same can be true for cargo transportation (truck, rail, or air). In other cases, a couple of hours may be fast enough.
- Inventory – Hours. In the 1980s, the retailers that won were the ones who reordered hot seasonal merchandise a couple of days before their competitors. Even then, 7-11 Japan was making restocking decisions several times a day. Things have only gotten faster since.
- Planning – Weeks or more. Planning is often done on an annual or even multi-year cycle. That may be excessively slow. But weeks or months? In many cases, that’s both the best achievable and plenty good enough.
That’s a range of at least 9 orders of magnitude, which is a lot like the difference between the speed of a turtle and the speed of light.
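That back-of-the-envelope claim can be checked with a bit of arithmetic. The specific figures below (one millisecond for trading, two months for planning, a 0.3 m/s turtle) are illustrative assumptions of mine, not numbers from the post:

```python
import math

# Assumed endpoints of the latency range
trading_latency_s = 1e-3               # algorithmic trading: ~1 millisecond
planning_latency_s = 60 * 24 * 3600.0  # planning: ~2 months, expressed in seconds

latency_orders = math.log10(planning_latency_s / trading_latency_s)
print(f"latency span: ~{latency_orders:.1f} orders of magnitude")

# For comparison: a turtle walks at roughly 0.3 m/s; light travels at ~3e8 m/s
speed_orders = math.log10(3e8 / 0.3)
print(f"turtle vs. light: ~{speed_orders:.1f} orders of magnitude")
```

Both gaps come out to roughly nine to ten orders of magnitude, so the analogy holds up.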