The highest-profile applications for complex event/stream processing are probably the ones that require super-low latency, especially in financial trading. However, as I already noted in writing about StreamBase and Truviso, there are plenty of other CEP apps with less extreme latency requirements.
Commonly, these are data reduction apps – i.e., there’s a gushing stream of inputs, and the CEP engine filters and “enhances” it, so that only a small, modified subset is sent forward. In other cases, disk-based systems could do the job perfectly well from a performance standpoint, but the pattern matching and filtering requirements are just a better fit for the CEP paradigm.
For example, StreamBase, Apama, and Coral8 each have some degree of activity in text filtering.
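To make the data reduction idea concrete, here is a minimal sketch in plain Python rather than in any vendor's CEP language. The event fields (`sensor`, `reading`), the `REGION_BY_SENSOR` lookup table, and the threshold are all hypothetical, chosen only to show a stream being filtered and "enhanced" so that a small, modified subset moves forward.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical reference data used to "enhance" the events that pass the filter.
REGION_BY_SENSOR = {"s1": "north", "s2": "south"}


def reduce_stream(events: Iterable[Dict], threshold: float) -> Iterator[Dict]:
    """Forward only events whose reading exceeds the threshold, adding a region field."""
    for event in events:
        if event["reading"] <= threshold:
            continue  # filtered out: this event is never sent forward
        enriched = dict(event)  # copy, so the input event is left unmodified
        enriched["region"] = REGION_BY_SENSOR.get(event["sensor"], "unknown")
        yield enriched


# Made-up sample input: four events in, two events out.
sample = [
    {"sensor": "s1", "reading": 3.0},
    {"sensor": "s2", "reading": 9.5},
    {"sensor": "s3", "reading": 12.0},
    {"sensor": "s1", "reading": 1.2},
]
forwarded = list(reduce_stream(sample, threshold=5.0))
```

A real engine would express the same thing declaratively and run it continuously over an unbounded stream; the generator here only mimics that one-event-at-a-time shape.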
Also, I have a little more detail beyond what I already wrote about some of Coral8's other applications.
1. One disclosed network security application is by Solutionary. Solutionary is a managed service provider that correlates different kinds of threat markers in real time to try to separate innocent unusual events from actual bad ones. Many of these are fairly slow (log-ins to unusual services, log-ins from scary countries, multiple log-in attempts to the same account); others presumably are at network speed; and the whole thing is definitely a pattern-matching exercise with time windows playing a core role.
2. One large Coral8 customer manages tens of thousands of trucks all over the United States. The main application is to notice promptly if a truck isn’t where it’s supposed to be, by matching GPS tracking against schedules. But there also are other sensors; e.g., in refrigerator trucks temperature is monitored every minute.
Mark notes that this sensor data is natively in XML format, without ever having been spawned from a relational database. In general, he points out, if data comes from other systems, it’s likely to be in XML format. To which I add: If the data comes from any kind of inherently real-time system, then almost by definition it’s unlikely to have come from an RDBMS. Ditto if it comes from any kind of sensor network.
3. Speaking of sensor data, IBM has announced that Coral8 is being bundled into its WebSphere RFID Information Center v1.1.
4. Mark and I talked briefly about fraud detection and prevention. He pointed out that fraud applications are all about events having happened in a certain order. I see that as basically true, albeit slightly exaggerated in that it depends a lot on the kind of fraud (e.g., less true in insurance fraud than some other cases).
However, the question arises: For what kinds of fraud prevention is overnight analysis (for example) not sufficient? One example Mark came up with is bad cashier behavior in huge casino restaurants, which apparently is hard to document after the fact because of how much it relies on the physical cash drawer. Beyond that – well, we didn’t cover a lot more detail, but a couple of areas subsequently occurred to me. First, there are a lot of cases where it might be much more comfortable for a business to block an online or telephone transaction in the first place than to reverse it later. Second, any case where credit authorizations are made on behalf of bricks-and-mortar retailers has to be handled either in real time or not at all.
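Going back to the time-window pattern matching in item 1, the "multiple log-in attempts to the same account" marker can be sketched as a sliding window per account. This is an illustration in plain Python, not Solutionary's or Coral8's actual logic; the function name, the `(timestamp, account)` event shape, and the window and threshold values are all assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple


def repeated_login_alerts(
    events: Iterable[Tuple[float, str]],
    window_seconds: float,
    max_attempts: int,
) -> Iterator[Tuple[float, str]]:
    """Yield (timestamp, account) whenever an account exceeds max_attempts
    log-in attempts within the trailing window_seconds.

    Assumes events arrive ordered by timestamp (a hypothetical simplification).
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    for timestamp, account in events:
        attempts = recent[account]
        attempts.append(timestamp)
        # Expire attempts that have fallen out of the time window.
        while attempts and timestamp - attempts[0] > window_seconds:
            attempts.popleft()
        if len(attempts) > max_attempts:
            yield (timestamp, account)


# Made-up sample: four attempts on "alice" within 60 seconds, then one much later.
events = [
    (0.0, "alice"),
    (10.0, "alice"),
    (15.0, "bob"),
    (20.0, "alice"),
    (25.0, "alice"),
    (200.0, "alice"),
]
alerts = list(repeated_login_alerts(events, window_seconds=60.0, max_attempts=3))
```

The point of the window is that state stays small: only attempts recent enough to matter are held in memory, which is part of why this kind of check suits a stream engine better than an overnight batch job.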