My recent non-technical Apama briefing has now had a much more technical sequel, with charming founder and former Cambridge professor John Bates. He still didn’t fully open the kimono (trade secrets and all that), but here’s the essence of what’s going on.
Complex event/stream processing (CEP) is all about looking for many patterns at once. Reality, i.e. the stream(s) of data, is checked against these patterns for matches. In Apama, these patterns are kept in a kind of tree, which they call a hypertree, and John says the work to check an incoming event against them is only logarithmic in the number of patterns.
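To make the "logarithmic in the number of patterns" claim concrete, here is a minimal sketch of a pattern index. It is not Apama's hypertree, whose internals weren't disclosed; it assumes a deliberately simple, hypothetical pattern form ("price > threshold") and uses a sorted list with binary search. The class and method names are invented for illustration.

```python
import bisect


class ThresholdIndex:
    """Hypothetical index over patterns of the form 'price > threshold'.

    This is an illustration only, not Apama's hypertree. Thresholds are
    kept sorted so a lookup is a binary search rather than a scan of
    every registered pattern.
    """

    def __init__(self):
        self._thresholds = []  # sorted ascending
        self._names = []       # pattern names, parallel to _thresholds

    def add(self, name, threshold):
        # Find the sorted position, then insert in both parallel lists.
        # (List insertion itself is linear; only lookup is logarithmic.)
        i = bisect.bisect_left(self._thresholds, threshold)
        self._thresholds.insert(i, threshold)
        self._names.insert(i, name)

    def matches(self, price):
        # Every threshold strictly below `price` is a matched pattern.
        # The search is O(log n); returning the matches costs their count.
        cut = bisect.bisect_left(self._thresholds, price)
        return self._names[:cut]


index = ThresholdIndex()
index.add("low", 10)
index.add("high", 100)
index.add("mid", 50)
print(index.matches(60))
```

The point is only that a well-organized index lets one event be tested against many patterns without visiting each of them; real CEP patterns have many more dimensions than a single threshold.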
Since patterns commonly have multiple parts, and usually also take time to unfold, what really goes on is that partial matches are found, after which what’s being matched against is the REMAINDER of the pattern. Thus, there’s constant pruning and rebalancing of the tree. What’s more, a large fraction of all patterns, at least in the financial trading market, involve a short time window, which again creates a need for ongoing, rapid tree modification.
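The partial-match idea can be sketched in a few lines. The following is an assumption-laden toy, not Apama's implementation: it tracks a single sequential pattern, stores partial matches in a plain list rather than a tree, and treats events as bare type labels. `PartialMatch` and `SequenceMatcher` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PartialMatch:
    remaining: tuple  # event types still to be seen, in order
    deadline: float   # time after which this partial match is discarded


class SequenceMatcher:
    """Toy matcher for one sequential pattern within a time window."""

    def __init__(self, pattern, window):
        self.pattern = tuple(pattern)
        self.window = window
        self.partials = []  # a flat list here; a real engine would index these

    def feed(self, event_type, now):
        """Process one event; return how many complete matches it finishes."""
        # Prune partial matches whose time window has run out.
        live = [p for p in self.partials if p.deadline >= now]
        completed = 0
        survivors = []
        for p in live:
            if p.remaining[0] == event_type:
                rest = p.remaining[1:]
                if rest:
                    # Keep matching against the REMAINDER of the pattern.
                    survivors.append(PartialMatch(rest, p.deadline))
                else:
                    completed += 1
            else:
                survivors.append(p)
        # The event may also start a fresh partial match.
        if self.pattern[0] == event_type:
            rest = self.pattern[1:]
            if rest:
                survivors.append(PartialMatch(rest, now + self.window))
            else:
                completed += 1
        self.partials = survivors
        return completed


matcher = SequenceMatcher(["A", "B"], window=5.0)
matcher.feed("A", now=0.0)
print(matcher.feed("B", now=3.0))
```

Every event both prunes expired partial matches and advances or creates others, which is why short time windows force the underlying structure to change continually.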
The basic development interface/paradigm to Apama is the rule/frame:
when pattern-is-matched, then do-action-1, do-action-2, …
An action can be a normal program operation step, or it can be to enter new, transformed information into the stream for other patterns to check against.
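As a rough illustration of the rule/frame paradigm, here is a small sketch in Python rather than MonitorScript. Everything in it (the `Engine` class, the `when` method, the dictionary-shaped events) is hypothetical; the only thing taken from the briefing is the shape "when pattern, then actions", where an action may feed a new, derived event back into the stream.

```python
from collections import deque


class Engine:
    """Toy 'when pattern-is-matched, then do-actions' dispatcher."""

    def __init__(self):
        self.rules = []  # list of (predicate, actions) pairs

    def when(self, predicate, *actions):
        # Register a rule: a pattern (predicate) plus one or more actions.
        self.rules.append((predicate, actions))

    def run(self, events):
        queue = deque(events)
        while queue:
            event = queue.popleft()
            for predicate, actions in self.rules:
                if predicate(event):
                    for action in actions:
                        # An action returning an event re-enters the stream,
                        # so other rules can match against it.
                        derived = action(event)
                        if derived is not None:
                            queue.append(derived)


alerts = []
engine = Engine()
# Rule 1: a large tick produces a derived "alert" event.
engine.when(
    lambda e: e["type"] == "tick" and e["price"] > 100,
    lambda e: {"type": "alert", "symbol": e["symbol"]},
)
# Rule 2: an ordinary program step that records alerts.
engine.when(
    lambda e: e["type"] == "alert",
    lambda e: alerts.append(e["symbol"]),
)
engine.run([
    {"type": "tick", "symbol": "XYZ", "price": 101.5},
    {"type": "tick", "symbol": "ABC", "price": 42.0},
])
print(alerts)
```

Note that this sketch checks every rule against every event, which is exactly the linear cost a pattern tree is meant to avoid.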
John asserts that, under the covers, Apama doesn’t look a lot like the classical models of rules engines or expert system shells, specifically RETE. That said, I suspect it’s fairer to view Apama’s approach as “improvement on RETE” rather than “unrelated to RETE.” If left-hand-sides of rules are arranged in a tree, and incoming facts modify the tree, there’s at least a whiff of RETE in the air.
Other highlights included:
- Apama is of course designed to tightly integrate the filtering/enhancement of data with the actions based on the data processing. John fondly thinks this is a significant performance benefit, by eliminating the overhead that rival systems have in their quasi-SQL calls. I wasn’t convinced.
- John also thinks that Apama can put a quasi-SQL interface on their system if the market demands it. I was convinced.
- Apama’s core programming language, MonitorScript, is OO-like but not really OO (i.e., it lacks inheritance). In the 1990s we called this “object-based” or “event-driven.” He calls it “event-oriented.” You can also write in Java; there’s a JVM built in.
- A geospatial datatype (which is to say, Cartesian coordinates) has been built in from the beginning. With Apama’s strong focus on financial trading, however, it has rarely if ever been used in a deployed system. Some prototypes are finally being built in telecom (location-based services) and logistics (trucking gets mentioned a lot in discussions with CEP vendors these days).
- Besides the developer studio/IDE, there’s a power-user facility called Scenario Modeler. It looks somewhat more formidable than what BI power users normally face, but in the same league. Certainly anybody who can be a SAS programmer should be able to cope with it as well.