Last week, I complained that my first briefing with Coral8 wasn’t very technical. On Wednesday I had a call with Mark Tsimelzon, CTO and founder of Coral8, and he made up for that in spades. In this post I’ll cover some of his general comments. Later posts will touch on more Coral8-specific topics and on his view of the Coral8/StreamBase comparison.
As Mark describes it, the big difference between a DBMS – even an in-memory DBMS – and a complex event processing engine is this: CEP engines do instantaneous incremental processing. He commonly refers to this as registering queries and operators for incremental evaluation. For example, suppose you need to maintain the sum of some data stream over the past 10 minutes. Then each second (or other short unit of time), the system adds in all the values that arrived in the past second, and subtracts all those that arrived 600-601 seconds ago. Voila! The sum is incrementally updated.
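To make the rolling-sum idea concrete, here is a minimal sketch in Python. It is purely illustrative and says nothing about how Coral8 is actually built; the class name `RollingSum` and its methods are my own inventions, and for simplicity it expires old values whenever a new one arrives rather than on a one-second tick.

```python
from collections import deque


class RollingSum:
    """Hypothetical sketch: a sum over the trailing `window_seconds` of a stream."""

    def __init__(self, window_seconds=600):
        self.window = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0

    def add(self, timestamp, value):
        # Add the new value, then subtract whatever has aged out of the window.
        self.events.append((timestamp, value))
        self.total += value
        self._expire(timestamp)
        return self.total

    def _expire(self, now):
        # Anything that arrived window_seconds or more ago no longer counts.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_value = self.events.popleft()
            self.total -= old_value


rolling = RollingSum(window_seconds=600)
rolling.add(0, 5)
rolling.add(300, 3)
rolling.add(600, 2)  # the value from t=0 drops out here
```

The point is that each update costs work proportional to the values arriving and leaving, not to the size of the whole window, which is never re-scanned.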
Now, rolling sums may not sound very interesting – but where you have rolling sums, you trivially also have rolling averages (just divide the sum by the count) and rolling standard deviations (same idea, with some squares and square roots mixed in). Those, of course, are primitives in Coral8 too. Ditto rolling maxima and minima. Ditto rolling joins (which are updated a lot like materialized views).
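Here is a rough sketch of how the average and standard deviation fall out of the same bookkeeping: keep a rolling count, sum, and sum of squares, and derive the statistics on demand. Again, `RollingStats` is a hypothetical name of mine, not a Coral8 primitive, and a production engine would likely use a more numerically careful formula than the sum-of-squares shortcut shown here.

```python
import math
from collections import deque


class RollingStats:
    """Hypothetical sketch: rolling mean and (population) standard deviation."""

    def __init__(self, window_seconds=600):
        self.window = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        self.count += 1
        self.total += value
        self.total_sq += value * value
        # Remove values that have aged out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old = self.events.popleft()
            self.count -= 1
            self.total -= old
            self.total_sq -= old * old

    def mean(self):
        return self.total / self.count if self.count else None

    def stddev(self):
        if not self.count:
            return None
        m = self.total / self.count
        # Guard against tiny negative values caused by floating-point rounding.
        variance = max(self.total_sq / self.count - m * m, 0.0)
        return math.sqrt(variance)


stats = RollingStats(window_seconds=600)
for t, v in enumerate([2, 4, 4, 4, 5, 5, 7, 9]):
    stats.add(t, v)
print(stats.mean(), stats.stddev())
```

Rolling maxima and minima need a different trick, since you can't "subtract" an expired maximum; that is one reason they have to be primitives in their own right.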
As for what happens when actual complex patterns are tracked – well, Mark’s take didn’t sound too different from what I heard from John Bates of Apama. The core task is to take the thousands of patterns you’re attempting to match and arrange them in a tree that can be evaluated as efficiently as possible. The difficulty is that the patterns are constantly gaining and losing partial matches, over short periods of time, so that the optimal tree structure may change greatly from one second to the next.
One subject I had a bit of trouble nailing down is: What kinds of data structures are used in Coral8 or other CEP systems? In fact, many different kinds of data structures are used for different parts of the system. In particular, different data structures are used to execute different language primitives. That said, one structure that tends to show up in Coral8 is a kind of B-tree, albeit with very different balancing algorithms from those of a conventional DBMS, because of the very high rate of what amount to inserts and deletes. Mark also mentioned hash tables as having a role. Other vendors have confirmed to me they have structures akin to columns/bitmaps. And as already noted, tree structures play an important role as well.
At one point – about 1 ½ hours into the conversation, if memory serves – I asked Mark to sum up some of the differences between conventional DBMS and CEP engines like Coral8. He offered three specific areas:
1. Memory and buffer management. Both kinds of systems have these issues, of course, but the optimizations involved are very different. Indeed, CEP engines have fewer different tasks to do than full-fledged DBMS, so in some ways the challenge is simpler. However, he noted that Coral8 has to have a strong caching capability, so that it can federate in data from conventional disk-centric DBMS.
2. Scheduling. Conventional DBMS are driven by queries and updates. However, CEP systems are driven by the stream of incoming data.
3. Query representation. In a conventional DBMS, a query execution plan is a series of steps that (loops aside) are to be executed once each. In a CEP system, there’s a joint execution plan for many queries. And each step of the joint plan is (potentially) executed anew many times per second, as each new message comes in.
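Points 2 and 3 can be illustrated with a toy sketch, which is entirely my own construction rather than anything Mark described in detail. In it, queries register against named steps, a step shared by several queries is evaluated once per incoming message, and nothing runs until a message arrives. The names `Engine`, `register`, and `on_message` are hypothetical.

```python
from collections import defaultdict


class Engine:
    """Hypothetical sketch of a message-driven engine with a shared plan."""

    def __init__(self):
        self.shared_steps = {}                # step name -> step function
        self.subscribers = defaultdict(list)  # step name -> downstream callbacks

    def register(self, step_name, step_fn, on_result):
        # The first query to name a step installs it; later queries reuse it.
        self.shared_steps.setdefault(step_name, step_fn)
        self.subscribers[step_name].append(on_result)

    def on_message(self, msg):
        # Scheduling is driven by arriving data: each step runs once per
        # message, and its result fans out to every query that shares it.
        for name, step_fn in self.shared_steps.items():
            result = step_fn(msg)
            if result is not None:
                for callback in self.subscribers[name]:
                    callback(result)


def large_trade(msg):
    # Shared filter step: pass through only trades of 100 units or more.
    return msg if msg["size"] >= 100 else None


counts = {"n": 0, "volume": 0}


def count_trade(msg):
    counts["n"] += 1


def sum_volume(msg):
    counts["volume"] += msg["size"]


engine = Engine()
engine.register("large_trade", large_trade, count_trade)  # query 1
engine.register("large_trade", large_trade, sum_volume)   # query 2
for size in (50, 150, 200):
    engine.on_message({"size": size})
print(counts)
```

A real CEP engine would of course compile its query language into a much richer operator graph; the sketch only shows the shape of the idea, with one plan, many queries, and re-execution on every message.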