Microsoft still hasn’t worked out all the kinks regarding when and how intensely to brief me. So most of what I know about their announcement earlier this week of a CEP/stream processing product* is what I garnered on a consulting call in March. That said, I sent Microsoft my notes from that call, they responded quickly and clearly to my question as to what remained under NDA, and for good measure they included a couple of clarifying comments that I’ll copy below.
*“In the SQL Server 2008 R2 timeframe,” about which Microsoft wrote: “The first Community Technology Preview (CTP) of SQL Server 2008 R2 will be available for download in the second half of 2009 and the release is on track to ship in the first half of calendar year 2010.”
Perhaps it is more than coincidence that IBM rushed out its own announcement of an immature CEP technology — due to be more mature in a 2010 release — immediately after Microsoft revealed its plans. Anyhow, taken together, these announcements support my theory that the small independent CEP/stream processing vendors are more or less ceding broad parts of the potential stream processing market.
The main use cases Microsoft talks about for CEP are in the area of sensor data. For example, Microsoft has prospects or customers who each operate many pieces of manufacturing or resource-extraction equipment. Each device may spawn only a few messages per second, but in aggregate there can be thousands of messages/second, or indeed terabytes of data/day. (The orders of magnitude don’t quite match up there, but we were speaking pretty vaguely anyway.)
Microsoft called out to me four reasons why CEP might be needed in addition to ordinary database processing. Two are the standard reasons for data reduction:
1. Without CEP, you can’t bang the data into the database fast enough.
2. You don’t want to keep most of the data past a short time window anyway.
The other two are also fairly standard reasons for using CEP:
3. Standard SQL isn’t all that great for time series anyway.
4. CEP use cases often call for incremental processing and/or parameterization of queries, something CEP engines are commonly better designed for than DBMS are.
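To make reason #4 concrete, here is a minimal sketch (mine, not Microsoft’s — class and variable names are purely illustrative) of incremental window processing in Python: each arriving event updates a running aggregate in place, instead of re-running a full SQL query over all the stored rows.

```python
from collections import deque

class SlidingWindowAverage:
    """Incrementally maintains the average of readings in a time window.

    Each new event updates a running sum in O(1) amortized time,
    rather than re-scanning all stored rows the way a repeated
    SQL query over a table would.
    """
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()   # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        self.total += value
        # Expire events that have fallen out of the time window.
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old_val = self.events.popleft()
            self.total -= old_val

    def average(self):
        return self.total / len(self.events) if self.events else None

# A sensor emitting one reading per second; query the last 3 seconds.
avg = SlidingWindowAverage(window_seconds=3)
for ts, val in [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)]:
    avg.add(ts, val)
print(avg.average())  # readings at t=2, 3, 4 remain -> 30.0
```

The window length here is a parameter, which also hints at reason #4’s “parameterization of queries”: the same standing computation can be instantiated with different window sizes without rewriting it.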
However, Microsoft seems to be taking a somewhat different approach to time-based SQL extensions than some other vendors. To quote email Microsoft sent today:
Microsoft Research (MSR) introduced the temporal extensions to relational algebra based upon a notion of application time that is independent of system time. It matters when events originated instead of when they arrived at the processing system. Further, it treats each event as being associated with an interval of time as opposed to a point in time. This helps in modeling certain real-life phenomena naturally. [StreamBase et al.] also reason about multiple streams. Both approaches are extensions to relational algebra. The MSR approach took the algebra as the starting point, while StreamBase took an existing language over the algebra – SQL – as the starting point. The MSR approach consequently avoids having to rework other elements of the SQL surface. The primary language extension through which this algebra will be exposed initially is LINQ.
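Microsoft hasn’t shown me the actual LINQ surface, but the application-time, interval-based event model the email describes can be sketched in a few lines of Python (the event names and data below are mine, purely illustrative): each event carries its own validity interval, and queries are answered in application time regardless of the order in which events arrived.

```python
from dataclasses import dataclass

@dataclass
class Event:
    payload: str
    app_start: float  # when the event began, in application time
    app_end: float    # when it ceased to be valid

    def valid_at(self, t):
        # Interval semantics: the event "exists" throughout [app_start, app_end).
        return self.app_start <= t < self.app_end

# Events may arrive in any order; application time, not arrival
# time, determines the query results.
arrived = [
    Event("pump-3 overheating", app_start=10.0, app_end=25.0),
    Event("valve-7 open",       app_start=5.0,  app_end=12.0),  # arrived late
    Event("line-2 idle",        app_start=20.0, app_end=30.0),
]

def active_events(events, t):
    """Relational-style selection: which events were valid at application time t?"""
    return [e.payload for e in events if e.valid_at(t)]

print(active_events(arrived, 11.0))  # ['pump-3 overheating', 'valve-7 open']
print(active_events(arrived, 22.0))  # ['pump-3 overheating', 'line-2 idle']
```

Note that the late-arriving “valve-7 open” event still shows up correctly in the query at application time 11.0 — exactly the property the email attributes to application time being independent of system time.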
Microsoft’s CEP capability is “fully integrated” with Visual Studio. There will be lots of adapters, both for inputs and outputs, with perhaps the most interesting non-obvious one being Excel charts. I definitely like the idea of CEP engines doing a good job of integrating both with dashboards/BI and with operational apps, because if you get value from one of those integrations, you’re apt to quickly want the other as well.
Microsoft told me that its CEP is written in a combination of “managed” and “native” code, where “managed” code, in Microsoft lingo, is more an issue of memory management than anything else. Noticing that I was confused on this point, Microsoft elucidated by email:
The implementation is built around getting the technologies available in the CLR and native code together to build the best implementation possible. We use the ability to do JIT code generation to efficiently evaluate expressions and back it up with very effective native memory management techniques.