Streaming and complex event processing (CEP)
Discussion of complex event processing (CEP), aka event processing or stream processing – i.e., of technology that executes queries before data is ever stored on disk. Related subjects include:
I have various subjects backed up that I don’t really want to write about at traditional blog-post length. Here are a few of them. Read more
|Categories: Analytic technologies, Columnar database management, MarkLogic, Open source, Oracle, Streaming and complex event processing (CEP), Structured documents, Theory and architecture, Vertica Systems||2 Comments|
I visited with SenSage on my two most recent trips to San Francisco. Both visits were, through no fault of SenSage’s, hasty. Still, I think I have enough of a handle on SenSage basics to be worth writing up.
General SenSage highlights include:
|Categories: Analytic technologies, Columnar database management, Data warehousing, Database compression, Log analysis, MapReduce, SenSage, Streaming and complex event processing (CEP), Telecommunications||3 Comments|
Eric Lai emailed today to ask what I thought about the NoSQL folks, and especially whether I thought their ideas were useful for enterprises in general, as opposed to just Web 2.0 companies. That was the first I heard of NoSQL, which seems to be a community discussing SQL alternatives popular among the cloud/big-web-company set, such as BigTable, Hadoop, Cassandra and so on. My short answers are:
- In most cases, no.
- Most of these technologies are designed for simple, high-volume OLTP (OnLine Transaction Processing.) Most large enterprises have an established way of doing OLTP, probably via relational database management systems. Why change?
- MapReduce is an exception, in that it’s designed for analytics. MapReduce may be useful for enterprises. But where it is, it probably should be integrated into an analytic DBMS.
- There’s one big countervailing factor to all these generalities — schema flexibility.
As for the longer form, let me start by noting that there are two main kinds of reason for not liking SQL. Read more
While performance may not be all that great a source of CEP competitive differentiation, event processing vendors find plenty of other bases for technological competition, including application development, analytics, packaged applications, and data integration. In particular:
- Most independent CEP vendors have some kind of application story in the capital markets vertical, such as packaged applications, ISV partners with packaged applications, application frameworks, and so on.
- CEP vendors offer lots of connectors to specific financial industry price/quote/trade feeds, as well as the usual other kinds of database connectivity (SQL, XML, etc.)
- Aleri/Coral8 (separately and now together) like to call attention to their business intelligence/analytics offerings. Analytics is front-and-center on Truviso’s web site too, not that Truviso does much to call attention to itself, period. (Roman Bukary once said he’d outline Truviso’s new strategy to me in 6-8 weeks or so … it’s now 14 months and counting.)
So far as I can tell, the areas of applications and analytics are fairly uncontroversial. Different CEP vendors have implemented different kinds of things, no doubt focusing on those they thought they would find easiest to build and then sell. But these seem to be choices in business execution, not in core technical philosophy.
In CEP application development, however, real philosophical differences do seem to arise. There are at least three different CEP application development paradigms: Read more
|Categories: Aleri and Coral8, Business intelligence, Microsoft and SQL*Server, Progress, Apama, and DataDirect, StreamBase, Streaming and complex event processing (CEP)||5 Comments|
I’ve been talking to CEP vendors on and off for a few years. So what I hear about performance is fairly patchwork. On the other hand, maybe 1-2+ year-old figures of per-core performance are still meaningful today. After all, Moore’s Law is being reflected more in core count than per-core performance, and it seems CEP vendors’ development efforts haven’t necessarily been concentrated on raw engine speed.
So anyway, what do you guys have to add to the following observations?
- Super-low-latency financial services industry tasks are often “embarrassingly parallel.” Thus, near-linear scale-out is common.
- That said, good parallelism seems fairly new in CEP engines (of course, CEP engines are fairly new themselves — for all I know, some have been parallel since inception).
- I’ve heard claims of up to 400,000 messages/second/core for simple queries or patterns.
- I’ve heard claims of 70,000 messages/core for not-so-simple queries or patterns, and probably higher than that depending on what the meaning of “simple” is.
- IBM just disclosed >15,000 messages/core on a pretty low-powered processor.
- I’ve heard that Coral8, Apama, and StreamBase rarely lost deals due to performance or throughput problems. I’ve heard that the same is not as true of Aleri.
- StreamBase proudly says it’s been fully multithreaded since academic research-project days. For Apama multithreading is evidently a more recent feature. But does it matter much?
|Categories: Aleri and Coral8, IBM and DB2, Memory-centric data management, Progress, Apama, and DataDirect, StreamBase, Streaming and complex event processing (CEP)||13 Comments|
After posting about IBM’s System S/InfoSphere Streams CEP offering, I sent three followup questions over to Jeff Jones. It seems simplest to just post the Q&A verbatim.
1. Just how many processors or cores does it take to get those 5 million messages/sec through? A little birdie says 4,000 cores. Read more
|Categories: Analytic technologies, IBM and DB2, Investment research and trading, Streaming and complex event processing (CEP)||7 Comments|
Microsoft still hasn’t worked out all the kinks regarding when and how intensely to brief me. So most of what I know about their announcement earlier this week of a CEP/stream processing product* is what I garnered on a consulting call in March. That said, I sent Microsoft my notes from that call, they responded quickly and clearly to my question as to what remained under NDA, and for good measure they included a couple of clarifying comments that I’ll copy below.
*”in the SQL Server 2008 R2 timeframe,” about which Microsoft wrote “the first Community Technology Preview (CTP) of SQL Server 2008 R2 will be available for download in the second half of 2009 and the release is on track to ship in the first half of calendar year 2010. “
Perhaps it is more than coincidence that IBM rushed out its own announcement of an immature CEP technology — due to be more mature in a 2010 release — immediately after Microsoft revealed its plans. Anyhow, taken together, these announcements support my theory that the small independent CEP/stream processing vendors are more or less ceding broad parts of the potential stream processing market.
The main use cases Microsoft talks about for CEP are in the area of sensor data. Read more
|Categories: Analytic technologies, Application areas, Microsoft and SQL*Server, Streaming and complex event processing (CEP)||8 Comments|
IBM has hastily announced System S Streams, a product that was supposed to be called InfoSphere Streams and introduced only in 2010. Apparently, the rush is because senior management wanted to talk about it later this week, and perhaps also because it was implicitly baked into some of IBM’s advertising already. Scrambling ensued. Even so, Jeff Jones and team got to me fast, and briefed me — fairly non-technically, unfortunately, but otherwise how I like it, namely on a harmless embargo and without any NDAs. That’s more than can be said for my clients at Microsoft, who also introduced CEP this week, but I digress …
*Indeed, as I draft this post-Celtics-game, the embargo is already expired.
Marketing aside, IBM System S/InfoSphere Streams is indeed a CEP/stream processing engine + language (with an Eclipse-based development environment). Apparently, IBM’s thinks InfoSphere Streams (if that’s what it winds up being renamed to) is or will be differentiated from other CEP packages in:
- Scale-out. (That’s the one that appears to be real today. In fact, there’s a prototype running on Blue Gene.)
- Support for complex datatypes such as XML, text, voice, video, etc.
- Security and general industrial-strengthness.
|Categories: Analytic technologies, Application areas, IBM and DB2, Investment research and trading, Scientific research, Streaming and complex event processing (CEP)||3 Comments|
My skeptical remarks on the Aleri/Coral8 merger generated some pushback. Today I actually got around to talking with John Morell, who was marketing chief at Coral8 and has remained with the combined company. First, some quick metrics:
- The combined Aleri has around 100 employees, 60-40 from Aleri vs. Coral8.
- The combined Aleri has around 80 customers. All of Aleri’s, with one sort-of exception at Banks.com, were in financial services. A large minority of Coral8’s were in financial services too.
- However, half of Aleri’s marketing spend going forward is budgeted outside the financial services markets. Not unreasonably, John presents this as a proof point Aleri is serious about selling to other markets.
- Aleri had 12-14 people in the UK pre-merger. Coral8 had none in Europe.
- Coral8 had 15 OEMs pre-merger, some actually generating revenue. Aleri had substantially none.
- Coral8 had been closing a “couple” of customers/quarter in online commerce. But recently, that rate ramped up to a “few.”
- Aleri’s engine is used to handle “many” hundreds of thousands of messages per second. Coral8’s highest-throughput user processes 100-150,000 messages/second.
John is sticking by the company line that there will be an integrated Aleri/Coral8 engine in around 12 months, with all the performance optimization of Aleri and flexibility of Coral8, that compiles and runs code from any of the development tools either Aleri or Coral8 now has. While this is a lot faster than, say, the Informix/Illustra or Oracle/IRI Express integrations, John insists that integrating CEP engines is a lot easier. We’ll see.
I focused most of the conversation on Aleri’s forthcoming efforts outside the financial services market. John sees these as being focused around Coral8’s old “Continuous (Business) Intelligence” message, enhanced by Aleri’s Live OLAP. Aleri Live OLAP is an in-memory OLAP engine, real-time/event-driven, fed by CEP. Queries can be submitted via ODBO/MDX today. XMLA is coming. John reports that quite a few Coral8 customers are interested in Live OLAP, and positions the capability as one Coral8 would have had to develop had the company remained independent. Read more
|Categories: Aleri and Coral8, Analytic technologies, Application areas, Games and virtual worlds, Investment research and trading, MOLAP, Streaming and complex event processing (CEP), Web analytics||4 Comments|
I’m not aware of anyone claiming “CEP is an alternative to a relational RDBMS” – except maybe as an application platform for processing events (where RDBMS could be seen as a square hole for an event round peg).
Actually, it’s hard to think of an application for off-the-shelf CEP where the alternative technologies aren’t:
- Custom CEP
- Just write it to an RDBMS and query it