One of the less popular category names I deal with is “Complex Event Processing (CEP)”. The word “complex” looks weird, and many are unsure about the “event processing” part as well. CEP does have one virtue as a name, however — it’s concise.
The other main alternative is to base the name on “stream processing” instead.* The CEP-or-whatever industry is split between these choices, with StreamBase currently favoring “CEP” (despite its company name), IBM emphatically favoring “stream”, and Sybase seemingly trying to have things both ways.
*And then, of course, there is “event stream processing”, regarding which please see below.
I’ve been juggling this terminological divide myself, referring to complex event/stream processing as long as four years ago. But enough is enough. I’d like to write more about the category without repeatedly apologizing for its name. And so, always bearing in mind Monash’s Third Law of Commercial Semantics, here’s where I’m coming down.
The more I think about it, the less I like the term “event processing”. Here’s why. Events happen; data is produced; CEP systems most commonly try to identify and categorize the events based on the data. The CEP systems may then do significant further processing, but more often they just pass the information on to another system (most commonly either a persistent DBMS or “real-time” business intelligence). How much of that is really “event processing”? Relatively little, I’d say. And referring specifically to “complex” events doesn’t address my complaints at all.
So I’d like to go with some version of “stream”. But “stream processing” has other computer-related uses, while “stream management” commonly describes care and planning for small waterways. So “stream” might do best with a modifier, such as “event” or “data”. Of the two, I prefer “data stream” (or “datastream”) to “event stream”; the events aren’t really streaming, but the data is.
So should it be “data stream processing” or “data stream management”? Well, the only one of numerous Wikipedia definitions I’ve actually liked while researching this post is the one for “Data Stream Management System”:
A Data Stream Management System (DSMS) is a set of computer programs that controls the maintenance and querying of data in data streams. The use of a DSMS to manage a data stream is roughly analogous to the use of a Database Management System (DBMS) to manage a conventional database.
A key feature of a DSMS is the ability to execute a continuous query against a data stream. A conventional database query executes once and returns a set of results for a given point in time. In contrast, a continuous query continues to execute over time, as new data enters the stream. The results of the continuous query are updated as new data appears.
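To make the quoted distinction concrete, here is a minimal Python sketch of a one-shot query versus a continuous one. It is only an illustration: the function names (`one_shot_average`, `continuous_average`) and the sample values are made up for this post and do not correspond to any particular DSMS or CEP product.

```python
from typing import Iterable, Iterator, List


def one_shot_average(rows: List[float]) -> float:
    """Conventional query: runs once over data that is already stored."""
    return sum(rows) / len(rows)


def continuous_average(stream: Iterable[float]) -> Iterator[float]:
    """Continuous query: emits an updated result each time new data arrives."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


# Hypothetical sample data standing in for a table and for a live stream.
stored = [10.0, 20.0, 30.0]

snapshot = one_shot_average(stored)                 # a single answer
updates = list(continuous_average(iter(stored)))    # one answer per arrival
```

The conventional query returns one result for the data as it stood when the query ran. The continuous version keeps producing a fresh result as each new value shows up, which is the behavior the definition describes.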
I think the data stream/database management analogy is spot on. Your queries work a little differently, but otherwise you’re doing pretty much the same things. Indeed, you’re probably even going to persistently store some of the data, and ideally that DBMS capability would be tightly integrated into your CEP system. (In practice they’re apt to be more loosely coupled; for most purposes that works well enough.) Query execution, data ingestion, performance monitoring/tuning, workload prioritization: it’s all very DBMS-like stuff. And by the way, “data stream management system” is the term that was used by the researchers (Mike Stonebraker, Stan Zdonik, Dan Abadi, et al.) who wrote a paper describing the project on which StreamBase was based … although some might question whether that particular observation is a strong signal of accuracy.
This reasoning suggests “Data Stream Management System” is what it should be. The usual kinds of abbreviation, such as datastream (product), datastream manager, or DSMS, would no doubt follow. So should it be “Data Stream”, “Datastream”, or “Data-stream”? At that level of detail, I don’t yet have an opinion.
The only thing is — that’s all pretty wordy compared to CEP. So after all this, I’m still not sure which term(s) I prefer.
What are your thoughts?