IBM has hastily announced System S Streams, a product that was supposed to be called InfoSphere Streams and introduced only in 2010. Apparently, the rush is because senior management wanted to talk about it later this week, and perhaps also because it was implicitly baked into some of IBM’s advertising already. Scrambling ensued. Even so, Jeff Jones and team got to me fast, and briefed me, fairly non-technically, unfortunately, but otherwise how I like it, namely on a harmless embargo* and without any NDAs. That’s more than can be said for my clients at Microsoft, who also introduced CEP this week, but I digress …
*Indeed, as I draft this post-Celtics-game, the embargo has already expired.
Marketing aside, IBM System S/InfoSphere Streams is indeed a CEP/stream processing engine + language (with an Eclipse-based development environment). Apparently, IBM thinks InfoSphere Streams (if that’s what it winds up being renamed to) is or will be differentiated from other CEP packages in:
- Scale-out. (That’s the one that appears to be real today. In fact, there’s a prototype running on Blue Gene.)
- Support for complex datatypes such as XML, text, voice, video, etc.
- Security and general industrial-strengthness.
IBM Streams seems to be a six-year-old project, with about 15 installations, about half of them true sales. The original and main customer appears to be the US government, with allied governments also in the picture. This usage involves a lot of text/voice/video/whatever analysis, but the technology for that is government-developed; IBM’s own complex datatype technology is one of the features left out to achieve the earlier-than-planned product release. Besides governments, it seems there have been actual InfoSphere Streams sales to universities.
Besides those true sales of System S Streams, there are prototypes and — in a new-to-me phrase — “First of a Kind” projects, which I gather are IP-based services engagements. One is with TD Bank Financial — that’s the one on Blue Gene — handling 5 million messages per second, which is an order of magnitude or so higher than any figure I’ve ever heard from StreamBase, Progress Apama, Aleri/Coral8, or Truviso. Latency is in the 150 microsecond range. IBM believes it is indeed close to a couple of true sales to financial services firms, presumably for super-low-latency algorithmic trading.
As is usually the case for CEP vendors, the financial services market seems to be the only one that cares about super-low latency. Everything else IBM talked about seemed to be in the area of data reduction, although IBM likes to think of that as identifying which data matches certain patterns and then transforming it accordingly. Examples mentioned include:
- Intelligence work
- Identifying marine mammals via sonar
- Neonatal ICU monitoring, being prototyped at the Ontario Institute of Technology
- Wafer testing at IBM’s own Fishkill manufacturing plant
- Info security (in collaboration with IBM’s intrusion detection arm, the former ISS, with a hoped-for first customer at REDACTED)
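To make the "match a pattern, then transform" view of data reduction concrete, here is a minimal Python sketch. It is not SPADE and not anything IBM showed; the names `Reading`, `Alert`, and `reduce_stream`, the sliding-window rule, and the threshold are all hypothetical, chosen only to illustrate the general idea of turning a high-volume stream into a much smaller stream of interesting events.

```python
from collections import deque
from typing import Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    """Hypothetical raw input event, e.g. one sensor sample."""
    timestamp: float
    value: float


class Alert(NamedTuple):
    """Hypothetical reduced output event."""
    timestamp: float
    window_mean: float


def reduce_stream(readings: Iterable[Reading], window: int = 3,
                  threshold: float = 100.0) -> Iterator[Alert]:
    """Yield an Alert only when the mean of the last `window` values exceeds `threshold`.

    Everything that does not match the pattern is dropped, which is the
    "reduction"; matching data is transformed into a compact Alert.
    """
    recent = deque(maxlen=window)  # keeps only the newest `window` values
    for r in readings:
        recent.append(r.value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > threshold:
                yield Alert(r.timestamp, mean)


if __name__ == "__main__":
    raw = [Reading(t, v) for t, v in enumerate([90, 95, 100, 110, 120, 80, 70])]
    for alert in reduce_stream(raw):
        print(alert)
```

With the seven sample readings above, only three alerts come out (at timestamps 3, 4, and 5). A real engine would do this continuously, in parallel, and across many streams, but the shape of the computation is the same.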
As I’ve previously noted, the independent CEP vendors aren’t all that active in data reduction, Aleri/Coral8 (somewhat) and StreamBase’s intelligence-community efforts excepted. I’m not in a position to discuss other software generalists’ CEP efforts at the moment, but let’s just say I think data reduction is a generally under-served market, with lots of different niches to fill. IBM System S/InfoSphere Streams was well worth bringing to market.
IBM sent a couple of PDFs with more detail on applications and architecture. I can’t find them online, and I think I have permission to post them, so here they are. (Edit: Jeff Jones subsequently sent over an official IBM white paper link.)
IBM has its own SQL-based stream processing language called SPADE, with an Eclipse-based IDE. SPADE is not StreamSQL-compatible, although some prototype work suggests IBM could support StreamSQL if necessary. The overall IBM InfoSphere Streams product delivery roadmap is, and I quote IBM’s final slide:
- Improved tools
- SPADE 2 Language Specification
- Java based custom analytics
- Additional adapters
  - MQ, Low Latency Messaging, RSS feeds, Cognos, Mashup Center, WebSphere Business Events, XML
- Additional Analytics
  - Text, Warehouse, video, audio, dense information grinding
- Additional platforms
  - Windows, Unix vendors, Blue Gene, Cell blades, FPGA
In the first (did I mention it was rushed?) release, System S/InfoSphere Streams supports Intel architectures and Red Hat Linux only.