April 29th, 2008 Curt Monash
Truviso and EnterpriseDB announced today that there’s a Truviso “blade” for Postgres Plus. By email, EnterpriseDB Bob Zurek endorsed my tentative summary of what this means technically, namely:
-
There’s data being managed transactionally by EnterpriseDB.
-
Truviso’s DML has all along included ways to talk to a persistent Postgres data store.
-
If, in addition, one wants to do stream processing things on the same data, that’s now possible, using Truviso’s usual DML.
Read the rest of this entry »
Posted in Analytics and analytic technologies, Business intelligence, Complex event/stream processing (CEP), Data types, EnterpriseDB and Postgres Plus, Games and virtual worlds, Memory-centric data management, Open source RDBMS, PostgreSQL, Specialized data management in general, Truviso | 1 Comment »
March 19th, 2008 Curt Monash
I talked with both Coral8 and Truviso this afternoon. They both have their financial services efforts, of course. Coral8 also continues to get business doing data reduction for sensor networks — mainly RFID and utilities, I think. Coral8 is working on some really cool and confidential other stuff as well.
But my biggest takeaway from this pair of calls was that Coral8 and Truviso are penetrating general BI. Read the rest of this entry »
Posted in Analytics and analytic technologies, Business intelligence, Complex event/stream processing (CEP), Coral8, Memory-centric data management, Truviso | No Comments »
March 19th, 2008 Curt Monash
It seems that the CEP folks are still concerned about what to call themselves. There really are only three choices:
- Complex event processing
- Event processing
- Event stream processing
“Stream processing” might once have been on the list, but it has too many other meanings, and “streaming” adds more meanings yet.
“Complex” has the virtue of inertia; CEP is the closest thing the category has to an agreed-upon name. But few people want to buy technology that describes itself as being “complex.” And in any case it’s not clear how complex many of those events are. “Event stream processing” isn’t terribly well established, and to some extent it runs afoul of the same ambiguities as “stream processing.” What’s worse, those names lead to four-word product category names. Who really wants to market or hear about “complex event processing engines” or “event stream processing platforms”?
So let’s just call the category “event processing” and have done with it, OK? Products can, if they want, be “event processing somethings.” Names like that wouldn’t be any more of a mouthful than “data warehouse appliance,” and the latter category is doing pretty well for itself.
Please subscribe to our feed!
Posted in Complex event/stream processing (CEP) | No Comments »
January 16th, 2008 Curt Monash
There’s a lot of agitation today because Twitter broke under the message volume generated during Steve Jobs’ Macworld keynote. I don’t know what that volume was, but I just checked the lower volume of tweets (i.e., updates) going through the “public timeline” (i.e., everything) twice, and both times it was under 200 messages per minute. So, let’s say there’s a much higher volume at peak times, and also hypothesize that Twitter would like to grow a lot, and say that Twitter would like to handle 10-100,000 messages/minute – i.e., 1000+/second — as soon as possible.
That’s easy using CEP (Complex Event Processing). A Twitter update is just a string of 140 or fewer characters. It is associated with three pieces of metadata – author, time, and mode of posting. It should be visible in real time to any of the author’s “followers,” as well as in a single public timeline; perhaps there will be other kinds of Twitter channels in the future. In most cases, these updates are only visible to a user upon page refresh. Almost nNo Twitter user seems to have more than about 7,000 followers, even Robert Scoble or Evan Williams.* The average number of followers, at least among active updaters, is probably in the low hundreds now. So basically, this is all a heckuva lot easier than the tick-monitoring systems Wall Street firms are using today.
*I believe there’s a hard cap of 7,500, but nobody seems to have bumped against it yet.Twitterholic gives a different figure than Twitter does for Scoble. And it correctly shows Dave Troy with a little over 10,000.
Here’s how to implement that. Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Memory-centric data management | 11 Comments »
November 13th, 2007 Curt Monash
Coral8 today is rolling out the Coral8 Portal, offering some BI basics for CEP (Complex Event Processing) filters and queries. In Release 1, this is primitive compared with other BI portals, and of direct interest only to organizations that have already decided they’re using CEP technology. Even so, it serves as a useful illustration of several important issues in dashboarding.
The simplest is that real-time dashboards require different visualizations than others. Most obvious is the ever-popular graph marching from right to left across the screen as time advances along the x-axis. There also are difference in styles between reports and tables that you actually read, vs. read-outs that you merely watch for flickers of change. (Of course those two examples hardly make for a complete list.)
More interesting is the flexibility and parameterization. While Coral8 sells to multiple markets, the design point for the portal is clearly financial trading. So, for example, a query may be registered with one ticker symbol, and an end user can easily customize it to slot in another one instead. In a way, this is a step toward the much greater flexibility that dashboards need overall.
Truth be told, if you put all such Coral8 flexibility features together they’re not yet very impressive. So what’s even more interesting is the overall architecture that could support much greater flexibility in the future. If dashboards gain the flexibility they need, and queries continue to be done in the conventional manner, query volumes will increase enormously. If it further is the case that they are upgraded in some near real-time manner, that’s another huge increase.
How huge? Well, I can make a case that it could be well over three orders of magnitude: Read the rest of this entry »
Posted in Analytics and analytic technologies, Business intelligence, Complex event/stream processing (CEP), Coral8, Memory-centric data management | 1 Comment »
August 15th, 2007 Curt Monash
Robin Bloor is one of the best analysts around — which doesn’t say much about his eponymous firm, since he no longer works there, but I digress. Even so, he evidently got snookered by a Truviso spokesperson, as evidenced by this article.
Apparently, Truviso convinced him that other CEP firms execute one query at a time, while Truviso executes a bunch of queries at once. Well, the latter part of that is presumably true, but it’s hardly the big differentiatior for Truviso Robin would have one believe. That’s what everybody else — StreamBase, Coral8, Progress Apama, et al. — do too. I wouldn’t be surprised if Truviso had a somewhat different architecture for doing it (each vendor describes its approach in rather different language), or even if this were a particular focus and strongpoint of theirs. But fundamentally, all the CEP vendors are doing the same thing.
Posted in Complex event/stream processing (CEP), Memory-centric data management, Truviso | No Comments »
August 13th, 2007 Curt Monash
Coral8 at the time of a recent product release stated that it was improving the predictability of its queries. While this may sound like it has something to do with determinism, it doesn’t. Rather, it’s a matter of making what actually happens as a query result be more in line with what one would think will happen when one reads the query.
Coral8 CTO Mark Tsimelzon goes on to note:
But remember, we are really talking about a corner case — highly complex queries involving loops. We only had a couple of customers who were occasionally hitting queries that complex. The beauty of our SQL-based language is that the vast majority of queries, perhaps 99%, are very easy to understand, and their behavior is exactly what you’d expect based on your SQL experience.
Posted in Complex event/stream processing (CEP), Coral8, Memory-centric data management | No Comments »
August 12th, 2007 Curt Monash
The highest-profile applications for complex event/stream processing are probably the ones that require super-low latency, especially in financial trading. However, as I already noted in writing about StreamBase and Truviso, there are plenty of other CEP apps with less extreme latency requirements.
Commonly, these are data reduction apps – i.e., there’s a gushing stream of inputs, and the CEP engine filters and “enhances” it, so that only a small, modified subset is sent forward. In other cases, disk-based systems could do the job perfectly well from a performance standpoint, but the pattern matching and filtering requirements are just a better fit for the CEP paradigm.
Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Coral8, Hierarchies, networks, graphs, and trees, IBM and DB2, Memory-centric data management, Native XML, StreamBase | No Comments »
August 10th, 2007 Curt Monash
Complex event/stream processing vendors compete fiercely on the basis of low latency, down to the single-digit number of milliseconds, or even sub-millisecond levels. A question naturally springs to mind: When does this extreme low latency matter?
I think I’ve come up with a concise yet fairly accurate answer: Super-low latency matters when the application includes direct competition against a similarly fast opponent. The best example is automated stock trading – if you can exploit a market inefficiency 1 millisecond before your competition, you make money.
Other examples might arise in network security or battlefield systems, but I don’t know of any specific real-life cases. Instead, other applications for complex event/stream processing tend to be content with latencies that are easier to achieve. E.g., 100 milliseconds (1/10 of second) is likely to be plenty fast enough.
Keep getting great research about data management and related technologies. Get a FREE subscription by RSS/Atom or e-mail!
Posted in Complex event/stream processing (CEP), Memory-centric data management | No Comments »
August 10th, 2007 Curt Monash
Besides talking about what Coral8 and StreamBase (and other CEP vendors) have in common, Mark Tsimelzon and I talked quite a bit about what he sees as some of the important differences. There were a lot, of course, but three in particular stood out.
1. Mark believes Coral8 has significantly lower latency than StreamBase. E.g., the Wombat/Coral8 combo achieves sub-millisecond latency, with Coral8 itself consuming less than a tenth of that. The best comparable figures from StreamBase that I currently know of are almost an order of magnitude slower.
Top-end speed aside, Mark believes that Coral8 is fundamentally better suited for complex queries and pattern recognition, while StreamBase works well with simpler queries. For example, his other performance claims notwithstanding, he concedes that StreamBase is at least comparable to Coral8 in its throughput for huge numbers of simple queries. (The number he mentioned was ½ million queries/second.) Indeed, while we barely talked about customer/marketing issues, Mark asserts that the companies’ respective customer bases reflect this complex/simple distinction.*
Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Coral8, Memory-centric data management, Progress, Apama, and DataDirect, StreamBase | 4 Comments »
August 10th, 2007 Curt Monash
Last week, I complained that my first briefing with Coral8 wasn’t very technical. Wednesday I had a call with Mark Tsimelzon, CTO and founder of Coral8, and he made up for that in spades. In this post I’ll cover some of his general comments. Others will touch on more Coral8-specific topics, and his view of the Coral8/StreamBase comparison.
As Mark describes it, the big difference between a DBMS – even an in-memory DBMS – and a complex event processing engine is this: CEP engines do instantaneous incremental processing. He commonly refers to this as registering queries and operators for incremental evaluation. For example, suppose you need to maintain the sum of some data stream over the past 10 minutes. Then each second (or other short unit of time), the system adds in all the values that arrived in the past second, and subtracts all those that arrived 600-601 seconds ago. Voila! The sum is incrementally updated.
Now, rolling sums may not sound very interesting – but where you have rolling sums, you trivially also have rolling averages (just divide the sum by the count) and rolling standard deviations (same idea, with some squares and square roots mixed in). Those, of course, are primitives in Coral8 too. Ditto rolling maxima and minima. Ditto rolling joins (which are updated a lot like materialized views).
Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Coral8, Memory-centric data management | 1 Comment »
August 3rd, 2007 Curt Monash
For the most part, the vendors I talk with in complex event/stream processing like and speak well of each other (most of the exceptions seem to involve StreamBase). Even so, there are a lot of interesting competitive claims and counterclaims in this market. Prior posts and comment threads have covered Apama/StreamBase jousting on the subjects of who has more business and how many financial data feeds StreamBase supports. Other areas that generate interesting sparks are performance, parallelism, and determinism. Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Coral8, Memory-centric data management, Progress, Apama, and DataDirect, StreamBase | 1 Comment »
August 3rd, 2007 Curt Monash
My recent non-technical Apama briefing has now had a much more technical sequel, with charming founder and former Cambridge professor John Bates. He still didn’t fully open the kimono – trade secrets and all that — but here’s the essence of what’s going on.
Complex event/stream processing (CEP) is all about looking for many patterns at once. Reality – the stream(s) of data – is checked against these patterns for matches. In Apama, these patterns are kept in a kind of tree – they call it a hypertree — and John says the work to check them is only logarithmic in the number of patterns.
Since patterns commonly have multiple parts — and usually also take time to unfold — what really goes on is that partial matches are found, after which what’s being matched against is the REMAINDER of the pattern. Thus, there’s constant pruning and rebalancing of the tree. What’s more, a large fraction of all patterns – at least in the financial trading market — involve a short time window, which again creates a need for ongoing, rapid tree modification. Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Memory-centric data management, Progress, Apama, and DataDirect | 2 Comments »
August 3rd, 2007 Curt Monash
Complex event/stream processing vendor Coral8 raised its hand and offered a briefing – non-technical, alas, but at least it was a start. Here are some of the highlights: Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Coral8, Hierarchies, networks, graphs, and trees, Memory-centric data management, Native XML, StreamBase | No Comments »
July 18th, 2007 Curt Monash
In my post Monday about Apama, I complained that StreamBase hadn’t offered a rebuttal to some of Apama’s claims. This has now been fixed.
Bill Hobbib, StreamBase’s VP of Marketing wrote in. Part of what he had to say was the following.
Adapters to Data Feeds
Your blog comment that adapters doesn’t seem like a key competitive differentiator is accurate, and since adapters are so straightforward to develop with StreamBase as part of a customer engagement, we’ve never found adapters to be a key competitive differentiator. The comment by a competitor that their advantage over StreamBase comes from their having developed more adapters suggests they cannot distinguish themselves based on the other functional capabilities that are important to customers. In reality, our speed/performance and scalability are orders of magnitude superior to competitors, as is the speed with which StreamBase applications are developed, deployed, and modified when business needs change. (If it were easy to develop applications with certain competitive systems, then one might assume they would make free evaluation versions of their product available for download from their websites!)
That being said, StreamBase offers adapters to a broad array of data feeds. Most of these are offered out-of-the-box by StreamBase, including the following:
* Financial Market Data: processes data from Reuters® RMDS™ and Reuters Triarch™
* TIBCO® Rendezvous™: converts Rendezvous message into StreamBase tuples and vice versa.
* StreamBase Adapter for JDBC: connects StreamBase to enterprise databases, allowing submission of SQL queries to external resources such as IBM® DB2™, Oracle®, Microsoft® SQLServer™, and Sybase®.
* StreamBase Adapter for JMS: integrates StreamBase with any JMS-compliant message bus.
* StreamBase Adapter for Microsoft Excel™: allows applications to publish data to Excel or read data from Excel.
* StreamBase CSV Adapters: allow applications to read data from, and write data to, comma-separated value (CSV) files.
* StreamBase SMTP adapter: taps into the IP stack on a running system to process live data, converts the IP packets into a TCP data stream, or reads IP packets from captured files.
* StreamBase XML Adapter: streams XML-formatted data records into and out of StreamBase applications
We also can connect to financial exchanges either using our own adapters or through a third-party partnership. Below you’ll find a listing of those.
Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Memory-centric data management, Progress, Apama, and DataDirect, StreamBase | No Comments »
July 16th, 2007 Curt Monash
I finally got my promised briefing with Progress Apama. Unfortunately, nobody particularly technical was able to attend, but I came away with a better understanding even so.
Unlike StreamBase or Truviso, Apama has a rules-based architecture. In essence, the rules engine maintains state of various kinds, and matches that state against desired patterns, called “scenarios.” They can handle 100s or possibly even 1000s of scenarios at once. Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Memory-centric data management, Progress, Apama, and DataDirect | 1 Comment »
July 13th, 2007 Curt Monash
I just finished a short Monash Letter on markets for nonstandard data management software. Of course, the whole thing is available only to Monash Advantage members, but here are some salient points:
- When new kinds of data are managed, new kinds of data management are used. More precisely, the old ways are tried first — but once they fail new technologies are tried out.
- Up through the “Bowling Alley,” markets for nonstandard data management technology commonly follow the classic Geoffrey Moore pattern. However, they rarely experience a “Tornado” or mass adoption.
- I think this is apt to change. My three strongest candidates are native XML, RDF, and memory-centric event/stream processing used for data reduction (as opposed to sub-millisecond latency, which I do think will continue to be a niche requirement).
Posted in Complex event/stream processing (CEP), Hierarchies, networks, graphs, and trees, Memory-centric data management, Native XML, RDF and graphs | No Comments »
June 18th, 2007 Curt Monash
Mike Stonebraker wrote in with one “nit pick” about yesterday’s blog. I had credited Truviso for strong DBMS/stream processor integration. He shot back that StreamBase has Sleepycat integrated in-process. He further pointed out that a Sleepycat record lookup takes only 5 microseconds if the data is in cache. Assuming what he means is that it’s in Sleepycat’s cache, that would be tight integration indeed.
I wonder whether StreamBase will indefinitely rely on Sleepycat, which is of course now an Oracle product …
Want to continue getting great research about DBMS, analytics, data integration, and other technologies related to data management? Get a FREE subscription by RSS/Atom or e-mail! We recommend taking the integrated feed for all our blogs, but blog-specific ones are also easily available.
Technorati Tags: StreamBase, Sleepycat
Posted in Complex event/stream processing (CEP), Memory-centric data management, Michael Stonebraker, Oracle, StreamBase | No Comments »
June 18th, 2007 Curt Monash
After my call with Truviso and blog post referencing same, I had the chance to discuss stream processing with Mike Stonebraker, who among his many distinctions is also StreamBase’s Founder/CTO. We focused almost exclusively on the financial trading market. Here are some of the highlights. Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Memory-centric data management, Michael Stonebraker, StreamBase, Truviso | No Comments »
June 7th, 2007 Curt Monash
StreamBase is a decently-established startup, possibly the largest company in its area. Truviso, in the process of changing its name from Amalgamated Insight, has a dozen employees, one referenceable customer, and a product not yet in general availability. Both have ambitious plans for conquering the world, based on similar stories. And the stories make a considerable amount of sense.
Both companies’ core product is a memory-centric SQL engine designed to execute queries without ever writing data to disk. Of course, they both have persistence stories too — Truviso by being tightly integrated into open-source PostgreSQL, StreamBase more via “yeah, we can hand the data off to a conventional DBMS.” But the basic idea is to route data through a whole lot of different in-memory filters, to see what queries it satisfies, rather than executing many queries in sequence against disk-based data. Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Memory-centric data management, PostgreSQL, StreamBase, Truviso | 8 Comments »
March 25th, 2007 Curt Monash
Oracle made a slick move in picking up Tangosol, a leader in object/data caching for all sorts of major OLTP apps. They do financial trading, telecom operations, big web sites (Fedex, Geico), and other good stuff. This is a reminder that the list of important memory-centric data handling technologies is getting fairly long, including:
- Object caching (e.g., Tangosol, Progress ObjectStore)
- In-memory RDBMS (e.g., Oracle TimesTen, Solid BoostEngine, McObject eXtremeDB)
- Stream processing (e.g., Progress Apama, Streambase)
And that’s just for OLTP; there’s a whole other set of memory-centric technologies for analytics as well.
When one connects the dots, I think three major points jump out:
- There’s a lot more to high-end OLTP than relational database management.
- Oracle is determined to be the leader in as many of those areas as possible.
- This all fits the market disruption narrative.
I write about Point #1 all the time. So this time around let me expand a little more on #2 and #3.
Read the rest of this entry »
Posted in Cache, Complex event/stream processing (CEP), Database diversity, Database theory and practice, Memory-centric data management, OLTP database management, Oracle, Oracle TimesTen, Progress, Apama, and DataDirect, Relational database management systems, Specialized data management in general, StreamBase, solidDB | 2 Comments »
November 14th, 2005 Curt Monash
I’m writing more and more about memory-centric data management technology these days, including in my latest Computerworld column. You may be wondering what that term refers to. Well, I’ve basically renamed what are commonly called “in-memory DBMS,” for what I think is a very good reason: Most of the products in the category aren’t true DBMS, aren’t wholly in-memory, or both! Indeed, if you catch me in a grouchy mood I might argue that “in-memory DBMS” is actually a contradiction in terms.
I’ll give a quick summary of the vendors and products I am focusing on in this newly-named category, and it should be clearer what I mean:
- TimesTen (now owned by Oracle): TimesTen is the quintessentional “in-memory DBMS.” It’s a fairly full relational DBMS, but if you want to persist memory to disk it has to be handed off to a conventional DBMS. Historically, that has usually been MySQL or Oracle. TimesTen’s biggest market penetration has been in financial trading.
- Solid Information Technology’s BoostEngine: Solid is a Finnish company (or was — it’s pretty American now) specializing in embedded DBMS sold mainly for telecommunication uses. Big OEM customers include several well-known telecom equipment manufacturers and HP (for OpenView). “Embedded” often means no DBA, no monitor, no keyboard — they box manufacturer installs it and there it stays for the life of the product. Solid has to offer strong replication capabilities, since its products are often used in highly distributed (e.g., multiblade, multibox) environments. So it’s taken the next step and exploited the replication by allowing customers to use some instances of the product disklessly.
- Event-stream products from Streambase and Progress: The canonical application for event-stream products is automating financial trading decisions based on the flow of market information. Mike Stonebraker, the brains behind Streambase, has recently popularized the idea; Progress bought Apama, who actually have been in the business longer. These applications require even more speed than the financial trading apps that TimesTen handles, and they discard most of the information they look at. In-memory is the only way to go.
- Progress’s ObjectStore: ObjectStore comes from the company Object Design, which merged into Excelon, which was acquired by Progress. It’s really a toolkit for building DBMS and similar systems, which is why it’s at various times been marketed as an OODBMS and an XML DBMS, without a lot of success either way. But there have been a few sterling apps built in ObjectStore even so, including a key part of the Amazon bookstore Despite this limited market success, a significant fraction of Progress’s best engineering talent has moved over to the Real-Time Division to focus on ObjectStore and other memory-centric products. The memory-centric aspect of ObjectStore is this: ObjectStore’s big virtue is that it gets objects from disk to memory and vice-versa very efficiently, then distributes and caches them around a network as needed. This was originally invented for client/server processing, but works fine in a multi-server thin client setup as well. And object processing, of course, relies on a whole lot of pointers. And pointer-chasing is pretty much the worst way to deal with the disk speed barrier, unless you do it in main memory.
- Applix’s TM1: Like many companies in the analytics area, Applix has had trouble deciding whether it sells applications, BI system software, or both. But in any case its core technology is TM1, a memory-centric MOLAP offering. Traditional MOLAP products reside on the horns of a nasty dilemma: They rely on precalculation to give good performance, but that causes ghastly database explosion. Applix gets out of this problem by doing no precalculation whatsoever, loading the data into main memory, and executing all queries on the fly.
- SAP’s BI Accelerator: SAP is building out an elaborate technology stack with NetWeaver, especially in the BI area. One important aspect is that the full data warehouse is logically broken (or copied) into a series of data marts called “InfoCubes.” BI Accelerator takes the logical next step, loading an entire InfoCube into main memory. Almost every query is executed via a full table scan, which would be insane on disk but makes perfect sense when the data is already in RAM.
So there you have it. There are a whole lot of technologies out there that manage data in RAM, in ways that would make little or no sense if disks were more intimately involved. Conventional DBMS also try to exploit RAM and limit disk access, via caching; but generally the data access methods they use in RAM are pretty similar to those they use when going out to disk. So memory-centric systems can have a major advantage.
Posted in Cognos and Applix TM1, Complex event/stream processing (CEP), Data types, MOLAP, Memory-centric data management, OLTP database management, Objects, Oracle TimesTen, Progress, Apama, and DataDirect, SAP, BI Accelerator, and MaxDB, solidDB | 2 Comments »