April 29th, 2008 Curt Monash
Truviso and EnterpriseDB announced today that there’s a Truviso “blade” for Postgres Plus. By email, EnterpriseDB’s Bob Zurek endorsed my tentative summary of what this means technically, namely:
- There’s data being managed transactionally by EnterpriseDB.
- Truviso’s DML has all along included ways to talk to a persistent Postgres data store.
- If, in addition, one wants to do stream processing things on the same data, that’s now possible, using Truviso’s usual DML.
Read the rest of this entry »
Posted in Analytics and analytic technologies, Business intelligence, Complex event/stream processing (CEP), Data types, EnterpriseDB and Postgres Plus, Games and virtual worlds, Memory-centric data management, Open source RDBMS, PostgreSQL, Specialized data management in general, Truviso | 1 Comment »
April 24th, 2008 Curt Monash
There’s an amazingly long comment thread on Coding Horror about WordPress optimization. Key points and debates include:
- WordPress makes scads of database calls on every page. (20 is the supposed default number. That sounds a little high to me, but not wholly incredible.)
- Therefore one should use a caching plug-in. WP-Cache is the preferred one. WP-Super-Cache gets some votes as perhaps being even better.
- In theory the database cache should handle most of the problem. (After all, many of those database queries are the same for every page.) In practice, it often doesn’t, even if you use dedicated (as opposed to shared) web hosting.
- LAMP vs. Microsoft stack (uh-oh).
- Drupal vs. WordPress vs. Movable Type vs. Joomla vs. do-it-yourself (uh-oh too).
Another theme is, well, WordPress “theme” design. Do you really need all those calls? The most dramatic example I can think of is one I experienced soon after I started this blog. Some themes have the cool feature that, in the category list on the sidebar, there’s a count of the number of posts in the category. Each category. I love that feature, but its performance consequences are not pretty.
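To illustrate why per-category counts can get expensive, here is a minimal Python sketch using an in-memory SQLite database. The table and column names are hypothetical and are not WordPress's actual schema; the point is only the contrast between issuing one COUNT query per category and issuing a single grouped query.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema, not WordPress's)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE post_categories (post_id INTEGER, category_id INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO categories VALUES (?, ?)",
        [(1, "Cache"), (2, "MySQL"), (3, "PostgreSQL")],
    )
    conn.executemany(
        "INSERT INTO post_categories VALUES (?, ?)",
        [(10, 1), (11, 1), (11, 2), (12, 3), (13, 1)],
    )
    return conn


def counts_one_query_per_category(conn):
    """Sidebar counts the expensive way: one extra query for every category."""
    queries = 1  # the query that lists the categories
    categories = conn.execute("SELECT id, name FROM categories").fetchall()
    counts = {}
    for cat_id, name in categories:
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM post_categories WHERE category_id = ?",
            (cat_id,),
        ).fetchone()
        queries += 1
        counts[name] = n
    return counts, queries


def counts_single_query(conn):
    """Same counts from a single grouped query."""
    rows = conn.execute(
        """
        SELECT c.name, COUNT(pc.post_id)
        FROM categories c
        LEFT JOIN post_categories pc ON pc.category_id = c.id
        GROUP BY c.id, c.name
        """
    ).fetchall()
    return dict(rows), 1
```

With three categories the first approach issues four queries and the second issues one; the gap grows linearly with the number of categories, which is the kind of per-page cost a caching plug-in hides.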
As previously noted, we’ll be doing an emergency site upgrade ASAP. Once we’re upgraded to WordPress 2.5, I hope to deploy a rich set of back-end plug-ins. One of the caching ones will be among them.
Please subscribe to our feed!
Posted in About this blog, Application areas, Cache | 1 Comment »
March 25th, 2008 Curt Monash
EnterpriseDB is making a series of moves and announcements. Highlights include:
- Renaming/repositioning the product as “Postgres Plus.” The free product is now Postgres Plus, while the version you pay EnterpriseDB for is now Postgres Plus Advanced Server.
- Repackaging the products, so that Postgres Plus Advanced Server is a strict superset of Postgres Plus.
- New features added to Postgres Plus Advanced Server.
- Features newly migrated from Advanced Server down to Postgres Plus.
- A strategic investment by IBM.
- Stressing Postgres in EnterpriseDB marketing, and dropping the tag-line defining themselves as “the Oracle-compatible database company.”
So far as I can tell, most of the technical differences between Advanced Server and regular Postgres Plus lie in three areas: Read the rest of this entry »
Posted in Cache, EnterpriseDB and Postgres Plus, Mid-range DBMS, MySQL, OLTP database management, Open source RDBMS, Portability, transparency, and plug-compatibility, PostgreSQL, Relational database management systems | 1 Comment »
March 19th, 2008 Curt Monash
I talked with both Coral8 and Truviso this afternoon. They both have their financial services efforts, of course. Coral8 also continues to get business doing data reduction for sensor networks — mainly RFID and utilities, I think. Coral8 is working on some really cool and confidential other stuff as well.
But my biggest takeaway from this pair of calls was that Coral8 and Truviso are penetrating general BI. Read the rest of this entry »
Posted in Analytics and analytic technologies, Business intelligence, Complex event/stream processing (CEP), Coral8, Memory-centric data management, Truviso | No Comments »
March 19th, 2008 Curt Monash
It seems that the CEP folks are still concerned about what to call themselves. There really are only three choices:
- Complex event processing
- Event processing
- Event stream processing
“Stream processing” might once have been on the list, but it has too many other meanings, and “streaming” adds more meanings yet.
“Complex” has the virtue of inertia; CEP is the closest thing the category has to an agreed-upon name. But few people want to buy technology that describes itself as being “complex.” And in any case it’s not clear how complex many of those events are. “Event stream processing” isn’t terribly well established, and to some extent it runs afoul of the same ambiguities as “stream processing.” What’s worse, those names lead to four-word product category names. Who really wants to market or hear about “complex event processing engines” or “event stream processing platforms”?
So let’s just call the category “event processing” and have done with it, OK? Products can, if they want, be “event processing somethings.” Names like that wouldn’t be any more of a mouthful than “data warehouse appliance,” and the latter category is doing pretty well for itself.
Posted in Complex event/stream processing (CEP) | No Comments »
March 13th, 2008 Curt Monash
Twitter commonly has the problem of duplicate tweets. That is, if you post a message, it shows up twice. After a little while, the dupe disappears, but if you delete the dupe manually, the original is gone too.
I presume what’s going on is that tweets are cached, the tweets are eventually batched to disk, and they don’t always get deleted from cache until some time after they’re persisted. If you happen to check the page of your recent tweets in between, boom, you get two hits. But what I don’t understand is why the two versions have different timestamps.
Presumably, this could be explained at a MySQL User Conference session next month, one of whose topics will be Intelligent caching strategies using a hybrid MemCache / MySQL approach. I’m so glad they don’t use stupid strategies to do this … Read the rest of this entry »
Posted in Cache, MySQL, OLTP database management, Specific users | 3 Comments »
February 20th, 2008 Curt Monash
Billy Newport of IBM sees a lot of similarities between his app-server-based product ObjectGrid and H-Store. In both cases, constrained tree schemas are assumed, and OLTP performance goodness ensues. A couple of points I noted on a quick skim through his blog:
- He calls out RAM consumption as a challenge for this kind of architecture.
- He points out that it’s a big advantage to have data called and used in the same address space.
Being based in RAM is obviously a huge part of the H-Store scheme. But so is having transaction execution be close to the database.
IBM now has both ObjectGrid and a memory-centric DBMS (solidDB) that they’ve been using as a front end for DBMS. Integration of the two could be pretty interesting.
Posted in Cache, Database theory and practice, H-Store, IBM and DB2, Memory-centric data management, OLTP database management, Relational database management systems, solidDB | No Comments »
February 19th, 2008 Curt Monash
I wrote yesterday about the H-Store project, the latest from the team of researchers who also brought us C-Store and its commercialization Vertica. H-Store is designed to drastically improve efficiency in OLTP database processing, in two ways. First, it puts everything in RAM. Second, it tries to gain an additional order of magnitude on in-memory performance versus today’s DBMS designs by, for example, taking a very different approach to ensuring ACID compliance.
Today I had the chance to talk with two more of the H-Store researchers, Sam Madden and Daniel Abadi.
Read the rest of this entry »
Posted in Database diversity, H-Store, Memory-centric data management, OLTP database management | 1 Comment »
February 18th, 2008 Curt Monash
Last week, Dan Weinreb tipped me off to something very cool: Mike Stonebraker and a group of MIT/Brown/Yale colleagues are calling for a complete rewrite of OLTP DBMS. And they have a plan for how to do it, called H-Store, as per a paper and an associated slide presentation.
Read the rest of this entry »
Posted in Database diversity, Database theory and practice, H-Store, Memory-centric data management, Michael Stonebraker, OLTP database management | 28 Comments »
January 16th, 2008 Curt Monash
There’s a lot of agitation today because Twitter broke under the message volume generated during Steve Jobs’ Macworld keynote. I don’t know what that volume was, but I just checked the lower volume of tweets (i.e., updates) going through the “public timeline” (i.e., everything) twice, and both times it was under 200 messages per minute. So, let’s say there’s a much higher volume at peak times, and also hypothesize that Twitter would like to grow a lot, and say that Twitter would like to handle 10,000-100,000 messages/minute (i.e., 1000+/second) as soon as possible.
That’s easy using CEP (Complex Event Processing). A Twitter update is just a string of 140 or fewer characters. It is associated with three pieces of metadata – author, time, and mode of posting. It should be visible in real time to any of the author’s “followers,” as well as in a single public timeline; perhaps there will be other kinds of Twitter channels in the future. In most cases, these updates are only visible to a user upon page refresh. Almost no Twitter user seems to have more than about 7,000 followers, even Robert Scoble or Evan Williams.* The average number of followers, at least among active updaters, is probably in the low hundreds now. So basically, this is all a heckuva lot easier than the tick-monitoring systems Wall Street firms are using today.
*I believe there’s a hard cap of 7,500, but nobody seems to have bumped against it yet. Twitterholic gives a different figure than Twitter does for Scoble. And it correctly shows Dave Troy with a little over 10,000.
Here’s how to implement that. Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Memory-centric data management | 11 Comments »
December 21st, 2007 Curt Monash
IBM is acquiring Solid Information Technology, makers of solidDB. Some quick comments:
- solidDB is actually a very interesting hybrid disk/in-memory memory-centric database management system. However, the press release announcing the deal makes it sound as if solidDB is in-memory only.
- That strongly suggests that IBM is buying Solid mainly to compete with Oracle TimesTen. As of last June, solidDB was already IBM’s TimesTen answer via a partnership; this deal just solidifies that arrangement.
- This probably isn’t good news for Solid’s MySQL engine. That’s a pity, since solidDB technically has the potential to be the best MySQL engine around.
- Notwithstanding IBM’s presumed intentions, Solid’s main market success historically is as an embedded system in telecommunications equipment, network software, and similar systems.
- Last year I wrote a white paper on memory-centric data management, showcasing four products. IBM now has bought two of them, namely Solid’s and Applix’s (via Cognos).
- Comparisons to IBM’s embedded Java DBMS Cloudscape are pointless. That’s just a failed product vs. solidDB or Sybase SQL Anywhere, and IBM long ago cut its losses.
Read the rest of this entry »
Posted in Cognos and Applix TM1, IBM and DB2, Memory-centric data management, OLTP database management, Sybase, solidDB | 2 Comments »
November 13th, 2007 Curt Monash
Coral8 today is rolling out the Coral8 Portal, offering some BI basics for CEP (Complex Event Processing) filters and queries. In Release 1, this is primitive compared with other BI portals, and of direct interest only to organizations that have already decided they’re using CEP technology. Even so, it serves as a useful illustration of several important issues in dashboarding.
The simplest is that real-time dashboards require different visualizations than others. Most obvious is the ever-popular graph marching from right to left across the screen as time advances along the x-axis. There also are differences in style between reports and tables that you actually read, vs. read-outs that you merely watch for flickers of change. (Of course those two examples hardly make for a complete list.)
More interesting is the flexibility and parameterization. While Coral8 sells to multiple markets, the design point for the portal is clearly financial trading. So, for example, a query may be registered with one ticker symbol, and an end user can easily customize it to slot in another one instead. In a way, this is a step toward the much greater flexibility that dashboards need overall.
Truth be told, if you put all such Coral8 flexibility features together they’re not yet very impressive. So what’s even more interesting is the overall architecture that could support much greater flexibility in the future. If dashboards gain the flexibility they need, and queries continue to be done in the conventional manner, query volumes will increase enormously. If it further is the case that they are updated in some near real-time manner, that’s another huge increase.
How huge? Well, I can make a case that it could be well over three orders of magnitude: Read the rest of this entry »
Posted in Analytics and analytic technologies, Business intelligence, Complex event/stream processing (CEP), Coral8, Memory-centric data management | 1 Comment »
November 12th, 2007 Curt Monash
Analyst conference calls about merger announcements are generally pretty boring. Indeed, the companies involved tend to feel they are legally barred from saying anything interesting, by mandate of both the antitrust regulators and the SEC.
Still, such calls are joyful events, full of strategic happy talk. If one is really lucky, there may be a virtuoso tap-dancing exhibition as well. On today’s IBM/Cognos call, Cognos CEO Rob Ashe was asked whether he thought Cognos’ independence or lack thereof was as important today as he said it was after SAP announced its BOBJ takeover. Without missing a beat, he responded that there were two kinds of openness:
- Database openness (not important)
- ERP/business process openness (indeed important)
Hmm. I’m not so sure I agree. To begin with, there aren’t just two major points of potential integration. There’s also a whole lot of middleware: obviously data integration, but also app servers, portals, and query execution acceleration as well. Read the rest of this entry »
Posted in Analytics and analytic technologies, Business Objects, Business intelligence, Cognos and Applix TM1, IBM and DB2, Memory-centric data management, ParAccel, SAP, BI Accelerator, and MaxDB | No Comments »
October 8th, 2007 Curt Monash
SAP is acquiring Business Objects. There’s nothing inherent in BI Accelerator’s design that ties it to NetWeaver, SAP star schema InfoCubes, or any other particular current implementation detail. So BI Accelerator could become a lot more than an afterthought.
Combine that with Cognos’s acquisition of Applix and the continued success of upstart QlikView, and we could finally see a general memory-centric BI boom.
Maybe. There have been a lot of false alarms before.
Posted in Analytics and analytic technologies, Business Objects, Business intelligence, Cognos and Applix TM1, Memory-centric data management, QlikTech and QlikView, SAP, BI Accelerator, and MaxDB | 2 Comments »
September 27th, 2007 Curt Monash
Apparently, one user isn’t happy with QlikView at all. The main problem seems to be, in effect, frequently-repeated bulk loads from disk into the in-memory structures. (Obviously — at least absent more information — that could be an artifact of a stupidly ignorant installation, rather than a fundamental problem with the technology itself.) He’s also not at all enamored of QlikView’s app dev tools.
Posted in Analytics and analytic technologies, Business intelligence, Memory-centric data management, QlikTech and QlikView | 2 Comments »
September 24th, 2007 Curt Monash
Pervasive Software has a long history – 25 years, in fact, as they’re emphasizing in some current marketing. Ownership and company name have changed a few times, as the company went from being an independent startup to being owned by Novell to being independent again. The original product, and still the cash cow, was a linked-list DBMS called Btrieve, eventually renamed Pervasive PSQL as it gained more and more relational functionality.
Pervasive Summit PSQL v10 has just been rolled out, and I wrote a nice little white paper to commemorate the event, describing some of the main advances over v9, primarily for the benefit of current Pervasive PSQL developers. In one major advance, Pervasive made the SQL functionality much stronger. In particular, you now can have a regular SQL data dictionary, so that the database can be used for other purposes – BI, additional apps, whatever. Apparently, that wasn’t possible before, although it had been possible in yet earlier releases. Pervasive also added view-based security permissions, which is obviously a Very Good Thing.
There also are some big performance boosts. Read the rest of this entry »
Posted in Database compression, Hierarchies, networks, graphs, and trees, Memory-centric data management, Microsoft and SQL*Server, Mid-range DBMS, OLTP database management, Pervasive Software, Portability, transparency, and plug-compatibility, Relational database management systems | No Comments »
September 6th, 2007 Curt Monash
If I weren’t on a snorkeling vacation,* this might be a good time to write about why I once called Cognos “The Gang That Couldn’t Shoot Straight,” how Ron Zambonini used that label to help him gain the company’s top spot, why he’s such a big fan of mine, why I got my highest ever per-minute speaking fee to attend a Cognos sales kickoff event, why I went for a midnight touristing stroll in downtown Ottawa in zero degree Fahrenheit weather, or how I managed, while attending the aforementioned Cognos sales kickoff, to get snowed in for three days in, of all places, Dallas, Texas. But the wrasses and jacks await, so I’ll get straight to the point.
*Albeit fairly snorkel-free so far, thanks to Hurricane Felix.
As I discussed at considerable length in a white paper, Applix’s core technology is fully-featured, memory-centric MOLAP. This is certainly cool technology, and I think it is actually unique. That it’s historically been positioned as the engine for a mid-range set of performance management tools is a travesty, a shame, the result of a prior merger – and also the quite understandable consequence of RAM limitations. However, RAM is ever cheaper and Applix’s technology is now 64-bit, so the RAM barriers have been relaxed. Cognos can take Applix’s TM1 engine high-end if it wants to. And boy, should Cognos ever want to. Indeed, there are three different great ways Cognos could package and position TM1:
- As a no-data-warehouse-design quick-start analytics engine analogous to QlikView (the fastest-growing and most important newish BI suite, open source perhaps excepted);
- As the most sophisticated and versatile planning tool this side of SAP’s APO (and while APO’s sophistication is not in dispute, its versatility is questionable anyway);
- As the processing hub for dashboards-done-right.
Read the rest of this entry »
Posted in Analytics and analytic technologies, Business intelligence, Cognos and Applix TM1, MOLAP, Memory-centric data management | 4 Comments »
August 15th, 2007 Curt Monash
Robin Bloor is one of the best analysts around — which doesn’t say much about his eponymous firm, since he no longer works there, but I digress. Even so, he evidently got snookered by a Truviso spokesperson, as evidenced by this article.
Apparently, Truviso convinced him that other CEP firms execute one query at a time, while Truviso executes a bunch of queries at once. Well, the latter part of that is presumably true, but it’s hardly the big differentiator for Truviso that Robin would have one believe. That’s what everybody else (StreamBase, Coral8, Progress Apama, et al.) does too. I wouldn’t be surprised if Truviso had a somewhat different architecture for doing it (each vendor describes its approach in rather different language), or even if this were a particular focus and strongpoint of theirs. But fundamentally, all the CEP vendors are doing the same thing.
Posted in Complex event/stream processing (CEP), Memory-centric data management, Truviso | No Comments »
August 13th, 2007 Curt Monash
Coral8 at the time of a recent product release stated that it was improving the predictability of its queries. While this may sound like it has something to do with determinism, it doesn’t. Rather, it’s a matter of making what actually happens as a query result be more in line with what one would think will happen when one reads the query.
Coral8 CTO Mark Tsimelzon goes on to note:
But remember, we are really talking about a corner case — highly complex queries involving loops. We only had a couple of customers who were occasionally hitting queries that complex. The beauty of our SQL-based language is that the vast majority of queries, perhaps 99%, are very easy to understand, and their behavior is exactly what you’d expect based on your SQL experience.
Posted in Complex event/stream processing (CEP), Coral8, Memory-centric data management | No Comments »
August 12th, 2007 Curt Monash
The highest-profile applications for complex event/stream processing are probably the ones that require super-low latency, especially in financial trading. However, as I already noted in writing about StreamBase and Truviso, there are plenty of other CEP apps with less extreme latency requirements.
Commonly, these are data reduction apps – i.e., there’s a gushing stream of inputs, and the CEP engine filters and “enhances” it, so that only a small, modified subset is sent forward. In other cases, disk-based systems could do the job perfectly well from a performance standpoint, but the pattern matching and filtering requirements are just a better fit for the CEP paradigm.
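As a rough illustration of the “filter and enhance” idea, here is a minimal Python sketch. It is not any vendor's engine or query language; the `Reading` type, the `reduce_stream` name, and the change-threshold rule are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Reading:
    """One raw input event (hypothetical shape, e.g. from a sensor)."""
    sensor_id: str
    value: float


def reduce_stream(readings: Iterable[Reading], threshold: float) -> Iterator[dict]:
    """Forward only readings that moved by at least `threshold` since the
    previous reading from the same sensor, enhanced with the size of the change."""
    last_seen: Dict[str, float] = {}
    for reading in readings:
        previous = last_seen.get(reading.sensor_id)
        last_seen[reading.sensor_id] = reading.value
        if previous is None:
            continue  # first reading for this sensor: nothing to compare against
        change = reading.value - previous
        if abs(change) >= threshold:
            # "Enhance" the event with derived data before sending it on.
            yield {
                "sensor_id": reading.sensor_id,
                "value": reading.value,
                "change": change,
            }
```

Fed five readings of which only one represents a real jump, the generator emits a single enriched event; everything else is dropped before it reaches downstream systems.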
Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Coral8, Hierarchies, networks, graphs, and trees, IBM and DB2, Memory-centric data management, Native XML, StreamBase | No Comments »
August 10th, 2007 Curt Monash
Complex event/stream processing vendors compete fiercely on the basis of low latency, down to the single-digit number of milliseconds, or even sub-millisecond levels. A question naturally springs to mind: When does this extreme low latency matter?
I think I’ve come up with a concise yet fairly accurate answer: Super-low latency matters when the application includes direct competition against a similarly fast opponent. The best example is automated stock trading – if you can exploit a market inefficiency 1 millisecond before your competition, you make money.
Other examples might arise in network security or battlefield systems, but I don’t know of any specific real-life cases. Instead, other applications for complex event/stream processing tend to be content with latencies that are easier to achieve. E.g., 100 milliseconds (1/10 of a second) is likely to be plenty fast enough.
Posted in Complex event/stream processing (CEP), Memory-centric data management | No Comments »
August 10th, 2007 Curt Monash
Besides talking about what Coral8 and StreamBase (and other CEP vendors) have in common, Mark Tsimelzon and I talked quite a bit about what he sees as some of the important differences. There were a lot, of course, but three in particular stood out.
1. Mark believes Coral8 has significantly lower latency than StreamBase. E.g., the Wombat/Coral8 combo achieves sub-millisecond latency, with Coral8 itself consuming less than a tenth of that. The best comparable figures from StreamBase that I currently know of are almost an order of magnitude slower.
Top-end speed aside, Mark believes that Coral8 is fundamentally better suited for complex queries and pattern recognition, while StreamBase works well with simpler queries. For example, his other performance claims notwithstanding, he concedes that StreamBase is at least comparable to Coral8 in its throughput for huge numbers of simple queries. (The number he mentioned was ½ million queries/second.) Indeed, while we barely talked about customer/marketing issues, Mark asserts that the companies’ respective customer bases reflect this complex/simple distinction.*
Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Coral8, Memory-centric data management, Progress, Apama, and DataDirect, StreamBase | 4 Comments »
August 10th, 2007 Curt Monash
Last week, I complained that my first briefing with Coral8 wasn’t very technical. Wednesday I had a call with Mark Tsimelzon, CTO and founder of Coral8, and he made up for that in spades. In this post I’ll cover some of his general comments. Others will touch on more Coral8-specific topics, and his view of the Coral8/StreamBase comparison.
As Mark describes it, the big difference between a DBMS – even an in-memory DBMS – and a complex event processing engine is this: CEP engines do instantaneous incremental processing. He commonly refers to this as registering queries and operators for incremental evaluation. For example, suppose you need to maintain the sum of some data stream over the past 10 minutes. Then each second (or other short unit of time), the system adds in all the values that arrived in the past second, and subtracts all those that arrived 600-601 seconds ago. Voila! The sum is incrementally updated.
Now, rolling sums may not sound very interesting – but where you have rolling sums, you trivially also have rolling averages (just divide the sum by the count) and rolling standard deviations (same idea, with some squares and square roots mixed in). Those, of course, are primitives in Coral8 too. Ditto rolling maxima and minima. Ditto rolling joins (which are updated a lot like materialized views).
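The incremental approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Coral8's implementation: the `RollingStats` class is hypothetical, it expires old values whenever a new one arrives rather than on a once-per-second tick, and it computes the population standard deviation from running sums of values and squared values.

```python
import math
from collections import deque


class RollingStats:
    """Sum, mean, and standard deviation over a sliding time window,
    updated incrementally instead of being recomputed from scratch."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = deque()   # (timestamp, value), oldest first
        self.total = 0.0        # running sum of values in the window
        self.total_sq = 0.0     # running sum of squared values in the window

    def add(self, timestamp: float, value: float) -> None:
        """Add a new value, then subtract everything that fell out of the window."""
        self.events.append((timestamp, value))
        self.total += value
        self.total_sq += value * value
        cutoff = timestamp - self.window
        while self.events and self.events[0][0] <= cutoff:
            _, old = self.events.popleft()
            self.total -= old
            self.total_sq -= old * old

    @property
    def count(self) -> int:
        return len(self.events)

    @property
    def mean(self):
        # Rolling average: just divide the sum by the count.
        return self.total / self.count if self.events else None

    @property
    def stddev(self):
        # Population standard deviation from the two running sums.
        if not self.events:
            return None
        variance = self.total_sq / self.count - self.mean ** 2
        return math.sqrt(max(variance, 0.0))  # guard against rounding below zero
```

Each arrival costs work proportional to the number of values entering and leaving the window, not to the window's size. Over long runs the subtract-as-you-go sums can accumulate floating-point error, so a production system would likely need something more careful.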
Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Coral8, Memory-centric data management | 1 Comment »
August 3rd, 2007 Curt Monash
For the most part, the vendors I talk with in complex event/stream processing like and speak well of each other (most of the exceptions seem to involve StreamBase). Even so, there are a lot of interesting competitive claims and counterclaims in this market. Prior posts and comment threads have covered Apama/StreamBase jousting on the subjects of who has more business and how many financial data feeds StreamBase supports. Other areas that generate interesting sparks are performance, parallelism, and determinism. Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Coral8, Memory-centric data management, Progress, Apama, and DataDirect, StreamBase | 1 Comment »
August 3rd, 2007 Curt Monash
My recent non-technical Apama briefing has now had a much more technical sequel, with charming founder and former Cambridge professor John Bates. He still didn’t fully open the kimono – trade secrets and all that — but here’s the essence of what’s going on.
Complex event/stream processing (CEP) is all about looking for many patterns at once. Reality – the stream(s) of data – is checked against these patterns for matches. In Apama, these patterns are kept in a kind of tree – they call it a hypertree — and John says the work to check them is only logarithmic in the number of patterns.
Since patterns commonly have multiple parts — and usually also take time to unfold — what really goes on is that partial matches are found, after which what’s being matched against is the REMAINDER of the pattern. Thus, there’s constant pruning and rebalancing of the tree. What’s more, a large fraction of all patterns – at least in the financial trading market — involve a short time window, which again creates a need for ongoing, rapid tree modification. Read the rest of this entry »
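To make the partial-match idea concrete, here is a minimal Python sketch. It is emphatically not Apama's hypertree, whose details are not public here; it swaps in a plain hash index keyed by the next expected event type, and the `SequenceMatcher` class, the pattern format (a tuple of event types), and the single shared time window are all assumptions made up for the example.

```python
from collections import defaultdict


class SequenceMatcher:
    """Track many sequence patterns at once by keeping partial matches
    indexed by the event type each one is waiting for next."""

    def __init__(self, patterns, window):
        # patterns: dict mapping a pattern name to a tuple of event types
        self.patterns = patterns
        self.window = window
        # First event type -> names of patterns that start with it.
        self.starts = defaultdict(list)
        for name, steps in patterns.items():
            self.starts[steps[0]].append(name)
        # Next expected event type -> list of (name, position matched, start time).
        self.waiting = defaultdict(list)

    def feed(self, event_type, timestamp):
        """Process one event and return the names of patterns it completes."""
        completed = []
        # Only partial matches waiting on this event type are touched.
        candidates = self.waiting.pop(event_type, [])
        # The event may also begin fresh matches.
        candidates += [(name, 0, timestamp) for name in self.starts.get(event_type, [])]
        for name, position, started in candidates:
            if timestamp - started > self.window:
                continue  # the time window closed; drop this partial match
            steps = self.patterns[name]
            next_position = position + 1
            if next_position == len(steps):
                completed.append(name)
            else:
                # What remains to be matched is the REMAINDER of the pattern.
                self.waiting[steps[next_position]].append(
                    (name, next_position, started)
                )
        return completed
```

In this sketch expired partial matches are discarded lazily, only when their awaited event type finally shows up, so a real engine would also need active pruning of the kind the paragraph above describes.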
Posted in Complex event/stream processing (CEP), Memory-centric data management, Progress, Apama, and DataDirect | 2 Comments »