Teradata Developer Exchange (DevX) begins to emerge
Every vendor needs developer-facing web resources, and Teradata turns out to have been working on a new umbrella site for them. It’s called Teradata Developer Exchange — DevX for short. Teradata DevX seems to be in a low-volume beta now, with a press release/bigger roll-out coming in the next week or so. Major elements are about what one would expect:
- Articles
- Blogs
- Downloads
- Surprisingly, so far as I can tell, no forums
If you’re a Teradata user, you absolutely should check out Teradata DevX. If you just research Teradata — my situation 🙂 — there are some aspects that might be of interest anyway. In particular, I found Teradata’s downloads instructive, most particularly those in the area of extensibility. Mainly, these are UDFs (User-Defined Functions), in areas such as:
- Compression
- Geospatial data
- Imitating Oracle or DB2 UDFs (as migration aids; see the sketch after this list)
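I haven’t examined these UDFs myself, but to illustrate what an Oracle-emulation UDF has to reproduce, here’s the logic of Oracle’s DECODE function sketched in Python. Teradata UDFs are actually written in C, and everything below is my illustration, not Teradata’s code:

```python
# Toy sketch of the logic an Oracle-emulation UDF must reproduce.
# Oracle's DECODE(expr, search1, result1, ..., default) compares expr
# to each search value in turn and returns the matching result.
# (Illustration only; Teradata UDFs are actually written in C.)

def decode(expr, *args):
    """Mimic Oracle's DECODE: (search, result) pairs plus an optional default."""
    if len(args) % 2 == 1:              # odd argument count => trailing default
        pairs, default = args[:-1], args[-1]
    else:
        pairs, default = args, None
    for i in range(0, len(pairs), 2):
        search, result = pairs[i], pairs[i + 1]
        # Oracle's DECODE treats two NULLs as equal, unlike the SQL '=' operator.
        if expr == search or (expr is None and search is None):
            return result
    return default

print(decode("CA", "CA", "California", "NY", "New York", "Unknown"))  # California
```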
Also of potential interest is a custom-portlet framework for Teradata’s management tool Viewpoint. A straightforward use would be to plunk some Viewpoint data into a more general system management dashboard. A yet cooler use — and I couldn’t get a clear sense of whether anybody’s ever done this yet — would be to offer end users some insight as to how long their queries are apt to run.
Categories: Database compression, Emulation, transparency, portability, GIS and geospatial, Teradata | 2 Comments |
Yet more on MySQL forks and storage engines
The issue of MySQL forks and their possible effect on closed-source storage engine vendors continues to get attention. The underlying question is:
Suppose Oracle wants to make life difficult for third-party storage engine vendors via its incipient control of MySQL. Can the storage engine vendors insulate themselves from this risk by working with a MySQL fork?
Categories: MySQL, Open source, PostgreSQL | 11 Comments |
Notes on CEP application development
While performance may not be all that great a source of CEP competitive differentiation, event processing vendors find plenty of other bases for technological competition, including application development, analytics, packaged applications, and data integration. In particular:
- Most independent CEP vendors have some kind of application story in the capital markets vertical, such as packaged applications, ISV partners with packaged applications, application frameworks, and so on.
- CEP vendors offer lots of connectors to specific financial industry price/quote/trade feeds, as well as the usual other kinds of database connectivity (SQL, XML, etc.).
- Aleri/Coral8 (separately and now together) like to call attention to their business intelligence/analytics offerings. Analytics is front-and-center on Truviso’s web site too, not that Truviso does much to call attention to itself, period. (Roman Bukary once said he’d outline Truviso’s new strategy to me in 6-8 weeks or so … it’s now 14 months and counting.)
So far as I can tell, the areas of applications and analytics are fairly uncontroversial. Different CEP vendors have implemented different kinds of things, no doubt focusing on those they thought they would find easiest to build and then sell. But these seem to be choices in business execution, not in core technical philosophy.
In CEP application development, however, real philosophical differences do seem to arise. There are at least three different CEP application development paradigms: Read more
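Whatever the paradigm, the computational core of most CEP applications is a continuous query over a stream. As a frame of reference, here’s a minimal Python sketch of a sliding-window aggregate, the kind of logic a streaming-SQL statement or an event-processing language would express declaratively. All names here are made up for illustration; no vendor’s actual API is shown:

```python
# Minimal sketch of a continuous sliding-window query, the kind of
# computation CEP languages express declaratively. Illustration only;
# this is not any vendor's actual API.
from collections import deque

class SlidingAverage:
    """Maintain the average price per symbol over the last N events."""
    def __init__(self, window_size=100):
        self.window_size = window_size
        self.windows = {}  # symbol -> deque of recent prices

    def on_event(self, symbol, price):
        window = self.windows.setdefault(symbol, deque(maxlen=self.window_size))
        window.append(price)             # oldest price falls off automatically
        return sum(window) / len(window) # emit the updated aggregate

avg = SlidingAverage(window_size=3)
for sym, px in [("IBM", 101.0), ("IBM", 103.0), ("IBM", 105.0), ("IBM", 107.0)]:
    print(sym, avg.on_event(sym, px))    # final line averages 103, 105, 107
```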
Categories: Aleri and Coral8, Business intelligence, Microsoft and SQL*Server, Progress, Apama, and DataDirect, StreamBase, Streaming and complex event processing (CEP) | 5 Comments |
Notes on CEP performance
I’ve been talking to CEP vendors on and off for a few years. So what I hear about performance is fairly patchwork. On the other hand, even per-core performance figures that are 1-2+ years old may still be meaningful today. After all, Moore’s Law is being reflected more in core count than in per-core performance, and it seems CEP vendors’ development efforts haven’t necessarily been concentrated on raw engine speed.
So anyway, what do you guys have to add to the following observations?
- Super-low-latency financial services industry tasks are often “embarrassingly parallel.” Thus, near-linear scale-out is common.
- That said, good parallelism seems fairly new in CEP engines (of course, CEP engines are fairly new themselves — for all I know, some have been parallel since inception).
- I’ve heard claims of up to 400,000 messages/second/core for simple queries or patterns. (The sketch after this list shows what such a per-core figure actually measures.)
- I’ve heard claims of 70,000 messages/second/core for not-so-simple queries or patterns, and probably higher than that depending on what the meaning of “simple” is.
- IBM just disclosed >15,000 messages/second/core on a pretty low-powered processor.
- I’ve heard that Coral8, Apama, and StreamBase rarely lost deals due to performance or throughput problems. I’ve heard that the same is not as true of Aleri.
- StreamBase proudly says it’s been fully multithreaded since academic research-project days. For Apama multithreading is evidently a more recent feature. But does it matter much?
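To make those throughput claims concrete, here’s a toy single-core harness in Python: generate synthetic messages, run a trivial filter over them, and divide the message count by elapsed time. It’s purely illustrative; real CEP benchmarks involve network I/O, message parsing, and far more complex queries than this:

```python
# Toy single-core throughput harness, to make "messages/second/core"
# concrete. Purely illustrative; real benchmarks include I/O, parsing,
# and far more complex queries than this trivial filter.
import time

def run_benchmark(n_messages=1_000_000, threshold=0.999):
    # Synthetic "prices" standing in for a feed of market-data messages.
    events = [(i % 1000) / 1000.0 for i in range(n_messages)]
    matches = 0
    start = time.perf_counter()
    for price in events:
        if price > threshold:   # the "query": one trivial filter predicate
            matches += 1
        # A real engine would evaluate windows, joins, and patterns here.
    elapsed = time.perf_counter() - start
    print(f"{n_messages / elapsed:,.0f} messages/second on one core "
          f"({matches} matches)")

run_benchmark()
```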
Categories: Aleri and Coral8, IBM and DB2, Memory-centric data management, Progress, Apama, and DataDirect, StreamBase, Streaming and complex event processing (CEP) | 13 Comments |
Followup on IBM System S/InfoSphere Streams
After posting about IBM’s System S/InfoSphere Streams CEP offering, I sent three followup questions over to Jeff Jones. It seems simplest to just post the Q&A verbatim.
1. Just how many processors or cores does it take to get those 5 million messages/sec through? A little birdie says 4,000 cores, which would work out to 1,250 messages/second/core. Read more
Categories: Analytic technologies, IBM and DB2, Investment research and trading, Streaming and complex event processing (CEP) | 7 Comments |
MySQL forking heats up, but not yet to the benefit of non-GPLed storage engine vendors
Last month, I wrote “This is a REALLY good time to actively strengthen the MySQL forkers,” largely on behalf of closed-source/dual-source MySQL storage engine vendors such as Infobright, Kickfire, Calpont, Tokutek, or ScaleDB. Yesterday, two of my three candidates to lead the effort — namely Monty Widenius/MariaDB/Monty Program AB and Percona — came together to form something called the Open Database Alliance. Details may be found:
- On the Open Database Alliance website
- In a press release
- On Monty Widenius’ blog
- In a Stephen O’Grady blog post based on a discussion with Monty Widenius
- In an ars technica blog post based on a discussion with Monty Program AB’s Kurt von Finck
But there’s no joy for the non-GPLed MySQL storage engine vendors in the early news. Read more
Categories: MySQL, Open source, Theory and architecture | 16 Comments |
Facebook’s experiences with compression
One little topic didn’t make it into my long post on Facebook’s Hadoop/Hive-based data warehouse: Compression. The story seems to be:
- Facebook uses gzip, and gets a little bit more than 6X compression.
- Experiments suggest bzip2 would reduce data by another 20% or so, increasing compression to the 7.5X range (a toy gzip-vs-bzip2 comparison follows this list).
- The downside of bzip2 is 15-25% processing overhead, depending on the kind of data.
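The arithmetic checks out: shrinking the compressed output by a further 20% turns 6X into 6 / 0.8 = 7.5X. The tradeoff itself is easy to demonstrate in miniature; here’s a toy Python comparison of gzip and bzip2 on synthetic log-like data. Compression ratios depend heavily on the data, so this illustrates the method, not Facebook’s numbers:

```python
# Toy gzip-vs-bzip2 comparison on synthetic, repetitive, log-like data.
# Ratios depend heavily on the data; this shows the tradeoff in miniature,
# not Facebook's actual numbers.
import bz2, gzip, time

# Vaguely web-log-shaped synthetic data.
raw = b"".join(
    b"2009-05-11 12:%02d:%02d GET /profile?id=%d 200\n"
    % (i // 60 % 60, i % 60, i % 5000)
    for i in range(200_000)
)

for name, compress in [("gzip", gzip.compress), ("bzip2", bz2.compress)]:
    start = time.perf_counter()
    packed = compress(raw)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(raw) / len(packed):.1f}X compression in {elapsed:.2f}s")
```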
Categories: Data warehousing, Database compression, Facebook, Hadoop | 2 Comments |
The secret sauce to Clearpace’s compression
In an introduction to archiving vendor Clearpace last December, I noted that Clearpace claimed huge compression successes for its NParchive product (Clearpace likes to use a figure of 40X), but didn’t give much reason to believe NParchive could compress data a lot more effectively than other columnar DBMS do. Let me now follow up on that.
To the extent there’s a Clearpace secret sauce, it seems to lie in NParchive’s unusual data access method. NParchive doesn’t just tokenize the values in individual columns; it tokenizes multi-column fragments of rows. Which particular columns to group together in that way seems to be decided automagically; the obvious guess is that this is based on estimates of the cardinality of their Cartesian products.
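To make that guess concrete, here’s a little Python sketch of the idea: dictionary-encode column groups rather than single columns, picking the group whose values are most correlated (i.e., whose joint cardinality is lowest relative to the product of the per-column cardinalities). This is my speculation about the general approach, not Clearpace’s actual implementation:

```python
# Speculative sketch of multi-column tokenization: dictionary-encode
# correlated column *groups*, not just single columns. My guess at the
# general idea, not Clearpace's actual implementation.
from itertools import combinations

rows = [
    ("SMTP", "relay01", "US", "alice"),
    ("SMTP", "relay01", "US", "bob"),
    ("SMTP", "relay01", "US", "carol"),
    ("HTTP", "web02",   "UK", "alice"),
    ("HTTP", "web02",   "UK", "dave"),
]
n_cols = len(rows[0])

def cardinality(cols):
    """Observed cardinality of a column group's combined values."""
    return len({tuple(row[c] for c in cols) for row in rows})

def correlation_score(pair):
    """Joint cardinality relative to the independent product; lower is better."""
    return cardinality(pair) / (cardinality((pair[0],)) * cardinality((pair[1],)))

# Pick the most correlated pair of columns to tokenize together.
best = min(combinations(range(n_cols), 2), key=correlation_score)

dictionary = {}   # multi-column fragment -> small integer token
encoded = []
for row in rows:
    fragment = tuple(row[c] for c in best)
    token = dictionary.setdefault(fragment, len(dictionary))
    rest = tuple(row[c] for c in range(n_cols) if c not in best)
    encoded.append((token, rest))

print("grouped columns:", best)    # (0, 1): protocol and server move together
print("dictionary:", dictionary)   # two fragments cover all five rows
print("encoded rows:", encoded)
```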
Off the top of my head, examples for which this strategy might be particularly successful include:
- Denormalized databases
- Message stores with lots of header information
- Addresses
Categories: Archiving and information preservation, Columnar database management, Database compression, Rainstor | 8 Comments |
Microsoft announced CEP this week too
Microsoft still hasn’t worked out all the kinks regarding when and how intensely to brief me. So most of what I know about their announcement earlier this week of a CEP/stream processing product* is what I garnered on a consulting call in March. That said, I sent Microsoft my notes from that call, they responded quickly and clearly to my question as to what remained under NDA, and for good measure they included a couple of clarifying comments that I’ll copy below.
*”in the SQL Server 2008 R2 timeframe,” about which Microsoft wrote “the first Community Technology Preview (CTP) of SQL Server 2008 R2 will be available for download in the second half of 2009 and the release is on track to ship in the first half of calendar year 2010.”
Perhaps it is more than coincidence that IBM rushed out its own announcement of an immature CEP technology — due to be more mature in a 2010 release — immediately after Microsoft revealed its plans. Anyhow, taken together, these announcements support my theory that the small independent CEP/stream processing vendors are more or less ceding broad parts of the potential stream processing market.
The main use cases Microsoft talks about for CEP are in the area of sensor data. Read more
Categories: Analytic technologies, Application areas, Microsoft and SQL*Server, Streaming and complex event processing (CEP) | 8 Comments |
IBM System S Streams, aka InfoSphere Streams, aka stream processing, aka “please don’t call it CEP”
IBM has hastily announced System S Streams, a product that was supposed to be called InfoSphere Streams and introduced only in 2010. Apparently, the rush is because senior management wanted to talk about it later this week, and perhaps also because it was implicitly baked into some of IBM’s advertising already. Scrambling ensued. Even so, Jeff Jones and team got to me fast, and briefed me — fairly non-technically, unfortunately, but otherwise how I like it, namely on a harmless embargo* and without any NDAs. That’s more than can be said for my clients at Microsoft, who also introduced CEP this week, but I digress …
*Indeed, as I draft this post-Celtics-game, the embargo is already expired.
Marketing aside, IBM System S/InfoSphere Streams is indeed a CEP/stream processing engine + language (with an Eclipse-based development environment). Apparently, IBM thinks InfoSphere Streams (if that’s what it winds up being renamed to) is or will be differentiated from other CEP packages in:
- Scale-out. (That’s the one that appears to be real today. In fact, there’s a prototype running on Blue Gene.)
- Support for complex datatypes such as XML, text, voice, video, etc.
- Security and general industrial-strengthness.