May 30, 2009

Reinventing business intelligence

I’ve felt for quite a while that business intelligence tools are due for a revolution. But I’ve found the subject daunting to write about because — well, because it’s so multifaceted and big. So to break that logjam, here are some thoughts on the reinvention of business intelligence technology, with no pretense of being in any way comprehensive.

Natural language and classic science fiction

Actually, there’s a pretty well-known example of BI near-perfection — the Star Trek computers, usually voiced by the late Majel Barrett Roddenberry. They didn’t have a big role in the recent movie, which was so fast-paced nobody had time to analyze very much, but were a big part of the Star Trek universe overall. Star Trek’s computers integrated analytics, operations, and authentication, all with a great natural language/voice interface and visual displays. That example is at the heart of a 1998 article on natural language recognition I just re-posted.

As for reality: For decades, dating back at least to Artificial Intelligence Corporation’s Intellect, there have been offerings that provided “natural language” command, control, and query against otherwise fairly ordinary analytic tools. Such efforts have generally fizzled, for reasons outlined at the link above. Wolfram Alpha is the latest try; fortunately for its prospects, natural language is really only a small part of the Wolfram Alpha story.

A second theme has more recently emerged: using text indexing to get at data more flexibly than a relational schema would normally allow, either by searching on data values themselves (stressed by Attivio) or by searching on the definitions of pre-built reports (the Google OneBox story). SAP’s Explorer is the latest such effort, but I find Doug Henschen’s skepticism about SAP Explorer more persuasive than Cindi Howson’s cautiously favorable view. Partly that’s because I know SAP (and Business Objects); partly it’s because of difficulties such as those I already noted.

Flexibility and data exploration

It’s a truism that each generation of dashboard-like technology fails because it’s too inflexible. Users are shown the information that will provide them with the most insight. They appreciate it at first. But eventually it’s old hat, and when they want to do something new, the baked-in data model doesn’t support it.

The latest attempts to overcome this problem lie in two overlapping trends: cool data exploration/visualization tools, and in-memory analytics.

May 29, 2009

Sneakernet to the cloud

Recently, Amazon CTO Werner Vogels put up a blog post which suggested that, now and in the future, the best way to get large databases into the cloud is via sneakernet.  In some circumstances, he is surely right. Possible implications include:

But for one-time moves of data sets, sure, sneakernet/snail mail should work just fine.

May 27, 2009

Song of the contract programming firm, and other filks

I heard a different version of the same idea at Boskone once, but here is a pretty good send-up of what might occur at a customer review session.  (Warning, however: Low production values.) Also, in case you missed them, considerably funnier are a couple of classic Star Trek filksongs, especially the first.

While I’m on the subject, a couple of more serious filksongs I really like are:

Other great serious filksongs are “Queen of Air and Darkness” (Poul Anderson lyrics) and Jordin Kare’s “When the Ship Lifts, All Debts Are Paid”, but I can’t find recordings of those now.

And finally, back to the humor: I just found a video to a song I posted previously.

May 26, 2009

Teradata Developer Exchange (DevX) begins to emerge

Every vendor needs developer-facing web resources, and Teradata turns out to have been working on a new umbrella site for its own. It’s called Teradata Developer Exchange, or DevX for short. Teradata DevX seems to be in a low-volume beta now, with a press release/bigger roll-out coming next week or so. Major elements are about what one would expect:

If you’re a Teradata user, you absolutely should check out Teradata DevX.  If you just research Teradata — my situation 🙂 — there are some aspects that might be of interest anyway.  In particular, I found Teradata’s downloads instructive, most particularly those in the area of extensibility.  Mainly, these are UDFs (User-Defined Functions), in areas such as:

Also of potential interest is a custom-portlet framework for Teradata’s management tool Viewpoint.  A straightforward use would be to plunk some Viewpoint data into a more general system management dashboard.  A yet cooler use — and I couldn’t get a clear sense of whether anybody’s ever done this yet — would be to offer end users some insight as to how long their queries are apt to run.

May 22, 2009

Yet more on MySQL forks and storage engines

The issue of MySQL forks and their possible effect on closed-source storage engine vendors continues to get attention.  The underlying question is:

Suppose Oracle wants to make life difficult for third-party storage engine vendors via its incipient control of MySQL?  Can the storage engine vendors insulate themselves from this risk by working with a MySQL fork?


May 21, 2009

How big are the intelligence agencies’ data warehouses?

Edit:  The relevant part of the article cited has now been substantially changed, in line with Jeff Jonas’ remarks in the comment thread below.

Joe Harris linked me to an article that made a rather extraordinary claim:

At another federal agency Jonas worked at (he wouldn’t say which), they had a very large data warehouse in the basement. The size of the data warehouse was a secret, but Jonas estimated it at 4 exabytes (EB), and increasing at the rate of 5 TB per day.

Now, if one does the division, the quote claims it takes 800,000 days for the database to double in size, which is absurd.   Perhaps this (Jeff) Jonas guy was just talking about a 4 petabyte system and got confused.  (Of course, that would still be pretty big.)  But before I got my arithmetic straight, I ran the 4 exabyte figure past a couple of folks, as a target for the size of the US government’s largest classified database. Best guess turns out to be that it’s 1-2 orders of magnitude too high for the government’s largest database, not 3.  But that’s only a guess …
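For what it's worth, the division is easy to check. Here is a minimal Python sketch, assuming decimal units (1 TB = 10^12 bytes, and so on) and a constant 5 TB/day growth rate. The function name `days_to_double` is my own invention for illustration, not anything from the article.

```python
# Sanity check of the figures quoted above.
# Assumptions: decimal (SI) units and a constant daily growth rate.
TB = 10**12
PB = 10**15
EB = 10**18


def days_to_double(size_bytes, growth_bytes_per_day):
    """Days needed to add as many bytes as the database already holds."""
    return size_bytes / growth_bytes_per_day


quoted = days_to_double(4 * EB, 5 * TB)     # the 4 exabyte claim
plausible = days_to_double(4 * PB, 5 * TB)  # the 4 petabyte reading

print(f"4 EB at 5 TB/day: {quoted:,.0f} days (~{quoted / 365:,.0f} years)")
print(f"4 PB at 5 TB/day: {plausible:,.0f} days (~{plausible / 365:.1f} years)")
```

At 4 petabytes, the same growth rate doubles the database in 800 days, a little over two years, which is at least a believable number.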

May 21, 2009

Notes on CEP application development

While performance may not be all that great a source of CEP competitive differentiation, event processing vendors find plenty of other bases for technological competition, including application development, analytics, packaged applications, and data integration. In particular:

So far as I can tell, the areas of applications and analytics are fairly uncontroversial. Different CEP vendors have implemented different kinds of things, no doubt focusing on those they thought they would find easiest to build and then sell. But these seem to be choices in business execution, not in core technical philosophy.

In CEP application development, however, real philosophical differences do seem to arise. There are at least three different CEP application development paradigms:

May 21, 2009

Notes on CEP performance

I’ve been talking to CEP vendors on and off for a few years. So what I hear about performance is fairly patchwork. On the other hand, maybe 1-2+ year-old figures of per-core performance are still meaningful today. After all, Moore’s Law is being reflected more in core count than per-core performance, and it seems CEP vendors’ development efforts haven’t necessarily been concentrated on raw engine speed.

So anyway, what do you guys have to add to the following observations?

May 18, 2009

Followup on IBM System S/InfoSphere Streams

After posting about IBM’s System S/InfoSphere Streams CEP offering, I sent three followup questions over to Jeff Jones.  It seems simplest to just post the Q&A verbatim.

1.  Just how many processors or cores does it take to get those 5 million messages/sec through? A little birdie says 4,000 cores.

May 15, 2009

MySQL forking heats up, but not yet to the benefit of non-GPLed storage engine vendors

Last month, I wrote “This is a REALLY good time to actively strengthen the MySQL forkers,” largely on behalf of closed-source/dual-source MySQL storage engine vendors such as Infobright, Kickfire, Calpont, Tokutek, or ScaleDB. Yesterday, two of my three candidates to lead the effort — namely Monty Widenius/MariaDB/Monty Program AB and Percona — came together to form something called the Open Database Alliance.  Details may be found:

But there’s no joy for the non-GPLed MySQL storage engine vendors in the early news.
