May 29, 2009

Sneakernet to the cloud

Recently, Amazon CTO Werner Vogels put up a blog post suggesting that, now and in the future, the best way to get large databases into the cloud is via sneakernet.  In some circumstances, he is surely right. Possible implications include:

But for one-time moves of data sets — sure, sneakernet/snail mail should work just fine.

May 27, 2009

Song of the contract programming firm, and other filks

I heard a different version of the same idea at Boskone once, but here is a pretty good send-up of what might occur at a customer review session.  (Warning, however: Low production values.) Also, in case you missed them, considerably funnier are a couple of classic Star Trek filksongs, especially the first.

While I’m on the subject, a couple of more serious filksongs I really like are:

Other great serious filksongs are “Queen of Air and Darkness” (Poul Anderson lyrics) and Jordin Kare’s “When the Ship Lifts, All Debts Are Paid”, but I can’t find recordings of those now.

And finally, back to the humor: I just found a video to a song I posted previously.

May 26, 2009

Teradata Developer Exchange (DevX) begins to emerge

Every vendor needs developer-facing web resources, and Teradata turns out to have been working on a new umbrella site for them.  It’s called Teradata Developer Exchange — DevX for short.  Teradata DevX seems to be in a low-volume beta now, with a press release/bigger roll-out coming next week or so.  Major elements are about what one would expect:

If you’re a Teradata user, you absolutely should check out Teradata DevX.  If you just research Teradata — my situation 🙂 — there are some aspects that might be of interest anyway.  In particular, I found Teradata’s downloads instructive, most particularly those in the area of extensibility.  Mainly, these are UDFs (User-Defined Functions), in areas such as:

Also of potential interest is a custom-portlet framework for Teradata’s management tool Viewpoint.  A straightforward use would be to plunk some Viewpoint data into a more general system management dashboard.  A yet cooler use — and I couldn’t get a clear sense of whether anybody’s ever done this yet — would be to offer end users some insight as to how long their queries are apt to run.

May 22, 2009

Yet more on MySQL forks and storage engines

The issue of MySQL forks and their possible effect on closed-source storage engine vendors continues to get attention.  The underlying question is:

Suppose Oracle wants to make life difficult for third-party storage engine vendors via its incipient control of MySQL?  Can the storage engine vendors insulate themselves from this risk by working with a MySQL fork?

Read more

May 21, 2009

How big are the intelligence agencies’ data warehouses?

Edit:  The relevant part of the article cited has now been substantially changed, in line with Jeff Jonas’ remarks in the comment thread below.

Joe Harris linked me to an article that made a rather extraordinary claim:

At another federal agency Jonas worked at (he wouldn’t say which), they had a very large data warehouse in the basement. The size of the data warehouse was a secret, but Jonas estimated it at 4 exabytes (EB), and increasing at the rate of 5 TB per day.

Now, if one does the division, the quote claims it takes 800,000 days for the database to double in size, which is absurd.   Perhaps this (Jeff) Jonas guy was just talking about a 4 petabyte system and got confused.  (Of course, that would still be pretty big.)  But before I got my arithmetic straight, I ran the 4 exabyte figure past a couple of folks, as a target for the size of the US government’s largest classified database. Best guess turns out to be that it’s 1-2 orders of magnitude too high for the government’s largest database, not 3.  But that’s only a guess …
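The division behind that 800,000-day figure can be sanity-checked in a few lines (a sketch, assuming decimal units, i.e. 1 EB = 1,000,000 TB):

```python
# Sanity-check the arithmetic behind the quoted 4 EB / 5 TB-per-day claim.
EB_IN_TB = 1_000_000            # 1 exabyte = 1,000,000 terabytes (decimal units)
warehouse_tb = 4 * EB_IN_TB     # claimed warehouse size: 4 EB
growth_tb_per_day = 5           # claimed growth rate: 5 TB/day

days_to_double = warehouse_tb / growth_tb_per_day
print(days_to_double)           # 800,000 days -- over 2,000 years to double
```

At 5 TB/day it would take on the order of two millennia for a 4 exabyte warehouse to double, which is why the petabyte reading seems far more plausible.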

May 21, 2009

Notes on CEP application development

While performance may not be all that great a source of CEP competitive differentiation, event processing vendors find plenty of other bases for technological competition, including application development, analytics, packaged applications, and data integration. In particular:

So far as I can tell, the areas of applications and analytics are fairly uncontroversial. Different CEP vendors have implemented different kinds of things, no doubt focusing on those they thought they would find easiest to build and then sell. But these seem to be choices in business execution, not in core technical philosophy.

In CEP application development, however, real philosophical differences do seem to arise. There are at least three different CEP application development paradigms: Read more

May 21, 2009

Notes on CEP performance

I’ve been talking to CEP vendors on and off for a few years. So what I hear about performance is fairly patchwork. On the other hand, maybe 1-2+ year-old figures of per-core performance are still meaningful today. After all, Moore’s Law is being reflected more in core count than per-core performance, and it seems CEP vendors’ development efforts haven’t necessarily been concentrated on raw engine speed.

So anyway, what do you guys have to add to the following observations?

May 18, 2009

Followup on IBM System S/InfoSphere Streams

After posting about IBM’s System S/InfoSphere Streams CEP offering, I sent three followup questions over to Jeff Jones.  It seems simplest to just post the Q&A verbatim.

1.  Just how many processors or cores does it take to get those 5 million messages/sec through? A little birdie says 4,000 cores. Read more

May 15, 2009

MySQL forking heats up, but not yet to the benefit of non-GPLed storage engine vendors

Last month, I wrote “This is a REALLY good time to actively strengthen the MySQL forkers,” largely on behalf of closed-source/dual-source MySQL storage engine vendors such as Infobright, Kickfire, Calpont, Tokutek, or ScaleDB. Yesterday, two of my three candidates to lead the effort — namely Monty Widenius/MariaDB/Monty Program AB and Percona — came together to form something called the Open Database Alliance.  Details may be found:

But there’s no joy for the non-GPLed MySQL storage engine vendors in the early news. Read more

May 14, 2009

Facebook’s experiences with compression

One little topic didn’t make it into my long post on Facebook’s Hadoop/Hive-based data warehouse: Compression. The story seems to be:
