January 18, 2008

The Great MapReduce Debate

Google’s highly parallel file manipulator MapReduce has gotten great attention recently, after a research paper revealed:

(Niall Kennedy popularized the paper and surveyed its results.)

David DeWitt and Mike Stonebraker then launched a blistering attack on MapReduce, accusing it of disregarding almost all the lessons of database management system theory and practice. A vigorous comment thread has ensued, pointing out that MapReduce is not a DBMS and asserting it therefore shouldn’t be judged as one.

While correct, that defense begs the question – what is MapReduce good for? Proponents of MapReduce highlight two advantages:

  1. MapReduce makes it very easy to program data transformations, including ones to which relational structures are of little relevance.
  2. MapReduce runs in massively parallel mode “for free,” without extra programming.
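To make those two claims concrete, here is a minimal single-machine sketch of the programming model, not Google's implementation. The names `map_phase`, `reduce_phase`, and `run_mapreduce` are hypothetical, and a thread pool stands in for a cluster. The programmer writes only the map and reduce functions; the "framework" decides how the work is spread across workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(word, counts):
    """Reduce step: combine all values emitted for a single key."""
    return word, sum(counts)


def run_mapreduce(lines, workers=4):
    """Toy framework: parallel map, group by key, then reduce."""
    # The caller never writes any parallel code; the pool handles it.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, lines))

    # Shuffle: collect intermediate values under their keys.
    grouped = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            grouped[word].append(count)

    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(run_mapreduce(["to be or not to be", "to do is to be"]))
```

Note that nothing here involves a schema or relational structure; the input is just lines of text, which is the kind of transformation the first point has in mind.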

Based on those advantages, MapReduce would indeed seem to have significant uses, including: Read more

January 18, 2008

A sane article from a strict relational advocate

Anybody who cites — with approval — both Fabian Pascal and Joe Celko can’t be all bad. “Why Programmers Don’t Like Relational Databases” is a bit polemic, but on the whole it’s a good reminder of why relational-bashing often is overdone.

Personally, I think the applications for which traditional schema-heavy relational/SQL programming makes sense are less interesting than those for which it doesn’t, but the world is indeed chock full of less interesting tasks.

January 16, 2008

Open source DBMS as a business model

Sun’s planned acquisition of MySQL is inspiring a lot of discussion about open source business models. Typical is Michael Arrington’s cheerleading for the idea that you can make a lot of money with open source. More interesting is Gordon Haff’s suggestion that it’s a lot easier to make money with open source when you have other things to actually sell to the same customers (e.g., the rest of Sun’s product line). (A similar view can be found here.)

To analyze this more carefully, it helps to distinguish among three different aspects of open source models:

Here’s what I think about each in the case of database management systems. Read more

January 16, 2008

Fixing Twitter in three letters: CEP

There’s a lot of agitation today because Twitter broke under the message volume generated during Steve Jobs’ Macworld keynote. I don’t know what that volume was, but I just checked the lower volume of tweets (i.e., updates) going through the “public timeline” (i.e., everything) twice, and both times it was under 200 messages per minute. So, let’s say there’s a much higher volume at peak times, and also hypothesize that Twitter would like to grow a lot, and say that Twitter would like to handle 10,000-100,000 messages/minute (i.e., up to 1000+/second) as soon as possible.

That’s easy using CEP (Complex Event Processing). A Twitter update is just a string of 140 or fewer characters. It is associated with three pieces of metadata – author, time, and mode of posting. It should be visible in real time to any of the author’s “followers,” as well as in a single public timeline; perhaps there will be other kinds of Twitter channels in the future. In most cases, these updates are only visible to a user upon page refresh. Almost no Twitter user seems to have more than about 7,000 followers, even Robert Scoble or Evan Williams.* The average number of followers, at least among active updaters, is probably in the low hundreds now. So basically, this is all a heckuva lot easier than the tick-monitoring systems Wall Street firms are using today.

*I believe there’s a hard cap of 7,500, but nobody seems to have bumped against it yet. Twitterholic gives a different figure than Twitter does for Scoble. And it correctly shows Dave Troy with a little over 10,000.
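As a rough illustration of the data model just described, the sketch below routes each update to the public timeline and to every follower of its author. It is a toy in-memory version, not a CEP engine and not Twitter's design; the class names, the `timeline_size` parameter, and the example mode value are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """One update: the text plus its three pieces of metadata."""
    author: str
    time: float
    mode: str  # mode of posting; the value "web" used below is an assumption
    text: str

    def __post_init__(self):
        if len(self.text) > 140:
            raise ValueError("update exceeds 140 characters")


class Router:
    """Fans each incoming update out to the public and follower timelines."""

    def __init__(self, timeline_size=20):
        self.followers = defaultdict(set)
        self.timelines = defaultdict(lambda: deque(maxlen=timeline_size))
        self.public = deque(maxlen=timeline_size)

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def publish(self, update):
        """Deliver one update; return the number of follower deliveries."""
        self.public.append(update)
        targets = self.followers[update.author]
        for follower in targets:
            self.timelines[follower].append(update)
        return len(targets)


def deliveries_per_second(messages_per_minute, avg_followers):
    """Back-of-envelope fan-out load implied by a given message rate."""
    return messages_per_minute / 60 * avg_followers


if __name__ == "__main__":
    router = Router()
    router.follow("bob", "alice")
    router.publish(Update("alice", 0.0, "web", "Watching the keynote"))
    print([u.text for u in router.timelines["bob"]])
    # 200 followers is an assumed stand-in for "low hundreds".
    print(deliveries_per_second(100_000, 200))
```

The helper at the end only turns the figures above into a delivery rate; the follower average it is fed is a guess, so treat the result as an order-of-magnitude estimate.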

Here’s how to implement that. Read more

January 16, 2008

The blogosphere writes about Sun buying MySQL

More from me soon, but first here is a survey of what other people are saying about Sun’s billion-dollar deal to acquire MySQL:

January 16, 2008

Things could get interesting for Infobright

Of the many new specialty data warehouse DBMS and appliances, Infobright’s BrightHouse is the only leading one based on MySQL. I expect Sun and Infobright to have some interesting conversations now. Conversely, I wouldn’t be optimistic about any partnering discussions Infobright might have with, say, HP.

The most directly competitive relationship Sun now has to any future Infobright partnership is with ParAccel.

January 16, 2008

The other shoe finally drops for Oracle and BEA

As previously noted, I’ve been writing about an Oracle/BEA merger since 2002. So like many observers, I find I have little more to say on the subject. Let’s go straight to the bullet points: Read more

January 14, 2008

Martin MC Brown likes Bento

Apple/FileMaker has a new low-end personal database product called Bento. It’s Mac-only and cheap. My former Computerworld blogging colleague Martin MC Brown likes it. That’s a solid recommendation.

Edit: Fixed the link.

January 14, 2008

Forrester collects business intelligence buzzwords

Forrester says “It’s time to reinvent your BI strategy.” No argument there. And they have an article, charts, and a white paper to back it up. A lot of the details are quite dubious, like the chart in which they declared that columnar RDBMS aren’t relational. Still, the article is worth surveying to see if you have any “I hadn’t thought of that!” moments.

I particularly like this diagram, which has 27 layers, containing approximately 2 1/2 BI-related buzzphrases each.

January 14, 2008

LongJump is probably doing something interesting

According to VentureBeat, LongJump is offering a SaaS version of a “relational database architecture.” It’s also a “simple XML server.” And there are apps and workflow management. According to LongJump itself, there is “full search” and “wide palette of field types” and “multi-app mashup.” And since it’s a SaaS offering, the LongJump website also spends a whole page telling us how wonderful RackSpace is.

If VentureBeat got the “relational” part of the story wrong (perhaps out of confusion with LongJump’s parent company’s name, “Relationals”), then the rest of it kind of hangs together: XML, composite apps, and so on. Otherwise, well, relational access, XML, and search can certainly be combined in a single package, as per MarkLogic, Attivio, or for that matter Oracle, DB2, and Microsoft SQL Server. But all that and apps and app dev too seem a lot to bite off for a single self-funded startup.
