January 18, 2008

The Great MapReduce Debate

Google’s highly parallel file manipulator MapReduce has gotten great attention recently, after a research paper revealed:

- MapReduce is running the core Google search engine, plus much of Google Analytics and other applications.
- MapReduce is processing 400+ petabytes of data per month.

(Niall Kennedy popularized the paper and surveyed its results.)

David DeWitt and Mike Stonebraker then launched a blistering attack on MapReduce, accusing it of disregarding almost all the lessons of database management system theory and practice. A vigorous comment thread has ensued, pointing out that MapReduce is not a DBMS and asserting it therefore shouldn’t be judged as one.

While correct, that defense raises the question: what is MapReduce good for? Proponents of MapReduce highlight two advantages:

- MapReduce makes it very easy to program data transformations, including ones to which relational structures are of little relevance.
- MapReduce runs in massively parallel mode “for free,” without extra programming.
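To make those two claims concrete, here is a minimal sketch of the programming model in Python. It is a toy, not Google's implementation: the `run_mapreduce` driver, the function names, and the use of a local thread pool as a stand-in for a cluster are all assumptions made for illustration. The point is that the programmer writes only the map and reduce functions, and the driver decides how to spread the work.

```python
from collections import defaultdict
from multiprocessing.dummy import Pool  # thread pool; stands in for a cluster


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(word, counts):
    """Reduce step: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer, workers=4):
    """Toy driver: parallel map, group by key (shuffle), then parallel reduce."""
    with Pool(workers) as pool:
        mapped = pool.map(mapper, records)
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)
        reduced = pool.starmap(reducer, groups.items())
    return dict(reduced)


lines = ["the quick brown fox", "the lazy dog", "The fox"]
word_counts = run_mapreduce(lines, map_words, reduce_counts)
```

Note that nothing in `map_words` or `reduce_counts` mentions threads, machines, or a schema; the input is just lines of text, which is the kind of non-relational transformation the first bullet has in mind.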

Based on those advantages, MapReduce would indeed seem to have significant uses, including: Read more

Categories: Cloud computing, MapReduce, Michael Stonebraker

10 Comments

January 18, 2008

A sane article from a strict relational advocate

Anybody who cites — with approval — both Fabian Pascal and Joe Celko can’t be all bad. “Why Programmers Don’t Like Relational Databases” is a bit polemic, but on the whole it’s a good reminder of why relational-bashing often is overdone.

Personally, I think the applications for which traditional schema-heavy relational/SQL programming makes sense are less interesting than those for which it doesn’t, but the world is indeed chock full of less interesting tasks.

Categories: Data models and architecture, Theory and architecture

Open source DBMS as a business model

Sun’s planned acquisition of MySQL is inspiring a lot of discussion about open source business models. Typical is Michael Arrington’s cheerleading for the idea that you can make a lot of money with open source. More interesting is Gordon Haff’s suggestion that it’s a lot easier to make money with open source when you have other things to actually sell to the same customers (e.g., the rest of Sun’s product line). (A similar view can be found here.)

To analyze this more carefully, it helps to distinguish among three different aspects of open source models:

- Open source product packaging
- Open source product development
- Open source pricing

Here’s what I think about each in the case of database management systems. Read more

Categories: MySQL, Open source

5 Comments

January 16, 2008

Fixing Twitter in three letters: CEP

There’s a lot of agitation today because Twitter broke under the message volume generated during Steve Jobs’ Macworld keynote. I don’t know what that volume was, but I just checked the lower, everyday volume of tweets (i.e., updates) going through the “public timeline” (i.e., everything) twice, and both times it was under 200 messages per minute. So, let’s say there’s a much higher volume at peak times, and also hypothesize that Twitter would like to grow a lot, and say that Twitter would like to handle 10,000-100,000 messages/minute (i.e., 1000+/second at the upper end) as soon as possible.

That’s easy using CEP (Complex Event Processing). A Twitter update is just a string of 140 or fewer characters. It is associated with three pieces of metadata – author, time, and mode of posting. It should be visible in real time to any of the author’s “followers,” as well as in a single public timeline; perhaps there will be other kinds of Twitter channels in the future. In most cases, these updates are only visible to a user upon page refresh. Almost no Twitter user seems to have more than about 7,000 followers, even Robert Scoble or Evan Williams.* The average number of followers, at least among active updaters, is probably in the low hundreds now. So basically, this is all a heckuva lot easier than the tick-monitoring systems Wall Street firms are using today.

*~~I believe there’s a hard cap of 7,500, but nobody seems to have bumped against it yet.~~ Twitterholic gives a different figure than Twitter does for Scoble. And it correctly shows Dave Troy with a little over 10,000.
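To get a feel for the scale, here is a rough Python sketch of the data model just described: an update of at most 140 characters with its three pieces of metadata, delivered to a public timeline and to each follower. This is not a CEP engine and not Twitter's actual design; the `Timelines` class, its method names, the in-memory queues, the example `mode` values, and the figure of 200 average followers are all assumptions for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

MAX_LENGTH = 140  # per the post: an update is 140 or fewer characters


@dataclass(frozen=True)
class Update:
    text: str
    author: str
    time: float  # posting time, e.g. seconds since the epoch
    mode: str    # mode of posting; "web" and "sms" are example values


class Timelines:
    """Hypothetical in-memory router: one public timeline plus one per follower."""

    def __init__(self, keep=1000):
        self.followers = defaultdict(set)  # author -> set of follower names
        self.public = deque(maxlen=keep)   # bounded, newest at the right
        self.inbox = defaultdict(lambda: deque(maxlen=keep))

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def publish(self, update):
        """Deliver one update; return the number of follower deliveries."""
        if len(update.text) > MAX_LENGTH:
            raise ValueError("update exceeds 140 characters")
        self.public.append(update)
        for follower in self.followers[update.author]:
            self.inbox[follower].append(update)
        return len(self.followers[update.author])


def deliveries_per_second(messages_per_minute, avg_followers):
    """Follower deliveries per second implied by a given message rate."""
    return messages_per_minute / 60 * avg_followers


timelines = Timelines()
timelines.follow("alice", "scoble")
timelines.follow("bob", "scoble")
delivered = timelines.publish(Update("Keynote starting", "scoble", 0.0, "web"))

# Assumed load: 60,000 messages/minute (1,000/second), 200 followers each.
load = deliveries_per_second(60_000, 200)
```

Under those assumed numbers the system would make about 200,000 small deliveries per second, which gives some sense of why the comparison to Wall Street tick monitoring is apt.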

Here’s how to implement that. Read more

Categories: Aleri and Coral8, Memory-centric data management, StreamBase, Streaming and complex event processing (CEP)

13 Comments

January 16, 2008

The blogosphere writes about Sun buying MySQL

More from me soon, but first here is a survey of what other people are saying about Sun’s billion-dollar deal to acquire MySQL:

- Jeremy Cole, evidently a very experienced high-end MySQL user, itemizes some serious problems with MySQL: optimizer, memory management, replication, and so on. (Uh, Jeremy, what part of the product do you like?) He also echoes a theme I’ve seen elsewhere, and to some extent noticed myself: MySQL has had a lot of management issues as a company.
- Jeffrey McManus calls out Sun’s promise to continue to support non-Java programming languages in MySQL. Kaj Arnö of MySQL makes the point emphatically, reciting a list of operating systems and development environments/languages MySQL will continue to support.
- Matt Asay quite reasonably interprets Sun’s move as a bid for overall leadership and development of the open source software platform industry. I would add that Sun CEO Jonathan Schwartz came up through the software side of the business. I would further add that Sun has a dismal track record with closed-source software acquisitions, including Forté, NetDynamics, and the enterprise side of Netscape.
- Matt also has selected quotes from the press conference, including Sun saying the coopetitionally obvious “Yeah, we’ll continue serious support for PostgreSQL and Oracle too.” Brian Aker also supports the PostgreSQL point.
- Zack Urlocker of MySQL implies that Jonathan Schwartz was very involved in the deal personally. That makes all kinds of sense.
- 451 Group has some interesting links, and don’t miss the short comment thread.
- The official MySQL and Sun company lines are summarized in this Zack Urlocker post on Infoworld (as well as some of the links above) and this post from Jonathan Schwartz of Sun.

Categories: MySQL, Open source, PostgreSQL

2 Comments

January 16, 2008

Things could get interesting for Infobright

Of the many new specialty data warehouse DBMS and appliances, Infobright’s BrightHouse is the only leading one based on MySQL. I expect Sun and Infobright to have some interesting conversations now. Conversely, I wouldn’t be optimistic about any partnering discussions Infobright might have with, say, HP.

The most directly competitive relationship Sun now has to any future Infobright partnership is with ParAccel.

Categories: Analytic technologies, Data warehousing, Infobright, MySQL, Open source, ParAccel

2 Comments

January 16, 2008

The other shoe finally drops for Oracle and BEA

As previously noted, I’ve been writing about an Oracle/BEA merger since 2002. So like many observers, I find I have little more to say on the subject. Let’s go straight to the bullet points: Read more

Categories: HP and Neoview, IBM and DB2, Oracle, Oracle TimesTen, SAP AG

2 Comments

January 14, 2008

Martin MC Brown likes Bento

Apple/FileMaker has a new low-end personal database product called Bento. It’s Mac-only and cheap. My former Computerworld blogging colleague Martin MC Brown likes it. That’s a solid recommendation.

Edit: Fixed the link.

Categories: FileMaker

Forrester collects business intelligence buzzwords

Forrester says “It’s time to reinvent your BI strategy.” No argument there. And they have an article, charts, and a white paper to back it up. A lot of the details are quite dubious, like the chart in which they declared that columnar RDBMS aren’t relational. Still, the article is worth surveying to see if you have any “I hadn’t thought of that!” moments.

I particularly like this diagram, which has 27 layers, containing approximately 2 1/2 BI-related buzzphrases each.

Categories: Analytic technologies, Business intelligence

3 Comments

January 14, 2008

LongJump is probably doing something interesting

According to VentureBeat, LongJump is offering a SaaS version of a “relational database architecture.” It’s also a “simple XML server.” And there are apps and workflow management. According to LongJump itself, there is “full search” and “wide palette of field types” and “multi-app mashup.” And since it’s a SaaS offering, the LongJump website also spends a whole page telling us how wonderful RackSpace is.

If VentureBeat got the “relational” part of the story wrong (perhaps out of confusion with LongJump’s parent company’s name, “Relationals”), then the rest of it kind of hangs together: XML, composite apps, and so on. Otherwise, well, relational access, XML, and search can certainly be combined in a single package, as per MarkLogic, Attivio, or for that matter Oracle, DB2, and Microsoft SQL Server. But all that and apps and app dev too seem a lot to bite off for a single self-funded startup.

Categories: Software as a Service (SaaS)

