The Great MapReduce Debate
Google’s highly parallel file manipulator MapReduce has gotten great attention recently, after a research paper revealed:
- MapReduce is running the core Google search engine, plus much of Google Analytics and other applications.
- MapReduce is processing 400+ petabytes of data per month.
(Niall Kennedy popularized the paper and surveyed its results.)
David DeWitt and Mike Stonebraker then launched a blistering attack on MapReduce, accusing it of disregarding almost all the lessons of database management system theory and practice. A vigorous comment thread has ensued, pointing out that MapReduce is not a DBMS and asserting it therefore shouldn’t be judged as one.
While correct, that defense begs the question – what is MapReduce good for? Proponents of MapReduce highlight two advantages:
- MapReduce makes it very easy to program data transformations, including ones to which relational structures are of little relevance.
- MapReduce runs in massively parallel mode “for free,” without extra programming.
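To make those two claims concrete, here is a minimal single-machine sketch of the map/reduce programming pattern in Python. It is not Google's implementation or API. The names `map_reduce`, `map_words`, and `count_words` are hypothetical, and a local thread pool merely stands in for the cluster that would supply the "free" parallelism.

```python
from collections import defaultdict
from multiprocessing.dummy import Pool  # thread pool; an assumed stand-in for a cluster


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def map_reduce(records, mapper, reducer, workers=4):
    """Run mapper over records in parallel, group by key, then reduce each group."""
    with Pool(workers) as pool:
        mapped = pool.map(mapper, records)  # the caller writes no threading code

    # Shuffle step: collect all values emitted under the same key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce step: collapse each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def count_words(lines):
    """Word count, the customary illustration of the pattern."""
    return map_reduce(lines, map_words, lambda key, values: sum(values))


if __name__ == "__main__":
    print(count_words(["to be or not to be", "be quick"]))
```

The person writing the transformation supplies only the mapper and the reducer; partitioning, grouping, and parallel execution live in the framework function, and no relational schema is involved.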
Based on those advantages, MapReduce would indeed seem to have significant uses, including: Read more
Categories: Cloud computing, MapReduce, Michael Stonebraker | 10 Comments |
A sane article from a strict relational advocate
Anybody who cites — with approval — both Fabian Pascal and Joe Celko can’t be all bad. “Why Programmers Don’t Like Relational Databases” is a bit polemic, but on the whole it’s a good reminder of why relational-bashing often is overdone.
Personally, I think the applications for which traditional schema-heavy relational/SQL programming makes sense are less interesting than those for which it doesn’t – but the world is indeed chock full of less interesting tasks.
Open source DBMS as a business model
Sun’s planned acquisition of MySQL is inspiring a lot of discussion about open source business models. Typical is Michael Arrington’s cheerleading for the idea that you can make a lot of money with open source. More interesting is Gordon Haff’s suggestion that it’s a lot easier to make money with open source when you have other things to actually sell to the same customers (e.g., the rest of Sun’s product line). (A similar view can be found here.)
To analyze this more carefully, it helps to distinguish among three different aspects of open source models:
- Open source product packaging
- Open source product development
- Open source pricing
Here’s what I think about each in the case of database management systems. Read more
Categories: MySQL, Open source | 5 Comments |
Fixing Twitter in three letters: CEP
There’s a lot of agitation today because Twitter broke under the message volume generated during Steve Jobs’ Macworld keynote. I don’t know what that volume was, but I just checked the lower volume of tweets (i.e., updates) going through the “public timeline” (i.e., everything) twice, and both times it was under 200 messages per minute. So, let’s say there’s a much higher volume at peak times, and also hypothesize that Twitter would like to grow a lot, and say that Twitter would like to handle 10,000-100,000 messages/minute – i.e., 1,000+/second at the high end – as soon as possible.
That’s easy using CEP (Complex Event Processing). A Twitter update is just a string of 140 or fewer characters. It is associated with three pieces of metadata – author, time, and mode of posting. It should be visible in real time to any of the author’s “followers,” as well as in a single public timeline; perhaps there will be other kinds of Twitter channels in the future. In most cases, these updates are only visible to a user upon page refresh. Almost no Twitter user seems to have more than about 7,000 followers, even Robert Scoble or Evan Williams.* The average number of followers, at least among active updaters, is probably in the low hundreds now. So basically, this is all a heckuva lot easier than the tick-monitoring systems Wall Street firms are using today.
*I believe there’s a hard cap of 7,500, but nobody seems to have bumped against it yet. Twitterholic gives a different figure than Twitter does for Scoble. And it correctly shows Dave Troy with a little over 10,000.
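As a back-of-the-envelope check on those numbers, the short Python sketch below converts the per-minute rates to per-second rates and estimates how many follower deliveries a peak load would imply. The figure of 200 followers per update is an assumption standing in for "low hundreds", and the function names are hypothetical.

```python
def per_second(per_minute):
    """Convert a messages-per-minute rate to messages per second."""
    return per_minute / 60.0


def fanout_per_second(messages_per_minute, avg_followers):
    """Estimate follower deliveries per second if every update reaches avg_followers."""
    return per_second(messages_per_minute) * avg_followers


if __name__ == "__main__":
    ASSUMED_AVG_FOLLOWERS = 200  # assumption: "low hundreds" taken as 200

    # Observed public timeline: under 200 messages/minute, so this is an upper bound.
    print(f"observed: {per_second(200):.1f} msg/s")
    # Hypothesized target range of 10,000-100,000 messages/minute.
    print(f"target low: {per_second(10_000):.0f} msg/s")
    print(f"target high: {per_second(100_000):.0f} msg/s")
    # Deliveries implied at the top of the range under the follower assumption.
    print(f"fan-out: {fanout_per_second(100_000, ASSUMED_AVG_FOLLOWERS):,.0f} deliveries/s")
```

Under these assumptions, only the top of the target range exceeds 1,000 updates per second, and the implied delivery rate is a few hundred thousand per second.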
Here’s how to implement that. Read more
Categories: Aleri and Coral8, Memory-centric data management, StreamBase, Streaming and complex event processing (CEP) | 13 Comments |
The blogosphere writes about Sun buying MySQL
More from me soon, but first here is a survey of what other people are saying about Sun’s billion-dollar deal to acquire MySQL:
- Jeremy Cole, evidently a very experienced high-end MySQL user, itemizes some serious problems with MySQL – optimizer, memory management, replication, and so on. (Uh, Jeremy – what part of the product do you like?) He also echoes a theme I’ve seen elsewhere, and to some extent noticed myself: MySQL has had a lot of management issues as a company.
- Jeffrey McManus calls out Sun’s promise to continue to support non-Java programming languages in MySQL. Kaj Arnö of MySQL makes the point emphatically, reciting a list of operating systems and development environments/languages MySQL will continue to support.
- Matt Asay quite reasonably interprets Sun’s move as a bid for overall leadership and development of the open source software platform industry. I would add that Sun CEO Jonathan Schwartz came up through the software side of the business. I would further add that Sun has a dismal track record with closed-source software acquisitions, including Forte’, NetDynamics, and the enterprise side of Netscape.
- Matt also has selected quotes from the press conference, including Sun saying the coopetitionally obvious “Yeah, we’ll continue serious support for PostgreSQL and Oracle too.” Brian Aker also supports the PostgreSQL point.
- Zack Urlocker of MySQL implies that Jonathan Schwartz was very involved in the deal personally. That makes all kinds of sense.
- 451 Group has some interesting links, and don’t miss the short comment thread.
- The official MySQL and Sun company lines are summarized in this Zack Urlocker post on Infoworld (as well as some of the links above) and this post from Jonathan Schwartz of Sun.
Categories: MySQL, Open source, PostgreSQL | 2 Comments |
Things could get interesting for Infobright
Of the many new specialty data warehouse DBMS and appliances, Infobright’s BrightHouse is the only leading one based on MySQL. I expect Sun and Infobright to have some interesting conversations now. Conversely, I wouldn’t be optimistic about any partnering discussions Infobright might have with, say, HP.
The most directly competitive relationship Sun now has to any future Infobright partnership is with ParAccel.
Categories: Analytic technologies, Data warehousing, Infobright, MySQL, Open source, ParAccel | 2 Comments |
The other shoe finally drops for Oracle and BEA
As previously noted, I’ve been writing about an Oracle/BEA merger since 2002. So like many observers, I find I have little more to say on the subject. Let’s go straight to the bullet points: Read more
Categories: HP and Neoview, IBM and DB2, Oracle, Oracle TimesTen, SAP AG | 2 Comments |
Martin MC Brown likes Bento
Apple/FileMaker has a new low-end personal database product called Bento. It’s Mac-only and cheap. My former Computerworld blogging colleague Martin MC Brown likes it. That’s a solid recommendation.
Edit: Fixed the link.
Categories: FileMaker
Forrester collects business intelligence buzzwords
Forrester says “It’s time to reinvent your BI strategy.” No argument there. And they have an article, charts, and a white paper to back it up. A lot of the details are quite dubious, like the chart in which they declared that columnar RDBMS aren’t relational. Still, the article is worth surveying to see if you have any “I hadn’t thought of that!” moments.
I particularly like this diagram, which has 27 layers, containing approximately 2 1/2 BI-related buzzphrases each.
Categories: Analytic technologies, Business intelligence | 3 Comments |
LongJump is probably doing something interesting
According to VentureBeat, LongJump is offering a SaaS version of a “relational database architecture.” It’s also a “simple XML server.” And there are apps and workflow management. According to LongJump itself, there is “full search” and “wide palette of field types” and “multi-app mashup.” And since it’s a SaaS offering, the LongJump website also spends a whole page telling us how wonderful RackSpace is.
If VentureBeat got the “relational” part of the story wrong – perhaps out of confusion with LongJump’s parent company’s name “Relationals” – then the rest of it kind of hangs together: XML, composite apps, and so on. Otherwise – well, relational access, XML, and search can certainly be combined in a single package, as per MarkLogic, Attivio, or for that matter Oracle, DB2, and Microsoft SQL Server. But all that and apps and app dev too seem a lot to bite off for a single self-funded startup.
Categories: Software as a Service (SaaS)