Michael Stonebraker
Discussion of the views of database management pioneer Mike Stonebraker. Related subjects include:
- Vertica
- StreamBase
- H-Store
- (in Software Memories) Stonebraker’s accomplishments
Three big myths about MapReduce
Once again, I find myself writing and talking a lot about MapReduce. But I suspect that MapReduce-related conversations would go better if we overcame three fairly common MapReduce myths:
- MapReduce is something very new
- MapReduce involves strict adherence to the Map-Reduce programming paradigm
- MapReduce is a single technology
| Categories: Analytic technologies, Aster Data, Cloudera, Data warehousing, Google, Greenplum, Hadoop, Log analysis, MapReduce, Michael Stonebraker, Parallelization, Web analytics | 11 Comments |
Introduction to the XLDB and SciDB projects
Before I write anything else about the overlapping efforts known as XLDB and SciDB, I probably should explain and disambiguate what they are as best I can. XLDB was organized and still is run by guys who want to solve a scientific problem in eXtremely Large DataBase Management, most especially Jacek Becla of SLAC (the organization previously known as Stanford Linear Accelerator Center). Becla’s original motivation was that he needs a DBMS to manage what will be 55 petabytes of raw image data and 100 petabytes of astronomical data total for LSST (Large Synoptic Survey Telescope). Read more
| Categories: Data models and architecture, Database diversity, eBay, Michael Stonebraker, Open source, Petabyte-scale data management, Scientific research, Theory and architecture | 2 Comments |
NoSQL?
Eric Lai emailed today to ask what I thought about the NoSQL folks, and especially whether I thought their ideas were useful for enterprises in general, as opposed to just Web 2.0 companies. That was the first I heard of NoSQL, which seems to be a community discussing SQL alternatives popular among the cloud/big-web-company set, such as BigTable, Hadoop, Cassandra and so on. My short answers are:
- In most cases, no.
- Most of these technologies are designed for simple, high-volume OLTP (OnLine Transaction Processing.) Most large enterprises have an established way of doing OLTP, probably via relational database management systems. Why change?
- MapReduce is an exception, in that it’s designed for analytics. MapReduce may be useful for enterprises. But where it is, it probably should be integrated into an analytic DBMS.
- There’s one big countervailing factor to all these generalities — schema flexibility.
As for the longer form, let me start by noting that there are two main kinds of reason for not liking SQL. Read more
There always seems to be a fire drill around MapReduce news
Last August I flew out to see my new clients at Greenplum. They told me they planned to roll out MapReduce in a few weeks, and asked for my help in publicizing it. From their offices I went to dinner with non-clients Aster Data, who told me they’d gotten wind of a Greenplum MapReduce announcement and planned to come out ahead of it. A couple of hours later, Aster signed up as a client. In something of a pickle — but not one of my own making — I knocked heads, and persuaded both vendors to announce MapReduce at the same time, namely the following Monday. Lots of publicity ensued for both vendors, and everybody was reasonably satisfied.
| Categories: About this blog, Analytic technologies, Aster Data, Greenplum, MapReduce, Michael Stonebraker, Vertica Systems | Leave a Comment |
Stonebraker, DeWitt, et al. compare MapReduce to DBMS
Along with five other coauthors — the lead author seems to be Andy Pavlo — famous MapReduce non-fans Mike Stonebraker and David DeWitt have posted a SIGMOD 2009 paper called “A Comparison of Approaches to Large-Scale Data Analysis.” The heart of the paper is benchmarks of Hadoop, Vertica, and “DBMS-X” on identical clusters of 100 low-end nodes., across a series of tests including (if I understood correctly):
- A couple of different flavors of a Grep task originally proposed in a Google MapReduce paper.
- A database query on simulated clickstream data
- A join on the same clickstream data.
- Two aggregations on the clickstream data.
| Categories: Analytic technologies, Hadoop, MapReduce, Michael Stonebraker, Parallelization, Vertica Systems | 6 Comments |
New England Database Day this Friday January 30
Dan Weinreb, to whose opinions I usually give great weight, spoke very favorably of last year’s New England Database Day conference. Well, this year’s is taking place on Friday. It’s at MIT and it’s free, with easy registration. A list of papers is here.
It’s pretty obvious who’s running the show. Sam Madden’s name is given as a contact; elsewhere it’s referred to as being organized by Madden and Mike Stonebraker. Of the six identified papers, 2-3 look like the subjects or people could be taken straight from Vertica’s Database Column blog. But that hardly means the event will be one long Vertica commercial. For example, the other papers include one from Netezza and one on Flash memory data access methods.
I really doubt I’ll make to Cambridge in time for the 9:00 am opening remarks
, but I’ll try to swing by later on.
| Categories: Michael Stonebraker, Theory and architecture, Vertica Systems | 3 Comments |
Mike Stonebraker’s counterarguments to MapReduce’s popularity
In response to recent posting I’ve done about MapReduce, Mike Stonebraker just got on the phone to give me his views. His core claim, more or less, is that anything you can do in MapReduce you could already do in a parallel database that complies with SQL-92 and/or has PostgreSQL underpinnnings. In particular, Mike says: Read more
| Categories: Data warehousing, MapReduce, Michael Stonebraker, PostgreSQL | 5 Comments |
Microsoft is buying DATAllegro
I’ve long argued that:
- Oracle and Microsoft are doomed in the data warehouse market unless they acquire MPP/shared-nothing data warehouse DBMS and/or data warehouse appliances.
- DATAllegro is the ideal acquisition for either of them.
Microsoft has now validated my claim by agreeing to buy DATAllegro. As you probably know, we’ve been covering DATAllegro extensively, as per the links listed below.
Basic deal highlights include: Read more
Mike Stonebraker may be oversimplifying data warehousing just a tad
Mike Stonebraker has now responded to the second post in my five-part database diversity series. Takeaways and rejoinders include: Read more
| Categories: Analytic technologies, Columnar database management, Data warehousing, Database diversity, Michael Stonebraker, Theory and architecture, Vertica Systems | 2 Comments |
Mike Stonebraker calls for the complete destruction of the old DBMS order
Last week, Dan Weinreb tipped me off to something very cool: Mike Stonebraker and a group of MIT/Brown/Yale colleagues are calling for a complete rewrite of OLTP DBMS. And they have a plan for how to do it, called H-Store, as per a paper and an associated slide presentation.
