Michael Stonebraker

Discussion of the views of database management pioneer Mike Stonebraker. Related subjects include:

February 19, 2008

Mike Stonebraker may be oversimplifying data warehousing just a tad

Mike Stonebraker has now responded to the second post in my five-part database diversity series. Takeaways and rejoinders include: Read more

February 18, 2008

Mike Stonebraker calls for the complete destruction of the old DBMS order

Last week, Dan Weinreb tipped me off to something very cool: Mike Stonebraker and a group of MIT/Brown/Yale colleagues are calling for a complete rewrite of OLTP DBMS. And they have a plan for how to do it, called H-Store, as per a paper and an associated slide presentation.

Read more

February 16, 2008

Mike Stonebraker’s DBMS taxonomy

In a response to my recent five-part series on DBMS diversity, Mike Stonebraker has proposed his own taxonomy of data management technologies over on Vertica’s Database Column blog. (Edit: Some good stuff disappeared when Vertica nuked that blog.)

  1. OLTP DBMSs focused on fast, reliable transaction processing
  2. Analytic/Data Warehouse DBMSs focused on efficient load and ad-hoc query performance
  3. Science DBMSs — after all MatLab does not scale to disk-sized arrays
  4. RDF stores focused on efficiently storing semi-structured data in this format
  5. XML stores focused on semi-structured data in this format
  6. Search engines — the big players all use proprietary engines in this area
  7. Stream Processing Engines focused on real-time StreamSQL
  8. “Lean and Mean,” less-than-a-database engines focused on doing a small number of things very well (embedded databases are probably in this category)
  9. MapReduce and Hadoop — after all Google has enough “throw weight” to define a category

He goes on to say that each will be architected differently, except that — as he already convinced me back in July — RDF will be well-managed by specialty data warehouse DBMS. Read more

February 8, 2008

Load speeds and related issues in columnar DBMS

Please do not rely on the parts of the post below that are about ParAccel. See our February 18 post about ParAccel instead.

I’ve already posted about a chat I had with Mike Stonebraker regarding Vertica yesterday. I naturally raised the subject of load speed, unaware that Mike’s colleague Stan Zlodnik had posted at length about load speed the day before. Given that post, it seems timely to go into a bit more detail, and in particular to address three questions:

  1. Can columnar DBMS do operational BI?
  2. Can columnar DBMS do ELT (Extract-Load-Transform, as opposed to ETL)?
  3. Are columnar DBMS’ load speeds a problem other than in issues #1 and #2?

Read more

February 7, 2008

Why the Great MapReduce Debate broke out

While chatting with Mike Stonebraker today, I finally understood why he and Dave DeWitt launched the Great MapReduce Debate:

It was all about academia.

DeWitt noticed cases where study of MapReduce replaced study of real database management in the computer science curriculum. And he thought some MapReduce-related research papers were at best misleading. So DeWitt and Stonebraker decided to set the record straight.

Fireworks ensued.

February 7, 2008

Vertica update

I chatted with Andy Ellicott and Mike Stonebraker of Vertica today. Some of the content is embargoed until February 19 (for TDWI), but here are some highlights of the rest.

We also addressed the subject of Vertica’s schema assumptions, but I’ll leave that to another post.

January 18, 2008

The Great MapReduce Debate

Google’s highly parallel file manipulator MapReduce has gotten great attention recently, after a research paper revealed:

(Niall Kennedy popularized the paper and surveyed its results.)

David DeWitt and Mike Stonebraker then launched a blistering attack on MapReduce, accusing it of disregarding almost all the lessons of database management system theory and practice. A vigorous comment thread has ensued, pointing out that MapReduce is not a DBMS and asserting it therefore shouldn’t be judged as one.

While correct, that defense begs the question – what is MapReduce good for? Proponents of MapReduce highlight two advantages:

  1. MapReduce makes it very easy to program data transformations, including ones to which relational structures are of little relevance.
  2. MapReduce runs in massively parallel mode “for free,” without extra programming.

Based on those advantages, MapReduce would indeed seem to have significant uses, including: Read more

September 18, 2007

The core of the Vertica story still seems to be compression

Back in March, I suggested that compression was a central and compelling aspect of Vertica’s story. Well, in their new blog, the Vertica guys now strongly reinforce that impression.

I recommend those two Database Column posts (by Sam Madden) highly. I’ve rarely seen such a clear, detailed presentation of a company’s technical argument. My own thoughts on the subject boil down to:

September 6, 2007

Three bold assertions by Mike Stonebraker

In the first “meat” — i.e., other than housekeeping — post on the new Database Column blog, Mike Stonebraker makes three core claims:

1. Different DBMS should be used for different purposes. I am in violent agreement with that point, which is indeed a major theme of this blog.

2. Vertica’s software is 50X faster than anything non-columnar and 10X faster than anything columnar. Now, some of these stats surely come from the syndrome of comparing the future release of your product, as tuned by world’s greatest experts on it who also hope to get rich on their stock options in your company, vs. some well-established production release of your competitors’ products, tuned to an unknown level of excellence,* with the whole thing running test queries that you, in your impartial wisdom, deem representative of user needs. Or something like that … Read more

June 18, 2007

More on stream processing integration with disk-based DBMS

Mike Stonebraker wrote in with one “nit pick” about yesterday’s blog. I had credited Truviso for strong DBMS/stream processor integration. He shot back that StreamBase has Sleepycat integrated in-process. He further pointed out that a Sleepycat record lookup takes only 5 microseconds if the data is in cache. Assuming what he means is that it’s in Sleepycat’s cache, that would be tight integration indeed.

I wonder whether StreamBase will indefinitely rely on Sleepycat, which is of course now an Oracle product …

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.