February 18th, 2008 Curt Monash
Last week, Dan Weinreb tipped me off to something very cool: Mike Stonebraker and a group of MIT/Brown/Yale colleagues are calling for a complete rewrite of OLTP DBMS. And they have a plan for how to do it, called H-Store, as per a paper and an associated slide presentation.
Read the rest of this entry »
Posted in Database diversity, Database theory and practice, H-Store, Memory-centric data management, Michael Stonebraker, OLTP database management | 28 Comments »
February 16th, 2008 Curt Monash
In a response to my recent five-part series on DBMS diversity, Mike Stonebraker has proposed his own taxonomy of data management technologies over on Vertica’s Database Column blog.
- OLTP DBMSs focused on fast, reliable transaction processing
- Analytic/Data Warehouse DBMSs focused on efficient load and ad-hoc query performance
- Science DBMSs: after all, MATLAB does not scale to disk-sized arrays
- RDF stores focused on efficiently storing semi-structured data in this format
- XML stores focused on semi-structured data in this format
- Search engines — the big players all use proprietary engines in this area
- Stream Processing Engines focused on real-time StreamSQL
- “Lean and Mean,” less-than-a-database engines focused on doing a small number of things very well (embedded databases are probably in this category)
- MapReduce and Hadoop — after all Google has enough “throw weight” to define a category
He goes on to say that each will be architected differently, except that — as he already convinced me back in July — RDF will be well-managed by specialty data warehouse DBMS. Read the rest of this entry »
Posted in Data types, Database diversity, Database theory and practice, Michael Stonebraker, Mid-range DBMS, OLTP database management, RDF and graphs, Relational database management systems | No Comments »
February 8th, 2008 Curt Monash
Please do not rely on the parts of the post below that are about ParAccel. See our February 18 post about ParAccel instead.
I’ve already posted about yesterday’s chat with Mike Stonebraker regarding Vertica. I naturally raised the subject of load speed, unaware that Mike’s colleague Stan Zdonik had posted at length about load speed the day before. Given that post, it seems timely to go into a bit more detail, and in particular to address three questions:
- Can columnar DBMS do operational BI?
- Can columnar DBMS do ELT (Extract-Load-Transform, as opposed to ETL)?
- Are columnar DBMS’ load speeds a problem other than in issues #1 and #2?
Read the rest of this entry »
Posted in Analytics and analytic technologies, Business intelligence, Columnar architectures, Data warehousing, Database theory and practice, EII, ETL, and/or EAI, Michael Stonebraker, ParAccel, Sybase, Vertica Systems | No Comments »
February 7th, 2008 Curt Monash
While chatting with Mike Stonebraker today, I finally understood why he and Dave DeWitt launched the Great MapReduce Debate:
It was all about academia.
DeWitt noticed cases where study of MapReduce replaced study of real database management in the computer science curriculum. And he thought some MapReduce-related research papers were at best misleading. So DeWitt and Stonebraker decided to set the record straight.
Fireworks ensued.
Posted in Google, BigTable, and MapReduce, Michael Stonebraker | 5 Comments »
February 7th, 2008 Curt Monash
I chatted with Andy Ellicott and Mike Stonebraker of Vertica today. Some of the content is embargoed until February 19 (for TDWI), but here are some highlights of the rest.
- Vertica now is “approaching” 50 paid customers, up from 15 or so in early November. (Compared to most of Vertica’s fellow data warehouse specialists, that’s a lot.) Many — perhaps most — of these customers are hedge funds or telcos.
- Vertica’s typical lag from sale to deployment is about one quarter.
- Vertica’s typical initial selling price is $250K. Or maybe it’s $100-150K. The Vertica guys are generally pretty forthcoming, but pricing is an exception. Whatever they charge, it’s strictly per terabyte of user data. They think they are competitive with other software vendors, and cheaper, all-in, than appliance vendors.
- One subject on which they’re totally non-forthcoming (lawyers’ orders) is the recent patent lawsuit filed by Sybase. They wouldn’t even say whether they thought it was bogus because they didn’t infringe, or whether they thought it was bogus because the patent shouldn’t have been granted.
- Average Vertica database size is a little under 10 terabytes of user data, with many examples in the 15-20 TB range. Lots of customers plan to expand to 50-100 TB.
- Vertica claims sustainable load speeds of 3-5 megabytes/sec/node, irrespective of database size. Data is sucked into RAM uncompressed, then written out compressed, a gigabyte per node at a time. Those gigabyte chunks are then merged on disk at about 30 megabytes/second, which is superfast because the merge doesn’t involve sorting. Mike insists this doesn’t compromise compression.
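The load path described above resembles a classic sort-merge scheme: buffer rows in memory, write each full buffer out as a compressed chunk, and later combine chunks in one linear merge pass. The sketch below is a minimal in-memory illustration of that idea, not Vertica’s implementation. It assumes each chunk is sorted before it is written (which is what would let the later merge skip sorting), uses `zlib` and JSON purely as placeholder compression and serialization, and uses a tiny hypothetical `buffer_rows` limit in place of the gigabyte-per-node buffer.

```python
import heapq
import json
import zlib


def write_chunk(rows):
    """Sort a full in-memory buffer and return it as one compressed chunk.

    Hypothetical helper: a real system would write to disk; here the
    compressed bytes are simply returned. Sorting happens once, in RAM.
    """
    rows.sort()
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def read_chunk(blob):
    """Decompress a chunk back into its (already sorted) list of row tuples."""
    return [tuple(r) for r in json.loads(zlib.decompress(blob).decode("utf-8"))]


def load(stream, buffer_rows):
    """Buffer uncompressed rows in RAM; flush a compressed chunk when full."""
    chunks, buffer = [], []
    for row in stream:
        buffer.append(row)
        if len(buffer) == buffer_rows:
            chunks.append(write_chunk(buffer))
            buffer = []
    if buffer:
        chunks.append(write_chunk(buffer))
    return chunks


def merge_chunks(chunks):
    """Merge pre-sorted chunks in a single linear pass, with no re-sorting."""
    return list(heapq.merge(*(read_chunk(c) for c in chunks)))


if __name__ == "__main__":
    incoming = [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c"), (6, "f")]
    chunks = load(incoming, buffer_rows=2)
    print(len(chunks), merge_chunks(chunks))
```

As a rough sanity check on the quoted figures: at 3-5 megabytes/sec/node, filling a 1 GB buffer would take very roughly 200 to 340 seconds, so in this model each node would flush a chunk every few minutes.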
We also addressed the subject of Vertica’s schema assumptions, but I’ll leave that to another post.
Posted in Analytics and analytic technologies, Data warehousing, Michael Stonebraker, Relational database management systems, Sybase, Vertica Systems | 5 Comments »
January 18th, 2008 Curt Monash
Google’s highly parallel file manipulator MapReduce has gotten great attention recently, after a research paper revealed:
- MapReduce is running the core Google search engine, plus much of Google Analytics and other applications.
- MapReduce is processing 400+ petabytes of data per month.
(Niall Kennedy popularized the paper and surveyed its results.)
David DeWitt and Mike Stonebraker then launched a blistering attack on MapReduce, accusing it of disregarding almost all the lessons of database management system theory and practice. A vigorous comment thread has ensued, pointing out that MapReduce is not a DBMS and asserting it therefore shouldn’t be judged as one.
While correct, that defense raises the question: what is MapReduce good for? Proponents of MapReduce highlight two advantages:
- MapReduce makes it very easy to program data transformations, including ones to which relational structures are of little relevance.
- MapReduce runs in massively parallel mode “for free,” without extra programming.
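To make those two claimed advantages concrete, here is a minimal, single-process sketch of the MapReduce programming model. The programmer supplies only a map function and a reduce function; a framework handles grouping values by key. The driver `run_mapreduce` and the word-count functions are hypothetical illustrations, not Google’s or Hadoop’s API, and unlike a real framework this toy version runs sequentially on one machine.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def run_mapreduce(records: Iterable[str],
                  map_fn: Callable[[str], Iterator[Tuple[str, int]]],
                  reduce_fn: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Toy MapReduce driver (hypothetical; sequential, single process)."""
    groups = defaultdict(list)
    # Map phase: each record is processed independently, which is what lets
    # a real framework spread this loop across many machines.
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # "shuffle": group values by key
    # Reduce phase: each key's values are likewise reduced independently.
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


def word_map(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def word_reduce(word: str, counts: List[int]) -> int:
    """Sum the per-occurrence counts for one word."""
    return sum(counts)


if __name__ == "__main__":
    docs = ["the quick fox", "the lazy dog", "The end"]
    print(run_mapreduce(docs, word_map, word_reduce))
```

Note that nothing in `word_map` or `word_reduce` mentions parallelism; that separation is the sense in which parallel execution comes “for free” to the programmer.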
Based on those advantages, MapReduce would indeed seem to have significant uses, including: Read the rest of this entry »
Posted in Cloud computing, Google, BigTable, and MapReduce, Michael Stonebraker | 3 Comments »
September 18th, 2007 Curt Monash
Back in March, I suggested that compression was a central and compelling aspect of Vertica’s story. Well, in their new blog, the Vertica guys now strongly reinforce that impression.
I recommend those two Database Column posts (by Sam Madden) highly. I’ve rarely seen such a clear, detailed presentation of a company’s technical argument. My own thoughts on the subject boil down to:
- In principle, all the technology (and hence all the technological advantages) they’re talking about could be turned into features of one of the indexing options of a row-oriented RDBMS. But in practice, there’s no indication that this will happen any time soon.
- Release 1 of the Vertica product will surely have many rough edges.
- Some startups are surprisingly ignorant of the issues involved in building a successful, industrial-strength DBMS. But a company that has both Mike Stonebraker and Jerry Held seriously involved has a big advantage. They may make other kinds of errors, but they won’t make many ignorant ones.
Posted in Columnar architectures, Data warehousing, Database compression, Database theory and practice, Michael Stonebraker, Relational database management systems, Vertica Systems | 4 Comments »
September 6th, 2007 Curt Monash
In the first “meat” — i.e., other than housekeeping — post on the new Database Column blog, Mike Stonebraker makes three core claims:
1. Different DBMS should be used for different purposes. I am in violent agreement with that point, which is indeed a major theme of this blog.
2. Vertica’s software is 50X faster than anything non-columnar and 10X faster than anything columnar. Now, some of these stats surely come from the syndrome of comparing the future release of your product, as tuned by the world’s greatest experts on it who also hope to get rich on their stock options in your company, vs. some well-established production release of your competitors’ products, tuned to an unknown level of excellence,* with the whole thing running test queries that you, in your impartial wisdom, deem representative of user needs. Or something like that … Read the rest of this entry »
Posted in Columnar architectures, Data warehousing, Database diversity, Database theory and practice, Michael Stonebraker, OLTP database management, Relational database management systems, Specialized data management in general, TransRelational | 2 Comments »
June 18th, 2007 Curt Monash
Mike Stonebraker wrote in with one “nit pick” about yesterday’s blog. I had credited Truviso for strong DBMS/stream processor integration. He shot back that StreamBase has Sleepycat integrated in-process. He further pointed out that a Sleepycat record lookup takes only 5 microseconds if the data is in cache. Assuming what he means is that it’s in Sleepycat’s cache, that would be tight integration indeed.
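To put the 5-microsecond figure in perspective, here is a back-of-the-envelope conversion from per-lookup latency to a throughput ceiling. It assumes strictly serial lookups on a single thread doing nothing else, which is a simplification. The 500-microsecond comparison figure for an out-of-process round trip is a hypothetical number chosen only for contrast, not a measurement.

```python
def max_lookups_per_second(latency_us: float) -> float:
    """Upper bound on serial lookups per second for a given per-lookup latency."""
    return 1_000_000 / latency_us  # one million microseconds per second


if __name__ == "__main__":
    # 5 microseconds per cached, in-process Sleepycat lookup, as quoted above
    print(max_lookups_per_second(5))    # 200000.0
    # Hypothetical 500-microsecond round trip to a separate DBMS process
    print(max_lookups_per_second(500))  # 2000.0
```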
I wonder whether StreamBase will indefinitely rely on Sleepycat, which is of course now an Oracle product …
Posted in Complex event/stream processing (CEP), Memory-centric data management, Michael Stonebraker, Oracle, StreamBase | No Comments »
June 18th, 2007 Curt Monash
After my call with Truviso and blog post referencing same, I had the chance to discuss stream processing with Mike Stonebraker, who among his many distinctions is also StreamBase’s Founder/CTO. We focused almost exclusively on the financial trading market. Here are some of the highlights. Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Memory-centric data management, Michael Stonebraker, StreamBase, Truviso | No Comments »
March 24th, 2007 Curt Monash
The following is by Mike Stonebraker, CTO of Vertica Systems, copyright 2007, as part of our ongoing discussion of data compression. My comments are in a separate post.
Row Store Compression versus Column Store Compression
I. Introduction
There are three aspects of space requirements, which we discuss in this short note, namely:
- structural space requirements
- index space requirements
- attribute space requirements.
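As a rough, generic illustration of the attribute-space point (not taken from the note itself): when a column is stored on its own and sorted, repeated values sit next to each other and can be run-length encoded as (value, count) pairs. In a row store, the same values are interleaved with the other attributes of each row, so such runs are not contiguous. The sketch below uses made-up data and plain run-length encoding; it is not Vertica’s actual encoding scheme.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column as a list of (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


if __name__ == "__main__":
    # Hypothetical 10-row "state" column, stored sorted as a column store might
    states = sorted(["MA", "NY", "MA", "CA", "NY", "MA", "CA", "MA", "NY", "MA"])
    encoded = rle_encode(states)
    print(encoded)  # [('CA', 2), ('MA', 5), ('NY', 3)]
    assert rle_decode(encoded) == states
```

Sorting is what makes the runs long: the same values left unsorted would encode into many short runs and save little or nothing.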
Read the rest of this entry »
Posted in Data warehousing, Database compression, Database theory and practice, Michael Stonebraker, Vertica Systems | 3 Comments »
January 23rd, 2007 Curt Monash
Data warehouse appliance opponents like to argue that history is conclusively on their side. Database machine maker Britton-Lee, eventually bought by Teradata, fizzled. LISP machines were a spectacular failure. Rational Software’s origins as a special-purpose Ada machine maker had to be renounced before the company could succeed.
But the true story is more mixed. Teradata continues to this day as a major data warehouse technology player, and as far as I’m concerned Teradata indeed makes appliances. If we look beyond the application stack, we find that appliances actually occupy a large and growing share of the computing market. So a persuasive anti-appliance argument has to do more than just invoke the names of Britton-Lee and LISP machine maker Symbolics.
I just ran across an article by MIT professor Samuel Madden that attempts to make such a case. And his MIT colleague Mike Stonebraker made similar arguments to me a few days ago. They are not wholly unbiased; indeed, both are involved in Vertica Systems. With that caveat, they have an interesting three-part argument:
Read the rest of this entry »
Posted in Data warehouse appliances, Data warehousing, Michael Stonebraker, Relational database management systems | 1 Comment »