July 7, 2009

Hasso Plattner calls for in-memory OLTP column stores

Former SAP CEO Hasso Plattner has written a paper called A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database, in association with a SIGMOD keynote address.* The approach Plattner advocates is an MPP in-memory column store, presumably somewhat akin to SAP’s frequently renamed Business Warehouse Accelerator/Business Intelligence Accelerator/BWA/BIA/Son-of-TREX technology. There also are strong similarities to the MPP in-memory row store project H-Store/VoltDB, although I don’t know whether Plattner would go so far as to adopt the H-Store view that all transactions should run in stored procedures. Unsurprisingly, SAP applications are used as the OLTP paradigm throughout.

*Thanks to Dave Kellogg for tipping me off to Plattner’s paper. I only went to two SIGMOD sessions, neither of which was Plattner’s. Nobody actually mentioned Plattner’s talk to me when I was down at SIGMOD.

Perhaps the most interesting part is Plattner’s claim that what’s demanding about OLTP isn’t database updating per se, but rather maintaining aggregates for quick-response analytics. In his main example of that point, Plattner proposes a real-life “more than 18” table schema, of which 2 are base tables, and (most of?) the rest are materialized views that his proposed database architecture dispenses with (because analytic performance is sufficiently good without them). Thus, Plattner’s core columnar argument seemingly is

columnar –> natively fast analytics –> no need to maintain aggregates –> much lower update burden.

That said — if Plattner’s paper contained a clear statement of how much more expensive it is to insert or update a single row in a columnar vs. row-based system, I overlooked it. Instead, Plattner seems to be arguing that the volume of base-table updates is low enough that — whatever it may be — column-store update overhead is an acceptable price to pay.  (At one point he claims that only 5% of the data inserted in a financial application ever gets changed.) That may actually be true in a financial accounting system, but seems more questionable in a sufficiently large application that gets its updates from automatic devices, or from the consumer web.

Other highlights include: Read more

July 7, 2009

Daniel Abadi has a theory about ParAccel

When I was at SIGMOD last week, ParAccel and its SIGMOD talk were mentioned several times, always in puzzled and at least slightly unflattering terms.  (Typical comment: “Why did they present a paper about that? We were doing the same thing in our company years ago.”) That doesn’t prove much per se, since most of the mentions were by competitors and/or Vertica-affiliated academics, and since my own unflattering ParAccel-related comments were rather fresh at the time.

But now Daniel Abadi has done a brilliant, detailed, speculative analysis of ParAccel’s publications.  Here’s the meat, emphasis mine: Read more

July 2, 2009

Notes on columnar/TPC-H compression

I was chatting with Omer Trajman of Vertica, and he said that a 70% compression figure for ParAccel’s recent TPC-H filing sounded about right.*  When I noted that seemed kind of low, Omer pointed out that TPC-H data is pseudo-random, while real-life data has much more correlation among the values in different columns. E.g., in retail, a customer is likely to consistently shop at the same stores and to put similar items into his shopping basket).

*Omer was involved in Vertica’s TPC-H-data-based load speed benchmark, and is Vertica’s representative to the TPC.

But why does this matter? After all, Vertica compresses one column at a time (unlike, say, Clearpace).  Well, the reason is that Vertica — like other column stores — wants to store different columns in the same row order, for obvious benefits in both reading and writing.  So, for example, if all the rows that include Gotham City are grouped sequentially, then all the rows mentioning Bruce Wayne are likely to be near each other as well, while none of the rows that mention Clark Kent will be mixed in.

And when a set of consecutive entries has low cardinality, it’s easier to get high levels of compression.

July 1, 2009

NoSQL?

Eric Lai emailed today to ask what I thought about the NoSQL folks, and especially whether I thought their ideas were useful for enterprises in general, as opposed to just Web 2.0 companies. That was the first I heard of NoSQL, which seems to be a community discussing SQL alternatives popular among the cloud/big-web-company set, such as BigTable, Hadoop, Cassandra and so on. My short answers are:

As for the longer form, let me start by noting that there are two main kinds of reason for not liking SQL. Read more

July 1, 2009

Correction to a recent quote

I’m quoted in a recent article around Aster’s appliance announcement as saying data warehouse appliances are more suitable for small workgroups of analysts crunching small amounts of data than they are for other uses.

But that’s not what I think at all.

I do think the ease-of-administration pitch for appliances makes them particularly well suited for users who want to scrape by without doing much database adminstration. This is especially appealing to departments or smaller enterprises. And the first/best scenario that comes to mind is indeed a small team of analysts, with good SQL skills but lightweight DBA experience, although Netezza has proved that many other kinds of users can find appliances appealing as well.

But that small team of analysts may maintain the largest database in the firm.

And by the way — notwithstanding the MySpace counterexample, most of Aster’s initial customers had <10 terabyte databases, and I think indeed <5 terabyte. The “frontline” pitch succeeded for Aster before (MySpace again aside) any better-big-data-crunching story did.

June 30, 2009

Is Expressor Software accomplishing anything?

Expressor Software is putting out a ton of press releases to the effect that it has signed up another reseller/systems integration partner or, in some cases, sponsored a webinar.  Less clear is whether Expressor is selling much of anything, delivering product people care about, and so on.  The one time I visited, Expressor told me that user interface was its strength, then showed me something very primitive and explained — as the famed joke* would have it — how good it was going to be.

*That would be the Thrice-Married Virgin, although I’ve recently seen versions in which the poor unfortunate was married 12 times. The last husband on the list is always a computer or software salesman, who keeps telling her how good it is going to be. I first heard the joke from Flip Filipowski. I decided it must not be too terribly sexist after hearing Sandy Kurtzig tell it to a group of stock analysts.

Am I missing anything major?

Edit: I emailed the company on May 8, asking what Expressor had in the way of customers. There has been no response.

June 29, 2009

Xtreme Data readies a different kind of FPGA-based data warehouse appliance

Xtreme Data called me to talk about its plans in the data warehouse appliance business, almost all details of which are currently embargoed. Still, a few points may be worth noting ahead of more precise information, namely:

So far as I can tell, Xtreme Data’s 1.0 product will — like most other 1.0 analytic database management products — be focused on price/performance, without little or no positive differentiation in the way of features.

June 29, 2009

Aster Data enters the appliance game

Aster Data is rolling out a line of nCluster appliances today.  Highlights include:

I don’t have a lot more to add right now, mainly because I wrote at some length about Aster’s non-appliance-specific, non-MapReduce technology and positioning a couple of weeks ago.

June 25, 2009

My current customer list among the analytic DBMS specialists

(This is an updated version of an August, 2008 post.)

One of my favorite pages on the Monash Research website is the list of many current and a few notable past customers. (Another favorite page is the one for testimonials.) For a variety of reasons, I won’t undertake to be more precise about my current customer list than that. But I don’t think it would hurt anything to list the analytic/data warehouse DBMS/appliance specialists in the group. They are:

All of those are Monash Advantage members.

If you care about all this, you may also be interested in the rest of my standards and disclosures.

June 23, 2009

ParAccel pricing

As I noted in connection with ParAccel’s recent TPC-H filing, I think the whole exercise is basically an expensive joke. But one slightly useful spin-off is that ParAccel disclosed pricing.  Specifically, ParAccel’s stated price in the disclosure document is:

Last year ParAccel quoted prices of $100,000/TB or $50,000/server.  The latter figure would seem to have led to lower numbers on the benchmark configuration, so perhaps it’s no longer an option on ParAccel’s price list.

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.