July 11, 2009

Groovy Corp

Groovy Corp sent over a press release and apparently suggested I write about the company’s wonderfulness immediately, without any kind of briefing. I don’t do that kind of thing.

However, a Twitter check revealed that Tony Bain is familiar with Groovy Corp and the Groovy SQL Switch (apparently they started out in Australia, where he lives and works, and he evidently knows the guys).  Tony’s take, in summary, is (emphasis mine):

There’s a little more detail at the above link.

July 9, 2009

Oracle cites Exadata wins

A couple of weeks ago, Oracle put out a press release about Exadata wins.  Highlights include:

Unless I missed it, none of the quotes implied Exadata was actually in production, and none compared hardware between the old/slow/production and Exadata/fast/test systems.

July 8, 2009

While I’m venting about benchmarks

Late last year, Vertica made hoo-hah about what it called a world-record data warehouse load speed benchmark.  I wrote at the time that this showed Vertica wasn’t painfully slow at loading, always a concern with column stores. But otherwise I mocked the idea that there was something useful to be learned from the whole exercise.

Well, guess what?  In a throwaway line in a comment on Daniel Abadi’s blog, Barry Zane of ParAccel pointed out

we posted a load rate of almost 9TB/hour, which is, of course record breaking on its own

Quite right.

I hope the nonsense stops there, but I’m not optimistic …

July 8, 2009

Progress in figuring out what ParAccel is doing

Barry Zane of ParAccel has — finally! — started a blog.  Barry’s first post, probably in connection with ParAccel’s recent TPC-H submission and subsequent brouhaha, consisted mainly of metaphor + very elementary and well-known arguments for column stores. Barry’s second post, however, was in direct response to Daniel Abadi’s speculation about ParAccel’s architecture.  That post also promises a follow-up addressing the TPC-H results in a more substantive way.

(Edit: As of October, 2010, those links have been redirected away from the original posts, which seem to have been taken down.)

Barry’s points include:

Also in the post, Barry:

Anyhow, it’s great to see ParAccel retreating from its obsessive secrecy, which in my opinion has been even worse than Netezza’s used to be.

July 8, 2009

Infobright metrics

Merv Adrian posted about Infobright, and included some company-supplied metrics. Most looked familiar from a post I did in April, but Infobright’s latest figure for # of paying customers seems to be “>60”, up from “>50”. Pricing aside, that’s Vertica/Greenplum territory — behind Netezza, Teradata, and the big OLTP DBMS vendors, but ahead of everybody else I think of as a modern analytic DBMS vendor.

July 7, 2009

Hasso Plattner calls for in-memory OLTP column stores

Former SAP CEO Hasso Plattner has written a paper called A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database, in association with a SIGMOD keynote address.* The approach Plattner advocates is an MPP in-memory column store, presumably somewhat akin to SAP’s frequently renamed Business Warehouse Accelerator/Business Intelligence Accelerator/BWA/BIA/Son-of-TREX technology. There also are strong similarities to the MPP in-memory row store project H-Store/VoltDB, although I don’t know whether Plattner would go so far as to adopt the H-Store view that all transactions should run in stored procedures. Unsurprisingly, SAP applications are used as the OLTP paradigm throughout.

*Thanks to Dave Kellogg for tipping me off to Plattner’s paper. I only went to two SIGMOD sessions, neither of which was Plattner’s. Nobody actually mentioned Plattner’s talk to me when I was down at SIGMOD.

Perhaps the most interesting part is Plattner’s claim that what’s demanding about OLTP isn’t database updating per se, but rather maintaining aggregates for quick-response analytics. In his main example of that point, Plattner cites a real-life schema of “more than 18” tables, of which 2 are base tables, and (most of?) the rest are materialized views that his proposed database architecture dispenses with (because analytic performance is sufficiently good without them). Thus, Plattner’s core columnar argument seemingly is

columnar –> natively fast analytics –> no need to maintain aggregates –> much lower update burden.

That said — if Plattner’s paper contained a clear statement of how much more expensive it is to insert or update a single row in a columnar vs. row-based system, I overlooked it. Instead, Plattner seems to be arguing that the volume of base-table updates is low enough that — whatever it may be — column-store update overhead is an acceptable price to pay.  (At one point he claims that only 5% of the data inserted in a financial application ever gets changed.) That may actually be true in a financial accounting system, but seems more questionable in a sufficiently large application that gets its updates from automatic devices, or from the consumer web.
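To make the aggregate-maintenance argument concrete, here is a minimal Python/SQLite sketch. It is not from Plattner’s paper, and the table and column names are hypothetical; it merely contrasts an insert path that must also keep a pre-computed summary table in sync with one that simply appends rows and computes the aggregate at query time.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Base table plus one pre-computed aggregate, standing in for the many
# materialized-view-like tables Plattner says dominate such schemas.
cur.execute("CREATE TABLE ledger (account TEXT, amount REAL)")
cur.execute("CREATE TABLE account_totals (account TEXT PRIMARY KEY, total REAL)")

def insert_maintaining_aggregate(account, amount):
    # Traditional path: every transaction also writes the summary table.
    cur.execute("INSERT INTO ledger VALUES (?, ?)", (account, amount))
    cur.execute("UPDATE account_totals SET total = total + ? WHERE account = ?",
                (amount, account))
    if cur.rowcount == 0:
        cur.execute("INSERT INTO account_totals VALUES (?, ?)", (account, amount))

def insert_plain(account, amount):
    # Plattner-style path: just append; there are no aggregates to keep in sync.
    cur.execute("INSERT INTO ledger VALUES (?, ?)", (account, amount))

def totals_on_the_fly():
    # The aggregate is computed at query time; the paper's claim is that a
    # columnar engine makes this fast enough to skip summary tables entirely.
    return cur.execute(
        "SELECT account, SUM(amount) FROM ledger GROUP BY account").fetchall()

insert_plain("A-100", 250.0)
insert_plain("A-100", -40.0)
print(totals_on_the_fly())   # [('A-100', 210.0)]

The only point of the sketch is that each pre-computed aggregate adds at least one extra write per transaction, which is exactly the burden Plattner argues a sufficiently fast column store lets you drop.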

Other highlights include:

July 7, 2009

Daniel Abadi has a theory about ParAccel

When I was at SIGMOD last week, ParAccel and its SIGMOD talk were mentioned several times, always in puzzled and at least slightly unflattering terms.  (Typical comment: “Why did they present a paper about that? We were doing the same thing in our company years ago.”) That doesn’t prove much per se, since most of the mentions were by competitors and/or Vertica-affiliated academics, and since my own unflattering ParAccel-related comments were rather fresh at the time.

But now Daniel Abadi has done a brilliant, detailed, speculative analysis of ParAccel’s publications.  Here’s the meat, emphasis mine:

July 6, 2009

Yahoo is up to 10 petabytes now?

According to somebody (I forget who) who attended Yahoo’s SIGMOD presentation last week, the big Yahoo database is now up to 10 petabytes in size, in line with Yahoo’s predictions last year.  Apparently, Yahoo also gave more details of how the technology works.

July 2, 2009

User data vs. raw disk space as a marketing metric

I tried to post a comment on Daniel Abadi’s blog, but doing so seems to require some sort of registration process, so I’m posting here instead.

In a comment to his post on node scalability, Daniel Abadi argued that disk space is a better metric to use in marketing than (presumably compressed) user data.  Well, I imagine he didn’t quite mean to say that, but that’s actually what he wound up saying, starting from the accurate observation that compression ratios vary wildly from one data set to another, even more than they vary from product to product on the same data.
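As a purely illustrative bit of arithmetic (the numbers below are hypothetical, and mirroring, temp space, and the like are ignored), the same raw-disk figure corresponds to wildly different amounts of user data depending on how compressible the data set happens to be:

# Hypothetical figures only: how much user data fits on a fixed amount of
# raw disk at different compression ratios (ignoring mirroring, temp space, etc.).
raw_disk_tb = 100
for compression_ratio in (2, 5, 10, 40):
    user_data_tb = raw_disk_tb * compression_ratio
    print(f"{raw_disk_tb} TB of raw disk at {compression_ratio}:1 compression "
          f"holds roughly {user_data_tb} TB of user data")

That spread in ratios is exactly the variability Abadi is pointing to.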

Nonetheless, I favor user data as a metric because:

July 2, 2009

The TPC-H schema

Would anybody recommend in real life running the TPC-H schema for that data? (I.e., fully normalized, no materialized views.) If so — why????
