Theory and architecture

Analysis of design choices in databases and database management systems.

January 22, 2007

Are row-oriented RDBMS obsolete?

If Mike Stonebraker is to be believed, the era of columnar data stores is upon us.

Whether or not you buy completely into Mike’s claims, there certainly are cool ideas in his latest columnar offering, from startup Vertica Systems. The Vertica corporate site offers little detail, but Mike tells me that the product’s architecture closely resembles that of C-Store, which is described in a November 2005 paper.

The core ideas behind Vertica’s product are as follows.

January 22, 2007

Mike Stonebraker Blasts “One Size Fits All”

When it comes to DBMS inventors, Mike Stonebraker is the next closest thing to Codd. And he’s become a huge non-believer in the idea that one DBMS architecture meets all needs.

Frankly, there isn’t much in his “One Size Fits All” paper that hasn’t already been said in this blog, except for the part specifically relevant to one of his startups, StreamBase. Still, it’s nice to have such high-powered agreement.

More recently, the argument in that paper has been extended with a benchmark-filled follow-up based on another Stonebraker startup, Vertica.

November 11, 2006

Federation in the MySQL empire

Marten Mickos, CEO of MySQL, gave a recent speech speculating about a big federated “database in the sky,” providing all sorts of Web 2.0 benefits. Apparently, the idea isn’t at all fleshed out yet. Even so, I have a nagging suspicion he’s pointing in somewhat the wrong direction.

That’s because I think federating relational databases is a generically bad idea. You can federate sets of services, and you can generate services from relational databases – and that’s where DBMS2 (DataBase Management System Services) got its name. This is a superior approach to direct database federation, for two main reasons. (By “direct federation,” I mean some sort of structure in which there’s a giant virtual database whose schema more or less directly incorporates much of the schema of each individual database.)
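To make the services-versus-schemas distinction concrete, here is a minimal Python sketch, assuming two hypothetical databases (customers and orders, each an in-memory SQLite database owned by a different team). Each database is wrapped in a narrow service function, and the federating layer composes service calls rather than stitching both schemas into one giant virtual database. All table, column, and function names are illustrative assumptions, not part of any DBMS2 product or MySQL feature.

```python
import sqlite3
from typing import Optional

# Hypothetical "customers" database, owned by one team.
customers_db = sqlite3.connect(":memory:")
customers_db.execute("CREATE TABLE cust (id INTEGER PRIMARY KEY, name TEXT)")
customers_db.executemany("INSERT INTO cust VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Hypothetical "orders" database, owned by another team, with its own schema.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE ord (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL)")
orders_db.executemany(
    "INSERT INTO ord VALUES (?, ?, ?)", [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)]
)


def customer_name(cust_id: int) -> Optional[str]:
    """Service generated from the customers database; callers never see its schema."""
    row = customers_db.execute("SELECT name FROM cust WHERE id = ?", (cust_id,)).fetchone()
    return row[0] if row else None


def order_total(cust_id: int) -> float:
    """Service generated from the orders database."""
    (total,) = orders_db.execute(
        "SELECT COALESCE(SUM(amount), 0.0) FROM ord WHERE cust_id = ?", (cust_id,)
    ).fetchone()
    return total


def customer_summary(cust_id: int) -> dict:
    """Federation at the service level: compose service calls, don't merge schemas."""
    return {"name": customer_name(cust_id), "total": order_total(cust_id)}


print(customer_summary(1))  # {'name': 'Acme', 'total': 350.0}
```

In this sketch, if the orders team renames a column, only `order_total` has to change; callers of `customer_summary` are insulated from the underlying schema.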


September 28, 2006

Relational data warehouse Expansion (or Explosion) Ratios

One of the least understood aspects of data warehouse technology is what may be called the Expansion Ratio:

Expansion Ratio = (Total disk space used, except for mirroring) / (Size of the base database).

This is similar to the explosion ratio in the OLAP Report’s justly famous discussion of database explosion, but I’m using my own terminology because I don’t want to be tied to their precise definitions, nor to their technical focus. Expansion Ratios are hotly debated, with some figures being:

I don’t have actual figures from Netezza and DATallegro, but I imagine they’d come out lower than 2X, possibly well below.
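As a worked illustration of the definition, the sketch below computes an Expansion Ratio from a disk-space breakdown. The component names and sizes are invented for the example, not measurements from Netezza, DATallegro, or any other vendor; mirrored copies are simply left out, as the definition requires.

```python
from typing import Dict


def expansion_ratio(non_mirrored_usage_gb: Dict[str, float], base_gb: float) -> float:
    """Expansion Ratio = (total disk space used, except for mirroring) / (size of base database)."""
    return sum(non_mirrored_usage_gb.values()) / base_gb


# Hypothetical disk usage for a warehouse whose base database is 1,000 GB.
# Mirrored copies are deliberately excluded, per the definition.
usage_gb = {
    "base_tables": 1000.0,
    "indexes": 1500.0,
    "aggregates_and_summaries": 800.0,
    "temp_and_work_space": 700.0,
}
print(f"Expansion Ratio: {expansion_ratio(usage_gb, base_gb=1000.0):.1f}X")  # Expansion Ratio: 4.0X
```

In this made-up breakdown, indexes and pre-built aggregates account for most of the overhead, which is exactly the kind of space an index-light design avoids spending.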


September 24, 2006

More on data warehouse architecture choices

The very name of this blog comes from the kind of “horses for courses” data store strategy implied by my recent post on different kinds of data warehouse uses. A number of other commentators have recently made similar points, although they may not agree with every detail. For example, William McKnight pretty much makes the pure DBMS2 argument, pointing out that a partially virtual warehouse is often superior to a fully centralized physical one. And Andy Hayler of Kalido says essentially the same thing, although he strongly calls out how his emphasis differs from William’s.

A tip of the hat to Mark Rittman for pointing me to those two and others.

September 20, 2006

SAP’s BI Accelerator

I wrote about SAP’s BI Accelerator quite a bit in my white paper on memory-centric data management, but otherwise I seem not to have posted much about it here. In essence, it’s an entirely RAM-based product, generally geared toward multi-hundred-gigabyte data marts. The basic design is a compression-heavy, column-based architecture evolved from TREX, SAP’s text-indexing technology. Like data warehouse appliances, it eschews indexing, relying instead on blazingly fast table scans.
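The general idea of a compressed, column-at-a-time, in-memory design that scans instead of using indexes can be sketched as follows. This is a generic illustration using dictionary encoding, one common columnar compression technique; it is not a description of how BIA or TREX is actually implemented, and the column name and data are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with an integer code pointing into a sorted list of distinct values."""
    dictionary = sorted(set(values))
    code_of: Dict[str, int] = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def scan_equals(dictionary: List[str], codes: List[int], target: str) -> List[int]:
    """Index-free predicate evaluation: one dictionary lookup, then a scan over integer codes."""
    i = bisect_left(dictionary, target)  # the dictionary is sorted, so binary search works
    if i == len(dictionary) or dictionary[i] != target:
        return []  # value absent from the column: nothing can match
    return [row for row, code in enumerate(codes) if code == i]


# Hypothetical "region" column of a fact table, stored on its own (column-wise) in RAM.
region = ["EMEA", "APJ", "EMEA", "AMER", "EMEA", "AMER"]
dictionary, codes = dictionary_encode(region)
print(dictionary)                              # ['AMER', 'APJ', 'EMEA']
print(codes)                                   # [2, 1, 2, 0, 2, 0]
print(scan_equals(dictionary, codes, "EMEA"))  # [0, 2, 4]
```

The codes need only enough bits to number the distinct values, which is where the compression comes from, and the scan compares small integers rather than strings.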

I asked Lothar Schubert of SAP how BIA was doing in the market in its early going. This was his response:


September 20, 2006

I say “sequential”, you say …

I talked with Teradata today, and they called me on my use of the term “sequential.” Basically, if there’s any head movement for disk seeks, some computer science researchers wouldn’t call it “sequential.” I didn’t know that; I was just familiar with the less precise usage of the term in some vendors’ marketing and discussions.* OK, I’ll make up a new, more precise term instead. How about “coarse-grained”?
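One hypothetical way to pin down what “coarse-grained” might mean is to look at a trace of disk block reads, split it into runs of consecutive blocks (each run starting with a head movement), and report how many blocks are read per seek. The block numbers and the metric itself are illustrative assumptions, not Teradata’s definition or anyone else’s.

```python
from typing import List, Tuple


def contiguous_runs(blocks: List[int]) -> List[Tuple[int, int]]:
    """Split a block-read trace into (start_block, length) runs of consecutive blocks."""
    runs: List[Tuple[int, int]] = []
    for b in blocks:
        if runs and b == runs[-1][0] + runs[-1][1]:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)  # contiguous: no head movement needed
        else:
            runs.append((b, 1))  # new run; in this model every run begins with a seek
    return runs


def blocks_per_seek(blocks: List[int]) -> float:
    """Average run length: large values mean coarse-grained, values near 1 mean random."""
    runs = contiguous_runs(blocks)
    return len(blocks) / len(runs) if runs else 0.0


# Hypothetical traces: a scan hopping between three large extents, and scattered index probes.
scan = list(range(0, 1000)) + list(range(5000, 6000)) + list(range(9000, 10000))
probes = [17, 4242, 913, 7001, 55]
print(blocks_per_seek(scan))    # 1000.0 (3 seeks, so not strictly "sequential")
print(blocks_per_seek(probes))  # 1.0
```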

*And so we have another instance of Monash’s First Law of Commercial Semantics: Bad jargon drives out good.

September 20, 2006

No locks, no logs — no problem?

There’s another cool-sounding part to the Netezza story, which straddles their chips and their software: the FPGA takes over the work of assuring database consistency. If the system attempts to read and write a record at the same time, the FPGA keeps things straight. This eliminates the need for locks (at least if you don’t care about transactional integrity) and some of the reason for logs. (I guess that in lieu of any kind of rollback/rollforward they just rely on failover to mirrored disks.)
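How can simultaneous reads and writes be kept straight without locks? One generic technique is multiversion visibility: each row version records the transaction that created it and, if superseded, the one that deleted it, and readers skip versions outside their snapshot. The sketch below assumes that technique, with a deliberately simplified snapshot rule; it is a software illustration of the general idea, not a description of what Netezza’s FPGA actually does, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class RowVersion:
    key: str
    value: int
    created_by: int                   # ID of the transaction that wrote this version
    deleted_by: Optional[int] = None  # ID of the transaction that superseded it, if any


def visible(row: RowVersion, snapshot_xid: int, committed: Set[int]) -> bool:
    """Simplified rule: visible if created by a transaction that committed before the
    snapshot, and not deleted by one that did."""
    def counts(xid: Optional[int]) -> bool:
        return xid is not None and xid in committed and xid < snapshot_xid
    return counts(row.created_by) and not counts(row.deleted_by)


def scan(rows: List[RowVersion], snapshot_xid: int, committed: Set[int]) -> List[RowVersion]:
    """Readers take no locks; they just skip versions outside their snapshot."""
    return [r for r in rows if visible(r, snapshot_xid, committed)]


# Transaction 1 inserted a=10. Transaction 3 "updated" it by marking that version deleted
# and appending a new version, rather than overwriting in place.
rows = [
    RowVersion("a", 10, created_by=1, deleted_by=3),
    RowVersion("a", 20, created_by=3),
]
print([r.value for r in scan(rows, snapshot_xid=2, committed={1, 3})])  # [10]
print([r.value for r in scan(rows, snapshot_xid=4, committed={1, 3})])  # [20]
print([r.value for r in scan(rows, snapshot_xid=4, committed={1})])     # [10] (txn 3 uncommitted)
```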

This isn’t exactly the way one would want to do OLTP, and in general my head is shaking as I write this — but it sure seems to suffice for some rather demanding data warehouse users.

September 19, 2006

Is data warehousing now all about sequential access?

A lot of evidence is pointing to a major paradigm shift in data warehouse RDBMS, along the lines of:

Old way: Assume I/O is random; lower total execution time by improving selectivity and thus lowering the amount of I/O.

New way: Drive the amount of random I/O to near zero, and do as much sequential I/O as necessary to achieve this goal.
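A back-of-envelope cost model shows why the new way can win. The disk parameters below (8 ms per random read, 100 MB/s transfer, 8 KB pages, a 100 GB table) are round-number assumptions, not measurements of any particular system. Fetching pages one at a time costs a seek per page, while a full scan pays only for transfer bandwidth.

```python
SEEK_S = 0.008         # assumed seek + rotational delay per random read: 8 ms
TRANSFER_BPS = 100e6   # assumed sequential transfer rate: 100 MB/s
PAGE_BYTES = 8_000     # assumed page size: 8 KB
TABLE_BYTES = 100e9    # assumed table size: 100 GB


def index_lookup_seconds(fraction_of_pages: float) -> float:
    """Old way: every page touched costs a seek plus one page transfer."""
    pages = fraction_of_pages * TABLE_BYTES / PAGE_BYTES
    return pages * (SEEK_S + PAGE_BYTES / TRANSFER_BPS)


def full_scan_seconds() -> float:
    """New way: read every page, but with (near) zero seeks."""
    return TABLE_BYTES / TRANSFER_BPS


def break_even_fraction() -> float:
    """Fraction of pages above which a full scan beats random page fetches."""
    return full_scan_seconds() / index_lookup_seconds(1.0)


for f in (0.001, 0.01, 0.1):
    print(f"{f:>6.1%} of pages: index {index_lookup_seconds(f):>8,.0f} s, "
          f"scan {full_scan_seconds():,.0f} s")
print(f"break-even ~ {break_even_fraction():.2%} of pages")
```

Under these assumptions a seek costs about 100 times as much as transferring one page, so once a query touches more than roughly 1% of the table’s pages, the full scan is faster. Changing the parameters moves the break-even point, but as long as seeks dominate, it stays low.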

Examples include:


August 17, 2006

Business Objects on EIM, ETL, etc.

I chatted with some Business Objects ETL/EIM (Enterprise Information Management) folks today, in a call that was a direct response to what I heard from and posted about Informatica. The core of the Business Objects story can be summarized (albeit brutally!) like this:

