Theory and architecture

Analysis of design choices in databases and database management systems.

September 28, 2008

Oracle Database Machine performance and compression

Greg Rahn was kind enough to recount in his blog what Oracle has disclosed about the first Exadata testers. I don’t track hardware model details, so I don’t know how the testers’ current hardware environments compare to that of the Oracle Database Machine.

Each of the customers cited below received “half” an Oracle Database Machine. As I previously noted, an Oracle Database Machine holds either 14.0 or 46.2 terabytes of uncompressed data. This suggests the 220 TB customer listed below — LGR Telecommunications — got compression of a little under 10:1 for a CDR (Call Detail Record) database. By comparison, Vertica claims 8:1 compression on CDRs.
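For concreteness, here’s that arithmetic in a few lines of Python. The only assumption beyond the figures cited above is that LGR’s half-machine is the larger (46.2 TB) configuration.

```python
# Back-of-envelope check of the LGR figure. Inputs are the numbers
# cited above; I'm assuming the half-machine is the larger (46.2 TB)
# configuration.
user_data_tb = 220.0            # LGR's stated CDR database size
half_machine_tb = 46.2 / 2      # half an Oracle Database Machine

print(f"Implied compression: {user_data_tb / half_machine_tb:.1f}:1")
# -> 9.5:1, i.e. "a little under 10:1"
```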

Greg also writes of POS (Point Of Sale) data being used for the demo. If you do the arithmetic on the throughput figures (13.5 vs. a little over 3), compression was a little under 4.5:1. I don’t know what other vendors claim for POS compression.
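Same sketch for the POS demo. I’m reading 13.5 as the effective (uncompressed-equivalent) scan rate and the 3-and-change figure as the physical one; the 3.05 below is just a stand-in for “a little over 3.”

```python
# Inferring compression from scan throughput. 3.05 is a placeholder
# for "a little over 3"; both figures are presumably GB/sec.
logical_rate = 13.5     # uncompressed-equivalent scan rate
physical_rate = 3.05    # physical read rate off disk

print(f"Implied compression: {logical_rate / physical_rate:.2f}:1")
# -> 4.43:1, i.e. "a little under 4.5:1"
```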

Here are the details Greg posted about the four most openly discussed Oracle Database Machine tests: Read more

September 24, 2008

Vertica finally spells out its compression claims

Omer Trajman of Vertica put up a must-read blog post spelling out detailed compression numbers, based on actual field experience (which I’d guess is from a combination of production systems and POCs).

It’s clear from the post what Omer means by most of those categories, but I’m a little fuzzy on what “Consumer Data” or “Marketing Analytics” comprise in his taxonomy. Anyhow, Omer’s post is a huge improvement over my recent one (itself based on a conversation with Omer 🙂 ), which featured some far less accurate and complete compression numbers.

Omer goes on to argue that trickle-feed data is harder for rival systems to compress than it is for Vertica, and more generally that Vertica’s compression is typically severalfold better than that of competitive row-based systems.
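The usual technical basis for claims like that is a column store’s ability to keep each column sorted and run-length encode it. Here’s the textbook version of the technique (a toy illustration, to be clear, not Vertica’s actual implementation):

```python
# Toy illustration of why sorted, low-cardinality columns compress so
# well under run-length encoding. Textbook RLE, not Vertica's code.
from itertools import groupby

def rle(values):
    """Collapse a sorted sequence into (value, run_length) pairs."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]

# A million-row status column with only three distinct values:
column = ["OK"] * 900_000 + ["RETRY"] * 90_000 + ["FAIL"] * 10_000

encoded = rle(sorted(column))
print(encoded)   # [('FAIL', 10000), ('OK', 900000), ('RETRY', 90000)]
print(f"{len(column):,} values -> {len(encoded)} (value, count) pairs")
```

Presumably the trickle-feed point follows from the same idea: rows arriving one at a time land in arrival order, so runs stay short unless the system re-sorts data in the background, and that kind of re-sorting is harder to bolt onto a row store.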

September 22, 2008

Database compression is heavily affected by the kind of data

I’ve written often of how different kinds or brands of data warehouse DBMS get very different compression figures. But I haven’t focused enough on how much compression figures can vary among different kinds of data. This was really brought home to me when Vertica told me that web analytics/clickstream data can often be compressed 60X in Vertica, while at the other extreme — some kind of floating point data, whose details I forget for now — they could only do 2.5X. Edit: Vertica has now posted much more accurate versions of those numbers. Infobright’s 30X compression reference at TradeDoubler seems to be for a clickstream-type app. Greenplum’s customer getting 7.5X — high for a row-based system — is managing clickstream data and related stuff. Bottom line:

When evaluating compression ratios — especially large ones — it is wise to inquire about the nature of the data.
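The point is easy to demonstrate with any general-purpose compressor. In the Python sketch below, zlib stands in for whatever a DBMS actually uses, and the data is synthetic; the exact ratios will vary run to run, but the gap between the two is the point.

```python
# Compressibility is a property of the data as much as the algorithm.
import random
import struct
import zlib

def ratio(raw: bytes) -> float:
    return len(raw) / len(zlib.compress(raw, 9))

# Clickstream-like data: a small set of URLs repeated many times.
urls = [f"/product/{random.randint(1, 50)}" for _ in range(100_000)]
clickstream = "\n".join(urls).encode()

# Random doubles: essentially incompressible noise in the mantissas.
floats = struct.pack("100000d", *(random.random() for _ in range(100_000)))

print(f"clickstream:   {ratio(clickstream):.1f}:1")  # a large ratio
print(f"random floats: {ratio(floats):.2f}:1")       # barely above 1:1
```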

September 15, 2008

Infobright’s open source move has a lot of potential

Infobright announced today that it’s going full-bore into open source, specifically in the MySQL ecosystem, with the licensing approach, pricing, distribution strategy, and VC money from Sun that such a move naturally entails. I think this is a great idea, for a number of reasons: Read more

September 5, 2008

Dividing the data warehousing work among MPP nodes

I talk with lots of vendors of MPP data warehouse DBMS. I’ve now heard enough different approaches to MPP architecture that I think it might be interesting to contrast some of the alternatives.

Read more
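To set a baseline before contrasting the alternatives: the most common design hash-distributes each table on a key, so that every node owns a disjoint slice of the data and runs each query step in parallel. A minimal sketch, with names that are mine rather than any vendor’s:

```python
# Minimal sketch of the most common MPP approach: hash-distribute each
# table on a key so every node owns a disjoint slice, run each query
# step on all nodes in parallel, then merge the partial results.
from collections import defaultdict

NUM_NODES = 4

def node_for(key) -> int:
    """Pick the node that owns a row, by hashing its distribution key."""
    return hash(key) % NUM_NODES

# "Load": distribute (customer_id, amount) rows across the nodes.
nodes = defaultdict(list)
for row in [(101, 5.0), (202, 3.0), (101, 7.0), (303, 1.0), (202, 2.0)]:
    nodes[node_for(row[0])].append(row)

# "Query": SELECT customer_id, SUM(amount) ... GROUP BY customer_id.
# Step 1: each node aggregates its own slice (in parallel, in a real system).
partials = []
for slice_rows in nodes.values():
    local = defaultdict(float)
    for customer_id, amount in slice_rows:
        local[customer_id] += amount
    partials.append(local)

# Step 2: merge. Because the table is distributed on customer_id, each
# group lives on exactly one node, so no mid-query re-shuffle is needed.
result = {k: v for partial in partials for k, v in partial.items()}
print(result)   # {101: 12.0, 202: 5.0, 303: 1.0}
```

The interesting differences among vendors start when a query groups or joins on something other than the distribution key, at which point rows have to be redistributed among the nodes mid-query.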

September 3, 2008

Head-to-head blog debate among EMC, NetApp, and HP

Chuck Hollis of EMC started a fierce debate with a blog post on how to measure effective storage capacity. Competitors from NetApp and HP responded in often sarcastic detail in the comment thread, Hollis shot back, and the volleying continued for quite a while.

I’m not a storage maven, and I don’t understand all the details of that stuff. If you’re like me in that regard, you may find the post worth skimming just to see what some of the choices, trade-offs, and complications are in designing and measuring storage systems. Stephen Foskett’s related post is also worth a look in that regard.

My recent foray into measuring disk storage pales by comparison.

September 1, 2008

Estimating user data vs. spinning disk

There’s a lot of confusion about how to measure data warehouse database size, thanks to several major complicating factors.

Greenplum’s CTO Luke Lonergan recently walked me through the general disk usage arithmetic for Greenplum’s most common configuration (Sun Thors, configured in RAID 10). I found it pretty interesting, and a good guide to factors that also affect other systems from other vendors.

Read more
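Pending the details, the general shape of the calculation is easy to show. Every factor in the sketch below is an illustrative placeholder of mine, not Greenplum’s or Sun’s actual figure:

```python
# Illustrative raw-disk-to-user-data arithmetic. Every factor below is
# a placeholder; the point is the shape of the calculation, not the
# specific numbers.
raw_tb = 48 * 1.0                     # e.g. 48 one-terabyte drives
after_raid10 = raw_tb / 2             # RAID 10 mirrors everything
after_fs = after_raid10 * 0.90        # filesystem formatting/reserve
user_space = after_fs * (2 / 3)       # leave a third for temp/sort/load

print(f"{raw_tb:.0f} TB of spinning disk -> {user_space:.1f} TB for user data")
# -> 48 TB raw becomes 14.4 TB, before any compression is applied
```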

August 25, 2008

Greenplum’s single biggest customer

Greenplum offered a bit of clarification regarding the usage figures I posted last night. Everything on the list is in production, with a few exceptions.

August 25, 2008

Greenplum is in the big leagues

After a March 2007 call, I didn’t talk again with Greenplum until earlier this month. That changed fast. I flew out to see Greenplum last week and spent over a day with president/co-founder Scott Yara, CTO/co-founder Luke Lonergan, marketing VP Paul Salazar, and product management/marketing director Ben Werther. Highlights (besides some really great sushi at Sakae in Burlingame) start with an eye-opening set of customer proof points, such as: Read more

August 20, 2008

The Explosion in DBMS Choice

If there’s one central theme to DBMS2, it’s that modern DBMS alternatives should in many cases be used instead of the traditional market leaders. So it was only a matter of time before somebody sponsored a white paper on that subject. The paper, sponsored by EnterpriseDB, is now posted along with my other recent white papers. Its conclusion — summarizing what kinds of database management system you should use in which circumstances — is reproduced below.

Many new applications are built on existing databases, adding new features to already-operating systems. But others are built in connection with truly new databases. And in the latter cases, it’s rare that a market-leading product is the best choice. Mid-range DBMS (for OLTP) or specialty data warehousing systems (for analytics) are usually just as capable, and much more cost-effective. Exceptions arise mainly in three kinds of cases.

Otherwise, the less costly products are typically the wiser choice. Read more
