Data warehousing

Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data. Related subjects include:

September 2, 2008

Introduction to Aster Data and nCluster

I’ve been writing a lot about Greenplum since a recent visit. But on the same trip I met with Aster Data, and have talked with them further since. Let me now redress the balance and outline some highlights of the Aster Data story.

Read more

September 1, 2008

Estimating user data vs. spinning disk

There’s a lot of confusion about how to measure data warehouse database size. Major complicating factors include:

Greenplum’s CTO Luke Lonergan recently walked me through the general disk usage arithmetic for Greenplum’s most common configuration (Sun Thors*, configured as RAID 10). I found it pretty interesting, and a good guide to factors that also affect systems from other vendors.
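To make the flavor of that arithmetic concrete, here is a minimal sketch in Python. The only hard fact in it is that RAID 10 mirrors everything, so usable capacity is half of raw. The overhead fraction, the compression ratio, and the 48 TB raw figure are hypothetical placeholders, not Greenplum's numbers.

```python
def usable_after_raid10(raw_tb: float) -> float:
    """RAID 10 mirrors every stripe, so half of the raw capacity is usable."""
    return raw_tb / 2


def estimated_user_data_tb(raw_tb: float,
                           overhead_fraction: float,
                           compression_ratio: float) -> float:
    """Estimate how much user data fits on a given amount of spinning disk.

    overhead_fraction: hypothetical share of usable space set aside for
        indexes, temp space, and similar (an assumption, not a vendor figure).
    compression_ratio: hypothetical ratio of user data to stored bytes
        (1.0 means no compression).
    """
    usable = usable_after_raid10(raw_tb)
    available = usable * (1 - overhead_fraction)
    return available * compression_ratio


if __name__ == "__main__":
    # Illustrative inputs only: 48 TB raw, half of usable space reserved,
    # no compression.
    print(estimated_user_data_tb(48.0, overhead_fraction=0.5,
                                 compression_ratio=1.0))
```

The point of the sketch is only that several multiplicative factors sit between spinning disk and user data, which is why the two figures can differ so widely.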

Read more

September 1, 2008

Yes, but what are the Very Biggest benefits of MapReduce?

On behalf of On-Demand Enterprise, née Grid Today, Dennis Barker asked me to clarify the most important benefits, features, etc. of the Greenplum and Aster Data MapReduce announcements for various constituencies (business users, programmers, DBAs, and so on). Questions like that are hard to answer simply. Here’s why.

The core benefit of MapReduce is price/performance (because it allows the cost benefits of parallelization to be applied to analyses that are hard to parallelize otherwise). Large price/performance gains commonly mix together three kinds of benefits.

1. They let you do what you did before, for less money.
2. They let you do a better version of what you did before, for similar money.
3. They let you do new things that didn’t make economic sense before, but now do.

Read more

August 30, 2008

Are analytic DBMS vendors overcomplicating their interconnect architectures?

I don’t usually spend a lot of time researching Ethernet switches. But I do think a lot about high-end data warehousing, and as I noted back in July, networking performance is a big challenge there. Among the very-large-scale MPP data warehouse software vendors, Greenplum is unusual in that its interconnect of choice is (sufficiently many) cheap 1 gigabit Ethernet switches.

A recent Network World story suggested that Greenplum wasn’t alone in this preference; other people also feel that clusters of commodity 1 gigabit Ethernet switches can be superior to higher-performing ones. So I pinged CTO Luke Lonergan of Greenplum for more comment. His response, which I got permission to publish, was: Read more

August 29, 2008

Sales figures for analytic DBMS

One of my clients asked how many new customers I thought were buying analytic DBMS each quarter. I don’t generally track such things, but hey, a client asked, so I did the best I could. And since I did the work, now I’ll share it generally. To wit:

Read more

August 29, 2008

Enterprises are buying multiple brands of analytic DBMS each

Over the past few weeks I’ve had a lot of NDA discussions about analytic DBMS vendors’ specific customers. And so I’ve been acutely aware of something I already sort of knew: just as in prior generations of database management technology, there’s huge overlap among analytic DBMS vendors’ customer bases. As they always have, enterprises are investing in multiple different brands of DBMS, even in cases where those DBMS can do pretty much the same things.

For example:

August 26, 2008

Vertica’s paying customer count

In a recent Computerworld article, Andy Ellicott of Vertica was cited as saying Vertica has 50 paying customers total. That’s very much on par with Greenplum’s figure, leaving aside any questions of deal size. (Greenplum runs a number of databases much larger than Vertica’s biggest. However, I believe Greenplum also charges a lot less per terabyte of user data.)

Previous Vertica paying customer count figures include:

August 26, 2008

Three approaches to parallelizing data transformation

Many MPP data warehousing vendors have told me their products are used for ELT (Extract/Load/Transform) instead of ETL (Extract/Transform/Load). I.e., needed data transformations are done on the MPP system, rather than on the — probably SMP — system the data comes from.* If the data transformation is being applied on a record-by-record basis, then it’s automatically fully parallelized. Even if the transforms are more complex, considerable parallel processing may still be going on.

*Or it’s some of each, at which point it’s called ETLT — I bet you can work out what that stands for.
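The record-by-record case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's implementation: the `transform` function, the field names, and the round-robin `partition` are all hypothetical, and a thread pool merely stands in for the nodes of an MPP system.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(record):
    """Hypothetical record-level transform: clean the name, convert cents to dollars."""
    return {"name": record["name"].strip().upper(),
            "amount": record["cents"] / 100}


def partition(records, n_segments):
    """Round-robin records across segments, standing in for MPP data distribution."""
    return [records[i::n_segments] for i in range(n_segments)]


def transform_segment(segment):
    """Apply the transform to every record held by one segment."""
    return [transform(r) for r in segment]


def parallel_transform(records, n_segments=4):
    """Transform each segment independently, then concatenate the results.

    No segment needs data from any other, which is why a record-by-record
    transform parallelizes without extra coordination.
    """
    segments = partition(records, n_segments)
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        results = list(pool.map(transform_segment, segments))
    return [row for seg in results for row in seg]


if __name__ == "__main__":
    rows = [{"name": " alice ", "cents": 1250}, {"name": "bob", "cents": 300}]
    print(parallel_transform(rows, n_segments=2))
```

Because each output row depends only on its own input row, the parallel result contains exactly the same rows as a serial pass, possibly in a different order.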

Read more

August 26, 2008

Why MapReduce matters to SQL data warehousing

Greenplum and Aster Data have both just announced the integration of MapReduce into their SQL MPP data warehouse products. So why do I think this could be a big deal? The short answer is “Because MapReduce offers dramatic performance gains in analytic application areas that still need great performance speed-up.” The long answer goes something like this.

The core ideas of MapReduce are: Read more
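For readers who haven't seen the pattern, here is a generic, single-process sketch of map, shuffle, and reduce in Python. It is a textbook-style word count, not a rendering of the list elided above, and not Greenplum's or Aster Data's API; every function name in it is hypothetical.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to each record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key; in a real cluster this step moves data between nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key's list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def map_reduce(records, mapper, reducer):
    """Run the three phases in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    """Total the counts emitted for one word."""
    return sum(values)


if __name__ == "__main__":
    print(map_reduce(["a b a", "b c"], word_mapper, sum_reducer))
```

The mapper runs independently per record and the reducer independently per key, so both phases can be spread across many nodes. That independence is where the parallelism comes from.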

August 25, 2008

Greenplum’s single biggest customer

Greenplum offered a bit of clarification regarding the usage figures I posted last night. Everything on the list is in production, except that:
