Data warehousing

Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data. Related subjects include:

August 26, 2008

Vertica’s paying customer count

In a recent Computerworld article, Andy Ellicott of Vertica was cited as saying Vertica has 50 paying customers total. That’s very much on par with Greenplum’s figure, leaving aside any questions of deal size. (Greenplum runs a number of databases much larger than Vertica’s biggest. However, I believe Greenplum also charges a lot less per terabyte of user data.)

Previous Vertica paying customer count figures include:

August 26, 2008

Three approaches to parallelizing data transformation

Many MPP data warehousing vendors have told me their products are used for ELT (Extract/Load/Transform) instead of ETL (Extract/Transform/Load). I.e., needed data transformations are done on the MPP system, rather than on the — probably SMP — system the data comes from.* If the data transformation is being applied on a record-by-record basis, then it’s automatically fully parallelized. Even if the transforms are more complex, considerable parallel processing may still be going on.

*Or it’s some of each, at which point it’s called ETLT — I bet you can work out what that stands for.
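To make the record-by-record point concrete, here is a minimal Python sketch. It is an illustration under assumptions, not any vendor's mechanism: the partitions stand in for data slices on MPP nodes, worker threads stand in for the nodes themselves, and `transform_record`, its field names, and `parallel_transform` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def transform_record(record):
    """Hypothetical per-record transform: clean a name, convert cents to dollars."""
    return {
        "name": record["name"].strip().upper(),
        "amount_dollars": record["amount_cents"] / 100,
    }


def transform_partition(partition):
    """Apply the transform to every record in one partition (one 'node')."""
    return [transform_record(r) for r in partition]


def parallel_transform(partitions, workers=4):
    """Transform all partitions concurrently; no partition needs data from another."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input partitions.
        return list(pool.map(transform_partition, partitions))


partitions = [
    [{"name": " ann ", "amount_cents": 1250}],
    [{"name": "bob", "amount_cents": 300}],
]
transformed = parallel_transform(partitions)
```

Because each output record depends only on its own input record, the work splits cleanly along whatever partitioning the data already has. Transforms that need to see other records (deduplication, say) would require data movement between partitions, which this sketch does not attempt.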

Read more

August 26, 2008

Why MapReduce matters to SQL data warehousing

Greenplum and Aster Data have both just announced the integration of MapReduce into their SQL MPP data warehouse products. So why do I think this could be a big deal? The short answer is “Because MapReduce offers dramatic performance gains in analytic application areas that still need great performance speed-up.” The long answer goes something like this.

The core ideas of MapReduce are:
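As a rough illustration of the general map/group/reduce pattern, here is a minimal single-process Python sketch. It is not the Greenplum or Aster Data interface; the function names and the click-log example are assumptions made up for illustration, and a real system would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Run map_fn on each record; each call emits zero or more (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def shuffle(pairs):
    """Group all emitted values by key, as happens between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Run reduce_fn once per key over that key's list of values."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_reduce(records, map_fn, reduce_fn):
    return reduce_phase(shuffle(map_phase(records, map_fn)), reduce_fn)


# Hypothetical example: count page views per URL from a click log.
def emit_url(click):
    return [(click["url"], 1)]


def sum_counts(url, counts):
    return sum(counts)


clicks = [{"url": "/a"}, {"url": "/b"}, {"url": "/a"}]
views = map_reduce(clicks, emit_url, sum_counts)
```

Since the map calls are independent of one another, and so are the per-key reduce calls, both steps can be spread across many machines.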

Read more

August 25, 2008

Greenplum’s single biggest customer

Greenplum offered a bit of clarification regarding the usage figures I posted last night. Everything on the list is in production, except that:

August 25, 2008

Greenplum is in the big leagues

After a March, 2007 call, I didn’t talk again with Greenplum until earlier this month. That changed fast. I flew out to see Greenplum last week and spent over a day with president/co-founder Scott Yara, CTO/co-founder Luke Lonergan, marketing VP Paul Salazar, and product management/marketing director Ben Werther. Highlights – besides some really great sushi at Sakae – start with an eye-opening set of customer proof points, such as: Read more

August 24, 2008

My current customer list among the data warehouse specialists

One of my favorite pages on the Monash Research website is the list of many current and a few notable past customers. (Another favorite page is the one for testimonials.) For a variety of reasons, I won’t undertake to be more precise about my current customer list than that. But I don’t think it would hurt anything to list the data warehouse DBMS/appliance specialists in the group. They are:

All of those are Monash Advantage members.

If you care about all this, you may also be interested in the rest of my standards and disclosures.

August 20, 2008

Kevin Closson doesn’t like MPP

Kevin Closson of Oracle offers a long criticism of the popularity of MPP. Key takeaways include:

August 18, 2008

Three happy 100 terabyte-plus customers for DATAllegro

Over on my Network World blog, I asked the question “So who are DATAllegro’s actual current customers?” As regular readers know, that’s a fairly hard question to answer. TEOCO is widely known as DATAllegro’s flagship reference, but after that the list gets thin in a hurry.

As a by-the-by to other discussions, DATAllegro CEO Stuart Frost undertook to respond in part himself. Specifically, he gave me the names of two other happy customers that are or imminently will be running DATAllegro against 100+ terabytes of user data. Read more

August 16, 2008

Exasol technical briefing

It took 5 ½ months after my non-technical introduction, but I finally got a briefing from Exasol’s technical folks (specifically, the very helpful Mathias Golombek and Carsten Weidmann). Here are some highlights: Read more

August 14, 2008

Patent nonsense in the data warehouse DBMS market

There are two recent patent lawsuits in the data warehouse DBMS market. In one, Sybase is suing Vertica. In another, an individual named Cary Jardin (techie founder of XPrime, a sort of predecessor company to ParAccel) is suing DATAllegro. Naturally, there’s press coverage of the DATAllegro case, due in part to its surely non-coincidental timing right after the Microsoft acquisition was announced and in part to a vigorous PR campaign around it. And the Sybase case so excited a troll who calls himself Bill Walters that he posted identical references to it on about 12 different threads in this blog, as well as to a variety of Vertica-related articles in the online trade press. But I think it’s very unlikely that either of these cases will turn out to matter much. Read more

August 12, 2008

Compare/contrast of Vertica, ParAccel, and Exasol

I talked with Exasol today – at 5:00 am! — and of course want to blog about it. For clarity, I’d like to start by comparing/contrasting the fundamental data structures at Vertica, ParAccel, and Exasol. And it feels like that should be a separate post. So here goes.

Beyond the above, I plan to discuss in a separate post how Exasol does MPP shared-nothing software-only columnar data warehouse database management differently than Vertica and ParAccel do shared-nothing software-only columnar data warehouse database management. :)

August 9, 2008

Netezza update

In my usual dual role, I called Phil Francisco of Netezza to lay some post-Microsoft/DATAllegro consulting on him late on a Friday night — and then took the opportunity of being on the phone with him to get a general Netezza update. Netezza’s July quarter just ended, so they’re still in their quiet period, and I didn’t press him for a lot of numerical detail. More generally, I didn’t find out a lot that wasn’t already covered in my May Netezza update. But notwithstanding all those disclaimers, it was still a pretty interesting chat.

My strongest takeaway was that Netezza sees concurrency as a significant competitive advantage. This is reflected in POCs, where Netezza guides prospects to simulate real-life mixed workloads. It also reflects the Netezza customer base. Phil says Netezza has “busy” warehouses with up to 80 terabytes of user data, with lots of busy ones in the single-digit to 20ish terabyte range. Multiple Netezza references have 100s of concurrent users, and the 1000 mark has been crossed.

Speaking of concurrency, Phil had a clear opinion of the typical Sybase IQ installation — a small reporting mart, supporting hundreds or thousands of users, but probably not a lot of ad hoc query. On the other hand, he recalls outright competing against Sybase only twice in the past year.

The vendor Netezza does see the most is, no surprise, Oracle. He put Oracle at 60ish percent, with most of the rest divided among Teradata and DB2 (only a few Microsoft SQL Server). Among the other new data warehouse specialists, Greenplum comes up the most often. (There was some confusion between “competitor” and “incumbent” in our discussion, and the sample sizes are small anyway, so fine levels of detail shouldn’t be taken too seriously.)

On the advanced analytics side, it sounds as if SAS integration akin to Teradata’s will happen sooner than any significant integration of Netezza’s own NuTech acquisition.

August 8, 2008

Database compression coming to the fore

I’ve posted extensively about data-warehouse-focused DBMS’ compression, which can be a major part of their value proposition. Most notable, perhaps, is a short paper Mike Stonebraker wrote for this blog — before he and his fellow researchers started their own blog — on column-stores’ advantages in compression over row stores. Compression has long been a big part of the DATAllegro story, while Netezza got into the compression game just recently. Part of Teradata’s pricing disadvantage may stem from weak compression results. And so on.

Well, the general-purpose DBMS vendors are working busily at compression too. Microsoft SQL Server 2008 exploits compression in several ways (basic data storage, replication/log shipping, backup). And Oracle offers compression too, as per this extensive writeup by Don Burleson.

If I had to sum up what we do and don’t know about database compression, I guess I’d start with this:

Compression is one of the most important features a database management system can have, since it creates large savings in storage and sometimes non-trivial gains in performance as well. Hence, it should be a key item in any DBMS purchase decision.
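As a toy illustration of why storing a column's values together can help compression, here is a short Python sketch using run-length encoding. This is only one simple technique, chosen for illustration; the vendors discussed above may use quite different methods, and the sample table and names are made up.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of consecutive equal values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Hypothetical table of (state, status) rows, sorted by state.
rows = [
    ("MA", "open"), ("MA", "closed"), ("MA", "open"),
    ("NY", "open"), ("NY", "open"), ("NY", "closed"),
]

# Column layout: all values of one attribute are stored contiguously.
state_column = [state for state, _ in rows]

# Row layout: the fields of each row are interleaved in storage order.
row_stream = [field for row in rows for field in row]

column_runs = run_length_encode(state_column)  # two long runs
row_runs = run_length_encode(row_stream)       # no run longer than one value
```

In the column layout the six state values collapse into two runs, while in the interleaved row layout adjacent values never repeat, so run-length encoding gains nothing there.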

August 6, 2008

Column stores vs. vertically-partitioned row stores

Daniel Abadi and Sam Madden followed up their post on column stores vs. fully-indexed row stores with one about column stores vs. vertically-partitioned row stores. Once again, the apparently useful way to set up the row-store database backfired badly.* Read more

July 25, 2008

Further thoughts on DATAllegro/Microsoft

My first, biggest thought about DATAllegro’s acquisition by Microsoft is “Why the ____ did it have to happen while I was trying to relax on my annual Cayman vacation???” Not coincidentally, I don’t plan to neatly cross-link all my posts and so on about DATAllegro/Microsoft until I get back to Acton this weekend.

One linking screwup is that I previously forgot to mention that — in addition to the numerous posts here — I also made several DATAllegro/Microsoft-related posts on my Network World blog A World of Bytes.  They include: Read more

July 24, 2008

Other early coverage of Microsoft/DATAllegro

July 24, 2008

DATAllegro could provide Microsoft with a true enterprise data warehouse sooner than you think

Jim Ericson of DM Review emailed the excellent questions:

Does DATAllegro give MSFT full-service high end data warehousing capability? If not, what is missing?

My quick answers are:

Both are largely a matter of product maturity, and as a young company DATAllegro isn’t quite there yet.

That said, integration with Microsoft SQL Server is apt to be a big help in addressing both issues.

Read more

July 24, 2008

The data warehouse DBMS consolidation has begun

There are, or soon will be, a number of strong players in the market for data warehouse specialty DBMS.

That doesn’t leave a lot of room for other players.

Read more

July 24, 2008

How will Oracle save its data warehouse business?

By acquiring DATAllegro, Microsoft has seriously leapfrogged Oracle in data warehouse technology. All doubts about maturity and versatility notwithstanding, DATAllegro has a 10X or better size advantage (actually, I think it’s more like 20-40X) versus Oracle in warehouses its technology can straightforwardly handle. Oracle cannot afford to let this move go unanswered.

It’s of course possible that Oracle has been successfully developing comparable data warehouse technology internally. But it’s unlikely. Oracle hasn’t done anything that radical, internally and successfully, for about 15 years, RAC (Real Application Clusters) excepted. (I.e., since the object/relational extensibility framework started in Release 7.) So in all likelihood, the answer will come via acquisition. I think there are four candidates that make the most sense: Teradata, Vertica, ParAccel, and Greenplum. Kognitio (controlled by former Oracle honcho Geoff Squire) might be in the mix as well. Netezza is probably a non-starter because of its hardware-centric strategy.

Here’s why I’m emphasizing Teradata, Vertica, ParAccel, and Greenplum:

Read more

July 24, 2008

Microsoft is buying DATAllegro

I’ve long argued that:

Microsoft has now validated my claim by agreeing to buy DATAllegro. As you probably know, we’ve been covering DATAllegro extensively, as per the links listed below.

Basic deal highlights include:

Read more

July 24, 2008

Long, confused overview of data warehouse DBMS vendors

Steven Swoyer has an article for Enterprise Systems that covers a lot of issues in data warehouse technology. Unfortunately, however, it doesn’t always cover them correctly. E.g., he seems to imply that columnar architectures aren’t relational. (Oops.) I wouldn’t put too much credence in the other market segmentations he posits either.

Some of his theses, however, are basically correct. E.g., he points out that demand for fast, cost-effective, (almost) unconstrained ad hoc queries keeps growing, and that much of the recent innovation is concerned with supplying them.

July 3, 2008

Declaration of Data Independence (humor)

The data warehouse appliance industry has a well-developed funny bone. Dataupia’s contribution is a Declaration of Data Independence, which begins:

When in the Course of an increasingly competitive global economy it becomes necessary for one data set to dissolve its connections to a constraining environment, the separate but inherently unequal station to which the Laws of Whose budget is larger prevails.

July 3, 2008

Three cartoons from DATAllegro

DATAllegro Cartoon demanding
DATAllegro Cartoon forever
DATAllegro Cartoon gerbils

July 1, 2008

The IRS data warehouse

According to a recent Eric Lai Computerworld story and a 2006 Sybase.com success story,

I can’t entirely reconcile those numbers, but in any case the database sounds plenty big.

Computerworld also said:

the research division also uses Microsoft Corp.’s SQL Server to store all of the metadata for the data warehouse and the rest of the agency. Managing and cleaning all of that metadata — 10,000 labels for 150 databases — is a huge task in itself,

July 1, 2008

Jerry Held on cloud data warehousing and how business intelligence will be transformed by it

Vertica Chairman Jerry Held has a pair of blog posts on analytics and data warehousing in the cloud. The first lays out a number of potential benefits and consequences of cloud data warehousing, under the heading of “Transforming BI”: Read more
