Data warehousing

Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data.

November 19, 2008

Interpreting the results of data warehouse proofs-of-concept (POCs)

When enterprises buy new brands of analytic DBMS, they almost always run proofs-of-concept (POCs) in the form of private benchmarks. The results are generally confidential, but that doesn’t keep a few stats from occasionally leaking out. As I noted recently, those leaks are problematic on multiple levels. For one thing, even if the results are to be taken as accurate and basically not-misleading, the way vendors describe them leaves a lot to be desired.

Here’s a concrete example to illustrate the point. One of my vendor clients sent over the stats from a recent POC, in which its data warehousing product was compared against a name-brand incumbent. 16 reports were run. The new product beat the old 16 out of 16 times. The lowest margin was a 1.8X speed-up, while the best was a whopping 335.5X.

My client helpfully took the “simple average” — i.e. the mean – of the 16 factors, and described this as an average 62X drubbing. But is that really fair?
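
One way to see why the choice of average matters is to compute several summaries side by side. The sketch below is only an illustration: the four timings are hypothetical (the actual 16 POC results are not given here), chosen so that the smallest and largest speed-ups match the 1.8X and 335.5X figures quoted above. It compares the arithmetic mean of the ratios, the geometric mean of the ratios, and the ratio of total elapsed time.

```python
from statistics import geometric_mean, mean

# Hypothetical per-report run times in seconds. These are NOT the POC data;
# they are made-up values whose ratios span 1.8X to 335.5X.
old_secs = [18.0, 40.0, 600.0, 671.0]
new_secs = [10.0, 10.0, 60.0, 2.0]

# Per-report speed-up factor: old time divided by new time.
speedups = [old / new for old, new in zip(old_secs, new_secs)]

# "Simple average" of the factors: dominated by the single largest ratio.
arithmetic = mean(speedups)

# Geometric mean: the usual way to average ratios, far less outlier-driven.
geometric = geometric_mean(speedups)

# Ratio of total elapsed time: how much faster the whole batch finishes.
overall = sum(old_secs) / sum(new_secs)

print(f"arithmetic mean of speed-ups: {arithmetic:.1f}X")
print(f"geometric mean of speed-ups:  {geometric:.1f}X")
print(f"total-time speed-up:          {overall:.1f}X")
```

With these made-up numbers the three summaries differ by a wide margin, which is the point: a single extreme ratio can pull the simple mean far above what most reports actually experienced.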

Read more

November 16, 2008

When people don’t want accurate predictions made about them

In a recent article on governmental anti-terrorism data mining efforts — and the privacy risks associated with same — The Economist wrote (emphasis mine):

Abdul Bakier, a former official in Jordan’s General Intelligence Department, says that tips to foil data-mining systems are discussed at length on some extremist online forums. Tricks such as calling phone-sex hotlines can help make a profile less suspicious. “The new generation of al-Qaeda is practising all that,” he says.

Well, duh. Terrorists and fraudsters don’t want to be detected. Algorithms that rely on positive evidence of bad intent may work anyway. But if you rely on evidence that shows people are not bad actors, that’s likely to work about as well as Bayesian spam detectors.* Read more

November 15, 2008

High-performance analytics

For the past few months, I’ve collected a lot of data points to the effect that high-performance analytics – i.e., beyond straightforward query — is becoming increasingly important. And I’ve written about some of them at length. For example:

Ack. I can’t decide whether “analytics” should be a singular or plural noun. Thoughts?

Another area that’s come up which I haven’t blogged about so much is data mining in the database. Data mining accounts for a large part of data warehouse use. The traditional way to do data mining is to extract data from the database and dump it into SAS. But there are problems with this scenario, including:

Read more

November 15, 2008

Beyond query

I sometimes describe database management systems as “big SQL interpreters,” because that’s the core of what they do. But it’s not all they do, which is why I describe them as “electronic file clerks” too. File clerks don’t just store and fetch data; they also put a lot of work into neatening, culling, and generally managing the health of their information hoards.

Already 15 years ago, online backup was as big a competitive differentiator in the database wars as any particular SQL execution feature. Security became important in some market segments. Reliability and availability have been important from the get-go. And manageability has been crucial ever since Microsoft lapped Oracle in that regard, back when SQL Server had little else to recommend it except price.*

*Before Oracle10g, the SQL Server vs. Oracle manageability gap was big.

Now data warehousing is demanding the same kinds of infrastructure richness.*

Read more

November 15, 2008

The query from hell, and other stories

I write about a lot of products whose core job boils down to Make queries run fast. Without exception, their vendors tout stories of remarkable performance gains over conventional/incumbent DBMS (reported improvement is usually at least 50-fold, and commonly 100-500+). They further claim at least 2-3X better performance than their close competitors. In making these claims, vendors usually stress that their results come from live customer benchmarks. In few if any cases, I judge, are they lying outright. So what’s going on? Read more

October 23, 2008

Carson Schmidt of Teradata on SSDs

Carson Schmidt is, in essence, Teradata’s VP of product development for everything other than applications and database software. For example, he oversees Teradata’s hardware, storage, and switching technology. So when Teradata Chief Development Officer Scott Gnau didn’t have answers at his fingertips to some questions about SSDs (Solid-State Drives), he bucked me over to Carson. A very interesting discussion about SSDs (and other subjects) ensued.

Highlights included: Read more

October 23, 2008

How to tell Teradata’s product lines apart

Once Netezza hit the market, Teradata had a classic “disruptive” price problem – it offered a high-end product, at a high price, sporting lots of features that not all customers needed or were willing to pay for. Teradata has at times slashed prices in competitive situations, but there are obvious risks to that, especially when a customer already has a number of other Teradata systems for which it paid closer to full price.

This year, Teradata has introduced a range of products that flesh out its competitive lineup. There now are three mainstream Teradata offerings, plus two with more specialized applicability. Teradata no longer has to sell Cadillacs to customers on Corolla budgets.

But how do we tell the five Teradata product lines apart? The names are confusing, both in their hardware-vendor product numbers and their data-warehousing-dogma product names, especially since in real life Teradata products’ capabilities overlap. Indeed, Teradata executives freely admit that the Teradata Data Mart Appliance 551 can run smaller data warehouses, while the Teradata Data Warehouse Appliance 2550 is positioned in large part at what Teradata quite reasonably calls data marts.

When one looks past the difficulties of naming, Teradata’s product lineup begins to make more sense. Let’s start by considering the three main Teradata products.

Read more

October 22, 2008

Update on Aster Data Systems and nCluster

On my West Coast swing last week, I spent a few hours at Aster Data, which has now officially put out Version 3 of nCluster. Highlights included:

Read more

October 22, 2008

Introduction to Kickfire

I’ve spent a few hours visiting or otherwise talking with my new clients at Kickfire recently, so I think I have a better feel for their story. A few details are still missing, however, either because I didn’t get around to asking about them, or because an unexplained accident corrupted my notes (and I wasn’t even using Office 2007). Highlights include:

Read more

October 20, 2008

Coral8 proposes CEP as a BI data platform

It used to be that Coral8 and StreamBase were the two complex event/stream processing (CEP) vendors most committed to branching out beyond the super-low-latency algorithmic trading market. But StreamBase seems to have pulled in its horns after a management change, focusing much more on the financial market (and perhaps the defense/intelligence market as well). Aleri, Truviso, and Progress Apama, while each showing signs of branching out, don’t seem to have gone as far as Coral8 yet. And so, though it’s a small company with not all that many dozens of customers, my client Coral8 seems to be the one to look at when seeing whether CEP really is relevant to a broad range of mainstream – no pun intended – applications.

Coral8 today unveiled a new product release – the not-so-concisely named “Coral8 Engine and Portal Release 5.5” – and a new buzzphrase — “Continuous Intelligence.” The interesting part boils down to this:

Coral8 is proposing CEP — excuse me, “Continuous Intelligence” — as a data-store-equivalent for business intelligence.

This includes both operational BI (the current sweet spot) and dashboards (the part with cool, real-time-visualization demos).

Read more

October 17, 2008

Oracle notes

I spent about six hours at Oracle today — talking with Andy Mendelsohn, Ray Roccaforte, Juan Loaiza, Cetin Ozbutun, et al. — and plan to write more later. For now, let me pass along a few quick comments. Read more

October 15, 2008

Teradata’s Petabyte Power Players

As previously hinted, Teradata has now announced 4 of the 5 members of its “Petabyte Power Players” club.  These are enterprises with 1+ petabyte of data on Teradata equipment.  As is commonly the case when Teradata discusses such figures, there’s some confusion as to how they’re actually counting.  But as best I can tell, Teradata is counting: Read more

October 15, 2008

Vertica offers some more numbers

Eric Lai interviewed Dave Menninger of Vertica.  Highlights included:

October 14, 2008

Teradata Virtual Storage

One of the big features of Teradata 13.0, announced this week (Edit: and to be shipped some time in 2009), is Teradata Virtual Storage, which sounds pretty cool. So far as I can tell, Teradata Virtual Storage has two major aspects, namely: Read more

October 14, 2008

Teradata Geospatial, and datatype extensibility in general

As part of its 13.0 release this week, Teradata is productizing its geospatial datatype, which previously was just a downloadable library. (Edit: More precisely, Teradata announced 13.0, which will actually be shipped some time in 2009.) What Teradata Geospatial now amounts to is:

Teradata also intends in the future to implement actual geospatial indexing; candidates include R-trees and tessellation.

Hearing this was a good wake-up call for me, because in the past I’ve conflated two issues on datatype extensibility, namely:

But as Teradata just pointed out, those two issues can indeed be separated from each other.
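
For readers unfamiliar with the tessellation candidate mentioned above, here is a minimal sketch of the general idea: space is cut into fixed-size cells, each point is filed under its cell, and a bounding-box query inspects only the cells the box overlaps. This is a generic illustration, not a description of Teradata's design; the class name `GridIndex`, its methods, and the sample points are all hypothetical.

```python
import math
from collections import defaultdict


class GridIndex:
    """Toy tessellation index: fixed square cells over a 2-D plane."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (cell_x, cell_y) -> list of points

    def _cell(self, x: float, y: float) -> tuple:
        # Map a coordinate to the integer id of the cell that contains it.
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, name: str, x: float, y: float) -> None:
        self.cells[self._cell(x, y)].append((name, x, y))

    def query_box(self, xmin: float, ymin: float, xmax: float, ymax: float) -> list:
        # Visit only the cells the box overlaps, then filter points exactly.
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for name, x, y in self.cells.get((cx, cy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(name)
        return sorted(hits)


index = GridIndex(cell_size=10.0)
index.insert("a", 1.0, 1.0)
index.insert("b", 12.0, 3.0)
index.insert("c", 55.0, 55.0)
print(index.query_box(0.0, 0.0, 15.0, 5.0))  # prints ['a', 'b']
```

An R-tree takes a different approach, grouping nearby objects into nested bounding rectangles rather than using a fixed grid, which tends to cope better with unevenly distributed data.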

October 14, 2008

Quick guide to Teradata’s announcements this week

The Teradata Partners (i.e., user) conference is this week.  So there have been lots of press releases, some presentations, lots of meetings, and so on.  A lot of Teradata’s messaging is in flux, as it moves fairly rapidly to correct what I believe have been some deficiencies in the past.  One confusing result is that there was very little prebriefing about the actual announcement details, and we’re all scrambling to figure out what’s up.

Teradata does a good job of collecting its press releases at one URL.  So without linking to most of them individually, let me jump in to an overview of Teradata news this week (whether or not in actual press release format): Read more

October 11, 2008

A data warehouse pricing complication: Software vs. appliances

Juan Loaiza of Oracle disagrees with a number of my opinions. We plan to talk about some of that when I visit on Thursday, after Teradata Partners. :) But I’d like to throw one of his ideas out there right now. Juan contends that comparisons of Oracle Exadata pricing are apt to be misleading because — among other reasons — Oracle licenses can be reused on other hardware, in ways that appliance software can not. (The same reasoning would of course apply to almost everybody else except Teradata and Netezza.) Read more

October 11, 2008

Patrick Walravens’ SAP/Teradata speculation doesn’t make much sense

A persistent analyst named Patrick Walravens keeps speculating about an SAP acquisition of Teradata. So far as I can tell, Walravens is the sole source of this rumor, evidently because he actually thinks the combination would make some kind of business sense.

An example of the “logic” behind this theory is:

Mr. Walravens’s latest evidence pointing to such a move stems from the expected departure of a SAP executive who had been running the company’s NetWeaver software line, which includes a data warehouse package.

At a guess, Walravens is saying that Teradata’s products and SAP’s BI Accelerator somehow substitute for each other in the marketplace. If you believe that comparison, I’d like to sell you a railroad locomotive made by Jaguar. Read more

October 9, 2008

Aster Data on online marketing data warehousing

Aster Data’s blog is getting to be like Vertica’s, in that I find myself recommending a large fraction of its posts.

The virtue of the latest one is that it strings together several customer examples in related areas of online marketing (which is pretty much the only sector Aster has so far sold into). I’ve tended to overgeneralize a bit, and use terms like “web analytics” or “clickstream analysis” even when they don’t wholly apply. The Aster post is a good antidote to that.

October 5, 2008

Advance sound bites on the Microsoft/DATAllegro announcement

Microsoft said they’d prebrief me on at least the DATAllegro part of tomorrow’s SQL Server announcements, but that didn’t turn out to happen (at least as of 9 pm Eastern time Sunday night). An embargoed press release did just arrive, but it’s so concise and high-level as to contain almost nothing of interest.

So I might as well post sound bites in advance. Here goes:

I’m going to be pretty busy Monday anyway. Linda is having a bit of oral surgery. And if I get back from that in time, I have calls set up with a couple of clients.

October 2, 2008

History, focus, and technology of HP Neoview

On the basis of market impact to date, HP Neoview is just another data warehouse market participant – a dozen sales or so, a few systems in production, some evidence that it can handle 100 TB+ workloads, and so on. But HP’s BI Group CTO Greg Battas thinks Neoview is destined for greater things, because:

Read more

October 2, 2008

HP Neoview in the market to date

I evidently got HP’s attention by a recent post in which I questioned its stance on the relative positioning of the Exadata-based HP Oracle data warehouse appliance and the HP Neoview data warehouse appliance. A conversation with Greg Battas and John Miller (respectively CTO and CMO of HP’s BI group) quickly ensued. Mainly we talked about Neoview product goals and architecture. But before I get to that in a separate post, here are some Neoview market-presence highlights, so far as I’ve been able to figure them out:

Read more

October 1, 2008

Automatic redistribution of data warehouse data

In a recent Oracle Exadata FAQ, Kevin Closson writes:

Q. [...] don’t some of the DW vendors split the data up in a shared nothing method. Thus when the data has to be repartitioned it gets expensive. Whereas here you just add another cell and ASM goes to work in the background. (depending upon the ASM power level you set.)
A. All the DW Appliance vendors implement shared-nothing so, yes, the data is chopped up into physical partitions. If you add hardware to increase performance of queries against your current dataset the data will have to be reloaded into the new partitioning scheme. As has always been the case with ASM, adding new disks-and therefore Exadata Storage Server cells-will cause the existing data to be redistributed automatically over all (including the new) drives. This ASM data redistribution is an online function.

Hmm. That sounds much like the story I’ve heard from various other data warehousing DBMS vendors as well.

Rather than try to speak for them, however, I’ll just post this and see whether they choose to add anything to the comment thread.
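
To make the repartitioning concern in the quoted FAQ concrete, here is a small sketch of the simplest possible shared-nothing placement scheme, where a row lives on node `key % n_nodes`. It is an assumption for illustration only and does not represent how any particular vendor distributes data; the function names are hypothetical. The sketch counts how many rows would have to change nodes when a fifth node is added to a four-node system.

```python
def node_for(key: int, n_nodes: int) -> int:
    # Naive placement: the row's node is its key modulo the node count.
    return key % n_nodes


def fraction_moved(keys, old_nodes: int, new_nodes: int) -> float:
    # Share of rows whose assigned node differs after the cluster is resized.
    moved = sum(1 for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes))
    return moved / len(keys)


keys = list(range(1000))  # hypothetical row keys
frac = fraction_moved(keys, old_nodes=4, new_nodes=5)
print(f"{frac:.0%} of rows change nodes")  # prints "80% of rows change nodes"
```

Under this naive scheme most of the data moves on any resize. Whether that movement happens as an offline reload or as an online background redistribution is a separate question, and it is the one at issue between the vendors.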

October 1, 2008

Greenplum pricing

Edit: Actually, this post is completely incorrect. The $20K/terabyte is for software only. So far, my attempts to get Greenplum to estimate hardware costs have been unsuccessful.

Greenplum’s Scott Yara was recently quoted citing a $20K/terabyte figure for Greenplum pricing. That naturally raises the question:

Greenplum charges around $20K/terabyte of what?

Read more

September 30, 2008

Oracle Database Machine and Exadata pricing: Part 2

My Oracle Database Machine and Exadata pricing spreadsheet has been updated. Specifically:

Read more
