Data warehouse appliances

Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:

August 18, 2008

Three happy 100 terabyte-plus customers for DATAllegro

Over on my Network World blog, I asked the question “So who are DATAllegro’s actual current customers?” As regular readers know, that’s a fairly hard question to answer. TEOCO is widely known as DATAllegro’s flagship reference, but after that the list gets thin in a hurry.

As a by-the-by to other discussions, DATAllegro CEO Stuart Frost undertook to respond in part himself. Specifically, he gave me the names of two other happy customers that are or imminently will be running DATAllegro against 100+ terabytes of user data. Read more

August 9, 2008

Netezza update

In my usual dual role, I called Phil Francisco of Netezza to lay some post-Microsoft/DATAllegro consulting on him late on a Friday night, and then took the opportunity of being on the phone with him to get a general Netezza update. Netezza’s July quarter just ended, so they’re still in a quiet period, and I didn’t press him for a lot of numerical detail. More generally, I didn’t find out a lot that wasn’t already covered in my May Netezza update. But notwithstanding all those disclaimers, it was still a pretty interesting chat.

My strongest takeaway was that Netezza sees concurrency as a significant competitive advantage. This is reflected in POCs, where Netezza guides prospects to simulate real-life mixed workloads. It also reflects the Netezza customer base. Phil says Netezza has “busy” warehouses with up to 80 terabytes of user data, with lots of busy ones in the single-digit to 20ish terabyte range. Multiple Netezza references have 100s of concurrent users, and the 1000 mark has been crossed.

Speaking of concurrency, Phil had a clear opinion of the typical Sybase IQ installation — a small reporting mart, supporting hundreds or thousands of users, but probably not a lot of ad hoc query. On the other hand, he recalls outright competing against Sybase only twice in the past year.

The vendor Netezza does see the most is, no surprise, Oracle. He put Oracle at 60ish percent, with most of the rest divided among Teradata and DB2 (only a few Microsoft SQL Server). Among the other new data warehouse specialists, Greenplum comes up the most often. (There was some confusion between “competitor” and “incumbent” in our discussion, and the sample sizes are small anyway, so fine levels of detail shouldn’t be taken too seriously.)

On the advanced analytics side, it sounds as if SAS integration akin to Teradata’s will happen sooner than any significant integration of Netezza’s own NuTech acquisition.

July 24, 2008

How will Oracle save its data warehouse business?

By acquiring DATAllegro, Microsoft has seriously leapfrogged Oracle in data warehouse technology. All doubts about maturity and versatility notwithstanding, DATAllegro has a 10X or better size advantage (actually, I think it’s more like 20-40X) versus Oracle in warehouses its technology can straightforwardly handle. Oracle cannot afford to let this move go unanswered.

It’s of course possible that Oracle has been successfully developing comparable data warehouse technology internally. But it’s unlikely. Oracle hasn’t done anything that radical, internally and successfully, for about 15 years, RAC (Real Application Clusters) excepted. (I.e., since the object/relational extensibility framework started in Release 7.) So in all likelihood, the answer will come via acquisition. I think there are four candidates that make the most sense: Teradata, Vertica, ParAccel, and Greenplum. Kognitio (controlled by former Oracle honcho Geoff Squire) might be in the mix as well. Netezza is probably a non-starter because of its hardware-centric strategy.

Here’s why I’m emphasizing Teradata, Vertica, ParAccel, and Greenplum:

Read more

July 24, 2008

Microsoft is buying DATAllegro

I’ve long argued that:

Microsoft has now validated my claim by agreeing to buy DATAllegro. As you probably know, we’ve been covering DATAllegro extensively, as per the links listed below.

Basic deal highlights include:

Read more

July 3, 2008

Declaration of Data Independence (humor)

The data warehouse appliance industry has a well-developed funny bone. Dataupia’s contribution is a Declaration of Data Independence, which begins:

When in the Course of an increasingly competitive global economy it becomes necessary for one data set to dissolve its connections to a constraining environment, the separate but inherently unequal station to which the Laws of Whose budget is larger prevails.

Related links:

June 28, 2008

Oracle Optimized Warehouse Initiative

Oracle’s response to data warehouse appliances — and to IBM’s BCUs (Balanced Configuration Units) — so far is the Oracle Optimized Warehouse Initiative (OOW, not to be confused with Oracle Open World). A small amount of information about Oracle Optimized Warehouse can be found on Oracle’s website. Another small amount can be found in this recent long and breathless TDWI article, full of such brilliancies as attributing to the data warehouse appliance vendors the “claim that relational databases simply aren’t cut out for analytic workloads.” (Uh, what does he think they’re running — CODASYL DBMS?)

So far as I can tell, what Oracle Optimized Warehouse — much like IBM’s BCU — boils down to is the same old Oracle DBMS, but with recommended hardware configuration and tuning parameters. Thus, a lot of the hassle is taken out of ordering and installing an Oracle data warehouse, which is surely a good thing. But I doubt it does much to solve Oracle’s problems with price, price/performance, or the inevitable DBA hassles derived from a poorly-performing DBMS.

May 24, 2008

DATAllegro on compression

DATAllegro CEO Stuart Frost has been blogging quite a bit recently (and not before time!). A couple of his posts have touched on compression. In one he gave actual numbers for compression, namely:

DATAllegro compresses between 2:1 and 6:1 depending on the content of the rows, whereas column-oriented systems claim 4:1 to 10:1.

In another recent post, Stuart touched on architecture, saying:

Due to the way our compression code works, DATAllegro’s current products are optimized for performance under heavy concurrency. The end result is that we don’t use the full power of the platform when running one query at a time.

Read more

May 23, 2008

Data warehouse appliance power user TEOCO

If you had to name super-high-end users of data warehouse technology, your list might start with a few retailers, credit data processors, and telcos, plus the US intelligence establishment. Well, it turns out that TEOCO runs outsourced data warehouses for several of the top US telcos, making it one of the top data warehouse technology users around.

A few weeks ago, I had a fascinating chat with John Devolites of TEOCO. Highlights included:

May 22, 2008

Netezza on compression

Phil Francisco put up a nice post on Netezza’s company blog about a month ago, explaining the Netezza compression story. Highlights include:

Read more

May 20, 2008

Netezza has an EMC deal too

Netezza has an EMC deal too. As befits a hardware vendor, Netezza has an actual OEM relationship with EMC, in which it is offering CLARiiONs built straight into NPS appliances. 5 TB of CLARiiON will be free in any Netezza system from 2 racks on upward. (A rack holds about 12.5 TB.) In addition, you’ll be able to buy 10 TB more of CLARiiON in every Netezza rack, if you want. The whole thing is supposed to ship before year-end. Read more
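
To make the capacity arithmetic concrete, here is a minimal Python sketch. It assumes one reading of the offer: the free 5 TB is per system (and only at 2 racks or more), while the optional 10 TB is per rack. The function and constant names are illustrative, not anything from Netezza or EMC.

```python
# Figures taken from the post; how they combine is an assumption.
RACK_TB = 12.5                        # approximate native capacity per rack
FREE_CLARIION_TB = 5.0                # assumed to be per system, not per rack
MIN_RACKS_FOR_FREE = 2                # free CLARiiON starts at 2 racks
OPTIONAL_CLARIION_TB_PER_RACK = 10.0  # extra CLARiiON that can be bought per rack


def capacity_tb(racks, buy_optional=False):
    """Return a breakdown of approximate capacity in TB for a system."""
    if racks < 1:
        raise ValueError("a system needs at least one rack")
    native = racks * RACK_TB
    free = FREE_CLARIION_TB if racks >= MIN_RACKS_FOR_FREE else 0.0
    optional = racks * OPTIONAL_CLARIION_TB_PER_RACK if buy_optional else 0.0
    return {
        "native": native,
        "free_clariion": free,
        "optional_clariion": optional,
        "total": native + free + optional,
    }


if __name__ == "__main__":
    for racks in (1, 2, 4):
        print(racks, capacity_tb(racks, buy_optional=True))
```

Under those assumptions, a 2-rack system with every optional CLARiiON purchased comes to about 50 TB: 25 native, 5 free, and 20 optional.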

May 19, 2008

Netezza, enterprise data warehouses, and the 100 terabyte mark

Phil Francisco of Netezza checked in tonight with some news that will be embargoed for a few hours. While I had him on the phone anyway, I asked him about large databases and/or enterprise data warehouses. Highlights included:

May 19, 2008

ParAccel unveils its EMC-related appliance strategy

Embargoes are getting ever more stupid these days, wasting analysts’ and bloggers’ time in doomed attempts to micromanage the news flow. ParAccel is no exception to the rule. An announcement that’s actually been public knowledge for a couple of months was finally made official a few minutes ago. It’s an appliance, or at least an attempt to gain customers for an appliance. The core ideas include:

April 25, 2008

Yet another data warehouse database and appliance overview

For a recent project, it seemed best to recapitulate my thoughts on the overall data warehouse specialty DBMS and appliance marketplace. While what resulted is highly redundant with what I’ve posted in this blog before, I’m sharing anyway, in case somebody finds this integrated presentation more useful. The original is excerpted to remove confidential parts.

… This is a crowded market, with a lot of subsegments, and blurry, shifting borders among the subsegments.

Everybody starts out selling consumer marketing and telecom call-detail-record apps. …

Oracle and similar products are optimized for updates above everything else. That is, short rows of data are banged into tables. The main indexing scheme is the “b-tree,” which is optimized for finding specific rows of data as needed, and also for being updated quickly in lockstep with updates to the data itself.

By way of contrast, an analytic DBMS is optimized for some or all of:

Database and/or DBMS design techniques that have been applied to analytic uses include:

Read more

April 21, 2008

DATAllegro finally has a blog

It took a lot of patient nagging, but DATAllegro finally has a blog. Based on the first post, I predict:

The crunchiest part of the first post is probably:

Another very important aspect of performance is ensuring sequential reads under a complex workload. Traditional databases do not do a good job in this area - even though some of the management tools might tell you that they are! What we typically see is that the combination of RAID arrays and intervening storage infrastructure conspires to break even large reads by the database into very small reads against each disk. The end result is that most large DW installations have very large arrays of expensive, high-speed disks behind them - and still suffer from poor performance.

I’ve pounded the table about sequential reads multiple times — including in a (DATAllegro-sponsored) white paper — but the point about misleading management tools is new to me.
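
A rough way to see how striping chops up a large read is sketched below in Python. It assumes plain RAID 0-style striping with a hypothetical 64 KiB stripe unit across 8 disks; real arrays add parity, caching, and controller-level merging, none of which is modeled here, and `split_read` is an illustrative name rather than any vendor's API.

```python
KIB = 1024


def split_read(offset, length, stripe_unit, num_disks):
    """Split one logical read into per-disk pieces under simple striping.

    Returns a list of (disk, disk_offset, piece_length) tuples, in the
    order the logical bytes are laid out.
    """
    pieces = []
    pos = offset
    end = offset + length
    while pos < end:
        unit_index = pos // stripe_unit   # which stripe unit, counted globally
        within = pos % stripe_unit        # position inside that stripe unit
        disk = unit_index % num_disks     # units rotate round-robin over disks
        row = unit_index // num_disks     # how many units precede it on that disk
        piece_len = min(stripe_unit - within, end - pos)
        pieces.append((disk, row * stripe_unit + within, piece_len))
        pos += piece_len
    return pieces


if __name__ == "__main__":
    # Hypothetical layout: 64 KiB stripe unit, 8 disks, one 1 MiB read.
    pieces = split_read(0, 1024 * KIB, 64 * KIB, 8)
    print(len(pieces), "pieces; largest is", max(p[2] for p in pieces) // KIB, "KiB")
```

In this toy layout, a single 1 MiB database read turns into 16 separate 64 KiB requests, two per disk. Whether those pieces stay sequential on each disk then depends on what else is hitting the array at the same time, which the sketch does not attempt to simulate.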

Now if I could just get a production DATAllegro reference, I’d be completely happy …

April 21, 2008

Netezza pricing

In connection with the announcement of the Teradata 2500, I asked some Teradata competitors about pricing. Netezza’s response amounted to “We don’t disclose list pricing, but our cheapest system handles about 3 1/4 TB and sells for under $200K.” So Netezza’s actual pricing is well below the list price of the Teradata 2500.

April 21, 2008

Teradata introduces lower-cost appliances

After months of leaks, Teradata has unveiled its new lines of data warehouse appliances, raising the total number either from 1 to 3 (my view) or 0 to 2 (what you believe if you think Teradata wasn’t previously an appliance vendor). Most significant is the new Teradata 2500 series, meant to compete directly with the smaller data warehouse specialists. Highlights include:

Read more

April 18, 2008

Kickfire kicks off

I chatted with Raj Cherabuddi and others on the Kickfire (formerly C2) team for over an hour on Monday, and now have a better sense of their story. There are some very basic questions I still don’t have answers to; I’ll fill those in when I can.

Highlights of what I have and haven’t figured out so far include:

*Somebody – perhaps adman extraordinaire Rick Bennett? — may want to check my memory on this, but I think Oracle’s famed “Gentlemen, start your snails” ad in the early 1990s was about PC World tests, not TPCs. Oracle also had an ad about WW1-style planes nosediving, but I don’t think those referenced TPCs either.

April 8, 2008

Kickfire is de-cloaking

Kickfire, the renamed C2, is doing one of those buzz-building rollouts in which they make sure the first word comes from people on their payroll golly-gee-whizzing. You can see those at Xarpb and Diamond Notes, as well as a forthcoming article in MySQL magazine. Farhan Mashraqi also appears to be involved. Kickfire is also sponsoring the MySQL user conference next week.

I plan to write more after I get some substance, but a few things seem clear:

1. Kickfire’s product is an appliance that functions as a MySQL storage engine.
2. There’s a custom chip involved.
3. Kickfire plans to throw around the “stream processing” buzzphrase a lot.

Now, “stream processing” means a lot of different things to different people. E.g., Netezza uses the phrase just because their FPGA throws away a lot of data before ever routing it to more conventional SQL processing. But pending a briefing, I’m guessing that Kickfire’s sense is similar to what underlies the case for using CEP in BI.
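
As a loose illustration of the "throw data away early" sense of stream processing, the Python sketch below filters and projects rows as they stream past, so that the downstream aggregation only sees what survives. This is a software analogy under simplifying assumptions, not a description of how Netezza's FPGA or Kickfire's chip actually works; the row layout and function names are made up.

```python
def scan(table):
    """Stand-in for rows streaming off disk, one at a time."""
    for row in table:
        yield row


def restrict_and_project(rows, predicate, columns):
    """Drop rows that fail the predicate and keep only the named columns."""
    for row in rows:
        if predicate(row):
            yield {name: row[name] for name in columns}


# Hypothetical sample data, purely for illustration.
SALES = [
    {"region": "east", "product": "a", "amount": 10},
    {"region": "west", "product": "b", "amount": 7},
    {"region": "east", "product": "c", "amount": 25},
    {"region": "north", "product": "a", "amount": 3},
]


def total_for_region(table, region):
    """Aggregate over only the rows and columns that survive the early filter."""
    survivors = restrict_and_project(
        scan(table), lambda r: r["region"] == region, ["amount"]
    )
    return sum(r["amount"] for r in survivors)


if __name__ == "__main__":
    print(total_for_region(SALES, "east"))
```

Because both stages are generators, no intermediate copy of the table is built; each row is examined once and either discarded or passed along in reduced form.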

Edit: Here’s an update after an actual Kickfire briefing.


April 5, 2008

Positioning the data warehouse appliances and specialty DBMS

There now are four hardware vendors that each offer or seem about to announce two different tiers of data warehouse appliances: Sun, HP, EMC, and Teradata. Specifically:

Read more

April 5, 2008

EMC is partnering with ParAccel

A talk about a ParAccel/EMC partnership has been promised for a forthcoming EMC user conference. Otherwise, ParAccel is exposing no useful information on the matter.*

*So what else is new?

The talk is called Highly Scalable Analytic Appliance Powered by EMC and ParAccel, and the abstract says: Read more

April 1, 2008

Netezza’s April Fool press release

Short and cute. Even makes a genuine marketing point (low power consumption), and ties into past marketing gimmicks (they’ve played Pimp My SPU in the past, with dramatic paint jobs).

Netezza Corporation (NYSE Arca: NZ), the global leader in data warehouse and analytic appliances, today introduced a limited-edition range of its award-winning Netezza system. Expected to become an instant industry collectible, the systems can now be purchased in a variety of color finishes – pink, blue, red or silver. The standard gun-metal gray unit will continue to be the default option for orders requiring eight or more units, to ensure availability.

Affectionately known as ‘the Netezza’ by customers and partners, the systems not only offer unparalleled processing performance, but the secret sauce of its innovative design is also leading the way in effective power and cooling management – making it a truly green option for any data center.

Not earth-shaking – even if it purports to be earth-saving – but unless I’ve overlooked a biggie, there isn’t much competition in this rather lame April Fool’s year.

March 28, 2008

Disruption versus chasm crossing in the database market

The 451 Group just released a report on open source DBMS adoption. In a blog post announcing same, Matthew Aslett wrote (emphasis mine):

you only have to look at the comparative revenues of the open source and proprietary vendors to see that there is a vast chasm to be crossed.

“Chasm” memes were introduced by Geoffrey Moore, founder of the Chasm Group and author of Crossing the Chasm. His defining example was Oracle, and the database market in general. The core insight was that platform markets get to tipping points, after which the leaders have tremendous advantages that make them tend to remain leaders for a good long time.

The sequel to “chasm” theory is Clayton Christensen’s “disruption” rubric, popularized in The Innovator’s Dilemma. I’ve argued previously that the DBMS market is being disrupted, in both the ways that Christensen records: Read more

March 14, 2008

Data warehousing with paper clips and duct tape

An interesting part of my conversation with Dataupia’s CTO John O’Brien came when we talked about data warehousing in general. On the one hand, he endorsed the view that using Oracle probably isn’t a good idea for data warehouses larger than 10 terabytes, with SQL Server’s limit being well below that. On the other hand, he said he’d helped build 50-60 terabyte warehouses in Oracle years ago.

The point is that to build warehouses that big in Oracle or other traditional DBMS, you have to pull out a large bag of tricks. Read more

March 14, 2008

Dataupia catch-up

I had a catch-up phone meeting with Dataupia, since I hadn’t spoken with the company since the middle of last year. Like several other companies in the data warehouse specialist market, Dataupia can be annoyingly secretive. On the plus side – and this is very refreshing – Dataupia doesn’t seem to expect credit for accomplishments beyond those they’re willing to provide actual evidence for.

What I’ve gleaned about Dataupia’s customer activity to date amounts to: Read more

February 26, 2008

The biggest eBay database

There’s been some confusion over my post about eBay’s multiple petabytes of data. So to clarify, let me say:
