October 31, 2007

Netezza cites three warehouses over 50 terabytes

Netezza is finally making it clear that they run some largish warehouses. Their latest press release cites Catalina Marketing, Epsilon, and NYSE Euronext as having 50+ terabytes each. I checked with Netezza’s Marketing VP Ellen Rubin, and she confirmed that those are clean figures — user data, single warehouses, etc. Ellen further tells me that Netezza’s total count of warehouses that big is “significantly more” than the 3 named in the release.

Of course, this makes sense, given that Netezza’s largest box, the NPS 10800, runs 100 terabytes. And Catalina was named as having bought a 10800 in a press release back in December, 2006. Read more

Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Netezza

October 29, 2007

ParAccel opens the kimono slightly

Please do not rely on the parts of this post that draw a distinction between in-memory and disk-based operation. See our February 18, 2008 post about ParAccel instead. It turns out that communication with ParAccel was yet worse than I had realized.

Officially launched today at the TDWI conference, ParAccel is out to compete with Netezza. Right out of the chute, ParAccel may have surpassed Netezza in at least one area: pointlessly annoying secrecy. (In other regards I love them dearly, but that paranoia can be a real pain.) As best I can remember, here are some things about ParAccel that I both am allowed to say and find interesting:

ParAccel offers a columnar, MPP data warehouse DBMS, called the ParAccel Analytic Database.
ParAccel’s product runs in two main modes. “Maverick” is normal, stand-alone mode. “Amigo” mode amounts to a plug-compatible accelerator for Oracle or Microsoft SQL*Server. Early sales and marketing were concentrated on SQL*Server Amigo mode.
ParAccel’s product also runs in another pair of modes – in-memory and disk-based. Early sales and marketing were concentrated on in-memory mode. Hybrid memory-centric processing sounds like something for a future release.
Sun has a reseller partnership with ParAccel, focused on in-memory mode.
Sun and ParAccel published record-shattering 100 gigabyte, 300 gigabyte, and 1 terabyte TPC-H benchmarks today, based on in-memory mode. (If you’d like to throw 13 terabytes of disk at 1 terabyte of user data, running simple and repetitive queries, that benchmark might be a useful guide to your own experience. But hey – that’s a big improvement on the prior champion, who used 40 terabytes of disk. To ParAccel’s credit, they’re not pretending that this is a bigger deal than it is.)

Categories: Analytic technologies, Columnar database management, Data warehouse appliances, Data warehousing, Emulation, transparency, portability, Microsoft and SQL*Server, Oracle, ParAccel

October 28, 2007

Infobright responds

An Infobright employee posted something quite reasonable-looking in response to my inaugural post about BrightHouse. Even so, Infobright asked if they could substitute something with a slightly different tone. I agreed. Here’s what they sent in.

Curt, thanks for the write-up and the opportunity to talk about our customer success stories. As you say, our customer story is definitely “more than zero.” We are addressing a number of critical customer issues with our unique approach to data warehousing.

Infobright currently has 5 customers – customers that have bucked the trend of throwing hardware at the problem. To be perfectly braggadocio about this, we have never lost a competitive proof of concept in which we’ve been engaged. This is accomplished with the horsepower of one box (though for redundancy customers may deploy multiple boxes with a load balancer). Read more

Categories: Analytic technologies, Columnar database management, Data warehousing, Database compression, Infobright

Dude, you stole my joke!

October 15: We know what BEA is — now it is just a matter of negotiating the price

October 25: We’ve already established what you are, now we’re just working out a price

The news in the latter is that BEA has admitted it.

Note: Of course, the original joke is so old as to be variously attributed to all of George Bernard Shaw (most credibly), Winston Churchill, and Oscar Wilde.

Categories: Application servers, Humor, Oracle

DATAllegro discloses a few numbers

Privately held DATAllegro just announced a few tidbits about financial results and suchlike for the fiscal year ended June, 2007. I sent over a few clarifying questions yesterday. Responses included:

Yes, the company experienced 330% year-over-year annual revenue growth.
The majority of DATAllegro customers have bought systems in the 25-100 terabyte range.
One system over 250 terabytes has been in production for months (surely the one I previously wrote about); a second is being installed.
DATAllegro has “about 100” employees. By way of comparison, Netezza reported 225 full-time employees for the year ended January, 2007 – which probably means as of January 31, 2007.

All told, it sounds as if DATAllegro is more than 1/3 the size of Netezza, although given its higher system size and price points I’d guess it has well under 1/3 as many customers.

Here’s a link. I’ll likely edit that to something more permanent-seeming later, and generally spruce this up when I’m not so rushed.

Categories: Analytic technologies, Data warehouse appliances, Data warehousing, DATAllegro

October 23, 2007

Vertica — just star and snowflake schemas?

One of the longest-running technotheological disputes I know of is the one pitting flat/normalized data warehouse architectures vs. cubes, stars, and snowflake schemas. Teradata, for example, is a flagwaver for the former camp; MicroStrategy is firmly in the latter. (However, that doesn’t keep lots of retailers from running MicroStrategy on Teradata boxes.) Attensity (a good Teradata partner) is in the former camp; text mining rival Clarabridge (sort of a MicroStrategy spinoff) is in the latter. And so on.

Vertica is clearly in the star/snowflake camp as well. I asked them about this, and Vertica’s CTO Mike Stonebraker emailed a response. I’m reproducing it below, with light edits; the emphasis is also mine. Key points include:

Almost everybody (that Vertica sees) wants stars and snowflakes, so that’s what Vertica optimizes for.
Replicating small dimension tables across nodes is great for performance.
Even so, Vertica is broadening its support for more general schemas as well.

Great question. This is something that we’ve thought a lot about and have done significant research on with large enterprise customers. … short answer is as follows:

Vertica supports star and snowflake schemas because that is the desired data structure for data warehousing. The overwhelming majority of the schemas we see are of this form, and we have highly optimized for this case. Read more
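
To make the point about replicating small dimension tables concrete, here is a minimal sketch in Python. It is not Vertica's implementation. It only illustrates the general MPP idea that if every node holds a full copy of a small dimension table, each node can join its own slice of the fact table locally, with no rows shipped between nodes. The table layout (`sale_id`, `store_id`, `amount`), the function names, and the modulo-based distribution are all assumptions made up for the illustration.

```python
from collections import defaultdict


def distribute_fact_rows(fact_rows, num_nodes):
    """Spread fact rows across nodes by sale_id (hypothetical distribution key)."""
    nodes = [[] for _ in range(num_nodes)]
    for row in fact_rows:
        nodes[row["sale_id"] % num_nodes].append(row)
    return nodes


def local_join_and_aggregate(fact_partition, dim_copy):
    """Join one node's fact rows to its local dimension copy; sum amount by region."""
    totals = defaultdict(int)
    for row in fact_partition:
        region = dim_copy[row["store_id"]]  # local lookup, no network hop
        totals[region] += row["amount"]
    return totals


def revenue_by_region(fact_rows, dim_store, num_nodes=3):
    """Simulate a star-schema query over partitioned facts and a replicated dimension."""
    partitions = distribute_fact_rows(fact_rows, num_nodes)
    # Replication: every node gets its own full copy of the small dimension table.
    replicas = [dict(dim_store) for _ in range(num_nodes)]
    merged = defaultdict(int)
    for partition, dim_copy in zip(partitions, replicas):
        for region, amount in local_join_and_aggregate(partition, dim_copy).items():
            merged[region] += amount  # only small partial aggregates are combined
    return dict(merged)


if __name__ == "__main__":
    dim_store = {1: "East", 2: "West"}
    fact_rows = [
        {"sale_id": i, "store_id": 1 + i % 2, "amount": 10} for i in range(6)
    ]
    print(revenue_by_region(fact_rows, dim_store))
```

The trade-off is extra storage for the copies, which is why this is attractive for small dimension tables and not for large ones.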

Categories: Analytic technologies, Columnar database management, Data models and architecture, Data warehousing, Theory and architecture, Vertica Systems

October 23, 2007

Vertica update

Vertica has been quietly selling product for three quarters and has about 50 customers.

Andy Ellicott of Vertica pointed me to the above Richard Hackathorn quote. Sadly, he asked me not to name and shame another analyst who foolishly said Vertica hadn’t “launched” yet.

But then, I understand. I’m also not going to identify the client who gave me fits by insisting on believing that nonsense, even in the face of the well-known facts that Vertica has shipping product, paying customers, and so on.

Categories: Columnar database management, Data warehousing, Vertica Systems

October 23, 2007

Either there’s enormous interest in EnterpriseDB and/or mid-range relational DBMS …

… or else I’m one heck of a webinar draw.

We had 364 attendees for today’s webcast with EnterpriseDB, which is a huge number for that sort of thing.

Categories: EnterpriseDB and Postgres Plus, Mid-range, Open source

October 22, 2007

Infobright BrightHouse — columnar, VERY compressed, simple, and related to MySQL

To a first approximation, Infobright, maker of BrightHouse, is yet another data warehouse DBMS specialist with a columnar architecture, boasting great compression and running on commodity hardware, emphasizing easy set-up, simple administration, great price-performance, and hence generally low TCO. BrightHouse isn’t actually MPP yet, but Infobright confidently promises a generally available MPP version by the end of 2008. The company says that experience shows >10:1 compression of user data is realistic. That is, stored data takes up less than one-tenth the space of the raw user data, for a fractional expansion ratio better than 1/10:1. Accordingly, despite the lack of shared-nothing parallelism, Infobright claims a sweet spot of 1-10 terabyte warehouses, and makes occasional references to figures up to 30 terabytes or so of user data.

BrightHouse is essentially a MySQL storage engine, and hence gets a lot of connectivity and BI tool support features from MySQL for “free.” Beyond that, Infobright’s core technical idea is to chop columns of data into 64K chunks, called data packs, and then store concise information about what’s in the packs. The more basic information is stored in data pack nodes,* one per data pack. If you’re familiar with Netezza zone maps, data pack nodes sound like zone maps on steroids. They store maximum values, minimum values, and (where meaningful) aggregates, and also encode information as to which intervals between the min and max values do or don’t contain actual data values. Read more

Categories: Analytic technologies, Columnar database management, Data warehousing, Database compression, Infobright, MySQL, Open source

October 22, 2007

Native XML performance, and Philip Howard on recent IBM DBMS announcements

Philip Howard went to at least one conference this month I didn’t, namely IBM’s, and wrote up some highlights. As usual, he seems to have been favorably impressed.

In one note, he says that IBM is claiming a 2-5X XML performance improvement. This is a good step, since one of my clients who evaluated such engines dismissed IBM early on for being an order of magnitude too slow. That client ultimately chose MarkLogic, with Cache’ having been the only other choice to make the short list.

Speaking of IBM, I flew back from the Business Objects conference next to a guy who supports IMS. He told me that IBM has bragged of an actual new customer win for IMS within the past couple of years (a large bank in China). Read more

Categories: IBM and DB2, Intersystems and Cache', MarkLogic, Structured documents

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.
