October 31, 2007

Netezza cites three warehouses over 50 terabytes

Netezza is finally making it clear that they run some largish warehouses. Their latest press release cites Catalina Marketing, Epsilon, and NYSE Euronext as having 50+ terabytes each. I checked with Netezza’s Marketing VP Ellen Rubin, and she confirmed that those are clean figures — user data, single warehouses, etc. Ellen further tells me that Netezza’s total count of warehouses that big is “significantly more” than the 3 named in the release.

Of course, this makes sense, given that Netezza’s largest box, the NPS 10800, runs 100 terabytes. And Catalina was named as having bought a 10800 in a press release back in December, 2006.

October 29, 2007

ParAccel opens the kimono slightly

Please do not rely on the parts of this post that draw a distinction between in-memory and disk-based operation. See our February 18, 2008 post about ParAccel instead. It turns out that communication with ParAccel was even worse than I had realized.

Officially launched today at the TDWI conference, ParAccel is out to compete with Netezza. Right out of the chute, ParAccel may have surpassed Netezza in at least one area: pointlessly annoying secrecy. (In other regards I love them dearly, but that paranoia can be a real pain.) As best I can remember, here are some things about ParAccel that I am both allowed to say and find interesting:

October 28, 2007

Infobright responds

An Infobright employee posted something quite reasonable-looking in response to my inaugural post about BrightHouse. Even so, Infobright asked if they could substitute something with a slightly different tone. I agreed. Here’s what they sent in.

Curt, thanks for the write-up and the opportunity to talk about our customer success stories. As you say, our customer story is definitely “more than zero.” We are addressing a number of critical customer issues with our unique approach to data warehousing.

Infobright currently has 5 customers - customers that have bucked the trend of throwing hardware at the problem. To be perfectly braggadocio about this, we have never lost a competitive proof of concept in which we’ve been engaged. This is accomplished with the horsepower of one box (though for redundancy customers may deploy multiple boxes with a load balancer).

October 26, 2007

Dude, you stole my joke!

October 15: We know what BEA is — now it is just a matter of negotiating the price

October 25: We’ve already established what you are, now we’re just working out a price

The news in the latter is that BEA has admitted it.

Note: Of course, the original joke is so old as to be variously attributed to all of George Bernard Shaw (most credibly), Winston Churchill, and Oscar Wilde.


October 25, 2007

DATAllegro discloses a few numbers

Privately held DATAllegro just announced a few tidbits about financial results and suchlike for the fiscal year ended June, 2007. I sent over a few clarifying questions yesterday. Responses included:

All told, it sounds as if DATAllegro is more than 1/3 the size of Netezza, although given its higher system size and price points I’d guess it has well under 1/3 as many customers.

Here’s a link. I’ll likely edit that to something more permanent-seeming later, and generally spruce this up when I’m not so rushed.

October 23, 2007

Vertica — just star and snowflake schemas?

One of the longest-running technotheological disputes I know of is the one pitting flat/normalized data warehouse architectures vs. cubes, stars, and snowflake schemas. Teradata, for example, is a flagwaver for the former camp; Microstrategy is firmly in the latter. (However, that doesn’t keep lots of retailers from running Microstrategy on Teradata boxes.) Attensity (a good Teradata partner) is in the former camp; text mining rival Clarabridge (sort of a Microstrategy spinoff) is in the latter. And so on.

Vertica is clearly in the star/snowflake camp as well. I asked them about this, and Vertica’s CTO Mike Stonebraker emailed a response. I’m reproducing it below, with light edits; the emphasis is also mine. Key points include:

Great question. This is something that we’ve thought a lot about and have done significant research on with large enterprise customers. … short answer is as follows:

Vertica supports star and snowflake schemas because that is the desired data structure for data warehousing. The overwhelming majority of the schemas we see are of this form, and we have highly optimized for this case.

October 23, 2007

Vertica update

Vertica has been quietly selling product for three quarters and has about 50 customers.

Andy Ellicott of Vertica pointed me to the above Richard Hackathorn quote. Sadly, he asked me not to name and shame another analyst who foolishly said Vertica hadn’t “launched” yet.

But then, I understand. I’m also not going to identify the client who gave me fits by insisting on believing that nonsense, even in the face of the well-known facts that Vertica has shipping product, paying customers, and so on.

October 23, 2007

Either there’s enormous interest in EnterpriseDB and/or mid-range relational DBMS …

… or else I’m one heck of a webinar draw.

We had 364 attendees for today’s webcast with EnterpriseDB, which is a huge number for that sort of thing.

October 22, 2007

Infobright BrightHouse — columnar, VERY compressed, simple, and related to MySQL

To a first approximation, Infobright, maker of BrightHouse, is yet another data warehouse DBMS specialist with a columnar architecture, boasting great compression and running on commodity hardware, emphasizing easy set-up, simple administration, great price-performance, and hence generally low TCO. BrightHouse isn’t actually MPP yet, but Infobright confidently promises a generally available MPP version by the end of 2008. The company says that experience shows >10:1 compression of user data is realistic, i.e., an expansion ratio (disk space used per unit of user data) that is not just fractional but better than 1:10. Accordingly, despite the lack of shared-nothing parallelism, Infobright claims a sweet spot of 1-10 terabyte warehouses, and makes occasional references to figures up to 30 terabytes or so of user data.
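To put rough numbers on that claim, here is the arithmetic assuming a flat 10:1 ratio. That is an illustrative assumption only; real ratios vary with the data, and Infobright claims better than 10:1.

$$\text{disk footprint} \approx \frac{\text{user data}}{10}: \qquad 10\ \text{TB} \mapsto 1\ \text{TB}, \qquad 30\ \text{TB} \mapsto 3\ \text{TB}$$

At that ratio, even the top of the claimed 1-10 terabyte sweet spot comes to about a terabyte of compressed data for a single box to scan.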

BrightHouse is essentially a MySQL storage engine, and hence gets a lot of connectivity and BI tool support features from MySQL for “free.” Beyond that, Infobright’s core technical idea is to chop columns of data into 64K chunks, called data packs, and then store concise information about what’s in the packs. The more basic information is stored in data pack nodes,* one per data pack. If you’re familiar with Netezza zone maps, data pack nodes sound like zone maps on steroids. They store maximum values, minimum values, and (where meaningful) aggregates, and also encode information as to which intervals between the min and max values do or don’t contain actual data values.
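Here is a minimal Python sketch of how per-pack summaries of that kind can let a query avoid work. Everything in it is an assumption for illustration, not Infobright’s actual data structure: the names PackNode, bucket and sum_between are hypothetical, the 8-way split of each pack’s min..max range is arbitrary, and “64K” is read as 65,536 values. Each pack is classed as irrelevant (skipped), fully inside the query range (answered from its stored SUM), or suspect (the only case where the pack itself has to be read).

```python
from dataclasses import dataclass
from typing import List

PACK_SIZE = 65_536   # reading "64K" as 65,536 values per data pack
N_INTERVALS = 8      # hypothetical: sub-intervals tracked between min and max


def bucket(v, lo, hi):
    """Index of the sub-interval of [lo, hi] that v falls in (clamped)."""
    width = (hi - lo) / N_INTERVALS or 1
    return min(max(int((v - lo) / width), 0), N_INTERVALS - 1)


@dataclass
class PackNode:
    """Hypothetical per-pack summary for one pack of an integer column."""
    minimum: int
    maximum: int
    total: int              # a precomputed aggregate (SUM)
    occupied: List[bool]    # occupied[i] is True if some value lies in sub-interval i

    @classmethod
    def build(cls, values):
        lo, hi = min(values), max(values)
        occupied = [False] * N_INTERVALS
        for v in values:
            occupied[bucket(v, lo, hi)] = True
        return cls(lo, hi, sum(values), occupied)

    def may_contain(self, a, b):
        """False only when no value in [a, b] can possibly be in this pack."""
        if b < self.minimum or a > self.maximum:
            return False
        first = bucket(a, self.minimum, self.maximum)
        last = bucket(b, self.minimum, self.maximum)
        return any(self.occupied[first:last + 1])


def build_packs(column):
    return [PackNode.build(column[i:i + PACK_SIZE])
            for i in range(0, len(column), PACK_SIZE)]


def sum_between(column, packs, a, b):
    """SUM(col) WHERE col BETWEEN a AND b; returns (sum, packs actually read)."""
    total, packs_read = 0, 0
    for i, node in enumerate(packs):
        if not node.may_contain(a, b):
            continue                        # irrelevant pack: skipped entirely
        if a <= node.minimum and node.maximum <= b:
            total += node.total             # fully inside range: summary suffices
            continue
        packs_read += 1                     # suspect pack: must be read
        chunk = column[i * PACK_SIZE:(i + 1) * PACK_SIZE]
        total += sum(v for v in chunk if a <= v <= b)
    return total, packs_read


column = list(range(200_000))               # 4 packs; sorted data for clarity
packs = build_packs(column)
print(sum_between(column, packs, 70_000, 130_000))  # reads 1 of the 4 packs
print(sum_between(column, packs, 0, 131_071))       # reads none: SUMs suffice

# min/max alone would call this pack "suspect" for 30..60; the interval
# information shows that nothing lies in that part of the range.
gappy = PackNode.build([0, 1, 100])
print(gappy.may_contain(30, 60))                    # False
```

The interval flags are what make this more than a plain min/max zone map: in the last example, min and max alone could not rule the pack out, but the occupied-interval information can.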

October 22, 2007

Native XML performance, and Philip Howard on recent IBM DBMS announcements

Philip Howard went to at least one conference this month I didn’t, namely IBM’s, and wrote up some highlights. As usual, he seems to have been favorably impressed.

In one note, he says that IBM is claiming a 2-5X XML performance improvement. This is a good step, since one of my clients who evaluated such engines dismissed IBM early on for being an order of magnitude too slow. That client ultimately chose MarkLogic, with Caché having been the only other choice to make the short list.

Speaking of IBM, I flew back from the Business Objects conference next to a guy who supports IMS. He told me that IBM has bragged of an actual new customer win for IMS within the past couple of years (a large bank in China).

October 20, 2007

Wrinkles in the Informatica versus Business Objects patent litigation

Business Objects recently lost a patent lawsuit to Informatica in the area of data integration. While I was at the Business Objects conference, I asked about it, and was told in effect “It’s no big deal. In fact, the monetary award was reduced. Anyhow, we shipped a non-infringing version within 12 days after the decision, and sales are rolling along.” I then reflected that answer back to Informatica’s stellar analyst relations guy Chas Kielt. He checked with corporate counsel, and sent back the detailed clarification below. Since I got my Business Objects answers from a couple of caught-off-guard non-lawyer French guys, while Chas got a careful explanation of an American court’s judgment from an American lawyer, I’m inclined to think that in any details where they might conflict, Chas’ version is more likely to be accurate.

There’s a more substantive disagreement as to whether the features deleted from BOBJ’s product due to the injunction are actually important in the marketplace. I’m looking into that subject, and hope to post about it in the near future.

October 19, 2007

Webinar on mid-range OLTP DBMS Tuesday October 23 12 noon Eastern time

I’m doing another webinar on mid-range OLTP DBMS next Tuesday, at 12 noon Eastern. It’s sponsored by EnterpriseDB, who also sponsored one six months ago on the same subject. Hopefully, this one will be a bit fresher. Sign up today! The expected turnout is humongous.

October 19, 2007

One Greenplum customer — 35 terabytes and growing fast

I was at the Business Objects conference this week, and as usual went to very few sessions. But one I did stroll into was on “Managing Rapid Growth With the Right BI Strategy.” This was by Reliance Telecommunications, an outfit in India that is adding telecom subscribers very quickly, and consequently banging 100-150 gigs of data per day into a 35 terabyte warehouse.
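For a sense of scale, assuming loads every day with no purging or compression (decimal units, 1 TB = 1,000 GB):

$$100\ \tfrac{\text{GB}}{\text{day}} \times 365\ \text{days} = 36{,}500\ \text{GB} \approx 36.5\ \text{TB}$$
$$150\ \tfrac{\text{GB}}{\text{day}} \times 365\ \text{days} = 54{,}750\ \text{GB} \approx 54.75\ \text{TB}$$

So a single year at that rate would add more data than the entire current 35 terabyte warehouse holds.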

The beginning of the talk astonished me, as the presenter seemed to be saying they were doing all this on Oracle. Hah. Oracle is what they moved away from; instead, they got Greenplum. I couldn’t get details; indeed, as a BI guy he was far enough away from DBMS to misspeak and say that Greenplum was brought in by ‘HP’, before quickly correcting himself when prompted.

October 19, 2007

Gartner 2007 Magic Quadrant for Data Warehouse Database Management Systems

It’s early autumn, the leaves are turning in New England, and Gartner has issued another Magic Quadrant for data warehouse DBMS. The big winners vs. last year are Greenplum and, secondarily, Sybase. Teradata continues to lead. Oracle has also leapfrogged IBM, and there are various other minor adjustments as well, among repeat mentionees Netezza, DATAllegro, Sand, Kognitio, and MySQL. HP isn’t on the radar yet; ditto Vertica.

October 15, 2007

We know what BEA is — now it is just a matter of negotiating the price

After the long Oracle/Peoplesoft drama, I don’t see any likely way the Oracle bid for BEA will end with anything other than a rather rapid acquisition of BEA, probably by Oracle.

But for now it’s not a done deal, as BEA is quite reasonably still haggling about price.

October 12, 2007

SAP is losing crucial managerial talent

In the past month or so, both Dennis Moore and Nimish Mehta have left SAP. Their reasons are well-known among Oracle alumni to be — at least in large part — discomfort with SAP’s direction. (My unnamed sources on that are highly reliable.) And of course Shai Agassi left earlier this year. It now looks as if my contrarian viewpoint pooh-poohing the importance of Shai’s departure was probably wrong.

Based on all that, I don’t think there’s much reason for optimism about SAP’s system software futures, except perhaps for those that are placed wholly under the control of the Business Objects division. NetWeaver? Already a creaking omnibus. MaxDB? They didn’t get it right the first time around; what will be different now? BI Accelerator? That one actually could do well under Business Objects. The dream of other kinds of appliances? Not likely to achieve take-off. TREX? They weren’t really enhancing that much anyway. The rest of the search-related vision Dennis outlined for me? That’s another one that actually could thrive under Business Objects, but I expect a considerable number of false starts at best before they work out a coherent new strategy.

The high-end app business, the new SaaS business, the new Business Objects subsidiary — any and all of those could do well. But the attempts to become a broad-based system software player rivaling Oracle, Microsoft, and/or IBM are looking a lot less healthy than they used to.

October 12, 2007

More on the Oracle-BEA deal

Jeff Nolan has a great post on the Oracle/BEA deal. Yeah, he still has some of his old SAP good/Oracle evil reflexes, but he can be forgiven those and the tinfoilhattishness associated with them. His analysis of sellers’ and buyers’ deal habits is revealing and sound. Ditto the start of his remarks on Oracle product delays and internal politics, and SAP/Oracle competition. Even better, nothing in his analysis seems to disagree with mine. :)

What Oracle now needs to do is make Oracle Application Server be a seamless “upgrade” from Weblogic. Then they can integrate in whatever kitchen-sink stuff they want from Oracle data caching (already there), app and/or dev tool run times, TimesTen, Tangosol, and so on, creating an app server stack that’s a worthy counterpart to the Oracle database in how it meets high-end OLTP needs. Meanwhile, Weblogic should remain as a not-bloated app-server-for-the-rest-of-us.

October 12, 2007

Three ways Oracle or Microsoft could go MPP

I’ve been arguing for a while that Oracle and Microsoft are screwed in high-end data warehousing. The reason is that they’re stuck with SMP (Symmetric Multi-Processing) architectures, while Teradata, Netezza, DATAllegro, and many others enjoy the benefits of MPP (Massively Parallel Processing). Thus, Teradata and DATAllegro boast installations in the hundreds of terabytes each, while Oracle and Microsoft users usually have to perform unnatural acts of hard-coded partitioning even to reach the 10 terabyte level.

That said, there are at least three ways Oracle and/or Microsoft could get out of this technical box:

1. They could buy or just partner with MPP vendors such as Dataupia, who offer plug-compatibility with their respective main DBMS.

2. They could buy whoever they want, plug-compatibility be damned. Presumably, they’d quickly add a light-weight data federation front-end to give the appearance of integration, then merge the products more closely over time.

3. They could develop or buy technology like DATAllegro’s, which essentially federates instances of an ordinary SMP DBMS across nodes of an MPP grid (Greenplum does something similar). I imagine that, for example, ripping Ingres out of DATAllegro and slotting in Oracle instead would be a pretty straightforward exercise; even without dramatic change to any of the optimizations, the resulting port would be something that ran pretty quickly on Day 1.
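A toy Python sketch of the federation idea in point 3 follows, assuming a table hash-partitioned on a customer key. In-memory SQLite databases serve as hypothetical stand-ins for the per-node SMP DBMS instances (DATAllegro actually used Ingres; this is not its code), and names like node_for and revenue_by_region are illustrative. Each node runs an ordinary SQL aggregate on its own slice, and a thin coordinator merges the partial results.

```python
import sqlite3
import zlib

N_NODES = 4  # hypothetical grid size

# Each "node" is an independent, ordinary single-node SQL engine (SQLite here,
# standing in for whatever SMP DBMS the real product federates).
nodes = [sqlite3.connect(":memory:") for _ in range(N_NODES)]
for conn in nodes:
    conn.execute("CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)")


def node_for(customer: str) -> int:
    """Deterministic hash partitioning: the same key always lands on the same node."""
    return zlib.crc32(customer.encode()) % N_NODES


def load(rows):
    for customer, region, amount in rows:
        nodes[node_for(customer)].execute(
            "INSERT INTO sales VALUES (?, ?, ?)", (customer, region, amount))


def revenue_by_region():
    """Send the same partial aggregate to every node, then merge the partials.
    (Nodes are queried one at a time here; a real grid runs them in parallel.)"""
    merged = {}
    for conn in nodes:
        for region, partial in conn.execute(
                "SELECT region, SUM(amount) FROM sales GROUP BY region"):
            merged[region] = merged.get(region, 0.0) + partial
    return merged


load([("c1", "east", 10.0), ("c2", "west", 5.0), ("c3", "east", 2.5),
      ("c4", "west", 1.0), ("c5", "north", 4.0)])
print(revenue_by_region())
```

This only works directly for aggregates that decompose, such as SUM and COUNT (AVG would have to travel as SUM plus COUNT), and it ignores joins on non-partitioning keys, which require redistributing rows between nodes. Handling those cases well is presumably where much of the real engineering in such products lies.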

Bottom line: Oracle and Microsoft are hemorrhaging at the data warehouse high end now. But there are ways they could stanch the bleeding.

October 12, 2007

Oracle and BEA — sometimes I am waaaay early

Back in December, 2002, I wrote up the rationale for an Oracle acquisition of BEA. The deal finally seems like it may be happening. Oddly, when I proposed it then, I was accused by Oracle’s analyst relations department of being “unprofessional” for having the temerity to suggest it. And while the specific individual who threw that tantrum is long gone, I haven’t talked all that much with Oracle’s core server groups since … but I digress.

Actually, the logic of an Oracle/BEA deal now isn’t much different from what it was way back then. One exception is that in the intervening half-decade Oracle has acquired a formidable amount of experience in integrating large and/or technically overlapping acquisitions. Technically, however, the story remains pretty much the same. Oracle’s app server and BEA Weblogic do pretty similar things, more or less compliant to standards, only with different add-on functionality. And BEA’s most important add-ons are in an area — integration with outside applications — where Oracle has long needed to improve.

October 10, 2007

SAS goes MPP on Teradata first

After a hurried discussion with SAS CTO Keith Collins and a followup with Teradata CTO Todd Walter, I think I’ve figured out the essence of the SAS port to Teradata. (Subtle nuances, however, have to await further research.) Here’s what I think is going on:

1. SAS is porting or creating two different products or modules, with two different names (and I don’t know exactly what those names are). The two different things they are porting amount to modeling (i.e., analysis) and scoring (i.e., using the results of the model for automated decision-making).

2. Both products are slated for delivery at or near the time of SAS 9.2, which is expected to reach GA around the middle of next year. (Maybe somebody from SAS could send me the official word, as well as product names and so on?)

3. The essence of the modeling port is a library of static UDFs (User Defined Functions).

4. The essence of the SAS scoring port is the ability to easily generate a single “dynamic” UDF to score according to a particular model. This would seem to leverage Teradata scoring-related enhancements much more than it would compete or conflict with them.

5. There are two different kinds of benefits SAS gets from integrating with an MPP (Massively Parallel Processing) DBMS. One is actual parallel processing of operations, shortening absolute calculation time dramatically, and also leveraging Moore’s Law without painful SMP (Symmetric MultiProcessing) overhead. The other is a radical reduction in data movement costs for the handoff between the database and the SAS software. Interestingly, SAS reports huge performance gains even from putting its software on a single node inside the Teradata grid. That is, changing how data movement is done is already a huge win, even when there’s no reduction in the overall amount moved. But of course, in the complete implementation, where database and SAS processing are done on the same nodes, there’s also a huge reduction in actual data movement effort required.
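To illustrate the scoring idea in point 4, here is a minimal Python sketch of turning one fitted model into a single function the database calls per row, so the data never leaves the DBMS to be scored. It is an assumption-laden stand-in, not SAS’s or Teradata’s mechanism: SQLite’s `create_function` plays the role of the in-database UDF, and the logistic-regression model, coefficients, table and the names make_scoring_udf and score_churn are all hypothetical.

```python
import math
import sqlite3


def make_scoring_udf(intercept, coefficients):
    """Turn one fitted model (here, hypothetically, a logistic regression)
    into a plain function the database can call once per row."""
    def score(*features):
        z = intercept + sum(c * x for c, x in zip(coefficients, features))
        return 1.0 / (1.0 + math.exp(-z))
    return score


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, tenure REAL, spend REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, 2.0, 10.0), (2, 10.0, 50.0), (3, 0.0, 0.0)])

# Register the model-specific function; scoring then happens inside the
# query, so rows never have to be exported to a separate analytics process.
conn.create_function("score_churn", 2,
                     make_scoring_udf(intercept=-1.0, coefficients=[0.1, 0.02]))

rows = conn.execute(
    "SELECT id, score_churn(tenure, spend) FROM customers ORDER BY id").fetchall()
for customer_id, p in rows:
    print(customer_id, round(p, 3))
```

On an MPP system the same generated function would run on every node against that node’s local rows, which is where both benefits in point 5 (parallelism and reduced data movement) would come from.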

One obvious question would be: How hard would it be for SAS to replicate this work on other MPP DBMS? Well, at its core this work involves implementing a variety of elementary arithmetic and data manipulation functions. So a first-best guess is that a fairly efficient port would be easy (given that this one has already been performed), but that the last 20% or whatever of the performance optimizations require a lot more work. As to whether or not this is more than a theoretical question — well, both SAS and SPSS are disclosed members of the Netezza Developers Network. As for SMP DBMS — well, some of the work certainly could be replicated, but other important parts don’t even make sense on Oracle or Microsoft the way they do on Teradata, Netezza, DATAllegro, et al.

October 9, 2007

Marketing versus reality on the one-petabyte barrier

Usually, I don’t engage in the kind of high-speed quick-response blogging I have over the past couple of days from the Teradata Partners conference (and more generally have for the past week or so). And I’m not sure it’s working out so well.

For example, the claim that Teradata has surpassed the one-petabyte mark comes as quite a surprise to a variety of Teradata folks, not to mention at least one reliable outside anonymous correspondent. That claim may indeed be true about raw disk space on systems sold. But the real current upper limit, according to CTO Todd Walter,* is 500-700 terabytes of user data. He thinks half a dozen or so customers are in that range. I’d guess quite strongly that three of those are Wal-Mart, eBay, and an unspecified US intelligence agency.

*Teradata seems to have quite a few CTOs. But I’ve seen things much sillier than that in the titles department, and accordingly shan’t scoff further — at least on that particular subject. ;)

On the other hand, if anybody did want to buy a 10 petabyte system, Teradata could ship them one. And by the way, the Teradata people insist Sybase’s claims in the petabyte area are quite bogus. Teradata claims to have had bigger internal systems tested earlier than the one Sybase writes about.

October 9, 2007

Yet more on petabyte-scale Teradata databases

I managed to buttonhole Teradata’s Darryl MacDonald again, to follow up on yesterday’s brief chat. He confirmed that there is more than one petabyte-plus Teradata database out there, of which at least one is commercial rather than government/classified. Without saying who any of them were, he dropped a hint suggestive of Wal-Mart. That makes sense, given that a 423 terabyte figure for Wal-Mart is now three years old, and Wal-Mart is in the news for its 4 petabyte futures. Yes, that news has tended to mention HP NeoView recently more than Teradata. But it seems very implausible that a NeoView replacement of Teradata has already happened, even if such a thing is a possibility for the future. So right now, however much data Wal-Mart has on its path from 423 terabytes to 4 petabytes and beyond is probably collected mainly on Teradata machines.

October 9, 2007

Another firm that never sees DB2 in data warehousing

At the Teradata show today, I talked with Mike Weber of Scorecard Systems Inc. Scorecard’s business is vertical BI for telecommunications companies to analyze call data. They support Teradata (obviously), Oracle, and Microsoft SQL*Server, with Netezza coming soon. But not DB2.

Mike says that, in ten years in this business, he’s never seen DB2.

October 8, 2007

One reason Teradata spun out publicly rather than being bought

There were well-publicized tax reasons for Teradata to be spun out publicly from NCR rather than just sold off. Back in April, I questioned these, suggesting there was a pretty good workaround.

Today, however, after hearing Teradata management repeatedly finesse the question of why they didn’t pursue the buyout option, a very good reason hit me like a ton of bricks. Teradata employees — especially senior managers — got hefty stock options in connection with the spinout. The same would probably have happened if Teradata were LBOed. But it would surely not have happened if Teradata had merely been sold off to a third company.

October 8, 2007

Hot buzzword — multidimensional partitioning

Teradata finally announced multidimensional range partitioning in Version 12, not that they kept their plans in that regard a big secret. DATAllegro has also shipped multidimensional partitioning to at least one customer. Other vendors — well, I’ll stop there, given my ongoing attitude problems about vendors’ self-defeating NDAs.

Whether or not multidimensional partitioning is a big improvement over single-dimensional will of course depend a great deal on the details of a particular database. Teradata used a figure of 30% performance improvement, but that’s surely just an example. Certainly in some extreme cases one could have a rather large reduction in the amount of data retrieved, and correspondingly a many-times-X improvement in the performance of certain important queries.
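A small Python sketch of why the gain depends so heavily on the query, assuming a hypothetical table partitioned by month alone versus by (month, region), with every partition holding the same number of rows. It just counts how much of the table survives partition elimination; real results would also depend on skew, partition counts and the rest of the query plan.

```python
from itertools import product

MONTHS = range(1, 13)
REGIONS = ("north", "south", "east", "west")

# Hypothetical layouts; every partition is assumed to hold the same number of rows.
by_month = [(m,) for m in MONTHS]                   # single-dimensional
by_month_region = list(product(MONTHS, REGIONS))    # multidimensional


def fraction_scanned(partitions, may_match):
    """Share of the table still read after partition elimination."""
    survivors = [p for p in partitions if may_match(p)]
    return len(survivors) / len(partitions)


# WHERE month = 3 AND region = 'west'
print(fraction_scanned(by_month, lambda p: p[0] == 3))                # 1 partition of 12
print(fraction_scanned(by_month_region, lambda p: p == (3, "west")))  # 1 partition of 48

# WHERE region = 'west': a month-only layout cannot eliminate any partition
print(fraction_scanned(by_month, lambda p: True))                     # 1.0
print(fraction_scanned(by_month_region, lambda p: p[1] == "west"))    # 0.25
```

For the first query the extra dimension cuts the data read by a further factor of 4; for the region-only query it goes from a full scan to a quarter of the table, the kind of many-times-X case described above.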
