Netezza cites three warehouses over 50 terabytes
Netezza is finally making it clear that they run some largish warehouses. Their latest press release cites Catalina Marketing, Epsilon, and NYSE Euronext as having 50+ terabytes each. I checked with Netezza’s Marketing VP Ellen Rubin, and she confirmed that those are clean figures — user data, single warehouses, etc. Ellen further tells me that Netezza’s total count of warehouses that big is “significantly more” than the 3 named in the release.
Of course, this makes sense, given that Netezza’s largest box, the NPS 10800, runs 100 terabytes. And Catalina was named as having bought a 10800 in a press release back in December, 2006.
ParAccel opens the kimono slightly
Please do not rely on the parts of this post that draw a distinction between in-memory and disk-based operation. See our February 18, 2008 post about ParAccel instead. It turns out that communication with ParAccel was yet worse than I had realized.
Officially launched today at the TDWI conference, ParAccel is out to compete with Netezza. Right out of the chute, ParAccel may have surpassed Netezza in at least one area: pointlessly annoying secrecy. (In other regards I love them dearly, but that paranoia can be a real pain.) As best I can remember, here are some things about ParAccel that I both am allowed to say and find interesting:
- ParAccel offers a columnar, MPP data warehouse DBMS, called the ParAccel Analytic Database.
- ParAccel’s product runs in two main modes. “Maverick” is normal, stand-alone mode. “Amigo” mode amounts to a plug-compatible accelerator for Oracle or Microsoft SQL*Server. Early sales and marketing were concentrated on SQL*Server Amigo mode.
- ParAccel’s product also runs in another pair of modes – in-memory and disk-based. Early sales and marketing were concentrated on in-memory mode. Hybrid memory-centric processing sounds like something for a future release.
- Sun has a reseller partnership with ParAccel, focused on in-memory mode.
- Sun and ParAccel published record-shattering 100 gigabyte, 300 gigabyte, and 1 terabyte TPC-H benchmarks today, based on in-memory mode. (If you’d like to throw 13 terabytes of disk at 1 terabyte of user data, running simple and repetitive queries, that benchmark might be a useful guide to your own experience. But hey – that’s a big improvement on the prior champion, who used 40 terabytes of disk. To ParAccel’s credit, they’re not pretending that this is a bigger deal than it is.)
Infobright responds
An Infobright employee posted something quite reasonable-looking in response to my inaugural post about BrightHouse. Even so, Infobright asked if they could substitute something with a slightly different tone. I agreed. Here’s what they sent in.
Curt, thanks for the write-up and the opportunity to talk about our customer success stories. As you say, our customer story is definitely “more than zero.” We are addressing a number of critical customer issues with our unique approach to data warehousing.
Infobright currently has 5 customers - customers that have bucked the trend of throwing hardware at the problem. To be perfectly braggadocio about this, we have never lost a competitive proof of concept in which we’ve been engaged. This is accomplished with the horsepower of one box (though for redundancy customers may deploy multiple boxes with a load balancer).
| Categories: Analytic technologies, Columnar database management, Data warehousing, Database compression, Infobright | Leave a Comment |
Dude, you stole my joke!
October 15: We know what BEA is — now it is just a matter of negotiating the price
October 25: We’ve already established what you are, now we’re just working out a price
The news in the latter is that BEA has admitted it.
Note: Of course, the original joke is so old as to be variously attributed to all of George Bernard Shaw (most credibly), Winston Churchill, and Oscar Wilde.
| Categories: Application servers, Humor, Oracle | Leave a Comment |
DATAllegro discloses a few numbers
Privately held DATAllegro just announced a few tidbits about financial results and suchlike for the fiscal year ended June, 2007. I sent over a few clarifying questions yesterday. Responses included:
- Yes, the company experienced 330% year-over-year annual revenue growth.
- The majority of DATAllegro customers have bought systems in the 25-100 terabyte range.
- One system over 250 terabytes has been in production for months (surely the one I previously wrote about); a second is being installed.
- DATAllegro has “about 100” employees. By way of comparison, Netezza reported 225 full-time employees for the year ended January, 2007 – which probably means as of January 31, 2007.
All told, it sounds as if DATAllegro is more than 1/3 the size of Netezza, although given its higher system size and price points I’d guess it has well under 1/3 as many customers.
Here’s a link. I’ll likely edit that to something more permanent-seeming later, and generally spruce this up when I’m not so rushed.
| Categories: Analytic technologies, DATAllegro, Data warehouse appliances, Data warehousing | 7 Comments |
Vertica — just star and snowflake schemas?
One of the longest-running technotheological disputes I know of is the one pitting flat/normalized data warehouse architectures vs. cubes, stars, and snowflake schemas. Teradata, for example, is a flagwaver for the former camp; Microstrategy is firmly in the latter. (However, that doesn’t keep lots of retailers from running Microstrategy on Teradata boxes.) Attensity (a good Teradata partner) is in the former camp; text mining rival Clarabridge (sort of a Microstrategy spinoff) is in the latter. And so on.
Vertica is clearly in the star/snowflake camp as well. I asked them about this, and Vertica’s CTO Mike Stonebraker emailed a response. I’m reproducing it below, with light edits; the emphasis is also mine. Key points include:
- Almost everybody (that Vertica sees) wants stars and snowflakes, so that’s what Vertica optimizes for.
- Replicating small dimension tables across nodes is great for performance.
- Even so, Vertica is broadening its support for more general schemas as well.
Great question. This is something that we’ve thought a lot about and have done significant research on with large enterprise customers. … short answer is as follows:
Vertica supports star and snowflake schemas because that is the desired data structure for data warehousing. The overwhelming majority of the schemas we see are of this form, and we have highly optimized for this case.
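To make the point about replicated dimension tables concrete, here is a minimal sketch in plain Python. It is not Vertica code: the node count, the table layouts, and all function names are assumptions made up for illustration. The large fact table is hash-partitioned across nodes, the small dimension table is copied to every node, and so each node can join and aggregate locally before only small partial results are merged.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def distribute(fact_rows, dim_table, num_nodes=NUM_NODES):
    """Hash-partition fact rows by sale_id; copy the small dimension table to every node."""
    nodes = [{"facts": [], "dim": dict(dim_table)} for _ in range(num_nodes)]
    for row in fact_rows:
        nodes[row["sale_id"] % num_nodes]["facts"].append(row)
    return nodes


def local_revenue_by_region(node):
    """Join and aggregate entirely on one node; no fact rows cross the network."""
    out = defaultdict(float)
    for row in node["facts"]:
        region = node["dim"][row["store_id"]]  # lookup in the local replica
        out[region] += row["amount"]
    return out


def revenue_by_region(nodes):
    """Merge the small per-node partial aggregates at a coordinator."""
    merged = defaultdict(float)
    for node in nodes:
        for region, amount in local_revenue_by_region(node).items():
            merged[region] += amount
    return dict(merged)


# Illustrative data: three stores (the dimension) and twelve sales (the facts).
stores = {1: "East", 2: "West", 3: "East"}
sales = [{"sale_id": i, "store_id": 1 + i % 3, "amount": 10.0} for i in range(12)]
totals = revenue_by_region(distribute(sales, stores))
```

The trade-off is storage and update cost: every node carries a full copy of each replicated dimension, which is cheap only while those tables stay small.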
| Categories: Analytic technologies, Columnar database management, Data models and architecture, Data warehousing, Theory and architecture, Vertica Systems | 1 Comment |
Vertica update
Vertica has been quietly selling product for three quarters and has about 50 customers.
Andy Ellicott of Vertica pointed me to the above Richard Hackathorn quote. Sadly, he asked me not to name and shame another analyst who foolishly said Vertica hadn’t “launched” yet.
But then, I understand. I’m also not going to identify the client who gave me fits by insisting on believing that nonsense, even in the face of the well-known facts that Vertica has shipping product, paying customers, and so on.
Either there’s enormous interest in EnterpriseDB and/or mid-range relational DBMS …
… or else I’m one heck of a webinar draw.
We had 364 attendees for today’s webcast with EnterpriseDB, which is a huge number for that sort of thing.
| Categories: EnterpriseDB and Postgres Plus, Mid-range, Open source | 1 Comment |
Infobright BrightHouse — columnar, VERY compressed, simple, and related to MySQL
To a first approximation, Infobright – maker of BrightHouse — is yet another data warehouse DBMS specialist with a columnar architecture, boasting great compression and running on commodity hardware, emphasizing easy set-up, simple administration, great price-performance, and hence generally low TCO. BrightHouse isn’t actually MPP yet, but Infobright confidently promises a generally available MPP version by the end of 2008. The company says that experience shows >10:1 compression of user data is realistic – i.e., an expansion ratio that’s fractional, and indeed better than 1/10:1. Accordingly, despite the lack of shared-nothing parallelism, Infobright claims a sweet spot of 1-10 terabyte warehouses, and makes occasional references to figures up to 30 terabytes or so of user data.
BrightHouse is essentially a MySQL storage engine, and hence gets a lot of connectivity and BI tool support features from MySQL for “free.” Beyond that, Infobright’s core technical idea is to chop columns of data into 64K chunks, called data packs, and then store concise information about what’s in the packs. The more basic information is stored in data pack nodes,* one per data pack. If you’re familiar with Netezza zone maps, data pack nodes sound like zone maps on steroids. They store maximum values, minimum values, and (where meaningful) aggregates, and also encode information as to which intervals between the min and max values do or don’t contain actual data values.
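Here is a rough sketch of how per-pack metadata of this kind can cut down the work for a range query. It is not Infobright's implementation. The class and function names are invented, I assume "64K" means 65,536 values per pack, and the interval information between min and max is left out. Packs whose min/max range misses the predicate are skipped, packs that fall wholly inside it are answered from the stored aggregate, and only the remaining packs are actually opened.

```python
from dataclasses import dataclass

PACK_SIZE = 65536  # assumption: 64K values per data pack


@dataclass
class PackNode:
    """Summary metadata for one data pack (illustrative field names)."""
    lo: int
    hi: int
    total: int
    count: int


def build_packs(column, pack_size=PACK_SIZE):
    """Chop a column into packs and compute one metadata node per pack."""
    packs = [column[i:i + pack_size] for i in range(0, len(column), pack_size)]
    nodes = [PackNode(min(p), max(p), sum(p), len(p)) for p in packs]
    return packs, nodes


def sum_where_between(packs, nodes, lo, hi):
    """Compute SUM(col) WHERE lo <= col <= hi, opening as few packs as possible.

    Returns (sum, number_of_packs_opened).
    """
    total = 0
    opened = 0
    for pack, node in zip(packs, nodes):
        if node.hi < lo or node.lo > hi:
            continue  # irrelevant pack: skipped on metadata alone
        if lo <= node.lo and node.hi <= hi:
            total += node.total  # fully relevant pack: answered from metadata
            continue
        opened += 1  # partially relevant pack: must read the values
        total += sum(v for v in pack if lo <= v <= hi)
    return total, opened
```

On a sorted column of a million integers, a query for the range 100,000 to 299,999 opens only the two packs that straddle the range boundaries. On unsorted data the min/max ranges overlap far more, which is presumably where the extra interval information earns its keep.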
| Categories: Analytic technologies, Columnar database management, Data warehousing, Database compression, Infobright, MySQL, Open source | 1 Comment |
Native XML performance, and Philip Howard on recent IBM DBMS announcements
Philip Howard went to at least one conference this month I didn’t, namely IBM’s, and wrote up some highlights. As usual, he seems to have been favorably impressed.
In one note, he says that IBM is claiming a 2-5X XML performance improvement. This is a good step, since one of my clients who evaluated such engines dismissed IBM early on for being an order of magnitude too slow. That client ultimately chose Marklogic, with Caché having been the only other choice to make the short list.
Speaking of IBM, I flew back from the Business Objects conference next to a guy who supports IMS. He told me that IBM has bragged of an actual new customer win for IMS within the past couple of years (a large bank in China).
| Categories: IBM and DB2, Intersystems and Cache', Mark Logic, Native XML | Leave a Comment |
Wrinkles in the Informatica versus Business Objects patent litigation
Business Objects recently lost a patent lawsuit to Informatica in the area of data integration. While I was at the Business Objects conference, I asked about it, and was told in effect “It’s no big deal. In fact, the monetary award was reduced. Anyhow, we shipped a non-infringing version within 12 days after the decision, and sales are rolling along.” I then reflected that answer back to Informatica’s stellar analyst relations guy Chas Kielt. He checked with corporate counsel, and sent back the detailed clarification below. Since I got my Business Objects answers from a couple of caught-off-guard non-lawyer French guys, while Chas got a careful explanation of an American court’s judgment from an American lawyer, I’m inclined to think that in any details where they might conflict, Chas’ version is more likely to be accurate.
There’s a more substantive disagreement as to whether the features deleted from BOBJ’s product due to the injunction are actually important in the marketplace. I’m looking into that subject, and hope to post about it in the near future. Read more
| Categories: Business Objects, EAI, EII, ETL, ELT, ETLT, Informatica | Leave a Comment |
Webinar on mid-range OLTP DBMS Tuesday October 23 12 noon Eastern time
I’m doing another webinar on mid-range OLTP DBMS next Tuesday, at 12 noon Eastern. It’s sponsored by EnterpriseDB, who also sponsored one six months ago on the same subject. Hopefully, this one will be a bit fresher. Sign up today! The expected turnout is humongous.
Technorati Tags: EnterpriseDB, OLTP, database management system
| Categories: Emulation, transparency, portability, EnterpriseDB and Postgres Plus, Mid-range, OLTP | Leave a Comment |
One Greenplum customer — 35 terabytes and growing fast
I was at the Business Objects conference this week, and as usual went to very few sessions. But one I did stroll into was on “Managing Rapid Growth With the Right BI Strategy.” This was by Reliance Telecommunications, an outfit in India that is adding telecom subscribers very quickly, and consequently banging 100-150 gigs of data per day into a 35 terabyte warehouse.
The beginning of the talk astonished me, as the presenter seemed to be saying they were doing all this on Oracle. Hah. Oracle is what they moved away from; instead, they got Greenplum. I couldn’t get details; indeed, as a BI guy he was far enough away from DBMS to misspeak and say that Greenplum was brought in by ‘HP’, before quickly correcting himself when prompted. Read more
| Categories: Analytic technologies, Business Objects, Data warehouse appliances, Data warehousing, Greenplum, Investment research and trading, Oracle, Specific users | Leave a Comment |
Gartner 2007 Magic Quadrant for Data Warehouse Database Management Systems
It’s early autumn, the leaves are turning in New England, and Gartner has issued another Magic Quadrant for data warehouse DBMS. The big winners vs. last year are Greenplum and, secondarily, Sybase. Teradata continues to lead. Oracle has also leapfrogged IBM, and there are various other minor adjustments as well, among repeat mentionees Netezza, DATAllegro, Sand, Kognitio, and MySQL. HP isn’t on the radar yet; ditto Vertica.
We know what BEA is — now it is just a matter of negotiating the price
After the long Oracle/Peoplesoft drama, I don’t see any likely way the Oracle bid for BEA will end with anything other than a rather rapid acquisition of BEA, probably by Oracle.
But for now it’s not a done deal, as BEA is quite reasonably still haggling about price.
| Categories: Application servers, Oracle | 1 Comment |
SAP is losing crucial managerial talent
In the past month or so, both Dennis Moore and Nimish Mehta have left SAP. Their reasons are well-known among Oracle alumni to be — at least in large part — discomfort with SAP’s direction. (My unnamed sources on that are highly reliable.) And of course Shai Agassi left earlier this year. It now looks as if my contrarian viewpoint pooh-poohing the importance of Shai’s departure was probably wrong.
Based on all that, I don’t think there’s much reason for optimism about SAP’s system software futures, except perhaps for those that are placed wholly under the control of the Business Objects division. NetWeaver? Already a creaking omnibus. MaxDB? They didn’t get it right the first time around; what will be different now? BI Accelerator? That one actually could do well under Business Objects. The dream of other kinds of appliances? Not likely to achieve take-off. TREX? They weren’t really enhancing that much anyway. The rest of the search-related vision Dennis outlined for me? That’s another one that actually could thrive under Business Objects, but I expect a considerable number of false starts at best before they work out a coherent new strategy.
The high-end app business, the new SaaS business, the new Business Objects subsidiary — any and all of those could do well. But the attempts to become a broad-based system software player rivaling Oracle, Microsoft, and/or IBM are looking a lot less healthy than they used to.
Technorati Tags: SAP, NetWeaver, Business Objects, TREX, BI Accelerator
| Categories: Business Objects, Business intelligence, SAP AG | Leave a Comment |
More on the Oracle-BEA deal
Jeff Nolan has a great post on the Oracle/BEA deal. Yeah, he still has some of his old SAP good/Oracle evil reflexes, but he can be forgiven those and the tinfoilhattishness associated with them. His analysis of sellers’ and buyers’ deal habits is revealing and sound. Ditto the start of his remarks on Oracle product delays and internal politics, and SAP/Oracle competition. Even better, nothing in his analysis seems to disagree with mine.
What Oracle now needs to do is make Oracle Application Server be a seamless “upgrade” from Weblogic. Then they can integrate in whatever kitchen-sink stuff they want from Oracle data caching (already there), app and/or dev tool run times, TimesTen, Tangosol, and so on, creating an app server stack that’s a worthy counterpart to the Oracle database in how it meets high-end OLTP needs. Meanwhile, Weblogic should remain as a not-bloated app-server-for-the-rest-of-us. Read more
Three ways Oracle or Microsoft could go MPP
I’ve been arguing for a while that Oracle and Microsoft are screwed in high-end data warehousing. The reason is that they’re stuck with SMP (Symmetric Multi-Processing) architectures, while Teradata, Netezza, DATAllegro, and many others enjoy the benefits of MPP (Massively Parallel Processing). Thus, Teradata and DATAllegro boast installations in the hundreds of terabytes each, while Oracle and Microsoft users usually have to perform unnatural acts of hard-coded partitioning even to reach the 10 terabyte level.
That said, there are at least three ways Oracle and/or Microsoft could get out of this technical box:
1. They could buy or just partner with MPP vendors such as Dataupia, who offer plug-compatibility with their respective main DBMS.
2. They could buy whoever they want, plug-compatibility be damned. Presumably, they’d quickly add a light-weight data federation front-end to give the appearance of integration, then merge the products more closely over time.
3. They could develop or buy technology like DATAllegro’s, which essentially federates instances of an ordinary SMP DBMS across nodes of an MPP grid (Greenplum does something similar). I imagine that, for example, ripping Ingres out of DATAllegro and slotting in Oracle instead would be a pretty straightforward exercise; even without dramatic change to any of the optimizations, the resulting port would be something that ran pretty quickly on Day 1.
Bottom line: Oracle and Microsoft are hemorrhaging at the data warehouse high end now. But there are ways they could stanch the bleeding.
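The third option above can be sketched in a few lines. This is only a toy under stated assumptions: SQLite in-memory databases stand in for the "ordinary" single-node DBMS, the table, column, and function names are made up, and nothing here reflects how DATAllegro or Greenplum actually do it. Rows are hash-distributed across independent database instances, each instance computes a partial aggregate in SQL, and a coordinator combines the partials.

```python
import sqlite3


def make_grid(num_nodes):
    """Create one independent in-memory SQLite database per 'node'."""
    nodes = []
    for _ in range(num_nodes):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE calls (customer_id INTEGER, minutes REAL)")
        nodes.append(conn)
    return nodes


def load(nodes, rows):
    """Hash-distribute rows across nodes by customer_id."""
    for customer_id, minutes in rows:
        node = nodes[customer_id % len(nodes)]
        node.execute("INSERT INTO calls VALUES (?, ?)", (customer_id, minutes))


def avg_minutes(nodes):
    """Push SUM and COUNT down to each node, then combine at the coordinator.

    AVG cannot be averaged across nodes directly, so it is decomposed into
    two aggregates that can be added up.
    """
    total, count = 0.0, 0
    for node in nodes:
        s, c = node.execute(
            "SELECT COALESCE(SUM(minutes), 0), COUNT(*) FROM calls"
        ).fetchone()
        total += s
        count += c
    return total / count if count else None


grid = make_grid(4)
load(grid, [(i, float(i % 5)) for i in range(100)])
overall_average = avg_minutes(grid)
```

Swapping the per-node engine means changing only what sits behind `make_grid` and the SQL dialect, which is the intuition behind calling such a port straightforward. The hard parts a real system faces (joins across nodes, redistribution, optimizer awareness of the data layout) are exactly what this sketch leaves out.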
Oracle and BEA — sometimes I am waaaay early
Back in December, 2002, I wrote up the rationale for an Oracle acquisition of BEA. The deal finally seems like it may be happening. Oddly, when I proposed it then, I was accused by Oracle’s analyst relations department of being “unprofessional” for having the temerity to suggest it. And while the specific individual who threw that tantrum is long gone, I haven’t talked all that much with Oracle’s core server groups since … but I digress.
Actually, the logic of an Oracle/BEA deal now isn’t much different from what it was way back then. One exception is that in the intervening half-decade Oracle has acquired a formidable amount of experience in integrating large and/or technically overlapping acquisitions. Technically, however, the story remains pretty much the same. Oracle’s app server and BEA Weblogic do pretty similar things, more or less compliant to standards, only with different add-on functionality. And BEA’s most important add-ons are in an area — integration with outside applications — where Oracle has long needed to improve. Read more
| Categories: Application servers, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Oracle | 3 Comments |
SAS goes MPP on Teradata first
After a hurried discussion with SAS CTO Keith Collins and a followup with Teradata CTO Todd Walter, I think I’ve figured out the essence of the SAS port to Teradata. (Subtle nuances, however, have to await further research.) Here’s what I think is going on:
1. SAS is porting or creating two different products or modules, with two different names (and I don’t know exactly what those names are). The two different things they are porting amount to modeling (i.e., analysis) and scoring (i.e., using the results of the model for automated decision-making).
2. Both products are slated for delivery at or near the time of SAS 9.2, which is slated for GA at or near the middle of next year. (Maybe somebody from SAS could send me the official word, as well as product names and so on?)
3. The essence of the modeling port is a library of static UDFs (User Defined Functions).
4. The essence of the SAS scoring port is the ability to easily generate a single “dynamic” UDF to score according to a particular model. This would seem to leverage Teradata scoring-related enhancements much more than it would compete or conflict with them.
5. There are two different kinds of benefits SAS gets from integrating with an MPP (Massively Parallel Processing) DBMS. One is actual parallel processing of operations, shortening absolute calculation time dramatically, and also leveraging Moore’s Law without painful SMP (Symmetric MultiProcessing) overhead. The other is a radical reduction in data movement costs for the handoff between the database and the SAS software. Interestingly, SAS reports huge performance gains even from putting its software on a single node inside the Teradata grid. That is, changing how data movement is done is already a huge win, even when there’s no reduction in the overall amount moved. But of course, in the complete implementation, where database and SAS processing are done on the same nodes, there’s also a huge reduction in actual data movement effort required.
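To illustrate the scoring idea in points 4 and 5, here is a small sketch of generating a scoring function from a fitted model and registering it as a UDF, so that scoring runs inside the database and only scores come back out. Everything specific is an assumption: SQLite stands in for the DBMS, the logistic model and its coefficients are invented, and `make_scorer` and `churn_score` are hypothetical names. It says nothing about how SAS or Teradata actually implement this.

```python
import math
import sqlite3


def make_scorer(intercept, weights):
    """Build a scoring function from fitted coefficients.

    This plays the role of the generated, model-specific UDF: a new function
    is produced for each model rather than shipped in a static library.
    """
    def score(*features):
        z = intercept + sum(w * x for w, x in zip(weights, features))
        return 1.0 / (1.0 + math.exp(-z))  # logistic link
    return score


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, tenure REAL, complaints REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 0.0, 0.0), (2, 2.0, 1.0), (3, 10.0, 0.0)],
)

# Hypothetical churn model: longer tenure lowers the score, complaints raise it.
conn.create_function("churn_score", 2, make_scorer(0.0, [-0.5, 1.0]))

# The rows never leave the database; only (id, score) pairs are returned.
scores = conn.execute(
    "SELECT id, churn_score(tenure, complaints) FROM customers ORDER BY id"
).fetchall()
```

In an MPP setting the same UDF would run on every node against that node's slice of the data, which is where both the parallelism and the savings in data movement come from.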
One obvious question would be: How hard would it be for SAS to replicate this work on other MPP DBMS? Well, at its core this work involves implementing a variety of elementary arithmetic and data manipulation functions. So a first-best guess is that a fairly efficient port would be easy (given that this one has already been performed), but that the last 20% or whatever of the performance optimizations require a lot more work. As to whether or not this is more than a theoretical question — well, both SAS and SPSS are disclosed members of the Netezza Developers Network. As for SMP DBMS — well, some of the work certainly could be replicated, but other important parts don’t even make sense on Oracle or Microsoft the way they do on Teradata, Netezza, DATAllegro, et al. Read more
| Categories: Analytic technologies, Data warehouse appliances, Data warehousing, SAS Institute, Teradata | 4 Comments |
Marketing versus reality on the one-petabyte barrier
Usually, I don’t engage in the kind of high-speed quick-response blogging I have over the past couple of days from the Teradata Partners conference (and more generally have for the past week or so). And I’m not sure it’s working out so well.
For example, the claim that Teradata has surpassed the one-petabyte mark comes as quite a surprise to a variety of Teradata folks, not to mention at least one reliable outside anonymous correspondent. That claim may indeed be true about raw disk space on systems sold. But the real current upper limit, according to CTO Todd Walter,* is 500-700 terabytes of user data. He thinks half a dozen or so customers are in that range. I’d guess quite strongly that three of those are Wal-Mart, eBay, and an unspecified US intelligence agency.
*Teradata seems to have quite a few CTOs. But I’ve seen things much sillier than that in the titles department, and accordingly shan’t scoff further — at least on that particular subject.
On the other hand, if anybody did want to buy a 10 petabyte system, Teradata could ship them one. And by the way, the Teradata people insist Sybase’s claims in the petabyte area are quite bogus. Teradata claims to have had bigger internal systems tested earlier than the one Sybase writes about.
Technorati Tags: Teradata, petabyte, data warehouse, Sybase, Wal-Mart, eBay
| Categories: Data warehouse appliances, Data warehousing, Specific users, Sybase, Teradata | 1 Comment |
Yet more on petabyte-scale Teradata databases
I managed to buttonhole Teradata’s Darryl MacDonald again, to follow up on yesterday’s brief chat. He confirmed that there is more than one petabyte+ Teradata database out there, of which at least one is commercial rather than government/classified. Without saying who any of them were, he dropped a hint suggestive of Wal-Mart. That makes sense, given that a 423 terabyte figure for Wal-Mart is now three years old, and Wal-Mart is in the news for its 4 petabyte futures. Yes, that news has tended to mention HP NeoView recently more than Teradata. But it seems very implausible that a NeoView replacement of Teradata has already happened, even if such a thing is a possibility for the future. So right now however much data Wal-Mart has on its path from 423 terabytes to 4 petabytes and beyond is probably collected mainly on Teradata machines.
Technorati Tags: Teradata, petabyte, data warehouse, HP, Hewlett-Packard, NeoView
| Categories: Analytic technologies, Data warehouse appliances, Data warehousing, HP and Neoview, Teradata | 1 Comment |
Another firm that never sees DB2 in data warehousing
At the Teradata show today, I talked with Mike Weber of Scorecard Systems Inc. Scorecard’s business is vertical BI for telecommunications companies to analyze call data. They support Teradata (obviously), Oracle, and Microsoft SQL*Server, with Netezza coming soon. But not DB2.
Mike says that, in ten years in this business, he’s never seen DB2.
| Categories: Analytic technologies, Business intelligence, Data warehousing, IBM and DB2, Microsoft and SQL*Server, Oracle, Teradata | Leave a Comment |
One reason Teradata spun out publicly rather than being bought
There were well-publicized tax reasons for Teradata to be spun out publicly from NCR rather than just sold off. Back in April, I questioned these, suggesting there was a pretty good workaround.
Today, however, after hearing Teradata management repeatedly finesse the question of why they didn’t pursue the buyout option, a very good reason hit me like a ton of bricks. Teradata employees, especially senior managers, got hefty stock options in connection with the spinout. The same would probably have happened if Teradata were LBOed. But it would surely not have happened if Teradata had merely been sold off to a third company.
| Categories: Teradata | Leave a Comment |
Hot buzzword — multidimensional partitioning
Teradata finally announced multidimensional range partitioning in Version 12, not that they kept their plans in that regard a big secret. DATAllegro has also shipped multidimensional partitioning to at least one customer. As for other vendors, well, I’ll stop there, given my ongoing attitude problems about vendors’ self-defeating NDAs.
Whether or not multidimensional partitioning is a big improvement over single-dimensional will of course depend a great deal on the details of a particular database. Teradata used a figure of 30% performance improvement, but that’s surely just an example. Certainly in some extreme cases one could have a rather large reduction in the amount of data retrieved, and correspondingly a many-times-X improvement in the performance of certain important queries.
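A toy calculation shows how the "extreme case" can arise. This sketch is mine, not any vendor's design: it uses equality predicates on pre-bucketed values rather than true range partitions, and the data, column names, and functions are invented. It simply counts how many rows sit in partitions that cannot be pruned, first with partitioning on one column and then on two.

```python
from collections import defaultdict


def partition(rows, keys):
    """Group rows into partitions identified by their values on the given columns."""
    parts = defaultdict(list)
    for row in rows:
        parts[tuple(row[k] for k in keys)].append(row)
    return parts


def rows_scanned(parts, keys, predicate):
    """Count rows in partitions that survive pruning for an equality predicate.

    A partition can be pruned only on columns it is partitioned by; predicate
    columns outside the partitioning key must still be checked row by row.
    """
    scanned = 0
    for part_key, part_rows in parts.items():
        bound = dict(zip(keys, part_key))
        if all(bound[k] == v for k, v in predicate.items() if k in bound):
            scanned += len(part_rows)
    return scanned


# Illustrative table: 12 months x 10 regions x 100 rows each = 12,000 rows.
rows = [
    {"month": m, "region": r}
    for m in range(12)
    for r in range(10)
    for _ in range(100)
]
query = {"month": 3, "region": 7}

one_dim = rows_scanned(partition(rows, ("month",)), ("month",), query)
two_dim = rows_scanned(
    partition(rows, ("month", "region")), ("month", "region"), query
)
```

With this evenly spread data, partitioning by month alone leaves 1,000 rows to scan, while partitioning by month and region leaves 100. A query that filters only on month gains nothing from the second dimension, which is why the benefit depends so heavily on the particular database and workload.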
