Analysis of data warehouse appliance vendor DATAllegro and its products.
There are two recent patent lawsuits in the data warehouse DBMS market. In one, Sybase is suing Vertica. In the other, an individual named Cary Jardin (techie founder of XPrime, a sort of predecessor company to ParAccel) is suing DATAllegro. Naturally, there’s press coverage of the DATAllegro case, due in part to its surely non-coincidental timing right after the Microsoft acquisition was announced and in part to a vigorous PR campaign around it. And the Sybase case so excited a troll who calls himself Bill Walters that he posted identical references to it on about 12 different threads in this blog, as well as to a variety of Vertica-related articles in the online trade press. But I think it’s very unlikely that either of these cases will turn out to matter much. Read more
|Categories: Columnar database management, Data warehousing, Database compression, DATAllegro, Sybase, Vertica Systems||7 Comments|
My first, biggest thought about DATAllegro’s acquisition by Microsoft is “Why the ____ did it have to happen while I was trying to relax on my annual Cayman vacation???” Not coincidentally, I don’t plan to neatly cross-link all my posts and so on about DATAllegro/Microsoft until I get back to Acton this weekend.
One linking screwup is that I previously forgot to mention that — in addition to the numerous posts here — I also made several DATAllegro/Microsoft-related posts on my Network World blog A World of Bytes. They include: Read more
|Categories: Analytic technologies, Data warehousing, DATAllegro, Microsoft and SQL*Server||8 Comments|
- Here’s the official press release on DATAllegro’s site, and Microsoft’s.
- Doug Henschen of Intelligent Enterprise has a good article. He got quotes from Microsoft claiming that SQL Server on its own would be able to handle 10s of terabytes of data in the next release, but DATAllegro was needed to get up to the 100s of terabytes. That said, the quotes don’t say whether that’s user data or total disk usage — the latter frankly seems more plausible.
- James Kobielus of Forrester has a long post on the Microsoft/DATAllegro deal, emphasizing product packaging issues and glossing over technological differentiators. (Edit: The post seems down as of Friday midday.)
- This is a few weeks old, but Kevin Closson is extremely skeptical of some of DATAllegro’s technical claims. (Not that it matters much if he’s right — more nodes = more throughput, no matter how much Oracle folks rant.)
- Eric Lai of Computerworld gets it right.
- Larry Dignan thinks the acquisition is part of an overall strong Microsoft enterprise push.
- William McKnight thinks Microsoft usually does a good job of integrating acquisitions.
- DATAllegro CEO Stuart Frost is happy.
- David Hunter thinks Microsoft will blithely continue with DATAllegro’s limited-hardware-support strategy. He’s almost certainly wrong.
- Philip Howard says almost nothing I agree with, although I can’t argue with the part that says:
“Conversely, it’s bad news for Ingres, bad news for Oracle, bad news for IBM, bad news for Teradata and bad news for HP, all for obvious reasons. As for the other appliance vendors: they will not be too happy either. In particular, we now have to consider who can survive on their own, who might be acquired, who might do the acquiring, and who is going to disappear.”
Jim Ericson of DM Review emailed the excellent questions:
“Does DATAllegro give MSFT full-service high end data warehousing capability? If not, what is missing?”
My quick answers are:
- Two things:
  - Hard-core multi-user concurrency.
  - Support for more esoteric analytic tools and functionality.
Both are largely a matter of product maturity, and as a young company DATAllegro isn’t quite there yet.
That said, integration with Microsoft SQL Server is apt to be a big help in addressing both issues. Read more
By acquiring DATAllegro, Microsoft has seriously leapfrogged Oracle in data warehouse technology. All doubts about maturity and versatility notwithstanding, DATAllegro has a 10X or better size advantage (actually, I think it’s more like 20-40X) versus Oracle in warehouses its technology can straightforwardly handle. Oracle cannot afford to let this move go unanswered.
It’s of course possible that Oracle has been successfully developing comparable data warehouse technology internally. But it’s unlikely. Oracle hasn’t done anything that radical, internally and successfully, for about 15 years, RAC (Real Application Clusters) excepted. (I.e., since the object/relational extensibility framework started in Release 7.) So in all likelihood, the answer will come via acquisition. I think there are four candidates that make the most sense: Teradata, Vertica, ParAccel, and Greenplum. Kognitio (controlled by former Oracle honcho Geoff Squire) might be in the mix as well. Netezza is probably a non-starter because of its hardware-centric strategy.
Here’s why I’m emphasizing Teradata, Vertica, ParAccel, and Greenplum: Read more
|Categories: Analytic technologies, Data warehouse appliances, Data warehousing, DATAllegro, Greenplum, Microsoft and SQL*Server, Oracle, ParAccel, Teradata, Vertica Systems||15 Comments|
I’ve long argued that:
- Oracle and Microsoft are doomed in the data warehouse market unless they acquire MPP/shared-nothing data warehouse DBMS and/or data warehouse appliances.
- DATAllegro is the ideal acquisition for either of them.
Microsoft has now validated my claim by agreeing to buy DATAllegro. As you probably know, we’ve been covering DATAllegro extensively, as per the links listed below.
Basic deal highlights include: Read more
|Categories: Analytic technologies, Data warehouse appliances, Data warehousing, DATAllegro, IBM and DB2, Memory-centric data management, Michael Stonebraker, Microsoft and SQL*Server, MySQL, Oracle, PostgreSQL||5 Comments|
DATAllegro CEO Stuart Frost has been blogging quite a bit recently (and not before time!). A couple of his posts have touched on compression. In one he gave actual numbers for compression, namely:
“DATAllegro compresses between 2:1 and 6:1 depending on the content of the rows, whereas column-oriented systems claim 4:1 to 10:1.”
In another recent post, Stuart touched on architecture, saying:
“Due to the way our compression code works, DATAllegro’s current products are optimized for performance under heavy concurrency. The end result is that we don’t use the full power of the platform when running one query at a time.”
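To put the compression ratios quoted above in concrete terms, here is a minimal back-of-envelope sketch in Python. The 100 TB raw data size is a made-up illustration, not a figure from Stuart's posts, and the helper names are hypothetical; only the 2:1 to 6:1 and 4:1 to 10:1 ranges come from the quote.

```python
def compressed_size_tb(raw_tb: float, ratio: float) -> float:
    """Disk space needed for raw_tb of data at a ratio:1 compression ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_tb / ratio


def footprint_range(raw_tb: float, ratios: tuple) -> tuple:
    """Return (best case, worst case) on-disk size for a (low, high) ratio range."""
    low_ratio, high_ratio = ratios
    return compressed_size_tb(raw_tb, high_ratio), compressed_size_tb(raw_tb, low_ratio)


RAW_TB = 100.0              # hypothetical raw data size, for illustration only
ROW_RANGE = (2.0, 6.0)      # range quoted for DATAllegro
COLUMN_RANGE = (4.0, 10.0)  # range column-oriented systems are said to claim

if __name__ == "__main__":
    for label, ratios in (("row-based", ROW_RANGE), ("column-oriented", COLUMN_RANGE)):
        best, worst = footprint_range(RAW_TB, ratios)
        print(f"{label}: {best:.1f} to {worst:.1f} TB on disk for {RAW_TB:.0f} TB raw")
```

Note that the two ranges overlap: data that compresses at 6:1 in a row store takes less disk than data that compresses at 4:1 in a column store, so the quoted numbers alone don't settle which approach wins on a given data set.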
|Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Database compression, DATAllegro||Leave a Comment|
If you had to name super-high-end users of data warehouse technology, your list might start with a few retailers, credit data processors, and telcos, plus the US intelligence establishment. Well, it turns out that TEOCO runs outsourced data warehouses for several of the top US telcos, making it one of the top data warehouse technology users around.
A few weeks ago, I had a fascinating chat with John Devolites of TEOCO. Highlights included:
- TEOCO runs a >200 TB DATAllegro warehouse for a major US telco. (When we hear about a big DATAllegro telco site that’s been in production for a while, that’s surely the one they’re talking about.)
- TEOCO runs around 450 TB total of DATAllegro databases across its various customers. (When Stuart Frost blogs of >400 TB “systems,” that may be what he’s talking about.)
- TEOCO likes DATAllegro better than Netezza, although the margin is now small. This is mainly for financial reasons, specifically price-per-terabyte. When TEOCO spends its own money without customer direction as to appliance brand, it buys DATAllegro.
- TEOCO runs at least one 50 TB Netezza system — originally due to an acquisition of a Netezza user — with more coming. There also is more DATAllegro coming.
- TEOCO feels 15-30 concurrent users is the current practical limit for both DATAllegro and Netezza. That’s greater than it used to be.
- Netezza is a little faster than DATAllegro on a few esoteric queries, but the difference is not important to TEOCO’s business.
- Official price lists notwithstanding, TEOCO sees prices as being in the $10K/TB range. DATAllegro’s price advantage has shrunk greatly, as others have come down to more or less match. However, since John stated his price preference for DATAllegro in the present tense, I presume the price match isn’t perfect.
- Teradata was never a serious consideration, for price reasons.
- In the original POC a few years ago, the incumbent Oracle — even after extensive engineering — couldn’t get an important query down under 8 hours of running time. DATAllegro and Netezza both handled it in 2-3 minutes. Similarly, Oracle couldn’t get the load time for 100 million call detail records (CDRs) below 24 hours.
- Applications sound pretty standard for telecom: Lots of CDR processing — 550 million/day on the big DATAllegro system cited above. Pricing and fraud checking. Some data staging for legal reasons (giving the NSA what it subpoenas and no more).
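For a rough sense of scale, the figures in the list above can be turned into rates with a few lines of Python. This is only back-of-envelope arithmetic on the quoted numbers; the function names are hypothetical, and the comparison between the POC-era load figure and today's daily CDR volume is my own juxtaposition rather than something TEOCO stated.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def speedup(slow_minutes: float, fast_minutes: float) -> float:
    """How many times faster the fast run is than the slow run."""
    return slow_minutes / fast_minutes


def records_per_second(records: float, seconds: float) -> float:
    """Average sustained rate needed to process `records` in `seconds`."""
    return records / seconds


# Oracle could not get the query under 8 hours; the appliances took 2-3 minutes.
oracle_minutes = 8 * 60
min_speedup = speedup(oracle_minutes, 3)  # appliance at its slower quoted time
max_speedup = speedup(oracle_minutes, 2)  # appliance at its faster quoted time

# Oracle could not load 100 million CDRs in under 24 hours.
oracle_load_rate = records_per_second(100_000_000, SECONDS_PER_DAY)

# The big DATAllegro system processes 550 million CDRs per day.
daily_cdr_rate = records_per_second(550_000_000, SECONDS_PER_DAY)

if __name__ == "__main__":
    print(f"query speedup: at least {min_speedup:.0f}x to {max_speedup:.0f}x")
    print(f"Oracle POC load rate: under {oracle_load_rate:.0f} CDRs/sec")
    print(f"current daily volume: about {daily_cdr_rate:.0f} CDRs/sec sustained")
```

Because Oracle never got below 8 hours or 24 hours respectively, the speedup figures are lower bounds and the Oracle load rate is an upper bound.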
|Categories: Analytic technologies, Data mart outsourcing, Data warehouse appliances, Data warehousing, DATAllegro, Netezza, Pricing, Specific users, Telecommunications, TEOCO||7 Comments|
It took a lot of patient nagging, but DATAllegro finally has a blog. Based on the first post, I predict:
- DATAllegro’s blog will live up to CEO Stuart Frost’s talent for clear, interesting writing.
- Like a number of other vendor blogs — e.g., Netezza’s — DATAllegro’s will have infrequent but usually long posts.
The crunchiest part of the first post is probably:
“Another very important aspect of performance is ensuring sequential reads under a complex workload. Traditional databases do not do a good job in this area – even though some of the management tools might tell you that they are! What we typically see is that the combination of RAID arrays and intervening storage infrastructure conspires to break even large reads by the database into very small reads against each disk. The end result is that most large DW installations have very large arrays of expensive, high-speed disks behind them – and still suffer from poor performance.”
I’ve pounded the table about sequential reads multiple times — including in a (DATAllegro-sponsored) white paper — but the point about misleading management tools is new to me.
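One way to see how a large database read can turn into small per-disk reads is a toy striping model. The sketch below assumes plain RAID-0 style striping with a fixed stripe unit, which is a simplification of my own; real arrays add parity, caching, and controller behavior that the quoted post doesn't detail. The 64 KiB stripe unit and 16-disk array are illustrative assumptions, and `split_read` and `bytes_per_disk` are hypothetical helper names.

```python
from collections import defaultdict


def split_read(offset: int, length: int, stripe_unit: int, num_disks: int) -> list:
    """Split one logical read into (disk, disk_offset, size) chunks.

    Assumes simple RAID-0 striping: stripe k lives on disk k % num_disks.
    """
    chunks = []
    end = offset + length
    pos = offset
    while pos < end:
        stripe_index = pos // stripe_unit
        within = pos % stripe_unit
        # A chunk stops at the stripe boundary or at the end of the read.
        size = min(stripe_unit - within, end - pos)
        disk = stripe_index % num_disks
        disk_offset = (stripe_index // num_disks) * stripe_unit + within
        chunks.append((disk, disk_offset, size))
        pos += size
    return chunks


def bytes_per_disk(chunks: list) -> dict:
    """Total bytes each disk must deliver for the given chunks."""
    totals = defaultdict(int)
    for disk, _, size in chunks:
        totals[disk] += size
    return dict(totals)


if __name__ == "__main__":
    KIB = 1024
    # A 1 MiB logical read over 16 disks with a 64 KiB stripe unit (assumed values).
    chunks = split_read(offset=0, length=1024 * KIB, stripe_unit=64 * KIB, num_disks=16)
    for disk, total in sorted(bytes_per_disk(chunks).items()):
        print(f"disk {disk:2d}: {total // KIB} KiB")
```

In this model a single 1 MiB read becomes sixteen 64 KiB requests, one per disk. With many queries running at once, each disk ends up servicing small requests from unrelated parts of the data, which is the opposite of the sequential access pattern the post argues for.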
Now if I could just get a production DATAllegro reference, I’d be completely happy …
|Categories: Analytic technologies, Data warehouse appliances, Data warehousing, DATAllegro||Leave a Comment|