Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data. Related subjects include:
Exadata: Oracle finally answers the data warehouse challengers
Oracle, in partnership with HP, has announced a new data warehouse appliance product line, cleverly branded “Exadata.” The basic idea seems to be that database processing is split among two sets of servers:
- (The new stuff) A set of back-end servers — the Oracle Exadata Storage Servers — that gets data off of disk and does some preliminary query processing.
- (The old stuff) A conventional Oracle RAC cluster on the front-end.
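To make that split concrete, here is a toy sketch of the general idea: a back-end tier filters rows as it reads them, and a front-end tier finishes the query on whatever survives. Everything in it (the `Row`, `StorageServer`, and `front_end_total` names, the sample data) is hypothetical and for illustration only; it is not Oracle's implementation or API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Row:
    region: str
    amount: float


class StorageServer:
    """Hypothetical back-end node: scans its own slice of data and filters early."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = list(rows)
        self.rows_scanned = 0   # rows read "off of disk"
        self.rows_shipped = 0   # rows actually sent to the front end

    def scan(self, predicate: Callable[[Row], bool]) -> Iterator[Row]:
        # The predicate is applied here, next to the data, so only
        # matching rows travel to the front-end tier.
        for row in self._rows:
            self.rows_scanned += 1
            if predicate(row):
                self.rows_shipped += 1
                yield row


def front_end_total(servers: List[StorageServer],
                    predicate: Callable[[Row], bool]) -> float:
    """Hypothetical front-end tier: aggregates the pre-filtered rows."""
    return sum(row.amount for server in servers for row in server.scan(predicate))


servers = [
    StorageServer([Row("east", 10.0), Row("west", 5.0)]),
    StorageServer([Row("east", 2.5), Row("north", 7.0)]),
]
total = front_end_total(servers, lambda r: r.region == "east")
scanned = sum(s.rows_scanned for s in servers)
shipped = sum(s.rows_shipped for s in servers)
print(total, scanned, shipped)
```

In this toy run, four rows are scanned but only two are shipped to the front end, which is the whole point of doing preliminary query processing close to the disks.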
Numbers are being thrown around suggesting that, unlike prior Oracle offerings, the Oracle Exadata-based appliance at least has scalability and price/performance worth comparing to Teradata — hey, Exa is bigger than Tera! — Netezza, et al.
Kevin Closson, who evidently worked on the project, offers the most useful and detailed description of Oracle Exadata I’ve seen so far. In particular, he and Oracle seem to claim: Read more
Categories: Data warehousing, Exadata, Oracle, Parallelization | 18 Comments |
Peter Batty on Netezza Spatial
As previously noted, I’m not up to speed on Netezza Spatial. Phil Francisco of Netezza has promised we’ll fix that ASAP. In the meantime, I found a blog by a guy named Peter Batty, who evidently:
- Knows a lot about geospatial data and its uses
- Is consulting to Netezza
- Is smart
Batty offers a lot of detail in two recent posts, intermixed with some gollygeewhiz about Netezza in general. If you’re interested in this stuff, Batty’s blog is well worth checking out. Read more
Categories: Analytic technologies, Data warehousing, GIS and geospatial, Netezza, Telecommunications | 2 Comments |
Database compression is heavily affected by the kind of data
I’ve written often of how different kinds or brands of data warehouse DBMS get very different compression figures. But I haven’t focused enough on how much compression figures can vary among different kinds of data. This was really brought home to me when Vertica told me that web analytics/clickstream data can often be compressed 60X in Vertica, while at the other extreme — some kind of floating point data, whose details I forget for now — they could only do 2.5X. Edit: Vertica has now posted much more accurate versions of those numbers. Infobright’s 30X compression reference at TradeDoubler seems to be for a clickstream-type app. Greenplum’s customer getting 7.5X — high for a row-based system — is managing clickstream data and related stuff. Bottom line:
When evaluating compression ratios — especially large ones — it is wise to inquire about the nature of the data.
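A rough way to see the effect for yourself is to compress two synthetic data sets with a general-purpose compressor. The sketch below uses Python's standard `zlib`, which is only a stand-in; it is not how Vertica, Infobright, or Greenplum compress data, and the made-up clickstream rows and random floats are assumptions chosen purely for illustration. The absolute ratios mean nothing; only the gap between them does.

```python
import random
import struct
import zlib


def compression_ratio(raw: bytes) -> float:
    """Uncompressed size divided by zlib-compressed size."""
    return len(raw) / len(zlib.compress(raw, 9))


rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic clickstream-like rows: a handful of values repeated many times.
pages = ["/home", "/search", "/cart", "/checkout"]
clickstream = "\n".join(
    f"2008-09-24,{rng.choice(pages)},GET,200" for _ in range(20_000)
).encode("ascii")

# Synthetic floating point data: 20,000 random 8-byte doubles.
floats = struct.pack("<20000d", *(rng.random() for _ in range(20_000)))

click_ratio = compression_ratio(clickstream)
float_ratio = compression_ratio(floats)
print(f"clickstream-like: {click_ratio:.1f}X")
print(f"random floats:    {float_ratio:.1f}X")
```

The repetitive, low-cardinality data compresses many times better than the random floats, even though the compressor is identical in both cases.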
Categories: Data warehousing, Database compression, Greenplum, Infobright, Vertica Systems, Web analytics | 4 Comments |
Oracle announcements next week, data warehouse appliance, 11g R2 or otherwise
Eric Lai and Chris Kanaracus put up an article on Oracle’s announcements next week. Much of the speculation revolved around generic grid/clustering, with more detail than I posted yesterday. Most interesting to me was the last section of the article, which sounds as if it could be talking about the same thing Luke Lonergan referred to in a comment thread when he said:
Oracle is about to unveil a secret project that uses HP DL185 servers as storage devices with some predicate pushdowns to implement a data warehouse “appliance”.
Categories: Data warehouse appliances, Data warehousing, Oracle | 1 Comment |
Wikipedia needs some urgent help in the database area
One or more people are going around clobbering Wikipedia’s coverage of analytic DBMS vendors. Netezza’s article has been gutted, and is marked for deletion. Aster Data’s and Dataupia’s articles are marked for deletion, although it seems that at least Aster’s will survive. Greenplum’s article is already gone, as is DATAllegro’s. I can’t immediately tell whether there ever was one on Infobright or ParAccel.
Vertica’s, by way of contrast, is in good shape. (But then, the Vertica guys are a little sharper about internet marketing than most of their peers.) Teradata’s isn’t in danger of deletion, but definitely could use some sprucing up. Read more
Categories: Data warehousing | 7 Comments |
Netezza overseas
22% of Netezza’s revenue comes from outside the US, at least if we use last quarter’s figures as a guide. At first blush, that doesn’t sound like much. Indeed, percentage-wise it surely lags behind Teradata, Greenplum (which has sold a lot in Asia/Pacific under Netezza’s former head of that region), and a few smaller competitors headquartered outside the US. But a few conversations I had today suggest a rosier view. Read more
Categories: Data warehouse appliances, Data warehousing, Greenplum, Kognitio, Market share and customer counts, Netezza, Teradata | Leave a Comment |
Microsoft/DATAllegro time frame announced
Edit: Actually, an email did eventually wend its way to me about a day later, which evidently had run into major congestion somewhere in the intertubes.
My resolve to eschew scathing sarcasm is being sorely tested tonight. The latest trial is my discovery that nobody thought to so much as email me a press release, let alone brief me, on Microsoft’s announcement of a timetable for DATAllegro/SQL Server integration. Per Ina Fried, with a hat tip to anonymous commenter L.J., Microsoft says:
The final version of that product is slated for the first half of 2010, though Microsoft said it will begin giving customers and partners access to early “community technology preview” releases within the next 12 months.
Categories: Data warehousing, DATAllegro, Microsoft and SQL*Server | Leave a Comment |
Netezza application areas
I’m at the Netezza “Enzee” user conference in Orlando. So one or more Netezza posts are in order.
One theme of the brief analyst meeting was Netezza’s increasing business focus on vertical markets. In particular, Netezza is hiring managers for a range of vertical markets. The commercial ones cited (at various levels of maturity) included: Read more
Categories: Application areas, Data warehouse appliances, Data warehousing, Market share and customer counts, Netezza, Telecommunications | Leave a Comment |
More mysteries regarding Oracle CDR load speed
Last spring, DATAllegro user John Devolites of TEOCO told me of troubles his firm had had loading CDRs (Call Detail Records) into Oracle, and how those had been instrumental in his eventual adoption of DATAllegro. That claim was contemptuously challenged in a couple of comment threads.
Well, tonight at the Netezza user conference, Netezza gave awards to its first customers. The very first to accept was Jim Hayden, who’d bought Netezza for a company called Vibrant Solutions, which coincidentally was later acquired by TEOCO itself. In front of hundreds of people, he talked about how, back in 2003, it had taken 23 hours to load 400 million CDRs into Oracle on Nextel’s behalf, but only 40 minutes on Netezza.
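For scale, here is the back-of-the-envelope arithmetic on those figures. It uses only the three numbers quoted above (400 million CDRs, 23 hours, 40 minutes) and assumes nothing about hardware, schema, or load method, none of which were stated.

```python
ROWS = 400_000_000          # CDRs loaded, as quoted
ORACLE_SECONDS = 23 * 3600  # 23 hours
NETEZZA_SECONDS = 40 * 60   # 40 minutes

oracle_rate = ROWS / ORACLE_SECONDS
netezza_rate = ROWS / NETEZZA_SECONDS
speedup = ORACLE_SECONDS / NETEZZA_SECONDS

print(f"Oracle:  {oracle_rate:,.0f} CDRs/sec")   # Oracle:  4,831 CDRs/sec
print(f"Netezza: {netezza_rate:,.0f} CDRs/sec")  # Netezza: 166,667 CDRs/sec
print(f"Speedup: {speedup:.1f}x")                # Speedup: 34.5x
```

That works out to roughly 4,800 CDRs per second versus roughly 167,000, a difference of about 34.5X in elapsed load time.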
And I’ll erase the rest of what I’d drafted here, as it was dripping in sarcasm …
Categories: Data warehousing, Netezza, Oracle, Telecommunications, TEOCO | 2 Comments |
Teradata sound bites
In connection with Teradata’s attempt to get into the Netezza news cycle with an appliance product announcement, I’ve whipped up a few Teradata-related sound bites suitable for quoting.
- Teradata has been in the data warehouse appliance business since 1984. I’m glad they’re finally admitting it.
- Teradata’s users love them. The users’ bosses, who sign the checks, aren’t as thrilled. Price competition is a big issue for Teradata.
- Teradata pricing has caused some real resistance, and even anger. Price is the big reason some startups are growing so much faster than Teradata. Ease of installation is sometimes a second factor.
- Teradata isn’t going to win many price-per-terabyte shootouts. (Note: I mean price per terabyte of user data.)
- The 5-10X+ performance advantage isn’t as crazy as it sounds, at least for some use cases. Teradata does still get a lot of business, and wins some price/performance shootouts to get it.
- Many Teradata customers are buying newer analytic DBMS as well. But they aren’t throwing out Teradata. Most stories of Teradata replacements are misunderstandings.
- The analytic DBMS startups all still do most of their business supporting data marts. If you have a high-concurrency workload, you usually need more mature technology. That’s where Teradata shines.
- That said, the very largest data warehouses are usually really data marts. High-concurrency BI is usually run against somewhat smaller databases.
- The upper limit for data warehouse sizes is skyrocketing. Over an 18-month span, we’re seeing the largest known production systems go from under 1 petabyte of user data to multiple petabytes.
- Teradata has more competition than it used to for the very largest databases, which are now being found in relatively young web companies even more than in old-line telcos, retailers, or banks.