Analytic technologies
Discussion of technologies related to information query and analysis. Related subjects include:
- Business intelligence
- Data warehousing
- (in Text Technologies) Text mining
- (in The Monash Report) Data mining
- (in The Monash Report) General issues in analytic technology
So what does Oracle Exadata mean for HP Neoview?
That HP is committed to selling a lot of data warehouse hardware — and probably data warehouse appliances in particular — seems obvious, for reasons including:
- HP bought Knightsbridge, a big BI/data warehousing consulting operation.
- HP has put considerable effort into its data warehouse appliance Neoview.
- HP CEO Mark Hurd comes from data warehouse appliance vendor Teradata.
- Data warehousing is where the big bucks are.
But Oracle Exadata could produce those appliance sales. So where does HP Neoview fit in?
I was told by an investor today that HP’s investor relations department is saying Oracle Exadata is a Netezza competitor, while Neoview is more in the Teradata market. That’s laughable.
| Categories: Data warehouse appliances, Data warehousing, Exadata, HP and Neoview, Netezza, Teradata | 16 Comments |
Other notes on Oracle data warehousing
Obviously, the big news this week is Exadata, and its parallelization or lack thereof. But let’s not forget the rest of Oracle’s data warehousing technology.
- Frankly, I’ve come to think that disk-based OLAP cubes and materialized views are both cop-outs, indicative of a relational data warehouse architecture that can’t answer queries quickly enough straight-up. But if you disagree, then you might like Oracle’s new OLAP cube materialized views, which sound like a worthy competitor to Microsoft Analysis Services. (Further confusing things, I’ve seen reports that Oracle is increasing its commitment to Essbase, a separate MOLAP engine. I hope those are incorrect.)
- A few weeks ago, I came to realize that Oracle’s data mining database features actually mattered — perhaps not quite as much as Charlie Berger might think, but to say that is to praise with faint damns. 😉 SPSS seems to be getting large performance gains from leveraging the scoring part, and perhaps the transformation part as well. I haven’t focused on getting my details right yet, so I haven’t been writing about it. But heck, with all the other Oracle data warehousing discussion, it seems right to at least mention this part too.
So what’s Oracle’s MPP-aware optimizer and query execution plan story?
Edit: Answers to the title question have now shown up, and so the post below is now superseded by this one.
In most respects — including most data warehousing respects — Oracle’s query optimizer is the most sophisticated on the planet (even ahead of IBM’s, I’d say). But in all the Exadata discussion — and also in a good, comprehensive review of Oracle’s data warehouse technology — I haven’t seen any claims that Oracle has tackled the hard problems of parallel analytics.
Yes, Oracle is now getting data off of multiple disks onto multiple processors at once, without SAN bottlenecks, and doing some local filtering. That’s the heart of the Exadata storage story, and it’s indeed a huge advance over Oracle’s prior technology. But what happens to the data after that? It’s sent over to a RAC cluster. And unless I’m terribly mistaken, any further processing will be done on just a single node in that cluster.
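The processing split described above can be sketched in a few lines of Python. This is a toy model, not Oracle code: the function and data names are hypothetical, and simple summation stands in for whatever joins, sorts, and aggregations a real query plan would run. The point is the asymmetry — the filtering step runs once per storage node, while everything downstream funnels through a single coordinator.

```python
# Toy sketch (not Oracle code) of the Exadata-style split described above:
# storage-layer nodes filter rows locally in parallel, then a single
# front-end node does all remaining work on the filtered results.

def storage_node_scan(rows, predicate):
    """Each 'storage server' returns only the rows passing the filter."""
    return [r for r in rows if predicate(r)]

def coordinator_aggregate(filtered_row_sets):
    """Single front-end node: all post-filter work funnels through here."""
    total = 0
    for rows in filtered_row_sets:
        for r in rows:
            total += r["amount"]
    return total

# Data spread across three storage nodes
nodes = [
    [{"region": "EU", "amount": 10}, {"region": "US", "amount": 5}],
    [{"region": "US", "amount": 7}],
    [{"region": "EU", "amount": 3}, {"region": "EU", "amount": 2}],
]

# Filtering happens per node; the aggregation step does not parallelize.
filtered = [storage_node_scan(n, lambda r: r["region"] == "EU") for n in nodes]
print(coordinator_aggregate(filtered))  # 15
```

Under this model, adding storage nodes speeds up the scan but does nothing for the coordinator's share of the work — which is exactly the concern raised about post-filter processing landing on a single RAC node.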
| Categories: Data warehousing, Oracle, Parallelization | 9 Comments |
Oracle Exadata and Oracle data warehouse appliance sound bites
In addition to my previously posted thoughts on the Oracle Exadata/data warehouse appliance announcement, let me offer some more concise observations.
- Microsoft had leapfrogged Oracle with its DATAllegro acquisition. Now Oracle’s back in the game.
- But Oracle Exadata Release 1 is hardly going to put Teradata, Netezza, or Greenplum out of business.
- After long denying it, Oracle has finally admitted that putting more than 10 TB on Oracle has been extremely painful.
- Oracle’s idea of splitting database processing between a couple of types of server is a smart one, and is consistent with what multiple other vendors are doing.
- Medium-long term, the Exadata technical strategy could work very well. Exadata storage management addresses some of the problems with shared-everything; Oracle RAC addresses others; and it may not take many releases before Oracle gets query parallelization right as well. Edit: This point is superseded by my updated take on Oracle query parallelization.
- Now Oracle and Microsoft are both supporting Infiniband for high end data warehousing.
- Oracle’s Exadata-based appliance doesn’t have the out-of-the-box simplicity that other appliances and analytic DBMS do.
- Licensing details aren’t yet clear, but Oracle Exadata’s list price probably won’t be terribly appealing either. Of course, nobody in their right mind pays Oracle list prices anyway.
- New web-based businesses have no reason to buy the Oracle data warehouse appliance. Exadata makes sense only for established enterprises.
Contradicting all that potential goodness, Oracle has been making ringing anti-shared-nothing statements, such as the silly:
There are “speed-of-light issues” associated with … scale-out-style grids
That mindset doesn’t augur well for Oracle to ever be a fully competitive high-end data warehouse DBMS vendor.
| Categories: Data warehouse appliances, Data warehousing, Exadata, Oracle | 5 Comments |
Some of Oracle’s largest data warehouses
Googling around, I came across an Oracle presentation – given sometime this year – that lists some of Oracle’s largest data warehouses. 10 databases total are listed with >16 TB, which is fairly consistent with Larry Ellison’s confession during the Exadata announcement that Oracle has trouble over 10 TB (which is something I’ve gotten a lot of flak from a few Oracle partisans for pointing out … 😀 ).
However, what’s being measured is probably not the same in all cases. For example, I think the Amazon 70 TB figure is obviously for spinning disk (elsewhere in the presentation it’s stated that Amazon has 71 TB of disk). But the 16 TB British Telecom figure probably is user data — indeed, it’s the same figure Computergram cited for BT user data way back in 2001.
The list is: Read more
| Categories: Data warehousing, Oracle, Specific users, Telecommunications, Yahoo | 6 Comments |
Exadata: Oracle finally answers the data warehouse challengers
Oracle, in partnership with HP, has announced a new data warehouse appliance product line, cleverly branded “Exadata.” The basic idea seems to be that database processing is split between two sets of servers:
- (The new stuff) A set of back-end servers — the Oracle Exadata Storage Servers — that gets data off of disk and does some preliminary query processing.
- (The old stuff) A conventional Oracle RAC cluster on the front-end.
Numbers are being thrown around suggesting that, unlike prior Oracle offerings, the Oracle Exadata-based appliance at least has scalability and price/performance worth comparing to Teradata — hey, Exa is bigger than Tera! — Netezza, et al.
Kevin Closson, who evidently worked on the project, offers the most useful and detailed description of Oracle Exadata I’ve seen so far. In particular, he and Oracle seem to claim: Read more
| Categories: Data warehousing, Exadata, Oracle, Parallelization | 18 Comments |
Oracle is integrating clickstream and network analytics too
Oracle announced today the not-so-concisely-named Oracle Real User Experience Insight, which actually seems to be an official nickname for what is more properly called “Oracle Enterprise Manager Real User Experience Insight.” Try saying that 10 times straight at network speeds … but I digress.
If I’m reading things correctly, add Oracle to the already long list of vendors who see clickstream and network event analytics as being two sides of the same coin.
| Categories: Analytic technologies, Oracle, Web analytics | 2 Comments |
A few operational BI/BPM/business rules stories
Intersystems is rolling out DeepSee, which is a Caché-specific BI engine. Since some Intersystems OEMs have been known to pay more money to Business Objects/Crystal Reports than to Intersystems itself, the business motivation is obvious. Technically, Intersystems’ claims include: Read more
| Categories: Business intelligence, Intersystems and Cache', Oracle | 1 Comment |
Peter Batty on Netezza Spatial
As previously noted, I’m not up to speed on Netezza Spatial. Phil Francisco of Netezza has promised we’ll fix that ASAP. In the meantime, I found a blog by a guy named Peter Batty, who evidently:
- Knows a lot about geospatial data and its uses
- Is consulting to Netezza
- Is smart
Batty offers a lot of detail in two recent posts, intermixed with some gollygeewhiz about Netezza in general. If you’re interested in this stuff, Batty’s blog is well worth checking out.
| Categories: Analytic technologies, Data warehousing, GIS and geospatial, Netezza, Telecommunications | 2 Comments |
Database compression is heavily affected by the kind of data
I’ve written often of how different kinds or brands of data warehouse DBMS get very different compression figures. But I haven’t focused enough on how much compression figures can vary among different kinds of data. This was really brought home to me when Vertica told me that web analytics/clickstream data can often be compressed 60X in Vertica, while at the other extreme — some kind of floating point data, whose details I forget for now — they could only do 2.5X. Edit: Vertica has now posted much more accurate versions of those numbers. Infobright’s 30X compression reference at TradeDoubler seems to be for a clickstream-type app. Greenplum’s customer getting 7.5X — high for a row-based system — is managing clickstream data and related stuff. Bottom line:
When evaluating compression ratios — especially large ones — it is wise to inquire about the nature of the data.
