Netezza and Teradata on analytic geospatial data management
Geospatial data management is one of the flavors of the month:
- Last week, Teradata claimed it has the most sophisticated analytic geospatial data management capability.
- Also last week, Netezza attracted a lot of attention for its newly acquired Netezza Spatial technology.
- This week, Oracle called attention to its geospatial capabilities.
So I asked Netezza and Teradata what this geospatial analytics stuff is all about. Read more
Categories: Analytic technologies, Data warehousing, GIS and geospatial, Netezza, Teradata | 3 Comments |
So what does Oracle Exadata mean for HP Neoview?
That HP is committed to selling a lot of data warehouse hardware — and probably data warehouse appliances in particular — seems obvious, for reasons including:
- HP bought a big BI/data warehousing consulting operation in Knightsbridge.
- HP has put considerable effort into its data warehouse appliance Neoview.
- HP CEO Mark Hurd comes from data warehouse appliance vendor Teradata.
- Data warehousing is where the big bucks are.
But Oracle Exadata could produce those appliance sales. So where does HP Neoview fit in?
I was told by an investor today that HP’s investor relations department is saying Oracle Exadata is a Netezza competitor, while Neoview is more in the Teradata market. That’s laughable. Read more
Categories: Data warehouse appliances, Data warehousing, Exadata, HP and Neoview, Netezza, Teradata | 16 Comments |
Another round of discussion on in-memory OLTP data management
Oracle Exadata was pre-teased as “Extreme performance.” Some incorrect speculation shortly before the announcement focused on the possibility of OLTP without disk, which clearly would speed things up a lot. I interpret that in part as being wishful thinking. 🙂
The most compelling approach I’ve seen to that problem yet is H-Store, which however makes some radical architectural assumptions. One point I didn’t stress in my earlier posts, but which turned out to be a deal-breaker for one early tire-kicker, is that to use H-Store you have to be able to shoehorn each transaction into its own stored procedure. Depending on how intricate your logic is, that might make it hard to port an existing app to H-Store.
Even for new apps, it could get in the way of some things you might want to do, such as rule-based processing. And that could be a problem. A significant fraction of the highest-performance OLTP apps are customer-facing, and customer-facing apps are one of the biggest areas where rule-based processing comes into play.
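To make the stored-procedure constraint concrete, here is a minimal sketch — hypothetical names throughout, not H-Store's actual API — of the programming model it implies: the client can't issue ad-hoc multi-statement transactions, so every piece of transactional logic must be registered up front as a named procedure the engine runs start-to-finish.

```python
# Hypothetical sketch of an H-Store-style execution model (not real
# H-Store code): one stored procedure call == one transaction, run
# serially with no interactive begin/commit from the client.

class HStoreStyleEngine:
    def __init__(self):
        self.tables = {"accounts": {}}
        self.procedures = {}

    def register(self, name, proc):
        """All transactional logic must be captured here, in advance."""
        self.procedures[name] = proc

    def run(self, name, *args):
        # The engine executes the whole procedure as one transaction;
        # there is no way to hold a transaction open across calls.
        proc = self.procedures.get(name)
        if proc is None:
            raise KeyError(f"no such stored procedure: {name}")
        return proc(self.tables, *args)

def transfer(tables, src, dst, amount):
    """An entire funds-transfer transaction, shoehorned into one proc."""
    accounts = tables["accounts"]
    if accounts.get(src, 0) < amount:
        return False  # insufficient funds: whole transaction rejected
    accounts[src] -= amount
    accounts[dst] = accounts.get(dst, 0) + amount
    return True

engine = HStoreStyleEngine()
engine.tables["accounts"] = {"alice": 100, "bob": 0}
engine.register("transfer", transfer)
engine.run("transfer", "alice", "bob", 30)
```

The pain point for porting is visible in `transfer`: logic that an existing app spreads across several round trips — or hands off to a rules engine mid-transaction — has to be squeezed into that single function.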
Categories: In-memory DBMS, Memory-centric data management, OLTP, VoltDB and H-Store | 3 Comments |
Other notes on Oracle data warehousing
Obviously, the big news this week is Exadata, and its parallelization or lack thereof. But let’s not forget the rest of Oracle’s data warehousing technology.
- Frankly, I’ve come to think that disk-based OLAP cubes and materialized views are both cop-outs, indicative of a relational data warehouse architecture that can’t answer queries quickly enough straight-up. But if you disagree, then you might like Oracle’s new OLAP cube materialized views, which sound like a worthy competitor to Microsoft Analysis Services. (Further confusing things, I’ve seen reports that Oracle is increasing its commitment to Essbase, a separate MOLAP engine. I hope those are incorrect.)
- A few weeks ago, I came to realize that Oracle’s data mining database features actually mattered — perhaps not quite as much as Charlie Berger might think, but to say that is to praise with faint damns. 😉 SPSS seems to be getting large performance gains from leveraging the scoring part, and perhaps the transformation part as well. I haven’t focused on getting my details right yet, so I haven’t been writing about it. But heck, with all the other Oracle data warehousing discussion, it seems right to at least mention this part too.
So what’s Oracle’s MPP-aware optimizer and query execution plan story?
Edit: Answers to the title question have now shown up, and so the post below is now superseded by this one.
In most respects — including most data warehousing respects — Oracle’s query optimizer is the most sophisticated on the planet (even ahead of IBM’s, I’d say). But in all the Exadata discussion — and also in a good, comprehensive review of Oracle’s data warehouse technology — I haven’t seen any claims that Oracle has tackled the hard problems of parallel analytics.
Yes, Oracle is now getting data off of multiple disks onto multiple processors at once, without SAN bottlenecks, and doing some local filtering. That’s the heart of the Exadata storage story, and it’s indeed a huge advance over Oracle’s prior technology. But what happens to the data after that? It’s sent over to a RAC cluster. And unless I’m terribly mistaken, any further processing will be done on just a single node in that cluster.
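My reading of that data flow can be sketched as follows — this is an illustrative toy model of the architecture as I understand it, not Oracle code: the storage cells filter and project in parallel near the disks, but everything that survives funnels to one place for the rest of the query.

```python
# Toy model of the Exadata-style split, as described above (assumed
# behavior, not Oracle internals): parallel scan/filter at the storage
# tier, then aggregation on a single database node.

from collections import defaultdict

def storage_cell_scan(rows, predicate, columns):
    """Runs on each storage server: filter + project close to disk."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]

def single_node_aggregate(shipped_rows, key, value):
    """Runs on one node: all post-scan work funnels through here."""
    totals = defaultdict(float)
    for r in shipped_rows:
        totals[r[key]] += r[value]
    return dict(totals)

# Three storage cells, each holding a slice of the table.
cells = [
    [{"region": "east", "sales": 10.0, "year": 2008}],
    [{"region": "west", "sales": 5.0, "year": 2007}],
    [{"region": "east", "sales": 7.0, "year": 2008}],
]
shipped = []
for cell in cells:  # in the real system this scan runs in parallel
    shipped += storage_cell_scan(cell, lambda r: r["year"] == 2008,
                                 ["region", "sales"])
result = single_node_aggregate(shipped, "region", "sales")
```

The scan parallelizes nicely, but if `single_node_aggregate` — or a big join — really does run on one node, that node is the ceiling on query performance, which is exactly the concern raised above.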
Categories: Data warehousing, Oracle, Parallelization | 9 Comments |
Oracle Exadata and Oracle data warehouse appliance sound bites
In addition to my previously posted thoughts on the Oracle Exadata/data warehouse appliance announcement, let me offer some more concise observations.
- Microsoft had leapfrogged Oracle with its DATAllegro acquisition. Now Oracle’s back in the game.
- But Oracle Exadata Release 1 is hardly going to put Teradata, Netezza, or Greenplum out of business.
- After long denying it, Oracle has finally admitted that putting more than 10 TB on Oracle had been an extremely painful thing to do.
- Oracle’s idea of splitting database processing between a couple of types of server is a smart one, and is consistent with what multiple other vendors are doing.
- Medium-long term, the Exadata technical strategy could work very well. Exadata storage management addresses some of the problems with shared-everything; Oracle RAC addresses others; and it may not take many releases before Oracle gets query parallelization right as well. Edit: This point is superseded by my updated take on Oracle query parallelization.
- Now Oracle and Microsoft are both supporting Infiniband for high end data warehousing.
- Oracle’s Exadata-based appliance doesn’t have the out-of-the-box simplicity that other appliances and analytic DBMS do.
- Licensing details aren’t yet clear, but Oracle Exadata’s list price probably won’t be terribly appealing either. Of course, nobody in their right mind pays Oracle list prices anyway.
- New web-based businesses have no reason to buy the Oracle data warehouse appliance. Exadata makes sense only for established enterprises.
Contradicting all that potential goodness, Oracle has been making ringing anti-shared-nothing statements, such as the silly:
There are “speed-of-light issues” associated with … scale-out-style grids
That mindset doesn’t augur well for Oracle ever becoming a fully competitive high-end data warehouse DBMS vendor.
Categories: Data warehouse appliances, Data warehousing, Exadata, Oracle | 5 Comments |
Some of Oracle’s largest data warehouses
Googling around, I came across an Oracle presentation – given sometime this year – that lists some of Oracle’s largest data warehouses. 10 databases total are listed with >16 TB, which is fairly consistent with Larry Ellison’s confession during the Exadata announcement that Oracle has trouble over 10 TB (something a few Oracle partisans have given me a lot of flak for pointing out … 😀 ).
However, what’s being measured is probably not the same in all cases. For example, I think the Amazon 70 TB figure is obviously for spinning disk (elsewhere in the presentation it’s stated that Amazon has 71 TB of disk). But the 16 TB British Telecom figure probably is user data — indeed, it’s the same figure Computergram cited for BT user data way back in 2001.
The list is: Read more
Categories: Data warehousing, Oracle, Specific users, Telecommunications, Yahoo | 6 Comments |
Exadata: Oracle finally answers the data warehouse challengers
Oracle, in partnership with HP, has announced a new data warehouse appliance product line, cleverly branded “Exadata.” The basic idea seems to be that database processing is split between two sets of servers:
- (The new stuff) A set of back-end servers — the Oracle Exadata Storage Servers — that gets data off of disk and does some preliminary query processing.
- (The old stuff) A conventional Oracle RAC cluster on the front-end.
Numbers are being thrown around suggesting that, unlike prior Oracle offerings, the Oracle Exadata-based appliance at least has scalability and price/performance worth comparing to Teradata — hey, Exa is bigger than Tera! — Netezza, et al.
Kevin Closson, who evidently worked on the project, offers the most useful and detailed description of Oracle Exadata I’ve seen so far. In particular, he and Oracle seem to claim: Read more
Categories: Data warehousing, Exadata, Oracle, Parallelization | 18 Comments |
Vertica finally spells out its compression claims
Omer Trajman of Vertica put up a must-read blog post spelling out detailed compression numbers, based on actual field experience (which I’d guess is from a combination of production systems and POCs):
- CDR – 8:1 (87%)
- Consumer Data – 30:1 (96%)
- Marketing Analytics – 20:1 (95%)
- Network logging – 60:1 (98%)
- Switch Level SNMP – 20:1 (95%)
- Trade and Quote Exchange – 5:1 (80%)
- Trade Execution Auditing Trails – 10:1 (90%)
- Weblog and Click-stream – 10:1 (90%)
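As an arithmetic aside (my observation, not Vertica data): the parenthesized percentages are just each compression ratio restated as the share of raw space saved — 1 − 1/ratio — truncated to a whole percent, which is why 8:1 shows as 87% rather than 88%.

```python
# Convert an N:1 compression ratio to percent of raw space saved,
# truncated (not rounded), matching the pairings in the list above.

def pct_saved(ratio):
    """Space saved at ratio:1 compression, as a truncated whole percent."""
    return (100 * (ratio - 1)) // ratio  # integer math avoids float drift

for label, ratio in [("CDR", 8), ("Consumer Data", 30),
                     ("Network logging", 60), ("Trade and Quote", 5)]:
    print(f"{label}: {ratio}:1 -> {pct_saved(ratio)}% saved")
```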
It’s clear what Omer means by most of those categories from reading the post, but I’m a little fuzzy on what “Consumer Data” or “Marketing Analytics” comprise in his taxonomy. Anyhow, Omer’s post is a huge improvement over my recent one — based on a conversation with Omer 🙂 — which featured some far less accurate or complete compression numbers.
Omer goes on to claim that trickle-feed data is harder for rival systems to compress than it is for Vertica, and generally to claim that Vertica’s compression is typically severalfold better than that of competitive row-based systems.
Categories: Database compression, Vertica Systems, Web analytics | 5 Comments |
Oracle is integrating clickstream and network analytics too
Oracle announced today the not-so-concisely-named Oracle Real User Experience Insight, which actually seems to be an official nickname for what is more properly called “Oracle Enterprise Manager Real User Experience Insight.” Try saying that 10 times straight at network speeds … but I digress.
If I’m reading things correctly, add Oracle to the already long list of vendors who see clickstream and network event analytics as being two sides of the same coin.
Categories: Analytic technologies, Oracle, Web analytics | 2 Comments |