Oracle, in partnership with HP, has announced a new data warehouse appliance product line, cleverly branded “Exadata.” The basic idea seems to be that database processing is split between two sets of servers:
- (The new stuff) A set of back-end servers — the Oracle Exadata Storage Servers — that gets data off of disk and does some preliminary query processing.
- (The old stuff) A conventional Oracle RAC cluster on the front-end.
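To make that two-tier split concrete, here is a toy Python sketch. It is not Oracle’s implementation, and nothing in it comes from Oracle documentation; the function names `storage_scan` and `front_end_query`, the row layout, and the predicate are all made up for illustration. It only shows the general shape of the idea: the storage tier filters rows and drops unneeded columns, and the front end works on what is left.

```python
# Toy model of a two-tier split. All names and data are hypothetical;
# this is an illustration of the general idea, not of Exadata itself.

def storage_scan(rows, predicate, columns):
    """Pretend storage server: filter rows, keep only the requested columns."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]

def front_end_query(partitions, predicate, columns):
    """Pretend database tier: gather the pre-filtered results from each node."""
    result = []
    for rows in partitions:
        result.extend(storage_scan(rows, predicate, columns))
    return result

# Two storage nodes, each holding part of a made-up sales table.
partitions = [
    [{"id": 1, "region": "EU", "amount": 10, "note": "x" * 100},
     {"id": 2, "region": "US", "amount": 20, "note": "y" * 100}],
    [{"id": 3, "region": "EU", "amount": 30, "note": "z" * 100}],
]

eu_sales = front_end_query(partitions, lambda r: r["region"] == "EU", ["id", "amount"])
print(eu_sales)  # [{'id': 1, 'amount': 10}, {'id': 3, 'amount': 30}]
```

The point of the sketch is simply that the wide `note` column and the non-matching row never leave the storage tier.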
Numbers are being thrown around suggesting that, unlike prior Oracle offerings, the Oracle Exadata-based appliance at least has scalability and price/performance worth comparing to Teradata — hey, Exa is bigger than Tera! — Netezza, et al.
Kevin Closson, who evidently worked on the project, offers the most useful and detailed description of Oracle Exadata I’ve seen so far. In particular, he and Oracle seem to claim:
- I/O will no longer be a bottleneck, due to direct-attached storage (DAS), InfiniBand, and so on. (That sounds plausible.)
- Oracle Exadata files will be optimized simultaneously for sequential table scans and conventional block-based random I/O. (Huh?)
If for the sake of argument we grant the claims so far, it’s still not clear to me whether Oracle’s approach is fully competitive with Teradata, Netezza, et al. Whatever query processing isn’t already done at the Oracle Exadata Storage end has to be done in Oracle RAC. But what exactly does RAC bring to query parallelization? Well, it should help with concurrency. Whatever performance Oracle can get with a small number of users shouldn’t degrade too badly as the user load grows. Oracle’s Exadata-based appliance will probably prove to have much better concurrency than startup vendors’ Release 1s typically have.
That’s the good-news side of my guessing. The other traditional Release 1 bottleneck is that too much data is shipped to the “fat head,” and query processing isn’t really parallelized in more than a simple-minded way. So far, I’ve seen nothing to suggest that Oracle isn’t as subject to that problem as any other vendor.
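For readers who want the “fat head” problem spelled out, here is a small Python sketch under loudly stated assumptions. The data is synthetic, the function names are hypothetical, and it says nothing about what Oracle Exadata actually does. It just compares two generic designs for a grouped sum: storage nodes that only filter and ship raw rows to the head, versus nodes that also ship partial aggregates.

```python
from collections import Counter

# Hypothetical sketch of the "fat head" issue; not any vendor's real design.

def filter_only(rows):
    """Node ships every qualifying row; the head does all the aggregation."""
    return [r for r in rows if r["amount"] > 0]

def partial_aggregate(rows):
    """Node ships one partial sum per region instead of raw rows."""
    sums = Counter()
    for r in rows:
        if r["amount"] > 0:
            sums[r["region"]] += r["amount"]
    return sums

def head_from_rows(shipped):
    """Head aggregates raw rows received from every node."""
    total = Counter()
    for rows in shipped:
        for r in rows:
            total[r["region"]] += r["amount"]
    return total

def head_from_partials(shipped):
    """Head merges per-node partial sums (Counter.update adds counts)."""
    total = Counter()
    for partial in shipped:
        total.update(partial)
    return total

# Synthetic data: 4 nodes, 1000 rows each, 2 regions.
nodes = [[{"region": "EU" if i % 2 else "US", "amount": 1} for i in range(1000)]
         for _ in range(4)]

rows_shipped = [filter_only(n) for n in nodes]
partials_shipped = [partial_aggregate(n) for n in nodes]

# Both designs give the same answer; they differ in how much reaches the head.
assert head_from_rows(rows_shipped) == head_from_partials(partials_shipped)
print(sum(len(s) for s in rows_shipped))      # 4000 rows shipped
print(sum(len(s) for s in partials_shipped))  # 8 partial sums shipped
```

In the first design the head receives and processes everything that survives the filter, which is the simple-minded kind of parallelization described above.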
As for Oracle’s sophisticated query accelerations, such as materialized views and so on, I think users increasingly want all queries to run quickly, not just the ones that were planned for in advance. So I’m not sure how much of an advantage those will prove to be.
And of course Oracle’s management tools are robust and its prices high. Those are both givens.
Related links:
- Dividing the data warehousing work among MPP nodes
- SANs vs. DAS in MPP data warehousing
- Three ways Oracle or Microsoft could go MPP