After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:
- Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.
- Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.
And by the way, Oracle doesn’t make its storage-tier software available to run on anything other than Oracle-designed boxes. At the moment, that means Exadata Versions 1 and 2. Since Exadata is by far Oracle’s best DBMS offering (at least in theory), that means Oracle’s best database offering only runs on specific Oracle-sold hardware platforms.
*E.g., I was sitting upstairs in my parents’ apartment in Columbus, OH having the call while their doctor, who I’ve never met, was visiting downstairs. He offered to make a special trip back Saturday afternoon because he missed me Wednesday, but he’s notorious for not coming when he says he will. Update: He didn’t come Saturday. On Saturday he said he’d come Sunday. He didn’t do that either.
Other high- and lowlights of our conversation included:
- Flash is the main new hardware element in Exadata Version 2. Otherwise, Exadata 2 is just an annual refresh of Exadata Version 1 to include updated components (Nehalem chips, bigger disk drives, etc.).
- Juan thinks it’s suboptimal to use flash memory through the bottleneck of disk controllers, favoring PCIe cards instead. (I emphatically agree.)
- Juan resolutely ducked questions about actual Exadata production deployment. Literally the only fact he shared in that regard is that there are at least 2 Exadata production systems running that each have 2 or more racks cabled together.
- Juan stressed that Exadata runs apps written over Oracle DBMS unchanged.
- When making mixed-workload claims for Exadata 2, Juan stressed consolidation of multiple databases, some OLTP and some analytic. He didn’t really argue with my skepticism about integrating OLTP and analytics in the same database, with one exception:
- Juan pointed out that in major OLTP apps such as ERP systems, there often is actually more processing going on in reporting and other batch stuff than there is in true OLTP.
- Exadata 2’s flash memory is designed as a disk cache, smarter than LRU (Least Recently Used). The two examples Juan gave of “smarter than LRU” are that backups and table scans don’t flush the cache.
- I forget whether this is new in Exadata 2 (I think it is), but anyhow – Exadata has a “Storage Index” that’s a lot like a Netezza zone map. I.e., for each megabyte or so of data it stores the min and max value of every column; if a query predicate rules out those ranges, that megabyte is never retrieved.
- Oracle has long offered what sounds like flexible workload management capability, and this has now been extended to specifically include I/O resources on the storage tier.
- This isn’t Exadata-specific, but Oracle has built a file system on top of its DBMS, optimized for speed, which helps with, e.g., ELT (Extract/Load/Transform). Evidently, it’s not at all the same thing as Marc Benioff’s 1990s Microsoft-annoying IFS (Internet File System) project, which seems to have morphed into a content management SDK.
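To make the Storage Index / zone map idea from the list above concrete, here is a minimal Python sketch. It is emphatically not Oracle's or Netezza's implementation; the names `Region`, `build_region`, and `scan_between` are made up for illustration, a "region" of a few rows stands in for the megabyte or so of data Juan described, and columns are assumed to hold integers. The only point is the pruning rule: keep a min and max per column per region, and skip any region whose range cannot satisfy the predicate.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, int]


@dataclass
class Region:
    """One storage region (hypothetical stand-in for ~1 MB of data)."""
    rows: List[Row]
    min_max: Dict[str, Tuple[int, int]]  # column name -> (min, max)


def build_region(rows: List[Row]) -> Region:
    """Store the rows and record the min and max value of every column."""
    columns = rows[0].keys()
    min_max = {
        c: (min(r[c] for r in rows), max(r[c] for r in rows))
        for c in columns
    }
    return Region(rows=rows, min_max=min_max)


def scan_between(
    regions: List[Region], column: str, low: int, high: int
) -> Tuple[List[Row], int]:
    """Return rows with low <= row[column] <= high, plus the number of
    regions that actually had to be read."""
    matches: List[Row] = []
    regions_read = 0
    for region in regions:
        region_min, region_max = region.min_max[column]
        if region_max < low or region_min > high:
            # The predicate rules out this region's range, so skip it.
            continue
        regions_read += 1
        matches.extend(r for r in region.rows if low <= r[column] <= high)
    return matches, regions_read


# Three regions holding order_id 0-3, 4-7, and 8-11 respectively.
regions = [
    build_region([{"order_id": i, "amount": i * 10} for i in range(s, s + 4)])
    for s in (0, 4, 8)
]
rows, regions_read = scan_between(regions, "order_id", 5, 6)
```

In this toy setup the query for `order_id` between 5 and 6 only needs to touch the middle region. The pruning pays off when data is stored in roughly sorted order on the filtered column; on a column whose values are scattered across every region, the min/max ranges overlap the predicate everywhere and nothing gets skipped.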
Highlights specifically in the area of parallelization included:
- Juan stressed that all databases consolidated onto an Exadata machine are/should be striped across all storage units.
- On the other hand, Juan said that different databases should be confined to specific cores or CPUs on the database tier.
- But on the third hand, Juan also stressed – in what could be called a “private cloud” pitch – that there’s great elasticity as to which databases are matched to which server CPUs.
- Contrary to what I thought he and/or his colleagues told me a year ago, Juan said RAC (Real Application Clusters) is a big part of Oracle’s data warehouse processing.
- However, Juan says that what I regard(ed) as a major objection to Oracle’s database-tier parallelization — the need to manually specify “degrees of parallelism” — has now been obviated by automation. Juan thinks that few data warehouse DBAs will now need to manually tune parallelism, with minor exceptions. One exception he cites is that if a nightly report really is non-urgent, it can just be forced to run on a single core with no chance to grab more resources. (However, Juan thinks manual tuning of parallelism will continue to play a greater role in OLTP.)
OK. That’s all I can get done tonight (see above re: inconvenience of timing). Follow-on subjects I’d like to and indeed plan to post about include:
- What Juan said about hybrid columnar compression
- Oracle’s delightfully non-confidential slide deck, and a few comments about same