Oracle
Analysis of software titan Oracle and its efforts in database management, analytics, and middleware. Related subjects include:
- Oracle TimesTen
- (in The Monash Report) Operational and strategic issues for Oracle
- (in Software Memories) Historical notes on Oracle
- Most of what’s written about in this blog
Two cornerstones of Oracle’s database hardware strategy
After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:
- Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.
- Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.
And by the way, Oracle doesn’t make its storage-tier software available to run on anything other than Oracle-designed boxes. At the moment, that means Exadata Versions 1 and 2. Since Exadata is by far Oracle’s best DBMS offering (at least in theory), that means Oracle’s best database offering only runs on specific Oracle-sold hardware platforms. Read more
This and that
I have various subjects backed up that I don’t really want to write about at traditional blog-post length. Here are a few of them. Read more
| Categories: Analytic technologies, Columnar database management, Complex event processing (CEP), Mark Logic, Native XML, Open source, Oracle, Theory and architecture, Vertica Systems | 2 Comments |
Oracle lifts the cloud hanging over MySQL storage engine vendors
Oracle has put out a press release promising to play nicely with MySQL if its Sun takeover is approved. The parts in italics below are quotes. My comments are in plain text.
1. Continued Availability of Storage Engine APIs. Oracle shall maintain and periodically enhance MySQL’s Pluggable Storage Engine Architecture to allow users the flexibility to choose from a portfolio of native and third party supplied storage engines.
MySQL’s Pluggable Storage Engine Architecture shall mean MySQL’s current practice of using, publicly-available, documented application programming interfaces to allow storage engine vendors to “plug” into the MySQL database server. Documentation shall be consistent with the documentation currently provided by Sun.
Well, duh.
2. Non-assertion. As copyright holder, Oracle will change Sun’s current policy and shall not assert or threaten to assert against anyone that a third party vendor’s implementations of storage engines must be released under the GPL because they have implemented the application programming interfaces available as part of MySQL’s Pluggable Storage Engine Architecture.
A commercial license will not be required by Oracle from third party storage engine vendors in order to implement the application programming interfaces available as part of MySQL’s Pluggable Storage Engine Architecture.
Oracle shall reproduce this commitment in contractual commitments to storage vendors who at present have a commercial license with Sun.
This is the biggie, lifting a major cloud from the MySQL storage engine business. It sounds like the third of four options I suggested as to how Oracle could legitimately earn antitrust approval of its MySQL takeover. Sure, Infobright, Kickfire, et al. already had what they saw as adequate safeguards or contingency plans vs. Oracle skullduggery. It’s still big even so.
(Quoted out of order.) The geographic scope of these commitments shall be worldwide and these commitments shall continue until the fifth anniversary of the closing of the transaction.
Not a disaster, but with respect to at least point #2 there should be no time limit whatsoever. I’d like to see the EC require that change as a further Oracle concession.
| Categories: MySQL, Open source, Oracle, Pricing | 15 Comments |
Notes on RainStor, the company formerly known as Clearpace
Information preservation* DBMS vendor Clearpace officially changed its name to RainStor this week. RainStor is also relocating its CEO John Bantleman and more generally its headquarters to San Francisco. This all led to a visit with John and his colleague Ramon Chen, highlights of which included: Read more
| Categories: Archiving and information preservation, Clearpace, Market share, Oracle, SenSage, Telecommunications | Leave a Comment |
Reports of perfectly-balanced hardware configurations are greatly exaggerated
Data warehouse appliance and software appliance vendors like to claim that they’ve worked out just the right hardware configuration(s), and that a single configuration is correct for a fairly broad range of workloads. But there are a lot of reasons to be dubious about that. Specific vendor evidence includes:
- Teradata ascribes considerable importance to a Virtual Storage technology whose main purpose is to allow mixing of heterogeneous storage devices in a single system. And the discussion rarely suggests that these parts will be in a rigid fixed relationship.
- Netezza — as Teradata keeps reminding me — often sells boxes with the expectation that they won’t be filled with data, so as to increase spindle count and hence performance.
- Oracle/Sun have dropped some comments about Exadata being more flexibly configured going forward.
- Kickfire’s new “high-end” appliance lets you attach fairly arbitrary amounts of external storage.
- And of course, software-only analytic DBMS vendors run their software in all sorts of hardware and storage environments.
What’s more, the claim never made a lot of sense anyway. With the rarest of exceptions, even a single data warehouse’s workload will contain different queries that strain different parts of the system in different ratios. Calculating the “ideal” hardware configuration for that single workload would be forbiddingly difficult. And even if one could calculate it, it almost surely would be different than another user’s “ideal” configuration. How a single hardware configuration can be “ideally balanced” for a broad class of use cases boggles the imagination.
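To make that argument concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the two-resource model (CPU vs. I/O), the fixed hardware budget, the function name `best_cpu_share`, and the workload numbers are all hypothetical and describe no vendor's actual system.

```python
def best_cpu_share(queries, steps=100):
    """Return the share of a fixed hardware budget (0..1) to spend on CPU.

    `queries` is a list of (cpu_work, io_work) pairs in arbitrary units.
    Toy model: runtime is set by whichever resource finishes last, and
    each resource's speed is proportional to the budget share it gets.
    """
    cpu_work = sum(cpu for cpu, _ in queries)
    io_work = sum(io for _, io in queries)
    best_share, best_time = None, float("inf")
    for i in range(1, steps):
        share = i / steps
        runtime = max(cpu_work / share, io_work / (1 - share))
        if runtime < best_time:
            best_share, best_time = share, runtime
    return best_share


# Two hypothetical workloads, each mixing queries with different CPU:I/O ratios.
scan_heavy = [(5, 40), (10, 20), (15, 10)]    # totals: CPU 30, I/O 70
compute_heavy = [(50, 5), (20, 10), (10, 5)]  # totals: CPU 80, I/O 20

print(best_cpu_share(scan_heavy))
print(best_cpu_share(compute_heavy))
```

Even in this deliberately simple model the two workloads want very different splits, and the individual queries inside each workload disagree with their own workload's "ideal" as well.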
| Categories: Data warehouse appliances, Data warehousing, Exadata, Kickfire, Netezza, Oracle, Teradata | 5 Comments |
Oracle Exadata customers presenting at Oracle Open World
Greg Rahn tweeted a list of Exadata-focused sessions at Oracle Open World next week. As Oracle employees and supporters have been foreshadowing, there will be Exadata users and user-like folks presenting. I identified what look like half a dozen (not counting any who, for example, will make surprise appearances at keynote addresses), specifically: Read more
| Categories: Data warehousing, Exadata, Market share, Oracle, Teradata | 5 Comments |
Oracle and Vertica on compression and other physical data layout features
In my recent post on Exadata pricing, I highlighted the importance of Oracle’s compression figures to the discussion, and the uncertainty about same. This led to a Twitter discussion featuring Greg Rahn* of Oracle and Dave Menninger and Omer Trajman of Vertica. I also followed up with Omer on the phone. Read more
| Categories: Columnar database management, Data models and architecture, Data warehousing, Database compression, Oracle, Theory and architecture, Vertica Systems | 13 Comments |
Oracle’s version of “actually, we’ve been doing MapReduce all along too”
In a recent blog post, Jean-Pierre Dijcks of Oracle makes the argument that Oracle has supported MapReduce all along, essentially because:
- You can do lots of procedural logic in the Oracle database, in a broad choice of languages, so in particular you can do Map steps.
- You can do lots of procedural logic in the Oracle database, in a broad choice of languages, so in particular you can do Reduce steps.
- Oracle offers a mechanism for parallelizing procedural logic.
Oracle doesn’t appear to have an explicit Map/Reduce programming interface, but I wouldn’t be surprised if Oracle Consulting cranked one out at some point to meet customer demand.
The post goes on to claim the usual in-database MapReduce benefit of avoiding the overhead of intermediate query result materialization. Presumably, then, Oracle’s quasi-MapReduce would also lack query fault-tolerance.
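For readers who want the pattern spelled out, the following is a minimal, generic sketch of Map and Reduce steps with a parallelized Map phase. It is plain Python, not Oracle's mechanism or any Oracle API; the function names and the word-count task are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_step(record):
    """Map: turn one input record into (key, value) pairs, here (word, 1)."""
    return [(word.lower(), 1) for word in record.split()]


def reduce_step(key, values):
    """Reduce: combine all values seen for one key."""
    return key, sum(values)


def map_reduce(records, workers=4):
    # Run the Map step over the records in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_step, records))

    # Group intermediate pairs by key. They are held in memory only and
    # never written out, so a failure here means starting over.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    return dict(reduce_step(key, values) for key, values in groups.items())


print(map_reduce(["Oracle and Vertica", "Oracle and Hadoop"]))
```

The in-memory grouping step is where the trade-off mentioned above shows up: skipping materialization of intermediate results saves overhead, but leaves nothing to resume from if a step fails partway through.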
| Categories: Analytic technologies, MapReduce, Oracle, Parallelization | Leave a Comment |
Oracle Exadata 2 capacity pricing
Summary of Oracle Exadata 2 capacity pricing
Analyzing Oracle Exadata pricing is always harder than one would first think. But I’ve finally gotten around to doing an Oracle Exadata 2 pricing spreadsheet. The main takeaways are:
- If we believe Oracle’s claims of 10X compression, Exadata 2 costs more per terabyte of user data than Netezza TwinFin — $22-26K/TB vs. TwinFin’s <$20K — but less than the Teradata 2550.
- These figures are highly sensitive to assumptions about Oracle’s hybrid columnar compression.
- Similarly, if Netezza or Teradata were to significantly upgrade their own compression, the price comparison would look quite different.
- Options such as Data Mining or Oracle Spatial add 12% or so each to Exadata’s total system price.
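The sensitivity to compression assumptions is simple arithmetic, sketched below in Python. The system price and raw capacity are made-up placeholders, not figures from the pricing spreadsheet, and treating each option as an additive 12% uplift is an assumption based on the rough figure above.

```python
def price_per_user_tb(system_price, raw_tb, compression_ratio,
                      n_options=0, option_uplift=0.12):
    """Price per terabyte of user (pre-compression) data.

    Assumes user-data capacity = raw capacity * compression ratio, and
    that each add-on option raises the total system price by
    `option_uplift` (additive, roughly 12% each).
    """
    total_price = system_price * (1 + option_uplift * n_options)
    return total_price / (raw_tb * compression_ratio)


# Hypothetical placeholder system: $1,000,000 for 10 TB of raw capacity.
for ratio in (10, 5, 2):
    per_tb = price_per_user_tb(1_000_000, 10, ratio)
    print(f"{ratio}X compression: ${per_tb:,.0f} per user TB")
```

Halving the assumed compression ratio doubles the price per terabyte of user data, which is why the 10X claim matters so much to any cross-vendor comparison.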
Longer version
When Oracle introduced Exadata last year it was, well, expensive. Exadata 2 has now been announced, and it is significantly cheaper than Exadata 1 per terabyte of user data, based on:
- Similar overall pricing
- Twice the disk capacity
- Better compression
| Categories: Analytic technologies, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Exadata, Netezza, Oracle, Pricing, Teradata | 13 Comments |
Yahoo wants to do decapetabyte-scale data warehousing in Hadoop
My old client Mark Tsimelzon moved over to Yahoo after Coral8 was acquired, and I caught up with him last month. He turns out to be running development for a significant portion of Yahoo’s Hadoop effort — everything other than HDFS (Hadoop Distributed File System). Yahoo evidently plans to, within a year or so, get Hadoop to the point that it is managing 10s of petabytes of data for Yahoo, with reasonable data warehousing functionality.
Highlights of our visit included:
- There are dozens of people at Yahoo doing Hadoop development that will wind up getting open sourced. (Full-time or close to it.) In particular, everything Mark’s team does goes to open source.
- Yahoo is moving as much of its analytics to Hadoop as possible. Much of this is being moved away from Oracle and from Yahoo’s own Everest.
- A column store is being put on top of HDFS, based on Yahoo technology. Columns will be striped across nodes. Perhaps that’s why the effort is called Project Zebra.
- Mark believes that in a year Hadoop will be much further along in meeting traditional data warehousing requirements, in areas such as:
  - Metadata
  - SLAs/high availability/other workload management
  - Data retention policies
  - Security/privacy*
- Yahoo views the time-to-market benefits of Hadoop as being more important than TCO.
