Oracle
Analysis of software titan Oracle and its efforts in database management, analytics, and middleware. Related subjects include:
- Oracle TimesTen
- (in The Monash Report) Operational and strategic issues for Oracle
- (in Software Memories) Historical notes on Oracle
- Most of what’s written about in this blog
Microsoft SQL Server 2012 and enterprise database choices in general
Microsoft is launching SQL Server 2012 on March 7. An IM chat with a reporter resulted, and went something like this.
Reporter: [Care to comment]?
CAM: SQL Server is an adequate product if you don’t mind being locked into the Microsoft stack. For example, the ColumnStore feature is very partial, given that it can’t be updated; but Oracle doesn’t have columnar storage at all.
Reporter: Is the lock-in overall worse than IBM DB2, Oracle?
CAM: Microsoft locks you into an operating system, so yes.
Reporter: Is this release something larger Oracle or IBM shops could consider as a lower-cost alternative in a co-habitation scenario, in the event they’re mulling whether to buy more Oracle or IBM licenses?
CAM: If they have a strong Microsoft-stack investment already, sure. Otherwise, why?
Reporter: [How about] just cost?
CAM: DB2 works just as well to keep Oracle honest as SQL Server does, and without a major operating system commitment. For analytic databases you want an analytic DBMS or appliance anyway.
Best is to have one major vendor of OLTP/general-purpose DBMS, a web DBMS, a DBMS for disposable projects (that may be the same as one of the first two), plus however many different analytic data stores you need to get the job done.
By “web DBMS” I mean MySQL, NewSQL, or NoSQL. Actually, you might need more than one product in that area.
| Categories: Data warehousing, IBM and DB2, Microsoft and SQL*Server, Mid-range, MySQL, NoSQL, Oracle | 6 Comments |
Notes on the Oracle Big Data Appliance
Oracle announced its Big Data Appliance. Specs may be found in the Oracle Big Data Appliance press release. Beyond that:
- The most important software on the Oracle Big Data Appliance is a full set of Cloudera Enterprise code. Oracle will do Tier 1 Cloudera/Hadoop support, while Cloudera handles Tiers 2 and 3.
- The key spec ratios are 1 core/4 GB RAM/3 TB raw disk. That’s reasonably in line with Cloudera figures I published in June, 2010.
- This is really Oracle’s multi-structured big data appliance. Oracle’s relational big data appliance is Exadata, which has been out for years and has comparable capacity to Oracle’s new “Big Data Appliance.” (Chris Preimesberger made a similar point.)
- The Oracle Big Data Appliance list price is $450,000 for 18 12-core servers, plus $54,000/year maintenance.
- That’s around $25,000 per server (and associated storage).
- That’s also around $2,000/core.
- That’s also around $700/TB of raw spinning disk (648 TB in total, at 3 TB per core), before compression.
- None of those per-unit figures sounds ridiculous …
- … but because of Oracle’s appliance configuration there’s indeed a hefty minimum initial purchase.
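The per-unit figures above can be re-derived with a short back-of-the-envelope script. This is only a sketch: the constant and function names are made up for illustration, and the raw disk total assumes the 1 core/3 TB ratio quoted earlier applies to all 216 cores.

```python
# Back-of-the-envelope check of the list-price figures quoted in the post.
LIST_PRICE_USD = 450_000
SERVERS = 18
CORES_PER_SERVER = 12
RAW_TB_PER_CORE = 3  # assumption: taken from the 1 core / 4 GB RAM / 3 TB ratio
ANNUAL_MAINTENANCE_USD = 54_000


def per_unit_prices():
    """Return core/disk totals and the list price per server, core, and raw TB."""
    cores = SERVERS * CORES_PER_SERVER
    raw_tb = cores * RAW_TB_PER_CORE
    return {
        "cores": cores,
        "raw_tb": raw_tb,
        "usd_per_server": LIST_PRICE_USD / SERVERS,
        "usd_per_core": LIST_PRICE_USD / cores,
        "usd_per_raw_tb": LIST_PRICE_USD / raw_tb,
        "maintenance_pct": 100 * ANNUAL_MAINTENANCE_USD / LIST_PRICE_USD,
    }


if __name__ == "__main__":
    for name, value in per_unit_prices().items():
        print(f"{name}: {value:,.0f}")
```

Under those assumptions the appliance has 216 cores and 648 TB of raw disk, which works out to $25,000 per server, about $2,083 per core, and about $694 per raw TB, with annual maintenance at 12% of list price.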
A couple of links explaining Cloudera Manager
Predictably, I wasn’t pre-briefed on the details of Oracle’s Big Data Appliance announcement today, and an inquiry to Oracle’s partner Cloudera wasn’t answered immediately.* But anyhow, it’s clear from coverage by Larry Dignan and Derrick Harris that Oracle’s Big Data Appliance includes:
- Some version of Cloudera Manager (I’m guessing more or less the best one).*
- Some version of Apache Hadoop (I’m guessing the same distribution that Cloudera prefers to use).*
- Some kind of support.
In other words, it’s a lot like getting Cloudera Enterprise,* plus some hardware, plus some other stuff.
*Edit: About 2 minutes after I posted this, I got email from Cloudera CEO Mike Olson. Yes, the Oracle Big Data Appliance bundles Cloudera Enterprise.
That raises a question that keeps recurring anyway: What exactly is Cloudera Manager?
| Categories: Cloudera, Data warehouse appliances, Hadoop, MapReduce, Oracle | Leave a Comment |
Big data terminology and positioning
Recently, I observed that Big Data terminology is seriously broken. It is reasonable to reduce the subject to two quasi-dimensions:
- Bigness — Volume, Velocity, size
- Structure — Variety, Variability, Complexity
given that
- High-velocity “big data” problems are usually high-volume as well.*
- Variety, variability, and complexity all relate to the simply-structured/poly-structured distinction.
But the conflation should stop there.
*Low-volume/high-velocity problems are commonly referred to as “event processing” and/or “streaming”.
When people claim that bigness and structure are the same issue, they oversimplify into mush. So I think we need four pieces of terminology, reflective of a 2×2 matrix of possibilities. For want of better alternatives, my suggestions are:
- Relational big data is data of high volume that fits well into a relational DBMS.
- Multi-structured big data is data of high volume that doesn’t fit well into a relational DBMS. Alternative: Poly-structured big data.
- Conventional relational data is data of not-so-high volume that fits well into a relational DBMS. Alternatives: Ordinary/normal/smaller relational data.
- Smaller poly-structured data is data for which dynamic schema capabilities are important, but which doesn’t rise to “big data” volume.
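The 2×2 matrix can be written down as a tiny lookup, which makes it plain that "bigness" and "structure" are independent axes. This is an illustrative sketch only; the type and function names are invented here, and reducing each axis to a yes/no flag is a simplification.

```python
from typing import NamedTuple


class DataSet(NamedTuple):
    high_volume: bool      # the "bigness" axis
    fits_relational: bool  # the "structure" axis


def big_data_term(ds: DataSet) -> str:
    """Map a data set onto one of the four suggested terms."""
    if ds.high_volume:
        if ds.fits_relational:
            return "relational big data"
        return "multi-structured big data"
    if ds.fits_relational:
        return "conventional relational data"
    return "smaller poly-structured data"


if __name__ == "__main__":
    for volume in (True, False):
        for relational in (True, False):
            ds = DataSet(high_volume=volume, fits_relational=relational)
            print(ds, "->", big_data_term(ds))
```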
Some big-vendor execution questions, and why they matter
When I drafted a list of key analytics-sector issues in honor of look-ahead season, the first item was “execution of various big vendors’ ambitious initiatives”. By “execute” I mean mainly:
- “Deliver products that really meet customers’ desires and needs.”
- “Successfully convince them that you’re doing so …”
- “… at an attractive overall cost.”
Vendors mentioned here are Oracle, SAP, HP, and IBM. Anybody smaller got left out due to the length of this post. Among the bigger omissions were:
- salesforce.com (multiple subjects).
- SAS HPA.
- The evolution of Hadoop.
NoSQL notes
Last week I visited with James Phillips of Couchbase, Max Schireson and Eliot Horowitz of 10gen, and Todd Lipcon, Eric Sammer, and Omer Trajman of Cloudera. I guess it’s time for a round-up NoSQL post.
Views of the NoSQL market horse race are reasonably consistent, with perhaps some elements of “Where you stand depends upon where you sit.”
- As James tells it, NoSQL is simply a three-horse race between Couchbase, MongoDB, and Cassandra.
- Max would include HBase on the list.
- Further, Max pointed out that metrics such as job listings suggest MongoDB has the most development activity, and Couchbase/Membase/CouchDB perhaps have less.
- The Cloudera guys remarked on some serious HBase adopters.
- Everybody I spoke with agreed that Riak had little current market presence, although some Basho guys could surely be found who’d disagree.
| Categories: Basho and Riak, Cassandra, Cloudera, Clustering, Couchbase, HBase, Market share and customer counts, MongoDB and 10gen, NoSQL, Open source, Oracle, Parallelization | 12 Comments |
Transparent relational OLTP scale-out
There’s a perception that, if you want (relatively) worry-free database scale-out, you need a non-relational/NoSQL strategy. That perception is false. In the analytic case it’s completely ridiculous, as has been demonstrated by Teradata, Vertica, Netezza, and various other MPP (Massively Parallel Processing) analytic DBMS vendors. And now it’s false for short-request/OLTP (OnLine Transaction Processing) use cases as well.
My favorite relational OLTP scale-out choice these days is the SchoonerSQL/dbShards partnership. Schooner Information Technology (SchoonerSQL) and Code Futures (dbShards) are young, small companies, but I’m not too concerned about that, because the APIs they want you to write to are just MySQL’s. The main scenarios in which I can see them failing are ones in which they are competitively leapfrogged, either by other small competitors (e.g. ScaleBase, Akiban, TokuDB, or ScaleDB) or by Oracle/MySQL itself. While that could suck for my clients Schooner and Code Futures, it would still provide users relying on MySQL scale-out with one or more good product alternatives.
Relying on non-MySQL NewSQL startups, by way of contrast, would leave me somewhat more concerned. (However, if their code is open sourced, you have at least some vendor-failure protection.) And big-vendor scale-out offerings, such as Oracle RAC or DB2 pureScale, may be more complex to deploy and administer than the MySQL and NewSQL alternatives.
| Categories: Clustering, IBM and DB2, MySQL, NoSQL, OLTP, Open source, Oracle, Parallelization, Schooner Information Technology, dbShards and CodeFutures | 2 Comments |
Schooner pivots further
Schooner Information Technology started out as a complete-system MySQL appliance vendor. Then Schooner went software-only, but continued to brag about great performance in configurations with solid-state drives. Now Schooner has pivoted further, and is emphasizing high availability, clustered performance, and other hardware-agnostic OLTP (OnLine Transaction Processing) features. Fortunately, Schooner has some interesting stuff in those areas to talk about.
The short form of the SchoonerSQL (as Schooner’s product is now called) story goes roughly like this:
- SchoonerSQL replicates data — synchronously if the replication target is local, asynchronously if it is remote.
- Local synchronous replication provides high availability; remote asynchronous replication provides disaster recovery.
- SchoonerSQL’s local synchronous replication also provides read scale-out.
- Schooner has a partnership with Code Futures/dbShards to provide write scale-out via transparent sharding.
- SchoonerSQL has some secret sauce in replication performance. This has the effect of significantly increasing write performance (assuming you were going to replicate anyway), because otherwise you might have to slow down the master server’s write performance so that the slaves can keep up with it.
- Schooner believes it still has some single-server performance advantages as well.
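The replication and sharding rules in the list above can be summarized in a few lines of code. This is a hypothetical sketch, not SchoonerSQL's or dbShards' actual API: every name below is invented for illustration, and the hash-based router is a generic sharding technique standing in for whatever dbShards really does.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationTarget:
    name: str
    is_local: bool  # True if the target sits at the same site as the master


def replication_mode(target: ReplicationTarget) -> str:
    """Synchronous for local targets, asynchronous for remote ones."""
    return "synchronous" if target.is_local else "asynchronous"


def replication_purpose(target: ReplicationTarget) -> str:
    """Local copies give high availability; remote copies give disaster recovery."""
    return "high availability" if target.is_local else "disaster recovery"


def shard_for(key: str, shard_count: int) -> int:
    """Generic hash sharding: map a key to one of shard_count shards.

    Assumption: this only illustrates write scale-out via sharding;
    it is not dbShards' actual routing algorithm.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


if __name__ == "__main__":
    for target in (ReplicationTarget("rack-2", True), ReplicationTarget("dr-site", False)):
        print(target.name, replication_mode(target), replication_purpose(target))
    print("customer:42 goes to shard", shard_for("customer:42", 4))
```

Because the shard is derived from a stable hash of the key, the same key always lands on the same shard, which is what lets the sharding stay transparent to an application that only speaks the MySQL API.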
| Categories: Clustering, MySQL, OLTP, Oracle, Parallelization, Schooner Information Technology, dbShards and CodeFutures | 3 Comments |
More notes on Oracle NoSQL
A reporter asked me for some thoughts on Oracle’s new NoSQL product. For the most part, I stand by my previous comments on Oracle NoSQL. Still, NoSQL in general deserves a place in Oracle shops, so it makes sense for Oracle to try to coopt it.
Oracle’s core DBMS is not well suited to track interactions (e.g. web clicks), even in cases where it’s the choice for transactions; it’s unnecessarily heavyweight. What’s worse, using the same database to store actions and interactions can lead to serious reliability problems. If a better architecture is to dump the clicks into some NoSQL store, massage the information, and eventually put some derived data into a relational DBMS, then Oracle will naturally try to own each step of the data pipeline.
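A minimal sketch of that pipeline shape follows, assuming a plain Python dict as a stand-in for the NoSQL click store and an in-memory SQLite database as a stand-in for the relational DBMS. The function, table, and column names are all hypothetical; nothing here reflects an Oracle product.

```python
import sqlite3
from collections import Counter


def record_click(store: dict, event_id: str, user: str, page: str) -> None:
    """Step 1: dump the raw interaction into a key-value store."""
    store[event_id] = {"user": user, "page": page}


def summarize_clicks(store: dict) -> Counter:
    """Step 2: massage raw clicks into derived data (views per page)."""
    return Counter(event["page"] for event in store.values())


def load_summary(conn: sqlite3.Connection, counts: Counter) -> None:
    """Step 3: put only the derived data into the relational database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views (page TEXT PRIMARY KEY, views INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO page_views (page, views) VALUES (?, ?)",
        list(counts.items()),
    )
    conn.commit()


def demo():
    """Run the three steps on a few sample clicks and return the summary rows."""
    store = {}
    record_click(store, "e1", "alice", "/home")
    record_click(store, "e2", "bob", "/home")
    record_click(store, "e3", "alice", "/pricing")
    conn = sqlite3.connect(":memory:")
    load_summary(conn, summarize_clicks(store))
    return conn.execute("SELECT page, views FROM page_views ORDER BY page").fetchall()


if __name__ == "__main__":
    print(demo())  # [('/home', 2), ('/pricing', 1)]
```

The relational side never sees individual clicks, only the aggregated rows, so the transactional database is kept away from the high-volume interaction traffic.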
Dynamic schemas are another area of Oracle weakness, leading in some cases to outright Oracle replacements. However, pure key-value stores go too far to the opposite extreme; you should at least be able to index and retrieve data one field at a time. Based on what I’ve seen of Oracle’s marketing literature, that feature will be missing from the first release of Oracle’s NoSQL.* Until it’s in there, and until it works well, I don’t see why anybody should use Oracle’s NoSQL product.
*Frankly, that choice makes no sense to me on any level. Yet it’s the way Oracle seems to have elected to go — or, if it isn’t, then there’s somebody writing Oracle marketing collateral who’s clearly in the wrong line of work.
| Categories: NoSQL, Oracle, Web analytics | 3 Comments |
Oracle is buying Endeca
Oracle is buying Endeca. The official talking points for the deal aren’t a perfect match for Endeca’s actual technology, but so be it.
In an earlier post about Endeca, I wrote:
… the Endeca paradigm is really to help you make your way through a structured database, where different portions of the database have different structures. Thus, at various points in your journey, it automagically provides you a list of choices as to where you could go next.
That kind of thing could help Oracle with apps like the wireless telco product catalog deal MongoDB got.
Going back to the Endeca-post quote well, Endeca itself said:
Inside the MDEX Engine there is no overarching schema; each data record carries its own metadata. This enables the rapid combination of a wide range of structured and unstructured content into Latitude’s unified data model. Once inside, the MDEX Engine derives common dimensions and metrics from the available metadata, instantly exposing each for high-performance refinement and analysis in the Discovery Framework. Have a new data source? Simply add it and the MDEX Engine will create new relationships where possible. Changes in source data schema? No problem, adjustments on the fly are easy.
And I pointed out that the MDEX engine was a columnar DBMS.
Meanwhile, Oracle’s own columnar DBMS efforts have been disappointing. Endeca could be an intended answer to that. However, while Oracle’s track record with standalone DBMS acquisitions is admirable (DEC RDB, MySQL, etc.), Oracle’s track record of integrating DBMS acquisitions into the Oracle product itself is not so good. (Express? Essbase? The text product line? None of that has gone particularly well.)
So while I would expect Endeca’s flagship e-commerce shopping engine products to flourish under Oracle’s ownership, I would be cautious about the integration of Endeca’s core technology into the Oracle product line.
| Categories: Columnar database management, Endeca, Oracle | 2 Comments |
