Analysis of Oracle Exadata and the Oracle Database Machine. Related subjects include:
What are the central challenges in internet system design? We probably all have similar lists, comprising issues such as scale, scale-out, throughput, availability, security, programming ease, UI, or general cost-effectiveness. Screw those up, and you don’t have an internet business.
Much new technology addresses those challenges, with considerable success. But the success is usually one silo at a time — a short-request application here, an analytic database there. When it comes to integration, unsolved problems abound.
The top integration and integration-like challenges for me, from a practical standpoint, are:
- Integrating silos — a decades-old problem still with us in a big way.
- Dynamic schemas with joins.
- Low-latency business intelligence.
- Human real-time personalization.
Other concerns that get mentioned include:
- Geographical distribution due to privacy laws, which for some users is a major requirement for compliance.
- Logical data warehouse, a term that doesn’t actually mean anything real.
- In-memory data grids, which some day may no longer always be hand-coupled to the application and data stacks they accelerate.
Let’s skip those latter issues for now, focusing instead on the first four.
This is a draft entry for the DBMS2 analytic glossary. Please comment with any ideas you have for its improvement!
Note: Words and phrases in italics will be linked to other entries when the glossary is complete.
A data warehouse appliance is a combination of hardware and software that includes an analytic DBMS (DataBase Management System). However, some observers incorrectly apply the term “data warehouse appliance” to any analytic DBMS.
The paradigmatic vendors of data warehouse appliances are:
- Teradata, which embraced the term “data warehouse appliance” in 2008.
- Netezza — now an IBM company — which popularized the term “data warehouse appliance” in the 2000s.
Further, vendors of analytic DBMS commonly offer — directly or through partnerships — optional data warehouse appliance configurations; examples include:
- Greenplum, now part of EMC.
- Vertica, now an HP company.
- IBM DB2, under the brand “Smart Analytic System”.
- Microsoft (Parallel Data Warehouse).
Oracle Exadata is sometimes regarded as a data warehouse appliance as well, despite not being solely focused on analytic use cases.
Data warehouse appliances inherit marketing claims from the category of analytic DBMS, such as: Read more
|Categories: Analytic glossary, Data warehouse appliances, Data warehousing, EMC, Exadata, Greenplum, HP and Neoview, IBM and DB2, Microsoft and SQL*Server, Netezza, Oracle, Teradata||4 Comments|
In a call Monday with a prominent company, I was told:
- Teradata, Netezza, Greenplum and Vertica aren’t relational.
- Teradata, Netezza, Greenplum and Vertica are all data warehouse appliances.
That, to put it mildly, is not accurate. So I shall try, yet again, to set the record straight.
In an industry where people often call a DBMS just a “database” — so that a database is something that manages a database! — one may wonder why I bother. Anyhow …
1. The products commonly known as Oracle, Exadata, DB2, Sybase, SQL Server, Teradata, Sybase IQ, Netezza, Vertica, Greenplum, Aster, Infobright, SAND, ParAccel, Exasol, Kognitio et al. all either are or incorporate relational database management systems, aka RDBMS or relational DBMS.
2. In principle, there can be difficulties in judging whether or not a DBMS is “relational”. In practice, those difficulties don’t arise — yet. Every significant DBMS still falls into one of two categories:
- Was designed to do relational stuff* from the get-go, even if it now does other things too.
- Supports a lot of SQL.
- Was designed primarily to do non-relational things.*
- Doesn’t support all that much SQL.
*I expect the distinction to get more confusing soon, at which point I’ll adopt terms more precise than “relational things” and “relational stuff”.
3. There are two chief kinds of relational DBMS: Read more
A reporter asked me to speculate about the next releases of Oracle and Exadata. He and I agreed:
- It seems likely that they’ll be discussed at Oracle OpenWorld in a couple of months.
- Exadata in particular is due for a hardware refresh.
- Oracle12c is a good guess at a name, where “C” is for “Cloud”.
My answers mixed together thoughts on what Oracle should and will emphasize (which aren’t the same thing but hopefully bear some relationship to each other ;)). They were (lightly edited):
- The worst thing about Oracle is the ongoing DBA work for what should be automatic.
- Oracle RAC still makes scale-out too difficult. Presumably, Oracle is looking to build aggressively on recent steps in automating parallelism.
- For Exadata, assume that Oracle is always looking to improve how data gets allocated among disk, flash, and RAM. Look also for Exadata versions with different silicon-disk ratios than are available now.
- Tighter integration among the various appliances is surely a goal, …
- … but I don’t know whether Oracle will pick them apart and let you put various kinds of hardware in the same racks or not. I’d guess against that, because the current set-up gives them a pretext to sell you more capacity than you need.
- I wonder whether Oracle will finally introduce a true columnar storage option, a year behind Teradata. That would be the obvious enhancement on the data warehousing side, if they can pull it off. If they can’t, it’s a damning commentary on the core Oracle codebase.
- Probably Oracle will have something that it portrays as good multi-tenancy support. Some of that could be based on Label Security and so on.
- Anything that makes schema change easier could be a win on the DBA and multi-tenancy sides alike, which would be a nice two-fer.
|Categories: Clustering, Columnar database management, Data warehouse appliances, Data warehousing, Exadata, Oracle, Teradata||7 Comments|
Chris Kanaracus uncovered a case of Oracle actually pulling an ad after having been found “guilty” of false advertising. The essence seems to be that Oracle claimed 20X hardware performance vs. IBM, based on a comparison done against 6 year old hardware running an earlier version of the Oracle DBMS. My quotes in the article were:
- “Everybody’s guilty of that kind of exaggeration.”
- “Oracle tends to be even a little guiltier than others.”
- “If your new system can’t outperform somebody else’s old system by a huge factor on at least some queries, you’re doing something wrong.”
- “Use newer, better hardware; use newer, better software; have a top sales engineer do a great job of tuning it and of course you’ll see huge performance results.”
Another example of Oracle exaggeration was around the Exadata replacement of Teradata at Softbank. But the bogosity flows both ways. Netezza used to make a flat claim of 50X better performance than Oracle, while Vertica’s standard press release boilerplate long boasted
50x-1000x faster performance at 30% the cost of traditional solutions
Of course, reality is a lot more complicated. Even if you assume apples-to-apples comparisons in terms of hardware and software versions, performance comparisons can vary greatly depending upon queries, databases, or use cases. For example:
- Many queries are inherently much faster over columnar storage than over row-based.
- Different data sets respond very differently to various compression algorithms.
- Some analytic RDBMS can maintain strong performance at high levels of concurrent usage. Some can’t.
- Some queries that run very fast on one DBMS without tuning might require careful tuning in another system.
- Some DBMS scale out much better than others.
- Vendors optimize for different usage assumptions, which may or may not apply in your particular case.
And so, vendor marketing claims about across-the-board performance should be viewed with the utmost of suspicion.
|Categories: Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Exadata, Netezza, Oracle, Vertica Systems||Leave a Comment|
Various reporters have asked me about Oracle’s third quarter 2012 earnings conference call. Specific Q&A includes:
What did Oracle do to have its earnings beat Wall Street’s estimates?
Have a bad second quarter and then set Wall Street’s expectations too low for Q3. This isn’t about strong results; it’s about modest expectations.
Can Oracle be a leader in both hardware and software?
- It’s not inconceivable.
- The observation that Oracle, IBM, and Teradata all are pushing hardware-software combinations has been intriguing ever since IBM bought Netezza. (SAP really isn’t, however; ditto Microsoft.)
- I do think Oracle may be somewhat overoptimistic as to how cooperative the Sun user base will be in buying more high-end product and in paying more in maintenance for the gear they already have.
Beyond that, please see below.
What about Oracle in the cloud?
MySQL is an important cloud supplier. But Oracle overall hasn’t demonstrated much understanding of what cloud technology and business are all about. An expensive SaaS acquisition here or there could indeed help somewhat, but it seems as if Oracle still has a very long way to go.
|Categories: Cloud computing, Exadata, Humor, In-memory DBMS, Oracle, SAP AG, Software as a Service (SaaS)||5 Comments|
I’d like to survey a few related ideas:
- Enterprises should each have a variety of different analytic data stores.
- Vendors — especially but not only IBM and Teradata — are acknowledging and marketing around the point that enterprises should each have a number of different analytic data stores.
- In addition to having multiple analytic data management technology stacks, it is also desirable to have an agile way to spin out multiple virtual or physical relational data marts using a single RDBMS. Vendors are addressing that need.
- Some observers think that the real essence of analytic data management will be in data integration, not the actual data management.
Here goes. Read more
|Categories: Data warehousing, Database diversity, EAI, EII, ETL, ELT, ETLT, Exadata, Greenplum, Hadoop, Hortonworks, IBM and DB2, Informatica, Netezza, Oracle, Sybase, Teradata, Workload management||14 Comments|
This year’s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the 2010, 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying:
- In general, I regard Gartner Magic Quadrants as a bad use of good research.
- Illustrating the uselessness of — or at least poor execution on — the overall quadrant metaphor, a large majority of the vendors covered are lined up near the line x = y, each outpacing the one below in both of the quadrant’s dimensions.
- I find fewer specifics to disagree with in this Gartner Magic Quadrant than in previous year’s versions. Two factors jump to mind as possible reasons:
- This year’s Gartner Magic Quadrant for Data Warehouse Database Management Systems is somewhat less ambitious than others; while it gives as much company detail as its predecessors, it doesn’t add as much discussion of overall trends. So there’s less to (potentially) disagree with.
- Merv Adrian is now at Gartner.
- Whatever the problems may be with Gartner’s approach, the whole thing comes out better than do Forrester’s failed imitations.
*As of February, 2012 — and surely for many months thereafter — Teradata is graciously paying for a link to the report.
Specific company comments, roughly in line with Gartner’s rough single-dimensional rank ordering, include: Read more
Recently, I observed that Big Data terminology is seriously broken. It is reasonable to reduce the subject to two quasi-dimensions:
- Bigness — Volume, Velocity, size
- Structure — Variety, Variability, Complexity
- High-velocity “big data” problems are usually high-volume as well.*
- Variety, variability, and complexity all relate to the simply-structured/poly-structured distinction.
But the conflation should stop there.
*Low-volume/high-velocity problems are commonly referred to as “event processing” and/or “streaming”.
When people claim that bigness and structure are the same issue, they oversimplify into mush. So I think we need four pieces of terminology, reflective of a 2×2 matrix of possibilities. For want of better alternatives, my suggestions are:
- Relational big data is data of high volume that fits well into a relational DBMS.
- Multi-structured big data is data of high volume that doesn’t fit well into a relational DBMS. Alternative: Poly-structured big data.
- Conventional relational data is data of not-so-high volume that fits well into a relational DBMS. Alternatives: Ordinary/normal/smaller relational data.
- Smaller poly-structured data is data for which dynamic schema capabilities are important, but which doesn’t rise to “big data” volume.
|Categories: Cassandra, Data models and architecture, Data warehousing, Exadata, Facebook, Google, Hadoop, HBase, Log analysis, Market share and customer counts, MarkLogic, NewSQL, NoSQL, Oracle, Splunk, Yahoo||10 Comments|
When I drafted a list of key analytics-sector issues in honor of look-ahead season, the first item was “execution of various big vendors’ ambitious initiatives”. By “execute” I mean mainly:
- “Deliver products that really meet customers’ desires and needs.”
- “Successfully convince them that you’re doing so …”
- “… at an attractive overall cost.”
Vendors mentioned here are Oracle, SAP, HP, and IBM. Anybody smaller got left out due to the length of this post. Among the bigger omissions were: