Analysis of Oracle Exadata and the Oracle Database Machine.
Relational DBMS used to be fairly straightforward product suites, which boiled down to:
- A big SQL interpreter.
- A bunch of administrative and operational tools.
- Some very optional add-ons, often including an application development tool.
Now, however, most RDBMS are sold as part of something bigger.
- Oracle has hugely thickened its stack, as part of an Innovator’s Solution strategy — hardware, middleware, applications, business intelligence, and more.
- IBM has moved aggressively to a bundled “appliance” strategy. Even before that, IBM DB2 long sold much better to committed IBM accounts than as a software-only offering.
- Microsoft SQL Server is part of a stack, starting with the Windows operating system.
- Sybase was an exception to this rule, with thin(ner) stacks for both Adaptive Server Enterprise and Sybase IQ. But Sybase is now owned by SAP, and increasingly integrated as a business with …
- … SAP HANA, which is closely associated with SAP’s applications.
- Teradata has always been a hardware/software vendor. The most successful of its analytic DBMS rivals, in some order, are:
  - Netezza, a pure appliance vendor, now part of IBM.
  - Greenplum, an appliance-mainly vendor for most (not all) of its existence, and in particular now as a part of EMC Pivotal.
  - Vertica, more of a software-only vendor than the others, but now owned by and increasingly mainstreamed into hardware vendor HP.
- MySQL’s glory years were as part of the “LAMP” stack.
- Various thin-stack RDBMS that once were or could have been important market players … aren’t. Examples include Progress OpenEdge, IBM Informix, and the various strays adopted by Actian.
I’ll start with three observations:
- Computer systems can’t be entirely tightly coupled — nothing would ever get developed or tested.
- Computer systems can’t be entirely loosely coupled — nothing would ever get optimized, in performance and functionality alike.
- In an ongoing trend, there is and will be dramatic refactoring as to which connections wind up being loose or tight.
As written, that’s probably pretty obvious. Even so, it’s easy to forget just how pervasive the refactoring is and is likely to be. Let’s survey some examples first, and then speculate about consequences.
- The trend to clustered computing is sustainable.
- The trend to appliances is also sustainable.
- The “single” enterprise cluster is almost as much of a pipe dream as the single enterprise database.
I shall explain.
Arguments for hosting applications on some kind of cluster include:
- If the workload requires more than one server — well, you’re in cluster territory!
- If the workload requires less than one server — throw it into the virtualization pool.
- If the workload is uneven — throw it into the virtualization pool.
Arguments specific to the public cloud include:
- A large fraction of new third-party applications are SaaS (Software as a Service). Those naturally live in the cloud.
- Cloud providers have efficiencies that you don’t.
That’s all pretty compelling. However, these are not persuasive reasons to put everything on a SINGLE cluster or cloud. They could as easily lead you to have your VMware cluster and your Exadata rack and your Hadoop cluster and your NoSQL cluster and your object storage OpenStack cluster — among others — all while participating in several different public clouds as well.
Why would you not move work into a cluster at all? First, if it ain’t broken, you might not want to fix it. Some of the cluster options make it easy for you to consolidate existing workloads (that’s a central goal of VMware and Exadata), but others only make sense to adopt in connection with new application projects. Second, you might just want device locality. I have a gaming-class PC next to my desk; it drives a couple of monitors; I like that arrangement. Away from home I carry a laptop computer instead. Arguments can be made for small remote-office servers as well.
Categories: Cloud computing, Clustering, Data warehouse appliances, Exadata, NoSQL, Software as a Service (SaaS)
1. It boggles my mind that some database technology companies still don’t view compression as a major issue. Compression directly affects storage and bandwidth usage alike — for all kinds of storage (potentially including RAM) and for all kinds of bandwidth (network, I/O, and potentially on-server).
Trading off less-than-maximal compression so as to minimize CPU impact can make sense. Having no compression at all, however, is an admission of defeat.
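To make that trade-off concrete, here is a minimal sketch using Python's standard `zlib` module. It compresses the same payload at a fast level and at the maximum level, so the sizes can be compared. The sample rows and the helper name `compress_at_levels` are made up for illustration; real DBMS compression (columnar encodings, dictionary schemes and so on) works quite differently, so treat this only as a picture of the general idea.

```python
import zlib


def compress_at_levels(data, levels=(1, 6, 9)):
    """Compress the same payload at several zlib levels.

    Level 1 favors CPU speed, level 9 favors compression ratio.
    Returns a dict mapping level -> compressed bytes.
    """
    return {level: zlib.compress(data, level) for level in levels}


# Hypothetical, highly repetitive rows, loosely resembling log or fact-table data.
sample = b"".join(
    b"2013-09-%02d,order,%05d,SHIPPED\n" % (day % 28 + 1, day)
    for day in range(5000)
)

results = compress_at_levels(sample)
for level, blob in results.items():
    ratio = len(sample) / len(blob)
    print(f"level {level}: {len(blob)} bytes, ratio {ratio:.1f}x")
```

On data like this, even the cheapest level should give a payload much smaller than the raw bytes, which is the sense in which "some compression" beats "no compression at all".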
2. People tend to misjudge Hadoop’s development pace in either of two directions. An overly expansive view is to note that some people working on Hadoop are trying to make it be all things to all people, and to somehow imagine those goals will soon be achieved. An overly narrow view is to note an important missing feature in Hadoop, and think there’s a big business to be made out of offering it alone.
At this point, I’d guess that Cloudera and Hortonworks have 500ish employees combined, many of whom are engineers. That allows for a low double-digit number of 5+ person engineering teams, along with a number of smaller projects. The most urgently needed features are indeed being built. On the other hand, a complete monument to computing will not soon emerge.
3. Schooner’s acquisition by SanDisk has led to the discontinuation of Schooner’s SQL DBMS SchoonerSQL. Schooner’s flash-optimized key-value store Membrain continues. I don’t have details, but the Membrain web page suggests both data store and cache use cases.
4. There’s considerable personnel movement at Boston-area database technology companies right now. Please ping me directly if you care.
I recently complained that the Gartner Magic Quadrant for Data Warehouse DBMS conflates many use cases into one set of rankings. So perhaps now would be a good time to offer some thoughts on how to tell use cases apart. Assuming you know that you really want to manage your analytic database with a relational DBMS, the first questions you ask yourself could be:
- How big is your database? How big is your budget?
- How do you feel about appliances?
- How do you feel about the cloud?
- What are the size and shape of your workload?
- How fresh does the data need to be?
Let’s drill down.
I’m not at Oracle OpenWorld, but as usual that won’t keep me from commenting. My bottom line on the first night’s announcements is:
- At many large enterprises, Oracle has a lock on much of their IT efforts. (But not necessarily in the internet or investigative analytics areas.) Tonight’s announcements serve to strengthen that.
- Tonight’s announcements do little to help Oracle in other market segments.
1. At the highest level, my view of Oracle’s strategy is the same as it’s been for several years:
Clayton Christensen’s The Innovator’s Solution teaches us that Oracle should focus on selling a thick stack of technology to its highest-end customers, and that’s exactly what Oracle does focus on.
2. Tonight’s news is closely in line with what Oracle’s Juan Loaiza told me three years ago, especially:
- Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.
- Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.
3. Oracle is confusing people with its comments on multi-tenancy. I suspect:
- What Oracle is talking about when it says “multi-tenancy” is more like consolidation than true multi-tenancy.
- Probably there are a couple of true multi-tenancy features as well.
4. SaaS (Software as a Service) vendors don’t want to use Oracle, because they don’t want to pay for it.* This limits the potential impact of Oracle’s true multi-tenancy features. Even so: …
Categories: Business intelligence, Cloud computing, Columnar database management, Data warehouse appliances, Data warehousing, Exadata, Memory-centric data management, Oracle, Software as a Service (SaaS), Solid-state memory, Storage
What are the central challenges in internet system design? We probably all have similar lists, comprising issues such as scale, scale-out, throughput, availability, security, programming ease, UI, or general cost-effectiveness. Screw those up, and you don’t have an internet business.
Much new technology addresses those challenges, with considerable success. But the success is usually one silo at a time — a short-request application here, an analytic database there. When it comes to integration, unsolved problems abound.
The top integration and integration-like challenges for me, from a practical standpoint, are:
- Integrating silos — a decades-old problem still with us in a big way.
- Dynamic schemas with joins.
- Low-latency business intelligence.
- Human real-time personalization.
Other concerns that get mentioned include:
- Geographical distribution due to privacy laws, which for some users is a major requirement for compliance.
- Logical data warehouse, a term that doesn’t actually mean anything real.
- In-memory data grids, which some day may no longer always be hand-coupled to the application and data stacks they accelerate.
Let’s skip those latter issues for now, focusing instead on the first four.
This is a draft entry for the DBMS2 analytic glossary. Please comment with any ideas you have for its improvement!
Note: Words and phrases in italics will be linked to other entries when the glossary is complete.
A data warehouse appliance is a combination of hardware and software that includes an analytic DBMS (DataBase Management System). However, some observers incorrectly apply the term “data warehouse appliance” to any analytic DBMS.
The paradigmatic vendors of data warehouse appliances are:
- Teradata, which embraced the term “data warehouse appliance” in 2008.
- Netezza — now an IBM company — which popularized the term “data warehouse appliance” in the 2000s.
Further, vendors of analytic DBMS commonly offer — directly or through partnerships — optional data warehouse appliance configurations; examples include:
- Greenplum, now part of EMC.
- Vertica, now an HP company.
- IBM DB2, under the brand “Smart Analytic System”.
- Microsoft (Parallel Data Warehouse).
Oracle Exadata is sometimes regarded as a data warehouse appliance as well, despite not being solely focused on analytic use cases.
Data warehouse appliances inherit marketing claims from the category of analytic DBMS, such as: …
Categories: Analytic glossary, Data warehouse appliances, Data warehousing, EMC, Exadata, Greenplum, HP and Neoview, IBM and DB2, Microsoft and SQL*Server, Netezza, Oracle, Teradata
In a call Monday with a prominent company, I was told:
- Teradata, Netezza, Greenplum and Vertica aren’t relational.
- Teradata, Netezza, Greenplum and Vertica are all data warehouse appliances.
That, to put it mildly, is not accurate. So I shall try, yet again, to set the record straight.
In an industry where people often call a DBMS just a “database” — so that a database is something that manages a database! — one may wonder why I bother. Anyhow …
1. The products commonly known as Oracle, Exadata, DB2, Sybase, SQL Server, Teradata, Sybase IQ, Netezza, Vertica, Greenplum, Aster, Infobright, SAND, ParAccel, Exasol, Kognitio et al. all either are or incorporate relational database management systems, aka RDBMS or relational DBMS.
2. In principle, there can be difficulties in judging whether or not a DBMS is “relational”. In practice, those difficulties don’t arise — yet. Every significant DBMS still falls into one of two categories:
- Relational:
  - Was designed to do relational stuff* from the get-go, even if it now does other things too.
  - Supports a lot of SQL.
- Non-relational:
  - Was designed primarily to do non-relational things.*
  - Doesn’t support all that much SQL.
*I expect the distinction to get more confusing soon, at which point I’ll adopt terms more precise than “relational things” and “relational stuff”.
3. There are two chief kinds of relational DBMS: …
A reporter asked me to speculate about the next releases of Oracle and Exadata. He and I agreed:
- It seems likely that they’ll be discussed at Oracle OpenWorld in a couple of months.
- Exadata in particular is due for a hardware refresh.
- Oracle12c is a good guess at a name, where “C” is for “Cloud”.
My answers mixed together thoughts on what Oracle should and will emphasize (which aren’t the same thing but hopefully bear some relationship to each other). They were (lightly edited):
- The worst thing about Oracle is the ongoing DBA work for what should be automatic.
- Oracle RAC still makes scale-out too difficult. Presumably, Oracle is looking to build aggressively on recent steps in automating parallelism.
- For Exadata, assume that Oracle is always looking to improve how data gets allocated among disk, flash, and RAM. Look also for Exadata versions with different silicon-disk ratios than are available now.
- Tighter integration among the various appliances is surely a goal, …
- … but I don’t know whether Oracle will pick them apart and let you put various kinds of hardware in the same racks or not. I’d guess against that, because the current set-up gives them a pretext to sell you more capacity than you need.
- I wonder whether Oracle will finally introduce a true columnar storage option, a year behind Teradata. That would be the obvious enhancement on the data warehousing side, if they can pull it off. If they can’t, it’s a damning commentary on the core Oracle codebase.
- Probably Oracle will have something that it portrays as good multi-tenancy support. Some of that could be based on Label Security and so on.
- Anything that makes schema change easier could be a win on the DBA and multi-tenancy sides alike, which would be a nice two-fer.
Categories: Clustering, Columnar database management, Data warehouse appliances, Data warehousing, Exadata, Oracle, Teradata