Discussion of choices and variety in database management system architecture. Related subjects include:
When I find myself making the same observation fairly frequently, that’s a good impetus to write a post based on it. And so this post is based on the thought that there are many analogies between:
- Oracle and the Oracle DBMS.
- IBM and the IBM mainframe.
And when you look at things that way, Oracle seems to be swimming against the tide.
Drilling down, there are basically three things that can seriously threaten Oracle’s market position:
- Growth in apps of the sort for which Oracle’s RDBMS is not well-suited. Much of “Big Data” fits that description.
- Outright, widespread replacement of Oracle’s application suites. This is the least of Oracle’s concerns at the moment, but could of course be a disaster in the long term.
- Transition to “the cloud”. This trend amplifies the other two.
Oracle’s decline, if any, will be slow — but I think it has begun.
There’s a clear market lead in the core product category. IBM was dominant in mainframe computing. While not as dominant, Oracle is definitely a strong leader in high-end OTLP/mixed-use (OnLine Transaction Processing) RDBMS.
That market lead is even greater than it looks, because some of the strongest competitors deserve asterisks. Many of IBM’s mainframe competitors were “national champions” — Fujitsu and Hitachi in Japan, Bull in France and so on. Those were probably stronger competitors to IBM than the classic BUNCH companies (Burroughs, Univac, NCR, Control Data, Honeywell).
Similarly, Oracle’s strongest direct competitors are IBM DB2 and Microsoft SQL Server, each of which is sold primarily to customers loyal to the respective vendors’ full stacks. SAP is now trying to play a similar game.
The core product is stable, secure, richly featured, and generally very mature. Duh.
The core product is complicated to administer — which provides great job security for administrators. IBM had JCL (Job Control Language). Oracle has a whole lot of manual work overseeing indexes. In each case, there are many further examples of the point. Edit: A Twitter discussion suggests the specific issue with indexes has been long fixed.
Niche products can actually be more reliable than the big, super-complicated leader. Tandem Nonstop computers were super-reliable. Simple, “embeddable” RDBMS — e.g. Progress or SQL Anywhere — in many cases just work. Still, if you want one system to run most of your workload 24×7, it’s natural to choose the category leader. Read more
|Categories: Cloud computing, Database diversity, Exadata, IBM and DB2, Market share and customer counts, Microsoft and SQL*Server, NoSQL, Oracle, Software as a Service (SaaS)||25 Comments|
Mike Stonebraker and Larry Ellison have numerous things in common. If nothing else:
- They’re both titanic figures in the database industry.
- They both gave me testimonials on the home page of my business website.
- They both have been known to use the present tense when the future tense would be more accurate.
I mention the latter because there’s a new edition of Readings in Database Systems, aka the Red Book, available online, courtesy of Mike, Joe Hellerstein and Peter Bailis. Besides the recommended-reading academic papers themselves, there are 12 survey articles by the editors, and an occasional response where, for example, editors disagree. Whether or not one chooses to tackle the papers themselves — and I in fact have not dived into them — the commentary is of great interest.
But I would not take every word as the gospel truth, especially when academics describe what they see as commercial market realities. In particular, as per my quip in the first paragraph, the data warehouse market has not yet gone to the extremes that Mike suggests,* if indeed it ever will. And while Joe is close to correct when he says that the company Essbase was acquired by Oracle, what actually happened is that Arbor Software, which made Essbase, merged with Hyperion Software, and the latter was eventually indeed bought by the giant of Redwood Shores.**
*When it comes to data warehouse market assessment, Mike seems to often be ahead of the trend.
**Let me interrupt my tweaking of very smart people to confess that my own commentary on the Oracle/Hyperion deal was not, in retrospect, especially prescient.
Mike pretty much opened the discussion with a blistering attack against hierarchical data models such as JSON or XML. To a first approximation, his views might be summarized as: Read more
In the previous post I broke product differentiation into 6-8 overlapping categories, which may be abbreviated as:
- (Other) trustworthiness
- User experience
and sometimes also issues in adoption and administration.
Now let’s use this framework to examine two market categories I cover — data management and, in separate post, business intelligence.
Applying this taxonomy to data management:
|Categories: Buying processes, Clustering, Data warehousing, Database diversity, Microsoft and SQL*Server, Predictive modeling and advanced analytics, Pricing||2 Comments|
1. The rise of SAP (and later Siebel Systems) was greatly helped by Anderson Consulting, even before it was split off from the accounting firm and renamed as Accenture. My main contact in that group was Rob Kelley, but it’s possible that Brian Sommer was even more central to the industry-watching part of the operation. Brian is still around, and he just leveled a blast at the ERP* industry, which I encourage you to read. I agree with most of it.
*Enterprise Resource Planning
Brian’s argument, as I interpret it, boils down mainly to two points:
- Big ERP companies selling big ERP systems are pathetically slow at adding new functionality. He’s right. My favorite example is the multi-decade slog to integrate useful analytics into operational apps.
- The world of “Big Data” is fundamentally antithetical to the design of current-generation ERP systems. I think he’s right in that as well.
I’d add that SaaS (Software As A Service)/on-premises tensions aren’t helping incumbent vendors either.
But no article addresses all the subjects it ideally should, and I’d like to call out two omissions. First, what Brian said is in many cases applicable just to large and/or internet-first companies. Plenty of smaller, more traditional businesses could get by just fine with no more functionality than is in “Big ERP” today, if we stipulate that it should be:
- Delivered via SaaS.
- Much easier to adopt and use.
- Multi-model database management has been around for decades. Marketers who say otherwise are being ridiculous.
- Thus, “multi-model”-centric marketing is the last refuge of the incompetent. Vendors who say “We have a great DBMS, and by the way it’s multi-model (now/too)” are being smart. Vendors who say “You need a multi-model DBMS, and that’s the reason you should buy from us” are being pathetic.
- Multi-logical-model data management and multi-latency-assumption data management are greatly intertwined.
Before supporting my claims directly, let me note that this is one of those posts that grew out of a Twitter conversation. The first round went:
Merv Adrian: 2 kinds of multimodel from DBMS vendors: multi-model DBMSs and multimodel portfolios. The latter create more complexity, not less.
Me: “Owned by the same vendor” does not imply “well integrated”. Indeed, not a single example is coming to mind.
Merv: We are clearly in violent agreement on that one.
Around the same time I suggested that Intersystems Cache’ was the last significant object-oriented DBMS, only to get the pushback that they were “multi-model” as well. That led to some reasonable-sounding justification — although the buzzwords of course aren’t from me — namely: Read more
|Categories: Data models and architecture, Database diversity, Databricks, Spark and BDAS, Intersystems and Cache', MOLAP, Object, Streaming and complex event processing (CEP)||3 Comments|
I talked with a couple of Cloudera folks about HBase last week. Let me frame things by saying:
- The closest thing to an HBase company, ala MongoDB/MongoDB or DataStax/Cassandra, is Cloudera.
- Cloudera still uses a figure of 20% of its customers being HBase-centric.
- HBaseCon and so on notwithstanding, that figure isn’t really reflected in Cloudera’s marketing efforts. Cloudera’s marketing commitment to HBase has never risen to nearly the level of MongoDB’s or DataStax’s push behind their respective core products.
- With Cloudera’s move to “zero/one/many” pricing, Cloudera salespeople have little incentive to push HBase hard to accounts other than HBase-first buyers.
- Cloudera no longer dominates HBase development, if it ever did.
- Cloudera is the single biggest contributor to HBase, by its count, but doesn’t make a majority of the contributions on its own.
- Cloudera sees Hortonworks as having become a strong HBase contributor.
- Intel is also a strong contributor, as are end user organizations such as Chinese telcos. Not coincidentally, Intel was a major Hadoop provider in China before the Intel/Cloudera deal.
- As far as Cloudera is concerned, HBase is just one data storage technology of several, focused on high-volume, high-concurrency, low-latency short-request processing. Cloudera thinks this is OK because of HBase’s strong integration with the rest of the Hadoop stack.
- Others who may be inclined to disagree are in several cases doing projects on top of HBase to extend its reach. (In particular, please see the discussion below about Apache Phoenix and Trafodion, both of which want to offer relational-like functionality.)
|Categories: Cloudera, Clustering, Data models and architecture, Database diversity, Hadoop, HBase, Hortonworks, HP and Neoview, Intel, Market share and customer counts, NoSQL, Open source||4 Comments|
7-10 years ago, I repeatedly argued the viewpoints:
- Relational DBMS were the right choice in most cases.
- Multiple kinds of relational DBMS were needed, optimized for different kinds of use case.
- There were a variety of specialized use cases in which non-relational data models were best.
Since then, however:
- Hadoop has flourished.
- NoSQL has flourished.
- Graph DBMS have matured somewhat.
- Much of the action has shifted to machine-generated data, of which there are many kinds.
So it’s probably best to revisit all that in a somewhat organized way.
I’m commonly asked to assess vendor claims of the kind:
- “Our system lets you do multiple kinds of processing against one database.”
- “Otherwise you’d need two or more data managers to get the job done, which would be a catastrophe of unthinkable proportion.”
So I thought it might be useful to quickly review some of the many ways organizations put multiple data stores to work. As usual, my bottom line is:
- The most extreme vendor marketing claims are false.
- There are many different choices that make sense in at least some use cases each.
Horses for courses
It’s now widely accepted that different data managers are better for different use cases, based on distinctions such as:
- Short-request vs. analytic.
- SQL vs. non-SQL (NoSQL or otherwise).
- Expensive/heavy-duty vs. cheap/easy-to-support.
Vendors are part of this consensus; already in 2005 I observed
For all practical purposes, there are no DBMS vendors left advocating single-server strategies.
Vendor agreement has become even stronger in the interim, as evidenced by Oracle/MySQL, IBM/Netezza, Oracle’s NoSQL dabblings, and various companies’ Hadoop offerings.
Multiple data stores for a single application
We commonly think of one data manager managing one or more databases, each in support of one or more applications. But the other way around works too; it’s normal for a single application to invoke multiple data stores. Indeed, all but the strictest relational bigots would likely agree: Read more
The name of this blog comes from an August, 2005 column. 8 1/2 years later, that analysis holds up pretty well. Indeed, I’d keep the first two precepts exactly as I proposed back then:
- Task-appropriate data managers. Much of this blog is about task-appropriate data stores, so I won’t say more about them in this post.
- Drastic limitations on relational schema complexity. I think I’ve been vindicated on that one by, for example:
- NoSQL and dynamic schemas.
- Schema-on-read, and its smarter younger brother schema-on-need.
- Limitations on the performance and/or allowed functionality of joins in scale-out short-request RDBMS, and the relative lack of complaints about same.
- Funky database design from major Software as a Service (SaaS) vendors such as Workday and Salesforce.com.
- A whole lot of logs.
I’d also keep the general sense of the third precept, namely appropriately-capable data integration, but for that one the specifics do need some serious rework.
For starters, let me say: Read more
|Categories: About this blog, Business intelligence, Database diversity, EAI, EII, ETL, ELT, ETLT, Investment research and trading, NoSQL, Schema on need||2 Comments|
Two subjects in one post, because they were too hard to separate from each other
Any sufficiently complex software is developed in modules and subsystems. DBMS are no exception; the core trinity of parser, optimizer/planner, and execution engine merely starts the discussion. But increasingly, database technology is layered in a more fundamental way as well, to the extent that different parts of what would seem to be an integrated DBMS can sometimes be developed by separate vendors.
Major examples of this trend — where by “major” I mean “spanning a lot of different vendors or projects” — include:
- The object/relational, aka universal, extensibility features developed in the 1990s for Oracle, DB2, Informix, Illustra, and Postgres. The most successful extensions probably have been:
- Geospatial indexing via ESRI.
- Full-text indexing, notwithstanding questionable features and performance.
- MySQL storage engines.
- MPP (Massively Parallel Processing) analytic RDBMS relying on single-node PostgreSQL, Ingres, and/or Microsoft SQL Server — e.g. Greenplum (especially early on), Aster (ditto), DATAllegro, DATAllegro’s offspring Microsoft PDW (Parallel Data Warehouse), or Hadapt.
- Splits in which a DBMS has serious processing both in a “database” layer and in a predicate-pushdown “storage” layer — most famously Oracle Exadata, but also MarkLogic, InfiniDB, and others.
- SQL-on-HDFS — Hive, Impala, Stinger, Shark and so on (including Hadapt).
Other examples on my mind include:
- Data manipulation APIs being added to key-value stores such as Couchbase and Aerospike.
- TokuMX, the Tokutek/MongoDB hybrid I just blogged about.
- NuoDB’s willing reliance on third-party key-value stores (or HDFS in the role of one).
- FoundationDB’s strategy, and specifically its acquisition of Akiban.
And there are several others I hope to blog about soon, e.g. current-day PostgreSQL.
In an overlapping trend, DBMS increasingly have multiple data manipulation APIs. Examples include: Read more