Open source
Discussion of relational database management systems that are offered through some version of open source licensing. Related subjects include:
Thinking about market segments
It is a reasonable (over)simplification to say that my business boils down to:
- Advising vendors what/how to sell.
- Advising users what/how to buy.
One complication that commonly creeps in is that different groups of users have different buying practices and technology needs. Usually, I nod to that point in passing, perhaps by listing different application areas for a company or product. But now let’s address it head on. Whether or not you care about the particulars, I hope the sheer length of this post reminds you that there are many different market segments out there.
Last June I wrote:
In almost any IT decision, there are a number of environmental constraints that need to be acknowledged. Organizations may have standard vendors, favored vendors, or simply vendors who give them particularly deep discounts. Legacy systems are in place, application and system alike, and may or may not be open to replacement. Enterprises may have on-premise or off-premise preferences; SaaS (Software as a Service) vendors probably have multitenancy concerns. Your organization can determine which aspects of your system you’d ideally like to see be tightly integrated with each other, and which you’d prefer to keep only loosely coupled. You may have biases for or against open-source software. You may be pro- or anti-appliance. Some applications have a substantial need for elastic scaling. And some kinds of issues cut across multiple areas, such as budget, timeframe, security, or trained personnel.
I’d further say that it matters whether the buyer:
- Is a large central IT organization.
- Is the well-staffed IT organization of a particular business department.
- Is a small, frazzled IT organization.
- Has strong engineering or technical skills, but less in the way of IT specialists.
- Is trying to skate by without much technical knowledge of any kind.
Now let’s map those considerations (and others) to some specific market segments. Read more
DataStax Enterprise and Cassandra revisited
My last post about DataStax Enterprise and Cassandra didn’t go so well. As follow-up, I chatted for two hours with Rick Branson and Billy Bosworth of DataStax. Hopefully I can do better this time around.
For starters, let me say there are three kinds of data management nodes in DataStax Enterprise:
- Vanilla Cassandra.
- Cassandra plus Solr. Solr is a superset of the text-indexing system Lucene.
- Solr adds a lot more secondary indexing to Cassandra.
- In addition, these nodes serve as Solr emulation; you can run generic Solr apps on them.
- Cassandra plus Hadoop.
- You can use Hadoop MapReduce to manipulate generic Cassandra data.
- In addition, these nodes serve as Hadoop/HDFS (Hadoop Distributed File System) emulation; you can run generic Hadoop apps on them.
- Hadoop jobs can interweave access to the two kinds of data structure.
Cassandra, Solr, Lucene, and Hadoop are all Apache projects.
If we look at this from the standpoint of DML (Data Manipulation Language) and data access APIs:
- Cassandra is a column-group kind of NoSQL DBMS. You can get at its data programmatically.
- There’s something called CQL (Cassandra Query Language), said to be SQL-like.
- There’s a JDBC driver for CQL.
- With Hadoop MapReduce also come Hive, Pig, and Sqoop.
- With Solr and Lucene come full-text search.
In addition, it is sometimes recommended that you use “in-entity caching”, where an entire data structure (e.g. in JSON) winds up in a single Cassandra column.
The two main ways to get direct SQL* access to data in DataStax Enterprise are:
- JDBC/SQL.
- Hive/Hadoop.
*or very SQL-like, depending on how you view things
Before going further, let’s recall some Cassandra basics: Read more
| Categories: Cassandra, DataStax, Hadoop, MapReduce, Market share and customer counts, NoSQL, Open source, Text | 5 Comments |
The 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms — company-by-company comments
This is one of a series of posts on business intelligence and related analytic technology subjects, keying off the 2011/2012 version of the Gartner Magic Quadrant for Business Intelligence Platforms. The four posts in the series cover:
- Overview comments about the 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms, as well as a link to the actual document.
- Business intelligence industry trends — some of Gartner’s thoughts but mainly my own.
- (This post) Company-by-company comments based on the 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms.
- Third-party analytics, pulling together and expanding on some points I made in the first three posts.
The heart of Gartner Group’s 2011/2012 Magic Quadrant for Business Intelligence Platforms was the company comments. I shall expound upon some, roughly in declining order of Gartner’s “Completeness of Vision” scores, dubious though those rankings may be. Read more
Quick notes on MySQL Cluster
According to the MySQL Cluster home page, today’s MySQL Cluster release has — give or take terminology details
– added transparent sharding (Edit: Actually, please see the first comment below) and a memcached interface. My quick comments on all this to a reporter a couple of days ago were:
- Persistent memcached is a useful thing. Couchbase’s sales illustrate that point: http://www.dbms2.com/2012/02/01/couchbase-update/
- MySQL has always given good performance when used just as a key-value store, e.g. http://www.dbms2.com/2010/08/22/workday-technology-stack/ . So it’s reasonable to hope the memcached interface will have good performance out of the box.
- MySQL’s clustering capabilities have long been weak, providing a window of opportunity for companies and products such as Schooner Information and dbShards. The gold standard for clustering is:
- Efficient transparent sharding: http://www.dbms2.com/2011/02/24/transparent-sharding/
- Synchronous replication at much better than two-phase-commit speeds. http://www.dbms2.com/2011/10/23/schooner-pivots-further/
I don’t really know enough about MySQL Cluster right now to comment in more detail.
| Categories: Clustering, memcached, MySQL, NoSQL, OLTP, Open source | 2 Comments |
Comments on the analytic DBMS industry and Gartner’s Magic Quadrant for same
This year’s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the 2010, 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying:
- In general, I regard Gartner Magic Quadrants as a bad use of good research.
- Illustrating the uselessness of — or at least poor execution on — the overall quadrant metaphor, a large majority of the vendors covered are lined up near the line x = y, each outpacing the one below in both of the quadrant’s dimensions.
- I find fewer specifics to disagree with in this Gartner Magic Quadrant than in previous year’s versions. Two factors jump to mind as possible reasons:
- This year’s Gartner Magic Quadrant for Data Warehouse Database Management Systems is somewhat less ambitious than others; while it gives as much company detail as its predecessors, it doesn’t add as much discussion of overall trends. So there’s less to (potentially) disagree with.
- Merv Adrian is now at Gartner.
- Whatever the problems may be with Gartner’s approach, the whole thing comes out better than do Forrester’s failed imitations.
*As of February, 2012 — and surely for many months thereafter — Teradata is graciously paying for a link to the report.
Specific company comments, roughly in line with Gartner’s rough single-dimensional rank ordering, include: Read more
Hadoop-related market categorization
I wasn’t the only one to be dubious about Forrester Research’s Hadoop taxonomy (or lack thereof). GigaOm’s Derrick Harris was as well, and offered a much superior approach of his own. In Derrick’s view, there’s Hadoop, Hadoop distributions, Hadoop management, and Hadoop applications. Taking those out of order, and recalling that no market categorization is ever precise:
- “Hadoop applications” is a catch-all category. Since Derrick offered suitable caveats around the label, I’m fine with what he said.
- Hadoop management software commonly comes in the form of suites. Derrick’s discussion was solid.
- Derrick seems to want to define “Hadoop” as being whatever is in the relevant Apache projects. Cool. He does seem to wind up on both sides of the “MapR and DataStax put Hadoop MapReduce on top of something that isn’t HDFS — so is that Hadoop or isn’t it?” question, but that’s a tough ambiguity to avoid.
- Derrick could have been a little clearer on the subject of Hadoop distributions.
Let’s drill down into that last one. Derrick refers to Hadoop distributions as “products” that:
package a set of Hadoop projects (MapReduce, Hive, Sqoop, Pig, etc.) in a way that in theory makes them integrate more naturally, and to run both smoothly and securely.
While that’s a reasonable recitation of the idea’s benefits, I’d rather say that a “distribution” of open source software comprises: Read more
| Categories: Cloudera, Hadoop, MapReduce, Open source | Leave a Comment |
Couchbase update
I checked in with James Phillips for a Couchbase update, and I understand better what’s going on. In particular:
- Give or take minor tweaks, what I wrote in my August, 2010 Couchbase updates still applies.
- Couchbase now and for the foreseeable future has one product line, called Couchbase.
- Couchbase 2.0, the first version of Couchbase (the product) to use CouchDB for persistence, has slipped …
- … because more parts of CouchDB had to be rewritten for performance than Couchbase (the company) had hoped.
- Think mid-year or so for the release of Couchbase 2.0, hopefully sooner.
- In connection with the need to rewrite parts of CouchDB, Couchbase has:
- Gotten out of the single-server CouchDB business.
- Donated its proprietary single-sever CouchDB intellectual property to the Apache Foundation.
- The 150ish new customers in 2011 Couchbase brags about are real, subscription customers.
- Couchbase has 60ish people, headed to >100 over the next few months.
| Categories: Basho and Riak, Cassandra, Couchbase, CouchDB, DataStax, Market share and customer counts, MongoDB and 10gen, NoSQL, Open source, Parallelization, Web analytics, Zynga | 5 Comments |
Notes from the Couch blogs
Couchbase in general, and CouchDB project founder Damien Katz in particular, are to some extent walking away from CouchDB. That is:
- The Couchbase product will not be upward compatible with CouchDB.
- Couchbase will no longer offer a CouchDB distribution, and is doing the natural and responsible thing, namely …
- … donating to the Apache Foundation the previously proprietary aspects of that distribution.
Even so:
- All — or at least “all” — the code Couchbase offers will, at least for now, be open source.
The story unfolded in a bombshell post by Damien, and clarification follow-ups by Damien and by Couchbase CEO Bob Wiederhold. The meatiest of the three was probably Damien’s follow-up, in which he said, among other things:
Read more
| Categories: Couchbase, CouchDB, Market share and customer counts, Open source | 1 Comment |
Hope for a new PostgreSQL era?
In a comedy of briefing errors, I’m not too clear on the details of my client salesforce.com’s new PostgreSQL-as-a-service offering, nor exactly on what my clients at VMware are bringing to the PostgreSQL virtualization/cloud party. That said:
- PostgreSQL is good technology.
- MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL in some ways. (Database extensibility if nothing else.)
- PostgreSQL has a lot of users. (Many of them in academia and/or Russia.)
- Neither EnterpriseDB (which now calls itself “The enterprise PostgreSQL company”) nor the PostgreSQL community leadership have covered themselves with stewardship glory.
- A significant number of interesting DBMS products can be regarded as PostgreSQL forks (e.g. Greenplum, Aster Data nCluster, Netezza if you squint, and Vertica if you stand on your head*).
- PostgreSQL advancement is not dead. For example, Hadapt beta users are running actual PostgreSQL on many nodes each.
- There’s no assurance that Oracle will be a benevolent MySQL steward forever. (Specifically, Oracle’s “Play nicely with others” antitrust commitments expire in 2014.)
So I think it would be cool if one or the other big company put significant wood behind the PostgreSQL arrow.
*While Vertica was originally released using little or no PostgreSQL code — reports varied — it featured high degrees of PostgreSQL compatibility.
| Categories: Aster Data, EnterpriseDB and Postgres Plus, Greenplum, MySQL, Netezza, Open source, salesforce.com, Vertica Systems | 7 Comments |
NoSQL notes
Last week I visited with James Phillips of Couchbase, Max Schireson and Eliot Horowitz of 10gen, and Todd Lipcon, Eric Sammer, and Omer Trajman of Cloudera. I guess it’s time for a round-up NoSQL post.
Views of the NoSQL market horse race are reasonably consistent, with perhaps some elements of “Where you stand depends upon where you sit.”
- As James tells it, NoSQL is simply a three-horse race between Couchbase, MongoDB, and Cassandra.
- Max would include HBase on the list.
- Further, Max pointed out that metrics such as job listings suggest MongoDB has the most development activity, and Couchbase/Membase/CouchDB perhaps have less.
- The Cloudera guys remarked on some serious HBase adopters.*
- Everybody I spoke with agreed that Riak had little current market presence, although some Basho guys could surely be found who’d disagree.
