The pessimist thinks the glass is half-empty.
The optimist thinks the glass is half-full.
The engineer thinks the glass was poorly designed.
Most of what I wrote in Part 1 of this post was already true 15 years ago. But much gets added in the modern era, considering that:
- Clusters will have node hiccups more often than single nodes will. (Duh.)
- Networks are relatively slow even when uncongested, and furthermore congest unpredictably.
- In many applications, it’s OK to sacrifice even basic-seeming database functionality.
And so there’s been innovation in numerous cluster-related subjects, two of which are:
- Distributed query and update. When a database is distributed among many modes, how does a request access multiple nodes at once?
- Fault-tolerance in long-running jobs.When a job is expected to run on many nodes for a long time, how can it deal with failures or slowdowns, other than through the distressing alternatives:
- Start over from the beginning?
- Keep (a lot of) the whole cluster’s resources tied up, waiting for things to be set right?
Distributed database consistency
When a distributed database lives up to the same consistency standards as a single-node one, distributed query is straightforward. Performance may be an issue, however, which is why we have seen a lot of:
- Analytic RDBMS innovation.
- Short-request applications designed to avoid distributed joins.
- Short-request clustered RDBMS that don’t allow fully-general distributed joins in the first place.
But in workloads with low-latency writes, living up to those standards is hard. The 1980s approach to distributed writing was two-phase commit (2PC), which may be summarized as: Read more
|Categories: Clustering, CouchDB, Data warehousing, Databricks, Spark and BDAS, Facebook, Hadoop, MapReduce, Sybase, Theory and architecture, VoltDB and H-Store||1 Comment|
Cloudant is one of the few NoSQL companies with >100 paying subscription customers. For starters:
- Cloudant’s core software is a fork of CouchDB.
- Cloudant only sells you software as a service.
- More precisely, whether Cloudant offers DBaaS (DataBase as a Service) or PaaS (Platform as a Service) or a “data layer” (Cloudant’s preferred terminology) depends on your taste in buzzwords.
- I gather that Cloudant (the company) wants to handle pretty much all your data management needs. But Cloudant (the product) isn’t there yet, especially on the analytic side.
- Before CouchDB and Membase joined together, Cloudant was positioned as the big(ger) data version of CouchDB.
Company demographics include:
- Cloudant is based in Boston.
- Cloudant started out as a Y Combinator company in 2008, and “got serious” in 2009.
- Cloudant now has ~20 employees.
- Management hires include a couple of former Vertica guys.
The Cloudant guys gave me some customer counts in May that weren’t much higher than those they gave me in February, and seem to have forgotten to correct the discrepancy. Oh well. The latter (probably understated) figures included ~160 paying customers, of which:
- ~100 were multitenant.
- ~60 were single tenant.
- 1 was on-premise (but still managed by Cloudant) because of privacy concerns.
The largest Cloudant deployments seem to be in the 10s of terabytes, across a very low double digit number of servers.
|Categories: Cloudant, Clustering, Couchbase, CouchDB, MapReduce, Market share and customer counts, NoSQL, Pricing, Specific users, Storage||2 Comments|
I checked in with James Phillips for a Couchbase update, and I understand better what’s going on. In particular:
- Give or take minor tweaks, what I wrote in my August, 2010 Couchbase updates still applies.
- Couchbase now and for the foreseeable future has one product line, called Couchbase.
- Couchbase 2.0, the first version of Couchbase (the product) to use CouchDB for persistence, has slipped …
- … because more parts of CouchDB had to be rewritten for performance than Couchbase (the company) had hoped.
- Think mid-year or so for the release of Couchbase 2.0, hopefully sooner.
- In connection with the need to rewrite parts of CouchDB, Couchbase has:
- Gotten out of the single-server CouchDB business.
- Donated its proprietary single-sever CouchDB intellectual property to the Apache Foundation.
- The 150ish new customers in 2011 Couchbase brags about are real, subscription customers.
- Couchbase has 60ish people, headed to >100 over the next few months.
|Categories: Basho and Riak, Cassandra, Couchbase, CouchDB, DataStax, Market share and customer counts, MongoDB, NoSQL, Open source, Parallelization, Web analytics, Zynga||6 Comments|
Couchbase in general, and CouchDB project founder Damien Katz in particular, are to some extent walking away from CouchDB. That is:
- The Couchbase product will not be upward compatible with CouchDB.
- Couchbase will no longer offer a CouchDB distribution, and is doing the natural and responsible thing, namely …
- … donating to the Apache Foundation the previously proprietary aspects of that distribution.
- All — or at least “all” — the code Couchbase offers will, at least for now, be open source.
The story unfolded in a bombshell post by Damien, and clarification follow-ups by Damien and by Couchbase CEO Bob Wiederhold. The meatiest of the three was probably Damien’s follow-up, in which he said, among other things:
I decided I needed some Couchbase drilldown, on business and technology alike, so I had solid chats with both CEO Bob Wiederhold and Chief Architect Dustin Sallings. Pretty much everything I wrote at the time Membase and CouchOne merged to form Couchbase (the company) still holds up. But I have more detail now.
Context for any comments on customer traction includes:
- Membase went into limited production release in October, and full release in January. Similar things are true of CouchDB.
- Hence, most sales of Couchbase’s products have been made over the past 6 months.
- Couchbase (the merged product) is at this point only in a pre-production developer’s release.
- Couchbase has both a direct sales force and a classic open-source “funnel”-based online selling model. Naturally, Couchbase’s understanding of what its customers are doing is more solid with respect to the direct sales base.
- Most of Couchbase’s revenue to date seems to have come from a limited number of big-ticket “lighthouse” accounts (as opposed to, say, the larger number of smaller deals that come in through the online funnel).
- Most Membase purchases are for new applications, as opposed to memcached migrations. However, customers are the kinds of companies that probably also are using memcached elsewhere.
- Most other Membase purchases are replacements for the Membase/MySQL combination. Bob says those are easy sales with short sales cycles.
- Pure memcached support is a small but non-zero business for Couchbase, and a fine source of upsell opportunities.
- In the pipeline but not so much yet in the customer base are SaaS vendors and the like who use and may want to replace traditional DBMS such as Oracle. Other than among those, Couchbase doesn’t compete much yet with Oracle et al.
- Pure CouchDB isn’t all that much of a business, at least relative to community size, as CouchDB is a single-server product commonly used by people who are content not to pay for support.
Membase sales are concentrated in five kinds of internet-centric companies, which in declining order are: Read more
Membase, the company whose product is Membase and whose former company name is Northscale, has merged with CouchOne, the company whose product is CouchDB and whose former name is Couch.io. The result (product and company) will be called Couchbase. CouchDB inventor Damien Katz will join the Membase (now Couchbase) management team as CTO. Couchbase can reasonably be regarded as a document-oriented NoSQL DBMS, a product category I not coincidentally posted about yesterday.
In essence, Couchbase will be CouchDB with scale-out. Alternatively, Couchbase will be Membase with a richer programming interface. The Couchbase sweet spot is likely to be: Read more
|Categories: Application areas, Cache, Couchbase, CouchDB, Market share and customer counts, memcached, NoSQL, Open source, Parallelization, Solid-state memory||2 Comments|
When people talk about document-oriented NoSQL or some similar term, they usually mean something like:
Or, if they really mean,
The essence of whatever it is that CouchDB and MongoDB have in common.
well, that’s pretty much the same thing as what I said in the first place.
Of the various questions that might arise, three of the more definitional ones are:
- Why JSON rather than XML?
- What’s with this fluidity between the terms “document” and “object”?
- Are you serious about the lack of joins?
Let me take a crack at each. Read more
When I talked with MarkLogic’s Ken Chestnut about MarkLogic 4.2, I was surprised to learn that MarkLogic really, truly doesn’t do anything like a join. Unlike some other non-SQL DBMS, MarkLogic has no SQL interface, no ODBC or JDBC. Nothing, nada. (MarkLogic has a Java interface for Xquery, but not for anything like SQL.)
|Categories: CouchDB, MarkLogic, NoSQL, Structured documents, Text, Theory and architecture||7 Comments|
Since posting last Wednesday morning that I’m looking into NoSQL and HVSP, I’ve had a lot of conversations, including with (among others):
- Dwight Merriman of 10gen (MongoDB)
- Damien Katz of Couchio (CouchDB)
- Matt Pfeil of Riptano (Cassandra)
- Todd Lipcon of Cloudera (HBase committer)
- Tony Falco of Basho (Riak)
- John Busch of Schooner
- Ori Herrnstadt of Akiban
I have a large number of posts still in backlog. For starters, there are ones based on recent visits with Aster, Greenplum, Sybase, Vertica, and a Very Large User. I suspect I’ll write more soon on Oracle as well. Plus there’s my whole future-of-online-media area. And quite a bit more will grow out of planned research.
So there are a whole lot of other worthy subjects I doubt I’ll be getting to any time soon. In some cases, of course, other people are doing great jobs of writing about same. Here are pointers to a few links that I am glad to recommend:
- I wrote recently that I’ve discovered a number of different in-memory OLAP engines. Cindi Howson far outdid that, writing at length for Intelligent Enterprise on in-memory analytics, in an article that seems to itself be a teaser for a longer, free white paper on the subject.
- CouchDB posted an eye-catching, risque slide presentation promoting CouchDB and, more generally, key-value stores, at least for internet applications. And yes, they’ve integrated MapReduce.
- Merv Adrian posted favorably about Birst, with special reference to its OEM efforts. As previously noted, I was highly unimpressed with Birst’s end-user BI story at the time of its September roll-out, and Jerome Pineau’s recent examination did nothing to reassure me. But perhaps OEM is a different matter.
- Merv also offers an interesting post about data integration upstart Expressor, and a highly favorable one about “visualization” vendor Tableau.
- Ann All interviewed Nigel Pendse, who grumped that BI features are overrated, and what end users really want is great query performance. I’m not so sure about the features side of that, but I’m hugely in agreement about the performance. That’s a big part of why the analytic DBMS industry is so vibrant. It’s also why in-memory OLAP is suddenly so hot.
|Categories: Analytic technologies, Business intelligence, CouchDB, Data warehousing, EAI, EII, ETL, ELT, ETLT, Expressor, MapReduce, Memory-centric data management, MOLAP, Presentations, Tableau Software, Theory and architecture||Leave a Comment|