CouchDB – DBMS 2 : DataBase Management System Services

Optimism, pessimism, and fatalism — fault-tolerance, Part 2

Curt Monash — Sun, 08 Jun 2014 16:58:35 +0000

The pessimist thinks the glass is half-empty.
The optimist thinks the glass is half-full.
The engineer thinks the glass was poorly designed.

Most of what I wrote in Part 1 of this post was already true 15 years ago. But much gets added in the modern era, considering that:

Clusters will have node hiccups more often than single nodes will. (Duh.)
Networks are relatively slow even when uncongested, and furthermore congest unpredictably.
In many applications, it’s OK to sacrifice even basic-seeming database functionality.

And so there’s been innovation in numerous cluster-related subjects, two of which are:

Distributed query and update. When a database is distributed among many nodes, how does a request access multiple nodes at once?
Fault-tolerance in long-running jobs. When a job is expected to run on many nodes for a long time, how can it deal with failures or slowdowns, other than through the distressing alternatives:
- Start over from the beginning?
- Keep (a lot of) the whole cluster’s resources tied up, waiting for things to be set right?

Distributed database consistency

When a distributed database lives up to the same consistency standards as a single-node one, distributed query is straightforward. Performance may be an issue, however, which is why we have seen a lot of:

Analytic RDBMS innovation.
Short-request applications designed to avoid distributed joins.
Short-request clustered RDBMS that don’t allow fully-general distributed joins in the first place.

But in workloads with low-latency writes, living up to those standards is hard. The 1980s approach to distributed writing was two-phase commit (2PC), which may be summarized as:

A write is planned and parceled out to occur on all the different nodes where the data needs to be placed.
Each node decides it’s ready to commit the write.
Each node informs the others of its readiness.
Each node actually commits.
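The four steps above can be sketched in a few lines. This is a minimal in-memory illustration, not any particular product's protocol: the Participant class and the two_phase_commit function are hypothetical names, and votes are collected by a coordinator function (a common textbook simplification) rather than by nodes messaging each other directly.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """One node holding part of the data (hypothetical, in-memory only)."""
    name: str
    healthy: bool = True
    staged: dict = field(default_factory=dict)
    committed: dict = field(default_factory=dict)

    def prepare(self, key, value):
        # Phase 1: stage the write and vote on whether it can be committed.
        if not self.healthy:
            return False
        self.staged[key] = value
        return True

    def commit(self, key):
        # Phase 2: make the staged write permanent.
        self.committed[key] = self.staged.pop(key)

    def abort(self, key):
        # Discard a staged write, if there is one.
        self.staged.pop(key, None)


def two_phase_commit(participants, key, value):
    """Return True if every node voted yes and the write was committed."""
    votes = [p.prepare(key, value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit(key)
        return True
    for p in participants:
        p.abort(key)
    return False
```

Note that the sketch has no timeouts. In a real network, a single slow vote holds up the whole write, which is exactly the blocking problem discussed next.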

Unfortunately, if any of the various messages in the 2PC process is delayed, so is the write. This creates way too much likelihood of work being blocked. And so modern approaches to distributed data writing are more … well, if I may repurpose the famous Facebook slogan, they tend to be along the lines of “Move fast and break things”,* with varying tradeoffs among consistency, other accuracy, reliability, functionality, manageability, and performance.

*By the way, Facebook recently renounced that motto, in favor of “Move fast with stable infrastructure.” Hmm …

Back in 2010, I wrote about various approaches to consistency, with the punch line being:

A conventional relational DBMS will almost always feature RYW consistency. Some NoSQL systems feature tunable consistency, in which — depending on your settings — RYW consistency may or may not be assured.

The core ideas of RYW (Read-Your-Writes) consistency, as implemented in various NoSQL systems, are:

Let N = the number of copies of each record distributed across nodes of a parallel system.

Let W = the number of nodes that must successfully acknowledge a write for it to be successfully committed. By definition, W <= N.

Let R = the number of nodes that must send back the same value of a unit of data for it to be accepted as read by the system. By definition, R <= N.

The greater N-R and N-W are, the more node or network failures you can typically tolerate without blocking work.

As long as R + W > N, you are assured of RYW consistency.

That last point (R + W > N) is the key one, and I suggest that you stop and convince yourself of it before reading further.
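One way to convince yourself is brute force. The sketch below (the function name is made up for illustration) enumerates every possible set of W nodes that acknowledged a write and every possible set of R nodes answering a read, and checks that they always share at least one node. That shared node is the one guaranteed to hold the latest write.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """True if every read set of size r shares a node with every write set of size w."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)  # an empty intersection is falsy
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


# The overlap holds exactly when R + W > N, for every small cluster size tried.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert quorums_always_overlap(n, r, w) == (r + w > n)
```

The underlying argument is the pigeonhole principle: if the two sets were disjoint, they would need R + W distinct nodes, and there are only N.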

Eventually :), Dan Abadi claimed that the key distinction is synchronous/asynchronous — is anything blocked while waiting for acknowledgements? From many people, that would simply be an argument for optimistic locking, in which all writes go through, and conflicts — of the sort that locks are designed to prevent — cause them to be rolled back after-the-fact. But Dan isn’t most people, so I’m not sure — especially since the first time I met Dan was to discuss VoltDB predecessor H-Store, which favors application designs that avoid distributed transactions in the first place.
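For readers who have not met optimistic locking, here is a minimal sketch of the idea. It is the version-check variant, in which a conflicting write is rejected at commit time rather than rolled back afterwards; OptimisticStore and VersionConflict are hypothetical names, not any product's API.

```python
class VersionConflict(Exception):
    """Raised when somebody else changed the record since we read it."""


class OptimisticStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # No lock is taken; the caller just remembers the version it saw.
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            # The conflict is detected only now, at write time.
            raise VersionConflict(f"{key}: expected v{expected_version}, found v{current_version}")
        self._data[key] = (current_version + 1, value)
        return current_version + 1
```

Two clients that read the same version can both attempt a write; the first succeeds, and the second gets a VersionConflict and has to re-read and retry.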

One idea that’s recently gained popularity is a kind of semi-synchronicity. Writes are acknowledged as soon as they arrive at a remote node (that’s the synchronous part). Each node then updates local permanent storage on its own, with no further confirmation. I first heard about this in the context of replication, and generally it seems designed for replication-oriented scenarios.

Single-job fault-tolerance

Finally, let’s consider fault-tolerance within a single long-running job, whether that’s a big query or some other kind of analytic task. In most systems, if there’s a failure partway through a job, they just say “Oops!” and start it over again. And in non-extreme cases, that strategy is often good enough.

Still, there are a lot of extreme workloads these days, so it’s nice to absorb a partial failure without entirely starting over.

Hadoop MapReduce, which stores intermediate results anyway, finds it easy to replay just the parts of the job that went awry.
Spark, which is more flexible in execution graph and data structures alike, has a similar capability.

Additionally, both Hadoop and Spark support speculative execution, in which several clones of a processing step are executed at once (presumably on different nodes), to hedge against the risk that any one copy of the process runs slowly or fails outright. According to my notes, speculative execution is a major part of NuoDB’s architecture as well.
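Speculative execution is easy to illustrate on a single machine. The sketch below uses threads in place of cluster nodes, and run_speculatively and make_clone are hypothetical helpers. It shows only the general idea, not how Hadoop or Spark schedule speculative tasks.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_speculatively(clones):
    """Run every callable in `clones` at once; return the first successful result."""
    pool = ThreadPoolExecutor(max_workers=len(clones))
    try:
        futures = [pool.submit(clone) for clone in clones]
        errors = []
        for future in as_completed(futures):
            try:
                return future.result()  # first clone to finish without error wins
            except Exception as exc:
                errors.append(exc)      # a failed clone is simply ignored
        raise RuntimeError(f"all {len(clones)} clones failed: {errors}")
    finally:
        # Do not wait for the stragglers; a real scheduler would kill them.
        pool.shutdown(wait=False)


def make_clone(delay, fail=False):
    """Build one copy of the same processing step, with simulated slowness or failure."""
    def clone():
        time.sleep(delay)
        if fail:
            raise IOError("node lost")
        return sum(range(1000))
    return clone


result = run_speculatively([make_clone(0.3), make_clone(0.01), make_clone(0.0, fail=True)])
```

The price is obvious: the work is done more than once, so the approach only makes sense when spare capacity is cheaper than waiting.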

Further topics

I’ve rambled on for two long posts, which seems like plenty — but this survey is in no way complete. Other subjects I could have covered include but are hardly limited to:

Occasionally-connected operation, which for example is a design point of CouchDB, SQL Anywhere (sort of), and most kinds of mobile business intelligence.
Avoiding planned downtime — i.e., operating despite self-inflicted wounds.
Data cleaning and master data management, both of which exist in large part to fix errors people have made in the past.

Related links

Uninterrupted DBMS operation (September, 2012)
The cardinal rules of DBMS development (March, 2013)
Bottleneck Whack-A-Mole (August, 2009)

Introduction to Cloudant

Curt Monash — Sun, 03 Jun 2012 11:00:48 +0000

Cloudant is one of the few NoSQL companies with >100 paying subscription customers. For starters:

Cloudant’s core software is a fork of CouchDB.
Cloudant only sells you software as a service.
More precisely, whether Cloudant offers DBaaS (DataBase as a Service) or PaaS (Platform as a Service) or a “data layer” (Cloudant’s preferred terminology) depends on your taste in buzzwords.
I gather that Cloudant (the company) wants to handle pretty much all your data management needs. But Cloudant (the product) isn’t there yet, especially on the analytic side.
Before CouchDB and Membase joined together, Cloudant was positioned as the big(ger) data version of CouchDB.

Company demographics include:

Cloudant is based in Boston.
Cloudant started out as a Y Combinator company in 2008, and “got serious” in 2009.
Cloudant now has ~20 employees.
Management hires include a couple of former Vertica guys.

The Cloudant guys gave me some customer counts in May that weren’t much higher than those they gave me in February, and seem to have forgotten to correct the discrepancy. Oh well. The latter (probably understated) figures included ~160 paying customers, of which:

~100 were multitenant.
~60 were single tenant.
1 was on-premise (but still managed by Cloudant) because of privacy concerns.

The largest Cloudant deployments seem to be in the 10s of terabytes, across a very low double digit number of servers.

The difference between single- and multi-tenant Cloudant is:

Just as you would think, single-tenant customers get their own sets of servers, while multi-tenant customers share servers with others.
There’s a fixed pricing scheme for multi-tenant customers, while single-tenant pricing is “let’s make a deal”.

Monthly costs (in dollars) for multi-tenant customers are typically 1-3 digits; for single-tenant they’re typically 4-5.

Despite only being available as a service, Cloudant has a free option too. It has >7000 total sign-ups. 2/3 of sign-ups wind up at least creating a database. But Cloudant doesn’t have figures available for production (as opposed to development-only) use on the free side.

Cloudant has some big-name customers, both among traditional enterprises and internet companies. Two of the flashier ones are:

OMGPOP used Cloudant to build a new subsystem rather than continuing entirely with Membase/Couchbase. However, OMGPOP was acquired by flagship Membase user Zynga, so that relationship is expiring, leaving behind a glowing quote to remember it by.
Monsanto is using Cloudant to manage genomic data (and hence is a non-internet user).

Cloudant says that CouchDB users used to constitute 100% of its pipeline, and still make up a (shrinking) majority.

There’s been some recent drama in the CouchDB world. Couchbase (the company) ran into delays merging CouchDB into Couchbase — often because of performance challenges — and no longer positions Couchbase as a straightforward scale-out enhancement to CouchDB. Realistically, if you like CouchDB but just wish it would scale out, you should still talk to both Couchbase and Cloudant, but it’s no longer the case that Couchbase is the obvious leader of the CouchDB community.

So how do you get at data in Cloudant? The basics seem to be:

CouchDB and Cloudant are JSON-based document-oriented NoSQL stores.
Cloudant’s core indexing system is an append-only b-tree. Supplementary approaches are being researched.
Actually, there are at least two b-trees, one for document_ID and one for time of last update (not original document creation). The latter index is to support a kind of incremental MapReduce, which is used to, among other things:
- Create secondary indexes (and to do so without blocking writes).
- Build simple aggregates.
There’s full-text search based on Lucene libraries (but not the Lucene indexer).
You can replicate from Cloudant to CouchDB, which seems to be the main way to replicate to the outside world.

The essence of Cloudant’s incremental MapReduce seems to be that data is selected only if it’s been updated since the last run. Obviously, this only works for MapReduce algorithms that can be run separately on different subsets of the target data set, with the partial outputs then aggregated in a simple way.
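Here is a rough sketch of that idea, under loud assumptions: updates arrive as (sequence number, document) pairs, documents are only ever inserted (a changed document would need its old contribution subtracted first), and the aggregate is a simple count per document type. The class name and the "type" field are made up for illustration; this is not Cloudant's implementation.

```python
from collections import Counter


class IncrementalCounter:
    """Keeps a count of documents per type, fed from an ordered update log."""

    def __init__(self):
        self.last_seq = 0        # highest update sequence already processed
        self.totals = Counter()  # the running aggregate

    def run(self, updates):
        """Process only the updates newer than the last run; return how many there were."""
        fresh = [(seq, doc) for seq, doc in updates if seq > self.last_seq]
        # Map and reduce over the new subset only.
        partial = Counter(doc["type"] for _, doc in fresh)
        # Fold the partial result into the existing aggregate.
        self.totals.update(partial)
        if fresh:
            self.last_seq = fresh[-1][0]
        return len(fresh)
```

Counts and sums merge this way trivially. Something like a median does not, which is the restriction described above.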

Finally, some other technical notes on Cloudant include:

Cloudant’s clustering scheme is much as you’d expect:
- Consistent hashing.
- RYW quorum consistency, with a default of 3 copies (across 2 data centers), 2 reads, and 2 writes.
Cloudant has rewritten various components of CouchDB for performance or performance predictability, often in C (vs. the Erlang that the rest of CouchDB is written in). These include:
- JSON handling.
- I/O prioritization/(mixed) workload management.
- Compaction (which necessitated some changes to the core storage model).
Cloudant has generally preserved CouchDB’s goodness in terms of synchronization and so on, which I gather is based on maintaining a sequence of updates and surfacing cases where multi-master edits cause conflicts.
Multi-tenant servers still use disks (as opposed to solid-state storage). Single-tenant customers can choose among various different configurations.
No doubt Cloudant has written various management and administrative aids, but we didn’t talk about those much. Those are things Cloudant uses, much more than it exposes them to its customers.
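For anyone who wants the consistent hashing part of that list spelled out, a bare-bones ring looks something like the sketch below. It is an assumption-laden illustration (no virtual nodes, no data center awareness, hypothetical class and node names), not Cloudant's actual placement logic.

```python
import bisect
import hashlib


def _hash(value):
    # Map any string to a large integer position on the ring.
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, copies=3):
        self.copies = copies
        self._ring = sorted((_hash(node), node) for node in nodes)
        self._points = [point for point, _ in self._ring]

    def nodes_for(self, doc_id):
        """Walk clockwise from the document's hash and take the next `copies` nodes."""
        start = bisect.bisect(self._points, _hash(doc_id))
        count = min(self.copies, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d", "node-e"], copies=3)
owners = ring.nodes_for("doc-42")  # three distinct nodes hold this document
```

The point of the scheme is that removing a node that does not hold a given document leaves that document's placement untouched. With copies=3 as in the default above, reads and writes of 2 nodes each satisfy R + W > N.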

Couchbase update

Curt Monash — Thu, 02 Feb 2012 04:00:24 +0000

I checked in with James Phillips for a Couchbase update, and I understand better what’s going on. In particular:

Give or take minor tweaks, what I wrote in my August, 2011 Couchbase updates still applies.
Couchbase now and for the foreseeable future has one product line, called Couchbase.
Couchbase 2.0, the first version of Couchbase (the product) to use CouchDB for persistence, has slipped …
… because more parts of CouchDB had to be rewritten for performance than Couchbase (the company) had hoped.
Think mid-year or so for the release of Couchbase 2.0, hopefully sooner.
In connection with the need to rewrite parts of CouchDB, Couchbase has:
- Gotten out of the single-server CouchDB business.
- Donated its proprietary single-server CouchDB intellectual property to the Apache Foundation.
The 150ish new customers in 2011 Couchbase brags about are real, subscription customers.
Couchbase has 60ish people, headed to >100 over the next few months.

If you previously heard the brand names Couchbase Single or Couchbase Mobile, pay no further attention to them. Couchbase Single was CouchDB; Couchbase Mobile is part of Couchbase’s feature set.

The current product is Couchbase 1.8, which is a whole lot like what previously was called Membase. New features in Couchbase 1.8 (versus prior versions of Membase) were concentrated in client libraries/SDK (Software Development Kit). Not coincidentally, Couchbase has hired developer evangelists who are in charge of making Couchbase play nicely with various specific languages (e.g. C/C++).

Drilling down further into the CouchDB part of the story:

Couchbase 2.0 will replace Couchbase 1.8/Membase’s SQLite back-end with CouchDB.
Parts of CouchDB that do things like read, write, or compact data have been rewritten from Erlang to C.
Couchbase still uses other Erlang parts of Apache CouchDB, and would be delighted if the community were to usefully enhance them.
Couchbase’s heavy contributions to development of open source CouchDB will, for the most part, continue.
CouchDB stuff donated to the Apache Foundation includes:
- Documentation
- Packaging
- Performance enhancements

There’s at least one Couchbase user with >1000 nodes (at a guess, Zynga). More typical might be 20 nodes or less. This led me to wonder how much data one puts on a Couchbase node anyway. The answer turns out to vary widely, in that you want your working set to be in RAM, and whether that’s your entire database or just a slice of it depends on the nature of the application.

James echoed a trend I’ve heard elsewhere as well, in which products one thinks of as being internet-specific are also sold in a few cases to conventional enterprises for (you guessed it!) their internet operations. I also asked him about competition, and he asserted:

MongoDB is the big competition. He believes Couchbase has an excellent win rate vs. 10gen for actual paying accounts.
DataStax/Cassandra wins over Couchbase only when multi-data-center capability is important. Naturally, multi-data-center capability is planned for Couchbase. (Indeed, that’s one of the benefits of swapping in CouchDB at the back end.)
Redis has “dropped off the radar”, presumably because there’s no particular persistence strategy for it.
Riak doesn’t show up much.

Notes from the Couch blogs

Curt Monash — Wed, 18 Jan 2012 07:57:09 +0000

Couchbase in general, and CouchDB project founder Damien Katz in particular, are to some extent walking away from CouchDB. That is:

The Couchbase product will not be upward compatible with CouchDB.
Couchbase will no longer offer a CouchDB distribution, and is doing the natural and responsible thing, namely …
… donating to the Apache Foundation the previously proprietary aspects of that distribution.

Even so:

All — or at least “all” — the code Couchbase offers will, at least for now, be open source.

The story unfolded in a bombshell post by Damien, and clarification follow-ups by Damien and by Couchbase CEO Bob Wiederhold. The meatiest of the three was probably Damien’s follow-up, in which he said, among other things:

… maybe I should explain why I think Couchbase is the future?

Simple Fast Elastic.

That’s pretty much it. …

The Membase product was very fast and scalable, but a bit too simple, with no reporting capability or cross-datacenter replication capability.

The CouchDB product has a lot of features, but is too slow, unable to keep up with high loads and inability scale-out on it’s own. …

Our 2.0 product is coming soon, adding CouchDB style views and reporting with a nifty trick for extremely fast failover while maintaining full coherency with the underling distributed data storage (we are calling it our B-Superstar index). We’ll of course have lighting fast reads (same as Memcached) but also very fast durable writes. For 2kb docs, we are currently getting sustained random insert/updates rates of 25k writes/sec, fully durable, with compaction in background so it can go all day and all night. We’ve got some more write work coming soon which we are hoping will give us another performance boost too before 2.0. Stay tuned …

And so while we focus on the features and customers that most quickly make us a viable business (and it’s growing fast), we are still looking to build the features and technology to expand our use cases and, get customers and developers excited. Future versions are planned to have full CouchDB compatible replication technology, with the ability to support all sorts of mobile and embedded databases, such as our new TouchDB projects for iOS and Android.

Meanwhile, in a separate blog post, Bob said that in 2011 Couchbase

… added thousands of open source deployments, as well as more than 150 paying customers who have put thousands of nodes into production throughout the year.

Couchbase business update

Curt Monash — Sun, 14 Aug 2011 04:02:42 +0000

I decided I needed some Couchbase drilldown, on business and technology alike, so I had solid chats with both CEO Bob Wiederhold and Chief Architect Dustin Sallings. Pretty much everything I wrote at the time Membase and CouchOne merged to form Couchbase (the company) still holds up. But I have more detail now.

Context for any comments on customer traction includes:

Membase went into limited production release in October, and full release in January. Similar things are true of CouchDB.
Hence, most sales of Couchbase’s products have been made over the past 6 months.
Couchbase (the merged product) is at this point only in a pre-production developer’s release.
Couchbase has both a direct sales force and a classic open-source “funnel”-based online selling model. Naturally, Couchbase’s understanding of what its customers are doing is more solid with respect to the direct sales base.
Most of Couchbase’s revenue to date seems to have come from a limited number of big-ticket “lighthouse” accounts (as opposed to, say, the larger number of smaller deals that come in through the online funnel).

That said,

Most Membase purchases are for new applications, as opposed to memcached migrations. However, customers are the kinds of companies that probably also are using memcached elsewhere.
Most other Membase purchases are replacements for the memcached/MySQL combination. Bob says those are easy sales with short sales cycles.
Pure memcached support is a small but non-zero business for Couchbase, and a fine source of upsell opportunities.
In the pipeline but not so much yet in the customer base are SaaS vendors and the like who use and may want to replace traditional DBMS such as Oracle. Other than among those, Couchbase doesn’t compete much yet with Oracle et al.
Pure CouchDB isn’t all that much of a business, at least relative to community size, as CouchDB is a single-server product commonly used by people who are content not to pay for support.

Membase sales are concentrated in five kinds of internet-centric companies, which in declining order are:

Social gaming
Ad platforms
Online retail
Online business, including B2B SaaS
Social networking

Bob said that Couchbase often sees MongoDB competitively, but never Riak, HBase, or Redis. I got the impression Couchbase sees at least a little Cassandra. That would, of course, all pertain only to direct sales, rather than download/community kinds of usage.

Couchbase is also excited about the potential for the CouchDB-based Couchbase Mobile occasionally-connected offering. The hottest use cases, interestingly, seem to be non-consumer; Bob rattled off military, farming, and health care, and surely could have named more besides. However, the Couchbase Mobile sales effort still seems to be in early days, as is evidenced by the fact that Couchbase has not yet competitively encountered Sybase SQL Anywhere.

With all that said, I’ll go now to a separate post for a Couchbase technical update.

Membase and CouchOne merged to form Couchbase

Curt Monash — Tue, 08 Feb 2011 05:59:35 +0000

Membase, the company whose product is Membase and whose former company name is Northscale, has merged with CouchOne, the company whose product is CouchDB and whose former name is Couch.io. The result (product and company) will be called Couchbase. CouchDB inventor Damien Katz will join the Membase (now Couchbase) management team as CTO. Couchbase can reasonably be regarded as a document-oriented NoSQL DBMS, a product category I not coincidentally posted about yesterday.

In essence, Couchbase will be CouchDB with scale-out. Alternatively, Couchbase will be Membase with a richer programming interface. The Couchbase sweet spot is likely to be:

Internet applications, especially ones that involve connectivity between a host and mobile devices.
Delivery of data, content, and/or software across a network. (That’s a high-profile CouchDB use case today.)
(Possibly) transactions for virtual goods that have no scarcity. (Once there’s actual inventory involved, the traditional relational database model starts looking pretty appealing.)

And now let’s go to the lists of bullet points.

Background to the Membase/CouchDB/Couchbase integration story:

Membase is a key-value store with the memcached interface. Its strengths are memcached compatibility and performant scale-out. What it stores are in essence JSON documents.
CouchDB is designed for ease of programming, and for built-in handling of occasionally-connected replication. (Not coincidentally, Damien Katz used to work on Lotus Notes.) CouchDB indexes individual data fields for reasonable query capability, although joins are problematic. What CouchDB stores are in essence JSON documents.

Highlights of how Membase works and is deployed today:

Your API is Get/Set, just like in memcached.
To a first approximation, Membase just persists memcached cache at every node. That said, it can certainly store more data per node than fits in cache.
Most Membase installations are in Amazon EC2, where flash memory is not available. Most in-house Membase installations, however, use flash.

Business background on Couchbase predecessors:

Membase raised $15 million, had 20 employees, and has a number of paying customers.
CouchOne raised $2 million, had 16 employees, hadn’t focused much on traditional customer acquisition yet or on building an enterprise edition of the product, and had about 4 customers anyway …
… except that CouchOne’s plans included CouchDB hosting, and there are around 4500 users of same in a free beta that’s on the verge of going non-free. Damien positions his hosting as being focused on high throughput and concurrency, while rival CouchDB host Cloudant is in his opinion more focused on big data.
The apparent repositioning of CouchOne as being highly focused on mobile applications (with unreliable host connections) never really had time to take hold. Indeed …
… Damien asserts that CouchDB has a lot more mission-critical enterprise deployments than MongoDB, whereas he concedes that MongoDB is doing great in a Ruby-centric market.

Happy talk around Membase/CouchDB/Couchbase product integration:

Hey, both Membase and CouchDB talk JSON.
Product strengths and weaknesses are synergistic. For example:
- Membase started with caching technology (memcached). CouchDB doesn’t yet make much use of cache.
- Membase’s back end is SQLite, used in a “dumb” way. CouchDB can presumably do everything the dumb implementation of SQLite can.
- Membase’s scale-out is designed for a single data center, with strict consistency. CouchDB’s is designed for wide-area networks, with eventual consistency. At least one big internet company likes the idea of strict consistency within data centers, but eventual consistency among them.
- The CouchDB interface takes the place of something Membase planned to build called Node Code, which was going to overcome the limitations of a simple key-value interface. Node Code development didn’t ever really get started, and indeed was deferred for a couple of months while CouchOne acquisition discussions were underway. However, Membase did build Node Code’s underpinnings, called the “TAP” interface.
- And on the operations side: Membase has been in Mountain View, right by the CalTrain. CouchOne has been in Oakland, but with a lot of at-home workers. One option is to move the Oakland office to a San Francisco location that, you guessed it, is also right by the CalTrain.

Other technical notes:

The only current API to CouchDB is HTTP/HTTPS. Support for memcached protocols will be added to Couchbase.
CouchDB has design documents, which specify how indexes are to be built. The indexes are built on the fly if they don’t already exist, and JavaScript functions then update them as documents are added or updated.
In particular, CouchDB has a geospatial index, in a true R-tree. Damien fondly thinks it already has most albeit not all the features of PostgreSQL GIS. I gather CouchDB geospatial will be straightforwardly integrated into Couchbase.
There’s also a CouchDB add-on project for full-text indexing. Damien seems less confident of how that will be integrated into Couchbase.
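To make the design document idea concrete, here is a small sketch. The design document is built as a Python dict so it can be serialized to JSON. The map function inside it is JavaScript stored as a string, and "_sum" is one of CouchDB's built-in reduce functions. The database contents, the view name by_customer, and the field names are all made up for illustration, and the Python functions merely mimic what the JavaScript view would compute.

```python
import json

design_doc = {
    "_id": "_design/orders",
    "views": {
        "by_customer": {
            # JavaScript map function, stored as a string inside the JSON document.
            "map": "function (doc) { if (doc.customer) { emit(doc.customer, doc.total); } }",
            "reduce": "_sum",
        }
    },
}

design_doc_json = json.dumps(design_doc, indent=2)


def by_customer(doc):
    """Python rendering of the JavaScript map function above, for illustration only."""
    if doc.get("customer"):
        yield doc["customer"], doc["total"]


def build_index(docs):
    """Emit a (key, value) row per qualifying document, sorted by key like a view index."""
    return sorted(row for doc in docs for row in by_customer(doc))


docs = [
    {"customer": "bob", "total": 5},
    {"type": "note"},  # no customer field, so it is skipped
    {"customer": "alice", "total": 2},
]
index = build_index(docs)
```

Documents that lack the indexed field simply produce no rows, which is how a schema-free store can still offer a well-defined index.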

Finally, I’m curious about the relative performance of Couchbase/Membase and Schooner Membrain when using flash memory. I would guess that the comparison favors Schooner, because of Schooner’s extensive focus on flash optimization. I would also guess that Schooner’s edge is small, because I’d think it would be less than Schooner’s advantage vs. alternative Flash uses on the MySQL side, and Schooner’s MySQL performance advantage seems to be less than 2X even when Schooner is doing the benchmarks.

Notes on document-oriented NoSQL

Curt Monash — Mon, 07 Feb 2011 08:51:08 +0000

When people talk about document-oriented NoSQL or some similar term, they usually mean something like:

Database management that uses a JSON model and gives you reasonably robust access to individual field values inside a JSON (JavaScript Object Notation) object.

Or, if they really mean,

The essence of whatever it is that CouchDB and MongoDB have in common.

well, that’s pretty much the same thing as what I said in the first place.

Of the various questions that might arise, three of the more definitional ones are:

Why JSON rather than XML?
What’s with this fluidity between the terms “document” and “object”?
Are you serious about the lack of joins?

Let me take a crack at each.

Like XML, JSON is a data-interchange format that has been repurposed as a data persistence model. JSON is evidently beating out XML in web applications, for reasons including:

XML is more verbose and slower than JSON. (Whether this matters or not is of course use-case-dependent.)
Like SQL, XML requires what some web programmers regard as too much formalism and up-front specification.
JSON is associated with JavaScript.
JSON is regarded as being more suited to straightforwardly fielded data, while XML is regarded as being more suited to “mixed content” — e.g., real text documents.
In general, XML feels “enterprisey” to developers who don’t like that feel.
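The verbosity and formalism points can be seen with one small record. The sketch below serializes the same made-up data both ways using only the Python standard library. The XML element names are one arbitrary choice among many, which is itself part of the point, and a single record is of course not a benchmark.

```python
import json
import xml.etree.ElementTree as ET

record = {"id": 17, "name": "Ada", "tags": ["admin", "ops"]}

# JSON: the native data structure maps straight onto the format.
as_json = json.dumps(record)

# XML: somebody has to decide on element names and nesting first.
root = ET.Element("record")
ET.SubElement(root, "id").text = str(record["id"])
ET.SubElement(root, "name").text = record["name"]
tags = ET.SubElement(root, "tags")
for tag in record["tags"]:
    ET.SubElement(tags, "tag").text = tag
as_xml = ET.tostring(root, encoding="unicode")

# Reading a field back: a typed value from JSON, a string from XML.
json_id = json.loads(as_json)["id"]
xml_id = ET.fromstring(as_xml).find("id").text
```

Here the XML form comes out longer, and the integer id survives the JSON round trip as an integer but comes back from XML as text.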

One good starting point for recent JSON vs. XML discussion is here. My favorite from the 2007 iteration of the debate is this one.

So, in essence:

The reasons JSON beats XML for web application data interchange have some applicability to web application data storage as well.
There’s ever more JSON around, at the expense of XML.

But truth be told, I don’t think XML and JSON actually go head to head against each other on the DBMS side very often at all. E.g., Dwight Merriman (the 10gen/MongoDB guy) told me he never, ever competes against MarkLogic, and I found that very credible.*

*Proof point: Dwight was clueless about MarkLogic specifics in a way he never would be if they were any kind of competitive consideration for him.

Note that the one area where (almost) everybody agrees XML wins is for what one might call “real” documents. By way of contrast, JSON is best suited for stringing data attributes and values together. So the “documents” that JSON models can indeed just as reasonably be called “objects.”

That said, JSON-based DBMS are not what one would normally call object-oriented DBMS; for an example of those, consider InterSystems Caché. And just to close the loop on confusion: Caché can also be used as an XML DBMS.

As I previously noted, one downside to today’s document-oriented DBMS is that you can’t do joins. Let me now add that I think joins will be added to document DBMS in the future. Plausibility arguments for this opinion include:

MarkLogic — the XML database gold standard — sells to enterprises, and enterprises like joins.
The alternative to joins in CouchDB and MongoDB is in essence MapReduce. Well, Hive proves that you can do joins on top of MapReduce if you want to. (So, for that matter, does Aster Data nCluster; Aster says its SQL parallelism is built on top of MapReduce.)
InterSystems quite happily put SQL on top of an object-oriented DBMS, Caché. And Caché is so similar to an XML DBMS that it in fact is sometimes used as one.

But that is indeed a future. For discussion of the current state of affairs, I refer you to my earlier post on the subject of joinlessness linked above.

Document-oriented DBMS without joins

Curt Monash — Mon, 29 Nov 2010 08:55:40 +0000

When I talked with MarkLogic’s Ken Chestnut about MarkLogic 4.2, I was surprised to learn that MarkLogic really, truly doesn’t do anything like a join. Unlike some other non-SQL DBMS, MarkLogic has no SQL interface, no ODBC or JDBC. Nothing, nada. (MarkLogic has a Java interface for XQuery, but not for anything like SQL.)

Since MarkLogic and other XML DBMS are used in applications for brokerage trades and the like, I used that area as my example for a challenge question: What happens when one brokerage firm buys another? (Similar challenges could be made about medical records or consumer profiling.) The answer was that you just have to update or augment each existing record with the new firm’s information. And by the way, if you choose to augment, then you have the new and old information side-by-side, both of which could conceivably come in handy.

Document-oriented NoSQL DBMS such as CouchDB and MongoDB face similar challenges, of course. I didn’t pursue the matter in depth in either case, but:

If I understood Damien Katz correctly, CouchDB has a view capability that provides some kind of workaround. (A quick web search turned up this page on a kind of entity-relationship modeling in CouchDB and the associated querying.)
Dwight Merriman suggested to me that in MongoDB, you can work around the lack of joins via client-side logic, or by embedding lots of data in each document (e.g., all the line items for an order) and extracting what you need via MapReduce jobs.
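The two workarounds Dwight described look roughly like this. Plain Python dicts stand in for documents; no database driver is involved, and every field and function name is made up for illustration.

```python
# Workaround 1: embed the related data, e.g. all line items inside the order document.
order = {
    "_id": "order-1",
    "customer_id": "cust-9",
    "line_items": [
        {"sku": "A", "qty": 2, "price": 5.0},
        {"sku": "B", "qty": 1, "price": 3.5},
    ],
}


def order_total(doc):
    """No join needed: everything lives in the one document."""
    return sum(item["qty"] * item["price"] for item in doc["line_items"])


# Workaround 2: fetch both collections and join them in client-side code.
def join_orders_to_customers(orders, customers):
    """A hash join done by the application rather than the DBMS."""
    by_id = {customer["_id"]: customer for customer in customers}
    return [{**o, "customer": by_id.get(o["customer_id"])} for o in orders]


customers = [{"_id": "cust-9", "name": "Acme"}]
joined = join_orders_to_customers([order], customers)
```

Embedding is cheap to read but duplicates data; the client-side join keeps data in one place but moves the work, and the round trips, into the application.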

I’m not totally sure what I think about joinlessness, but one way of looking at it could be:

The reason we have joins is because we normalize. If it’s OK to be highly denormalized, then it’s less important to have joins.
When normalization is good and denormalization is bad, one or both of two reasons are commonly in play:
- The logical burden of keeping straight all the different places you’d have to update the same data is too great for the poor, overburdened programmers.
- The performance burden of doing all that updating is too great for the poor, overburdened hardware.
For the logical reason to have great force, there has to be a pretty complex schema, or else a frequently changing one. But when schemas change frequently, relational designs have their own problems.
The physical reason automatically has great force if you have huge update volumes and keep many copies of the same data. Otherwise, its strength has a lot to do with the specific architecture of the DBMS. E.g., if it’s a lot cheaper to update a small record than a big one, short rows are better. But otherwise, denormalization may not have that much effect on performance.
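A rough sketch of the update burden being described, using the brokerage-acquisition example from above. This is plain Python over in-memory data, not any particular DBMS; the record layouts and function names are assumptions made up for the illustration.

```python
# Normalized layout: the firm name is stored once and referenced by id.
firms = {"f1": {"name": "OldCo"}, "f2": {"name": "Other"}}

# Denormalized layout: every trade document carries its own copy of the name.
trades = [
    {"id": 1, "firm_id": "f1", "firm_name": "OldCo"},
    {"id": 2, "firm_id": "f1", "firm_name": "OldCo"},
    {"id": 3, "firm_id": "f2", "firm_name": "Other"},
]


def rename_firm_normalized(firms, firm_id, new_name):
    """One record changes, no matter how many trades refer to the firm."""
    firms[firm_id]["name"] = new_name
    return 1  # number of records touched


def rename_firm_denormalized(trades, firm_id, new_name, keep_old=False):
    """Every document holding a copy must be found and rewritten.

    With keep_old=True the old name is kept alongside the new one,
    i.e. the record is augmented rather than simply updated.
    """
    touched = 0
    for trade in trades:
        if trade["firm_id"] == firm_id:
            if keep_old:
                trade.setdefault("former_firm_names", []).append(trade["firm_name"])
            trade["firm_name"] = new_name
            touched += 1
    return touched
```

The logical burden is remembering every place a copy lives; the performance burden is the count returned by the denormalized version, which grows with the number of documents that embed the data.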

Putting all that together, I’m inclined to think that for many applications, it’s OK to denormalize, or to have such a simple schema that normalization is moot. But even so, I’d be a lot more comfortable if a DBMS offered at least some way of doing a join.

All this raises a related question: What are transactions like in document-oriented DBMS? I’ve never pushed the point with MarkLogic, but when they talk of their ACID compliance they sound as if they are using the phrase in the usual way. MongoDB only lets you do transactions in single documents. I’ve never asked the question about CouchDB, but I do note with interest CouchDB’s “crash-only” architecture, which boils down to:

CouchDB shutdown is “instantaneous.”
You can only shut down CouchDB by crashing it.
There’s no way to shut down or crash CouchDB that causes data to be inconsistent.

More on NoSQL and HVSP (or OLRP)

Curt Monash — Thu, 26 Aug 2010 09:10:31 +0000

Since posting last Wednesday morning that I’m looking into NoSQL and HVSP, I’ve had a lot of conversations, including with (among others):

Dwight Merriman of 10gen (MongoDB)
Damien Katz of Couchio (CouchDB)
Matt Pfeil of Riptano (Cassandra)
Todd Lipcon of Cloudera (HBase committer)
Tony Falco of Basho (Riak)
John Busch of Schooner
Ori Herrnstadt of Akiban

By no means do I have time to do these conversations justice, in terms of giving them the write-ups and/or immediate follow-up that they deserve. Indeed, I’ll leave for vacation Saturday morning with my 2000-word NoSQL article still unwritten. So I’ll dump as many observations as I can into one or a few posts now, and play catch-up later as circumstances allow.

In no particular order:

A number of NoSQL offerings have had more uptake to date than most of the scale-out SQL offerings have.
“Document-oriented” NoSQL projects CouchDB and MongoDB have probably had the most users get into production, but perhaps for pretty small systems.
Cassandra and HBase, the column-group-architecture guys, have probably had the most bang-in-lots-of-writes HVSP production uptake.*
I didn’t talk customer count with Schooner, but the decently-stocked Schooner customer page suggests Schooner may be something of an exception to these generalities.
A lot of these companies are in the low-to-mid-teens of employees.
The SQL-oriented companies, despite having fewer or no customers, often seem to have more money. (One reason I get the impression SQL guys have more money is, frankly, that more of them are talking about engaging my services.)
- Schooner cites $20 million in VC.
- Clustrix cites a figure close to that.
- Basho cites $10 million, plus a new round of $1.5 or $2 or $2.5 million. The new round is at a lowered valuation.
- That same site says Tokutek finally was able to raise some VC. Congrats!
It’s only a two-company trend, but I was pleased to hear that both 10gen/MongoDB and Akiban were seeing Drupal as a major use case or potential use case. No word on rescuing WordPress from its MySQL implementation, alas, but it seems that a Drupal site typically has 40-200+ tables, while a WordPress one has 10ish.
Another trend I think I’m seeing is serious object-oriented apps banging things straight into a simple back end. Workday is a huge example of that. Akiban hopes to do something similar with Hibernate.
Stability and maturity are still issues for many of these products. E.g., HBase isn’t even in Release 1.0 yet. Ditto Cassandra, and surely many of the others. Unsurprisingly, making Cassandra stable is still a challenge.

*As is common for terms I suggest, the “HVSP” name is not getting any traction. What do you think of Marton Trencseni’s suggestion of OLRP, for OnLine Request Processing?

One thing that makes following this area interesting is that so many projects are open source, so there is a lot of information in the wild. I hardly have time to read the mailing list for each project; but the people I talk with do, and often they may sorta kinda remember something somebody else posted one or several months back. As just one example, the mailing lists are said to confirm:

Contrary to rumor, Facebook hasn’t moved in-box search off of Cassandra.
Apparently, however, it’s true that Cassandra inventor Facebook has stopped working on Cassandra, and Facebook’s core Cassandra developers have shifted over to HBase.

Also, figuring out usage of open source software can be … interesting.

People who use open source software don’t have to reveal themselves, as there’s no purchase transaction to kick things off.
On the other hand, if they’re serious enough in their use, they often do.
- There are two main ways to get tech support for open source software — the community or a company that sells support — and both ways let the main support-selling company know that one is a user.
- Some folks even add themselves to open lists of users, for example these rather long lists for HBase and CouchDB.
- Or they show up at conferences. For example, two tweets from Riptano founder Jonathan Ellis suggest at least 30 production Cassandra users were represented at a recent event. That’s more detail than his colleague Matt Pfeil wanted to give me when we talked.

OK. This post has gotten pretty long, even without me saying anything resembling an overview of any of the seven companies I listed up top, or of their products’ adoption. So I’ll just publish this now.

Clearing some of my buffer

Curt Monash — Wed, 22 Apr 2009 17:21:46 +0000

I have a large number of posts still in backlog. For starters, there are ones based on recent visits with Aster, Greenplum, Sybase, Vertica, and a Very Large User. I suspect I’ll write more soon on Oracle as well. Plus there’s my whole future-of-online-media area. And quite a bit more will grow out of planned research.

So there are a whole lot of other worthy subjects I doubt I’ll be getting to any time soon. In some cases, of course, other people are doing great jobs of writing about same. Here are pointers to a few links that I am glad to recommend:

I wrote recently that I’ve discovered a number of different in-memory OLAP engines. Cindi Howson far outdid that, writing at length for Intelligent Enterprise on in-memory analytics, in an article that seems to itself be a teaser for a longer, free white paper on the subject.
CouchDB posted an eye-catching, risque slide presentation promoting CouchDB and, more generally, key-value stores, at least for internet applications. And yes, they’ve integrated MapReduce.
Merv Adrian posted favorably about Birst, with special reference to its OEM efforts. As previously noted, I was highly unimpressed with Birst’s end-user BI story at the time of its September roll-out, and Jerome Pineau’s recent examination did nothing to reassure me. But perhaps OEM is a different matter.
Merv also offers an interesting post about data integration upstart Expressor, and a highly favorable one about “visualization” vendor Tableau.
Ann All interviewed Nigel Pendse, who grumped that BI features are overrated, and what end users really want is great query performance. I’m not so sure about the features side of that, but I’m hugely in agreement about the performance. That’s a big part of why the analytic DBMS industry is so vibrant. It’s also why in-memory OLAP is suddenly so hot.