August 13, 2011

Couchbase technical update

My Couchbase business update with Bob Wiederhold was very interesting, but it didn’t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven’t had their designs locked down yet anyway. But here’s at least a partial explanation of what’s up.

memcached is a way to cache data in RAM across a cluster of servers and have it all look logically like a single memory pool; it is extremely popular among large internet companies. The Membase product (which is what Couchbase has been selling this year) adds persistence to memcached, an obvious improvement on requiring application developers to write both to memcached and to non-transparently-sharded MySQL. The main technical points in adding persistence seem to have been:
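To make the "single memory pool" idea concrete, here is a minimal sketch, assuming the simplest client-side scheme: the client hashes each key and picks a server by taking the hash modulo the number of servers, so the servers themselves never coordinate. The server names and helper names (`SERVERS`, `server_for`, `ShardedCache`) are hypothetical, plain dicts stand in for the cache servers, and many real memcached clients use consistent hashing rather than plain modulo.

```python
import zlib
from typing import Dict, List, Optional

# Hypothetical cluster; a real client would read its server list from configuration.
SERVERS = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]


def server_for(key: str, servers: List[str]) -> str:
    """Choose the server that owns a key: hash it, then take it modulo the server count."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


class ShardedCache:
    """Toy cluster: one dict per server; the client alone decides where each key lives."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.stores: Dict[str, Dict[str, bytes]] = {s: {} for s in self.servers}

    def set(self, key: str, value: bytes) -> None:
        self.stores[server_for(key, self.servers)][key] = value

    def get(self, key: str) -> Optional[bytes]:
        # A miss returns None, as a cache would; nothing here is persisted.
        return self.stores[server_for(key, self.servers)].get(key)


cache = ShardedCache(SERVERS)
cache.set("user:42", b"Alice")
print(cache.get("user:42"))  # b'Alice'
```

Because nothing in this sketch is written to disk, an application that needs durability would also have to write each value to a separate database, which is the double bookkeeping Membase's persistence is meant to remove.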

Couchbase is essentially Membase improved by integrating CouchDB into it, with the main changes being:

Let’s drill down a bit into Membase/Couchbase clustering and consistency.

*Edit: They’re called vbuckets.
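The vbucket idea can be sketched briefly: a key hashes to one of a fixed number of virtual buckets, and a separate map says which server currently owns each vbucket. The sketch below assumes 1024 vbuckets, CRC32 modulo 1024 as the key hash, and a naive rebalancing rule; all names are hypothetical, and none of it is Membase's actual code. What it illustrates is that adding a server changes only the map entries for the vbuckets that move, while no key is ever rehashed.

```python
import zlib
from typing import List

NUM_VBUCKETS = 1024  # assumed fixed vbucket count


def vbucket_for(key: str) -> int:
    """Map a key to a vbucket id; this mapping never changes."""
    # Assumption: CRC32 of the key modulo the vbucket count.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def initial_map(servers: List[str]) -> List[str]:
    """vbucket map: index is the vbucket id, value is the owning server."""
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


def owner(key: str, vb_map: List[str]) -> str:
    """Route a request: key -> vbucket -> server, via the current map."""
    return vb_map[vbucket_for(key)]


def add_server(vb_map: List[str], new_server: str) -> List[str]:
    """Hand a new server an even share of vbuckets (naive rebalance).

    Vbuckets are taken only from servers holding more than the new fair
    share; every other map entry, and every key-to-vbucket assignment,
    is left untouched.
    """
    servers = set(vb_map)
    target = NUM_VBUCKETS // (len(servers) + 1)
    counts = {s: vb_map.count(s) for s in servers}
    new_map = list(vb_map)
    moved = 0
    for vb, current in enumerate(vb_map):
        if moved == target:
            break
        if counts[current] > target:
            new_map[vb] = new_server
            counts[current] -= 1
            moved += 1
    return new_map
```

Growing from three servers to four, this moves 256 of the 1024 vbuckets, and therefore roughly a quarter of the keys. With plain modulo hashing, a key stays put only if its hash leaves the same remainder mod 3 and mod 4, so for uniformly distributed hashes about three quarters of the keys would land on a different server.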

So if we consider Membase technology alone, Couchbase is CA in the CAP Theorem. CouchDB, however, is gloriously AP in the CAP Theorem, in that it was written to assume an occasionally connected topology.* Based on that, Couchbase will allow AP operation between data centers (i.e. “stay synchronized if you can, to within the limitations of physics and so on, but don’t beat yourself up on the rare occasions that you can’t.”) I don’t know that that capability will quite be in the first release of Couchbase, but it’s coming soon.

*CouchDB also has other features friendly to occasionally-connected use cases, such as a lot of flexibility as to which parts of the database are or aren’t synced when you do reconnect. These are at the heart of the Couchbase Mobile offering.

memcached and Membase have a very simple key-value interface. CouchDB adds secondary indexes and so on. I think in the first release of Couchbase this is pretty much like having two different APIs for the same product; more elegant integration is planned down the road, and more language support as well.
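The difference between the two interfaces can be sketched in a few lines. This is a toy illustration, not Couchbase's or CouchDB's API: the class, method, and map-function names are hypothetical. It assumes the CouchDB model, in which a secondary index (a "view") is defined by a map function that emits (key, value) pairs for each document; in CouchDB itself map functions are normally JavaScript calling `emit()`, and a Python generator plays that role here.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Doc = Dict[str, Any]
MapFn = Callable[[Doc], Iterator[Tuple[Any, Any]]]


class ToyDocStore:
    """Key-value access plus a CouchDB-flavored secondary index (a "view")."""

    def __init__(self) -> None:
        self.docs: Dict[str, Doc] = {}

    # Key-value side: the primary key is the only way in.
    def set(self, doc_id: str, doc: Doc) -> None:
        self.docs[doc_id] = doc

    def get(self, doc_id: str) -> Optional[Doc]:
        return self.docs.get(doc_id)

    # Index side: run a map function over every document and sort what it emits.
    # (A real view is persisted and updated incrementally; this rebuilds on each call.)
    def view(self, map_fn: MapFn) -> List[Tuple[Any, str, Any]]:
        rows = [
            (key, doc_id, value)
            for doc_id, doc in self.docs.items()
            for key, value in map_fn(doc)
        ]
        rows.sort(key=lambda row: (row[0], row[1]))
        return rows

    def query(self, map_fn: MapFn, key: Any) -> List[str]:
        """Return ids of documents whose map output includes the given index key."""
        return [doc_id for k, doc_id, _ in self.view(map_fn) if k == key]


def by_city(doc: Doc) -> Iterator[Tuple[Any, Any]]:
    """Example map function: index documents by their "city" field, if present."""
    if "city" in doc:
        yield doc["city"], doc.get("name")


store = ToyDocStore()
store.set("u1", {"name": "Ann", "city": "Oslo"})
store.set("u2", {"name": "Bo", "city": "Lima"})
store.set("u3", {"name": "Cy", "city": "Oslo"})
print(store.get("u2"))               # key-value lookup by primary key
print(store.query(by_city, "Oslo"))  # ['u1', 'u3']
```

The `get` path touches one document by key, while `query` has to go through the index, which is one intuition for why exploiting the indexing side costs performance.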

The highest-performing way to use Couchbase will probably always be to just pretend it is Membase, which is to say memcached+. Dustin told me of Membase users who demanded 10-40 millisecond response times, and that not even for single queries but rather for sequences of several queries in succession. He further told me of customers asking for 1-200 microsecond response, and insisting on no worse than 1 millisecond. Frankly, the first requirement could be met by lots of technologies I can think of, at least if you don’t rely on disk; the second is thoroughly impossible if you rely on disk, and pretty demanding no matter what kind of hardware and storage you have.
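The disk argument is essentially arithmetic about operations that must run one after another. Here is a minimal sketch with assumed per-operation latencies (round numbers chosen for illustration, not measurements): 100 microseconds for a RAM-served lookup including a network round trip, and 5 milliseconds for one random read from a spinning disk. It counts how many dependent operations fit end to end inside each budget.

```python
# Assumed per-operation latencies in microseconds: illustrative round numbers,
# not measurements of Membase, Couchbase, or any particular hardware.
ASSUMED_RAM_LOOKUP_US = 100   # in-memory get plus a LAN round trip
ASSUMED_DISK_READ_US = 5_000  # one random read from a spinning disk


def max_sequential_ops(budget_us: int, op_us: int) -> int:
    """How many dependent operations fit end to end inside the budget."""
    return budget_us // op_us


budgets = [("10 ms sequence", 10_000), ("40 ms sequence", 40_000), ("1 ms ceiling", 1_000)]
for label, budget_us in budgets:
    ram = max_sequential_ops(budget_us, ASSUMED_RAM_LOOKUP_US)
    disk = max_sequential_ops(budget_us, ASSUMED_DISK_READ_US)
    print(f"{label}: {ram} RAM-served ops or {disk} disk reads")
```

Under these assumptions a 10 ms budget for a sequence of queries leaves room for only two uncached disk reads in a row, and a 1 ms ceiling leaves room for none.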

Couchbase performance against disk is a work in progress. CouchDB started out 8X slower than SQLite as a backing store, apples to apples, but Couchbase is fixing that before they roll the product out. (After all, they wouldn’t want to slow the product down in the course of an upgrade.) Beyond that, when you do exploit the indexing capability of CouchDB, performance of course slows down. Work is underway to lower the performance hit; I imagine much improvement can indeed be made, given how few resources CouchDB has been able to devote to date to Bottleneck Whack-A-Mole.

Comments

6 Responses to “Couchbase technical update”

  1. Vlad Rodionov on August 14th, 2011 12:32 pm

    I am just trying to find a use case where Membase/Couchbase is a good fit and Cassandra/Voldemort are not (except probably low-latency applications, but as far as I understand, with persistence on and under heavy load Membase starts having serious latency issues as well).

  2. Toby DiPasquale on August 14th, 2011 3:42 pm

    The people asking for microsecond latency are either in HFT or the RTB ad space. Redis can make ~1ms round-trip latency quite fine on low-end hardware with 1K records (without pipelining, even). As well, the snapshot backup effects on performance can be mitigated by only backing up on the slaves.

    I don’t see how the integration of CouchDB will help this scenario for the Couchbase crew at all since these systems tend to be extremely online and used as scratch store/counters for data needed very quickly in an ad impression selection/matching decision engine. The backups for these systems need to be restored, but typically aren’t themselves mined very heavily; that’s typically accomplished with a separate system altogether.

    One thing I will say for Membase is that vbuckets are much better than consistent hashing. The transparent movement of data when nodes are added or removed is the big win with Membase at current. Having said that, it would not be very hard to replicate this feature for Redis if someone were so inclined.

  3. Curt Monash on August 14th, 2011 6:24 pm

    Toby,

    I don’t see why the integration of Couch into Membase would add performance either. It adds features, in indexing and in replication/sync. Not coincidentally, indexing (which is to say query flexibility) is where a pure key-value store is at its worst, and replication/sync is where CouchDB is at its best.

  4. Curt Monash on August 14th, 2011 6:27 pm

    Vlad,

    Cassandra vs. Voldemort vs. Couch starts out as a programming model kind of decision. Of course, Couch is a work in progress there, but then all these systems are a work in progress pretty much everywhere.

  5. REST, Ruby On Rails, CouchDB and Me – Part 6 Getting The Data Ready for CouchDB « Cloud2013 Or Bust on August 18th, 2011 2:28 pm

    [...] Couchbase technical update (dbms2.com) [...]

  6. kimberlad on November 18th, 2011 4:21 am

    I have to say I’m still a bit confused. What couchbase product do I need and does it currently exist if I want to do the following:

    Collect data in a standard data format (JSON), process that data (MapReduce), and then allow the aggregated data to be accessed via an API at internet scale? And I want to scale that configuration, i.e. master-master replication, etc.

    I get that CouchDB is the back end and memcache can serve, and I assumed that Couchbase would combine both, but looking at the Couchbase website it does not appear to be the case:

    http://www.couchbase.com/products-and-services/overview/which-product

    i.e. Couchbase Single Server does not seem to have memcache integration. So is the post above describing the next evolution of Couchbase?
