<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Cache</title>
	<atom:link href="http://www.dbms2.com/category/memory-centric-data-management/cache/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:21:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Couchbase technical update</title>
		<link>http://www.dbms2.com/2011/08/13/couchbase-technical-update/</link>
		<comments>http://www.dbms2.com/2011/08/13/couchbase-technical-update/#comments</comments>
		<pubDate>Sun, 14 Aug 2011 04:08:03 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5081</guid>
		<description><![CDATA[My Couchbase business update with Bob Wiederhold was very interesting, but it didn&#8217;t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven&#8217;t had their designs locked down yet anyway. But here&#8217;s at least a partial explanation [...]]]></description>
			<content:encoded><![CDATA[<p>My <a href="http://www.dbms2.com/2011/08/13/couchbase-business-update/">Couchbase business update</a> with Bob Wiederhold was very interesting, but it didn&#8217;t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven&#8217;t had their designs locked down yet anyway. But here&#8217;s at least a partial explanation of what&#8217;s up.</p>
<p>memcached, which is extremely popular among large internet companies, is a way to cache data in RAM across a cluster of servers and have it all look logically like a single memory pool. The Membase product (which is what Couchbase has been selling this year) adds persistence to memcached, an obvious improvement on requiring application developers to write both to memcached and to <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">non-transparently-sharded MySQL</a>. The main technical points in adding persistence seem to have been:</p>
<ul>
<li>A <strong>persistent backing store</strong> (duh), namely SQLite.</li>
<li>A <strong>change to the hashing algorithm,</strong> to avoid losing data when the cluster configuration is changed.</li>
</ul>
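<p>To see why the hashing change matters, consider what happens in a plain memcached deployment if keys are placed with naive modulo hashing. That placement rule is an assumption for illustration; many memcached clients use consistent hashing instead, which reduces but does not eliminate the problem. The minimal Python sketch below, with hypothetical helper names, measures how many keys change servers when a fifth server joins a four-server cluster:</p>

```python
import zlib


def server_for(key: str, num_servers: int) -> int:
    """Naive placement (assumed): hash the key, then take it modulo the server count."""
    return zlib.crc32(key.encode("utf-8")) % num_servers


def fraction_remapped(keys: list, before: int, after: int) -> float:
    """Share of keys whose server changes when the cluster is resized."""
    moved = sum(1 for k in keys if server_for(k, before) != server_for(k, after))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(100_000)]
print(f"{fraction_remapped(keys, 4, 5):.2f}")
```

<p>With uniformly distributed hashes, a key stays put only when h mod 4 equals h mod 5, i.e. when h mod 20 is between 0 and 3, so roughly four keys in five move. In a pure cache, every moved key is effectively a miss, which is the kind of data loss a fixed layer of indirection between keys and servers is meant to prevent.</p>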
<p>Couchbase is essentially Membase improved by integrating CouchDB into it, with the main changes being:</p>
<ul>
<li><strong>Changing the backing store to CouchDB</strong> (duh). This will be in the first Couchbase release.</li>
<li><strong>Adding cross data center replication on CouchDB&#8217;s consistency model.</strong> This will not, I believe, be in the first Couchbase release.</li>
<li><strong>Offering CouchDB&#8217;s programming and query interfaces as an option.</strong> So far as I can tell, this will be implemented straightforwardly in the first Couchbase release, with more elegant integration planned for later.</li>
</ul>
<p>Let&#8217;s drill down a bit into <strong>Membase/Couchbase clustering and consistency. </strong><span id="more-5081"></span></p>
<ul>
<li>When data is written to RAM in memcached, it immediately gets copied to another server. The same is of course true in Membase/Couchbase. The terminology on all this is confusing, but I think:
<ul>
<li>The portion of data that is stored as a primary copy on any given server is called a &#8220;shard&#8221;.</li>
<li>That would seem to make sense, as that data could correspond to what goes &#8212; <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">non-transparently</a> &#8212; into an instance of MySQL in a classical memcached/MySQL set-up.</li>
</ul>
</li>
<li>Updates are of course also banged to disk ASAP &#8212; but at times of heavy load, that can take a while. A few seconds to a couple of minutes is normal operation; if it takes an hour, you really should buy more hardware. (Or solid-state storage.)</li>
<li>Similarly, the replication of data to a second machine&#8217;s RAM may not happen at times of heavy load &#8212; and that&#8217;s another sign you don&#8217;t have enough machines.</li>
<li>Each Membase/Couchbase &#8220;shard&#8221; has lots of logical sub-shards.* (1024 for now, at least by default, although Dustin finds that number excessive and is looking to lower it.) So if you add a node, some of the sub-shards get sent over to the new node. Unlike the case for straight memcached, no data is lost from cache (nor, of course, from the persistent store). Operations are blocked by such a move only in narrow time windows, and even then only in edge cases.</li>
</ul>
<p><em>*Edit: They&#8217;re called <a href="http://dustin.github.com/2010/06/29/memcached-vbuckets.html">vbuckets</a>.</em></p>
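<p>The vbucket idea can be sketched as a two-level mapping: a key hashes to one of a fixed number of vbuckets, and a separate, changeable map assigns each vbucket to a server. The Python below is a minimal illustration under stated assumptions, not Couchbase&#8217;s code: the CRC32-modulo hash, the round-robin initial assignment, and the &#8220;every n-th vbucket&#8221; rebalancing rule are all hypothetical simplifications.</p>

```python
import zlib

NUM_VBUCKETS = 1024  # the default count mentioned above


def vbucket_for(key: str) -> int:
    """Key -> vbucket. Assumption: CRC32 of the key modulo the vbucket count."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def initial_map(servers: list) -> list:
    """vbucket -> server, assigned round-robin (hypothetical policy)."""
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


def add_server(vb_map: list, new_server: str) -> list:
    """Give every n-th vbucket to new_server, n being the new server count.
    All other vbuckets keep their current owner."""
    n = len(set(vb_map)) + 1
    new_map = list(vb_map)
    for vb in range(0, NUM_VBUCKETS, n):
        new_map[vb] = new_server  # this vbucket's data would be shipped to new_server
    return new_map


def server_for(key: str, vb_map: list) -> str:
    """Routing is a fixed hash plus a table lookup; only the table changes."""
    return vb_map[vbucket_for(key)]


servers = ["node-a", "node-b", "node-c", "node-d"]
before = initial_map(servers)
after = add_server(before, "node-e")

moved = sum(1 for old, new in zip(before, after) if old != new)
print(moved, "of", NUM_VBUCKETS, "vbuckets move")  # 205 of 1024 vbuckets move
```

<p>Because a key&#8217;s vbucket never changes, adding a node only rewrites entries in the vbucket map: here 205 of 1024 vbuckets are handed to the new node, and keys in the other 819 stay exactly where they were. The sketch leaves out the actual copying of each reassigned vbucket&#8217;s data; only the map changes.</p>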
<p>So if we consider Membase technology alone, Couchbase is CA in the CAP Theorem. CouchDB, however, is gloriously AP in the CAP Theorem, in that it was written to assume an occasionally connected topology.* Based on that, Couchbase will allow AP operation between data centers (i.e. &#8220;stay synchronized if you can, to within the limitations of physics and so on, but don&#8217;t beat yourself up on the rare occasions that you can&#8217;t.&#8221;) I&#8217;m not sure that capability will quite make the first release of Couchbase, but it&#8217;s coming soon.</p>
<p><em>*CouchDB also has other features friendly to occasionally-connected use cases, such as a lot of flexibility as to which parts of the database are or aren&#8217;t synced when you do reconnect. These are at the heart of the Couchbase Mobile offering.</em></p>
<p>memcached and Membase have a very simple key-value interface. CouchDB adds secondary indexes and so on. I think in the first release of Couchbase this is pretty much like having two different APIs for the same product; more elegant integration is planned down the road, and more language support as well.</p>
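<p>The difference between the two interfaces can be illustrated with a toy store. The class below is hypothetical (it is not the Couchbase or CouchDB API): <code>set</code>/<code>get</code> stand in for the memcached-style key-value interface, and a &#8220;view&#8221; is a secondary index defined, as in CouchDB, by a map function that emits key/value pairs for each document. For simplicity the sketch rebuilds the index on every query, whereas a real system maintains it incrementally.</p>

```python
from collections import defaultdict


class ToyDocStore:
    """Hypothetical store: a key-value API plus map-function secondary indexes."""

    def __init__(self):
        self._docs = {}    # primary key -> document (a dict)
        self._views = {}   # view name -> map function

    # memcached-style interface: access only by primary key
    def set(self, key, doc):
        self._docs[key] = doc

    def get(self, key):
        return self._docs.get(key)

    # CouchDB-style interface: a map function emits (index_key, value) pairs
    def define_view(self, name, map_fn):
        self._views[name] = map_fn

    def query(self, name, index_key):
        """Return sorted primary keys of documents that emitted index_key."""
        index = defaultdict(list)
        for doc_key, doc in self._docs.items():
            for emitted_key, _value in self._views[name](doc):
                index[emitted_key].append(doc_key)
        return sorted(index.get(index_key, []))


def by_city(doc):
    # Emit one index entry per document that has a "city" field.
    if "city" in doc:
        yield doc["city"], None


store = ToyDocStore()
store.set("u1", {"name": "Ann", "city": "Boston"})
store.set("u2", {"name": "Bo", "city": "Austin"})
store.set("u3", {"name": "Cy", "city": "Boston"})
store.define_view("by_city", by_city)

print(store.get("u2")["name"])            # Bo
print(store.query("by_city", "Boston"))   # ['u1', 'u3']
```

<p>The key-value path touches one document by its key; the view path has to consult (or maintain) an index over every document, which is one reason the indexed interface costs more performance.</p>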
<p>The highest-performing way to use Couchbase will probably always be to just pretend it is Membase, which is to say memcached+. Dustin told me of Membase users who demanded 10-40 millisecond response times, not for single queries but for sequences of several queries in succession. He further told me of customers asking for 1-200 microsecond response times, and insisting on no worse than 1 millisecond. Frankly, the first requirement could be met by lots of technologies I can think of, at least if you don&#8217;t rely on disk; the second is thoroughly impossible if you rely on disk, and pretty demanding no matter what kind of hardware and storage you have.</p>
<p>Couchbase performance against disk is a work in progress. CouchDB started out 8X slower than SQLite as a backing store, apples to apples, but Couchbase is fixing that before they roll the product out. (After all, they wouldn&#8217;t want to slow the product down in the course of an upgrade.) Beyond that, when you do exploit the indexing capability of CouchDB, performance of course slows down. Work is underway to lower the performance hit; I imagine much improvement can indeed be made, given how few resources CouchDB has been able to devote to date to <a href="http://www.dbms2.com/2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/08/13/couchbase-technical-update/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Soundbites: the Facebook/MySQL/NoSQL/VoltDB/Stonebraker flap, continued</title>
		<link>http://www.dbms2.com/2011/07/15/facebook-mysql-nosql-voltdb-stonebraker/</link>
		<comments>http://www.dbms2.com/2011/07/15/facebook-mysql-nosql-voltdb-stonebraker/#comments</comments>
		<pubDate>Fri, 15 Jul 2011 08:27:18 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Clustrix]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[MongoDB and 10gen]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[ScaleBase]]></category>
		<category><![CDATA[ScaleDB]]></category>
		<category><![CDATA[Schooner Information Technology]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Tokutek]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4977</guid>
		<description><![CDATA[As a follow-up to the latest Stonebraker kerfuffle, Derrick Harris asked me a bunch of smart followup questions. My responses and afterthoughts include: Facebook et al. are in effect Software as a Service (SaaS) vendors, not enterprise technology users. In particular: They have the technical chops to rewrite their code as  needed. Unlike packaged software [...]]]></description>
			<content:encoded><![CDATA[<p>As a follow-up to the latest <a href="http://www.dbms2.com/2011/07/14/an-odd-claim-attributed-to-mike-stonebraker/">Stonebraker kerfuffle</a>, Derrick Harris asked me a bunch of smart followup questions. My responses and afterthoughts include:</p>
<ul>
<li>Facebook et al. are in effect Software as a Service (SaaS) vendors, not enterprise technology users. In particular:
<ul>
<li>They have the technical chops to rewrite their code as needed.</li>
<li>Unlike packaged software vendors, they&#8217;re not answerable to anybody for keeping legacy code alive after a rewrite. That makes migration a lot easier.</li>
<li>If they want to write different parts of their system on different technical underpinnings, nobody can stop them. For example &#8230;</li>
<li>&#8230; <a href="http://www.dbms2.com/2008/07/21/project-cassandra-facebook-open-sourced-quasi-dbms/">Facebook created Cassandra</a>, and is now heavily committed to HBase.</li>
</ul>
</li>
<li>It makes little sense to talk of Facebook&#8217;s use of &#8220;MySQL.&#8221; Better to talk of Facebook&#8217;s use of &#8220;MySQL + memcached + non-transparent sharding.&#8221; That said:
<ul>
<li>It&#8217;s hard to see why somebody today would use MySQL + memcached + non-transparent sharding for a new project. At least one of <a href="http://www.dbms2.com/2011/02/08/couchbase-membase-couchone-couchdb/">Couchbase</a> or <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">transparently sharded</a> MySQL is very likely a superior alternative. Other alternatives might be better yet.</li>
<li>As noted above in the example of Facebook, the many major web businesses that are using MySQL + memcached + non-transparent sharding for existing projects can be presumed able to migrate away from that stack as the need arises.</li>
</ul>
</li>
</ul>
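<p>For readers who haven&#8217;t seen the stack, here is a minimal Python sketch of what &#8220;MySQL + memcached + non-transparent sharding&#8221; puts on the application developer. It is illustrative only: in-memory SQLite databases stand in for the MySQL shards, a plain dict stands in for memcached, and the modulo shard rule and function names are hypothetical. The point is that shard routing and keeping the cache consistent with the database are both the application&#8217;s job.</p>

```python
import sqlite3

NUM_SHARDS = 2

# Stand-ins: in-memory SQLite databases play the MySQL shards,
# and a dict plays the memcached cluster.
shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for db in shards:
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cache = {}


def shard_for(user_id: int) -> sqlite3.Connection:
    # Non-transparent sharding: the application, not the DBMS, picks the shard.
    return shards[user_id % NUM_SHARDS]


def save_user(user_id: int, name: str) -> None:
    db = shard_for(user_id)
    db.execute("INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", (user_id, name))
    db.commit()
    # The application must also keep the cache in step with the database.
    cache[f"user:{user_id}"] = name


def load_user(user_id: int):
    key = f"user:{user_id}"
    if key in cache:                      # cache hit
        return cache[key]
    row = shard_for(user_id).execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    cache[key] = row[0]                   # populate the cache on a miss
    return row[0]


save_user(7, "Ann")
save_user(8, "Bo")
cache.clear()          # simulate a cold cache
print(load_user(7))    # Ann (read from shard 1, then cached)
```

<p>Every new query path has to repeat this routing and cache bookkeeping, and a cross-shard query (say, all users with a given name) has to be fanned out by hand; that is the sort of burden transparent sharding is meant to remove.</p>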
<p>Continuing with that discussion of DBMS alternatives:</p>
<ul>
<li>If you just want to write to the memcached API anyway, why not go with Couchbase?</li>
<li>If you want to go relational, why not go with MySQL? There are many alternatives for scaling or accelerating MySQL &#8212; dbShards, Schooner, Akiban, Tokutek, ScaleBase, ScaleDB, Clustrix, and Xeround come to mind quickly, so there&#8217;s a great chance that one or more will fit your use case. (And if you don&#8217;t get the choice of MySQL flavor right the first time, porting to another one shouldn&#8217;t be all THAT awful.)</li>
<li>If you really, really want to go in-memory, don&#8217;t mind writing Java stored procedures, don&#8217;t need to do the kinds of joins VoltDB isn&#8217;t good at, but do need to do the kinds of joins it is good at, VoltDB could indeed be a good alternative.</li>
</ul>
<p>And while we&#8217;re at it &#8212; going <strong>schema-free</strong> often makes a whole lot of sense. I need to write much more about the point, but for now let&#8217;s just say that I look favorably on the Big Four schema-free/NoSQL options of MongoDB, Couchbase, HBase, and Cassandra.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/15/facebook-mysql-nosql-voltdb-stonebraker/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>An odd claim attributed to Mike Stonebraker</title>
		<link>http://www.dbms2.com/2011/07/14/an-odd-claim-attributed-to-mike-stonebraker/</link>
		<comments>http://www.dbms2.com/2011/07/14/an-odd-claim-attributed-to-mike-stonebraker/#comments</comments>
		<pubDate>Thu, 14 Jul 2011 11:10:34 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4964</guid>
		<description><![CDATA[This post has a sequel. Last week, Mike Stonebraker insulted MySQL and Facebook&#8217;s use of it, by implication advocating VoltDB instead. Kerfuffle ensued. To the extent Mike was saying that non-transparently sharded MySQL isn&#8217;t an ideal way to do things, he&#8217;s surely right. That still leaves a lot of options for massive short-request databases, however, [...]]]></description>
			<content:encoded><![CDATA[<p><em>This post has a <a href="http://www.dbms2.com/2011/07/15/facebook-mysql-nosql-voltdb-stonebraker/">sequel</a>.</em></p>
<p>Last week, Mike Stonebraker <a href="http://gigaom.com/cloud/facebook-trapped-in-mysql-fate-worse-than-death/">insulted MySQL and Facebook&#8217;s use of it</a>, by implication advocating <a href="http://www.dbms2.com/2010/06/30/details-and-analysis-of-the-voltdb-argument/">VoltDB</a> instead. Kerfuffle ensued. To the extent Mike was saying that non-transparently sharded MySQL isn&#8217;t an ideal way to do things, he&#8217;s surely right. That still leaves a lot of options for massive <a href="http://www.dbms2.com/2011/03/02/short-request-processing/">short-request</a> databases, however, including <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">transparently sharded</a> RDBMS, scale-out <a href="http://www.dbms2.com/2011/05/23/databases-ram/">in-memory DBMS</a> (whether or not VoltDB*), and various NoSQL options. If nothing else, <a href="http://www.dbms2.com/2011/02/08/couchbase-membase-couchone-couchdb/">Couchbase</a> would seem superior to memcached/non-transparent MySQL if you were starting a project today.</p>
<p><em>*The big problem with VoltDB, last I checked, was its reliance on Java stored procedures to get work done.</em></p>
<p>Pleasantries continued in <em><a href="http://www.theregister.co.uk/2011/07/13/mike_stonebraker_versus_facebook/">The Register</a>,</em> which got an amazing-sounding quote from Mike. If <em>The Reg</em> is to be believed &#8212; something <a href="http://www.monashreport.com/2006/03/22/goodmail-esther-dyson-andrew-orlowski-etc/">I wouldn&#8217;t necessarily take for granted</a> &#8212; Mike claimed that he (i.e. VoltDB) knows how to solve the <strong>distributed join</strong> performance problem.  <span id="more-4964"></span></p>
<blockquote><p>So, it&#8217;s Stonebraker against the web. And the difference of option is  severe. In May, at a MongoDB developer conference in San Francisco,  Mongo creator Dwight Merriman told his audience there was &#8220;no way&#8221; to do distributed joins in a way that really scales.  &#8220;I&#8217;m not smart enough to do distributed joins that scale horizontally,  widely, and are super fast. You have to choose something else. We have  no choice but to not be relational,&#8221; he said</p>
<p>&#8220;You can do distributed transactions, but if you do them with no loss  of generality and you do them across a thousand machines, it&#8217;s not  going to be that fast.&#8221;</p>
<p>Stonebraker says precisely the opposite, and in typical fashion, he  goes right for the jugular. &#8220;I reject what Merriman says out of hand,&#8221;  he tells <em>The Register</em>. Merriman and his company, 10gen, declined  to comment for this story. But Stonebaker says words don&#8217;t matter. As  much as he likes to wield his opinions, he insists the debate will be  decided elsewhere. &#8220;Let the bake-off begin,&#8221; he crows.</p></blockquote>
<p>But when last I checked, VoltDB made nowhere near that claim. And well it shouldn&#8217;t have. In the fully general case, there&#8217;s no way to ensure superb distributed join performance other than by throwing lots and lots of gear at the problem. But if you do that, many alternatives are fast. More specialized cases may be a different matter, but there are many fast alternatives for those too.</p>
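<p>The gap between the general and specialized cases shows up even in a toy model. If two hash-partitioned tables are both partitioned on the join key, every matching pair of rows already sits on the same node and the join is local; otherwise rows have to be shipped across the network (repartitioned) first, and a query joining on several different keys may need several such shuffles. The Python sketch below, with hypothetical tables and a simple modulo partitioning rule, just counts the rows that would have to move.</p>

```python
NUM_NODES = 4

# Hypothetical tables: orders(order_id, customer_id) and customers(customer_id).
# Customers are assumed to be hash-partitioned on customer_id.
orders = [(order_id, (order_id * 7) % 1000) for order_id in range(10_000)]


def node_of(value: int) -> int:
    """Simple hash partitioning on an integer column."""
    return value % NUM_NODES


def rows_to_ship(partition_col: int) -> int:
    """Order rows that must move to reach their customer's node,
    given that orders are partitioned on column partition_col."""
    shipped = 0
    for row in orders:
        home = node_of(row[partition_col])   # node currently holding the order
        target = node_of(row[1])             # node holding the matching customer
        if home != target:
            shipped += 1
    return shipped


print(rows_to_ship(1))  # orders partitioned on customer_id: 0 rows move
print(rows_to_ship(0))  # orders partitioned on order_id: 5000 rows move
```

<p>With this particular (contrived) data, half of the orders must move; with well-mixed keys, roughly (N-1)/N of the rows, three quarters here, would typically have to cross the network. Co-partitioning avoids that cost only for the joins it was designed for.</p>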
<p>I imagine there will be use cases for which VoltDB sustains a lead as the truly fastest alternative, similarly-architected competitors perhaps excepted.* But what Mike supposedly said seems quite forward-leaning when compared to technical reality.</p>
<p><em>*The canonical VoltDB use case is <a href="http://www.dbms2.com/2010/05/25/voltdb-finally-launches/">e-commerce in virtual goods</a>, the point of &#8220;virtual&#8221; being that physical inventory might necessitate costlier kinds of joins.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/14/an-odd-claim-attributed-to-mike-stonebraker/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Traditional databases will eventually wind up in RAM</title>
		<link>http://www.dbms2.com/2011/05/23/databases-ram/</link>
		<comments>http://www.dbms2.com/2011/05/23/databases-ram/#comments</comments>
		<pubDate>Mon, 23 May 2011 16:05:24 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Oracle TimesTen]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[solidDB]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4520</guid>
		<description><![CDATA[In January, 2010, I posited that it might be helpful to view data as being divided into three categories: Human/Tabular data –i.e., human-generated data that fits well into relational tables or arrays. Human/Nontabular data — i.e., all other data generated by humans. Machine-Generated data. I won&#8217;t now stand by every nuance in that post, which [...]]]></description>
			<content:encoded><![CDATA[<p>In January, 2010, I posited that <a href="http://www.dbms2.com/2010/01/17/three-broad-categories-of-data/">it might be helpful to view data as being divided into three categories</a>:</p>
<ul>
<li><strong>Human/Tabular</strong> data, i.e., human-generated data that fits well into relational tables or arrays.</li>
<li><strong>Human/Nontabular</strong> data, i.e., all other data generated by humans.</li>
<li><strong>Machine-Generated</strong> data.</li>
</ul>
<p>I won&#8217;t now stand by every nuance in that post, which may differ slightly from those in my more recent posts about <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a> and <a href="http://www.dbms2.com/2011/05/17/poly-structured-database/">poly-structured databases</a>. But one general idea is hard to dispute:</p>
<p><strong>Traditional database data</strong> (records of human transactional activity, referred to as &#8220;Human/Tabular data&#8221; above) <strong>will not grow as fast as Moore&#8217;s Law makes computer chips cheaper.</strong></p>
<p>And that point has a straightforward corollary, namely:</p>
<p><strong>It will become ever more affordable to put traditional database data entirely into RAM.</strong> <span id="more-4520"></span> </p>
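<p>The corollary can be made explicit with a back-of-the-envelope model. The symbols and rates here are illustrative assumptions, not measured figures: let a traditional database hold D<sub>0</sub> bytes today and grow by a fraction g per year, and let RAM cost p<sub>0</sub> per byte today and fall by a fraction h per year. The cost of keeping the whole database in RAM after t years is then</p>

```latex
C(t) = D_0 (1+g)^t \cdot p_0 (1-h)^t = C(0)\,\bigl[(1+g)(1-h)\bigr]^t
```

<p>so the cost falls year after year whenever (1+g)(1&#8722;h) is below 1, that is, whenever g is smaller than h/(1&#8722;h). With hypothetical rates of g = 10% and h = 30%, the yearly factor is 1.1 &#215; 0.7 = 0.77, and after ten years the cost is about 0.77<sup>10</sup> &#8776; 7% of today&#8217;s.</p>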
<p>Actually, there are numerous ways for OLTP, other <a href="http://www.dbms2.com/2011/03/30/short-request-and-analytic-processing/">short-request</a>, and some analytic databases to wind up in RAM.</p>
<ul>
<li><a href="http://www.dbms2.com/2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/">SAP has some good ideas</a> for how it could happen, banging transactions into what is essentially an in-memory analytic database. (I dispute SAP&#8217;s claims of transformational database technology leadership, but that doesn&#8217;t mean the underlying ideas aren&#8217;t good.)</li>
<li>For those who can afford the associated technology disruption, <a href="http://www.dbms2.com/2011/05/21/object-oriented-database-management-systems-oodbms/">memory-centric object-oriented DBMS</a> could be appealing.</li>
<li>Web scalability best practices commonly include keeping data in RAM (e.g., that&#8217;s pretty much the point of the memcached caching layer).</li>
<li>SaaS (Software as a Service) companies &#8212; such as <a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/">Workday</a> &#8212; often bring a particular tenant&#8217;s database entirely into RAM.</li>
<li><a href="http://www.dbms2.com/2010/06/12/the-underlying-technology-of-qlikview/">QlikView</a> highlights the benefits of doing business intelligence in RAM.</li>
<li><a href="http://www.dbms2.com/2011/04/21/sas-hpa-does-make-sense-after-all/">SAS HPA</a> makes the argument that even &#8220;big data analytics&#8221; should sometimes be done in RAM.</li>
<li>I don&#8217;t have particularly favorable opinions at this time about marketing strategies or momentum at <a href="http://www.dbms2.com/2008/12/29/ordinary-oltp-dbms-vs-memory-centric-processing/">Oracle TimesTen, IBM solidDB</a>, or <a href="http://www.dbms2.com/2010/06/30/details-and-analysis-of-the-voltdb-argument/">VoltDB</a>, but those examples at least serve to illustrate that memory-centric OLTP DBMS have existed for years.</li>
<li>Actually, SAP has at least two good ideas, if you count <a href="http://www.dbms2.com/2010/02/05/sybase-aleri-rap/">Sybase</a> as part of SAP.</li>
</ul>
<p>And here&#8217;s the kicker: Intel told me last year that <strong>CPUs are headed to 46-bit address spaces around mid-decade.</strong> Indeed, they hired me to help figure out if that was enough.* That multiplies out to <strong>64 terabytes of RAM on a single server,</strong> chip costs permitting. So most of what we now think of as operational databases &#8212; and many of the analytic ones too &#8212; will fit in-memory, even if they run very large businesses.</p>
<p><em>*And did so without putting the discussion under any kind of NDA.</em></p>
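<p>The 64-terabyte figure follows directly from the address width, counting a terabyte as 2<sup>40</sup> bytes (strictly, a tebibyte):</p>

```latex
2^{46}\ \text{bytes} = 2^{6} \times 2^{40}\ \text{bytes} = 64\ \text{TiB} = 70{,}368{,}744{,}177{,}664\ \text{bytes} \approx 70.4 \times 10^{12}\ \text{bytes}
```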
<p>Likely consequences of all this include:</p>
<ul>
<li><strong>Legacy apps will</strong> (eventually)<strong> be consolidated and virtualized in-memory.</strong> Their underlying databases will grow so slowly that eventually the cost of putting them in RAM will be too low to worry about.</li>
<li><strong>Expensive storage systems will </strong>(continue to)<strong> be irrelevant to database processing. </strong>Databases that don&#8217;t fit in RAM will typically be big enough to require the attention of a lot of CPUs &#8212; and in those cases the DBMS software itself will handle all the storage tasks.</li>
<li><strong>Major OLTP DBMS vendors, </strong>such as Oracle,<strong> will need alternate in-memory code lines, </strong>because disk-centric architectures are sub-optimal in-memory. Well, that&#8217;s what they have those big R&amp;D budgets for.</li>
<li><strong>SaaS vendors and web businesses may not rely on today&#8217;s major OLTP DBMS vendors.</strong> (I was going to say &#8220;won&#8217;t&#8221; rather than &#8220;may not&#8221; until I recalled the likely M&amp;A endgame.) Traditional enterprises may blanch at migrating away from their legacy DBMS environments, but the trade-offs are different for technology companies using DBMS as subsystems.</li>
</ul>
<p>Of course, the same trends that make data-storing chips cheaper will make data-generating chips cheaper too. So just as there are huge amounts of machine-generated data today that you&#8217;d never pay to store in RAM, there will still be such data 10 years from now; the volumes involved will just be a lot bigger. And thus there will still be plenty of very large analytic databases using relatively cheap forms of storage, perhaps even disk.</p>
<p>But <strong>OLTP and other short-request processing are likely to wind up in-memory.</strong> And the same may be true for a considerable amount of <strong>analytics,</strong> especially but not only if the analytics have a low-latency requirement.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/05/23/databases-ram/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>Object-oriented database management systems (OODBMS)</title>
		<link>http://www.dbms2.com/2011/05/21/object-oriented-database-management-systems-oodbms/</link>
		<comments>http://www.dbms2.com/2011/05/21/object-oriented-database-management-systems-oodbms/#comments</comments>
		<pubDate>Sat, 21 May 2011 10:45:49 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Intersystems and Cache']]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Objectivity and Infinite Graph]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Starcounter]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4512</guid>
		<description><![CDATA[There seems to be a fair amount of confusion about object-oriented database management systems (OODBMS). Let&#8217;s start with a working definition: An object-oriented database management system (OODBMS, but sometimes just called &#8220;object database&#8221;) is a DBMS that stores data in a logical model that is closely aligned with an application program&#8217;s object model. Of course, [...]]]></description>
			<content:encoded><![CDATA[<p>There seems to be a fair amount of confusion about object-oriented database management systems (OODBMS). Let&#8217;s start with a working definition:</p>
<p><strong>An object-oriented database management system</strong> (OODBMS, but sometimes just called &#8220;object database&#8221;) is a <strong>DBMS that stores data in a logical model that is closely aligned with an application program&#8217;s object model. </strong>Of course, an OODBMS will have a physical data model optimized for the kind of logical data model it expects.</p>
<p>If you&#8217;re guessing from that definition that there can be difficulties drawing boundaries between the application, the application programming language, the data manipulation language, and/or the DBMS &#8212; you&#8217;re right. Those difficulties have been a big factor in relegating OODBMS to being a relatively niche technology to date.</p>
<p>Examples of what I would call OODBMS include:  <span id="more-4512"></span></p>
<ul>
<li>Intersystems Cache&#8217;, <a href="http://www.dbms2.com/2010/01/15/intersystems-cache-highlights/">the most successful OODBMS product by far</a>, with good OLTP (OnLine Transaction Processing) capabilities and a strong presence in the health care market. Although it was designed around the specialized MUMPS/M language, Cache&#8217; happily talks Java and SQL.</li>
<li><a href="http://www.dbms2.com/2008/02/01/dan-weinreb-on-objectstore/">ObjectStore</a>, the product of a well-pedigreed startup a couple of decades ago, which wound up focusing on complex objects in markets such as computer-aided design. ObjectStore was eventually sold to Progress Software, which is positioning ObjectStore more as a <a href="http://web.progress.com/en/objectstore/">distributed caching system</a> than anything else (<a href="http://www.dbms2.com/2005/10/10/the-amazoncom-bookstore-is-a-huge-modern-oltp-app-so-is-it-relational/">Amazon</a> was an impressive reference for that use case). That said, Progress&#8217; ObjectStore business is small, as is its level of effort on ObjectStore. Both Cache&#8217; and ObjectStore were at some point unsuccessfully targeted at the XML database market.</li>
<li>Part of <a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/">Workday&#8217;s technology stack</a>. Very-well-pedigreed SaaS (Software as a Service) application vendor Workday decided to go with what amounts to an in-memory OODBMS. This makes all kinds of sense, and is a lot of what rekindled my interest in object-oriented database management.</li>
<li><a href="http://www.dbms2.com/2010/06/19/objectivity-infinite-graph/">Objectivity</a>, also from the 20-years-ago generation, and a poster child for the &#8220;DBMS toolkit as much as a DBMS&#8221; issue.</li>
<li><a href="http://www.dbms2.com/2008/06/08/perst/">McObject Perst</a>, an embeddable memory-centric OODBMS.</li>
<li><a href="http://www.dbms2.com/2008/06/08/perst/">Versant</a>. Actually, by now the Versant company has several different OODBMS; I&#8217;m not sure whether what it&#8217;s selling has much to do with the original Versant product. Anyhow, both the original and current Versant products seem to be positioned in OLTP. Versant has recently suffered from <a href="http://sec.gov/Archives/edgar/data/865917/000104746911000392/a2201670z10-k.htm">declining revenues</a>, in license fees and maintenance alike.</li>
<li><a href="http://www.dbms2.com/2011/05/18/starcounter-high-speed-memory-centric-object-oriented-dbms-coming-soon/">Forthcoming technology from Starcounter</a>, in the area of high-performance memory-centric OLTP. According to my correspondents, Starcounter still needs to explain how its technology is different from what Versant and ObjectStore introduced 20 or so years ago. Interestingly, while ObjectStore shines as a distributed system, Starcounter&#8217;s developers have consigned scale-out to the &#8220;too hard to bother with&#8221; category.</li>
<li>Gemstone, which seemed to be on an ObjectStore-like caching track until it was acquired by VMware.</li>
</ul>
<p>Arguably, OODBMS have all the benefits of <a href="../../../../../2011/02/07/notes-on-document-oriented-nosql/">document-model DBMS</a>, but with different language bindings. And if you&#8217;re going to write in an object-oriented language anyway, those language bindings can seem pretty appealing. In particular, they might be preferable to fighting your way through object/relational mapping.</p>
<p>Other than the double-edged language sword, the main criticism of object-oriented DBMS is that they include a whole lot of pointers. InterSystems and others have shown that, even in a disk-centric world, OODBMS can have excellent performance in OLTP and tolerable performance in simple reporting. As RAM gets cheaper, memory-centric operation becomes ever more viable, making the pointers even less problematic.</p>
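<p>The pointer issue can be illustrated with a minimal Python sketch. It is a toy under my own assumptions, not how any product named above works: objects refer to one another by object ID (OID), so getting from an order to its customer&#8217;s address is a chain of lookups rather than a join. Each lookup is a cheap dictionary access in RAM but could be a random disk read in a disk-centric system, which is why memory-centric operation makes the pointers less of a problem. <code>StoredObject</code> and <code>OidHeap</code> are hypothetical names.</p>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    oid: int
    data: dict
    refs: dict = field(default_factory=dict)  # attribute name -> OID of another object


class OidHeap:
    """Hypothetical object heap: objects point at each other by object ID (OID)."""

    def __init__(self):
        self.objects = {}
        self.lookups = 0  # each lookup could be a random read if the heap lived on disk

    def add(self, obj):
        self.objects[obj.oid] = obj

    def deref(self, oid):
        self.lookups += 1
        return self.objects[oid]

    def navigate(self, start_oid, path):
        """Follow a chain of references, e.g. order -> customer -> address."""
        obj = self.deref(start_oid)
        for name in path:
            obj = self.deref(obj.refs[name])
        return obj


heap = OidHeap()
heap.add(StoredObject(3, {"city": "Oakland"}))
heap.add(StoredObject(2, {"name": "Acme"}, {"address": 3}))
heap.add(StoredObject(1, {"total": 99}, {"customer": 2}))
print(heap.navigate(1, ["customer", "address"]).data["city"], heap.lookups)  # Oakland 3
```
</pre>
<p>A two-hop path costs three lookups; in RAM that is trivial, while on disk each one could mean a seek unless the relevant pages are already cached.</p>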
<p><strong>Bottom line: If I were starting a SaaS project today, I&#8217;d give serious consideration to memory-centric OODBMS technology. </strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/05/21/object-oriented-database-management-systems-oodbms/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>DB2 OLTP scale-out: pureScale</title>
		<link>http://www.dbms2.com/2011/05/06/db2-oltp-scale-out-purescale/</link>
		<comments>http://www.dbms2.com/2011/05/06/db2-oltp-scale-out-purescale/#comments</comments>
		<pubDate>Fri, 06 May 2011 15:20:51 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4406</guid>
		<description><![CDATA[Tim Vincent of IBM talked me through DB2 pureScale Monday. IBM DB2 pureScale is a kind of shared-disk scale-out parallel OLTP DBMS, with some interesting twists. IBM&#8217;s scalability claims for pureScale, on a 90% read/10% write workload, include: 95% scalability up to 64 machines 90% scalability up to 88 machines 89% scalability up to 112 [...]]]></description>
			<content:encoded><![CDATA[<p>Tim Vincent of IBM talked me through <strong>DB2 pureScale</strong> Monday. IBM DB2 pureScale is a kind of <strong>shared-disk scale-out parallel OLTP DBMS,</strong> with some interesting twists. IBM&#8217;s scalability claims for pureScale, on a 90% read/10% write workload, include:</p>
<ul>
<li>95% scalability up to 64 machines</li>
<li>90% scalability up to 88 machines</li>
<li>89% scalability up to 112 machines</li>
<li>84% scalability up to 128 machines</li>
</ul>
<p>More precisely, those are counts of cluster &#8220;members,&#8221; but the recommended configuration is one member per operating system instance &#8212; i.e. one member per machine &#8212; for reasons of availability. In an 80% read/20% write workload, scalability is less &#8212; perhaps 90% scalability over 16 members.</p>
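<p>One way to read these figures, assuming &#8220;X% scalability over N members&#8221; means N members deliver X% of N times a single member&#8217;s throughput (my interpretation, not an IBM definition), is as single-member equivalents. A quick Python calculation:</p>
<pre>
```python
# Assumption: "X% scalability over N members" means N members deliver
# X% of N times the throughput of a single member.
CLAIMS = [  # (members, scaling efficiency), 90% read / 10% write workload
    (64, 0.95),
    (88, 0.90),
    (112, 0.89),
    (128, 0.84),
]


def effective_members(members, efficiency):
    """Cluster throughput expressed as a multiple of one member's throughput."""
    return members * efficiency


for members, efficiency in CLAIMS:
    print(f"{members:3d} members at {efficiency:.0%}: ~{effective_members(members, efficiency):.1f}x one member")

# The 80% read / 20% write case: roughly 90% over 16 members.
print(f" 16 members at 90%: ~{effective_members(16, 0.90):.1f}x one member")
```
</pre>
<p>Under that reading, the last 16 members (going from 112 to 128) add only about 7.8 members&#8217; worth of throughput.</p>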
<p>Several elements of IBM&#8217;s DB2 pureScale architecture are pretty straightforward:</p>
<ul>
<li>There are multiple pureScale members (machines), each with its own instance of DB2.</li>
<li>There&#8217;s an RDMA (Remote Direct Memory Access) interconnect, perhaps InfiniBand. (The point of InfiniBand and other RDMA interconnects is that moving data doesn&#8217;t require interrupts, and hence doesn&#8217;t cost many CPU cycles.)</li>
<li>The DB2 pureScale members share access to the database on a disk array.</li>
<li>Each DB2 pureScale member has its own log, also on the disk array.</li>
</ul>
<p>Something called GPFS (General Parallel File System), which comes bundled with DB2, sits underneath all this. It&#8217;s all based on the mainframe technology IBM Parallel Sysplex.</p>
<p>The weirdest part (to me) of DB2 pureScale is something called the <strong>Global Cluster Facility,</strong> which runs on its own set of boxes.  <em>(Edit: Actually, see Tim Vincent&#8217;s comment below.)</em><span id="more-4406"></span>These might have 20% or so of the cores of the member boxes, with perhaps a somewhat higher percentage of RAM (especially in the case of write-heavy workloads). Specifically:</p>
<ul>
<li>The DB2 pureScale Global Cluster Facility maintains a buffer pool (cache) shared by all the DB2 pureScale members.</li>
<li>Even so, the DB2 pureScale members themselves are in charge of disk access.</li>
</ul>
<p>So what&#8217;s going on here is not an <a href="../../../../../2008/10/17/oracle-notes/">Exadata-like split between database server and storage processing tiers</a>. The Global Cluster Facility also handles lock management, presumably because locking issues only arise when a page gets fetched into the buffer.</p>
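<p>The division of labor described above can be sketched in a few lines of Python. This is my simplification, not IBM&#8217;s design or code, and the class and method names are hypothetical: the facility holds the shared buffer pool but never reads the disk, while a member that misses in the shared pool does its own disk I/O and then publishes the page for the other members.</p>
<pre>
```python
class GlobalClusterFacility:
    """Hypothetical model of the shared buffer pool; it never reads the disk itself."""

    def __init__(self):
        self.buffer_pool = {}  # page_id -> page contents, shared by every member
        # Lock management also lives here in pureScale; it is omitted from this sketch.

    def lookup(self, page_id):
        return self.buffer_pool.get(page_id)

    def publish(self, page_id, contents):
        self.buffer_pool[page_id] = contents


class Member:
    def __init__(self, name, facility, disk):
        self.name = name
        self.facility = facility
        self.disk = disk  # the shared disk array, modeled as page_id -> contents
        self.disk_reads = 0

    def read_page(self, page_id):
        contents = self.facility.lookup(page_id)
        if contents is None:
            # Miss in the shared pool: the member, not the facility, does the I/O ...
            contents = self.disk[page_id]
            self.disk_reads += 1
            # ... and then publishes the page so other members can skip the disk.
            self.facility.publish(page_id, contents)
        return contents


disk = {7: "page 7 bytes"}
gcf = GlobalClusterFacility()
m1, m2 = Member("m1", gcf, disk), Member("m2", gcf, disk)
m1.read_page(7)
m2.read_page(7)
print(m1.disk_reads, m2.disk_reads)  # 1 0: m2 is served from the shared pool
```
</pre>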
<p>The other surprise is that every client talks to every member, usually through a connection pool from an app server. Tim Vincent assures me that DB2 connections are so lightweight this isn&#8217;t a problem. Clients have load-balancing code on behalf of the members, and route transactions to whichever pureScale member is least busy.</p>
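<p>A minimal Python sketch of the client-side routing, assuming &#8220;least busy&#8221; is measured by in-flight transactions per member (I don&#8217;t know which load metric DB2 clients actually use). <code>ClientRouter</code> is a hypothetical name, not a DB2 API.</p>
<pre>
```python
class ClientRouter:
    """Hypothetical client-side balancer holding a pooled connection to every member."""

    def __init__(self, member_names):
        # In-flight transaction count per member, used as the "how busy" signal.
        self.in_flight = {name: 0 for name in member_names}

    def begin_transaction(self):
        # Route to whichever member is least busy; ties go to the first member listed.
        target = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[target] += 1
        return target

    def end_transaction(self, member):
        self.in_flight[member] -= 1


router = ClientRouter(["m1", "m2", "m3"])
print([router.begin_transaction() for _ in range(3)])  # ['m1', 'm2', 'm3']
router.end_transaction("m2")
print(router.begin_transaction())  # m2, now the least busy
```
</pre>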
<p>DB2 pureScale is designed to be pretty robust against outages:</p>
<ul>
<li>In the case of planned maintenance, a pureScale member can be &#8220;quiesced.&#8221; I.e., it stops being given new work; it finishes up its existing work; then maintenance happens; then the member starts being given work again.</li>
<li>In the case of an unplanned outage, the redo log naturally comes into play. The pureScale twist on this is that a second small instance of DB2 is around &#8212; or is started up? &#8212; just to handle the redos.</li>
</ul>
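<p>The quiesce sequence in the first bullet amounts to a small state machine. Here is a hypothetical Python sketch of it; the states and method names are mine, not DB2&#8217;s.</p>
<pre>
```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    QUIESCING = "quiescing"
    MAINTENANCE = "maintenance"


class QuiescableMember:
    """Hypothetical member life cycle for planned maintenance."""

    def __init__(self, name):
        self.name = name
        self.state = State.ACTIVE
        self.in_flight = 0  # transactions this member is still working on

    def accepts_new_work(self):
        return self.state is State.ACTIVE

    def start_work(self):
        if not self.accepts_new_work():
            raise RuntimeError(f"{self.name} is {self.state.value}; route elsewhere")
        self.in_flight += 1

    def finish_work(self):
        self.in_flight -= 1
        self._enter_maintenance_if_drained()

    def quiesce(self):
        # Step 1: stop being given new work; existing work carries on.
        self.state = State.QUIESCING
        self._enter_maintenance_if_drained()

    def _enter_maintenance_if_drained(self):
        # Step 2: once existing work has finished, maintenance can happen.
        if self.state is State.QUIESCING and self.in_flight == 0:
            self.state = State.MAINTENANCE

    def resume(self):
        # Step 3: after maintenance, the member starts being given work again.
        if self.state is not State.MAINTENANCE:
            raise RuntimeError("resume() is only valid after maintenance")
        self.state = State.ACTIVE


member = QuiescableMember("m1")
member.start_work()
member.quiesce()
print(member.state.value, member.accepts_new_work())  # quiescing False
member.finish_work()
print(member.state.value)  # maintenance
member.resume()
```
</pre>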
<p>Also, IBM believes that the DB2 pureScale locking strategy gives availability and performance advantages vs. the Oracle RAC (Real Application Clusters) approach. The distinction IBM draws is that any member can take over the lock on a buffer page from any other member, just by attempting to change the page &#8212; and the attempt will succeed; only row-level locks can ever block work.  Thus, if a node fails, I/O can merrily proceed on other nodes, without waiting for any recovery effort. IBM&#8217;s target is &lt;20 seconds for full row availability to be restored.</p>
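<p>The locking rule IBM described can be sketched as follows. This is a hypothetical illustration, not DB2 code: a request to change a page always succeeds and moves the page to the requesting member, while a row lock held by another member makes the request fail. The last steps show why a failed member only blocks its own locked rows until recovery releases them.</p>
<pre>
```python
class PageAndRowLocks:
    """Hypothetical sketch of the locking rule IBM described, not DB2 code."""

    def __init__(self):
        self.page_holder = {}  # page_id -> member that most recently took the page
        self.row_holder = {}   # (page_id, row_id) -> member holding the row lock

    def update_row(self, member, page_id, row_id):
        holder = self.row_holder.get((page_id, row_id))
        if holder not in (None, member):
            return False  # only a row-level lock can block work
        # The page never blocks: attempting to change it hands it to this member.
        self.page_holder[page_id] = member
        self.row_holder[(page_id, row_id)] = member
        return True

    def release_rows(self, member):
        """Called at commit, or by recovery after the member has failed."""
        self.row_holder = {row: m for row, m in self.row_holder.items() if m != member}


locks = PageAndRowLocks()
locks.update_row("m1", page_id=7, row_id=1)
print(locks.update_row("m2", page_id=7, row_id=2))  # True: page 7 moves to m2
print(locks.update_row("m2", page_id=7, row_id=1))  # False: m1 still holds row 1
# If m1 fails, other rows stay available; only m1's rows wait for recovery.
print(locks.update_row("m3", page_id=7, row_id=3))  # True
locks.release_rows("m1")
print(locks.update_row("m2", page_id=7, row_id=1))  # True after recovery
```
</pre>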
<p>Obviously, it&#8217;s crucial that the Global Cluster Facility machines be fully mirrored, with no double failure &#8212; but so what? Modern computing systems have double-points-of-failure all over the place.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/05/06/db2-oltp-scale-out-purescale/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Oracle and Exadata: Business and technical notes</title>
		<link>http://www.dbms2.com/2011/05/03/oracle-exadata-business-technology/</link>
		<comments>http://www.dbms2.com/2011/05/03/oracle-exadata-business-technology/#comments</comments>
		<pubDate>Tue, 03 May 2011 08:19:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Emulation, transparency, portability]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Solid-state memory]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4361</guid>
		<description><![CDATA[Last Friday I stopped by Oracle for my first conversation since January, 2010, in this case for a chat with Andy Mendelsohn, Mark Townsend, Tim Shetler, and George Lumpkin, covering Exadata and the Oracle DBMS. Key points included:  Given Oracle’s market penetration and share, it makes sense that Oracle is focused on selling add-on products [...]]]></description>
			<content:encoded><![CDATA[<p>Last Friday I stopped by Oracle for my first conversation since January, 2010, in this case for a chat with Andy Mendelsohn, Mark Townsend, Tim Shetler, and George Lumpkin, covering Exadata and the Oracle DBMS. Key points included:  <span id="more-4361"></span></p>
<ul>
<li>Given Oracle’s market penetration and share, it makes sense that <strong>Oracle is focused on selling add-on products to its installed base.</strong> Oracle’s top three such go-to-market emphases at the moment are:
<ul>
<li><strong>Database       consolidation,</strong> <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/">especially on Exadata</a>.</li>
<li><strong>Data warehousing,</strong> presumably on       Exadata.</li>
<li><strong>Database security,       especially encryption.</strong> This is not Exadata-specific, but does       exploit Intel Westmere on-chip encryption, which Oracle says allows       encryption with minimal overhead. This seems to be via something called <strong>Oracle Advanced Security.</strong></li>
</ul>
</li>
<li>Deleted*</li>
</ul>
<p><em>*Oracle asked me to delete a point on pricing they went out of their way to make, because they are in a quiet period &#8212; even though nobody said it was confidential at the time, we weren&#8217;t under NDA, and it looks like public information to me anyway. Frankly, I&#8217;m not sure I was right to comply.<br />
</em></p>
<p>Oracle also told me quite a bit about Exadata onsite POCs (Proofs of Concept) and Exadata references, but I’ll save those subjects for future posts. The same goes for workload management.</p>
<p>Oracle&#8217;s version names and numbers can get confusing, but it turns out that:</p>
<ul>
<li>Oracle <span style="text-decoration: line-through;">11.203</span> 11.2.0.3 will come      out this fall. Oracle <span style="text-decoration: line-through;">11.204</span> 11.2.0.4 will come out a little more than a year      later. After that I imagine it will be time for Oracle 12.</li>
<li>The current versions of      Oracle Exadata are Exadata X2-2 and Exadata X2-8.
<ul>
<li>Oracle Exadata X2-2 is evolutionary from prior Exadata versions, and has 8 moderately big servers per rack. It can be sliced into half- or quarter-racks.</li>
<li>Oracle Exadata X2-8, in lieu of those 8 servers, has 2 bigger SMP (Symmetric MultiProcessing) systems, each with a terabyte of RAM. You can’t slice Exadata X2-8 below full-rack size, as you’d lose redundancy among the servers.</li>
</ul>
</li>
</ul>
<p>I didn’t really understand the discussion as to why certain workloads and/or workload consolidations go better on the SMP boxes of Exadata X2-8 than the blades of Exadata X2-2, but Oracle assures me that some do. I also suspect that some Oracle customers prefer large SMP boxes for no good reason other than familiarity.</p>
<p>As for recent-release adoption:</p>
<ul>
<li>Oracle estimates that<strong> 40-50% of customers have Oracle 11g running </strong>somewhere in their shops,      mainly Oracle 11g Release 2.</li>
<li>All major ISVs      (Independent Software Vendors) are certified on Oracle 11g, typically      Oracle 11g Release 2.</li>
<li>But Exadata      certification is something different from Oracle 11g certification; for      example, <strong>SAP certification on Exadata is still underway, </strong>targeted      for some time this year.</li>
</ul>
<p>Exadata obviously enjoys huge performance gains over existing Oracle installations for certain analytic queries, and therefore for some whole analytic workloads. Oracle has happily trumpeted these. But it turns out that Exadata’s OLTP (OnLine Transaction Processing) performance gains are less dramatic. This makes all kinds of sense, given that Oracle’s analytic query performance was in pretty bad shape pre-Exadata, while OLTP has been just fine. The range Oracle used was <strong>2-3X OLTP performance gains vs. existing Oracle installations on several-year-old hardware.</strong> Oracle says somewhere <strong>over 50% of Exadata physical I/O* goes against flash cache </strong>in use cases such as running Oracle’s application suite.</p>
<p><em>*Note that physical I/O may be only a small fraction of logical; e.g., SAP long ago said that <a href="../../../../../2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/">&gt;99% of SAP transactions never hit disk</a>.</em></p>
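<p>A little arithmetic shows how the two effects compound. In this Python sketch the 95% buffer-cache hit ratio is an invented number for illustration, not an Oracle or SAP figure; the 50% flash share is Oracle&#8217;s &#8220;over 50%&#8221; taken at its floor.</p>
<pre>
```python
def share_reaching_disk(buffer_hit_ratio, flash_share_of_physical):
    """Fraction of logical reads that end up as reads from spinning disk."""
    physical = 1.0 - buffer_hit_ratio  # logical reads the in-memory buffer cache misses
    return physical * (1.0 - flash_share_of_physical)


# 0.95 is an invented buffer hit ratio; 0.5 is Oracle's "over 50%" flash figure at its floor.
print(f"{share_reaching_disk(0.95, 0.5):.1%} of logical reads reach disk")
```
</pre>
<p>With those inputs, about 2.5% of logical reads would reach spinning disk.</p>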
<p>Finally, we talked about a variety of options or other related products. Highlights included:</p>
<ul>
<li>One piece of the Oracle      security story is a new product called<strong> Oracle Database Firewall,</strong> released in January, based on an acquisition of a small startup last year.      Targeted primarily at internal hackers, Oracle Database Firewall sniffs      your SQL traffic for a week or so, observes what kinds of SQL statements      can be expected, builds a white list accordingly, and casts a jaundiced      eye on any other kind of SQL statements that come through.</li>
<li><em>Edit: I have no idea why I was told the following, in view of <a href="http://www.dbms2.com/2011/05/03/oracle-on-active-active-replication/">a subsequent email</a>.</em> <span style="text-decoration: line-through;"><strong>Oracle Active Data Guard, </strong>first introduced in the      Oracle 11g code line, is the preferred way to do active-active Oracle      replication. That said: </span>
<ul>
<li><span style="text-decoration: line-through;">Not a lot of customers       use Oracle Active Data Guard yet &#8230;</span></li>
<li><span style="text-decoration: line-through;">&#8230; but a considerable       fraction of Exadata users are at least interested in it.</span></li>
<li><span style="text-decoration: line-through;">Some number of Oracle       customers have other kinds of active-active implementation. One option is       via GoldenGate.</span></li>
</ul>
</li>
<li><strong>Oracle Cloud File Management System</strong> is an Oracle 11g feature/option that lets you manage non-Oracle data. It is related to ASM (Automatic Storage Management), which seems to have been the most popular Oracle 10g feature, and which is essential to Exadata. Oracle Cloud File Management System seems to be popular for consolidation uses. But it is not technically well suited to, for example, playing the role of HDFS in a MapReduce implementation.</li>
<li>For DBAs who care, Exadata now supports Solaris on the database server tier as well as Linux. (That would be Solaris on Intel, of course; Exadata doesn&#8217;t use SPARC.) The storage tier still runs only on a kind of embedded Linux.</li>
<li><strong>Oracle 11g Express Edition</strong> (free crippleware)      just went into beta test.</li>
<li>And finally, <strong>Oracle SQL Developer 3.0</strong> features,      among other things, a GUI for Oracle Data Mining, and migration tools.      Sybase migration is in there now, and was enhanced for SQL Developer 3.0.      Teradata migration is slated for the next release.</li>
</ul>
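<p>The learn-then-enforce idea behind Oracle Database Firewall can be sketched in Python. This is a hypothetical illustration of the general technique, not Oracle&#8217;s implementation: statements are reduced to a &#8220;shape&#8221; by replacing literals with placeholders, shapes seen during a training period become the white list, and anything else is flagged afterwards.</p>
<pre>
```python
import re

# Literal values are replaced so that statements differing only in constants match.
_STRING = re.compile(r"'(?:[^']|'')*'")
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
_SPACE = re.compile(r"\s+")


def statement_shape(sql):
    shape = _STRING.sub("?", sql)
    shape = _NUMBER.sub("?", shape)
    return _SPACE.sub(" ", shape).strip().lower()


class SqlWhitelist:
    """Hypothetical learn-then-enforce whitelist; not Oracle Database Firewall's method."""

    def __init__(self):
        self.allowed = set()
        self.learning = True

    def check(self, sql):
        shape = statement_shape(sql)
        if self.learning:
            self.allowed.add(shape)  # training period: record what normal traffic looks like
            return True
        return shape in self.allowed  # afterwards: anything unseen is suspicious


wl = SqlWhitelist()
wl.check("SELECT name FROM customers WHERE id = 42")
wl.learning = False
print(wl.check("select name from customers where id = 7"))        # True
print(wl.check("SELECT name FROM customers WHERE id = 7 OR 1=1"))  # False
```
</pre>
<p>A real product would need a proper SQL parser; these regular expressions only handle simple numeric and single-quoted string literals.</p>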
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/05/03/oracle-exadata-business-technology/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Membase and CouchOne merged to form Couchbase</title>
		<link>http://www.dbms2.com/2011/02/08/couchbase-membase-couchone-couchdb/</link>
		<comments>http://www.dbms2.com/2011/02/08/couchbase-membase-couchone-couchdb/#comments</comments>
		<pubDate>Tue, 08 Feb 2011 05:59:35 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3809</guid>
		<description><![CDATA[Membase, the company whose product is Membase and whose former company name is Northscale, has merged with CouchOne, the company whose product is CouchDB and whose former name is Couch.io. The result (product and company) will be called Couchbase. CouchDB inventor Damien Katz will join the Membase (now Couchbase) management team as CTO. Couchbase can [...]]]></description>
			<content:encoded><![CDATA[<p>Membase, the company whose product is Membase and whose former company name is Northscale, has merged with CouchOne, the company whose product is CouchDB and whose former name is Couch.io. The result (product and company) will be called Couchbase. CouchDB inventor Damien Katz will join the Membase (now Couchbase) management team as CTO. Couchbase can reasonably be regarded as a <a href="../../../../../2011/02/07/notes-on-document-oriented-nosql/">document-oriented NoSQL DBMS</a>, a product category I not coincidentally posted about yesterday.</p>
<p>In essence, <strong>Couchbase will be CouchDB with scale-out.</strong> Alternatively, <strong>Couchbase will be Membase with a richer programming interface</strong>. The Couchbase sweet spot is likely to be:  <span id="more-3809"></span></p>
<ul>
<li>Internet applications, especially ones that involve connectivity between a host and mobile devices.</li>
<li>Delivery of data, content, and/or software across a network. (That&#8217;s a high-profile CouchDB use case today.)</li>
<li>(Possibly) transactions for virtual goods that have no scarcity. (Once there&#8217;s actual inventory involved, the traditional relational database model starts looking pretty appealing.)</li>
</ul>
<p>And now let&#8217;s go to the lists of bullet points.</p>
<p>Background to the Membase/CouchDB/Couchbase integration story:</p>
<ul>
<li>Membase is a key-value store with the memcached interface. Its strengths are memcached compatibility and performant scale-out. What it stores are in essence JSON documents.</li>
<li>CouchDB is designed for ease of programming, and for built-in handling of occasionally-connected replication. (Not coincidentally, Damien Katz used to work on Lotus Notes.) CouchDB indexes individual data fields for reasonable query capability, although <a href="../../../../../2010/11/29/document-database-without-joins/">joins</a> are problematic. What CouchDB stores are in essence JSON documents.</li>
</ul>
<p>Highlights of how Membase works and is deployed today:</p>
<ul>
<li>Your API is Get/Set, just like in memcached.</li>
<li>To a first approximation, Membase just persists the memcached cache at every node. That said, it can certainly store more data per node than fits in cache.</li>
<li>Most Membase installations are in Amazon EC2, where flash memory is not available. Most in-house Membase installations, however, use flash.</li>
</ul>
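<p>A toy Python model of the Get/Set-plus-persistence idea, under my own assumptions rather than Membase&#8217;s actual design: a bounded in-memory LRU tier backed by an SQLite table used as a plain key-value store (the merger notes later in this post say Membase&#8217;s back end is SQLite used in a &#8220;dumb&#8221; way). Values are stored as JSON. <code>PersistedCache</code> is a hypothetical name.</p>
<pre>
```python
import json
import sqlite3
from collections import OrderedDict


class PersistedCache:
    """Toy memcached-style Get/Set with a RAM tier and an SQLite tier (hypothetical)."""

    def __init__(self, capacity, path=":memory:"):
        self.capacity = capacity
        self.ram = OrderedDict()  # LRU order: least recently used first
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def set(self, key, value):
        doc = json.dumps(value)
        # Every write is persisted; SQLite is used only as a key-value table.
        self.db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, doc))
        self.db.commit()
        self._cache(key, doc)

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return json.loads(self.ram[key])
        row = self.db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            return None
        self._cache(key, row[0])  # bring the value back into RAM
        return json.loads(row[0])

    def _cache(self, key, doc):
        self.ram[key] = doc
        self.ram.move_to_end(key)
        while len(self.ram) > self.capacity:
            self.ram.popitem(last=False)  # evict from RAM only; SQLite keeps the data


cache = PersistedCache(capacity=2)
cache.set("user:1", {"name": "Ann"})
cache.set("user:2", {"name": "Bo"})
cache.set("user:3", {"name": "Cy"})  # evicts user:1 from RAM, not from SQLite
print(len(cache.ram), cache.get("user:1"))  # 2 {'name': 'Ann'}
```
</pre>
<p>Because every write is persisted, the data set can exceed the RAM tier; evicted keys are simply reloaded from SQLite on the next <code>get</code>.</p>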
<p>Business background on Couchbase predecessors:</p>
<ul>
<li>Membase raised $15 million, had 20 employees, and has a number of paying customers.</li>
<li>CouchOne raised $2 million, had 16 employees, hadn&#8217;t focused much on traditional customer acquisition yet or on building an enterprise edition of the product, and had about 4 customers anyway &#8230;</li>
<li>&#8230; except that CouchOne&#8217;s plans included CouchDB hosting, and there are around 4500 users of same in a free beta that&#8217;s on the verge of going non-free. Damien positions his hosting as being focused on high throughput and concurrency, while rival CouchDB host Cloudant is in his opinion more focused on big data.</li>
<li>The apparent repositioning of CouchOne as being highly focused on mobile applications (with unreliable host connections) never really had time to take hold. Indeed &#8230;</li>
<li>&#8230; Damien asserts that CouchDB has a lot more mission-critical enterprise deployments than MongoDB, whereas he concedes that MongoDB is doing great in a Ruby-centric market.</li>
</ul>
<p>Happy talk around Membase/CouchDB/Couchbase product integration:</p>
<ul>
<li>Hey, both Membase and CouchDB talk JSON.</li>
<li>Product strengths and weaknesses are synergistic. For example:
<ul>
<li>Membase started with caching technology (memcached). CouchDB doesn&#8217;t yet make much use of cache.</li>
<li>Membase&#8217;s back end is SQLite, used in a &#8220;dumb&#8221; way. CouchDB can presumably do everything the dumb implementation of SQLite can.</li>
<li>Membase&#8217;s scale-out is designed for a single data center, with strict consistency. CouchDB&#8217;s is designed for wide-area networks, with eventual consistency. At least one big internet company likes the idea of strict consistency within data centers, but eventual consistency among them.</li>
<li>The CouchDB interface takes the place of something Membase planned to build called <a href="http://www.dbms2.com/2010/08/18/northscale-membase-roadmap/">Node Code</a>, which was going to overcome <a href="http://www.dbms2.com/2010/10/11/membase-simplifies-name-goes-ga/">the limitations of a simple key-value interface</a>. Node Code development didn&#8217;t ever really get started, and indeed was deferred for a couple of months while CouchOne acquisition discussions were underway. However, Membase did build Node Code&#8217;s underpinnings, called the &#8220;TAP&#8221; interface.</li>
<li>And on the operations side: Membase has been in Mountain View, right by the CalTrain. CouchOne has been in Oakland, but with a lot of at-home workers. One option is to move the Oakland office to a San Francisco location that, you guessed it, is also right by the CalTrain.</li>
</ul>
</li>
</ul>
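<p>The split consistency model in the third sub-bullet can be illustrated with a hypothetical Python sketch, not taken from either product: a write updates every replica in the local data center before returning, while copies in other data centers are brought up to date later.</p>
<pre>
```python
from collections import deque


class TwoTierReplication:
    """Hypothetical: synchronous copies inside a data center, async copies across them."""

    def __init__(self, local_replicas, remote_replicas):
        self.local = [dict() for _ in range(local_replicas)]
        self.remote = [dict() for _ in range(remote_replicas)]
        self.outbox = deque()  # writes not yet shipped to other data centers

    def write(self, key, value):
        for replica in self.local:  # returns only after every local copy is updated
            replica[key] = value
        self.outbox.append((key, value))  # the wide-area copy happens later

    def read_local(self, key):
        return self.local[0].get(key)  # any local replica would give the same answer

    def read_remote(self, key):
        return self.remote[0].get(key)  # may be stale until ship() runs

    def ship(self):
        while self.outbox:
            key, value = self.outbox.popleft()
            for replica in self.remote:
                replica[key] = value


store = TwoTierReplication(local_replicas=3, remote_replicas=2)
store.write("cart:9", ["book"])
print(store.read_local("cart:9"), store.read_remote("cart:9"))  # ['book'] None
store.ship()
print(store.read_remote("cart:9"))  # ['book']
```
</pre>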
<p>Other technical notes:</p>
<ul>
<li>The only current API to CouchDB is HTTP/HTTPS. memcached protocols will be added to Couchbase.</li>
<li>CouchDB has design documents, which specify how indexes are to be built. The indexes are built on the fly if they don&#8217;t already exist. Then there are JavaScript functions that update the indexes as documents are added or updated.</li>
<li>In particular, CouchDB has a geospatial index, in a true R-tree. Damien fondly thinks it already has most albeit not all the features of PostgreSQL GIS. I gather CouchDB geospatial will be straightforwardly integrated into Couchbase.</li>
<li>There&#8217;s also a CouchDB add-on project for full-text indexing. Damien seems less confident of how that will be integrated into Couchbase.</li>
</ul>
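<p>The design-document mechanism can be approximated in Python. In CouchDB the map functions are JavaScript and call <code>emit(key, value)</code>; here a Python lambda stands in, and <code>TinyDocStore</code> and <code>View</code> are hypothetical names. The sketch builds an index on first query and then reruns the map function only on changed documents; deferring that update to query time is my understanding of CouchDB&#8217;s behavior, so treat it as an assumption.</p>
<pre>
```python
class View:
    """Hypothetical stand-in for a view defined in a CouchDB design document."""

    def __init__(self, map_fn):
        self.map_fn = map_fn  # CouchDB would hold this as JavaScript source
        self.rows = None      # doc_id -> [(key, value), ...]; None until first built
        self.stale = set()    # doc ids changed since the index was last updated

    def index_doc(self, doc_id, doc):
        emitted = []
        self.map_fn(doc, lambda key, value: emitted.append((key, value)))
        self.rows[doc_id] = emitted


class TinyDocStore:
    def __init__(self):
        self.docs = {}
        self.views = {}

    def put_design(self, name, map_fn):
        self.views[name] = View(map_fn)

    def put(self, doc_id, doc):
        self.docs[doc_id] = doc
        for view in self.views.values():
            view.stale.add(doc_id)

    def query(self, name, key):
        view = self.views[name]
        if view.rows is None:
            # No index yet: build it on the fly from every document.
            view.rows = {}
            view.stale = set(self.docs)
        for doc_id in view.stale:
            # Incremental update: rerun the map function only on changed documents.
            view.index_doc(doc_id, self.docs[doc_id])
        view.stale.clear()
        return sorted(doc_id for doc_id, pairs in view.rows.items()
                      if any(k == key for k, _ in pairs))


db = TinyDocStore()
db.put_design("by_city", lambda doc, emit: emit(doc.get("city"), None))
db.put("a", {"city": "Oakland"})
db.put("b", {"city": "Mountain View"})
print(db.query("by_city", "Oakland"))  # ['a']
db.put("b", {"city": "Oakland"})
print(db.query("by_city", "Oakland"))  # ['a', 'b']
```
</pre>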
<p>Finally, I&#8217;m curious about the relative performance of Couchbase/Membase and <a href="../../../../../2011/01/28/schooner-software-onl/">Schooner Membrain</a> when using flash memory. I would guess that the comparison favors Schooner, because of Schooner&#8217;s extensive focus on flash optimization. I would also guess that Schooner&#8217;s edge is small, because I&#8217;d think it would be less than Schooner&#8217;s advantage vs. alternative flash uses on the MySQL side, and Schooner&#8217;s MySQL performance advantage seems to be less than 2X even when Schooner is doing the benchmarks.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/02/08/couchbase-membase-couchone-couchdb/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>More notes on Membase and memcached</title>
		<link>http://www.dbms2.com/2010/10/18/more-notes-on-membase-and-memcached/</link>
		<comments>http://www.dbms2.com/2010/10/18/more-notes-on-membase-and-memcached/#comments</comments>
		<pubDate>Mon, 18 Oct 2010 09:09:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3322</guid>
		<description><![CDATA[As a companion to my post about Membase last week, the company has graciously allowed me to post a rather detailed Membase slide deck. (It even has pricing.) Also, I left one point out. Membase announced a Cloudera partnership. I couldn&#8217;t detect anything technically exciting about that, but it serves to highlight what I do [...]]]></description>
			<content:encoded><![CDATA[<p>As a companion to my post about <a href="http://www.dbms2.com/2010/10/11/membase-simplifies-name-goes-ga/">Membase</a> last week, the company has graciously allowed me to post <a href="http://www.monash.com/uploads/Membase-slides-October-2010.pdf">a rather detailed Membase slide deck</a>. (It even has pricing.) Also, I left one point out.</p>
<p>Membase announced <a href="http://blog.membase.com/membase-cloudera-integration">a Cloudera partnership</a>. I couldn&#8217;t detect anything technically exciting about that, but it serves to highlight what I do find to be an interesting usage trend. A couple of big Web players (AOL and ShareThis) are using Hadoop to crunch data and derive customer profile data, then feed that back into Membase. Why Membase? Because it can serve up the profile in a millisecond, as part of a bigger 40-millisecond-latency request.</p>
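<p>The pattern can be sketched in Python: a batch job (the part Hadoop plays) condenses raw events into per-user profiles, which are then loaded into a key-value store for fast lookups. The event format, the &#8220;top topics&#8221; profile, and the function names are all hypothetical; a plain dictionary stands in for Membase.</p>
<pre>
```python
from collections import Counter, defaultdict


def build_profiles(events, top_n=2):
    """Batch step (Hadoop's role here): turn raw (user, topic) events into profiles."""
    counts = defaultdict(Counter)
    for user, topic in events:  # "map": key each event by user
        counts[user][topic] += 1
    # "reduce": summarize each user's events as their most frequent topics
    return {user: [topic for topic, _ in c.most_common(top_n)] for user, c in counts.items()}


def load_profiles(store, profiles):
    for user, profile in profiles.items():
        store[user] = profile  # stand-in for a memcached-style set()


LOOKUP_MS, REQUEST_BUDGET_MS = 1.0, 40.0  # the latencies quoted in the paragraph above

store = {}
load_profiles(store, build_profiles([("u1", "golf"), ("u1", "golf"), ("u1", "tennis"), ("u2", "chess")]))
print(store["u1"], f"profile lookup uses {LOOKUP_MS / REQUEST_BUDGET_MS:.1%} of the request budget")
```
</pre>
<p>At a millisecond per lookup, the profile fetch consumes about 2.5% of a 40-millisecond request.</p>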
<p><em>And why Hadoop, rather than Aster Data nCluster, which ShareThis also uses? Umm, I didn&#8217;t ask.</em></p>
<p>When I mentioned this to Colin Mahony, he said Vertica had similar stories. However, I don&#8217;t recall whether they were about Membase or just memcached, and he hasn&#8217;t had a chance to get back to me with clarification. <em> (Edit: As per Colin&#8217;s comment below, it&#8217;s both.)</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/10/18/more-notes-on-membase-and-memcached/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Membase simplifies name, goes GA</title>
		<link>http://www.dbms2.com/2010/10/11/membase-simplifies-name-goes-ga/</link>
		<comments>http://www.dbms2.com/2010/10/11/membase-simplifies-name-goes-ga/#comments</comments>
		<pubDate>Tue, 12 Oct 2010 00:45:52 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Basho and Riak]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3236</guid>
		<description><![CDATA[The company Northscale that makes the product Membase is now the company Membase that makes the product Membase. Good. Also, the product Membase has now gone GA. I wrote back in August about Membase, and that covers most of what I think, with perhaps a couple of exceptions:  One point I might put more weight [...]]]></description>
			<content:encoded><![CDATA[<p>The company Northscale that makes the product Membase is now the company Membase that makes the product Membase. Good. Also, the product Membase has now gone GA.</p>
<p>I wrote back in August about <a href="http://www.dbms2.com/2010/08/18/northscale-membase-roadmap/">Membase</a>, and that covers most of what I think, with perhaps a couple of exceptions:  <span id="more-3236"></span></p>
<ul>
<li>One point I might put more weight on &#8212; by doing the same thing in memory (the well-known memcached caching system) and on disk, Membase may have a pretty slick high-performance memory-centric architecture.</li>
<li>I&#8217;m getting more sympathetic to the idea of just banging objects to disk, whether via a key-value store or some other kind of NoSQL model. E.g., if somebody were to imitate <a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/">Workday&#8217;s</a> or salesforce.com&#8217;s technical architectures today &#8212; which while of course not identical are actually pretty similar &#8212; they might use a key-value store, rather than the actual choices of MySQL and Oracle respectively.</li>
</ul>
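<p>&#8220;Banging objects to disk&#8221; through a key-value store might look like this Python sketch, under hypothetical assumptions: each object is serialized whole to JSON under a key built from its type and ID, and references to other objects are stored as IDs. <code>Employee</code>, <code>save</code>, and <code>load</code> are illustrative names, and a dictionary stands in for the store.</p>
<pre>
```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Employee:  # hypothetical business object
    id: int
    name: str
    manager_id: Optional[int] = None  # a reference to another object, stored as its ID


def key_for(obj_type, obj_id):
    return f"{obj_type.__name__}/{obj_id}"


def save(store, obj):
    """Write the whole object into the key-value store as one JSON value."""
    store[key_for(type(obj), obj.id)] = json.dumps(asdict(obj))


def load(store, obj_type, obj_id):
    return obj_type(**json.loads(store[key_for(obj_type, obj_id)]))


kv = {}  # any key-value store with get/set would do
save(kv, Employee(1, "Ada"))
save(kv, Employee(2, "Bob", manager_id=1))
bob = load(kv, Employee, 2)
print(load(kv, Employee, bob.manager_id).name)  # Ada
```
</pre>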
<p>Generally, as per my recent <a href="http://www.dbms2.com/2010/10/11/nosql-overview/">NoSQL overview</a>,  I&#8217;m still negative about pure key-value stores. However, Membase:</p>
<ul>
<li>Has a roadmap to get beyond pure key-value.</li>
<li>Is plug-compatible with the popular memcached.</li>
<li>Has a high-performance memory-centric story.</li>
</ul>
<p>So I can see some appealing Membase use cases.</p>
<p>And in case somebody is wondering why I don&#8217;t compare/contrast Membase with Basho&#8217;s key-value store Riak &#8212; beyond the obvious concerns raised by Basho&#8217;s VC <a href="../2010/08/26/nosql-hvsp-olrp/">down round</a> &#8212; let me copy what I tweeted last night in response to a flamefest about same:</p>
<blockquote><p>I&#8217;m not some symbolism-heavy figurehead that should be held  responsible for all-inclusive nuance-balancing. I write about as much of what I think is interesting as I can get around  to.</p></blockquote>
<p><em><strong>Related links</strong></em></p>
<ul>
<li>A post I wrote about <a href="http://www.dbms2.com/2010/05/01/ryw-read-your-writes-consistency/">RYW consistency</a> based in part on a great discussion with Basho&#8217;s Justin Sheehy.</li>
<li><a href="http://nosql.mypopescu.com/post/1291197255/membase-releases-membase-1-0">Two</a> <a href="http://nosql.mypopescu.com/post/728983002/what-is-membase">posts</a> by Alex Popescu about Membase. The second has some technical details.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/10/11/membase-simplifies-name-goes-ga/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

