<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Clustering</title>
	<atom:link href="http://www.dbms2.com/category/parallelization/database-clustering/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:17:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>MarkLogic&#8217;s Hadoop connector</title>
		<link>http://www.dbms2.com/2011/11/03/marklogic-hadoop-connector/</link>
		<comments>http://www.dbms2.com/2011/11/03/marklogic-hadoop-connector/#comments</comments>
		<pubDate>Fri, 04 Nov 2011 00:58:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[MarkLogic]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5585</guid>
		<description><![CDATA[It&#8217;s time to circle back to a subject I skipped when I otherwise wrote about MarkLogic 5: MarkLogic&#8217;s new Hadoop connector. Most of what&#8217;s confusing about the MarkLogic Hadoop Connector lies in two pairs of options it presents you: Hadoop can talk XQuery to MarkLogic. But alternatively, Hadoop can use a long-established simple(r) Java API [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s time to circle back to a subject I skipped when I otherwise wrote about <a href="http://www.dbms2.com/2011/11/01/marklogic-version-5/">MarkLogic 5</a>: MarkLogic&#8217;s new Hadoop connector.</p>
<p>Most of what&#8217;s confusing about the MarkLogic Hadoop Connector lies in two pairs of options it presents you:</p>
<ul>
<li>Hadoop can talk XQuery to MarkLogic. But alternatively, Hadoop can use a long-established simple(r) Java API for streaming documents into or out of a MarkLogic database.</li>
<li>Hadoop can make requests to MarkLogic in MarkLogic&#8217;s normal mode of operation, namely to address any node in the MarkLogic cluster, which then serves as a &#8220;head&#8221; node for the duration of that particular request. But alternatively, Hadoop can use a long-standing MarkLogic option to circumvent the whole DBMS cluster and only talk to one specific MarkLogic node.</li>
</ul>
<p>Otherwise, the whole thing is just what you would think:</p>
<ul>
<li>Hadoop can read from and write to MarkLogic, in parallel at both ends.</li>
<li>If Hadoop is just writing to MarkLogic, there&#8217;s a good chance the process is properly called &#8220;ETL.&#8221;</li>
<li>If Hadoop is reading a lot from MarkLogic, there&#8217;s a good chance the process is properly called &#8220;batch analytics.&#8221;</li>
</ul>
<p>MarkLogic said that it wrote this Hadoop connector itself.</p>
<p><span id="more-5585"></span>When I realized MarkLogic was claiming the ability to seamlessly integrate short-request and batch analytic processing, I asked about workload management. I gathered that:</p>
<ul>
<li>MarkLogic believes that MarkLogic 5 does a great job of granular workload monitoring.</li>
<li>However, MarkLogic doesn&#8217;t have a strong workload management administrative interface. Rather, you may have to do workload management programmatically.</li>
</ul>
<p>Overall, I think the MarkLogic Hadoop connector could prove pretty useful. The first question I ask somebody who wants to process relational data in Hadoop is &#8220;Why not just an analytic RDBMS?&#8221; But the natural use cases for MarkLogic are often ones in which you might as well do your analytics in Hadoop, including a 4 billion Word/PDF/image document insurance-industry example I recently encountered, and for which <a href="../../../../../2011/10/10/text-data-management-part-2-general-and-short-request/">I favor MarkLogic over MongoDB or straight Hadoop alike</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/03/marklogic-hadoop-connector/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>NoSQL notes</title>
		<link>http://www.dbms2.com/2011/10/23/nosql-notes/</link>
		<comments>http://www.dbms2.com/2011/10/23/nosql-notes/#comments</comments>
		<pubDate>Mon, 24 Oct 2011 04:20:27 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Basho and Riak]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[MongoDB and 10gen]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parallelization]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5522</guid>
		<description><![CDATA[Last week I visited with James Phillips of Couchbase, Max Schireson and Eliot Horowitz of 10gen, and Todd Lipcon, Eric Sammer, and Omer Trajman of Cloudera. I guess it&#8217;s time for a round-up NoSQL post. Views of the NoSQL market horse race are reasonably consistent, with perhaps some elements of “Where you stand depends upon [...]]]></description>
			<content:encoded><![CDATA[<p>Last week I visited with James Phillips of Couchbase, Max Schireson and Eliot Horowitz of 10gen, and Todd Lipcon, Eric Sammer, and Omer Trajman of Cloudera. I guess it&#8217;s time for a round-up NoSQL post. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Views of the NoSQL market horse race are reasonably consistent, with perhaps some elements of “Where you stand depends upon where you sit.”</p>
<ul>
<li>As James tells it, NoSQL is simply a three-horse race between Couchbase, MongoDB, and Cassandra.</li>
<li>Max would include HBase on the list.</li>
<li>Further, Max pointed out that metrics such as job listings suggest MongoDB has the most development activity, and Couchbase/Membase/CouchDB perhaps have less.</li>
<li>The Cloudera guys remarked on some serious HBase adopters.*</li>
<li>Everybody I spoke with agreed that Riak had little current market presence, although some Basho guys could surely be found who&#8217;d disagree.</li>
</ul>
<p><span id="more-5522"></span><em>*I hope to do a separate post on HBase adoption soon. In connection with that, any info on HBase adoption by Facebook (said to be very heavy), Twitter, et al. would be much appreciated.</em></p>
<p>The reasons for using NoSQL of course are, in some order, <a href="../../../../../2011/07/31/dynamic-fixed-schema-databases/">dynamic schemas</a>, scale-out, and open source. <a href="http://www.dbms2.com/2011/10/23/transparent-relational-oltp-scale-out/">I find the scale-out argument somewhat bogus</a>,* but the data model one is very real. Depending on whom you talk with, the most important point about dynamic schemas may actually be that they’re changeable, or it may just be that you don’t have to specify a schema at the time of initial application design. MongoDB gets particular praise as a good platform on which to throw something together quickly, although predictions as to how far the application will then scale may differ depending on whether you’re talking with, say, Max or Todd.</p>
<p><em>*It’s fair to say that NoSQL systems are more proven in scale-out than most relational DBMS. Even so, I would cringe at any line of reasoning that concluded one should adopt NoSQL because it is more mature than relational alternatives.</em></p>
<p>Finally, I was perhaps too extreme when <a href="../../../../../2011/10/20/more-notes-on-oracle-nosql/">I suggested there was no good reason for Oracle to have adopted the major key/minor key approach it took in its NoSQL offering</a>. Todd offered a reason why that approach – which he characterized as similar to Project Voldemort’s – could make sense:</p>
<ul>
<li>If you have some kind of global secondary index, it’s hard to maintain that index consistently without what amounts to distributed transactions.</li>
<li>If you want to avoid the overhead of those, one alternative is a column-group system such as HBase or Cassandra. Those have no indexes at all, except in the sense that a column is its own index.</li>
<li>Another alternative is to load as much indexing information as you can into the key of a key-value store.</li>
</ul>
<p>I’d be interested to learn about the Couchbase and MongoDB answers to that challenge.</p>
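<p>To make the last alternative concrete, here is a minimal sketch of what &#8220;indexing information in the key&#8221; can look like. It is an illustration only, not Oracle&#8217;s or Voldemort&#8217;s actual API: the <code>SortedKV</code> class, the two-part key layout and the sample keys are all assumptions. The point is that when records are ordered by a composite (major, minor) key, a lookup by major key is just a key-range scan, so there is no separate secondary index to keep consistent.</p>

```python
import bisect
from typing import List, Tuple

# Hypothetical key layout for this sketch: (major key, minor key).
Key = Tuple[str, str]


class SortedKV:
    """Toy in-memory key-value store that keeps its keys in sorted order."""

    def __init__(self) -> None:
        self._keys: List[Key] = []
        self._vals: List[str] = []

    def put(self, key: Key, value: str) -> None:
        # Find the sorted position; overwrite if the key already exists.
        i = bisect.bisect_left(self._keys, key)
        if i != len(self._keys) and self._keys[i] == key:
            self._vals[i] = value
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def scan_major(self, major: str) -> List[Tuple[str, str]]:
        """Return (minor key, value) pairs for one major key via key order alone."""
        # (major, "") sorts at or before every key that shares this major key.
        i = bisect.bisect_left(self._keys, (major, ""))
        out = []
        while i != len(self._keys) and self._keys[i][0] == major:
            out.append((self._keys[i][1], self._vals[i]))
            i += 1
        return out
```

<p>For example, after storing <code>("u1", "order:1")</code> and <code>("u1", "order:2")</code>, calling <code>scan_major("u1")</code> returns both orders in minor-key order without consulting any index other than the keys themselves.</p>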
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/10/23/nosql-notes/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Transparent relational OLTP scale-out</title>
		<link>http://www.dbms2.com/2011/10/23/transparent-relational-oltp-scale-out/</link>
		<comments>http://www.dbms2.com/2011/10/23/transparent-relational-oltp-scale-out/#comments</comments>
		<pubDate>Mon, 24 Oct 2011 04:19:09 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Schooner Information Technology]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5521</guid>
		<description><![CDATA[There’s a perception that, if you want (relatively) worry-free database scale-out, you need a non-relational/NoSQL strategy. That perception is false. In the analytic case it’s completely ridiculous, as has been demonstrated by Teradata, Vertica, Netezza, and various other MPP (Massively Parallel Processing) analytic DBMS vendors. And now it’s false for short-request/OLTP (OnLine Transaction Processing) use [...]]]></description>
			<content:encoded><![CDATA[<p>There’s a perception that, if you want (relatively) worry-free database scale-out, you need a non-relational/NoSQL strategy. That perception is false. In the analytic case it’s completely ridiculous, as has been demonstrated by <a href="../../../../../2011/09/24/confusion-about-teradatas-big-customers/">Teradata</a>, <a href="../../../../../2011/06/20/columnar-dbms-vendor-customer-metrics/">Vertica</a>, Netezza, and various other MPP (Massively Parallel Processing) analytic DBMS vendors. And now it’s false for <a href="../../../../../2011/03/02/short-request-processing/">short-request</a>/OLTP (OnLine Transaction Processing) use cases as well.</p>
<p>My favorite relational OLTP scale-out choice these days is <a href="http://www.dbms2.com/2011/10/23/schooner-pivots-further/">the SchoonerSQL/dbShards partnership</a>. Schooner Information Technology (SchoonerSQL) and Code Futures (dbShards) are young, small companies, but I’m not too concerned about that, because the APIs they want you to write to are just MySQL’s. The main scenarios in which I can see them failing are ones in which they are competitively leapfrogged, either by other small competitors – e.g. ScaleBase, Akiban, TokuDB, or ScaleDB &#8212; or by Oracle/MySQL itself. While that could suck for my clients Schooner and Code Futures, it would still provide users relying on MySQL scale-out with one or more good product alternatives.</p>
<p>Relying on non-MySQL NewSQL startups, by way of contrast, would leave me somewhat more concerned. (However, if their code is open sourced, you have at least some vendor-failure protection.) And big-vendor scale-out offerings, such as Oracle RAC or <a href="../../../../../2011/05/06/db2-oltp-scale-out-purescale/">DB2 pureScale</a>, may be more complex to deploy and administer than the MySQL and NewSQL alternatives.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/10/23/transparent-relational-oltp-scale-out/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Schooner pivots further</title>
		<link>http://www.dbms2.com/2011/10/23/schooner-pivots-further/</link>
		<comments>http://www.dbms2.com/2011/10/23/schooner-pivots-further/#comments</comments>
		<pubDate>Mon, 24 Oct 2011 04:18:11 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Schooner Information Technology]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5523</guid>
		<description><![CDATA[Schooner Information Technology started out as a complete-system MySQL appliance vendor. Then Schooner went software-only, but continued to brag about great performance in configurations with solid-state drives. Now Schooner has pivoted further, and is emphasizing high availability, clustered performance, and other hardware-agnostic OLTP (OnLine Transaction Processing) features. Fortunately, Schooner has some interesting stuff in those [...]]]></description>
			<content:encoded><![CDATA[<p>Schooner Information Technology started out as a complete-system MySQL appliance vendor. Then <a href="../../../../../2011/01/28/schooner-software-onl/">Schooner went software-only, but continued to brag about great performance in configurations with solid-state drives</a>. Now Schooner has pivoted further, and is emphasizing high availability, clustered performance, and other hardware-agnostic OLTP (OnLine Transaction Processing) features. Fortunately, Schooner has some interesting stuff in those areas to talk about.</p>
<p>The short form of the SchoonerSQL (as Schooner’s product is now called) story goes roughly like this:</p>
<ul>
<li>SchoonerSQL replicates data &#8212; synchronously if the replication target is local, asynchronously if it is remote.</li>
<li>Local synchronous replication provides high availability; remote asynchronous replication provides disaster recovery.</li>
<li>SchoonerSQL’s local synchronous replication also provides read scale-out.</li>
<li>Schooner has a partnership with Code Futures/dbShards to provide write scale-out via <a href="../../../../../2011/02/24/transparent-sharding/">transparent sharding</a>.</li>
<li>SchoonerSQL has some secret sauce in replication performance. This has the effect of significantly increasing write performance (assuming you were going to replicate anyway), because otherwise you might have to slow down the master server&#8217;s write performance so that the slaves can keep up with it.</li>
<li>Schooner believes it still has some single-server performance advantages as well.</li>
</ul>
<p><span id="more-5523"></span><em>Just to be clear here: Schooner is my client. Code Futures is my client. I introduced them and suggested their partnership. I then introduced them both to a user client, who was sufficiently impressed to seriously evaluate both of them. Even so, some of the Schooner replication story only became clear to me when I visited last Friday.</em></p>
<p>To flesh that out a bit more:</p>
<ul>
<li>Schooner has been making performance tweaks to the MySQL/InnoDB stack for several years. Some of them are still relevant, and can offer 50-100% performance improvements, especially but not only if you use solid-state storage.</li>
<li>Schooner has found a way to scale up the slave side of master-slave replication, both in the synchronous and asynchronous cases.
<ul>
<li>The core idea of SchoonerSQL parallel (i.e. scale-up) replication is straightforward. The replication log streams in. A chunk is sent off to a CPU core. The next chunk is examined for conflicts with the first chunk. If there are none, it is sent off to a different core to be processed.</li>
<li>Thus, you can have both local replication/high availability and remote replication/disaster recovery without slowing down writes on the master the way you may have to with other alternatives.</li>
<li>Schooner believes this feature alone can make SchoonerSQL 3X faster than MySQL alternatives, at least for writes.</li>
</ul>
</li>
</ul>
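<p>The parallel replication idea described above can be sketched roughly as follows. This is my own illustration, not Schooner&#8217;s code: the <code>Chunk</code> structure, the use of row keys as the unit of conflict, and the choice to make a conflicting chunk wait for a new &#8220;wave&#8221; are all assumptions. Chunks in the same wave touch disjoint keys, so each could be handed to a different core; conflicting chunks stay in log order.</p>

```python
from typing import FrozenSet, List, NamedTuple


class Chunk(NamedTuple):
    """A slice of the replication log (hypothetical structure for this sketch)."""
    seq: int              # position of the chunk in the log
    keys: FrozenSet[str]  # row keys the chunk writes (assumed unit of conflict)


def plan_waves(log: List[Chunk]) -> List[List[Chunk]]:
    """Group consecutive chunks into waves that could run on separate cores.

    A chunk joins the current wave only if it shares no keys with any chunk
    already in that wave. Otherwise it starts a new wave, which keeps
    conflicting writes in their original log order.
    """
    waves: List[List[Chunk]] = []
    busy_keys: set = set()
    for chunk in log:
        if not waves or not busy_keys.isdisjoint(chunk.keys):
            waves.append([])   # first chunk, or a conflict: start a new wave
            busy_keys = set()
        waves[-1].append(chunk)
        busy_keys |= chunk.keys
    return waves
```

<p>In a real slave each wave would be dispatched to a pool of worker cores and the next wave started once the conflicting work had finished; the sketch only shows the conflict check that decides what may run side by side.</p>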
<p>At least in casual conversation, Schooner synthesizes its 1.5-2X single-server figure and up-to-3X clustering figure into a single claim of frequent 2-5X speedup over generic MySQL/InnoDB. But the usual caveats about vendor-supplied performance numbers of course apply.</p>
<p>Finally, some housekeeping:</p>
<ul>
<li>At Oracle’s polite request,* Schooner changed its product name to not mention MySQL; hence the moniker SchoonerSQL.</li>
<li>SchoonerSQL is being launched Monday.</li>
<li>Schooner has determined that if its version numbers are different from MySQL’s, confusion ensues. So the first ever version under the name SchoonerSQL is SchoonerSQL 5.1, a factoid that Schooner is wisely omitting from the official product launch press release.</li>
<li>The synchronous part of SchoonerSQL’s replication story dates back to last January’s product release.</li>
<li>Most of the asynchronous part of the SchoonerSQL story is new for Monday.</li>
<li>dbShards/SchoonerSQL partnership engineering is still underway.</li>
</ul>
<p><em><span style="text-decoration: line-through;">*I mean that non-facetiously. </span><span style="text-decoration: line-through;">Schooner’s MySQL OEM contract was such that Oracle didn’t have a legal hammer to force the change.</span> Edit: Whoops! There turn out to have been inaccuracies in the original version of this footnote, which I now regret writing. The contract isn&#8217;t exactly OEM, and there actually were some trademark-based legal hammers.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/10/23/schooner-pivots-further/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Are there any remaining reasons to put new OLTP applications on disk?</title>
		<link>http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/</link>
		<comments>http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/#comments</comments>
		<pubDate>Mon, 19 Sep 2011 18:07:07 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5257</guid>
		<description><![CDATA[Once again, I&#8217;m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include: 100s of gigabytes of data at first, growing to &#62;1 terabyte over time. High peak loads. Public cloud portability (but they have private data centers they can use today). Simple database design &#8212; not a lot [...]]]></description>
			<content:encoded><![CDATA[<p>Once again, I&#8217;m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include:</p>
<ul>
<li>100s of gigabytes of data at first, growing to &gt;1 terabyte over time.</li>
<li>High peak loads.</li>
<li>Public cloud portability (but they have <strong>private data centers they can use today</strong>).</li>
<li>Simple database design &#8212; not a lot of tables, not a lot of columns, not a lot of joins, and everything can be distributed on the same customer_ID key.</li>
<li>Stream the data to a data warehouse that will grow to a few terabytes. (Keeping only one year of OLTP data online actually makes sense in this application, but of course everything should go into the DW.)</li>
</ul>
<p>So I&#8217;m leaning toward saying: <span id="more-5257"></span></p>
<ul>
<li>They should go with a scalable, MySQL-based solution.
<ul>
<li>Lots of third-party software works with MySQL, in case that&#8217;s helpful.</li>
<li>Yes, any one vendor is small and not yet firmly established, but there are numerous vendors around with interesting MySQL scaling stories.</li>
<li>In a vendor emergency, just going with Oracle&#8217;s MySQL stuff would probably work &#8230;</li>
<li>&#8230; especially because there are these lovely things in the world called <strong>solid-state drives.</strong></li>
<li>There&#8217;s also good escapability if one wants to move away from MySQL, because everybody knows how to handle MySQL data.</li>
</ul>
</li>
<li>The first product to look at is dbShards, because it meets all the topology needs:
<ul>
<li>Local scale-out (<a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">transparent sharding</a>).</li>
<li><a href="http://www.dbms2.com/2011/02/09/clarification-on-dbshards-shard-replication/">Local high availability</a>.</li>
<li>Remote disaster recovery (details of that are underway).</li>
</ul>
</li>
<li>The first analytic DBMS to look at is Infobright.
<ul>
<li>Yes, I know Infobright is focused more on machine-generated data these days, but this client&#8217;s analytic needs are so straightforward Infobright should pass with flying colors.</li>
<li>The MySQL-to-MySQL aspect should make ETL dead simple.</li>
<li>Again, there&#8217;s escapability.</li>
</ul>
</li>
</ul>
<p>Mainly, this is all fine. But I&#8217;m getting pushback on the solid-state aspect, for fear that it will compromise public cloud portability.</p>
<p>Am I missing something here? As far as I&#8217;m concerned, <strong>if you&#8217;re planning an OLTP system with a many-year lifespan today, </strong>of course <strong>you should assume solid-state storage.</strong> Maybe you scale out just as far as you would with disk, striping indexes or entire databases across the RAM of multiple servers. In that case, having solid-state backing reduces the risk of bottlenecks. Maybe you don&#8217;t scale out as far as you would with disk. In that case, solid-state backing saves you money.</p>
<p><strong>As for public-cloud support for solid-state storage, that&#8217;s coming fast, right? </strong>(Actually, I have data points in support of that theory, but they&#8217;re a bit tenuous.) A large fraction of web businesses with private data centers seem to be using solid-state storage &#8212; from Facebook on down &#8212; or so the NoSQL/NewSQL/<a href="http://www.dbms2.com/2011/03/02/short-request-processing/">short-request</a> DBMS guys tell me. Surely a number of public cloud vendors are close behind.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Couchbase technical update</title>
		<link>http://www.dbms2.com/2011/08/13/couchbase-technical-update/</link>
		<comments>http://www.dbms2.com/2011/08/13/couchbase-technical-update/#comments</comments>
		<pubDate>Sun, 14 Aug 2011 04:08:03 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5081</guid>
		<description><![CDATA[My Couchbase business update with Bob Wiederhold was very interesting, but it didn&#8217;t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven&#8217;t had their designs locked down yet anyway. But here&#8217;s at least a partial explanation [...]]]></description>
			<content:encoded><![CDATA[<p>My <a href="http://www.dbms2.com/2011/08/13/couchbase-business-update/">Couchbase business update</a> with Bob Wiederhold was very interesting, but it didn&#8217;t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven&#8217;t had their designs locked down yet anyway. But here&#8217;s at least a partial explanation of what&#8217;s up.</p>
<p>memcached is a way to cache data in RAM across a cluster of servers and have it all look logically like a single memory pool; it is extremely popular among large internet companies. The Membase product &#8212; which is what Couchbase has been selling this year &#8212; adds persistence to memcached, an obvious improvement on requiring application developers to write both to memcached and to <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">non-transparently-sharded MySQL</a>. The main technical points in adding persistence seem to have been:</p>
<ul>
<li>A <strong>persistent backing store</strong> (duh), namely SQLite.</li>
<li>A <strong>change to the hashing algorithm,</strong> to avoid losing data when the cluster configuration is changed.</li>
</ul>
<p>Couchbase is essentially Membase improved by integrating CouchDB into it, with the main changes being:</p>
<ul>
<li><strong>Changing the backing store to CouchDB</strong> (duh). This will be in the first Couchbase release.</li>
<li><strong>Adding cross data center replication on CouchDB&#8217;s consistency model.</strong> This will not, I believe, be in the first Couchbase release.</li>
<li><strong>Offering CouchDB&#8217;s programming and query interfaces as an option.</strong> So far as I can tell, this will be implemented straightforwardly in the first Couchbase release, with elegance planned for later down the road.</li>
</ul>
<p>Let&#8217;s drill down a bit into <strong>Membase/Couchbase clustering and consistency. </strong><span id="more-5081"></span></p>
<ul>
<li>When data is written to RAM in memcached, it immediately gets copied to another server. The same is of course true in Membase/Couchbase. The terminology on all this is confusing, but I think:
<ul>
<li>The portion of data that is stored as a primary copy on any given server is called a &#8220;shard&#8221;.</li>
<li>That would seem to make sense, as that data could correspond to what goes &#8212; <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">non-transparently</a> &#8212; into an instance of MySQL in a classical memcached/MySQL set-up.</li>
</ul>
</li>
<li>Updates are of course also banged to disk ASAP &#8212; but at times of heavy load, that can take a while. A few seconds to a couple of minutes is normal operation; if it takes an hour, you really should buy more hardware. (Or solid-state storage.)</li>
<li>Similarly, the replication of data to a second machine&#8217;s RAM may not happen at times of heavy load &#8212; and that&#8217;s another sign you don&#8217;t have enough machines.</li>
<li>Each Membase/Couchbase &#8220;shard&#8221; has lots of logical sub-shards.* (1024 for now, at least as default, although Dustin finds that number excessive and is looking to lower it.) So if you add a node, some of the sub-shards get sent over to the new node. Unlike the case for straight memcached, no data is lost from cache (nor, of course, from the persistent store). Blocking of operations from such a move only happens in narrow time windows, and then only in edge cases.</li>
</ul>
<p><em>*Edit: They&#8217;re called <a href="http://dustin.github.com/2010/06/29/memcached-vbuckets.html">vbuckets</a>.</em></p>
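<p>A rough sketch of why the sub-shard (vbucket) indirection avoids losing cached data when a node is added. This is an illustration under my own assumptions, not the Membase implementation: the CRC32 hash, the round-robin initial assignment and the function names are all made up for the example. The key-to-vbucket mapping is fixed forever; only the vbucket-to-node map changes, and only for the vbuckets that are moved.</p>

```python
import zlib
from typing import Dict, List

NUM_VBUCKETS = 1024  # the default figure cited above


def vbucket_of(key: str) -> int:
    """Map a key to its vbucket; this never changes when nodes come or go."""
    # CRC32 is a stand-in hash chosen for this sketch.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def initial_map(nodes: List[str]) -> Dict[int, str]:
    """Assign vbuckets to nodes round-robin (an assumed starting layout)."""
    return {vb: nodes[vb % len(nodes)] for vb in range(NUM_VBUCKETS)}


def add_node(vb_map: Dict[int, str], new_node: str) -> Dict[int, str]:
    """Hand roughly 1/(n+1) of the vbuckets to new_node; the rest stay put."""
    n = len(set(vb_map.values()))
    new_map = dict(vb_map)
    for vb in range(NUM_VBUCKETS):
        if vb % (n + 1) == n:
            new_map[vb] = new_node  # only this vbucket's data has to move
    return new_map


def node_for(key: str, vb_map: Dict[int, str]) -> str:
    """Find the node that currently owns a key."""
    return vb_map[vbucket_of(key)]
```

<p>Going from two nodes to three in this sketch reassigns about a third of the vbuckets, and every key in the other two thirds is still found on the node where it was already cached. With plain modulo hashing over the node count, as in a simple memcached client, most keys would map to a different server after the change.</p>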
<p>So if we consider Membase technology alone, Couchbase is CA in the CAP Theorem. CouchDB, however, is gloriously AP in the CAP Theorem, in that it was written to assume an occasionally connected topology.* Based on that, Couchbase will allow AP operation between data centers (i.e. &#8220;stay synchronized if you can, to within the limitations of physics and so on, but don&#8217;t beat yourself up on the rare occasions that you can&#8217;t.&#8221;) I don&#8217;t know that that capability will quite be in the first release of Couchbase, but it&#8217;s coming soon.</p>
<p><em>*CouchDB also has other features friendly to occasionally-connected use cases, such as a lot of flexibility as to which parts of the database are or aren&#8217;t synced when you do reconnect. These are at the heart of the Couchbase Mobile offering.</em></p>
<p>memcached and Membase have a very simple key-value interface. CouchDB adds secondary indexes and so on. I think in the first release of Couchbase this is pretty much like having two different APIs for the same product; more elegant integration is planned down the road, and more language support as well.</p>
<p>The highest-performing way to use Couchbase will probably always be to just pretend it is Membase, which is to say memcached+. Dustin told me of Membase users who demanded 10-40 millisecond response times, and that not even for single queries but rather for sequences of several queries in succession. He further told me of customers asking for 1-200 microsecond response, and insisting on no worse than 1 millisecond. Frankly, the first requirement could be met by lots of technologies I can think of, at least if you don&#8217;t rely on disk; the second is thoroughly impossible if you rely on disk, and pretty demanding no matter what kind of hardware and storage you have.</p>
<p>Couchbase performance against disk is a work in progress. CouchDB started out 8X slower than SQLite as a backing store, apples to apples, but Couchbase is fixing that before they roll the product out. (After all, they wouldn&#8217;t want to slow the product down in the course of an upgrade.) Beyond that, when you do exploit the indexing capability of CouchDB, performance of course slows down. Work is underway to lower the performance hit; I imagine much improvement can indeed be made, given how few resources CouchDB has been able to devote to date to <a href="http://www.dbms2.com/2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/08/13/couchbase-technical-update/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>An odd claim attributed to Mike Stonebraker</title>
		<link>http://www.dbms2.com/2011/07/14/an-odd-claim-attributed-to-mike-stonebraker/</link>
		<comments>http://www.dbms2.com/2011/07/14/an-odd-claim-attributed-to-mike-stonebraker/#comments</comments>
		<pubDate>Thu, 14 Jul 2011 11:10:34 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4964</guid>
		<description><![CDATA[This post has a sequel. Last week, Mike Stonebraker insulted MySQL and Facebook&#8217;s use of it, by implication advocating VoltDB instead. Kerfuffle ensued. To the extent Mike was saying that non-transparently sharded MySQL isn&#8217;t an ideal way to do things, he&#8217;s surely right. That still leaves a lot of options for massive short-request databases, however, [...]]]></description>
			<content:encoded><![CDATA[<p><em>This post has a <a href="http://www.dbms2.com/2011/07/15/facebook-mysql-nosql-voltdb-stonebraker/">sequel</a>.</em></p>
<p>Last week, Mike Stonebraker <a href="http://gigaom.com/cloud/facebook-trapped-in-mysql-fate-worse-than-death/">insulted MySQL and Facebook&#8217;s use of it</a>, by implication advocating <a href="http://www.dbms2.com/2010/06/30/details-and-analysis-of-the-voltdb-argument/">VoltDB</a> instead. Kerfuffle ensued. To the extent Mike was saying that non-transparently sharded MySQL isn&#8217;t an ideal way to do things, he&#8217;s surely right. That still leaves a lot of options for massive <a href="http://www.dbms2.com/2011/03/02/short-request-processing/">short-request</a> databases, however, including <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">transparently sharded</a> RDBMS, scale-out <a href="http://www.dbms2.com/2011/05/23/databases-ram/">in-memory DBMS</a> (whether or not VoltDB*), and various NoSQL options. If nothing else, <a href="http://www.dbms2.com/2011/02/08/couchbase-membase-couchone-couchdb/">Couchbase</a> would seem superior to memcached/non-transparent MySQL if you were starting a project today.</p>
<p><em>*The big problem with VoltDB, last I checked, was its reliance on Java stored procedures to get work done.</em></p>
<p>Pleasantries continued in <em><a href="http://www.theregister.co.uk/2011/07/13/mike_stonebraker_versus_facebook/">The Register</a>,</em> which got an amazing-sounding quote from Mike. If <em>The Reg</em> is to be believed &#8212; something <a href="http://www.monashreport.com/2006/03/22/goodmail-esther-dyson-andrew-orlowski-etc/">I wouldn&#8217;t necessarily take for granted</a> &#8212; Mike claimed that he (i.e. VoltDB) knows how to solve the <strong>distributed join</strong> performance problem.  <span id="more-4964"></span></p>
<blockquote><p>So, it&#8217;s Stonebraker against the web. And the difference of opinion is severe. In May, at a MongoDB developer conference in San Francisco, Mongo creator Dwight Merriman told his audience there was &#8220;no way&#8221; to do distributed joins in a way that really scales. &#8220;I&#8217;m not smart enough to do distributed joins that scale horizontally, widely, and are super fast. You have to choose something else. We have no choice but to not be relational,&#8221; he said.</p>
<p>&#8220;You can do distributed transactions, but if you do them with no loss  of generality and you do them across a thousand machines, it&#8217;s not  going to be that fast.&#8221;</p>
<p>Stonebraker says precisely the opposite, and in typical fashion, he goes right for the jugular. &#8220;I reject what Merriman says out of hand,&#8221; he tells <em>The Register</em>. Merriman and his company, 10gen, declined to comment for this story. But Stonebraker says words don&#8217;t matter. As much as he likes to wield his opinions, he insists the debate will be decided elsewhere. &#8220;Let the bake-off begin,&#8221; he crows.</p></blockquote>
<p>But when last I checked, VoltDB made nowhere near that claim. And well it shouldn&#8217;t have. In the fully general case, there&#8217;s no way to ensure super distributed join performance other than by throwing lots and lots of gear at the problem. But if you do that, many alternatives are fast. More specialized cases may be a different matter &#8212; but there are many fast alternatives for those too.</p>
<p>I imagine there will be use cases for which VoltDB sustains a lead as the truly fastest alternative, similarly-architected competitors perhaps excepted.* But what Mike supposedly said seems quite forward-leaning when compared to technical reality.</p>
<p><em>*The canonical VoltDB use case is <a href="http://www.dbms2.com/2010/05/25/voltdb-finally-launches/">e-commerce in virtual goods</a>, the point of &#8220;virtual&#8221; being that physical inventory might necessitate costlier kinds of joins.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/14/an-odd-claim-attributed-to-mike-stonebraker/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>DB2 OLTP scale-out: pureScale</title>
		<link>http://www.dbms2.com/2011/05/06/db2-oltp-scale-out-purescale/</link>
		<comments>http://www.dbms2.com/2011/05/06/db2-oltp-scale-out-purescale/#comments</comments>
		<pubDate>Fri, 06 May 2011 15:20:51 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4406</guid>
		<description><![CDATA[Tim Vincent of IBM talked me through DB2 pureScale Monday. IBM DB2 pureScale is a kind of shared-disk scale-out parallel OLTP DBMS, with some interesting twists. IBM&#8217;s scalability claims for pureScale, on a 90% read/10% write workload, include: 95% scalability up to 64 machines 90% scalability up to 88 machines 89% scalability up to 112 [...]]]></description>
			<content:encoded><![CDATA[<p>Tim Vincent of IBM talked me through <strong>DB2 pureScale</strong> Monday. IBM DB2 pureScale is a kind of <strong>shared-disk scale-out parallel OLTP DBMS,</strong> with some interesting twists. IBM&#8217;s scalability claims for pureScale, on a 90% read/10% write workload, include:</p>
<ul>
<li>95% scalability up to 64 machines</li>
<li>90% scalability up to 88 machines</li>
<li>89% scalability up to 112 machines</li>
<li>84% scalability up to 128 machines</li>
</ul>
<p>More precisely, those are counts of cluster &#8220;members,&#8221; but the recommended configuration is one member per operating system instance &#8212; i.e. one member per machine &#8212; for reasons of availability. In an 80% read/20% write workload, scalability is less &#8212; perhaps 90% scalability over 16 members.</p>
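<p>To make the scalability percentages concrete, here is a rough arithmetic sketch. It assumes that &#8220;N% scalability at M members&#8221; means cluster throughput equals N% of M times a single member's throughput; IBM may define the figure differently, so treat that reading as an assumption. The function names are made up for illustration.</p>

```python
# Assumption: "N% scalability at M members" means the cluster delivers
# N% of M times the throughput of one standalone member.
CLAIMS = [(64, 0.95), (88, 0.90), (112, 0.89), (128, 0.84)]  # figures quoted above


def effective_members(members: int, scalability: float) -> float:
    """Cluster throughput in units of one member's standalone throughput."""
    return members * scalability


def marginal_efficiency(prev: tuple, curr: tuple) -> float:
    """Extra member-equivalents gained per extra physical member added."""
    (m0, s0), (m1, s1) = prev, curr
    gained = effective_members(m1, s1) - effective_members(m0, s0)
    return gained / (m1 - m0)


equivalents = [effective_members(m, s) for m, s in CLAIMS]
marginals = [marginal_efficiency(p, c) for p, c in zip(CLAIMS, CLAIMS[1:])]

for (members, _), eq in zip(CLAIMS, equivalents):
    print(f"{members:>3} members -> {eq:7.2f} member-equivalents")
for (prev, curr), eff in zip(zip(CLAIMS, CLAIMS[1:]), marginals):
    print(f"{prev[0]:>3} -> {curr[0]:>3} members: marginal efficiency {eff:.2f}")
```

<p>Under that reading, total throughput keeps rising all the way to 128 members, but the payoff per added machine drops noticeably in the last step.</p>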
<p>Several elements of IBM&#8217;s DB2 pureScale architecture are pretty straightforward:</p>
<ul>
<li>There are multiple pureScale members (machines), each with its own instance of DB2.</li>
<li>There&#8217;s an RDMA (Remote Direct Memory Access) interconnect, perhaps InfiniBand. (The point of InfiniBand and other RDMA is that moving data doesn&#8217;t require interrupts, and hence doesn&#8217;t cost many CPU cycles.)</li>
<li>The DB2 pureScale members share access to the database on a disk array.</li>
<li>Each DB2 pureScale member has its own log, also on the disk array.</li>
</ul>
<p>Something called GPFS (General Parallel File System), which comes bundled with DB2, sits underneath all this. It&#8217;s all based on the mainframe technology IBM Parallel Sysplex.</p>
<p>The weirdest part (to me) of DB2 pureScale is something called the <strong>Global Cluster Facility,</strong> which runs on its own set of boxes. <em>(Edit: Actually, see Tim Vincent&#8217;s comment below.)</em> <span id="more-4406"></span>These might have 20% or so of the cores of the member boxes, with perhaps a somewhat higher percentage of RAM (especially in the case of write-heavy workloads). Specifically:</p>
<ul>
<li>The DB2 pureScale Global Cluster Facility maintains a buffer pool (cache) shared by all the DB2 pureScale members.</li>
<li>Even so, the DB2 pureScale members themselves are in charge of disk access.</li>
</ul>
<p>So what&#8217;s going on here is not an <a href="../../../../../2008/10/17/oracle-notes/">Exadata-like split between database server and storage processing tiers</a>. The Global Cluster Facility also handles lock management, presumably because locking issues only arise when a page gets fetched into the buffer.</p>
<p>The other surprise is that every client talks to every member, usually through a connection pool from an app server. Tim Vincent assures me that DB2 connections are so lightweight this isn&#8217;t a problem. Clients have load-balancing code on behalf of the members, and route transactions to whichever pureScale member is least busy.</p>
<p>DB2 pureScale is designed to be pretty robust against outages:</p>
<ul>
<li>In the case of planned maintenance, a pureScale member can be &#8220;quiesced.&#8221; I.e., it stops being given new work; it finishes up its existing work; then maintenance happens; then the member starts being given work again.</li>
<li>In the case of an unplanned outage, the redo log naturally comes into play. The pureScale twist on this is that a second small instance of DB2 is around &#8212; or is started up? &#8212; just to handle the redos.</li>
</ul>
<p>Also, IBM believes that the DB2 pureScale locking strategy gives availability and performance advantages vs. the Oracle RAC (Real Application Clusters) approach. The distinction IBM draws is that any member can take over the lock on a buffer page from any other member, just by attempting to change the page &#8212; and the attempt will succeed; only row-level locks can ever block work. Thus, if a node fails, I/O can merrily proceed on other nodes, without waiting for any recovery effort. IBM&#8217;s target is &lt;20 seconds for full row availability to be restored.</p>
<p>Obviously, it&#8217;s crucial that the Global Cluster Facility machines be fully mirrored, with no double failure &#8212; but so what? Modern computing systems have double-points-of-failure all over the place.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/05/06/db2-oltp-scale-out-purescale/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Oracle and Exadata: Business and technical notes</title>
		<link>http://www.dbms2.com/2011/05/03/oracle-exadata-business-technology/</link>
		<comments>http://www.dbms2.com/2011/05/03/oracle-exadata-business-technology/#comments</comments>
		<pubDate>Tue, 03 May 2011 08:19:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Emulation, transparency, portability]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Solid-state memory]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4361</guid>
		<description><![CDATA[Last Friday I stopped by Oracle for my first conversation since January, 2010, in this case for a chat with Andy Mendelsohn, Mark Townsend, Tim Shetler, and George Lumpkin, covering Exadata and the Oracle DBMS. Key points included:  Given Oracle’s market penetration and share, it makes sense that Oracle is focused on selling add-on products [...]]]></description>
			<content:encoded><![CDATA[<p>Last Friday I stopped by Oracle for my first conversation since January, 2010, in this case for a chat with Andy Mendelsohn, Mark Townsend, Tim Shetler, and George Lumpkin, covering Exadata and the Oracle DBMS. Key points included:  <span id="more-4361"></span></p>
<ul>
<li>Given Oracle’s market      penetration and share, it makes sense that<strong> Oracle is focused on selling      add-on products to its installed base.</strong> Oracle’s three top such      go-to-market emphases at the moment are:
<ul>
<li><strong>Database       consolidation,</strong> <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/">especially on Exadata</a>.</li>
<li><strong>Data warehousing,</strong> presumably on       Exadata.</li>
<li><strong>Database security,       especially encryption.</strong> This is not Exadata-specific, but does       exploit Intel Westmere on-chip encryption, which Oracle says allows       encryption with minimal overhead. This seems to be via something called <strong>Oracle Advanced Security.</strong></li>
</ul>
</li>
<li>Deleted*</li>
</ul>
<p><em>*Oracle asked me to delete a point on pricing they went out of their way to make, because they are in quiet period &#8212; even though nobody said it was confidential at the time, we weren&#8217;t under NDA, and it looks like public information to me anyway. Frankly, I&#8217;m not sure I was right to comply.<br />
</em></p>
<p>Oracle also told me quite a bit about Exadata onsite POCs (Proofs of Concept) and Exadata references, but I’ll save those subjects for future posts. The same goes for workload management.</p>
<p>Oracle&#8217;s version names and numbers can get confusing, but it turns out that:</p>
<ul>
<li>Oracle <span style="text-decoration: line-through;">11.203</span> 11.2.0.3 will come      out this fall. Oracle <span style="text-decoration: line-through;">11.204</span> 11.2.0.4 will come out a little more than a year      later. After that I imagine it will be time for Oracle 12.</li>
<li>The current versions of      Oracle Exadata are Exadata X2-2 and Exadata X2-8.
<ul>
<li>Oracle Exadata X2-2 is evolutionary from prior Exadata versions, and has 8 moderately big servers per rack. It can be sliced into half- or quarter-racks.</li>
<li>Oracle Exadata X2-8, in lieu of those 8 servers, has 2 bigger SMP (Symmetric MultiProcessing) systems, each with a terabyte of RAM. You can’t slice Exadata X2-8 below full-rack size, as you’d lose redundancy among the servers.</li>
</ul>
</li>
</ul>
<p>I didn’t really understand the discussion as to why certain workloads and/or workload consolidations go better on the SMP boxes of Exadata X2-8 than the blades of Exadata X2-2, but Oracle assures me that some do. I also suspect that some Oracle customers prefer large SMP boxes for no good reason other than familiarity.</p>
<p>As for recent-release adoption:</p>
<ul>
<li>Oracle estimates that<strong> 40-50% of customers have Oracle 11g running </strong>somewhere in their shops,      mainly Oracle 11g Release 2.</li>
<li>All major ISVs      (Independent Software Vendors) are certified on Oracle 11g, typically      Oracle 11g Release 2.</li>
<li>But Exadata      certification is something different from Oracle 11g certification; for      example, <strong>SAP certification on Exadata is still underway, </strong>targeted      for some time this year.</li>
</ul>
<p>Exadata obviously enjoys huge performance gains over existing Oracle installations for certain analytic queries, and therefore for some whole analytic workloads. Oracle has happily trumpeted these. But it turns out that Exadata’s OLTP (OnLine Transaction Processing) performance gains are less dramatic. This makes all kinds of sense, given that Oracle’s analytic query performance was in pretty bad shape pre-Exadata, while OLTP has been just fine. The range Oracle used was <strong>2-3X OLTP performance gains vs. existing Oracle installations on several-year-old hardware.</strong> Oracle says somewhere <strong>over 50% of Exadata physical I/O* goes against flash cache</strong> in use cases such as running Oracle’s application suite.</p>
<p><em>*Note that physical I/O may be only a small fraction of logical; e.g., SAP long ago said that <a href="../../../../../2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/">&gt;99% of SAP transactions never hit disk</a>.</em></p>
<p>Finally, we talked about a variety of options or other related products. Highlights included:</p>
<ul>
<li>One piece of the Oracle      security story is a new product called<strong> Oracle Database Firewall,</strong> released in January, based on an acquisition of a small startup last year.      Targeted primarily at internal hackers, Oracle Database Firewall sniffs      your SQL traffic for a week or so, observes what kinds of SQL statements      can be expected, builds a white list accordingly, and casts a jaundiced      eye on any other kind of SQL statements that come through.</li>
<li><em>Edit: I have no idea why I was told the following, in view of <a href="http://www.dbms2.com/2011/05/03/oracle-on-active-active-replication/">a subsequent email</a>.</em> <span style="text-decoration: line-through;"><strong>Oracle Active Data Guard, </strong>first introduced in the      Oracle 11g code line, is the preferred way to do active-active Oracle      replication. That said: </span>
<ul>
<li><span style="text-decoration: line-through;">Not a lot of customers       use Oracle Active Data Guard yet &#8230;</span></li>
<li><span style="text-decoration: line-through;">&#8230; but a considerable       fraction of Exadata users are at least interested in it.</span></li>
<li><span style="text-decoration: line-through;">Some number of Oracle       customers have other kinds of active-active implementation. One option is       via GoldenGate.</span></li>
</ul>
</li>
<li><strong>Oracle Cloud File Management System</strong> is an Oracle 11g feature/option that lets you manage non-Oracle data. It is related to ASM (Automatic Storage Management), which seems to have been the most popular Oracle 10g feature, and which is essential to Exadata. Oracle Cloud File Management System seems to be popular for consolidation uses. But it is not technically well suited to, for example, play the role of HDFS in a MapReduce implementation.</li>
<li>For DBAs who care,      Exadata now supports Solaris on the database server tier as well as Linux.      (That would be Solaris on Intel, of course; Exadata doesn&#8217;t use Sparc.)      The storage tier still runs only on a kind of embedded Linux.</li>
<li><strong>Oracle 11g Express Edition</strong> (free crippleware)      just went into beta test.</li>
<li>And finally, <strong>Oracle SQL Developer 3.0</strong> features,      among other things, a GUI for Oracle Data Mining, and migration tools.      Sybase migration is in there now, and was enhanced for SQL Developer 3.0.      Teradata migration is slated for the next release.</li>
</ul>
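<p>The learn-then-whitelist behavior described for Oracle Database Firewall in the list above can be illustrated with a toy sketch. This is not Oracle's implementation or API; the fingerprinting rules and every function name here are hypothetical, chosen only to show the general idea of reducing SQL statements to shapes, recording the shapes seen during a training period, and flagging anything else.</p>

```python
import re


def fingerprint(sql: str) -> str:
    """Reduce a statement to its shape by masking literals (hypothetical rules)."""
    shape = sql.strip().lower()
    shape = re.sub(r"'[^']*'", "?", shape)   # mask string literals
    shape = re.sub(r"\b\d+\b", "?", shape)   # mask numeric literals
    shape = re.sub(r"\s+", " ", shape)       # normalize whitespace
    return shape


def learn_whitelist(observed_statements):
    """Training period: remember every statement shape that was seen."""
    return {fingerprint(sql) for sql in observed_statements}


def is_suspicious(sql: str, whitelist) -> bool:
    """Enforcement: anything whose shape was never seen gets flagged."""
    return fingerprint(sql) not in whitelist


training_traffic = [
    "SELECT * FROM orders WHERE id = 42",
    "UPDATE orders SET status = 'shipped' WHERE id = 7",
]
whitelist = learn_whitelist(training_traffic)

normal_query = "select * from orders where id = 99"
injected_query = "SELECT * FROM orders WHERE id = 1 OR 1 = 1"
print(is_suspicious(normal_query, whitelist))
print(is_suspicious(injected_query, whitelist))
```

<p>A real product would presumably parse SQL properly rather than lean on regular expressions, but the sketch shows why a week or so of observed traffic is enough to build a usable baseline.</p>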
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/05/03/oracle-exadata-business-technology/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>The MongoDB story</title>
		<link>http://www.dbms2.com/2011/04/04/the-mongodb-story/</link>
		<comments>http://www.dbms2.com/2011/04/04/the-mongodb-story/#comments</comments>
		<pubDate>Mon, 04 Apr 2011 16:12:32 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[MongoDB and 10gen]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Parallelization]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4127</guid>
		<description><![CDATA[Along with CouchDB/Couchbase, MongoDB was one of the top examples I had in mind when I wrote about document-oriented NoSQL. Invented by 10gen, MongoDB is an open source, no-schema DBMS, so it is suitable for very quick development cycles. Accordingly, there are a lot of MongoDB users who build small things quickly. But MongoDB has [...]]]></description>
			<content:encoded><![CDATA[<p>Along with <a href="http://www.dbms2.com/2011/01/28/schooner-software-onl/">CouchDB/Couchbase</a>, MongoDB was one of the top examples I had in mind when I wrote about <a href="../../../../../2011/02/07/notes-on-document-oriented-nosql/">document-oriented NoSQL</a>. Invented by <a href="http://www.dbms2.com/2011/04/04/10gen-company-basics/">10gen</a>, MongoDB is an open source, no-schema DBMS, so it is suitable for very quick development cycles. Accordingly, there are a lot of MongoDB users who build small things quickly. But MongoDB has heftier uses as well, and naturally I&#8217;m focused more on those.</p>
<p>MongoDB&#8217;s data model is based on <a href="../../../../../2011/02/07/notes-on-document-oriented-nosql/#comment-207762">BSON, which seems to be JSON-on-steroids</a>. In particular:</p>
<ul>
<li>You just bang things into single BSON objects managed by MongoDB; there is nothing like a foreign key to relate objects. However &#8230;</li>
<li>&#8230; there are fields, datatypes, and so on <a href="http://www.mongodb.org/display/DOCS/BSON">within MongoDB BSON objects</a>. The fields are indexed.</li>
<li>There&#8217;s a multi-value/nested-data-structure flavor to MongoDB; for example, a BSON object might store multiple addresses in an array.</li>
<li><a href="../../../../../2010/11/29/document-database-without-joins/">You can&#8217;t do joins</a> in MongoDB. Instead, you are encouraged to put what might be related records in a relational database into a single MongoDB object. If that doesn&#8217;t suffice, then use client-side logic to do the equivalent of joins. If that doesn&#8217;t suffice either, you&#8217;re not looking at a good MongoDB use case.</li>
</ul>
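<p>The two modeling options in the list above, embedding related data in one object versus joining in client code, can be sketched roughly as follows. Plain Python dicts stand in for BSON documents, no MongoDB driver is involved, and the collection and field names (<code>customers</code>, <code>orders</code>, <code>customer_id</code>) are invented for illustration.</p>

```python
# Plain dicts stand in for BSON documents; no MongoDB driver is used here.
# All collection and field names are hypothetical.

# Option 1: embed what would be related rows inside a single document.
customer_embedded = {
    "_id": 1,
    "name": "Ada",
    "addresses": [  # multi-valued field stored as an array
        {"kind": "home", "city": "Boston"},
        {"kind": "work", "city": "Cambridge"},
    ],
}

# Option 2: keep two collections and join them in application code.
customers = [{"_id": 1, "name": "Ada"}, {"_id": 2, "name": "Grace"}]
orders = [
    {"_id": 10, "customer_id": 1, "total": 25},
    {"_id": 11, "customer_id": 1, "total": 40},
    {"_id": 12, "customer_id": 2, "total": 15},
]


def client_side_join(customers, orders):
    """Hash join done by the client: index one side, probe with the other."""
    by_id = {customer["_id"]: customer for customer in customers}
    joined = []
    for order in orders:
        customer = by_id.get(order["customer_id"])
        if customer is not None:  # inner-join semantics: skip unmatched orders
            joined.append({"name": customer["name"], "total": order["total"]})
    return joined


rows = client_side_join(customers, orders)
print(rows)
```

<p>With embedding, a single read returns everything; with the client-side join, the application pays for two fetches plus the matching logic, which is why it is the fallback rather than the default.</p>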
<p><span id="more-4127"></span>MongoDB has integrated MapReduce. Natural uses include:</p>
<ul>
<li>The usual kinds of transformations one might do via MapReduce.</li>
<li>Aggregations one might otherwise do in SQL (e.g. GROUP BY kinds of things are an obvious MapReduce fit).</li>
</ul>
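<p>As a rough illustration of why GROUP BY is a natural MapReduce fit, here is a small in-memory sketch of the pattern. MongoDB's own MapReduce takes JavaScript map and reduce functions; the Python below only mimics the shape of the computation, and the document fields (<code>page</code>, <code>hits</code>) are made up.</p>

```python
from collections import defaultdict

# Hypothetical documents; roughly the rows of a "page_hits" table.
docs = [
    {"page": "/home", "hits": 3},
    {"page": "/about", "hits": 1},
    {"page": "/home", "hits": 5},
]


def map_fn(doc):
    """Emit (group key, value) pairs, like the GROUP BY column and the summed column."""
    yield doc["page"], doc["hits"]


def reduce_fn(key, values):
    """Combine all values emitted for one key, like SUM(hits)."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """Single-process stand-in for the map, shuffle, and reduce phases."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)  # "shuffle": collect values by key
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Roughly: SELECT page, SUM(hits) FROM page_hits GROUP BY page
totals = map_reduce(docs, map_fn, reduce_fn)
print(totals)
```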
<p>Improved aggregation/MapReduce performance is a roadmap item.</p>
<p>However, Dwight said MongoDB has excellent performance in simple real-time reporting, for example updating a counter 10,000 times per second. When I asked him why, reasons included:</p>
<ul>
<li>A memory-mapped data model.</li>
<li>Deferred writes &#8212; a write might take a couple of seconds to actually persist.</li>
<li>Optimism &#8212; you don&#8217;t have to wait for an acknowledgement if you write something to the database.</li>
<li>“Upsert in place” – update in place without checking whether you&#8217;re doing an update or an insert.</li>
<li>General lack of overhead.</li>
</ul>
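<p>The counter example and the upsert item in the list above can be sketched with a plain dict standing in for a collection. In MongoDB itself this kind of operation is expressed as an update with the upsert option and the <code>$inc</code> operator; the helper below is a hypothetical stand-in that shows only the semantics, not the driver API, and says nothing about actual throughput.</p>

```python
def upsert_increment(collection: dict, key, field: str, amount: int = 1) -> None:
    """Increment a counter, creating the document first if it is missing.

    The caller never checks whether the document exists, which is the
    point of an upsert. `collection` is a dict standing in for a real one.
    """
    doc = collection.setdefault(key, {"_id": key})
    doc[field] = doc.get(field, 0) + amount


counters = {}
for _ in range(10_000):  # e.g. one second's worth of page-view events
    upsert_increment(counters, "homepage", "views")
upsert_increment(counters, "pricing", "views", amount=3)

print(counters["homepage"]["views"], counters["pricing"]["views"])
```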
<p>Inspired in part by <a href="http://www.dbms2.com/2011/01/28/schooner-software-onl/">Schooner&#8217;s</a> internal benchmarks, I&#8217;ve come to think that, apples-to-apples, even the simplest key-value store will have &lt; 3X single-node performance advantage over well-implemented MySQL. (Read, write, or blended.) The 10gen guys don&#8217;t dispute that. However, they point out that a single MongoDB request can process the equivalent of many relational rows, opening the possibility of much greater performance gains than that. In particular, there seem to be some Drupal implementations enjoying huge MongoDB-based speed-ups.</p>
<p>On a heavy-duty server (8-12 cores, 16-64 gigs of RAM), MongoDB can apparently do 20,000-30,000 writes or 100,000 reads per second. Improved concurrency and mixed read/write performance are coming, although obviously 10gen would think that MongoDB does pretty well in those areas already. The largest known MongoDB system does about 1 million reads/second. I would imagine that those figures require the database to fit into RAM, given:</p>
<ul>
<li>MongoDB&#8217;s memory-mapped architecture.</li>
<li>This <a href="http://nosql.mypopescu.com/post/1265191137/foursquare-mongodb-outage-post-mortem">post mortem of the MongoDB Foursquare outage</a>.</li>
</ul>
<p>I&#8217;ve gotten conflicting signals as to whether there are any multi-hundred-node MongoDB deployments. But there are &#8220;lots&#8221; of 20-40 node ones, as well as lots under 10 nodes. Note that 1 million reads/second naively sounds as if it could be achieved on 10 MongoDB nodes &#8212; but I&#8217;d guess that&#8217;s not really the configuration. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>MongoDB&#8217;s scale-out story starts with <a href="../../../../../2011/02/24/transparent-sharding/">transparent sharding</a>, although I think transparent sharding was a new feature in MongoDB 1.6 last summer. So far as I understand, what MongoDB sacrifices in the CAP Theorem is the P &#8212; partitions happen, and when they do you might not be able to write to the shard you want to. A MongoDB shard can have multiple nodes, in a master-slave set-up, and there&#8217;s some flexibility as to how inter-node consistency is handled.</p>
<p>MongoDB functionality futures include full-text search, and extensions to the general query language.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/04/04/the-mongodb-story/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

