<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Storage</title>
	<atom:link href="http://www.dbms2.com/category/storage/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Tue, 07 Feb 2012 06:49:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>MarkLogic 5, and why you might care</title>
		<link>http://www.dbms2.com/2011/11/01/marklogic-version-5/</link>
		<comments>http://www.dbms2.com/2011/11/01/marklogic-version-5/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 04:03:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MarkLogic]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Structured documents]]></category>
		<category><![CDATA[Text]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5560</guid>
		<description><![CDATA[MarkLogic is releasing MarkLogic 5. Key elements of the announcement are: More-of-the-same in line with MarkLogic’s core positioning. A new bi-directional Hadoop connector. A free MarkLogic Express edition, limited in license terms more than in actual features, as per Slide 27 of the deck MarkLogic graciously supplied for me to post. Also, MarkLogic is early [...]]]></description>
			<content:encoded><![CDATA[<p>MarkLogic is releasing MarkLogic 5. Key elements of the announcement are:</p>
<ul>
<li>More-of-the-same in line with MarkLogic’s core positioning.</li>
<li>A new bi-directional Hadoop connector.</li>
<li>A free MarkLogic Express edition, limited in license terms more than in actual features, as per Slide 27 of <a href="http://www.monash.com/uploads/MarkLogic-5-Deck.pptx">the deck MarkLogic graciously supplied for me to post</a>.</li>
</ul>
<p>Also, MarkLogic is early with a feature that most serious DBMS vendors will soon have – support for tiered storage, with writes going first to solid-state storage, then being flushed to disk via a caching-style algorithm.* And as befits a sometime search-engine-substitute, MarkLogic has finally licensed a large set of document filters, from an Australian company called <a href="http://www.isys-search.com/index.html">Isys</a>. Apparently, the special virtue of the Isys filters is that they’re good at extracting not only text, but metadata as well.</p>
<p><em>*If there’s a caching algorithm that doesn’t contain a major element of LRU (Least Recently Used), I don’t recall ever hearing about it.</em></p>
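<p>To make the tiered-storage idea concrete, here is a minimal sketch of a write-back store with LRU eviction: writes land in a small fast tier, and the least recently used entries get flushed to a slower tier once the fast tier is full. This is not MarkLogic's implementation, which the announcement doesn't describe in detail; the class name <code>TieredStore</code>, its methods, and the <code>fast_capacity</code> parameter are all hypothetical.</p>

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a small fast tier (think SSD) in front of a slow tier (think disk).

    Hypothetical sketch only; not based on any vendor's actual design.
    """

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # ordered from least to most recently used
        self.slow = {}             # stands in for the disk tier

    def write(self, key, value):
        # Writes always go to the fast tier first.
        self.fast[key] = value
        self.fast.move_to_end(key)
        self._flush_if_needed()

    def read(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # a read counts as a use
            return self.fast[key]
        return self.slow[key]           # raises KeyError if the key was never written

    def _flush_if_needed(self):
        # Evict least recently used entries to the slow tier until we fit.
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value


store = TieredStore(fast_capacity=2)
store.write("a", 1)
store.write("b", 2)
store.read("a")       # "a" is now more recently used than "b"
store.write("c", 3)   # fast tier is over capacity, so "b" is flushed
print(sorted(store.fast), sorted(store.slow))  # ['a', 'c'] ['b']
```

<p>A real system would also have to deal with durability, concurrency, and promoting hot data back up a tier, none of which this sketch attempts.</p>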
<p>MarkLogic seems to have settled on a positioning that, although distressingly buzzword-heavy, is at least partly based upon reality. The real part includes:</p>
<ul>
<li>MarkLogic is a serious, enterprise-class DBMS (see for example Slide 12 of <a href="http://www.monash.com/uploads/MarkLogic-5-Deck.pptx">the MarkLogic deck</a>) …</li>
<li>… which has been optimized from the get-go for <a href="../../../../../2011/05/17/poly-structured-database/">poly-structured data</a>.</li>
<li>MarkLogic can and does scale out to handle large amounts of data.</li>
<li>MarkLogic is a general-purpose DBMS, suitable for <a href="../../../../../2011/03/30/short-request-and-analytic-processing/">both short-request and analytic tasks</a>.</li>
<li>MarkLogic is particularly well suited for analyses with long chains of “progressive enhancement” (MarkLogic’s favorite term when talking about <a href="../../../../../2011/05/30/another-category-of-derived-data/">derived data</a>).</li>
<li><a href="http://blogs.avalonconsult.com/blog/search/is-marklogic-a-search-engine/">MarkLogic often plays the role of a content assembler and/or search engine</a>, and the people who use MarkLogic in those ways are commonly doing things that can be described as research and analysis.</li>
</ul>
<p>Based on that reality, MarkLogic talks a lot about Volume, Velocity, Variety, Big Data, unstructured data, semi-structured data, and big data analytics.</p>
<p><span id="more-5560"></span><em>My <a href="../../../../../2010/11/29/marklogic-and-its-document-dbms/">November, 2010 overview of MarkLogic technology</a> remains pretty relevant. One correction, however: Node heterogeneity configurations, in which “data” and “evaluation” nodes reside on separate servers, are the exception rather than the rule.</em></p>
<p>Like <a href="../../../../../2011/10/18/vertica-community-edition/">Vertica</a>, MarkLogic has laudably said that true academic researchers can get MarkLogic for free without the severe license restrictions. Free MarkLogic should be of particular interest to researchers who:</p>
<ul>
<li>Are studying natural networks or graphs, such as social networks or biological pathways. (This might be a fit in the social or biological sciences.)</li>
<li>Are managing metadata for, say, a variety of disparate kinds of experimental files. (This might be a fit anywhere in the natural sciences.)</li>
<li>Are managing actual documents, images, videos, etc., or data about such things. (This might be a fit in the humanities or social sciences.)</li>
</ul>
<p>MarkLogic provided some disclosable financial substance by email, which I shall quote verbatim:</p>
<ul>
<li><em>MarkLogic has 45% revenue growth and 55-60% license growth year over year.</em></li>
<li><em>We expect to finish this year with over $85 million in revenue, up from $55 million last year.</em></li>
</ul>
<p>Arithmetical purists might note that 85/55 is more than 145%, but I’m just going to settle for the information I got and move on.</p>
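<p>For anyone who wants to check the arithmetic, a quick sketch using only the two revenue figures quoted above (the variable names are mine):</p>

```python
revenue_last_year = 55  # $ millions, from the quoted email
revenue_this_year = 85  # $ millions, expected ("over $85 million")

# Year-over-year growth implied by the two revenue figures.
growth = (revenue_this_year - revenue_last_year) / revenue_last_year
print(f"{growth:.1%}")  # 54.5%
```

<p>That works out to roughly 54.5% growth, which is closer to the quoted license growth range than to the quoted 45% revenue growth.</p>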
<p><em>Edit: I posted separately about the <a href="http://www.dbms2.com/2011/11/03/marklogic-hadoop-connector/">MarkLogic Hadoop connector.</a></em> <span style="text-decoration: line-through;">As for that Hadoop connector – stay tuned for a short follow-up post, as writing about it now would not be convenient. (My backup discipline isn’t what it should be, and the only copy of my notes about that product is on a heavy tower computer in a house that doesn’t have working power.)</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/01/marklogic-version-5/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>HP systems soundbites</title>
		<link>http://www.dbms2.com/2011/09/22/hp-systems-soundbites/</link>
		<comments>http://www.dbms2.com/2011/09/22/hp-systems-soundbites/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 17:44:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Exadata]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5314</guid>
		<description><![CDATA[It is widely rumored that there will be a leadership change at HP (Meg Whitman in, Leo Apotheker out). In connection with that, I found myself holding forth on points such as: HP needs to make outstanding enterprise systems again. They fell away from that target under Mark Hurd, but they surely can hit it [...]]]></description>
			<content:encoded><![CDATA[<p>It is widely rumored that there will be a leadership change at HP (Meg Whitman in, Leo Apotheker out). In connection with that, I found myself holding forth on points such as:</p>
<ul>
<li>HP needs to make outstanding enterprise systems again.</li>
<li>They fell away from that target under Mark Hurd, but they surely can hit it again, based on the remnants of DEC (Digital Equipment Corporation), Tandem, the higher-end part of Compaq, and of course the original HP systems group.</li>
<li>In particular:
<ul>
<li>Rumors say that Oracle Exadata 1 boxes, made by HP, were much lower quality than Exadata 2 boxes made by Sun.</li>
<li>HP Neoview was a waste of good engineering talent.</li>
<li>I&#8217;d like to see a few excellent Vertica appliances.</li>
<li>I hope the SAP HANA appliances go well, whenever HANA finally becomes a serious product.</li>
<li>The general move from disk to solid-state memory should offer some opportunities.</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/22/hp-systems-soundbites/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Are there any remaining reasons to put new OLTP applications on disk?</title>
		<link>http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/</link>
		<comments>http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/#comments</comments>
		<pubDate>Mon, 19 Sep 2011 18:07:07 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5257</guid>
		<description><![CDATA[Once again, I&#8217;m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include: 100s of gigabytes of data at first, growing to &#62;1 terabyte over time. High peak loads. Public cloud portability (but they have private data centers they can use today). Simple database design &#8212; not a lot [...]]]></description>
			<content:encoded><![CDATA[<p>Once again, I&#8217;m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include:</p>
<ul>
<li>100s of gigabytes of data at first, growing to &gt;1 terabyte over time.</li>
<li>High peak loads.</li>
<li>Public cloud portability (but they have <strong>private data centers they can use today).</strong></li>
<li>Simple database design &#8212; not a lot of tables, not a lot of columns, not a lot of joins, and everything can be distributed on the same customer_ID key.</li>
<li>Stream the data to a data warehouse that will grow to a few terabytes. (Keeping only one year of OLTP data online actually makes sense in this application, but of course everything should go into the DW.)</li>
</ul>
<p>So I&#8217;m leaning toward saying: <span id="more-5257"></span></p>
<ul>
<li>They should go with a scalable, MySQL-based solution.
<ul>
<li>Lots of third-party software works with MySQL, in case that&#8217;s helpful.</li>
<li>Yes, any one vendor is small and not yet firmly established, but there are numerous vendors around with interesting MySQL scaling stories.</li>
<li>In a vendor emergency, just going with Oracle&#8217;s MySQL stuff would probably work &#8230;</li>
<li>&#8230; especially because there are these lovely things in the world called <strong>solid-state drives.</strong></li>
<li>There&#8217;s also good escapability if one wants to move away from MySQL, because everybody knows how to handle MySQL data.</li>
</ul>
</li>
<li>The first product to look at is dbShards, because it meets all the topology needs:
<ul>
<li>Local scale-out (<a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">transparent sharding</a>).</li>
<li><a href="http://www.dbms2.com/2011/02/09/clarification-on-dbshards-shard-replication/">Local high availability</a>.</li>
<li>Remote disaster recovery (details of that are underway).</li>
</ul>
</li>
<li>The first analytic DBMS to look at is Infobright.
<ul>
<li>Yes, I know Infobright is focused more on machine-generated data these days, but this client&#8217;s analytic needs are so straightforward Infobright should pass with flying colors.</li>
<li>The MySQL-to-MySQL aspect should make ETL dead simple.</li>
<li>Again, there&#8217;s escapability.</li>
</ul>
</li>
</ul>
<p>Mainly, this is all fine. But I&#8217;m getting pushback on the solid-state aspect, for fear that it will compromise public cloud portability.</p>
<p>Am I missing something here? As far as I&#8217;m concerned, <strong>if you&#8217;re planning an OLTP system with a many-year lifespan today, </strong>of course <strong>you should assume solid-state storage.</strong> Maybe you scale out just as far as you would with disk, striping indexes or entire databases across the RAM of multiple servers. In that case, having solid-state backing reduces the risk of bottlenecks. Maybe you don&#8217;t scale out as far as you would with disk. In that case, solid-state backing saves you money.</p>
<p><strong>As for public-cloud support for solid-state storage, that&#8217;s coming fast, right? </strong>(Actually, I have data points in support of that theory, but they&#8217;re a bit tenuous.) A large fraction of web businesses with private data centers seem to be using solid-state storage &#8212; from Facebook on down &#8212; or so the NoSQL/NewSQL/<a href="http://www.dbms2.com/2011/03/02/short-request-processing/">short-request</a> DBMS guys tell me. Surely a number of public cloud vendors are close behind.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Kaminario goes (mainly) flash</title>
		<link>http://www.dbms2.com/2011/09/14/kaminario-goes-mainly-flash/</link>
		<comments>http://www.dbms2.com/2011/09/14/kaminario-goes-mainly-flash/#comments</comments>
		<pubDate>Wed, 14 Sep 2011 09:30:53 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Kaminario]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Solid-state memory]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5227</guid>
		<description><![CDATA[Kaminario, which used to be in the business of solid state storage via DRAM, now is emphasizing hybrid DRAM/flash storage appliances instead. The reason is evidently price. Per terabyte of primary storage (before mirroring onto disk and so on): A Kaminario K2 DRAM-only appliance costs $100K. A Kaminario K2 flash-only appliance costs $30K (but nobody [...]]]></description>
			<content:encoded><![CDATA[<p>Kaminario, which used to be in the business of solid state storage via DRAM, now is emphasizing hybrid DRAM/flash storage appliances instead. The reason is evidently price. <strong>Per terabyte of primary storage</strong> (before mirroring onto disk and so on):</p>
<ul>
<li>A Kaminario K2 DRAM-only appliance costs <strong>$100K.</strong></li>
<li>A Kaminario K2 flash-only appliance costs $30K (but nobody buys that configuration).</li>
<li>A typical Kaminario K2 hybrid DRAM/flash appliance might cost <strong>$35K</strong> (which tells us that there&#8217;s a lot more flash than DRAM).</li>
</ul>
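<p>The "a lot more flash than DRAM" inference can be sketched as a back-of-the-envelope calculation. This assumes the hybrid price is a simple capacity-weighted blend of the DRAM-only and flash-only prices per terabyte, which ignores chassis, software, and any other fixed costs, so treat the result as a rough indication only:</p>

```python
dram_price = 100_000    # $ per TB, DRAM-only
flash_price = 30_000    # $ per TB, flash-only
hybrid_price = 35_000   # $ per TB, "typical" hybrid

# Assumed model: hybrid = x * dram + (1 - x) * flash, solved for the DRAM fraction x.
dram_fraction = (hybrid_price - flash_price) / (dram_price - flash_price)
print(f"DRAM share of capacity: {dram_fraction:.1%}")  # DRAM share of capacity: 7.1%
```

<p>Under that assumption, a typical hybrid configuration would be about 1/14 DRAM by capacity, with flash making up the rest.</p>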
<p>Kaminario positions DRAM as where you focus your most write-intensive/ bottlenecking loads, such as logging or <a href="../../../../../2010/08/16/vertica-flash-temp-space/">temp space</a>, with the primary benefit being performance and a secondary benefit being slowing the wear on your flash.</p>
<p><span id="more-5227"></span><em>If you want even your mirrors to be on flash &#8212; which Kaminario says greatly reduces the temporary performance hit in case of a failure &#8212; there will be an additional charge. Perhaps Kaminario will dig up a price number and post it in the comment thread.</em></p>
<p>The flash comes in via Fusion-io cards. Kaminario stresses that it sells a SAN (Storage Area Network) kind of offering, as opposed to the shared-nothing way one might otherwise use Fusion-io cards in servers&#8217; PCIe slots. Kaminario further asserts its built-in high availability is both smoother and less costly than Texas Memory Systems or Violin Memory alternatives; Kaminario is generally proud of its high availability features, down to redundant uninterruptible power supplies. Apparently the sweet spot of Kaminario&#8217;s market is single-chassis 5-6 TB systems, but Kaminario asserts seamless elasticity even if you grow into a second chassis.</p>
<p>Price resistance seems to have gotten strongly in the way of Kaminario&#8217;s growth, although the company was evasive about customer counts and the like. But it does now have 60+ employees and an aggressive hiring plan, vs. &lt;50 when <a href="../../../../../2010/10/19/introduction-to-kaminario/">I wrote about Kaminario a year ago</a>. I do believe that many enterprises would benefit from<strong> throwing solid-state storage at certain performance problems,</strong> at least as a band-aid, while they contemplate software changes.* But evidently Kaminario has had difficulties &#8212; especially at the DRAM-only price point &#8212; getting customers to agree, or at least to agree that Kaminario K2 was a sufficiently cost-effective way to address the issue.</p>
<p><em>*If you like, you can regard this as <strong>deferring repayment of your technical debt.</strong></em></p>
<p>Kaminario&#8217;s comments about how its technology is or will be applied are all over the place (again, I think part of this is due to having a small number of customers overall, and wanting to conceal how small that number is). But in general Kaminario has seen more OLTP (OnLine Transaction Processing) than analytic uptake, which contributes to them thinking that low latency is a bigger deal than raw IOPS (Input/Output operations Per Second). Certainly Kaminario is focused on database applications of some kind or other, generally running on big-name DBMS such as Oracle or Microsoft SQL Server.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/14/kaminario-goes-mainly-flash/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Couchbase technical update</title>
		<link>http://www.dbms2.com/2011/08/13/couchbase-technical-update/</link>
		<comments>http://www.dbms2.com/2011/08/13/couchbase-technical-update/#comments</comments>
		<pubDate>Sun, 14 Aug 2011 04:08:03 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5081</guid>
		<description><![CDATA[My Couchbase business update with Bob Wiederhold was very interesting, but it didn&#8217;t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven&#8217;t had their designs locked down yet anyway. But here&#8217;s at least a partial explanation [...]]]></description>
			<content:encoded><![CDATA[<p>My <a href="http://www.dbms2.com/2011/08/13/couchbase-business-update/">Couchbase business update</a> with Bob Wiederhold was very interesting, but it didn&#8217;t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven&#8217;t had their designs locked down yet anyway. But here&#8217;s at least a partial explanation of what&#8217;s up.</p>
<p>memcached is a way to cache data in RAM across a cluster of servers and have it all look logically like a single memory pool, extremely popular among large internet companies. The Membase product &#8212; which is what Couchbase has been selling this year &#8212; adds persistence to memcached, an obvious improvement on requiring application developers to write both to memcached and to <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">non-transparently-sharded MySQL</a>. The main technical points in adding persistence seem to have been:</p>
<ul>
<li>A <strong>persistent backing store</strong> (duh), namely SQLite.</li>
<li>A <strong>change to the hashing algorithm,</strong> to avoid losing data when the cluster configuration is changed.</li>
</ul>
<p>Couchbase is essentially Membase improved by integrating CouchDB into it, with the main changes being:</p>
<ul>
<li><strong>Changing the backing store to CouchDB</strong> (duh). This will be in the first Couchbase release.</li>
<li><strong>Adding cross data center replication on CouchDB&#8217;s consistency model.</strong> This will not, I believe, be in the first Couchbase release.</li>
<li><strong>Offering CouchDB&#8217;s programming and query interfaces as an option.</strong> So far as I can tell, this will be implemented straightforwardly in the first Couchbase release, with elegance planned for later down the road.</li>
</ul>
<p>Let&#8217;s drill down a bit into <strong>Membase/Couchbase clustering and consistency. </strong><span id="more-5081"></span></p>
<ul>
<li>When data is written to RAM in memcached, it immediately gets copied to another server. The same is of course true in Membase/Couchbase. The terminology on all this is confusing, but I think:
<ul>
<li>The portion of data that is stored as a primary copy on any given server is called a &#8220;shard&#8221;.</li>
<li>That would seem to make sense, as that data could correspond to what goes &#8212; <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">non-transparently</a> &#8212; into an instance of MySQL in a classical memcached/MySQL set-up.</li>
</ul>
</li>
<li>Updates are of course also banged to disk ASAP &#8212; but at times of heavy load, that can take a while. A few seconds to a couple of minutes is normal operation; if it takes an hour, you really should buy more hardware. (Or solid-state storage.)</li>
<li>Similarly, the replication of data to a second machine&#8217;s RAM may not happen at times of heavy load &#8212; and that&#8217;s another sign you don&#8217;t have enough machines.</li>
<li>Each Membase/Couchbase &#8220;shard&#8221; has lots of logical sub-shards.* (1024 for now, at least as default, although Dustin finds that number excessive and is looking to lower it.) So if you add a node, some of the sub-shards get sent over to the new node. Unlike the case for straight memcached, no data is lost from cache (nor, of course, from the persistent store). Blocking of operations from such a move only happens in narrow time windows, and then only in edge cases.</li>
</ul>
<p><em>*Edit: They&#8217;re called <a href="http://dustin.github.com/2010/06/29/memcached-vbuckets.html">vbuckets</a>.</em></p>
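<p>Here is a minimal sketch of why a fixed number of sub-shards helps when the cluster changes. Keys hash to one of 1024 sub-shards, and a separate map says which server owns each sub-shard; adding a node only reassigns some map entries, while each key's sub-shard never changes. The hash function (CRC32), the round-robin initial layout, and the rebalancing policy are all assumptions of mine for illustration, not a description of the actual Membase code.</p>

```python
import zlib

NUM_VBUCKETS = 1024  # the default mentioned above


def vbucket_for(key):
    # Any stable hash works for the sketch; CRC32 is an assumption here.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_map(servers):
    # Deal vbuckets out round-robin across the initial servers.
    return [servers[i % len(servers)] for i in range(NUM_VBUCKETS)]


def add_server(vbucket_map, new_server):
    """Hand the new server a fair share of vbuckets; return (new_map, moved_ids)."""
    servers = sorted(set(vbucket_map)) + [new_server]
    target = NUM_VBUCKETS // len(servers)
    new_map = list(vbucket_map)
    moved = []
    counts = {s: vbucket_map.count(s) for s in servers}
    for vb, owner in enumerate(vbucket_map):
        if len(moved) == target:
            break
        if counts[owner] > target:   # only take from servers above their fair share
            new_map[vb] = new_server
            counts[owner] -= 1
            moved.append(vb)
    return new_map, moved


vmap = build_map(["node-a", "node-b"])
key = "user:42"
vb = vbucket_for(key)
new_map, moved = add_server(vmap, "node-c")
print(len(moved), "of", NUM_VBUCKETS, "vbuckets moved")
print(key, "is in vbucket", vb, "owned by", vmap[vb], "before and", new_map[vb], "after")
```

<p>Only about a third of the sub-shards change owner when going from two nodes to three, and each of those can be copied over as a unit. Contrast that with a naive scheme that hashes keys directly modulo the number of servers, where most keys would suddenly map to a different server.</p>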
<p>So if we consider Membase technology alone, Couchbase is CA in the CAP Theorem. CouchDB, however, is gloriously AP in the CAP Theorem, in that it was written to assume an occasionally connected topology.* Based on that, Couchbase will allow AP operation between data centers (i.e. &#8220;stay synchronized if you can, to within the limitations of physics and so on, but don&#8217;t beat yourself up on the rare occasions that you can&#8217;t.&#8221;) I don&#8217;t know that that capability will quite be in the first release of Couchbase, but it&#8217;s coming soon.</p>
<p><em>*CouchDB also has other features friendly to occasionally-connected use cases, such as a lot of flexibility as to which parts of the database are or aren&#8217;t synced when you do reconnect. These are at the heart of the Couchbase Mobile offering.</em></p>
<p>memcached and Membase have a very simple key-value interface. CouchDB adds secondary indexes and so on. I think in the first release of Couchbase this is pretty much like having two different APIs for the same product; more elegant integration is planned down the road, and more language support as well.</p>
<p>The highest-performing way to use Couchbase will probably always be to just pretend it is Membase, which is to say memcached+. Dustin told me of Membase users who demanded 10-40 millisecond response times, and that not even for single queries but rather for sequences of several queries in succession. He further told me of customers asking for 1-200 microsecond response, and insisting on no worse than 1 millisecond. Frankly, the first requirement could be met by lots of technologies I can think of, at least if you don&#8217;t rely on disk; the second is thoroughly impossible if you rely on disk, and pretty demanding no matter what kind of hardware and storage you have.</p>
<p>Couchbase performance against disk is a work in progress. CouchDB started out 8X slower than SQLite as a backing store, apples to apples, but Couchbase is fixing that before they roll the product out. (After all, they wouldn&#8217;t want to slow the product down in the course of an upgrade.) Beyond that, when you do exploit the indexing capability of CouchDB, performance of course slows down. Work is underway to lower the performance hit; I imagine much improvement can indeed be made, given how few resources CouchDB has been able to devote to date to <a href="http://www.dbms2.com/2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/08/13/couchbase-technical-update/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>MongoDB users and use cases</title>
		<link>http://www.dbms2.com/2011/07/27/mongodb-users-and-use-cases/</link>
		<comments>http://www.dbms2.com/2011/07/27/mongodb-users-and-use-cases/#comments</comments>
		<pubDate>Wed, 27 Jul 2011 18:14:36 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MongoDB and 10gen]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5031</guid>
		<description><![CDATA[I spoke with Eliot Horowitz and Max Schierson of 10gen last month about MongoDB users and use cases. The biggest clusters they came up with weren&#8217;t much over 100 nodes, but clusters an order of magnitude bigger were under development. The 100 node one we talked the most about had 33 replica sets, each with [...]]]></description>
			<content:encoded><![CDATA[<p>I spoke with Eliot Horowitz and Max Schierson of 10gen last month about MongoDB users and use cases. The biggest clusters they came up with weren&#8217;t much over 100 nodes, but clusters an order of magnitude bigger were under development. The 100 node one we talked the most about had 33 replica sets, each with about 100 gigabytes of data, so that&#8217;s in the 3-4 terabyte range total. In general, the largest MongoDB databases are 20-30 TB; I&#8217;d guess those really do use the bulk of available disk space.   <span id="more-5031"></span></p>
<p>10gen recommends solid-state storage in many cases. In some cases solid-state lets you get away with fewer total nodes. 10gen also likes Flashcache (Facebook-developed technology to put a flash cache in front of hard disks). But the 100-node example mentioned above uses spinning disk.</p>
<p>Use cases 10gen is proud of include:</p>
<ul>
<li>Lots of user profile maintenance, including at online ad companies. This includes full user ad impression data. (I&#8217;ve argued for a while that <a href="../../../../../2010/09/17/jp-morgan-chase-oracle-database-outage/">user profile information belongs in something like a NoSQL database</a>.)</li>
<li>A big-name web company that wants to inspect every packet that enters their network, and replaced Splunk with MongoDB for performance reasons.</li>
<li>A big-name photo/video site whose metadata is all in MongoDB. (That&#8217;s the kind of thing that often makes for good <a href="../../../../../2011/05/30/another-category-of-derived-data/">MarkLogic</a> use cases.)</li>
</ul>
<p>But actually, the reason we had the call was to review cases where MongoDB&#8217;s <strong>schemaless</strong> nature was significant. Examples of those included:</p>
<ul>
<li>A couple of top examples were of the kind &#8220;A bunch of apps, similar but not the same.&#8221; For MTV, it&#8217;s a single content management system for a bunch of websites. For Disney Playdom, it&#8217;s different schemas for every game.</li>
<li>For a wireless telco, the issue was a product catalog in which devices and service plans called for very different schemas, and which the telco felt had thus become unmanageable in Oracle.</li>
<li>For Craigslist, the issue wasn&#8217;t programming so much as performance &#8212; <a href="http://blog.zawodny.com/2010/04/27/i-want-a-new-data-store/">ALTER TABLE operations took months in MySQL</a>, and that&#8217;s not a typo, although I&#8217;ll confess to not understanding why this was the case.</li>
</ul>
<p>The 10gen guys went on to claim that schemalessness is helpful for incremental development in general, the point being that you don&#8217;t have a database-modification step. To some extent, changes can even be rolled back more easily than if you actually changed your schemas.</p>
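<p>A toy illustration of the "similar but not the same" point, using nothing but plain Python dictionaries to stand in for documents. No MongoDB driver is involved, and the field names are made up:</p>

```python
# A "collection" modeled as a plain list of dicts; no MongoDB driver involved.
games = [
    {"game": "puzzle", "player": "ann", "level": 12},
    {"game": "racing", "player": "bob", "best_lap_seconds": 71.4, "car": "coupe"},
]

# Incremental development: a new field appears on new documents only,
# with no schema-modification step for the existing ones.
games.append({"game": "puzzle", "player": "cy", "level": 3, "hints_used": 2})

# Readers have to tolerate missing fields, which is the price of the flexibility.
hints = [doc.get("hints_used", 0) for doc in games if doc["game"] == "puzzle"]
print(hints)  # [0, 2]
```

<p>The flip side is visible in the last step: the application code, rather than the database, takes on the job of coping with documents that lack a given field.</p>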
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/27/mongodb-users-and-use-cases/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Hadoop hardware and compression</title>
		<link>http://www.dbms2.com/2011/07/06/hadoop-hardware-and-compression/</link>
		<comments>http://www.dbms2.com/2011/07/06/hadoop-hardware-and-compression/#comments</comments>
		<pubDate>Wed, 06 Jul 2011 05:09:10 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Zettaset]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4899</guid>
		<description><![CDATA[A month ago, I posted about typical Hadoop hardware. After talking today with Eric Baldeschwieler of Hortonworks, I have an update. I also learned some things from Eric and from Brian Christian of Zettaset about Hadoop compression. First the compression part. Eric thinks 6-10X compression is common for &#8220;curated&#8221; Hadoop data &#8212; i.e., the data [...]]]></description>
			<content:encoded><![CDATA[<p>A month ago, I posted about <a href="../../../../../2011/06/04/hardware-for-hadoop/">typical Hadoop hardware</a>. After talking today with Eric Baldeschwieler of Hortonworks, I have an update. I also learned some things from Eric and from Brian Christian of Zettaset about Hadoop compression.</p>
<p>First the compression part. Eric thinks 6-10X compression is common for &#8220;curated&#8221; Hadoop data &#8212; i.e., the data that actually gets used a lot. Brian used an overall figure of 6-8X, and told of a specific customer who had 6X or a little more. By way of comparison, it sounds as if the kinds of data involved are like what <a href="../../../../../2008/09/24/vertica-finally-spells-out-its-compression-claims/">Vertica claimed 10-60X compression</a> for almost three years ago.</p>
<p>Eric also made an excellent point about low-value <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>. I was suggesting that as Moore&#8217;s Law made sensor networks ever more affordable:  <span id="more-4899"></span></p>
<ul>
<li>There would be lots more data thrown off.</li>
<li>A lot of it would be repetitive &#8220;I&#8217;m fine; nothing to report&#8221; kinds of events.</li>
<li>It would be a good idea to filter this low-value information out rather than permanently storing it.</li>
</ul>
<p>Eric retorted that such data compresses extremely well. He was, of course, correct. If you have a long sequence or other large amount of identical data, and the right compression algorithms* &#8212; yeah, that compresses really well.</p>
<p><em>*Think run-length encoding (RLE), delta, or tokenization with variable-length tokens.</em></p>
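<p>To make Eric&#8217;s point concrete, here is a minimal Python sketch of run-length encoding. The sensor feed and the event names are made up for illustration, and this is not a claim about which codecs Hadoop users actually run.</p>

```python
from itertools import groupby


def rle_encode(events):
    """Collapse each run of identical events into a (value, run_length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(events)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical sensor feed: mostly "nothing to report" heartbeats.
events = ["ok"] * 1000 + ["overheat"] + ["ok"] * 500
encoded = rle_encode(events)

print(len(events), "events stored as", len(encoded), "runs")
```

<p>The long runs of identical heartbeats collapse to one pair each, which is why this kind of low-value data costs so little to keep.</p>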
<p>While I was at it, I asked Eric what might be typical for Hadoop temp/working space. He said at Yahoo it was getting down to 1/4 of the disk, from a previous figure of 1/3.</p>
<p>Anyhow, Yahoo&#8217;s most recent standard Hadoop nodes feature:</p>
<ul>
<li>8-12 cores</li>
<li>48 gigabytes of RAM</li>
<li>12 disks of 2 or 3 TB each</li>
</ul>
<p>If you divide the 12 disks by 3 for standard Hadoop redundancy, and take off 1/4 for temp space, then you have 6-9 TB/node. Multiply that by a compression factor of 6-10X, at least for the &#8220;curated data,&#8221; and you get to 36-90 TB of user data per node.</p>
<p>As an alternative, suppose we take a point figure from <a href="http://www.dbms2.com/2011/06/04/hardware-for-hadoop/">Cloudera&#8217;s ranges</a> of 16 TB of spinning disk per node (8 spindles, 2 TB/disk). Go with the 6X compression figure. Lop off 1/3 for temp space, and divide by 3 for redundancy as before. That more conservative calculation leaves us a bit over 20 TB/node, which is probably a more typical figure among today&#8217;s Hadoop users.</p>
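<p>The two back-of-the-envelope calculations above can be sketched in a few lines of Python. The function and parameter names are mine, and I&#8217;m assuming the same 3X redundancy applies to the Cloudera-based figure, since that is what makes it come out a bit over 20 TB.</p>

```python
def user_tb_per_node(raw_tb, temp_fraction, compression, replication=3):
    """Uncompressed user data (TB) that fits on one node.

    raw_tb        -- raw spinning disk per node
    temp_fraction -- share of disk reserved for temp/working space
    compression   -- compression ratio (6 means 6X)
    replication   -- copies kept of each block (3X assumed throughout)
    """
    return raw_tb * (1 - temp_fraction) / replication * compression


# Yahoo-style node: 12 disks of 2-3 TB, 1/4 temp space, 6-10X compression.
yahoo_low = user_tb_per_node(12 * 2, 1 / 4, 6)
yahoo_high = user_tb_per_node(12 * 3, 1 / 4, 10)

# Cloudera-style node: 8 disks of 2 TB, 1/3 temp space, 6X compression.
cloudera = user_tb_per_node(8 * 2, 1 / 3, 6)

print(f"Yahoo-style: {yahoo_low:.0f}-{yahoo_high:.0f} TB/node")
print(f"Cloudera-style: {cloudera:.1f} TB/node")
```

<p>Under those assumptions the Yahoo-style node works out to 36-90 TB and the Cloudera-style node to roughly 21 TB. Changing any one input, such as the temp-space fraction, shows how sensitive the per-node figure is.</p>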
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/06/hadoop-hardware-and-compression/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Patent nonsense: Parallel Iron/HDFS edition</title>
		<link>http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/</link>
		<comments>http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/#comments</comments>
		<pubDate>Fri, 10 Jun 2011 08:10:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[EMC]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4633</guid>
		<description><![CDATA[Alan Scott commented with concern about Parallel Iron&#8217;s patent lawsuit attacking HDFS (Hadoop Distributed File System), filed in &#8212; where else? &#8212; Eastern Texas. The patent in question &#8212; US 7,415,565 &#8212; seems to in essence cover any shared-nothing block storage that exploits a &#8220;configurable switch fabric&#8221;; indeed, it&#8217;s more oriented to OLTP (OnLine Transaction [...]]]></description>
			<content:encoded><![CDATA[<p>Alan Scott commented with concern about <a href="../../../../../2011/06/05/hadoop-confusion-from-forrester-research/#comment-226611">Parallel Iron&#8217;s patent lawsuit attacking HDFS</a> (Hadoop Distributed File System), filed in &#8212; where else? &#8212; Eastern Texas. <a href="http://www.patents.com/us-7415565.html">The patent in question</a> &#8212; US 7,415,565 &#8212; seems to in essence cover any shared-nothing block storage that exploits a &#8220;configurable switch fabric&#8221;; indeed, it&#8217;s more oriented to OLTP (OnLine Transaction Processing) than to analytics. For example, the Background section starts: <span id="more-4633"></span></p>
<blockquote><p>The present invention relates to data storage, and more particularly, to methods and systems for a high throughput storage device.</p>
<p>A form of on-line transaction processing (OLTP) applications requiring a high number of data block reads or writes are called H-OLTP applications. A large server or mainframe or several servers typically host an H-OLTP application. Typically, these applications involve the use of a real time operating system, a relational database, optical fiber based networking, distributed communications facilities to a user community, and the application itself. Storage solutions for these applications use a combination of mechanical disk drives and cached memory under stored program control. The techniques for the storage management of H-OLTP applications can use redundant file storage algorithms on multiple disk drives, memory cache replications, data coherency algorithms, and/or load balancing.</p></blockquote>
<p>and ends</p>
<blockquote><p>It would be desirable for large capacity storage to provide sufficient throughput for high-volume, real-time applications, especially, for example in emerging applications in financial, defense, research, customer management, and homeland security areas.</p></blockquote>
<p>The independent claims are:</p>
<blockquote><p>1. A storage system comprising: one or more memory sections, including one or more memory devices including storage locations that store data, and a memory section controller that provides addresses to the memory devices, the addresses identifying storage locations for a memory device, wherein the memory devices use the provided addresses to perform a function selected from the set of reading out and writing data to/from the memory devices; and one or more switches, comprising a configurable switch fabric, that receive a data request including a data block identifier and switch the data request to one or more of the memory sections determined by applying the data block identifier to an algorithm that selectively configures operation of the switch fabric, the data block identifier identifying a set of storage locations; wherein the memory sections to which the data request was switched forward the received data block identifier to its memory section controller which maps the data block identifier to a set of addresses for the storage locations identified by the data block identifier, and provides the set of addresses to one or more of the memory section&#8217;s memory devices.<br />
&#8230;<br />
16. A method for use in a storage system, comprising: storing data in storage locations in a memory device; receiving by a switch comprising a configurable switch fabric, a data request including a data block identifier; the switch switching the data request to a memory section including the memory device determined by applying the data block identifier to an algorithm that selectively configures operation of the switch, the data block identifier identifying a set of storage locations in the memory device; forwarding the received data block identifier to a memory section controller; the memory section controller mapping the data block identifier to a set of addresses for the storage locations identified by the data block identifier; and the memory section controller providing the set of addresses to the memory device; and the memory device using the provided addresses to perform a function selected from the set of reading and writing data to/from the memory device.<br />
&#8230;<br />
26. A storage system, comprising: means for storing, including: means for storing data in storage locations, the means for storing data in storage locations including means for reading data stored in the storage locations using an address; means for controlling the means for storing, the means for controlling including: means for mapping a data block identifier to a set of addresses, means for providing the addresses to the means for storing data in storage locations, the addresses identifying storage locations; means for switching, including means for receiving a data request including a data block identifier; means for switching the data request based on the data block identifier to a means for storing determined by applying the data block identifier to an algorithm that selectively configures operation of the means for switching, the data block identifier identifying a set of storage locations in the means for storing data in storage locations; and means for forwarding the received data block identifier to the means for storing.</p>
<p>27. A storage hub comprising a memory section, including a memory device including storage locations that store data, and a memory section controller that provides an address to the memory device, the address identifying a storage location, wherein the memory device uses the provided address to write data into the memory device; and a switch, comprising a configurable switch fabric, that receives a data request including a data block identifier and transmits the data request to the memory section determined by applying the data block identifier to an algorithm that selectively configures operation of the switch fabric, and that receives write data associated with the data request and transmits the write data to the determined memory section; wherein the memory section forwards the received data block identifier to the memory section controller, which determines from the data block identifier the address of a storage location and provides the address to the memory device, and the memory device stores the write data at the address.</p></blockquote>
<p>My one thought that could have led to the patent making sense was that maybe the term &#8220;configurable switch fabric&#8221; was defined in some particularly limited way. But noooo. Indeed, the term is not defined in the patent&#8217;s body at all; rather, the patent says (somewhat ungrammatically):</p>
<blockquote><p>The switches 22 may be any type of switch using any type of switch fabric, such as, for example, a time division multiplexed fabric or a space division multiplexed fabric. As used herein, the term &#8220;switch fabric&#8221; the physical interconnection architecture that directs data from an incoming interface to an outgoing interface. For example, the switches 22 may be a Fibre Channel switch, an ATM switch, a switched fast Ethernet switch, a switched FDDI switch, or any other type of switch. The switches 22 may also include a controller (not shown) for controlling the switch.</p></blockquote>
<p>I would be shocked if this patent held up upon reexamination. (If it did, EMC would pretty much be out of business, or at least vulnerable to a considerable cashectomy.) This is a particularly strong example of my belief that <a href="../../../../../2010/03/23/software-innovation-patent/">performance-enhancement software patents are always bogus</a>. What&#8217;s more, it seems strange to worry about this patent&#8217;s effect on HDFS in any case, because if you&#8217;re that much of a patent wimp, you probably don&#8217;t want to run afoul of <a href="../../../../../2010/02/11/google-mapreduce-patent/">Google&#8217;s (also bogus) MapReduce patent</a> in the first place.</p>
<p>On the whole, I&#8217;m somewhat more sympathetic to the idea of <a href="../../../../../2011/05/14/hadoop-mapreduce-data-storage-management/">replacing HDFS underneath Hadoop</a> than my clients at Cloudera or IBM would wish me to be. But the Parallel Iron patent is not a serious reason in support of such a change.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Hardware for Hadoop</title>
		<link>http://www.dbms2.com/2011/06/04/hardware-for-hadoop/</link>
		<comments>http://www.dbms2.com/2011/06/04/hardware-for-hadoop/#comments</comments>
		<pubDate>Sat, 04 Jun 2011 22:47:12 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4610</guid>
		<description><![CDATA[After suggesting that there&#8217;s little point to Hadoop appliances, it occurred to me to look into what kinds of hardware actually are used with Hadoop. So far as I can tell: Hadoop nodes today tend to run on fairly standard boxes. Hadoop nodes in the past have tended to run on boxes that were light [...]]]></description>
			<content:encoded><![CDATA[<p>After suggesting that <a href="http://www.dbms2.com/2011/06/02/why-you-would-want-an-appliance-and-when-you-wouldnt/">there&#8217;s little point to Hadoop appliances</a>, it occurred to me to look into what kinds of hardware actually are used with Hadoop. So far as I can tell:</p>
<ul>
<li>Hadoop nodes today tend to run on fairly standard boxes.</li>
<li>Hadoop nodes in the past have tended to run on boxes that were light with respect to RAM.</li>
<li>The number of spindles per core on Hadoop node boxes is going up even as disks get bigger.</li>
</ul>
<p><span id="more-4610"></span>A key input comes from Cloudera, who to my joy delegated the questions to Omer Trajman, who wrote:</p>
<blockquote><p>Most Hadoop deployments today use systems with dual socket and quad or hex cores (8 or 12 cores total, 16 or 24 hyper-threaded). Storage has increased as well with 6-8 spindles being common and some deployments going to 12 spindles. These are SATA disks with between 1TB and 2TB capacity. The amount of RAM varies depending on the application. 24GB is common as is 36GB – all ECC RAM. HBase clusters may have more RAM so they can cache more data. Some customers put Hadoop on their “standard box” which may not be perfectly balanced (e.g. more RAM, less disk) and needs to be altered slightly to meet the above specs. The new Dell C2100 series and the HP SL170 series are both popular server lines for Hadoop.</p>
<p>For a year ago perspective, see this post: <a href="http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/" target="_blank">http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/</a></p></blockquote>
<p>Bullet points from that year-ago link include:</p>
<blockquote>
<ul>
<li>4 1TB hard disks in a JBOD (Just a Bunch Of Disks) configuration</li>
<li>2 quad core CPUs, running at least 2-2.5GHz</li>
<li>16-24GBs of RAM (24-32GBs if you’re considering HBase)</li>
<li>Gigabit Ethernet</li>
</ul>
</blockquote>
<p>So basically we&#8217;re talking in the range of 2-3 GB of RAM per core &#8212; and 1 spindle per core, up from perhaps half a spindle per core a year ago.</p>
<p>Meanwhile, a 2009 <a href="https://opencirrus.org/system/files/OpenCirrusHadoop2009.ppt">Yahoo slide deck</a> refers to &#8220;500 nodes, 4000 cores, 3TB RAM, 1.5PB disk&#8221;; that divides out to 8 cores, 6 GB of RAM, and 3 TB of disk per node, all on &#8220;commodity hardware.&#8221; By 2010 Yahoo was evidently up to <a href="http://twitter.com/#!/marin_dimitrov/status/12900368052">2 GB of RAM per core</a>.</p>
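<p>For anybody who wants to check the division, here is a small Python sketch. The variable names are mine, and I&#8217;m assuming decimal units (1 TB = 1000 GB, 1 PB = 1000 TB), which is what the rounded per-node figures imply.</p>

```python
# Cluster totals quoted from the 2009 Yahoo deck.
nodes = 500
cores = 4000
ram_gb = 3 * 1000      # 3 TB of RAM, decimal units assumed
disk_tb = 1.5 * 1000   # 1.5 PB of disk, decimal units assumed

# Per-node averages.
per_node = {
    "cores": cores / nodes,
    "ram_gb": ram_gb / nodes,
    "disk_tb": disk_tb / nodes,
}

# RAM per core, for comparison with the 2010 figure of 2 GB per core.
ram_gb_per_core = ram_gb / cores

print(per_node)
print("GB of RAM per core:", ram_gb_per_core)
```

<p>On those numbers the 2009 cluster had well under 1 GB of RAM per core, which fits the observation that older Hadoop boxes were light on RAM.</p>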
<p>There are lots of data points on the <a href="http://wiki.apache.org/hadoop/PoweredBy">Apache Hadoop wiki</a>, but many seem a few years old, and I don&#8217;t immediately see how to time-stamp them. Overall, they seem consistent with the trends I noted at the top of the post.</p>
<p>One thing I haven&#8217;t done is attempt to price any of these systems.</p>
<p>Contributions in the comment thread would be warmly appreciated.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/06/04/hardware-for-hadoop/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Notes from the Fusion-io S-1 filing</title>
		<link>http://www.dbms2.com/2011/05/24/notes-from-the-fusion-io-s-1-filing/</link>
		<comments>http://www.dbms2.com/2011/05/24/notes-from-the-fusion-io-s-1-filing/#comments</comments>
		<pubDate>Tue, 24 May 2011 08:53:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4549</guid>
		<description><![CDATA[Fusion-io has filed for an initial public offering. With public offerings go S-1 filings which, along with 10-Ks, are the kinds of SEC filing that typically contain a few nuggets of business information. Notes from Fusion-io&#8217;s S-1 include: Fusion-io is growing very, very fast, doubling or better in revenue every 6 months. Fusion-io&#8217;s marketing message [...]]]></description>
			<content:encoded><![CDATA[<p>Fusion-io has filed for an initial public offering. With public offerings go S-1 filings which, along with 10-Ks, are the kinds of SEC filings that typically contain a few nuggets of business information. Notes from <a href="http://sec.gov/Archives/edgar/data/1383729/000095012311023375/f58285sv1.htm">Fusion-io&#8217;s S-1</a> include:</p>
<p>Fusion-io is growing very, very fast, <strong>doubling or better in revenue every 6 months.</strong></p>
<p>Fusion-io&#8217;s marketing message revolves around &#8220;data centralization&#8221;. <strong>Fusion-io is competing against storage-area networks and storage arrays.</strong></p>
<p>Fusion-io&#8217;s list of application types includes</p>
<blockquote><p>&#8230; systems dedicated to decision support, high performance financial analysis, web search, content delivery and enterprise resource planning.</p></blockquote>
<p>Fusion-io says it has shipped <strong>over 20 petabytes of storage.</strong></p>
<p>Fusion-io has a shifting array of big customers, including OEMs:  <span id="more-4549"></span></p>
<blockquote><p>Historically, large purchases by a relatively limited number of customers have accounted for a substantial majority of our revenue, and the composition of the group of our largest customers changes from period to period. Many of our customers make concentrated purchases to complete or upgrade specific large-scale data storage installations. These concentrated purchases are short-term in nature and are typically made on a purchase order basis rather than pursuant to long-term contracts. During fiscal 2010 and the six months ended December 31, 2010, sales to the 10 largest customers in each period, including the applicable OEMs, accounted for approximately 75% and 92% of revenue, respectively. Facebook, Inc. is currently our largest customer and accounted for a substantial portion of revenue during the six months ended December 31, 2010. We expect revenue from sales to Facebook and one other end-user to account for a substantial portion of revenue for the three months ending March 31, 2011, but that revenue from sales to Facebook and the other end-user will decline significantly for the three months ending June 30, 2011 as they complete their planned deployments.</p></blockquote>
<p>But Fusion-io invests enough in sales and marketing, including direct sales, that I&#8217;m guessing they&#8217;re out there persuading end-users to ask for product from Dell, HP, and IBM.</p>
<p>Fusion-io&#8217;s inventory growth of $23.3 million for the second half of 2010 is close to revenue of $26.0 million. Accounts receivable is a much smaller figure. I&#8217;m not sure what all that signifies, but I do find it ironic that Fusion-io&#8217;s marketing statements draw an analogy to &#8220;just-in-time&#8221; manufacturing.</p>
<p>As for what I think about Fusion-io, it starts:</p>
<ul>
<li>Fusion-io&#8217;s ideas are smart.</li>
<li>My skepticism about <a href="http://www.dbms2.com/2011/05/23/databases-ram/">specialized storage hardware for database applications</a> applies in part but not in whole to Fusion-io.</li>
<li>Right now, Fusion-io has won the market. Even if you don&#8217;t need Fusion-io hardware to optimize your use of solid-state memory, you&#8217;re apt to go with/partner with Fusion-io anyway.</li>
</ul>
<p>I don&#8217;t have strong opinions as to how long the last point will remain true.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/05/24/notes-from-the-fusion-io-s-1-filing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

