<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Solid-state memory</title>
	<atom:link href="http://www.dbms2.com/category/storage/solid-state-memory-disk-flash/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:17:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>MarkLogic 5, and why you might care</title>
		<link>http://www.dbms2.com/2011/11/01/marklogic-version-5/</link>
		<comments>http://www.dbms2.com/2011/11/01/marklogic-version-5/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 04:03:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MarkLogic]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Structured documents]]></category>
		<category><![CDATA[Text]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5560</guid>
		<description><![CDATA[MarkLogic is releasing MarkLogic 5. Key elements of the announcement are: More-of-the-same in line with MarkLogic’s core positioning. A new bi-directional Hadoop connector. A free MarkLogic Express edition, limited in license terms more than in actual features, as per Slide 27 of the deck MarkLogic graciously supplied for me to post. Also, MarkLogic is early [...]]]></description>
			<content:encoded><![CDATA[<p>MarkLogic is releasing MarkLogic 5. Key elements of the announcement are:</p>
<ul>
<li>More-of-the-same in line with MarkLogic’s core positioning.</li>
<li>A new bi-directional Hadoop connector.</li>
<li>A free MarkLogic Express edition, limited in license terms more than in actual features, as per Slide 27 of <a href="http://www.monash.com/uploads/MarkLogic-5-Deck.pptx">the deck MarkLogic graciously supplied for me to post</a>.</li>
</ul>
<p>Also, MarkLogic is early with a feature that most serious DBMS vendors will soon have – support for tiered storage, with writes going first to solid-state storage, then being flushed to disk via a caching-style algorithm.* And as befits a sometime search-engine-substitute, MarkLogic has finally licensed a large set of document filters, from an Australian company called <a href="http://www.isys-search.com/index.html">Isys</a>. Apparently, the special virtue of the Isys filters is that they’re good at extracting not only text, but metadata as well.</p>
<p><em>*If there’s a caching algorithm that doesn’t contain a major element of LRU (Least Recently Used), I don’t recall ever hearing about it.</em></p>
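<p>A minimal sketch of that write path, assuming a plain LRU policy. This is not MarkLogic's implementation; the <code>TieredStore</code> class and its method names are hypothetical, and the two dictionaries merely stand in for solid-state storage and disk.</p>

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a small fast tier in front of a large slow tier."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # stands in for solid-state storage
        self.slow = {}             # stands in for disk

    def write(self, key, value):
        # Writes always land in the fast tier first.
        self.fast[key] = value
        self.fast.move_to_end(key)  # mark as most recently used
        self._flush_if_full()

    def read(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # a read also counts as a use
            return self.fast[key]
        return self.slow[key]

    def _flush_if_full(self):
        # Flush least recently used entries down to the slow tier.
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value


store = TieredStore(fast_capacity=2)
store.write("a", 1)
store.write("b", 2)
store.read("a")      # "a" is now more recently used than "b"
store.write("c", 3)  # over capacity, so "b" is flushed to the slow tier
print(sorted(store.fast), sorted(store.slow))
```

<p>In this toy version the entry flushed to the slow tier is whichever one was touched longest ago, which is the LRU element the footnote refers to.</p>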
<p>MarkLogic seems to have settled on a positioning that, although distressingly buzzword-heavy, is at least partly based upon reality. The real part includes:</p>
<ul>
<li>MarkLogic is a serious, enterprise-class DBMS (see for example Slide 12 of <a href="http://www.monash.com/uploads/MarkLogic-5-Deck.pptx">the MarkLogic deck</a>) …</li>
<li>… which has been optimized from the getgo for <a href="../../../../../2011/05/17/poly-structured-database/">poly-structured data</a>.</li>
<li>MarkLogic can and does scale out to handle large amounts of data.</li>
<li>MarkLogic is a general-purpose DBMS, suitable for <a href="../../../../../2011/03/30/short-request-and-analytic-processing/">both short-request and analytic tasks</a>.</li>
<li>MarkLogic is particularly well suited for analyses with long chains of “progressive enhancement” (MarkLogic’s favorite term when talking about <a href="../../../../../2011/05/30/another-category-of-derived-data/">derived data</a>).</li>
<li><a href="http://blogs.avalonconsult.com/blog/search/is-marklogic-a-search-engine/">MarkLogic often plays the role of a content assembler and/or search engine</a>, and the people who use MarkLogic in those ways are commonly doing things that can be described as research and analysis.</li>
</ul>
<p>Based on that reality, MarkLogic talks a lot about Volume, Velocity, Variety, Big Data, unstructured data, semi-structured data, and big data analytics.</p>
<p><span id="more-5560"></span><em>My <a href="../../../../../2010/11/29/marklogic-and-its-document-dbms/">November, 2010 overview of MarkLogic technology</a> remains pretty relevant. One correction, however: Node heterogeneity configurations, in which “data” and “evaluation” nodes reside on separate servers, are the exception rather than the rule.</em></p>
<p>Like <a href="../../../../../2011/10/18/vertica-community-edition/">Vertica</a>, MarkLogic has laudably said that true academic researchers can get MarkLogic for free without the severe license restrictions. Free MarkLogic should be of particular interest to researchers who:</p>
<ul>
<li>Are studying natural networks or graphs, such as social networks or biological pathways. (This might be a fit in the social or biological sciences.)</li>
<li>Are managing metadata for, say, a variety of disparate kinds of experimental files. (This might be a fit anywhere in the natural sciences.)</li>
<li>Are managing actual documents, images, videos, etc., or data about such things. (This might be a fit in the humanities or social sciences.)</li>
</ul>
<p>MarkLogic provided some disclosable financial substance by email, which I shall quote verbatim:</p>
<ul>
<li><em>MarkLogic has 45% revenue growth and 55-60% license growth year over year.</em></li>
<li><em>We expect to finish this year with over $85 million in revenue, up from $55 million last year.</em></li>
</ul>
<p>Arithmetical purists might note that 85/55 is more than 145%, but I’m just going to settle for the information I got and move on.</p>
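<p>For the record, the arithmetic can be checked as follows. The two revenue figures are the ones quoted above; the variable names are just for illustration.</p>

```python
last_year = 55  # $ millions, from the quoted email
this_year = 85  # $ millions, from the quoted email

ratio = this_year / last_year
growth_pct = (ratio - 1) * 100
print(f"ratio = {ratio:.3f}, growth = {growth_pct:.1f}%")
```

<p>That works out to growth of roughly 54.5%, which is closer to the quoted license-growth range than to the quoted 45% revenue growth.</p>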
<p><em>Edit: I posted separately about the <a href="http://www.dbms2.com/2011/11/03/marklogic-hadoop-connector/">MarkLogic Hadoop connector.</a></em> <span style="text-decoration: line-through;">As for that Hadoop connector – stay tuned for a short follow-up post, as writing about it now would not be convenient. (My backup discipline isn’t what it should be, and the only copy of my notes about that product is on a heavy tower computer in a house that doesn’t have working power.)</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/01/marklogic-version-5/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>HP systems soundbites</title>
		<link>http://www.dbms2.com/2011/09/22/hp-systems-soundbites/</link>
		<comments>http://www.dbms2.com/2011/09/22/hp-systems-soundbites/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 17:44:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Exadata]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5314</guid>
		<description><![CDATA[It is widely rumored that there will be a leadership change at HP (Meg Whitman in, Leo Apotheker out). In connection with that, I found myself holding forth on points such as: HP needs to make outstanding enterprise systems again. They fell away from that target under Mark Hurd, but they surely can hit it [...]]]></description>
			<content:encoded><![CDATA[<p>It is widely rumored that there will be a leadership change at HP (Meg Whitman in, Leo Apotheker out). In connection with that, I found myself holding forth on points such as:</p>
<ul>
<li>HP needs to make outstanding enterprise systems again.</li>
<li>They fell away from that target under Mark Hurd, but they surely can hit it again, based on the remnants of DEC (Digital Equipment Corporation), Tandem, the higher-end part of Compaq, and of course the original HP systems group.</li>
<li>In particular:
<ul>
<li>Rumors say that Oracle Exadata 1 boxes, made by HP, were much lower quality than Exadata 2 boxes made by Sun.</li>
<li>HP Neoview was a waste of good engineering talent.</li>
<li>I&#8217;d like to see a few excellent Vertica appliances.</li>
<li>I hope the SAP HANA appliances go well, whenever HANA finally becomes a serious product.</li>
<li>The general move from disk to solid-state memory should offer some opportunities.</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/22/hp-systems-soundbites/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Are there any remaining reasons to put new OLTP applications on disk?</title>
		<link>http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/</link>
		<comments>http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/#comments</comments>
		<pubDate>Mon, 19 Sep 2011 18:07:07 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5257</guid>
		<description><![CDATA[Once again, I&#8217;m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include: 100s of gigabytes of data at first, growing to &#62;1 terabyte over time. High peak loads. Public cloud portability (but they have private data centers they can use today). Simple database design &#8212; not a lot [...]]]></description>
			<content:encoded><![CDATA[<p>Once again, I&#8217;m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include:</p>
<ul>
<li>100s of gigabytes of data at first, growing to &gt;1 terabyte over time.</li>
<li>High peak loads.</li>
<li>Public cloud portability (but they have <strong>private data centers they can use today</strong>).</li>
<li>Simple database design &#8212; not a lot of tables, not a lot of columns, not a lot of joins, and everything can be distributed on the same customer_ID key.</li>
<li>Stream the data to a data warehouse that will grow to a few terabytes. (Keeping only one year of OLTP data online actually makes sense in this application, but of course everything should go into the DW.)</li>
</ul>
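<p>To illustrate what distributing everything on the same customer_ID key means, here is a minimal sketch. It assumes simple hash-based placement; the shard count, table names, and function names are hypothetical and do not describe any particular vendor's product.</p>

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(customer_id):
    """Map a customer_ID to a shard number with a stable hash."""
    # zlib.crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(str(customer_id).encode("utf-8")) % NUM_SHARDS


def route(rows):
    """Group rows by the shard that owns each row's customer_ID."""
    shards = {n: [] for n in range(NUM_SHARDS)}
    for row in rows:
        shards[shard_for(row["customer_id"])].append(row)
    return shards


rows = [
    {"customer_id": 17, "table": "orders", "amount": 40},
    {"customer_id": 17, "table": "payments", "amount": 40},
    {"customer_id": 42, "table": "orders", "amount": 15},
]
placement = route(rows)
```

<p>Because every table is keyed the same way, all of a given customer's rows land on one shard, so the few joins the application needs never have to cross servers.</p>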
<p>So I&#8217;m leaning toward saying: <span id="more-5257"></span></p>
<ul>
<li>They should go with a scalable, MySQL-based solution.
<ul>
<li>Lots of third-party software works with MySQL, in case that&#8217;s helpful.</li>
<li>Yes, any one vendor is small and not yet firmly established, but there are numerous vendors around with interesting MySQL scaling stories.</li>
<li>In a vendor emergency, just going with Oracle&#8217;s MySQL stuff would probably work &#8230;</li>
<li>&#8230; especially because there are these lovely things in the world called <strong>solid-state drives.</strong></li>
<li>There&#8217;s also good escapability if one wants to move away from MySQL, because everybody knows how to handle MySQL data.</li>
</ul>
</li>
<li>The first product to look at is dbShards, because it meets all the topology needs:
<ul>
<li>Local scale-out (<a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">transparent sharding</a>).</li>
<li><a href="http://www.dbms2.com/2011/02/09/clarification-on-dbshards-shard-replication/">Local high availability</a>.</li>
<li>Remote disaster recovery (details of that are underway).</li>
</ul>
</li>
<li>The first analytic DBMS to look at is Infobright.
<ul>
<li>Yes, I know Infobright is focused more on machine-generated data these days, but this client&#8217;s analytic needs are so straightforward Infobright should pass with flying colors.</li>
<li>The MySQL-to-MySQL aspect should make ETL dead simple.</li>
<li>Again, there&#8217;s escapability.</li>
</ul>
</li>
</ul>
<p>Mainly, this is all fine. But I&#8217;m getting pushback on the solid-state aspect, for fear that it will compromise public cloud portability.</p>
<p>Am I missing something here? As far as I&#8217;m concerned, <strong>if you&#8217;re planning an OLTP system with a many-year lifespan today, </strong>of course <strong>you should assume solid-state storage.</strong> Maybe you scale out just as far as you would with disk, striping indexes or entire databases across the RAM of multiple servers. In that case, having solid-state backing reduces the risk of bottlenecks. Maybe you don&#8217;t scale out as far as you would with disk. In that case, solid-state backing saves you money.</p>
<p><strong>As for public-cloud support for solid-state storage, that&#8217;s coming fast, right? </strong>(Actually, I have data points in support of that theory, but they&#8217;re a bit tenuous.) A large fraction of web businesses with private data centers seem to be using solid-state storage &#8212; from Facebook on down &#8212; or so the NoSQL/NewSQL/<a href="http://www.dbms2.com/2011/03/02/short-request-processing/">short-request</a> DBMS guys tell me. Surely a number of public cloud vendors are close behind.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Kaminario goes (mainly) flash</title>
		<link>http://www.dbms2.com/2011/09/14/kaminario-goes-mainly-flash/</link>
		<comments>http://www.dbms2.com/2011/09/14/kaminario-goes-mainly-flash/#comments</comments>
		<pubDate>Wed, 14 Sep 2011 09:30:53 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Kaminario]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Solid-state memory]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5227</guid>
		<description><![CDATA[Kaminario, which used to be in the business of solid state storage via DRAM, now is emphasizing hybrid DRAM/flash storage appliances instead. The reason is evidently price. Per terabyte of primary storage (before mirroring onto disk and so on): A Kaminario K2 DRAM-only appliance costs $100K. A Kaminario K2 flash-only appliance costs $30K (but nobody [...]]]></description>
			<content:encoded><![CDATA[<p>Kaminario, which used to be in the business of solid state storage via DRAM, now is emphasizing hybrid DRAM/flash storage appliances instead. The reason is evidently price. <strong>Per terabyte of primary storage</strong> (before mirroring onto disk and so on):</p>
<ul>
<li>A Kaminario K2 DRAM-only appliance costs <strong>$100K.</strong></li>
<li>A Kaminario K2 flash-only appliance costs $30K (but nobody buys that configuration).</li>
<li>A typical Kaminario K2 hybrid DRAM/flash appliance might cost <strong>$35K</strong> (which tells us that there&#8217;s a lot more flash than DRAM).</li>
</ul>
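<p>A rough check of that parenthetical, assuming the hybrid price is a simple capacity-weighted blend of the DRAM-only and flash-only prices (an assumption on my part, not something Kaminario stated):</p>

```python
dram_per_tb = 100    # $K per TB, DRAM-only K2
flash_per_tb = 30    # $K per TB, flash-only K2
hybrid_per_tb = 35   # $K per TB, typical hybrid K2

# Assume hybrid = x * dram + (1 - x) * flash and solve for x.
dram_fraction = (hybrid_per_tb - flash_per_tb) / (dram_per_tb - flash_per_tb)
flash_per_dram = (1 - dram_fraction) / dram_fraction

print(f"DRAM share of capacity: {dram_fraction:.1%}")
print(f"Flash TB per DRAM TB: {flash_per_dram:.0f}")
```

<p>Under that assumption, DRAM is about 7% of a typical hybrid configuration's capacity, or roughly 13 TB of flash for every TB of DRAM.</p>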
<p>Kaminario positions DRAM as where you focus your most write-intensive/bottlenecking loads, such as logging or <a href="../../../../../2010/08/16/vertica-flash-temp-space/">temp space</a>, with the primary benefit being performance and a secondary benefit being slowing the wear on your flash.</p>
<p><span id="more-5227"></span><em>If you want even your mirrors to be on flash &#8212; which Kaminario says greatly reduces the temporary performance hit in case of a failure &#8212; there will be an additional charge. Perhaps Kaminario will dig up a price number and post it in the comment thread.</em></p>
<p>The flash comes in via Fusion-io cards. Kaminario stresses that it sells a SAN (Storage Area Network) kind of offering, as opposed to the shared-nothing way one might otherwise use Fusion-io cards in servers&#8217; PCIe slots. Kaminario further asserts its built-in high availability is both smoother and less costly than Texas Memory Systems or Violin Memory alternatives; Kaminario is generally proud of its high availability features, down to redundant uninterruptible power supplies. Apparently the sweet spot of Kaminario&#8217;s market is single-chassis 5-6 TB systems, but Kaminario asserts seamless elasticity even if you grow into a second chassis.</p>
<p>Price resistance seems to have gotten strongly in the way of Kaminario&#8217;s growth, although the company was evasive about customer counts and the like. But it does now have 60+ employees and an aggressive hiring plan, vs. &lt;50 when <a href="../../../../../2010/10/19/introduction-to-kaminario/">I wrote about Kaminario a year ago</a>. I do believe that many enterprises would benefit from <strong>throwing solid-state storage at certain performance problems,</strong> at least as a band-aid, while they contemplate software changes.* But evidently Kaminario has had difficulties &#8212; especially at the DRAM-only price point &#8212; getting customers to agree, or at least to agree that Kaminario K2 was a sufficiently cost-effective way to address the issue.</p>
<p><em>*If you like, you can regard this as <strong>deferring repayment of your technical debt.</strong></em></p>
<p>Kaminario&#8217;s comments about how its technology is or will be applied are all over the place (again, I think part of this is due to having a small number of customers overall, and wanting to conceal how small that number is). But in general Kaminario has seen more OLTP (OnLine Transaction Processing) than analytic uptake, which contributes to them thinking that low latency is a bigger deal than raw IOPS (Input/Output operations Per Second). Certainly Kaminario is focused on database applications of some kind or other, generally running on big-name DBMS such as Oracle or Microsoft SQL Server.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/14/kaminario-goes-mainly-flash/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Couchbase technical update</title>
		<link>http://www.dbms2.com/2011/08/13/couchbase-technical-update/</link>
		<comments>http://www.dbms2.com/2011/08/13/couchbase-technical-update/#comments</comments>
		<pubDate>Sun, 14 Aug 2011 04:08:03 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5081</guid>
		<description><![CDATA[My Couchbase business update with Bob Wiederhold was very interesting, but it didn&#8217;t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven&#8217;t had their designs locked down yet anyway. But here&#8217;s at least a partial explanation [...]]]></description>
			<content:encoded><![CDATA[<p>My <a href="http://www.dbms2.com/2011/08/13/couchbase-business-update/">Couchbase business update</a> with Bob Wiederhold was very interesting, but it didn&#8217;t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven&#8217;t had their designs locked down yet anyway. But here&#8217;s at least a partial explanation of what&#8217;s up.</p>
<p>memcached is a way to cache data in RAM across a cluster of servers and have it all look logically like a single memory pool, extremely popular among large internet companies. The Membase product &#8212; which is what Couchbase has been selling this year &#8212; adds persistence to memcached, an obvious improvement on requiring application developers to write both to memcached and to <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">non-transparently-sharded MySQL</a>. The main technical points in adding persistence seem to have been:</p>
<ul>
<li>A <strong>persistent backing store</strong> (duh), namely SQLite.</li>
<li>A <strong>change to the hashing algorithm,</strong> to avoid losing data when the cluster configuration is changed.</li>
</ul>
<p>Couchbase is essentially Membase improved by integrating CouchDB into it, with the main changes being:</p>
<ul>
<li><strong>Changing the backing store to CouchDB</strong> (duh). This will be in the first Couchbase release.</li>
<li><strong>Adding cross data center replication on CouchDB&#8217;s consistency model.</strong> This will not, I believe, be in the first Couchbase release.</li>
<li><strong>Offering CouchDB&#8217;s programming and query interfaces as an option.</strong> So far as I can tell, this will be implemented straightforwardly in the first Couchbase release, with elegance planned for later down the road.</li>
</ul>
<p>Let&#8217;s drill down a bit into <strong>Membase/Couchbase clustering and consistency. </strong><span id="more-5081"></span></p>
<ul>
<li>When data is written to RAM in memcached, it immediately gets copied to another server. The same is of course true in Membase/Couchbase. The terminology on all this is confusing, but I think:
<ul>
<li>The portion of data that is stored as a primary copy on any given server is called a &#8220;shard&#8221;.</li>
<li>That would seem to make sense, as that data could correspond to what goes &#8212; <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">non-transparently</a> &#8212; into an instance of MySQL in a classical memcached/MySQL set-up.</li>
</ul>
</li>
<li>Updates are of course also banged to disk ASAP &#8212; but at times of heavy load, that can take a while. A few seconds to a couple of minutes is normal operation; if it takes an hour, you really should buy more hardware. (Or solid-state storage.)</li>
<li>Similarly, the replication of data to a second machine&#8217;s RAM may not happen at times of heavy load &#8212; and that&#8217;s another sign you don&#8217;t have enough machines.</li>
<li>Each Membase/Couchbase &#8220;shard&#8221; has lots of logical sub-shards.* (1024 for now, at least as default, although Dustin finds that number excessive and is looking to lower it.) So if you add a node, some of the sub-shards get sent over to the new node. Unlike the case for straight memcached, no data is lost from cache (and of course none is lost from the persistent store either). Blocking of operations from such a move only happens in narrow time windows, and then only in edge cases.</li>
</ul>
<p><em>*Edit: They&#8217;re called <a href="http://dustin.github.com/2010/06/29/memcached-vbuckets.html">vbuckets</a>.</em></p>
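<p>A minimal sketch of the general idea, assuming a fixed key-to-vbucket hash plus a separate vbucket-to-node table. The CRC32 hash, the round-robin assignment, and the function names are illustrative assumptions, not Membase's actual code.</p>

```python
import zlib

NUM_VBUCKETS = 1024  # the default count mentioned above


def vbucket_for(key):
    """Hash a key to a fixed vbucket; this never changes with cluster size."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def initial_map(nodes):
    """Assign vbuckets to nodes round-robin."""
    return [nodes[v % len(nodes)] for v in range(NUM_VBUCKETS)]


def add_node(vbucket_map, new_node):
    """Hand a share of vbuckets to new_node, leaving all others in place."""
    n = len(set(vbucket_map)) + 1
    return [new_node if v % n == 0 else owner
            for v, owner in enumerate(vbucket_map)]


before = initial_map(["node-a", "node-b"])
after = add_node(before, "node-c")
moved = sum(1 for old, new in zip(before, after) if old != new)
print(f"{moved} of {NUM_VBUCKETS} vbuckets moved")

# A key's vbucket is unchanged; only the vbucket's owner may differ.
key = "user:1001"
print(vbucket_for(key), before[vbucket_for(key)], after[vbucket_for(key)])
```

<p>Going from two nodes to three in this sketch moves about a third of the vbuckets and leaves the rest where they were. Hashing keys directly to a server count would instead remap most keys when the count changes.</p>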
<p>So if we consider Membase technology alone, Couchbase is CA in the CAP Theorem. CouchDB, however, is gloriously AP in the CAP Theorem, in that it was written to assume an occasionally connected topology.* Based on that, Couchbase will allow AP operation between data centers (i.e. &#8220;stay synchronized if you can, to within the limitations of physics and so on, but don&#8217;t beat yourself up on the rare occasions that you can&#8217;t.&#8221;) I don&#8217;t know that that capability will quite be in the first release of Couchbase, but it&#8217;s coming soon.</p>
<p><em>*CouchDB also has other features friendly to occasionally-connected use cases, such as a lot of flexibility as to which parts of the database are or aren&#8217;t synced when you do reconnect. These are at the heart of the Couchbase Mobile offering.</em></p>
<p>memcached and Membase have a very simple key-value interface. CouchDB adds secondary indexes and so on. I think in the first release of Couchbase this is pretty much like having two different APIs for the same product; more elegant integration is planned down the road, and more language support as well.</p>
<p>The highest-performing way to use Couchbase will probably always be to just pretend it is Membase, which is to say memcached+. Dustin told me of Membase users who demanded 10-40 millisecond response times, and that not even for single queries but rather for sequences of several queries in succession. He further told me of customers asking for 1-200 microsecond response, and insisting on no worse than 1 millisecond. Frankly, the first requirement could be met by lots of technologies I can think of, at least if you don&#8217;t rely on disk; the second is thoroughly impossible if you rely on disk, and pretty demanding no matter what kind of hardware and storage you have.</p>
<p>Couchbase performance against disk is a work in progress. CouchDB started out 8X slower than SQLite as a backing store, apples to apples, but Couchbase is fixing that before they roll the product out. (After all, they wouldn&#8217;t want to slow the product down in the course of an upgrade.) Beyond that, when you do exploit the indexing capability of CouchDB, performance of course slows down. Work is underway to lower the performance hit; I imagine much improvement can indeed be made, given how few resources CouchDB has been able to devote to date to <a href="http://www.dbms2.com/2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/08/13/couchbase-technical-update/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>MongoDB users and use cases</title>
		<link>http://www.dbms2.com/2011/07/27/mongodb-users-and-use-cases/</link>
		<comments>http://www.dbms2.com/2011/07/27/mongodb-users-and-use-cases/#comments</comments>
		<pubDate>Wed, 27 Jul 2011 18:14:36 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MongoDB and 10gen]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5031</guid>
		<description><![CDATA[I spoke with Eliot Horowitz and Max Schierson of 10gen last month about MongoDB users and use cases. The biggest clusters they came up with weren&#8217;t much over 100 nodes, but clusters an order of magnitude bigger were under development. The 100 node one we talked the most about had 33 replica sets, each with [...]]]></description>
			<content:encoded><![CDATA[<p>I spoke with Eliot Horowitz and Max Schierson of 10gen last month about MongoDB users and use cases. The biggest clusters they came up with weren&#8217;t much over 100 nodes, but clusters an order of magnitude bigger were under development. The 100 node one we talked the most about had 33 replica sets, each with about 100 gigabytes of data, so that&#8217;s in the 3-4 terabyte range total. In general, the largest MongoDB databases are 20-30 TB; I&#8217;d guess those really do use the bulk of available disk space.   <span id="more-5031"></span></p>
<p>10gen recommends solid-state storage in many cases. In some cases solid-state lets you get away with fewer total nodes. 10gen also likes Flashcache (Facebook-developed technology to put a flash cache in front of hard disks). But the 100-node example mentioned above uses spinning disk.</p>
<p>Use cases 10gen is proud of include:</p>
<ul>
<li>Lots of user profile maintenance, including at online ad companies. This includes full user ad impression data. (I&#8217;ve argued for a while that <a href="../../../../../2010/09/17/jp-morgan-chase-oracle-database-outage/">user profile information belongs in something like a NoSQL database</a>.)</li>
<li>A big-name web company that wants to inspect every packet that enters their network, and replaced Splunk with MongoDB for performance reasons.</li>
<li>A big-name photo/video site whose metadata is all in MongoDB. (That&#8217;s the kind of thing that often makes for good <a href="../../../../../2011/05/30/another-category-of-derived-data/">MarkLogic</a> use cases.)</li>
</ul>
<p>But actually, the reason we had the call was to review cases where MongoDB&#8217;s <strong>schemaless</strong> nature was significant. Examples of those included:</p>
<ul>
<li>A couple of top examples were of the kind &#8220;A bunch of apps, similar but not the same.&#8221; For MTV, it&#8217;s a single content management system for a bunch of websites. For Disney Playdom, it&#8217;s different schemas for every game.</li>
<li>For a wireless telco, the issue was a product catalog in which devices and service plans called for very different schemas, and which the telco felt had thus become unmanageable in Oracle.</li>
<li>For Craigslist, the issue wasn&#8217;t programming so much as performance &#8212; <a href="http://blog.zawodny.com/2010/04/27/i-want-a-new-data-store/">ALTER TABLE operations took months in MySQL</a>, and that&#8217;s not a typo, although I&#8217;ll confess to not understanding why this was the case.</li>
</ul>
<p>The 10gen guys went on to claim that schemalessness is helpful for incremental development in general, the point being that you don&#8217;t have a database-modification step. To some extent, changes can even be rolled back more easily than if you actually changed your schemas.</p>
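<p>A minimal sketch of what that looks like in practice, using plain Python dictionaries rather than any MongoDB API. The field names and products are hypothetical, loosely modeled on the telco catalog example above.</p>

```python
# One "collection" holding documents with different shapes (hypothetical fields).
catalog = [
    {"type": "device", "name": "Phone X", "screen_inches": 4.0, "os": "Android"},
    {"type": "plan", "name": "Unlimited Talk", "monthly_fee": 59.99},
]

# Incremental development: a new field appears on new documents only,
# with no schema-change step applied to the existing ones.
catalog.append(
    {"type": "device", "name": "Tablet Y", "screen_inches": 9.7, "has_lte": True}
)


def devices_with_lte(docs):
    """Readers tolerate missing fields instead of relying on a migration."""
    return [d["name"] for d in docs
            if d["type"] == "device" and d.get("has_lte", False)]


print(devices_with_lte(catalog))
```

<p>The trade-off is that the burden moves to the application code, which has to cope with documents that lack the newer fields.</p>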
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/27/mongodb-users-and-use-cases/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Notes from the Fusion-io S-1 filing</title>
		<link>http://www.dbms2.com/2011/05/24/notes-from-the-fusion-io-s-1-filing/</link>
		<comments>http://www.dbms2.com/2011/05/24/notes-from-the-fusion-io-s-1-filing/#comments</comments>
		<pubDate>Tue, 24 May 2011 08:53:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4549</guid>
		<description><![CDATA[Fusion-io has filed for an initial public offering. With public offerings go S-1 filings which, along with 10-Ks, are the kinds of SEC filing that typically contain a few nuggets of business information. Notes from Fusion-io&#8217;s S-1 include: Fusion-io is growing very, very fast, doubling or better in revenue every 6 months. Fusion-io&#8217;s marketing message [...]]]></description>
			<content:encoded><![CDATA[<p>Fusion-io has filed for an initial public offering. With public offerings go S-1 filings which, along with 10-Ks, are the kinds of SEC filings that typically contain a few nuggets of business information. Notes from <a href="http://sec.gov/Archives/edgar/data/1383729/000095012311023375/f58285sv1.htm">Fusion-io&#8217;s S-1</a> include:</p>
<p>Fusion-io is growing very, very fast, <strong>doubling or better in revenue every 6 months.</strong></p>
<p>Fusion-io&#8217;s marketing message revolves around &#8220;data centralization&#8221;. <strong>Fusion-io is competing against storage-area networks and storage arrays.</strong></p>
<p>Fusion-io&#8217;s list of application types includes</p>
<blockquote><p>&#8230; systems dedicated to decision support, high performance financial analysis, web search, content delivery and enterprise resource planning.</p></blockquote>
<p>Fusion-io says it has shipped <strong>over 20 petabytes of storage.</strong></p>
<p>Fusion-io has a shifting array of big customers, including OEMs:  <span id="more-4549"></span></p>
<blockquote><p>Historically, large purchases by a relatively limited number of customers have accounted for a substantial majority of our revenue, and the composition of the group of our largest customers changes from period to period. Many of our customers make concentrated purchases to complete or upgrade specific large-scale data storage installations. These concentrated purchases are short-term in nature and are typically made on a purchase order basis rather than pursuant to long-term contracts. During fiscal 2010 and the six months ended December 31, 2010, sales to the 10 largest customers in each period, including the applicable OEMs, accounted for approximately 75% and 92% of revenue, respectively. Facebook, Inc. is currently our largest customer and accounted for a substantial portion of revenue during the six months ended December 31, 2010. We expect revenue from sales to Facebook and one other end-user to account for a substantial portion of revenue for the three months ending March 31, 2011, but that revenue from sales to Facebook and the other end-user will decline significantly for the three months ending June 30, 2011 as they complete their planned deployments.</p></blockquote>
<p>But Fusion-io invests enough in sales and marketing, including direct sales, that I&#8217;m guessing they&#8217;re out there persuading end-users to ask for product from Dell, HP, and IBM.</p>
<p>Fusion-io&#8217;s inventory growth of $23.3 million for the second half of 2010 is close to revenue of $26.0 million. Accounts receivable is a much smaller figure. I&#8217;m not sure what all that signifies, but I do find it ironic that Fusion-io&#8217;s marketing statements draw an analogy to &#8220;just-in-time&#8221; manufacturing.</p>
<p>As for what I think about Fusion-io, it starts:</p>
<ul>
<li>Fusion-io&#8217;s ideas are smart.</li>
<li>My skepticism about <a href="http://www.dbms2.com/2011/05/23/databases-ram/">specialized storage hardware for database applications</a> applies in part but not in whole to Fusion-io.</li>
<li>Right now, Fusion-io has won the market. Even if you don&#8217;t need Fusion-io hardware to optimize your use of solid-state memory, you&#8217;re apt to go with/partner with Fusion-io anyway.</li>
</ul>
<p>I don&#8217;t have strong opinions as to how long the last point will remain true.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/05/24/notes-from-the-fusion-io-s-1-filing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oracle and Exadata: Business and technical notes</title>
		<link>http://www.dbms2.com/2011/05/03/oracle-exadata-business-technology/</link>
		<comments>http://www.dbms2.com/2011/05/03/oracle-exadata-business-technology/#comments</comments>
		<pubDate>Tue, 03 May 2011 08:19:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Emulation, transparency, portability]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Solid-state memory]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4361</guid>
		<description><![CDATA[Last Friday I stopped by Oracle for my first conversation since January, 2010, in this case for a chat with Andy Mendelsohn, Mark Townsend, Tim Shetler, and George Lumpkin, covering Exadata and the Oracle DBMS. Key points included:  Given Oracle’s market penetration and share, it makes sense that Oracle is focused on selling add-on products [...]]]></description>
			<content:encoded><![CDATA[<p>Last Friday I stopped by Oracle for my first conversation since January, 2010, in this case for a chat with Andy Mendelsohn, Mark Townsend, Tim Shetler, and George Lumpkin, covering Exadata and the Oracle DBMS. Key points included:  <span id="more-4361"></span></p>
<ul>
<li>Given Oracle’s market      penetration and share, it makes sense that<strong> Oracle is focused on selling      add-on products to its installed base.</strong> Oracle’s three top such      go-to-market emphases at the moment are:
<ul>
<li><strong>Database       consolidation,</strong> <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/">especially on Exadata</a>.</li>
<li><strong>Data warehousing,</strong> presumably on       Exadata.</li>
<li><strong>Database security,       especially encryption.</strong> This is not Exadata-specific, but does       exploit Intel Westmere on-chip encryption, which Oracle says allows       encryption with minimal overhead. This seems to be via something called <strong>Oracle Advanced Security.</strong></li>
</ul>
</li>
<li>Deleted*</li>
</ul>
<p><em>*Oracle asked me to delete a point on pricing they went out of their way to make, because they are in a quiet period &#8212; even though nobody said it was confidential at the time, we weren&#8217;t under NDA, and it looks like public information to me anyway. Frankly, I&#8217;m not sure I was right to comply.</em></p>
<p>Oracle also told me quite a bit about Exadata onsite POCs (Proofs of Concept) and Exadata references, but I’ll save those subjects for future posts. The same goes for workload management.</p>
<p>Oracle&#8217;s version names and numbers can get confusing, but it turns out that:</p>
<ul>
<li>Oracle <span style="text-decoration: line-through;">11.203</span> 11.2.0.3 will come      out this fall. Oracle <span style="text-decoration: line-through;">11.204</span> 11.2.0.4 will come out a little more than a year      later. After that I imagine it will be time for Oracle 12.</li>
<li>The current versions of      Oracle Exadata are Exadata X2-2 and Exadata X2-8.
<ul>
<li>Oracle Exadata X2-2 is evolutionary from prior Exadata versions, and has 8 moderately big servers per rack. It can be sliced into half- or quarter-racks.</li>
<li>Oracle Exadata X2-8, in lieu of those 8 servers, has 2 bigger SMP (Symmetric MultiProcessing) systems, each with a terabyte of RAM. You can’t slice Exadata X2-8 below full-rack size, as you’d lose redundancy among the servers.</li>
</ul>
</li>
</ul>
<p>I didn’t really understand the discussion as to why certain workloads and/or workload consolidations go better on the SMP boxes of Exadata X2-8 than the blades of Exadata X2-2, but Oracle assures me that some do. I also suspect that some Oracle customers prefer large SMP boxes for no good reason other than familiarity.</p>
<p>As for recent-release adoption:</p>
<ul>
<li>Oracle estimates that<strong> 40-50% of customers have Oracle 11g running </strong>somewhere in their shops,      mainly Oracle 11g Release 2.</li>
<li>All major ISVs      (Independent Software Vendors) are certified on Oracle 11g, typically      Oracle 11g Release 2.</li>
<li>But Exadata      certification is something different from Oracle 11g certification; for      example, <strong>SAP certification on Exadata is still underway, </strong>targeted      for some time this year.</li>
</ul>
<p>Exadata obviously enjoys huge performance gains over existing Oracle installations for certain analytic queries, and therefore for some whole analytic workloads. Oracle has happily trumpeted these. But it turns out that Exadata’s OLTP (OnLine Transaction Processing) performance gains are less dramatic. This makes all kinds of sense, given that Oracle’s analytic query performance was in pretty bad shape pre-Exadata, while OLTP has been just fine. The range Oracle used was <strong>2-3X OLTP performance gains vs. existing Oracle installations on several-year-old hardware.</strong> Oracle says somewhere <strong>over 50% of Exadata physical I/O* goes against flash cache </strong>in use cases such as running Oracle’s application suite.</p>
<p><em>*Note that physical I/O may be only a small fraction of logical; e.g., SAP long ago said that <a href="../../../../../2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/">&gt;99% of SAP transactions never hit disk</a>.</em></p>
<p>Finally, we talked about a variety of options or other related products. Highlights included:</p>
<ul>
<li>One piece of the Oracle      security story is a new product called<strong> Oracle Database Firewall,</strong> released in January, based on an acquisition of a small startup last year.      Targeted primarily at internal hackers, Oracle Database Firewall sniffs      your SQL traffic for a week or so, observes what kinds of SQL statements      can be expected, builds a white list accordingly, and casts a jaundiced      eye on any other kind of SQL statements that come through.</li>
<li><em>Edit: I have no idea why I was told the following, in view of <a href="http://www.dbms2.com/2011/05/03/oracle-on-active-active-replication/">a subsequent email</a>.</em> <span style="text-decoration: line-through;"><strong>Oracle Active Data Guard, </strong>first introduced in the      Oracle 11g code line, is the preferred way to do active-active Oracle      replication. That said: </span>
<ul>
<li><span style="text-decoration: line-through;">Not a lot of customers       use Oracle Active Data Guard yet &#8230;</span></li>
<li><span style="text-decoration: line-through;">&#8230; but a considerable       fraction of Exadata users are at least interested in it.</span></li>
<li><span style="text-decoration: line-through;">Some number of Oracle       customers have other kinds of active-active implementation. One option is       via GoldenGate.</span></li>
</ul>
</li>
<li><strong>Oracle Cloud File Management System</strong> is an Oracle 11g feature/option that lets you manage non-Oracle data. It is related to ASM (Automatic Storage Management), which seems to have been the most popular Oracle 10g feature, and which is essential to Exadata. Oracle Cloud File Management System seems to be popular for consolidation uses. But it is not technically well suited to, for example, play the role of HDFS in a MapReduce implementation.</li>
<li>For DBAs who care,      Exadata now supports Solaris on the database server tier as well as Linux.      (That would be Solaris on Intel, of course; Exadata doesn&#8217;t use Sparc.)      The storage tier still runs only on a kind of embedded Linux.</li>
<li><strong>Oracle 11g Express Edition</strong> (free crippleware)      just went into beta test.</li>
<li>And finally, <strong>Oracle SQL Developer 3.0</strong> features,      among other things, a GUI for Oracle Data Mining, and migration tools.      Sybase migration is in there now, and was enhanced for SQL Developer 3.0.      Teradata migration is slated for the next release.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/05/03/oracle-exadata-business-technology/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Notes on short-request scale-out MySQL</title>
		<link>http://www.dbms2.com/2011/04/19/notes-on-short-request-scale-out-mysql/</link>
		<comments>http://www.dbms2.com/2011/04/19/notes-on-short-request-scale-out-mysql/#comments</comments>
		<pubDate>Tue, 19 Apr 2011 09:52:28 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Kaminario]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ScaleBase]]></category>
		<category><![CDATA[ScaleDB]]></category>
		<category><![CDATA[Schooner Information Technology]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Tokutek]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4329</guid>
		<description><![CDATA[A press person recently asked about: &#8230; start-ups that are building technologies to enable MySQL and other SQL databases to get over some of the problems they have in scaling past a certain size. &#8230; I’d like to get a sense as to whether or not the problems are as severe and wide spread as [...]]]></description>
			<content:encoded><![CDATA[<p>A press person recently asked about:</p>
<blockquote><p>&#8230; start-ups that are building technologies to enable MySQL and other SQL databases to get over some of the problems they have in scaling past a certain size. &#8230; I’d like to get a sense as to whether or not the problems are as severe and wide spread as these companies are telling me? If so, why wouldn’t a customer just move to a new database?</p></blockquote>
<p>While that sounds as if he was asking about scale-out relational DBMS in general, MySQL or otherwise, <a href="http://www.dbms2.com/2011/03/30/short-request-and-analytic-processing/">short-request or analytic</a>, it turned out that he was asking just about <strong>short-request scale-out MySQL.</strong> My thoughts and comments on that narrower subject include(d) but are not limited to:  <span id="more-4329"></span></p>
<ul>
<li>The biggest web companies had to go to non-<a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">transparently sharded</a> MySQL years ago. The NoSQL movement is, in no small part, <a href="http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/">an attempt to improve upon that</a>. Ditto for scale-out short-request MySQL.</li>
<li>Some overlapping categories of companies or projects who need scale-out short-request database processing are:
<ul>
<li>The aforementioned big companies who have other applications they haven&#8217;t hand-sharded yet.</li>
<li>Other web companies whose applications are getting that big.</li>
<li>Conventional enterprises whose web efforts happen to be very big.</li>
<li>Sensor networks and other massive sources of <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>.</li>
<li>Certain specialized areas (e.g., financial trading).</li>
</ul>
</li>
</ul>
<ul>
<li>Relatively few of these applications are totally impossible to do in Oracle. But the Oracle approach might be very expensive.</li>
<li>In particular, there&#8217;s a break point when companies &#8212; often SaaS vendors &#8212; <a href="http://www.dbms2.com/2011/04/01/the-client-that-was-confused-about-security/">outgrow Oracle Standard Edition</a>.</li>
<li>Yes, the alternatives usually are one of MySQL or Oracle.</li>
<li>InnoDB isn&#8217;t an alternative to these newer technologies; it&#8217;s just a piece of the puzzle and indeed of default MySQL now. Several of them &#8212; e.g. dbShards &#8212; are meant to be used in conjunction with InnoDB.</li>
<li>Merging his list and mine, the high-performance/scale-out MySQL alternatives look like <a href="http://www.dbms2.com/2011/01/25/dbshards-update/">dbShards</a>, <a href="http://www.dbms2.com/2011/01/28/schooner-software-onl/">Schooner</a>, <a href="http://www.dbms2.com/2011/01/25/scalebase-another-mpp-oltp-quasi-dbms/">ScaleBase</a>, <a href="http://www.dbms2.com/2008/04/13/scaledb-presents-the-revenge-of-the-pointer/">ScaleDB</a>, <a href="http://www.dbms2.com/2009/04/16/introduction-to-tokutek/">Tokutek</a>, <a href="http://www.dbms2.com/2010/04/03/akiban-highlights/">Akiban</a>, Xeround, and <a href="http://www.dbms2.com/2010/05/12/the-clustrix-story/">Clustrix</a>. The first two are to my knowledge more proven than the rest.</li>
<li>Proprietary hardware and the associated hardware/appliance pricing aren&#8217;t very appealing for these applications. That speaks against Oracle Exadata and Clustrix, and is the reason Schooner switched to a software-only strategy despite some initial appliance sales.</li>
<li>However, hardware band-aids such as solid-state drives or even <a href="http://www.dbms2.com/2010/10/19/introduction-to-kaminario/">RAM-based solid-state storage</a> could make more sense:
<ul>
<li>If, for performance, you&#8217;re scaling out your database so that it fits in RAM on each box, you don&#8217;t really have a disk-based architecture anyway, now do you?</li>
<li>Even if you&#8217;re not doing that yet &#8212; if your problem is throughput rather than storage capacity, silicon-based storage could be a big help.</li>
<li>In principle, devices of that kind can be moved from one application to another, after the first one is rearchitected not to need them. (In practice, however, I don&#8217;t know of anybody who is doing that. I also don&#8217;t believe that Kaminario et al. are marketing that kind of idea, more&#8217;s the pity.)</li>
</ul>
</li>
<li>My notes on all this from <a href="http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/">April, 2010</a> are already badly outdated, but may be interesting anyway.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/04/19/notes-on-short-request-scale-out-mysql/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Teradata integrates in solid-state storage</title>
		<link>http://www.dbms2.com/2011/04/10/teradata-integrates-in-solid-state-storage/</link>
		<comments>http://www.dbms2.com/2011/04/10/teradata-integrates-in-solid-state-storage/#comments</comments>
		<pubDate>Mon, 11 Apr 2011 03:56:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4253</guid>
		<description><![CDATA[For once, I think Teradata&#8217;s annual hardware refresh is pretty interesting, because of the integration of flash storage into its high-end &#8220;active enterprise data warehouse&#8221; product line. The essence of the announcement is: Teradata is rolling out a new appliance,* the 6680, which combines hard-disk and solid-state drives, relying on Teradata Virtual Storage. Teradata is [...]]]></description>
			<content:encoded><![CDATA[<p>For once, I think Teradata&#8217;s annual hardware refresh is pretty interesting, because of the integration of flash storage into its high-end &#8220;active enterprise data warehouse&#8221; product line. The essence of the announcement is:</p>
<ul>
<li>Teradata is rolling out a new appliance,* the 6680, which combines hard-disk and solid-state drives, relying on <a href="../../../../../2008/10/14/teradata-virtual-storage/">Teradata Virtual Storage</a>.</li>
<li>Teradata is also rolling out a hard-disk-based appliance,* the 6650, in a more routine annual refresh.</li>
</ul>
<p><span id="more-4253"></span>Teradata graciously permitted me to post a <a href="http://www.monash.com/uploads/Teradata-Active-EDW-6660-6680.pdf">6650/6680 announcement slide deck</a>. (Contrary to what it says, it&#8217;s not actually &#8220;NDA Confidential.&#8221;)</p>
<p><em>* Teradata doesn&#8217;t use the term &#8220;appliance&#8221; for its high-end &#8220;active EDW&#8221; products, but never mind that; to me, they&#8217;re appliances.</em></p>
<p>The Teradata 6680 has a fixed 3:1 ratio of hard-disk and solid-state drives (SSDs). SSDs are always 300 gigabytes; hard drive capacity can be 300, 450, or 600 GB. Thus, the SSD part is somewhere between 1/4 and 1/7 of total data capacity.</p>
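<p>A quick check of that range, assuming the 3:1 ratio is counted in drives (three hard drives for every 300 GB SSD), which is my reading rather than something Teradata spelled out:</p>
<ul>
<li>300 GB hard drives: 300 / (3 &#215; 300 + 300) = 300 / 1,200 = 1/4 of capacity on SSD.</li>
<li>450 GB hard drives: 300 / (3 &#215; 450 + 300) = 300 / 1,650 = 2/11, or about 18%.</li>
<li>600 GB hard drives: 300 / (3 &#215; 600 + 300) = 300 / 2,100 = 1/7.</li>
</ul>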
<p>The Teradata 6650 will let you include solid-state drives in the mix in a future release, late this year. But for the intervening months, it&#8217;s a hard-disk-only product.</p>
<p>Teradata&#8217;s adoption of solid-state storage is somewhat different from other vendors&#8217; in at least two ways:</p>
<ul>
<li>Teradata&#8217;s disk access has always been much more &#8220;random&#8221; than some newer vendors&#8217;. Thus, Teradata potentially can enjoy a much greater speed-up from solid-state storage than they can, as per Slide 17. (However, Teradata doesn&#8217;t seem to be claiming that level of speed-up in practice.)</li>
<li>Teradata doesn&#8217;t buy heavily into the idea that solid-state storage is an ideal place to put temp space. In fact, only 20% of Teradata&#8217;s SSD capacity is allocated for temp space or write-ahead logs.</li>
</ul>
<p>Pricing of the new Teradata systems is a bit vague. According to Teradata,</p>
<blockquote><p>The Teradata Active Enterprise Data Warehouse 6650 is offered at a price reduction from the current Teradata Enterprise Data Warehouse 5650.</p></blockquote>
<p>But the details are no clearer than:</p>
<blockquote><p>The Teradata Active Enterprise Data Warehouse (EADW) 6680 starting price per Terabyte of data is basically the same as the Teradata Active Enterprise Data Warehouse 6650 for the same performance level, and it can go up to 4x the performance levels of the 6650. The Teradata AEDW 6680 is designed to be more cost effective for high performance data warehouses.  In these scenarios, the price per unit of performance is lower than the Teradata AEDW 6650, and is lower than last year&#8217;s Teradata EDW 5650.</p></blockquote>
<p>Confusion is heightened by Teradata&#8217;s balancing-taken-to-an-extreme choice to cripple some of the CPU capacity on 6650s with hard disks, then unlock it if solid-state drives are put in instead.</p>
<p>Naturally, the Teradata 6680 has different ratios among performance, data capacity, price, and operating cost than hard-disk-only alternatives. For example, as per Slide 30, a 6680 implementation can have &gt;2X the performance of the 6650 on the same amount of data, yet enjoy &#8220;27% lower data center costs.&#8221;</p>
<p>Speaking of data center &#8212; i.e. power and floor space &#8212; costs, Slide 28 tells us that they&#8217;re 20% better with a Teradata 6650 than a 5650. One reason is that Teradata has decided that it really trusts its write-ahead logs, so UPS (Uninterruptible Power Supplies) are no longer needed. Slide 28 also tells us that the 6650 and 5650 are pretty equivalent in performance and data capacity.</p>
<p>Teradata has long made a big deal about its &#8220;investment protection,&#8221; which ensures that different years&#8217; models of Teradata systems can work together at their respective full performance capacities. However, investment protection has been suspended for Teradata&#8217;s products with solid-state drives (the 6680, or the 6650 with the optional SSDs that will eventually become available). Teradata does say that future releases will start having investment protection again, at least back to these new systems.</p>
<p>Relevant background to all this includes:</p>
<ul>
<li><a href="../../../../../2010/08/16/vertica-flash-temp-space/">Vertica&#8217;s take on combining hard and solid-state disks</a> (August, 2010)</li>
<li><a href="../../../../../2010/08/18/more-on-temp-space-compression-and-random-io/">IBM&#8217;s take on combining hard-disk and solid-state storage</a> (August, 2010)</li>
<li><a href="../../../../../2008/09/13/ssd-get-incorporated-into-data-warehousing/">Some storage vendors&#8217; thoughts on the relationship between DBMS and solid-state storage</a> (September, 2008)</li>
<li>A discussion I had in October, 2009 with Teradata&#8217;s <a href="../../../../../2009/10/25/teradata-hardware-strategy-and-tactics/">Carson Schmidt</a>. (The SSD supplier he was so high on turns out to be Pliant Technology.)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/04/10/teradata-integrates-in-solid-state-storage/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

