<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Storage</title>
	<atom:link href="http://www.dbms2.com/category/storage/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 02 Sep 2010 09:06:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Vertica&#8217;s innovative architecture for Flash, plus more about temp space than you perhaps wanted to know</title>
		<link>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/</link>
		<comments>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/#comments</comments>
		<pubDate>Mon, 16 Aug 2010 08:07:33 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2788</guid>
		<description><![CDATA[Vertica is announcing:

Technology it has already released*, but for which it has not published any reference architectures
A 	Barney partnership**

In other words, Vertica has succumbed to the common delusion that it&#8217;s a good idea to put out half-baked press releases the week of TDWI conferences. But if we look past that kind of all-too-common nonsense, Vertica is highlighting [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Vertica is announcing:</p>
<ul>
<li>Technology it has already released*, but for which it has not published any reference architectures</li>
<li><span style="font-style: normal;">A 	<a href="http://www.strategicmessaging.com/barney-partnerships/2010/08/12/" onclick="javascript:pageTracker._trackPageview('/www.strategicmessaging.com');">Barney</a> partnership**</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">In other words, Vertica has succumbed to the common delusion that it&#8217;s a good idea to put out half-baked press releases the week of TDWI conferences. </span>But if we look past that kind of all-too-common nonsense, Vertica is highlighting an interesting technical story, about <strong>how the analytic DBMS industry can exploit solid-state memory technology.</strong></p>
<p style="margin-bottom: 0in;"><em>*Upgrades to <a href="../2009/08/04/flexstore-and-the-rest-of-vertica-35/">Vertica FlexStore</a> to handle Flash memory, actually released as part of <a href="../2010/02/22/vertica-4/">Vertica 4.0</a></em></p>
<p style="margin-bottom: 0in;"><em>** With Fusion-io</em></p>
<p style="margin-bottom: 0in;">To set the context, let&#8217;s recall a few points I&#8217;ve noted in the past:</p>
<ul>
<li><a href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Solid-state 	memory&#8217;s price/throughput tradeoffs obviously make it the future of 	database storage</a>.</li>
<li><a href="../2010/06/25/flash-is-coming-well/">The 	Flash future is coming soon</a>, in part because Flash&#8217;s propensity 	to wear out is overstated. This is especially true in the case of 	modern analytic DBMS, which tend to write to blocks all at once, and 	most particularly the case for append-only systems such as Vertica.</li>
<li><a href="../2010/08/12/teradata-future-product-strategy/">Being 	able to intelligently split databases among various cost tiers of 	storage – e.g. Flash and disk – makes a whole lot of sense</a>.</li>
</ul>
<p style="margin-bottom: 0in;">Taken together, those points tell us:</p>
<p style="margin-bottom: 0in;"><strong>For optimal price/performance, analytic DBMS should support databases that run part on Flash, part on disk.</strong></p>
<p style="margin-bottom: 0in;">While all this is still in the future for some other analytic DBMS vendors, Vertica is shipping it today.* What&#8217;s more, three aspects of Vertica&#8217;s architecture make it particularly well-suited for hybrid Flash/disk storage, in each case for a similar reason – you can get most of the performance benefit of all-Flash for a relatively low actual investment in Flash chips:  <span id="more-2788"></span></p>
<ul>
<li><strong>Vertica lets you split tables 	by column, </strong><span style="font-weight: normal;">and Vertica 	FlexStore is versatile enough to let you put only the most-used 	columns in Flash. (Vertica offers a figure that 85% of usage calls 	on only 15% of columns, but I don&#8217;t know how rigorously grounded 	those numbers are.)</span></li>
<li>To the extent that Vertica data is <a href="../2008/09/24/vertica-finally-spells-out-its-compression-claims/">more compressed</a> than many of Vertica&#8217;s competitors&#8217; (which it probably is, debates over the magnitude of Vertica&#8217;s advantage notwithstanding), the total storage-hardware cost of sticking stuff in Flash is less when you use Vertica than with other systems.</li>
<li>Vertica has <span style="font-weight: normal;">relatively 	less need for </span><strong>temp space</strong> than some other systems. 	(Vertica uses figures of &lt;20% of total storage, vs. 30%+ for some 	other systems.) If you want to use Flash for temp space, so as to 	accelerate your toughest queries, that can save you some cash …</li>
<li>… and by the way, <strong>temp space 	is an especially good use of Flash, </strong>because <strong>temp space is 	accessed in a less sequential manner than data storage is.</strong></li>
</ul>
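<p style="margin-bottom: 0in;">To make the hot-column point concrete, here is a minimal sketch of how one might choose which columns go on Flash under a fixed budget. The column names, sizes, and access counts are hypothetical, and the greedy accesses-per-GB rule is an assumption for illustration; it is not a description of how Vertica FlexStore decides placement.</p>

```python
def pick_flash_columns(columns, flash_budget_gb):
    """Greedy choice: highest accesses per GB first, until the Flash budget is spent."""
    ranked = sorted(columns, key=lambda c: c["accesses"] / c["size_gb"], reverse=True)
    chosen, used = [], 0.0
    for col in ranked:
        if used + col["size_gb"] > flash_budget_gb:
            continue  # does not fit; keep looking for smaller columns
        chosen.append(col["name"])
        used += col["size_gb"]
    return chosen, used


# Hypothetical column statistics (compressed size and query accesses).
columns = [
    {"name": "order_date", "size_gb": 10.0, "accesses": 900},
    {"name": "customer_id", "size_gb": 20.0, "accesses": 800},
    {"name": "comment_text", "size_gb": 200.0, "accesses": 5},
    {"name": "ship_mode", "size_gb": 5.0, "accesses": 50},
]

chosen, used = pick_flash_columns(columns, flash_budget_gb=32.0)
print(chosen, used)
```

<p style="margin-bottom: 0in;">In this toy data set, a small fraction of the stored bytes absorbs almost all of the accesses, which is the shape of the 85%/15% claim quoted above.</p>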
<p style="margin-bottom: 0in;">The least obvious of those points are about temp space; I only understood the particulars when Vertica development chief Shilpa Lawande explained them to me Thursday.</p>
<p style="margin-bottom: 0in;"><em>* At least in theory; customer adoption may be a different matter.</em></p>
<p style="margin-bottom: 0in;">But before drilling down on temp space, let me first note that there&#8217;s one offsetting factor to all those “We need somewhat less Flash than the other guys” Vertica advantages. Like all serious databases, a Vertica installation keeps two or more copies of all data, so that there&#8217;s no single point of failure in storage. In a flexible system like Vertica, you can put one copy on Flash and one on disk. But if you do that in Vertica, you forgo fully exploiting one possible benefit of Vertica&#8217;s architecture – the ability to store different copies of a column in different orders, which are beneficial for accelerating different groups of queries.*</p>
<p style="margin-bottom: 0in;"><em>*More precisely, you don&#8217;t get the full benefits of Flash acceleration for every query touching those columns.</em></p>
<p style="margin-bottom: 0in;">OK. Back to temp space. There are four kinds of things you can put in storage if you&#8217;re running a database management system:</p>
<ul>
<li>The <strong>software</strong> itself.</li>
<li><span style="font-weight: normal;">Persistent </span><strong>data. </strong><span style="font-weight: normal;">(I.e., tables, 	if the DBMS you&#8217;re running is relational.)</span></li>
<li><strong>Metadata,</strong> especially the 	kind that lets you find data &#8211;<strong> indexes,</strong> zone maps, catalogs, 	etc.</li>
<li><strong>Temporary data constructs</strong> built as part of, say, a sort-merge join. These, by definition, are what populate temp space.</li>
</ul>
<p style="margin-bottom: 0in;">Just to be clear, those constructs are NOT temporary tables of the sort created by, say, Microstrategy; such tables are handled like any other data. Rather, they are ephemeral creations and, so far as I can tell, not tables at all.</p>
<p style="margin-bottom: 0in;">Vertica offered two theories as to why its DBMS requires less temp space than competitors do:</p>
<ul>
<li>To the extent data is decompressed before being operated on in memory by the DBMS, that decompression would of course apply to temp space as well. Vertica prides itself on <strong>keeping data compressed</strong> all the way through, and seems to get away with smaller temp space allocations as a benefit.</li>
<li>Since Vertica can store columns in 	expedient sort orders, it does less sorting overall, and sorting is 	a big use of temp space.</li>
</ul>
<p style="margin-bottom: 0in;">Obviously, no matter which DBMS you use, the amount of temp space you need is workload-dependent. Even so, Vertica&#8217;s claim to something of an advantage seems legit.</p>
<p style="margin-bottom: 0in;"><em>Truth be told, I&#8217;m not convinced the savings involved are great enough to </em>matter<em> a whole lot – but it&#8217;s a fun subject to think through. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></p>
<p style="margin-bottom: 0in;">And finally: One of my biggest surprises since starting to look at analytic-DBMS-on-Flash has been the centrality of temp space. Talking to Vertica Thursday, I finally uncovered a key reason why: <strong>Temp space tends to be accessed via multiple streams of data at once.</strong> I&#8217;m still struggling with WHY that is true, with two reasons suggested being:</p>
<ul>
<li>Temp space can be accessed by 	multiple operations at once. (But isn&#8217;t that also true of the rest 	of storage?)</li>
<li>Merge sorts, a common use of temp 	space, read multiple streams of data. (Couldn&#8217;t you tweak your 	software to make that not be true?)</li>
</ul>
<p style="margin-bottom: 0in;">But if we grant that temp space naturally is accessed in multiple places at once – well, that&#8217;s a lot like random I/O, and <a href="../2005/11/13/breaking-the-disk-speed-barrier/">if you&#8217;re doing a lot of random reads, you&#8217;d love to use something other than spinning disk</a>.</p>
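<p style="margin-bottom: 0in;">The merge-sort point can be illustrated with a small sketch. Assume each sorted run below stands for a chunk that was spilled to temp space; the merge has to keep hopping among the runs, which on spinning disk would mean a seek nearly every time it switches. The run contents are made up, and this is a generic k-way merge using Python&#8217;s standard <code>heapq.merge</code>, not any vendor&#8217;s implementation.</p>

```python
import heapq


def merge_runs(runs):
    """Merge sorted runs, recording which run each output value was read from."""
    tagged = [[(value, run_id) for value in run] for run_id, run in enumerate(runs)]
    merged, read_order = [], []
    for value, run_id in heapq.merge(*tagged):
        merged.append(value)
        read_order.append(run_id)
    return merged, read_order


# Three hypothetical sorted runs, as if spilled to temp space by an earlier sort pass.
runs = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
merged, read_order = merge_runs(runs)

# Count how often consecutive reads come from different runs.
switches = sum(1 for a, b in zip(read_order, read_order[1:]) if a != b)
print(merged, read_order, switches)
```

<p style="margin-bottom: 0in;">Real systems read each run in buffered blocks rather than value by value, which reduces the switching but does not eliminate it; the more runs there are, the smaller each buffer gets and the more the access pattern resembles random I/O.</p>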
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Teradata&#8217;s future product strategy</title>
		<link>http://www.dbms2.com/2010/08/12/teradata-future-product-strategy/</link>
		<comments>http://www.dbms2.com/2010/08/12/teradata-future-product-strategy/#comments</comments>
		<pubDate>Thu, 12 Aug 2010 10:37:14 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[Microstrategy]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2769</guid>
		<description><![CDATA[I think Teradata&#8217;s future product strategy is coming into focus. I&#8217;ll start by outlining some particular aspects, and then show how I think it all ties together.

The immediate hook here is that I had a short conversation with Scott Gnau of Teradata yesterday, triggered by Teradata&#8217;s acquisition of Kickfire&#8217;s assets. Takeaways from that part included:

The [...]]]></description>
			<content:encoded><![CDATA[<p>I think Teradata&#8217;s future product strategy is coming into focus. I&#8217;ll start by outlining some particular aspects, and then show how I think it all ties together.<br />
<span id="more-2769"></span></p>
<p style="margin-bottom: 0in;">The immediate hook here is that I had a short conversation with Scott Gnau of Teradata yesterday, triggered by <a href="../2010/07/27/kickfire-unlikely-to-survive/">Teradata&#8217;s acquisition of Kickfire&#8217;s assets</a>. Takeaways from that part included:</p>
<ul>
<li>The acquisition is all about 	Kickfire&#8217;s <a href="../2009/08/21/kickfires-fpga-based-technical-strategy/">data 	pipelining</a> technology.</li>
<li>Scott (in my opinion rightly) 	thinks that isn&#8217;t particularly tied to Kickfire&#8217;s choice of 	particular DBMS architecture (fairly vanilla columnar).</li>
<li>No decision has been made about 	whether the right vehicle for this technology is an FPGA (Field 	Programmable Gate Array), conventional Intel CPU, RAM, etc.</li>
</ul>
<p style="margin-bottom: 0in;"><em>If you want to handicap Teradata&#8217;s future data pipelining strategy, you might note that:</em></p>
<ul>
<li><em>Kickfire&#8217;s own choice – and 	hence its existing implementation – is an FPGA.</em></li>
<li><em><a href="../2009/08/04/vectorwise-ingres-and-monetdb/">VectorWise&#8217;s 	approach to pipelining is Intel-based,</a> apparently at the cost of 	being closely tied to specific generations of Intel CPUs.</em></li>
<li><em><a href="../2009/07/27/xtremedata-announces-its-dbx-data-warehouse-appliance/">XtremeData&#8217;s 	approach to pipelining</a> is FPGA-based.</em></li>
<li><em>Teradata has a lot more 	development resources than any of those other companies, as well as 	important existing products, and hence has both means and motive to 	shoehorn new technology into older system designs.</em></li>
</ul>
<p style="margin-bottom: 0in;">While I had Scott on the phone, I brought up a few other subjects too. Highlights included:</p>
<ul>
<li>Teradata&#8217;s Flash-based appliance 	is doing just fine in beta test and customer POCs (Proofs of 	Concept).</li>
<li>Other kinds of Teradata appliance 	are not inconceivable.</li>
<li>Scott thinks <a href="http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/" >Michael McIntire&#8217;s 	condemnation of Active-Active architectures</a> is overstated. That 	said,
<ul>
<li>Scott does acknowledge a need for 	greater Active-Active scalability, and suggests that the reason 	Xkoto&#8217;s current products are being discontinued is their lack of 	scaling.</li>
<li>Scott seems quietly confident the 	scaling will get done.</li>
</ul>
</li>
<li>Scott is emphatic that Teradata is 	not going to go to <a href="../2009/04/20/calpont-update-you-read-it-here-first/">a 	two-tier architecture</a>. In particular, the point of splitting 	storage/lightweight database processing and heavyweight database 	processing on separate tiers is generally to save bandwidth, and 	Teradata&#8217;s BYNET is typically less than 10% loaded.</li>
<li>Scott didn&#8217;t dispute my claim that 	this all suggests <a href="../2008/10/14/teradata-virtual-storage/">Teradata 	Virtual Storage</a> is the future, at the expense of a rigid 	delineation among <a href="../2008/10/23/teradata-appliance-product-lines/">specific 	use-case-focused product lines</a>.</li>
<li>Unlike <a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/" >Netezza</a> or <a href="http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/" >Aster</a>, Teradata doesn&#8217;t seem to plan analytic capability that works outside 	the UDF (User Defined Function) framework. However, Scott noted that 	Teradata has long had the capability that Aster and Netezza now also 	have of letting you run analytic code either in “protected mode” 	(if the process fails the whole database doesn&#8217;t crash) or in the 	database kernel (best performance, if you&#8217;re sufficiently confident 	in the code&#8217;s stability to take the risk). Scott also spoke of the 	release later this quarter of Teradata FastPath, which will offer 	yet better performance (however, there&#8217;s a gotcha to Teradata 	FastPath that&#8217;s still NDA).</li>
</ul>
<p style="margin-bottom: 0in;">Putting all that together with the rest of what we know about Teradata, I&#8217;m going to call out<strong> three pillars of Teradata&#8217;s long-term product strategy:</strong></p>
<ul>
<li><strong>Same fundamentals as always.</strong> Teradata&#8217;s core product strategy is:
<ul>
<li>Single DBMS, capable of meeting 	all analytic needs while running in a single instance, usually 	running on &#8230;</li>
<li>… proprietary hardware …</li>
<li>… built from 	conservatively-chosen parts.</li>
</ul>
</li>
<li><strong>Selective vertical application 	stack.</strong> No matter how horizontally-oriented they are, many 	companies that have been in the analytic technology business for a 	while wind up with some vertical applications. It sort of just 	happens. Teradata is no exception. Teradata also likes to sell 	services to its product customers, and some of those are quite 	vertical-aware.</li>
<li><strong>Mutable, modular platform.</strong> This is what I highlighted above. Note that it&#8217;s philosophically attuned to the one-system-does-everything approach Teradata prefers. More subtly, please also note that it goes well with customer-by-customer price customization, which is almost a must for Teradata given the Innovator&#8217;s Dilemma kind of pricing box it finds itself in.</li>
</ul>
<p style="margin-bottom: 0in;">So far, that&#8217;s not too exciting, except in the details of how Teradata&#8217;s engineers make that all work. But there&#8217;s a <strong>fourth pillar to Teradata&#8217;s technical strategy</strong> as well, and it&#8217;s a wild card: <strong>tight partnerships.</strong> Every time I talk with Teradata hardware chief Carson Schmidt, he seems excited about some particular version of a part or other – sometimes from a reasonably established vendor (once it was LSI Logic), sometimes from a tiny one (notably <a href="../2009/10/25/teradata-hardware-strategy-and-tactics/">the “stealth” start-up on which Teradata bet its first solid-state product</a>). In the future, I expect tight business intelligence partnerships as well. Cognos BI will be increasingly integrated with IBM&#8217;s DBMS and hardware; Business Objects&#8217; BI will increasingly be integrated with SAP&#8217;s applications; and Oracle&#8217;s BI will eventually be integrated with everything. How do you compete with that if you&#8217;re Microstrategy? Well, you try to have a superior product, of course – but you also partner as closely with DBMS vendors as you can, an approach Microstrategy has already started. Predictive analytics stalwart <a href="http://www.dbms2.com/2010/05/15/further-clarifying-in-database-mpp-sas/" >SAS</a>, of course, is on a partnership binge as well.</p>
<p style="margin-bottom: 0in;">Teradata has a larger installed base than almost all its competitors, and enjoys richer third-party software and service support as a result. But I suspect that going forward, for Teradata to remain a leading competitor at price points it is willing to accept, Teradata&#8217;s “ecosystem” advantages will need to ratchet up one or several notches.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/12/teradata-future-product-strategy/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Teradata, Xkoto Gridscale (RIP), and active-active clustering</title>
		<link>http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/</link>
		<comments>http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/#comments</comments>
		<pubDate>Sat, 31 Jul 2010 08:23:57 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Continuent]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Xkoto]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2708</guid>
		<description><![CDATA[Having gotten a number of questions about Teradata&#8217;s acquisition of Xkoto, I leaned on Teradata for an update, and eventually connected with Scott Gnau. Takeaways included:

Teradata is discontinuing  Xkoto&#8217;s existing product Gridscale, which 	Scott characterized as being too OLTP-focused to be a good fit for 	Teradata. Teradata hopes and expects that existing Xkoto Gridscale [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Having gotten a number of questions about Teradata&#8217;s acquisition of Xkoto, I leaned on Teradata for an update, and eventually connected with Scott Gnau. Takeaways included:</p>
<ul>
<li>Teradata is discontinuing <a href="http://www.dbms2.com/2009/09/11/xkoto-gridscale-highlights/" >Xkoto&#8217;s existing product Gridscale</a>, which Scott characterized as being too OLTP-focused to be a good fit for Teradata. Teradata hopes and expects that existing Xkoto Gridscale customers won&#8217;t renew maintenance. (I&#8217;m not sure that they&#8217;ll even get the option to do so.)</li>
<li>The point of Teradata&#8217;s technology 	+ engineers acquisition of Xkoto is to enhance Teradata&#8217;s 	active-active or multi-active data warehousing capabilities, which 	it has had in some form for several years.</li>
<li>In particular, Teradata wants to 	tie together different products in the Teradata product line. (Note: 	Those typically all run pretty much the same Teradata database 	management software, except insofar as they might be on different 	releases.)</li>
<li>Scott rattled off all the 	plausible areas of enhancement, with multiple phrasings – 	performance, manageability, ease of use, tools, features, etc.</li>
<li>Teradata plans to have one or two 	releases based on Xkoto technology in 2011.</li>
</ul>
<p style="margin-bottom: 0in;">Frankly, I&#8217;m disappointed at the struggles of clustering efforts such as Xkoto Gridscale or <a href="http://www.dbms2.com/2009/09/03/continuent-on-clustering/" >Continuent&#8217;s pre-Tungsten products</a>, but if the DBMS vendors meet the same needs themselves, that&#8217;s OK too.</p>
<p style="margin-bottom: 0in;">The logic behind active-active database implementations actually seems pretty compelling:  <span id="more-2708"></span></p>
<ul>
<li>You may well be keeping a second 	copy of your database for high availability/hot standby.</li>
<li>You might even be keeping a third 	copy for off-site disaster recovery.</li>
<li>In some cases, you might have 	reasons beyond disaster recovery to distribute a database around the 	world.</li>
<li>So why not allow queries to be run 	against all the copies?</li>
<li>And by the way, splitting the 	workload up a bit by kinds (e.g., long-running vs. short query) 	might let you optimize the implementation of each copy of the 	database. (This last point becomes even more important with the rise 	of solid-state memory.)</li>
</ul>
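<p style="margin-bottom: 0in;">As a rough illustration of the last two bullets, here is a sketch of routing read queries across several copies of a database according to the kind of workload each copy is tuned for. The replica names, the two workload kinds, and the round-robin rule are all hypothetical; the sketch also ignores the hard part of active-active, namely keeping the copies in sync when data is written.</p>

```python
from itertools import cycle


class ActiveActiveRouter:
    """Round-robin read routing across replicas, grouped by workload kind."""

    def __init__(self, replicas):
        # replicas: mapping of replica name -> set of workload kinds it is tuned for
        by_kind = {}
        for name, kinds in replicas.items():
            for kind in kinds:
                by_kind.setdefault(kind, []).append(name)
        self._cycles = {kind: cycle(names) for kind, names in by_kind.items()}

    def route(self, kind):
        """Return the next replica that should serve a query of this kind."""
        return next(self._cycles[kind])


# Hypothetical deployment: a primary, a hot standby, and an off-site DR copy.
router = ActiveActiveRouter({
    "primary": {"short"},
    "standby": {"short", "long"},
    "dr_site": {"long"},
})

short_routes = [router.route("short") for _ in range(3)]
long_routes = [router.route("long") for _ in range(2)]
print(short_routes, long_routes)
```

<p style="margin-bottom: 0in;">The standby and disaster-recovery copies, which would otherwise sit idle, end up absorbing the long-running queries while the primary stays responsive for short ones.</p>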
<p style="margin-bottom: 0in;">Analytic DBMS vendors pretty much all need to offer this. (Possible exception: If they have a data-mart-only positioning so extreme that customers will never care about any form of failover.) That said, I must confess to not having done a good job of tracking who does or doesn&#8217;t have which features in this area to date; informative comments to this post in that regard would be much appreciated!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Why analytic DBMS increasingly need to be storage-aware</title>
		<link>http://www.dbms2.com/2010/07/07/analytic-database-storage-aware/</link>
		<comments>http://www.dbms2.com/2010/07/07/analytic-database-storage-aware/#comments</comments>
		<pubDate>Wed, 07 Jul 2010 06:30:14 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2515</guid>
		<description><![CDATA[In my quick reactions to the EMC/Greenplum announcement, I opined

I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner

promising to explain what I meant later on. So here goes.  
There always have been good technical reasons to tailor hardware to analytic database software. Data moves through disk controller, [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><span style="font-style: normal;">In <a href="../2010/07/06/emc-is-buying-greenplum/">my quick reactions to the EMC/Greenplum announcement</a>, I opined</span></p>
<blockquote>
<p style="margin-bottom: 0in;">I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner</p>
</blockquote>
<p style="margin-bottom: 0in;">promising to explain what I meant later on. So here goes.  <span id="more-2515"></span></p>
<p style="margin-bottom: 0in;">There always have been good technical reasons to tailor hardware to analytic database software. Data moves through disk controller, network, RAM, CPU and more, each with its own data rate. Getting different kinds of parts into the right balance doesn&#8217;t completely eliminate bottlenecks – the <a href="../2010/07/06/the-one-hoss-shay/">Wonderful One-Hoss Shay</a> is poetic fiction – but it certainly can help. As a result, every analytic DBMS vendor of any size offers at least one of:</p>
<ul>
<li><a href="../2007/01/27/data-warehouse-appliance-hardware-strategies/">A 	Type 0 appliance</a></li>
<li>A Type 1 appliance</li>
<li>A “recommended hardware 	configuration”</li>
</ul>
<p style="margin-bottom: 0in;">And beyond performance, appliances and pre-specified hardware configurations offer at least the possibility of easing installation, administration, and support.</p>
<p style="margin-bottom: 0in;">There also are marketing reasons to offer an appliance or something appliance-like.</p>
<ul>
<li>To various extents, Oracle, 	Teradata, Microsoft, IBM, Netezza, and EMC are all telling the world 	that your hardware should be optimized for your analytic DBMS.</li>
<li>Smaller vendors such as Vertica 	and Aster Data also tend to cobble together some sort of appliance, 	in part so they don&#8217;t have to say they disagree.</li>
<li>Thus, a “We don&#8217;t see any point 	in special hardware assembly at all” story would leave an analytic 	DBMS vendor pretty far out on a limb.</li>
</ul>
<p style="margin-bottom: 0in;">Finally, there are three overlapping technical trends that increase the need for storage-awareness in analytic DBMS. First and foremost is the rise of <strong>solid-state memory.</strong> For starters, I believe:</p>
<ul>
<li><a href="../2010/06/25/flash-is-coming-well/">Flash 	will be important for analytic DBMS soon</a>.</li>
<li>There are good technical reasons 	for this.</li>
<li><a href="../2010/01/22/oracle-database-hardware-strategy/">Oracle&#8217;s 	marketing will make a big deal out of the Flash aspects of Exadata</a>, 	so other analytic DBMS vendors will need a response. And of course, 	if Netezza or Teradata preemptively make a big deal of their 	Flash-based offerings, that just adds to the pressure for Flash 	adoption on everybody else.</li>
<li>But it&#8217;s not just Flash – <a href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, 	other solid-state memory, and disk</a> will be combined in various 	ways.</li>
</ul>
<p style="margin-bottom: 0in;">But this move to Flash will require analytic DBMS vendors to be increasingly storage-aware for at least three reasons:</p>
<ul>
<li>It just adds another level of 	<strong>complexity</strong> to their hardware-balancing challenges.</li>
<li><strong>Flash overturns some of the 	fundamental assumptions of modern analytic DBMS,</strong> in particular:
<ul>
<li><a href="../2006/09/19/is-data-warehousing-now-all-about-sequential-access/">Sequential 	reads are hugely better than random</a></li>
<li>The worst bottleneck is at the 	point where data comes out of storage.</li>
</ul>
</li>
<li><strong>The Flash technology stack is 	still immature,</strong> and you have to pick your poison in how to deal 	with it. Vendors are making very different choices in this regard – 	and they do have to choose.</li>
</ul>
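<p style="margin-bottom: 0in;">A back-of-the-envelope model shows why the sequential-versus-random assumption weakens on Flash. The latency and throughput figures below are hypothetical round numbers chosen purely for illustration, not measurements of any product, and the model (one fixed access latency per chunk plus transfer time) is a deliberate simplification.</p>

```python
def read_seconds(total_mb, chunk_mb, access_ms, throughput_mb_s):
    """Time to read total_mb in chunks, paying one access latency per chunk."""
    chunks = total_mb / chunk_mb
    return chunks * access_ms / 1000.0 + total_mb / throughput_mb_s


# Hypothetical round numbers, for illustration only (not measurements).
DISK = {"access_ms": 8.0, "throughput_mb_s": 100.0}
FLASH = {"access_ms": 0.1, "throughput_mb_s": 500.0}


def random_penalty(device, total_mb=1024.0, page_mb=0.0625):
    """Ratio of random-read time (64 KB pages) to one sequential scan of the same data."""
    sequential = read_seconds(total_mb, total_mb, **device)
    scattered = read_seconds(total_mb, page_mb, **device)
    return scattered / sequential


disk_penalty = random_penalty(DISK)
flash_penalty = random_penalty(FLASH)
print(round(disk_penalty, 1), round(flash_penalty, 1))
```

<p style="margin-bottom: 0in;">Under these assumed numbers, reading 1 GB as scattered 64 KB pages costs well over ten times as much as a sequential scan on disk, but less than twice as much on Flash. A DBMS designed around avoiding random reads is optimizing for a penalty that largely goes away.</p>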
<p style="margin-bottom: 0in;">Another trend that could naturally lead analytic DBMS vendors to be more storage-aware is their incorporation of what could be viewed as hierarchical storage/ILM (Information Lifecycle Management) technologies. Different data is stored in different ways and/or on different kinds of storage hardware. (Vendors pursuing – you guessed it – different approaches to this include <a href="../2008/10/14/teradata-virtual-storage/">Teradata</a>, <a href="../2009/10/14/greenplum-hybrid-columnar/">Greenplum</a>, <a href="../2009/08/04/flexstore-and-the-rest-of-vertica-35/">Vertica</a>, and <a href="../2009/08/25/sybase-iq-technical-highlights/">Sybase</a>.) The more automatic that process is, the more storage-aware the DBMS will need to be.</p>
<p style="margin-bottom: 0in;">Finally, there are reasons to think that <a href="../2008/09/06/sans-vs-das-in-mpp-data-warehousing/">DBMS should be split between conventional servers and smart storage</a>. This is, of course, the Exadata strategy. <a href="../2010/06/21/netezza-silicon-balance/">Netezza&#8217;s two-processor approach</a>, while rather different, also somewhat validates the idea.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/07/analytic-database-storage-aware/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>EMC is buying Greenplum</title>
		<link>http://www.dbms2.com/2010/07/06/emc-is-buying-greenplum/</link>
		<comments>http://www.dbms2.com/2010/07/06/emc-is-buying-greenplum/#comments</comments>
		<pubDate>Tue, 06 Jul 2010 22:53:52 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2498</guid>
		<description><![CDATA[EMC is buying Greenplum. Most of the press release is a general recapitulation of Greenplum&#8217;s marketing messages, the main exceptions being (emphasis mine):
The acquisition of Greenplum will be an all-cash transaction and is expected to be completed in the third quarter of 2010, subject to customary closing conditions and regulatory approvals. The acquisition is not [...]]]></description>
			<content:encoded><![CDATA[<p>EMC is buying Greenplum. Most of the <a href="http://www.emc.com/about/news/press/2010/20100706-01.htm" onclick="javascript:pageTracker._trackPageview('/www.emc.com');">press release</a> is a general recapitulation of Greenplum&#8217;s marketing messages, the main exceptions being (emphasis mine):</p>
<blockquote><p>The acquisition of Greenplum will be an all-cash transaction and is <strong>expected to be completed in the third quarter of 2010,</strong> subject to customary closing conditions and regulatory approvals. The acquisition is not expected to have a material impact to EMC GAAP and non-GAAP EPS for the full 2010 fiscal year. Upon close, Bill Cook will lead the new data computing product division and report to Pat Gelsinger. <strong>EMC will continue to offer Greenplum&#8217;s full product portfolio to customers and plans to deliver new EMC Proven reference architectures as well as an integrated hardware and software offering</strong> designed to improve performance and drive down implementation costs.</p></blockquote>
<p>Greenplum is one of my biggest vendor clients, and EMC is just becoming one, but of course neither side gave me a heads-up before the deal happened, nor have I yet been briefed subsequently. With those disclaimers out of the way, some of my early thoughts include:</p>
<ul>
<li>I wish my clients would never buy each other, but it&#8217;s inevitable.</li>
<li>I don&#8217;t think anybody evaluating Greenplum should be much influenced by this deal one way or the other. (Whether they will be is of course a different matter.)
<ul>
<li>EMC tends to run its bigger software acquisitions in a fairly hands-off manner. There&#8217;s no particular FUD (Fear/Uncertainty/Doubt) reason why this deal should stop anybody from buying Greenplum software.</li>
<li>I also don&#8217;t think adding a rich parent adds much of a reason to buy from Greenplum. But if you&#8217;re the type who&#8217;s nervous about smaller vendors &#8212; well, Greenplum now isn&#8217;t so small.</li>
</ul>
</li>
<li><a href="http://www.dbms2.com/2010/04/12/greenplumchorus/" >Greenplum Chorus</a> could, in principle, work with non-Greenplum DBMS. That possibility suddenly looks a lot more realistic.</li>
<li>The list of analytic DBMS vendors with an appliance orientation is pretty impressive, including:
<ul>
<li>Oracle, with <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >Exadata</a></li>
<li>Microsoft, partially</li>
<li>Teradata</li>
<li>Netezza</li>
<li>Now EMC/Greenplum, at least partially</li>
<li>Weaker players such as:
<ul>
<li>The <a href="http://www.dbms2.com/2010/06/11/kickfire-update-2/" >ailing Kickfire</a>, which a client (not Kickfire itself) tells me is being shopped around</li>
<li>The <a href="http://www.dbms2.com/2010/03/19/some-business-trends-in-the-data-warehouse-market/" >reeling HP Neoview</a></li>
<li>XtremeData, but I&#8217;m still waiting to hear of <a href="http://www.dbms2.com/2010/03/18/xtremedata-update/" >XtremeData&#8217;s first real sale</a></li>
</ul>
</li>
</ul>
</li>
<li>Greenplum is something of a specialist in <a href="http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/" >large databases</a>. EMC has to love that.</li>
<li>Greenplum&#8217;s weakness is concurrency.</li>
<li><a href="http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/" >Greenplum&#8217;s &#8220;polymorphic storage&#8221;</a> is a good fit for a storage vendor with appliance-y ideas.</li>
<li>And finally &#8212; I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner, and have been advising my vendor clients of same. I&#8217;ll blog that line of reasoning separately when I get a chance, and edit in a link here after I do.</li>
</ul>
<p><em><strong>Related links (edit)</strong></em></p>
<ul>
<li>Here&#8217;s the promised post as to <a href="http://www.dbms2.com/2010/07/07/analytic-database-storage-aware/" >why analytic DBMS need to be ever more storage-aware</a>.</li>
<li><a href="http://www.kellblog.com/2010/07/06/emc-acquires-data-warehouse-vendor-greenplum-as-cornerstone-of-new-data-computing-product-division/" onclick="javascript:pageTracker._trackPageview('/www.kellblog.com');">Dave Kellogg crunched the EMC/Greenplum numbers</a>, coming up with an estimated valuation range of $300-400 million, the high end of which is rumored to be correct.</li>
<li>Merv Adrian suggests <a href="http://mervadrian.wordpress.com/2010/07/06/emc-buys-greenplum-big-data-realignment-continues/#more-2890" onclick="javascript:pageTracker._trackPageview('/mervadrian.wordpress.com');">the big EMC/Greenplum loser is ParAccel</a>, a viewpoint which presumably presupposes that the EMC/ParAccel partnership was significant in the first place.</li>
<li>I talked with Ben Werther and posted <a href="http://www.dbms2.com/2010/07/07/more-on-greenplum-and-emc/" >more about Greenplum and EMC</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/06/emc-is-buying-greenplum/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Flash is coming, well &#8230;</title>
		<link>http://www.dbms2.com/2010/06/25/flash-is-coming-well/</link>
		<comments>http://www.dbms2.com/2010/06/25/flash-is-coming-well/#comments</comments>
		<pubDate>Fri, 25 Jun 2010 16:42:26 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2389</guid>
		<description><![CDATA[I really, really wanted to title this post &#8220;Flash is coming in a flash.&#8221; That seems a little exaggerated &#8212; but only a little.

Netezza now intends to come out with a Flash-based appliance earlier than it originally expected.
Indeed, Netezza has suspended &#8212; by which I mean &#8220;scrapped&#8221; &#8212; prior plans for a RAM-heavy disk-based appliance. [...]]]></description>
			<content:encoded><![CDATA[<p>I really, really wanted to title this post &#8220;Flash is coming in a flash.&#8221; That seems a little exaggerated &#8212; but only a little.</p>
<ul>
<li>Netezza now intends to come out with a Flash-based appliance earlier than it originally expected.</li>
<li>Indeed, Netezza has suspended &#8212; by which I mean &#8220;scrapped&#8221; &#8212; prior plans for a RAM-heavy disk-based appliance. It will use a RAM/Flash combo instead.*</li>
<li>Tim Vincent of IBM told me that customers seem ready to adopt solid-state memory. One interesting comment he made is that Flash isn&#8217;t really all that much more expensive than high-end storage area networks.</li>
</ul>
<p>Uptake of solid-state memory (i.e. Flash) for analytic database processing will probably stay pretty low in 2010, but in 2011 it should be a notable (b)leading-edge technology, and it should get mainstreamed pretty quickly after that.  <span id="more-2389"></span></p>
<p><em>*So far as I can tell, that&#8217;s one of the two significant roadmap changes between the 2009 and 2010 editions of <a href="http://www.dbms2.com/2010/06/23/my-talk-this-morning/" >Enzee Universe</a>. The other one is that the robust form of appliance-to-appliance replication technology is coming out later than Netezza had originally planned and hoped.</em></p>
<p>There also is increasing reason to think that the issues with Flash memory wearing out are overwrought.  (And by the way, the entire history of enterprise solid-state memory use is basically shorter than the time in which these products supposedly will wear out, so it&#8217;s not as if there have been a lot of real-life failures out there.)</p>
<ul>
<li>First, clever things are being done in the area of error correction codes, although for the most part I defer that part of the discussion to Petascan&#8217;s Camuel Gilyadov. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  E.g., this seems to be the idea behind Anobit.</li>
<li>Second, analytic DBMS are pretty much an ideal use case for Flash reliability. Suppose, as is the case for many products and implementations, you only write things in big blocks. Then you are, ipso facto, resetting the Flash bits only in big blocks. Thus, at least in theory, you automatically have pretty perfect wear leveling.</li>
</ul>
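<p>The big-block argument above can be illustrated with a toy model. The Python sketch below is not based on any vendor&#8217;s Flash controller; the function names, the block counts, and the 90/10 hot-spot workload are all assumptions made for illustration, and it deliberately ignores the remapping a real Flash translation layer would do. It simply counts erases per block under two write patterns and compares how unevenly the wear is spread.</p>

```python
import random


def erase_counts_big_block(num_blocks, writes):
    """Each write fills one whole erase block, cycling through the device in order."""
    counts = [0] * num_blocks
    for i in range(writes):
        counts[i % num_blocks] += 1
    return counts


def erase_counts_small_updates(num_blocks, writes, hot_blocks=10, hot_share=0.9, seed=0):
    """Small in-place updates, most of which hit a few hot blocks.

    Assumption: with no remapping, every update costs one erase of its block.
    """
    rng = random.Random(seed)
    counts = [0] * num_blocks
    for _ in range(writes):
        if hot_share > rng.random():
            block = rng.randrange(hot_blocks)              # one of the hot blocks
        else:
            block = rng.randrange(hot_blocks, num_blocks)  # one of the cold blocks
        counts[block] += 1
    return counts


def spread(counts):
    """Gap between the most-worn and least-worn block."""
    return max(counts) - min(counts)


big = erase_counts_big_block(num_blocks=100, writes=1000)
small = erase_counts_small_updates(num_blocks=100, writes=1000)
print("big-block spread:", spread(big))
print("small-update spread:", spread(small))
```

<p>In this model the big-block pattern wears every block equally, while the skewed small-update pattern concentrates erases on the hot blocks. Real devices narrow that gap with controller-level wear leveling, so treat the numbers as qualitative only.</p>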
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/25/flash-is-coming-well/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>VoltDB finally launches</title>
		<link>http://www.dbms2.com/2010/05/25/voltdb-finally-launches/</link>
		<comments>http://www.dbms2.com/2010/05/25/voltdb-finally-launches/#comments</comments>
		<pubDate>Tue, 25 May 2010 07:15:04 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2201</guid>
		<description><![CDATA[VoltDB is finally launching today. As is common for companies in sectors I write about, VoltDB &#8212; or just &#8220;Volt&#8221; &#8212; has discovered the virtues of embargoes that end 12:01 am. Let&#8217;s go straight to the technical highlights:

VoltDB is based on the H-Store technology, which I wrote about in February, 2009. Most of what I [...]]]></description>
			<content:encoded><![CDATA[<p>VoltDB is finally launching today. As is common for companies in sectors I write about, VoltDB &#8212; or just &#8220;Volt&#8221; &#8212; has discovered the virtues of embargoes that end 12:01 am. Let&#8217;s go straight to the technical highlights:</p>
<ul>
<li>VoltDB is based on the <a href="http://www.dbms2.com/2008/02/19/h-store-architecture/" >H-Store</a> technology, which I wrote about in February, 2008. Most of what I said about H-Store then applies to VoltDB today.</li>
<li>VoltDB is a no-apologies ACID relational DBMS, which runs entirely in RAM.</li>
<li>VoltDB has rather limited SQL. (One example: VoltDB can&#8217;t do SUMs in SQL.) However, VoltDB guy Tim Callaghan (Mark Callaghan&#8217;s lesser-known but nonetheless smart brother) asserts that if you code up the missing functionality, it&#8217;s almost as fast as if it were present in the DBMS to begin with, because there&#8217;s no added I/O from the handoff between the DBMS and the procedural code. (The data&#8217;s in RAM one way or the other.)</li>
<li>VoltDB&#8217;s Big Conceptual Performance Story is that it does away with most locks, latches, logs, etc., and also most context switching.</li>
<li>In particular, you&#8217;re supposed to partition your data and architect your application so that most transactions execute on a single core. When you can do that, you get VoltDB&#8217;s performance benefits. To the extent you can&#8217;t, you&#8217;re in two-phase-commit performance land. (More precisely, you&#8217;re doing 2PC for multi-core writes, which is surely a major reason that multi-core reads are a lot faster in VoltDB than multi-core writes.)</li>
<li>VoltDB has a little less than one DBMS thread per core. When the data partitioning works as it should, you execute a complete transaction in that single thread. Poof. No context switching.</li>
<li>A transaction in VoltDB is a Java stored procedure. (The early idea of Ruby on Rails in lieu of the Java/SQL combo didn&#8217;t hold up performance-wise.)</li>
<li>Solid-state memory is not a viable alternative to RAM for VoltDB. Too slow.</li>
<li>Instead, VoltDB lets you snapshot data to disk at tunable intervals. &#8220;Continuous&#8221; is one of the options, wherein a new snapshot starts being made as soon as the last one completes.</li>
<li>In addition, VoltDB will also spool a kind of transaction log to the target of your choice. (Obvious choice: An analytic DBMS such as Vertica, but there&#8217;s no such connectivity partnership actually in place at this time.)</li>
</ul>
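<p>The single-partition idea in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: it is written in Python rather than as a Java stored procedure, the class and method names are hypothetical, and partitioning by customer ID modulo the partition count is my simplification, not VoltDB&#8217;s actual scheme. The point is just that a transaction whose keys all land in one partition can run start-to-finish on that partition&#8217;s thread, while one that spans partitions needs cross-partition coordination (which the sketch only labels, and does not implement).</p>

```python
class PartitionedAccounts:
    """Toy partitioned store: one dict per partition, one partition per core (assumed)."""

    def __init__(self, num_partitions):
        self.partitions = [{} for _ in range(num_partitions)]

    def partition_for(self, customer_id):
        # Assumed partitioning rule: integer customer ID modulo partition count.
        return customer_id % len(self.partitions)

    def balance(self, customer_id):
        part = self.partitions[self.partition_for(customer_id)]
        return part.get(customer_id, 0)

    def credit(self, customer_id, amount):
        # Touches exactly one partition, so no locks or coordination are needed.
        part = self.partitions[self.partition_for(customer_id)]
        part[customer_id] = part.get(customer_id, 0) + amount
        return "single-partition"

    def transfer(self, src, dst, amount):
        same = self.partition_for(src) == self.partition_for(dst)
        # A real system would wrap the multi-partition case in two-phase commit;
        # this sketch applies both updates directly and only reports the path taken.
        self.credit(src, -amount)
        self.credit(dst, amount)
        return "single-partition" if same else "multi-partition"


store = PartitionedAccounts(num_partitions=4)
store.credit(1, 100)
print(store.transfer(1, 5, 30))  # customers 1 and 5 share a partition under modulo 4
print(store.transfer(1, 2, 30))  # customers 1 and 2 do not
```

<p>An application designed for this model would try to choose the partitioning key so that the common transactions take the first path.</p>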
<p><span id="more-2201"></span>I should also note that when Tim Callaghan described architectural options to get around 2PC performance issues, they sounded a lot like eventual consistency. Maybe tunable <a href="http://www.dbms2.com/2010/05/01/ryw-read-your-writes-consistency/" >RYW consistency</a> isn&#8217;t in the cards, but at least there&#8217;s a NoSQL-like possibility with VoltDB.</p>
<p>VoltDB&#8217;s open source strategy is:</p>
<ul>
<li>VoltDB will be open sourced.</li>
<li>Community VoltDB will be GPLed. Professional Edition VoltDB has a non-GPL license.</li>
<li>The VoltDB Professional Edition won&#8217;t start out with features beyond the Community Edition ones, but will gain such later on. I didn&#8217;t get the sense the plans for those features were completely baked yet, but ideas mentioned included:
<ul>
<li>Management/monitoring tools.</li>
<li>Integration with expensive closed-source enterprise software products, such as ones in the management/monitoring area.</li>
<li>Yet more &#8220;extreme&#8221;/edge-case performance.</li>
</ul>
</li>
<li>Before VoltDB decided for sure that it wasn&#8217;t selling licenses, it sold a license to Getco, which also seems to be an investor in the company.</li>
</ul>
<p>VoltDB had a beta test with about 150 participants. None is in production yet, although at least a few are clearly headed there. Most VoltDB beta testers are in some kind of online business, with a particular concentration in everybody&#8217;s new favorite market, online gaming. Most of the rest are in investment/trading &#8212; a major target market for at least three different Mike Stonebraker companies &#8212; and a few are in telecom. VoltDB assures me that some of the beta users are companies one actually has heard of before, but VoltDB is not in a position to name any of those.</p>
<p>VoltDB is not ideally suited for a classic order management system, since you&#8217;d want to partition both on CustomerID and SKU, the latter because you&#8217;d constantly be updating inventory stock levels. However, this argument doesn&#8217;t apply in the case of virtual goods. Virtual goods that are sold for real money &#8212; and hence need ACID levels of transaction integrity &#8212; are thus a clear target market for VoltDB. (The example that came up was in, you guessed it, online gaming.) The other interesting use case that Tim highlighted was low-latency analytics/ELT. For reasons I didn&#8217;t totally grasp, Tim likes to call this &#8220;Stateful ELT.&#8221; (Given that the data goes into the VoltDB database before much else happens to it, I&#8217;m pretty sure I heard &#8220;ELT&#8221; correctly. But I guess I might have been mishearing &#8220;ETL&#8221;.)</p>
<p>VoltDB company highlights include:</p>
<ul>
<li>VoltDB has about a dozen employees, all but two of whom are technical. (However, I&#8217;m not sure they&#8217;re counting Andy Ellicott against the two. But then, last I heard he wasn&#8217;t full time at VoltDB.)</li>
<li>VoltDB&#8217;s venture funding status is, if I may paraphrase, &#8220;Mumble mumble.&#8221;</li>
<li>Although long separate from Vertica, VoltDB is still located in Vertica&#8217;s offices.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/25/voltdb-finally-launches/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>The Clustrix story</title>
		<link>http://www.dbms2.com/2010/05/12/the-clustrix-story/</link>
		<comments>http://www.dbms2.com/2010/05/12/the-clustrix-story/#comments</comments>
		<pubDate>Wed, 12 May 2010 08:53:48 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Clustrix]]></category>
		<category><![CDATA[Emulation, transparency, portability]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2096</guid>
		<description><![CDATA[After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included:    

Nothing in my 	original short post about Clustrix was actually incorrect.
Clustrix plans to reveal actual 	production “name-brand” customers soon.
The name of Clustrix&#8217;s software, 	or at least the guts thereof, is Sierra.
Clustrix&#8217;s products have actually 	been in general availability since last [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included:    <span id="more-2096"></span></p>
<ul>
<li>Nothing in <a href="../2010/05/04/clustrix-may-be-doing-something-interesting/">my 	original short post about Clustrix</a> was actually incorrect.</li>
<li>Clustrix plans to reveal actual 	production “name-brand” customers soon.</li>
<li>The name of Clustrix&#8217;s software, 	or at least the guts thereof, is Sierra.</li>
<li>Clustrix&#8217;s products have actually 	been in general availability since last quarter, with some versions 	at customer sites for 2 years. Development started 3 ½ years ago.</li>
<li>Clustrix says its technology is 	for OLTP systems, which it calls “non-batch/non-analytic,” with 	mixed read/write workloads. All Clustrix&#8217;s example target markets 	are “internet verticals,” such as photo sharing, gaming, social 	media, e-commerce, etc.</li>
<li>Clustrix&#8217;s heart is in SQL, as is 	most of its customer base. Clustrix Sierra&#8217;s key-value-store option 	has little or no performance advantage over Clustrix Sierra&#8217;s SQL 	option, nor any other advantage over SQL that came up in discussion.</li>
<li>Clustrix Sierra is 	“wire-compatible” with MySQL, but doesn&#8217;t use MySQL code; 	Clustrix wrote all the code itself.</li>
<li>Clustrix asserts that Clustrix 	Sierra supports the “vast majority” of MySQL features. Examples 	of MySQL features Clustrix doesn&#8217;t support at this time are 	full-text search and geospatial indexing.</li>
<li>Indeed, Clustrix claims Clustrix 	Sierra can be used to replace MySQL with few or zero changes to 	existing applications.</li>
<li>I specifically asked about 	referential integrity, which has a poor performance reputation in 	MySQL. Besides saying they supported it, Clustrix said that some 	customers actually use referential integrity in some of their less 	active tables.</li>
<li>Clustrix Sierra is fully 	ACID-compliant, with no eventual consistency or <a href="http://www.dbms2.com/2010/05/01/ryw-read-your-writes-consistency/" >RYW consistency</a> story. The default number of copies of each datum is two, and 	they&#8217;re kept consistent via two-phase commit.</li>
<li>Clustrix Sierra is fully parallel, 	with no “head” node. I forgot to ask how it was determined which 	queries would be addressed to and/or controlled by which nodes, but 	I presume there&#8217;s some sort of a load-balancing scheme.</li>
<li>Clustrix says that because 	Clustrix Sierra uses MVCC (Multi-Version Concurrency Control), and 	thus reads and writes don&#8217;t block each other, global locks aren&#8217;t a 	major issue. (They&#8217;re rare or short or something – I have trouble 	seeing why they would be non-existent.)</li>
<li>Clustrix says there&#8217;s a second 	class of locks and latches that are purely local and short-lived, 	for B-tree indexes and the like. (I didn&#8217;t drill down into those 	either.) I guess this means Clustrix Sierra is B-tree-centric, which 	makes sense for an OLTP-oriented system.</li>
<li>Clustrix Sierra distributes data among nodes via consistent hashing (default), range partitioning, or “full distribution” (i.e., copying a – presumably small – table to each node). The choice of distribution plans is manual now; more automation is a future feature.</li>
<li>Clustrix Sierra&#8217;s CBO (Cost-Based 	Optimizer) is, as one would hope, distribution-aware.</li>
<li>Clustrix Sierra compiles query 	fragments and ships them off to the relevant nodes. A fragment might 	contain both instructions for SQL to be executed locally and for 	where data is to be sent next.</li>
<li>Clustrix says that Clustrix Sierra 	does data migration and redistribution (e.g., when you add a node) 	transparently online, and further says that in practice this doesn&#8217;t 	cause a performance hit.</li>
<li>As for Clustrix hardware:
<ul>
<li>Clustrix makes <a href="http://www.monashreport.com/2007/01/29/computing-appliances-trends/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">Type 	I appliances</a>.</li>
<li>A Clustrix node contains 2 quad-core chips, 32 gigs of RAM, and seven 160 GB solid-state drives.</li>
<li>Specifically, Clustrix is using 	Intel SSDs, with a SAS interface.</li>
<li>Clustrix says solid-state memory isn&#8217;t really essential to the product design; it&#8217;s just cheap in terms of $/IOPS (I/O operations Per Second).</li>
</ul>
</li>
<li>A minimum Clustrix configuration 	is 3 nodes, for redundancy. After that you can add nodes one at a 	time. Clustrix says it built a 20-node system in-house, leading me 	to suspect that customers don&#8217;t have anything bigger than 20 nodes 	either.</li>
<li>That 20-node Clustrix system was 	tested to show near-linear scalability. (In discussing this, 	Clustrix tends to forget to use the word “near”.)</li>
<li>Clustrix has partnered with 	somebody to provide global 4-hour-response support. As of now 	Clustrix seems to be active mainly in North America and Europe.</li>
<li>Clustrix is formed from the 	combination of two startups, which I&#8217;ve heard elsewhere were called 	Clustrix and Sprout. Exactly when the combination happened sounds a 	little different depending on who&#8217;s telling the story (one version 	has the predecessors still being separate well into 2008, but 	Clustrix implies the combination happened pretty much on Day 1).</li>
</ul>
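<p>Since consistent hashing is the default distribution scheme mentioned above, here is a generic sketch of the technique. It is not Clustrix&#8217;s implementation, which I haven&#8217;t seen; the <code>HashRing</code> class, the use of MD5, and the 64 virtual points per node are all assumptions for illustration. What it shows is the property that matters when you add a node: only a fraction of the keys move, and every key that moves goes to the new node.</p>

```python
import bisect
import hashlib


def _point(label):
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes   # virtual points per node, to even out the load
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            p = _point(f"{node}#{i}")
            self._owner[p] = node
            bisect.insort(self._points, p)

    def node_for(self, key):
        # A key belongs to the first ring point clockwise from its own position.
        p = _point(str(key))
        idx = bisect.bisect_right(self._points, p) % len(self._points)
        point = self._points[idx]
        return self._owner[point]


ring = HashRing(["node-1", "node-2", "node-3"])
keys = [f"row-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-4")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-4")
```

<p>With plain modulo hashing, by contrast, going from 3 to 4 nodes would reassign most keys, which is part of why online redistribution is easier to make cheap on top of a scheme like this one.</p>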
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/12/the-clustrix-story/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Revisiting disk vibration as a data warehouse performance problem</title>
		<link>http://www.dbms2.com/2010/05/08/disk-vibration-data-warehouse-performance-problem/</link>
		<comments>http://www.dbms2.com/2010/05/08/disk-vibration-data-warehouse-performance-problem/#comments</comments>
		<pubDate>Sat, 08 May 2010 04:06:02 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2077</guid>
		<description><![CDATA[Last April, I wrote about the problems disk vibration can cause for data warehouse performance. Possible performance hits exceeded 10X, wild as that sounds.
Now Slashdot and ZDnet have weighed in, although for the most part they only are suggesting 50-100% performance hits. One good quote is:
It&#8217;s a lot easier to interfere with a moving head [...]]]></description>
			<content:encoded><![CDATA[<p>Last April, I wrote about <a href="http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/" >the problems disk vibration can cause for data warehouse performance</a>. Possible performance hits exceeded 10X, wild as that sounds.</p>
<p>Now <a href="http://hardware.slashdot.org/story/10/05/07/2254210/Vibration-Killing-Enterprise-Disk-Performance" onclick="javascript:pageTracker._trackPageview('/hardware.slashdot.org');">Slashdot and ZDnet</a> have weighed in, although for the most part they only are suggesting 50-100% performance hits. <span id="more-2077"></span>One good quote is:</p>
<blockquote><p>It&#8217;s a lot easier to interfere with a moving head arm than it is to mess one up that&#8217;s locked on a track, so this isn&#8217;t surprising in the least for vibration to affect reads that require numerous long seeks. I&#8217;m surprised it&#8217;s not <strong>worse</strong> than they&#8217;ve found.</p>
<p>Moving the head requires accelerated head stepping to top speed, stepping to close to the track, slowing down, stopping at the destination track, waiting for the head to settle, and reading an address block to find out where you managed to land. If you find you missed the track, you have to go through the whole seek process again. (usually only once more, those short adjustment hops are pretty reliable because they&#8217;re lower speed) But that really hurts your single block read time.</p>
<p>Add to that the fact that the &#8220;high performance&#8221; drives are making more risky higher speed track changes, which increase the odds of missing your target and make the operation more sensitive to vibration. I&#8217;ve written direct HDD io code before, and sure, you can up the step speed to get <strong>very</strong> nice seek time boosts, but then you start missing your track and start getting reseeks. Usually you go with the fastest that&#8217;s acceptably reliable, and that puts you on the bleeding edge of having problems, where things like vibration can run you off the deep end of the bell curve.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/08/disk-vibration-data-warehouse-performance-problem/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Clustrix may be doing something interesting</title>
		<link>http://www.dbms2.com/2010/05/04/clustrix-may-be-doing-something-interesting/</link>
		<comments>http://www.dbms2.com/2010/05/04/clustrix-may-be-doing-something-interesting/#comments</comments>
		<pubDate>Wed, 05 May 2010 00:18:55 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustrix]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2053</guid>
		<description><![CDATA[Clustrix launched without briefing me or, at least so far as I can tell, anybody else who knows much about database technology. But Clustrix did post a somewhat crunchy, no-registration-required, white paper. Based on that, I get the impression:

Clustrix is making OLTP DBMS.
The core problem Clustrix tries to solve is scale-out, without necessarily giving up [...]]]></description>
			<content:encoded><![CDATA[<p>Clustrix launched without briefing me or, at least so far as I can tell, anybody else who knows much about database technology. But Clustrix did post a somewhat crunchy, no-registration-required, <a href="http://www.clustrix.com/wp-content/uploads/2010/04/clustrix-whitepaper-01-no-on-sql-mysql-object-key-value-store-database-scaling.pdf" onclick="javascript:pageTracker._trackPageview('/www.clustrix.com');">white paper</a>. Based on that, I get the impression:</p>
<ul>
<li>Clustrix is making an OLTP DBMS.</li>
<li>The core problem Clustrix tries to solve is scale-out, without necessarily giving up SQL. (I couldn&#8217;t immediately tell whether Clustrix supports NoSQL-style key-value interfaces enthusiastically, grudgingly, or not at all.)</li>
<li>Unlike <a href="http://www.dbms2.com/2010/04/03/akiban-highlights/" >Akiban</a> or <a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/" >VoltDB</a>, Clustrix makes database appliances. The Clustrix software seems to assume a Clustrix appliance.</li>
<li>A key feature of Clustrix&#8217;s database appliances is that they rely on solid-state memory. I&#8217;m guessing that Clustrix appliances don&#8217;t even have disks, or that if they do the disks store some software or something, not actual data. (As <a href="http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/" >previously noted</a>, I agree with Oracle in thinking that <a href="http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/" >much of the progress in database technology this decade will come from proper design for solid-state memory</a>.)</li>
<li>Clustrix talks of things that sound like compiled queries and attempts to avoid locks. However, it doesn&#8217;t sound as extreme in these regards as VoltDB.</li>
<li>Clustrix also talks of things that sound like consistent hashing.</li>
<li>The brand name &#8220;Sierra&#8221; also shows up along with the brand name &#8220;Clustrix.&#8221;</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/04/clustrix-may-be-doing-something-interesting/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
