<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Solid-state memory</title>
	<atom:link href="http://www.dbms2.com/category/storage/solid-state-memory-disk-flash/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 02 Sep 2010 09:06:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Vertica&#8217;s innovative architecture for Flash, plus more about temp space than you perhaps wanted to know</title>
		<link>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/</link>
		<comments>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/#comments</comments>
		<pubDate>Mon, 16 Aug 2010 08:07:33 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2788</guid>
		<description><![CDATA[Vertica is announcing:

Technology it already has released*, but has not published any reference architectures for
A Barney partnership**

In other words, Vertica has succumbed to the common delusion that it&#8217;s a good idea to put out half-baked press releases the week of TDWI conferences. But if we look past that kind of all-too-common nonsense, Vertica is highlighting [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Vertica is announcing:</p>
<ul>
<li>Technology it already has released*, but has not published any reference architectures for</li>
<li><span style="font-style: normal;">A <a href="http://www.strategicmessaging.com/barney-partnerships/2010/08/12/" onclick="javascript:pageTracker._trackPageview('/www.strategicmessaging.com');">Barney</a> partnership**</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">In other words, Vertica has succumbed to the common delusion that it&#8217;s a good idea to put out half-baked press releases the week of TDWI conferences. </span>But if we look past that kind of all-too-common nonsense, Vertica is highlighting an interesting technical story about <strong>how the analytic DBMS industry can exploit solid-state memory technology.</strong></p>
<p style="margin-bottom: 0in;"><em>*Upgrades to <a href="../2009/08/04/flexstore-and-the-rest-of-vertica-35/">Vertica FlexStore</a> to handle Flash memory, actually released as part of <a href="../2010/02/22/vertica-4/">Vertica 4.0</a></em></p>
<p style="margin-bottom: 0in;"><em>** With Fusion I/O</em></p>
<p style="margin-bottom: 0in;">To set the context, let&#8217;s recall a few points I&#8217;ve noted in the past:</p>
<ul>
<li><a href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Solid-state memory&#8217;s price/throughput tradeoffs obviously make it the future of database storage</a>.</li>
<li><a href="../2010/06/25/flash-is-coming-well/">The Flash future is coming soon</a>, in part because Flash&#8217;s propensity to wear out is overstated. This is especially true for modern analytic DBMS, which tend to write blocks all at once, and most particularly for append-only systems such as Vertica.</li>
<li><a href="../2010/08/12/teradata-future-product-strategy/">Being able to intelligently split databases among various cost tiers of storage – e.g. Flash and disk – makes a whole lot of sense</a>.</li>
</ul>
<p style="margin-bottom: 0in;">Taken together, those points tell us:</p>
<p style="margin-bottom: 0in;"><strong>For optimal price/performance, analytic DBMS should support databases that run part on Flash, part on disk.</strong></p>
<p style="margin-bottom: 0in;">While all this is a future for some other analytic DBMS vendors, Vertica is shipping it today.* What&#8217;s more, three aspects of Vertica&#8217;s architecture make it particularly well-suited for hybrid Flash/disk storage, in each case for a similar reason – you can get most of the performance benefit of all-Flash for a relatively low actual investment in Flash chips:  <span id="more-2788"></span></p>
<ul>
<li><strong>Vertica lets you split tables by column, </strong><span style="font-weight: normal;">and Vertica FlexStore is versatile enough to let you put only the most-used columns in Flash. (Vertica offers a figure that 85% of usage calls on only 15% of columns, but I don&#8217;t know how rigorously grounded those numbers are.)</span></li>
<li>To the extent that Vertica data is <a href="../2008/09/24/vertica-finally-spells-out-its-compression-claims/">more compressed</a> than many of Vertica&#8217;s competitors&#8217; data (which it probably is, debates over the magnitude of Vertica&#8217;s advantage notwithstanding), the total storage-hardware cost of putting data in Flash is lower with Vertica than with other systems.</li>
<li>Vertica has relatively less need for <strong>temp space</strong> than some other systems. (Vertica uses figures of &lt;20% of total storage, vs. 30%+ for some other systems.) If you want to use Flash for temp space, so as to accelerate your toughest queries, that can save you some cash …</li>
<li>… and by the way, <strong>temp space is an especially good use of Flash, </strong>because <strong>temp space is accessed in a less sequential manner than data storage is.</strong></li>
</ul>
<p style="margin-bottom: 0in;">The least obvious of those points are about temp space; I only understood the particulars when Vertica development chief Shilpa Lawande explained them to me Thursday.</p>
<p style="margin-bottom: 0in;"><em>* At least in theory; customer adoption may be a different matter.</em></p>
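<p style="margin-bottom: 0in;">To make the hot-columns argument concrete, here is a minimal Python sketch; it is not Vertica&#8217;s actual FlexStore logic. It assumes a hypothetical per-column access count and compressed size, ranks columns by accesses per GB, and fills a Flash budget greedily (a simple heuristic, not an optimal knapsack solution). The column names and figures are invented for illustration.</p>

```python
# Hypothetical sketch: put the most-used columns on Flash, the rest on disk.
# Column names, sizes, and access counts are invented for illustration.
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    size_gb: float  # compressed size on storage
    accesses: int   # number of query accesses touching this column


def place_on_flash(columns, flash_budget_gb):
    """Greedily fill a Flash budget with the columns that have the most
    accesses per GB. Returns (names placed on Flash, share of accesses
    those columns serve)."""
    ranked = sorted(columns, key=lambda c: c.accesses / c.size_gb, reverse=True)
    on_flash, used_gb = [], 0.0
    for col in ranked:
        if used_gb + col.size_gb <= flash_budget_gb:
            on_flash.append(col.name)
            used_gb += col.size_gb
    total = sum(c.accesses for c in columns)
    served = sum(c.accesses for c in columns if c.name in on_flash)
    return on_flash, served / total


columns = [
    Column("customer_id", 10, 500),
    Column("order_date", 10, 350),
    Column("address", 20, 100),
    Column("comment", 60, 50),
]
names, share = place_on_flash(columns, flash_budget_gb=20)
print(names, share)  # ['customer_id', 'order_date'] 0.85
```

<p style="margin-bottom: 0in;">In this toy setup, 20% of the compressed data serves 85% of accesses. Better compression shrinks each size_gb, which lets more columns fit in the same Flash budget.</p>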
<p style="margin-bottom: 0in;">But before drilling down on temp space, let me first note that there&#8217;s one offsetting factor to all those “We need somewhat less Flash than the other guys” Vertica advantages. Like any serious DBMS, Vertica keeps two or more copies of all data, so that there&#8217;s no storage single point of failure. In a flexible system like Vertica, you can put one copy on Flash and one on disk. But if you do that in Vertica, you forgo fully exploiting one possible benefit of Vertica&#8217;s architecture – the ability to store different copies of a column in different sort orders, each of which can accelerate a different group of queries.*</p>
<p style="margin-bottom: 0in;"><em>*More precisely, you don&#8217;t get the full benefits of Flash acceleration for every query touching those columns.</em></p>
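<p style="margin-bottom: 0in;">A toy Python model of that trade-off: two redundant copies of the same rows, each sorted on a different column, with one copy assumed to sit on Flash and the other on disk. A range query uses whichever copy is sorted on the filtered column, so it can binary-search rather than scan, but only queries that match the Flash copy&#8217;s sort order get Flash speed. The table, tier labels, and function are hypothetical, not Vertica&#8217;s API.</p>

```python
# Hypothetical model: two copies of one table, sorted differently and
# stored on different tiers. Not Vertica's API.
from bisect import bisect_left, bisect_right

rows = [
    {"order_id": 4, "amount": 10},
    {"order_id": 1, "amount": 50},
    {"order_id": 3, "amount": 20},
    {"order_id": 5, "amount": 40},
    {"order_id": 2, "amount": 30},
]

# sort column -> (storage tier, sorted keys, rows in that order)
copies = {}
for column, tier in [("order_id", "flash"), ("amount", "disk")]:
    ordered = sorted(rows, key=lambda r: r[column])
    copies[column] = (tier, [r[column] for r in ordered], ordered)


def range_query(column, lo, hi):
    """Return (tier used, rows with lo <= row[column] <= hi)."""
    tier, keys, ordered = copies[column]
    start, end = bisect_left(keys, lo), bisect_right(keys, hi)
    return tier, ordered[start:end]


tier, hits = range_query("amount", 25, 45)
print(tier, [r["order_id"] for r in hits])  # disk [2, 5]
```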
<p style="margin-bottom: 0in;">OK. Back to temp space. There are four kinds of things you can put in storage if you&#8217;re running a database management system:</p>
<ul>
<li>The <strong>software</strong> itself.</li>
<li><span style="font-weight: normal;">Persistent </span><strong>data. </strong><span style="font-weight: normal;">(I.e., tables, if the DBMS you&#8217;re running is relational.)</span></li>
<li><strong>Metadata,</strong> especially the kind that lets you find data &#8211; <strong>indexes,</strong> zone maps, catalogs, etc.</li>
<li><strong>Temporary data constructs</strong> built as part of, say, a sort-merge join. These, by definition, are what populate temp space.</li>
</ul>
<p style="margin-bottom: 0in;">Just to be clear, those constructs are NOT temporary tables of the sort created by, say, Microstrategy; such tables are handled like any other data. Rather, they are ephemeral creations and, so far as I can tell, not tables at all.</p>
<p style="margin-bottom: 0in;">Vertica offered two theories as to why its DBMS requires less temp space than competitors do:</p>
<ul>
<li>To the extent data is decompressed before being operated on in memory by the DBMS, that decompression would of course apply to temp space as well. Vertica prides itself on <strong>keeping data compressed</strong> all the way through, and seems to get away with smaller temp space allocations as a result.</li>
<li>Since Vertica can store columns in expedient sort orders, it does less sorting overall, and sorting is a big use of temp space.</li>
</ul>
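<p style="margin-bottom: 0in;">Both theories can be illustrated with back-of-envelope arithmetic. This Python sketch assumes, purely for illustration, that a decompressed value takes 8 bytes, that a run-length-encoded (value, count) pair takes 16 bytes, and that a sort spills its whole input once; none of these figures comes from Vertica, and the function names are hypothetical.</p>

```python
# Toy temp-space estimate for sorting one column. Byte sizes are assumptions.
from itertools import groupby

VALUE_BYTES = 8   # assumed size of one decompressed value
RUN_BYTES = 16    # assumed size of one run-length-encoded (value, count) pair


def rle(values):
    """Run-length encode a sequence into (value, count) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]


def sort_spill_bytes(values, keep_compressed):
    """Bytes written to temp space to sort `values`, spilling the input once."""
    if all(a <= b for a, b in zip(values, values[1:])):
        return 0  # already stored in the needed order: no sort, no spill
    if keep_compressed:
        # Sorted output groups equal values, so it run-length-encodes well.
        return len(rle(sorted(values))) * RUN_BYTES
    return len(values) * VALUE_BYTES


column = [3, 1, 2, 1, 3, 2, 1, 2, 3, 1]
print(sort_spill_bytes(column, keep_compressed=False))          # 80
print(sort_spill_bytes(column, keep_compressed=True))           # 48
print(sort_spill_bytes(sorted(column), keep_compressed=False))  # 0
```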
<p style="margin-bottom: 0in;">Obviously, no matter which DBMS you use, the amount of temp space you need is surely workload-dependent. Even so, Vertica&#8217;s claim to something of an advantage seems legit.</p>
<p style="margin-bottom: 0in;"><em>Truth be told, I&#8217;m not convinced the savings involved are great enough to </em>matter<em> a whole lot – but it&#8217;s a fun subject to think through. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></p>
<p style="margin-bottom: 0in;">And finally: One of my biggest surprises since starting to look at analytic-DBMS-on-Flash has been the centrality of temp space. Talking to Vertica Thursday, I finally uncovered a key reason why: <strong>Temp space tends to be accessed via multiple streams of data at once.</strong> I&#8217;m still struggling with WHY that is true; the two suggested reasons were:</p>
<ul>
<li>Temp space can be accessed by multiple operations at once. (But isn&#8217;t that also true of the rest of storage?)</li>
<li>Merge sorts, a common use of temp space, read multiple streams of data. (Couldn&#8217;t you tweak your software to make that not be true?)</li>
</ul>
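<p style="margin-bottom: 0in;">The merge-sort point is easy to see in miniature. In this Python sketch, each sorted list stands in for a sorted run written to temp space, and the standard library&#8217;s heapq.merge pulls the next value from whichever run holds it. Tracking where each output value came from shows reads hopping among runs; on spinning disk, each hop can cost a seek. Real external sorts read a buffered block per run, which reduces the hops but, with many runs and limited memory, does not eliminate them. The helper names are hypothetical.</p>

```python
# Sketch: k-way merge of sorted runs, recording which run each value came from.
import heapq


def _tag(run, run_id):
    """Yield (value, run_id) pairs so the merge remembers each value's source."""
    for value in run:
        yield value, run_id


def merge_with_trace(runs):
    """Merge sorted runs; return (merged values, source runs, run switches)."""
    merged = list(heapq.merge(*(_tag(run, i) for i, run in enumerate(runs))))
    values = [value for value, _ in merged]
    sources = [run_id for _, run_id in merged]
    # Each change of source is a jump to a different place in temp space.
    switches = sum(1 for a, b in zip(sources, sources[1:]) if a != b)
    return values, sources, switches


runs = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
values, sources, switches = merge_with_trace(runs)
print(sources, switches)  # [0, 1, 2, 0, 1, 2, 0, 1, 2] 8
```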
<p style="margin-bottom: 0in;">But if we grant that temp space naturally is accessed in multiple places at once – well, that&#8217;s a lot like random I/O, and <a href="../2005/11/13/breaking-the-disk-speed-barrier/">if you&#8217;re doing a lot of random reads, you&#8217;d love to use something other than spinning disk</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Teradata&#8217;s future product strategy</title>
		<link>http://www.dbms2.com/2010/08/12/teradata-future-product-strategy/</link>
		<comments>http://www.dbms2.com/2010/08/12/teradata-future-product-strategy/#comments</comments>
		<pubDate>Thu, 12 Aug 2010 10:37:14 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[Microstrategy]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2769</guid>
		<description><![CDATA[I think Teradata&#8217;s future product strategy is coming into focus. I&#8217;ll start by outlining some particular aspects, and then show how I think it all ties together.

The immediate hook here is that I had a short conversation with Scott Gnau of Teradata yesterday, triggered by Teradata&#8217;s acquisition of Kickfire&#8217;s assets. Takeaways from that part included:

The [...]]]></description>
			<content:encoded><![CDATA[<p>I think Teradata&#8217;s future product strategy is coming into focus. I&#8217;ll start by outlining some particular aspects, and then show how I think it all ties together.<br />
<span id="more-2769"></span></p>
<p style="margin-bottom: 0in;">The immediate hook here is that I had a short conversation with Scott Gnau of Teradata yesterday, triggered by <a href="../2010/07/27/kickfire-unlikely-to-survive/">Teradata&#8217;s acquisition of Kickfire&#8217;s assets</a>. Takeaways from that part included:</p>
<ul>
<li>The acquisition is all about Kickfire&#8217;s <a href="../2009/08/21/kickfires-fpga-based-technical-strategy/">data pipelining</a> technology.</li>
<li>Scott (in my opinion rightly) thinks that technology isn&#8217;t particularly tied to Kickfire&#8217;s choice of DBMS architecture (fairly vanilla columnar).</li>
<li>No decision has been made about whether the right vehicle for this technology is an FPGA (Field Programmable Gate Array), a conventional Intel CPU, RAM, etc.</li>
</ul>
<p style="margin-bottom: 0in;"><em>If you want to handicap Teradata&#8217;s future data pipelining strategy, you might note that:</em></p>
<ul>
<li><em>Kickfire&#8217;s own choice – and hence its existing implementation – is an FPGA.</em></li>
<li><em><a href="../2009/08/04/vectorwise-ingres-and-monetdb/">VectorWise&#8217;s approach to pipelining is Intel-based,</a> apparently at the cost of being closely tied to specific generations of Intel CPUs.</em></li>
<li><em><a href="../2009/07/27/xtremedata-announces-its-dbx-data-warehouse-appliance/">XtremeData&#8217;s approach to pipelining</a> is FPGA-based.</em></li>
<li><em>Teradata has a lot more development resources than any of those other companies, as well as important existing products, and hence has both means and motive to shoehorn new technology into older system designs.</em></li>
</ul>
<p style="margin-bottom: 0in;">While I had Scott on the phone, I brought up a few other subjects too. Highlights included:</p>
<ul>
<li>Teradata&#8217;s Flash-based appliance is doing just fine in beta test and customer POCs (Proofs of Concept).</li>
<li>Other kinds of Teradata appliance are not inconceivable.</li>
<li>Scott thinks <a href="http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/" >Michael McIntire&#8217;s condemnation of Active-Active architectures</a> is overstated. That said,
<ul>
<li>Scott does acknowledge a need for greater Active-Active scalability, and suggests that the reason Xkoto&#8217;s current products are being discontinued is their lack of scaling.</li>
<li>Scott seems quietly confident the scaling will get done.</li>
</ul>
</li>
<li>Scott is emphatic that Teradata is not going to adopt <a href="../2009/04/20/calpont-update-you-read-it-here-first/">a two-tier architecture</a>. His reasoning: the point of splitting storage/lightweight database processing and heavyweight database processing onto separate tiers is generally to save bandwidth, and Teradata&#8217;s BYNET is typically less than 10% loaded.</li>
<li>Scott didn&#8217;t dispute my claim that this all suggests <a href="../2008/10/14/teradata-virtual-storage/">Teradata Virtual Storage</a> is the future, at the expense of a rigid delineation among <a href="../2008/10/23/teradata-appliance-product-lines/">specific use-case-focused product lines</a>.</li>
<li>Unlike <a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/" >Netezza</a> or <a href="http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/" >Aster</a>, Teradata doesn&#8217;t seem to be planning analytic capability that works outside the UDF (User Defined Function) framework. However, Scott noted that Teradata has long had the capability, which Aster and Netezza now also have, of letting you run analytic code either in “protected mode” (if the process fails, the whole database doesn&#8217;t crash) or in the database kernel (best performance, if you&#8217;re sufficiently confident in the code&#8217;s stability to take the risk). Scott also spoke of the release later this quarter of Teradata FastPath, which will offer yet better performance (however, there&#8217;s a gotcha to Teradata FastPath that&#8217;s still NDA).</li>
</ul>
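<p style="margin-bottom: 0in;">The protected-mode versus in-kernel trade-off can be sketched generically in Python; this is not Teradata&#8217;s, Aster&#8217;s, or Netezza&#8217;s API. The hypothetical run_protected() executes a user-defined function in a separate OS process, so a crash kills only that child, at the cost of process start-up and data-passing overhead. The hypothetical run_in_kernel() calls the function inside the current process, which is faster but shares its fate.</p>

```python
# Generic sketch of "protected mode" vs. in-kernel execution of a UDF.
# Function names are hypothetical; this is not any DBMS vendor's API.
import subprocess
import sys


def run_protected(udf_source, arg, timeout=10):
    """Run udf(arg) in a child Python process.

    Returns the printed result as a string, or None if the child failed.
    A crash here takes down only the child, not the caller.
    """
    program = udf_source + f"\nprint(udf({arg!r}))\n"
    proc = subprocess.run(
        [sys.executable, "-c", program],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode != 0:
        return None
    return proc.stdout.strip()


def run_in_kernel(udf_source, arg):
    """Run udf(arg) inside this process: no IPC overhead, but no isolation."""
    namespace = {}
    exec(udf_source, namespace)
    return namespace["udf"](arg)


good_udf = "def udf(x):\n    return x * 2\n"
crashing_udf = "import os\ndef udf(x):\n    os._exit(1)\n"

print(run_protected(good_udf, 21))      # 42
print(run_protected(crashing_udf, 21))  # None
print(run_in_kernel(good_udf, 21))      # 42
```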
<p style="margin-bottom: 0in;">Putting all that together with the rest of what we know about Teradata, I&#8217;m going to call out<strong> three pillars of Teradata&#8217;s long-term product strategy:</strong></p>
<ul>
<li><strong>Same fundamentals as always.</strong> Teradata&#8217;s core product strategy is:
<ul>
<li>Single DBMS, capable of meeting all analytic needs while running in a single instance, usually running on &#8230;</li>
<li>… proprietary hardware …</li>
<li>… built from conservatively-chosen parts.</li>
</ul>
</li>
<li><strong>Selective vertical application stack.</strong> No matter how horizontally-oriented they are, many companies that have been in the analytic technology business for a while wind up with some vertical applications. It sort of just happens. Teradata is no exception. Teradata also likes to sell services to its product customers, and some of those are quite vertical-aware.</li>
<li><strong>Mutable, modular platform.</strong> This is what I highlighted above. Note that it&#8217;s philosophically attuned to the one-system-does-everything approach Teradata prefers. More subtly, please also note that it goes well with customer-by-customer price customization, which is almost a must for Teradata given the Innovator&#8217;s Dilemma kind of pricing box it finds itself in.</li>
</ul>
<p style="margin-bottom: 0in;">So far, that&#8217;s not too exciting, except in the details of how Teradata&#8217;s engineers make that all work. But there&#8217;s a <strong>fourth pillar to Teradata&#8217;s technical strategy</strong> as well, and it&#8217;s a wild card: <strong>tight partnerships.</strong> Every time I talk with Teradata hardware chief Carson Schmidt, he seems excited about some particular version of a part or other – sometimes from a reasonably established vendor (once it was LSI Logic), sometimes from a tiny one (notably <a href="../2009/10/25/teradata-hardware-strategy-and-tactics/">the “stealth” start-up on which Teradata bet its first solid-state product</a>). In the future, I expect tight business intelligence partnerships as well. Cognos BI will be increasingly integrated with IBM&#8217;s DBMS and hardware; Business Objects&#8217; BI will increasingly be integrated with SAP&#8217;s applications; and Oracle&#8217;s BI will eventually be integrated with everything. How do you compete with that if you&#8217;re Microstrategy? Well, you try to have superior product, of course – but you also partner as closely with DBMS vendors as you can, an approach Microstrategy has already started. Predictive analytics stalwart <a href="http://www.dbms2.com/2010/05/15/further-clarifying-in-database-mpp-sas/" >SAS</a>, of course, is on a partnership binge as well.</p>
<p style="margin-bottom: 0in;">Teradata has a larger installed base than almost all its competitors, and enjoys richer third-party software and service support as a result. But I suspect that going forward,  for Teradata to remain a leading competitor at price points it is willing to accept, Teradata&#8217;s “ecosystem” advantages will need to ratchet up one or several notches.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/12/teradata-future-product-strategy/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Teradata, Xkoto Gridscale (RIP), and active-active clustering</title>
		<link>http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/</link>
		<comments>http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/#comments</comments>
		<pubDate>Sat, 31 Jul 2010 08:23:57 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Continuent]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Xkoto]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2708</guid>
		<description><![CDATA[Having gotten a number of questions about Teradata&#8217;s acquisition of Xkoto, I leaned on Teradata for an update, and eventually connected with Scott Gnau. Takeaways included:

Teradata is discontinuing Xkoto&#8217;s existing product Gridscale, which Scott characterized as being too OLTP-focused to be a good fit for Teradata. Teradata hopes and expects that existing Xkoto Gridscale [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Having gotten a number of questions about Teradata&#8217;s acquisition of Xkoto, I leaned on Teradata for an update, and eventually connected with Scott Gnau. Takeaways included:</p>
<ul>
<li>Teradata is discontinuing <a href="http://www.dbms2.com/2009/09/11/xkoto-gridscale-highlights/" >Xkoto&#8217;s existing product Gridscale</a>, which Scott characterized as being too OLTP-focused to be a good fit for Teradata. Teradata hopes and expects that existing Xkoto Gridscale customers won&#8217;t renew maintenance. (I&#8217;m not sure that they&#8217;ll even get the option to do so.)</li>
<li>The point of Teradata&#8217;s acquisition of Xkoto&#8217;s technology and engineers is to enhance Teradata&#8217;s active-active or multi-active data warehousing capabilities, which it has had in some form for several years.</li>
<li>In particular, Teradata wants to tie together different products in the Teradata product line. (Note: Those typically all run pretty much the same Teradata database management software, except insofar as they might be on different releases.)</li>
<li>Scott rattled off all the plausible areas of enhancement, with multiple phrasings – performance, manageability, ease of use, tools, features, etc.</li>
<li>Teradata plans to have one or two releases based on Xkoto technology in 2011.</li>
</ul>
<p style="margin-bottom: 0in;">Frankly, I&#8217;m disappointed at the struggles of clustering efforts such as Xkoto Gridscale or <a href="http://www.dbms2.com/2009/09/03/continuent-on-clustering/" >Continuent&#8217;s pre-Tungsten products</a>, but if the DBMS vendors meet the same needs themselves, that&#8217;s OK too.</p>
<p style="margin-bottom: 0in;">The logic behind active-active database implementations actually seems pretty compelling:  <span id="more-2708"></span></p>
<ul>
<li>You may well be keeping a second copy of your database for high availability/hot standby.</li>
<li>You might even be keeping a third copy for off-site disaster recovery.</li>
<li>In some cases, you might have reasons beyond disaster recovery to distribute a database around the world.</li>
<li>So why not allow queries to be run against all the copies?</li>
<li>And by the way, splitting the workload up a bit by kind (e.g., long-running vs. short queries) might let you optimize the implementation of each copy of the database. (This last point becomes even more important with the rise of solid-state memory.)</li>
</ul>
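<p style="margin-bottom: 0in;">Read-only query routing across copies, combining the “run queries against all the copies” idea with workload splitting, might look like this hypothetical Python sketch. The replica names, workload classes, and Router class are invented; the sketch ignores how updates are propagated among copies, which a real implementation must also handle.</p>

```python
# Hypothetical router for active-active database copies. Names are invented;
# a real system would also need load balancing, replication-lag checks, etc.
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    tuned_for: str  # workload class this copy is optimized for: "long" or "short"
    up: bool = True


class Router:
    def __init__(self, replicas):
        self.replicas = replicas

    def route(self, query_class):
        """Prefer a healthy copy tuned for this query class; else any healthy copy."""
        healthy = [r for r in self.replicas if r.up]
        if not healthy:
            raise RuntimeError("no database copy is available")
        preferred = [r for r in healthy if r.tuned_for == query_class]
        return (preferred or healthy)[0].name


replicas = [
    Replica("primary", "long"),
    Replica("standby", "short"),
    Replica("dr_site", "long"),
]
router = Router(replicas)
print(router.route("short"))  # standby
replicas[1].up = False        # the standby copy fails
print(router.route("short"))  # primary
```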
<p style="margin-bottom: 0in;">Analytic DBMS vendors pretty much all need to offer this. (Possible exception: If they have a data-mart-only positioning so extreme that customers will never care about any form of failover.) That said, I must confess to not having done a good job of tracking who does or doesn&#8217;t have which features in this area to date; informative comments to this post in that regard would be much appreciated!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Why analytic DBMS increasingly need to be storage-aware</title>
		<link>http://www.dbms2.com/2010/07/07/analytic-database-storage-aware/</link>
		<comments>http://www.dbms2.com/2010/07/07/analytic-database-storage-aware/#comments</comments>
		<pubDate>Wed, 07 Jul 2010 06:30:14 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2515</guid>
		<description><![CDATA[In my quick reactions to the EMC/Greenplum announcement, I opined

I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner

promising to explain what I meant later on. So here goes.  
There always have been good technical reasons to tailor hardware to analytic database software. Data moves through disk controller, [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><span style="font-style: normal;">In <a href="../2010/07/06/emc-is-buying-greenplum/">my quick reactions to the EMC/Greenplum announcement</a>, I opined</span></p>
<blockquote>
<p style="margin-bottom: 0in;">I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner</p>
</blockquote>
<p style="margin-bottom: 0in;">promising to explain what I meant later on. So here goes.  <span id="more-2515"></span></p>
<p style="margin-bottom: 0in;">There always have been good technical reasons to tailor hardware to analytic database software. Data moves through disk controller, network, RAM, CPU and more, each with its own data rate. Getting different kinds of parts into the right balance doesn&#8217;t completely eliminate bottlenecks – the <a href="../2010/07/06/the-one-hoss-shay/">Wonderful One-Hoss Shay</a> is poetic fiction – but it certainly can help. As a result, every analytic DBMS vendor of any size offers at least one of:</p>
<ul>
<li><a href="../2007/01/27/data-warehouse-appliance-hardware-strategies/">A Type 0 appliance</a></li>
<li>A Type 1 appliance</li>
<li>A “recommended hardware configuration”</li>
</ul>
<p style="margin-bottom: 0in;">And beyond performance, appliances and pre-specified hardware configurations offer at least the possibility of easing installation, administration, and support.</p>
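<p style="margin-bottom: 0in;">The balance point above can be made concrete with a small Python sketch: if data must pass through every component in turn, sustained throughput is capped by the slowest one, and speeding that one up just moves the bottleneck elsewhere. The component names and MB/s figures are made-up placeholders, not measurements of any real system.</p>

```python
# Toy model of hardware balance: throughput of a serial data path is the
# minimum of its component rates. All rates are placeholder MB/s values.
def bottleneck(rates_mb_s):
    """Return (slowest component, throughput in MB/s)."""
    slowest = min(rates_mb_s, key=rates_mb_s.get)
    return slowest, rates_mb_s[slowest]


rates = {"disk": 800, "controller": 1500, "network": 1200, "cpu": 2000}
print(bottleneck(rates))  # ('disk', 800)

# Swap in faster storage: the bottleneck moves rather than disappearing.
faster = dict(rates, disk=2500)
print(bottleneck(faster))  # ('network', 1200)
```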
<p style="margin-bottom: 0in;">There also are marketing reasons to offer an appliance or something appliance-like.</p>
<ul>
<li>To various extents, Oracle, Teradata, Microsoft, IBM, Netezza, and EMC are all telling the world that your hardware should be optimized for your analytic DBMS.</li>
<li>Smaller vendors such as Vertica and Aster Data also tend to cobble together some sort of appliance, in part so they don&#8217;t have to say they disagree.</li>
<li>Thus, a “We don&#8217;t see any point in special hardware assembly at all” story would leave an analytic DBMS vendor pretty far out on a limb.</li>
</ul>
<p style="margin-bottom: 0in;">Finally, there are three overlapping technical trends that increase the need for storage-awareness in analytic DBMS. First and foremost is the rise of <strong>solid-state memory.</strong> For starters, I believe:</p>
<ul>
<li><a href="../2010/06/25/flash-is-coming-well/">Flash will be important for analytic DBMS soon</a>.</li>
<li>There are good technical reasons for this.</li>
<li><a href="../2010/01/22/oracle-database-hardware-strategy/">Oracle&#8217;s marketing will make a big deal out of the Flash aspects of Exadata</a>, so other analytic DBMS vendors will need a response. And of course, if Netezza or Teradata preemptively make a big deal of their Flash-based offerings, that just adds to the pressure for Flash adoption on everybody else.</li>
<li>But it&#8217;s not just Flash – <a href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, other solid-state memory, and disk</a> will be combined in various ways.</li>
</ul>
<p style="margin-bottom: 0in;">This move to Flash will require analytic DBMS vendors to be increasingly storage-aware for at least three reasons:</p>
<ul>
<li>It just adds another level of <strong>complexity</strong> to their hardware-balancing challenges.</li>
<li><strong>Flash overturns some of the fundamental assumptions of modern analytic DBMS,</strong> in particular:
<ul>
<li><a href="../2006/09/19/is-data-warehousing-now-all-about-sequential-access/">Sequential reads are hugely better than random ones</a></li>
<li>The worst bottleneck is at the point where data comes out of storage.</li>
</ul>
</li>
<li><strong>The Flash technology stack is still immature,</strong> and you have to pick your poison in how to deal with it. Vendors are making very different choices in this regard – and they do have to choose.</li>
</ul>
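<p style="margin-bottom: 0in;">The first of those assumptions is easy to quantify roughly. This Python sketch compares reading the same blocks sequentially versus randomly on two devices; the throughput and IOPS figures are round-number assumptions chosen for illustration, not specifications of any product, and random reads are assumed to cost one I/O per block.</p>

```python
# Rough comparison of sequential vs. random reads. Device figures are
# illustrative assumptions, not vendor specifications.
DEVICES = {
    "disk": {"seq_mb_s": 100, "random_iops": 200},
    "flash": {"seq_mb_s": 250, "random_iops": 20_000},
}


def read_seconds(device, n_blocks, block_kb, sequential):
    """Estimated time to read n_blocks of block_kb each."""
    spec = DEVICES[device]
    if sequential:
        return n_blocks * block_kb / 1024 / spec["seq_mb_s"]
    return n_blocks / spec["random_iops"]  # one I/O per block


for device in DEVICES:
    seq = read_seconds(device, 10_000, 8, sequential=True)
    rnd = read_seconds(device, 10_000, 8, sequential=False)
    print(f"{device}: random is {rnd / seq:.1f}x slower than sequential")
# disk: random is 64.0x slower than sequential
# flash: random is 1.6x slower than sequential
```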
<p style="margin-bottom: 0in;">Another trend that could naturally lead analytic DBMS vendors to be more storage-aware is their incorporation of what could be viewed as hierarchical storage/ILM technologies: different data is stored in different ways and/or on different kinds of storage hardware. (Vendors pursuing – you guessed it – different approaches to this include <a href="../2009/08/04/2008/10/14/teradata-virtual-storage/">Teradata</a>, <a href="../2009/10/14/greenplum-hybrid-columnar/">Greenplum</a>, <a href="../2009/08/04/flexstore-and-the-rest-of-vertica-35/">Vertica</a>, and <a href="../2009/08/25/sybase-iq-technical-highlights/">Sybase</a>.) The more automatic that process is, the more storage-aware the DBMS will need to be.</p>
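<p style="margin-bottom: 0in;">As a rough illustration of what an automatic version of that process has to decide, this hypothetical Python sketch keeps date partitions from the last 30 days on Flash, moves older ones to disk, and reports which moves are needed. The threshold, tier names, and functions are assumptions; a real ILM policy might also weigh access frequency, not just age.</p>

```python
# Hypothetical age-based tiering policy for date partitions.
from datetime import date, timedelta


def target_tier(partition_day, today, hot_days=30):
    """Recent partitions go to Flash; older ones to disk."""
    return "flash" if today - partition_day <= timedelta(days=hot_days) else "disk"


def plan_moves(partitions, today, hot_days=30):
    """partitions maps partition date -> current tier.
    Return {partition date: (from_tier, to_tier)} for partitions that must move."""
    moves = {}
    for day, tier in partitions.items():
        target = target_tier(day, today, hot_days)
        if target != tier:
            moves[day] = (tier, target)
    return moves


partitions = {
    date(2010, 7, 1): "flash",   # 6 days old: stays on Flash
    date(2010, 6, 20): "disk",   # 17 days old: promote to Flash
    date(2010, 5, 1): "flash",   # 67 days old: demote to disk
}
print(plan_moves(partitions, today=date(2010, 7, 7)))
```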
<p style="margin-bottom: 0in;">Finally, there are reasons to think that <a href="../2008/09/06/sans-vs-das-in-mpp-data-warehousing/">DBMS should be split between conventional servers and smart storage</a>. This is, of course, the Exadata strategy. <a href="../2010/06/21/netezza-silicon-balance/">Netezza&#8217;s two-processor approach</a>, while rather different, also somewhat validates the idea.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/07/analytic-database-storage-aware/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Flash is coming, well &#8230;</title>
		<link>http://www.dbms2.com/2010/06/25/flash-is-coming-well/</link>
		<comments>http://www.dbms2.com/2010/06/25/flash-is-coming-well/#comments</comments>
		<pubDate>Fri, 25 Jun 2010 16:42:26 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2389</guid>
		<description><![CDATA[I really, really wanted to title this post &#8220;Flash is coming in a flash.&#8221; That seems a little exaggerated &#8212; but only a little.

Netezza now intends to come out with a Flash-based appliance earlier than it originally expected.
Indeed, Netezza has suspended &#8212; by which I mean &#8220;scrapped&#8221; &#8212; prior plans for a RAM-heavy disk-based appliance. [...]]]></description>
			<content:encoded><![CDATA[<p>I really, really wanted to title this post &#8220;Flash is coming in a flash.&#8221; That seems a little exaggerated &#8212; but only a little.</p>
<ul>
<li>Netezza now intends to come out with a Flash-based appliance earlier than it originally expected.</li>
<li>Indeed, Netezza has suspended &#8212; by which I mean &#8220;scrapped&#8221; &#8212; prior plans for a RAM-heavy disk-based appliance. It will use a RAM/Flash combo instead.*</li>
<li>Tim Vincent of IBM told me that customers seem ready to adopt solid-state memory. One interesting comment he made is that Flash isn&#8217;t really all that much more expensive than high-end storage area networks.</li>
</ul>
<p>Uptake of solid-state memory (i.e. Flash) for analytic database processing will probably stay pretty low in 2010, but in 2011 it should be a notable (b)leading-edge technology, and it should get mainstreamed pretty quickly after that.  <span id="more-2389"></span></p>
<p><em>*So far as I can tell, that&#8217;s one of the two significant roadmap changes between the 2009 and 2010 editions of <a href="http://www.dbms2.com/2010/06/23/my-talk-this-morning/" >Enzee Universe</a>. The other one is that the robust form of appliance-to-appliance replication technology is coming out later than Netezza had originally planned and hoped.</em></p>
<p>There also is increasing reason to think that the issues with Flash memory wearing out are overblown. (And by the way, the entire history of enterprise solid-state memory use is basically shorter than the time in which these products supposedly will wear out, so it&#8217;s not as if there have been a lot of real-life failures out there.)</p>
<ul>
<li>First, clever things are being done in the area of error correction codes, although for the most part I defer that discussion to Petascan&#8217;s Camuel Gilyadov. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  E.g., this seems to be the idea behind Anobit.</li>
<li>Second, analytic DBMS are pretty much an ideal use case for Flash reliability. Suppose, as is the case for many products and implementations, you only write things in big blocks. Then you are, ipso facto, resetting the Flash bits only in big blocks. Thus, at least in theory, you automatically have pretty perfect wear leveling.</li>
</ul>
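<p>To make the wear-leveling argument concrete, here is a minimal Python sketch. It rests on simplifying assumptions that are not from any vendor: a hypothetical device with a fixed number of erase blocks, no flash translation layer (real SSD controllers do their own remapping and wear leveling), and one erase counted each time a block is (re)filled. It compares big, block-sized appends of the kind an analytic DBMS does with small in-place updates to a hot range of pages.</p>

```python
from collections import Counter

NUM_BLOCKS = 8          # erase blocks on a hypothetical flash device
PAGES_PER_BLOCK = 64    # pages per erase block (illustrative)


def bulk_append_erases(total_pages: int) -> Counter:
    """Analytic-style writes: whole erase blocks filled in order, as in a circular log.

    Each time a block is (re)filled it costs one erase; partial trailing blocks are ignored.
    """
    erases: Counter = Counter()
    block = 0
    for _ in range(total_pages // PAGES_PER_BLOCK):
        erases[block] += 1
        block = (block + 1) % NUM_BLOCKS   # next write goes to the next block in the log
    return erases


def in_place_update_erases(page_updates: list[int]) -> Counter:
    """Small in-place updates with a fixed page-to-block mapping (no remapping layer).

    Rewriting any page forces an erase of the whole block that contains it.
    """
    erases: Counter = Counter()
    for page in page_updates:
        erases[page // PAGES_PER_BLOCK] += 1
    return erases


if __name__ == "__main__":
    bulk = bulk_append_erases(PAGES_PER_BLOCK * NUM_BLOCKS * 100)
    print(sorted(bulk.values()))     # [100, 100, 100, 100, 100, 100, 100, 100]: perfectly even wear
    hot = in_place_update_erases([i % 10 for i in range(800)])   # 800 updates to pages 0..9
    print(dict(hot))                 # {0: 800}: one block takes every erase
```

<p>With big sequential writes every block ends up with the same erase count, which is the "pretty perfect wear leveling" point; with hot in-place updates, one block absorbs every erase while the rest sit idle.</p>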
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/25/flash-is-coming-well/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>VoltDB finally launches</title>
		<link>http://www.dbms2.com/2010/05/25/voltdb-finally-launches/</link>
		<comments>http://www.dbms2.com/2010/05/25/voltdb-finally-launches/#comments</comments>
		<pubDate>Tue, 25 May 2010 07:15:04 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2201</guid>
		<description><![CDATA[VoltDB is finally launching today. As is common for companies in sectors I write about, VoltDB &#8212; or just &#8220;Volt&#8221; &#8212; has discovered the virtues of embargoes that end 12:01 am. Let&#8217;s go straight to the technical highlights:

VoltDB is based on the H-Store technology, which I wrote about in February 2008. Most of what I [...]]]></description>
			<content:encoded><![CDATA[<p>VoltDB is finally launching today. As is common for companies in sectors I write about, VoltDB &#8212; or just &#8220;Volt&#8221; &#8212; has discovered the virtues of embargoes that end 12:01 am. Let&#8217;s go straight to the technical highlights:</p>
<ul>
<li>VoltDB is based on the <a href="http://www.dbms2.com/2008/02/19/h-store-architecture/" >H-Store</a> technology, which I wrote about in February 2008. Most of what I said about H-Store then applies to VoltDB today.</li>
<li>VoltDB is a no-apologies ACID relational DBMS, which runs entirely in RAM.</li>
<li>VoltDB has rather limited SQL. (One example: VoltDB can&#8217;t do SUMs in SQL.) However, VoltDB guy Tim Callaghan (Mark Callaghan&#8217;s lesser-known but nonetheless smart brother) asserts that if you code up the missing functionality, it&#8217;s almost as fast as if it were present in the DBMS to begin with, because there&#8217;s no added I/O from the handoff between the DBMS and the procedural code. (The data&#8217;s in RAM one way or the other.)</li>
<li>VoltDB&#8217;s Big Conceptual Performance Story is that it does away with most locks, latches, logs, etc., and also most context switching.</li>
<li>In particular, you&#8217;re supposed to partition your data and architect your application so that most transactions execute on a single core. When you can do that, you get VoltDB&#8217;s performance benefits. To the extent you can&#8217;t, you&#8217;re in two-phase-commit performance land. (More precisely, you&#8217;re doing 2PC for multi-core writes, which is surely a major reason that multi-core reads are a lot faster in VoltDB than multi-core writes.)</li>
<li>VoltDB has a little less than one DBMS thread per core. When the data partitioning works as it should, you execute a complete transaction in that single thread. Poof. No context switching.</li>
<li>A transaction in VoltDB is a Java stored procedure. (The early idea of Ruby on Rails in lieu of the Java/SQL combo didn&#8217;t hold up performance-wise.)</li>
<li>Solid-state memory is not a viable alternative to RAM for VoltDB. Too slow.</li>
<li>Instead, VoltDB lets you snapshot data to disk at tunable intervals. &#8220;Continuous&#8221; is one of the options, wherein a new snapshot starts being made as soon as the last one completes.</li>
<li>In addition, VoltDB will also spool a kind of transaction log to the target of your choice. (Obvious choice: An analytic DBMS such as Vertica, but there&#8217;s no such connectivity partnership actually in place at this time.)</li>
</ul>
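<p>The partitioning bullets above describe one DBMS thread per partition, with each complete transaction running serially in that thread. The following is a hypothetical Python sketch of that idea, not VoltDB&#8217;s code or API (VoltDB procedures are Java): each partition owns a private dict and a single worker thread, so single-partition transactions need no locks or latches on the data. Multi-partition transactions, which VoltDB handles with two-phase commit, are deliberately left out. The class and function names are illustrative.</p>

```python
import queue
import threading
from typing import Any, Callable

Txn = Callable[[dict], Any]   # a "stored procedure" that runs against one partition's data


class PartitionedExecutor:
    """Hypothetical sketch: one single-threaded executor per data partition."""

    def __init__(self, num_partitions: int) -> None:
        self._stores = [{} for _ in range(num_partitions)]            # partition-local data
        self._queues = [queue.Queue() for _ in range(num_partitions)]
        self._workers = [threading.Thread(target=self._run, args=(p,), daemon=True)
                         for p in range(num_partitions)]
        for w in self._workers:
            w.start()

    def partition_for(self, key: int) -> int:
        return key % len(self._stores)     # simple modulo partitioning on an integer key

    def _run(self, p: int) -> None:
        store = self._stores[p]            # only this thread ever reads or writes this store
        while True:
            item = self._queues[p].get()
            if item is None:               # shutdown sentinel
                return
            txn, reply = item
            try:
                reply.put((True, txn(store)))   # whole transaction runs to completion, no locks
            except Exception as exc:
                reply.put((False, exc))

    def submit(self, key: int, txn: Txn) -> Any:
        """Route a single-partition transaction to the partition owning key, and wait."""
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._queues[self.partition_for(key)].put((txn, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def shutdown(self) -> None:
        for q in self._queues:
            q.put(None)
        for w in self._workers:
            w.join()


def increment(account: int) -> Txn:
    def txn(store: dict) -> int:
        store[account] = store.get(account, 0) + 1   # read-modify-write without a latch
        return store[account]
    return txn


if __name__ == "__main__":
    ex = PartitionedExecutor(num_partitions=4)
    clients = [threading.Thread(target=lambda: [ex.submit(7, increment(7)) for _ in range(250)])
               for _ in range(4)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    print(ex.submit(7, lambda store: store[7]))   # 1000: no lost updates despite 4 client threads
    ex.shutdown()
```

<p>Concurrency comes from running many partitions side by side, not from interleaving work within one partition; that is why the whole approach depends on partitioning the data so most transactions stay on one partition.</p>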
<p><span id="more-2201"></span>I should also note that when Tim Callaghan described architectural options to get around 2PC performance issues, they sounded a lot like eventual consistency. Maybe tunable <a href="http://www.dbms2.com/2010/05/01/ryw-read-your-writes-consistency/" >RYW consistency</a> isn&#8217;t in the cards, but at least there&#8217;s a NoSQL-like possibility with VoltDB.</p>
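<p>For readers who want the mechanics behind the 2PC concern, here is a deliberately minimal Python sketch of textbook two-phase commit, not VoltDB&#8217;s or anyone else&#8217;s implementation. A coordinator asks every participating partition to prepare, and commits only if all of them vote yes. Durable logging, timeouts, and coordinator failure are all omitted, and the names are illustrative. The point is the extra round of messages, plus the fact that every participant must hold its staged writes (in a real system, its locks) until the decision arrives. Single-partition execution avoids exactly that.</p>

```python
from dataclasses import dataclass, field


@dataclass(eq=False)   # eq=False keeps identity hashing, so participants can be dict keys
class Participant:
    name: str
    data: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)
    votes_yes: bool = True          # hypothetical knob to simulate a participant that refuses

    def prepare(self, writes: dict) -> bool:
        """Phase 1: stage the writes and vote."""
        if not self.votes_yes:
            return False
        self.staged = dict(writes)
        return True

    def commit(self) -> None:
        """Phase 2 (commit): make the staged writes visible."""
        self.data.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        """Phase 2 (abort): discard anything staged."""
        self.staged = {}


def two_phase_commit(writes: dict) -> tuple[bool, int]:
    """Coordinator. writes maps Participant -> {key: value}.
    Returns (committed, messages_sent)."""
    messages = 0
    all_yes = True
    for participant, w in writes.items():       # phase 1: prepare / vote
        messages += 1
        if not participant.prepare(w):
            all_yes = False
    for participant in writes:                   # phase 2: broadcast the decision
        messages += 1
        if all_yes:
            participant.commit()
        else:
            participant.abort()
    return all_yes, messages


if __name__ == "__main__":
    p0, p1 = Participant("p0"), Participant("p1")
    print(two_phase_commit({p0: {"x": 1}, p1: {"y": 2}}))    # (True, 4)
    p2 = Participant("p2", votes_yes=False)
    print(two_phase_commit({p0: {"x": 99}, p2: {"z": 3}}))   # (False, 4)
    print(p0.data)                                            # {'x': 1}
```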
<p>VoltDB&#8217;s open source strategy is:</p>
<ul>
<li>VoltDB will be open sourced.</li>
<li>Community VoltDB will be GPLed. Professional Edition VoltDB has a non-GPL license.</li>
<li>The VoltDB Professional Edition won&#8217;t start out with features beyond the Community Edition ones, but will gain such later on. I didn&#8217;t get the sense the plans for those features were completely baked yet, but ideas mentioned included:
<ul>
<li>Management/monitoring tools.</li>
<li>Integration with expensive closed-source enterprise software products, such as ones in the management/monitoring area.</li>
<li>Yet more &#8220;extreme&#8221;/edge-case performance.</li>
</ul>
</li>
<li>Before VoltDB decided for sure that it wasn&#8217;t selling licenses, it sold a license to Getco, which also seems to be an investor in the company.</li>
</ul>
<p>VoltDB had a beta test with about 150 participants. None is in production yet, although at least a few are clearly headed there. Most VoltDB beta testers are in some kind of online business, with a particular concentration in everybody&#8217;s new favorite market, online gaming. Most of the rest are in investment/trading &#8212; a major target market for at least three different Mike Stonebraker companies &#8212; and a few are in telecom. VoltDB assures me that some of the beta users are companies one actually has heard of before, but VoltDB is not in a position to name any of those.</p>
<p>VoltDB is not ideally suited for a classic order management system, since you&#8217;d want to partition on both CustomerID and SKU, the latter because you&#8217;d constantly be updating inventory stock levels. However, this argument doesn&#8217;t apply in the case of virtual goods. Virtual goods that are sold for real money &#8212; and hence need ACID levels of transaction integrity &#8212; are thus a clear target market for VoltDB. (The example that came up was in, you guessed it, online gaming.) The other interesting use case that Tim highlighted was low-latency analytics/ELT. For reasons I didn&#8217;t totally grasp, Tim likes to call this &#8220;Stateful ELT.&#8221; (Given that the data goes into the VoltDB database before much else happens to it, I&#8217;m pretty sure I heard &#8220;ELT&#8221; correctly. But I guess I might have been mishearing &#8220;ETL&#8221;.)</p>
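<p>The order-management argument is easy to check with a toy model. The Python sketch below is illustrative only; the table layout, the key formats, and the use of SHA-256 for hash partitioning are assumptions, not anything VoltDB specifies. Orders are partitioned by CustomerID and inventory by SKU, so an order that decrements stock usually spans several partitions. A virtual-goods purchase, modeled here as touching only rows partitioned by the buyer&#8217;s ID, never does.</p>

```python
import hashlib
import random

NUM_PARTITIONS = 8


def partition_of(key: str) -> int:
    """Hash-partition a partitioning-key value (SHA-256 gives a stable, well-mixed hash)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def order_partitions(customer_id: int, skus: list) -> set:
    """Classic order entry: insert the order (partitioned by CustomerID)
    and decrement stock for each SKU (inventory partitioned by SKU)."""
    parts = {partition_of(f"customer:{customer_id}")}
    parts.update(partition_of(f"sku:{s}") for s in skus)
    return parts


def virtual_goods_partitions(player_id: int, items: list) -> set:
    """Virtual goods: no shared stock level to decrement, so the purchased items are
    recorded only in rows partitioned by PlayerID (an assumption of this model)."""
    return {partition_of(f"player:{player_id}")}


def single_partition_share(partition_sets: list) -> float:
    return sum(1 for parts in partition_sets if len(parts) == 1) / len(partition_sets)


if __name__ == "__main__":
    rng = random.Random(42)
    orders = [order_partitions(rng.randrange(10_000), rng.sample(range(500), 3))
              for _ in range(1_000)]
    virtual = [virtual_goods_partitions(rng.randrange(10_000), rng.sample(range(500), 3))
               for _ in range(1_000)]
    print(single_partition_share(orders))    # close to 0: nearly every order spans partitions
    print(single_partition_share(virtual))   # 1.0
```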
<p>VoltDB company highlights include:</p>
<ul>
<li>VoltDB has about a dozen employees, all but two of whom are technical. (However, I&#8217;m not sure whether they&#8217;re counting Andy Ellicott as one of the two. Then again, last I heard he wasn&#8217;t full time at VoltDB.)</li>
<li>VoltDB&#8217;s venture funding status is, if I may paraphrase, &#8220;Mumble mumble.&#8221;</li>
<li>Although long separate from Vertica, VoltDB is still located in Vertica&#8217;s offices.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/25/voltdb-finally-launches/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>The Clustrix story</title>
		<link>http://www.dbms2.com/2010/05/12/the-clustrix-story/</link>
		<comments>http://www.dbms2.com/2010/05/12/the-clustrix-story/#comments</comments>
		<pubDate>Wed, 12 May 2010 08:53:48 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Clustrix]]></category>
		<category><![CDATA[Emulation, transparency, portability]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2096</guid>
		<description><![CDATA[After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included:    

Nothing in my 	original short post about Clustrix was actually incorrect.
Clustrix plans to reveal actual 	production “name-brand” customers soon.
The name of Clustrix&#8217;s software, 	or at least the guts thereof, is Sierra.
Clustrix&#8217;s products have actually 	been in general availability since last [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included:    <span id="more-2096"></span></p>
<ul>
<li>Nothing in <a href="../2010/05/04/clustrix-may-be-doing-something-interesting/">my 	original short post about Clustrix</a> was actually incorrect.</li>
<li>Clustrix plans to reveal actual 	production “name-brand” customers soon.</li>
<li>The name of Clustrix&#8217;s software, 	or at least the guts thereof, is Sierra.</li>
<li>Clustrix&#8217;s products have actually 	been in general availability since last quarter, with some versions 	at customer sites for 2 years. Development started 3 ½ years ago.</li>
<li>Clustrix says its technology is 	for OLTP systems, which it calls “non-batch/non-analytic,” with 	mixed read/write workloads. All Clustrix&#8217;s example target markets 	are “internet verticals,” such as photo sharing, gaming, social 	media, e-commerce, etc.</li>
<li>Clustrix&#8217;s heart is in SQL, as is 	most of its customer base. Clustrix Sierra&#8217;s key-value-store option 	has little or no performance advantage over Clustrix Sierra&#8217;s SQL 	option, nor any other advantage over SQL that came up in discussion.</li>
<li>Clustrix Sierra is 	“wire-compatible” with MySQL, but doesn&#8217;t use MySQL code; 	Clustrix wrote all the code itself.</li>
<li>Clustrix asserts that Clustrix 	Sierra supports the “vast majority” of MySQL features. Examples 	of MySQL features Clustrix doesn&#8217;t support at this time are 	full-text search and geospatial indexing.</li>
<li>Indeed, Clustrix claims Clustrix 	Sierra can be used to replace MySQL with few or zero changes to 	existing applications.</li>
<li>I specifically asked about 	referential integrity, which has a poor performance reputation in 	MySQL. Besides saying they supported it, Clustrix said that some 	customers actually use referential integrity in some of their less 	active tables.</li>
<li>Clustrix Sierra is fully 	ACID-compliant, with no eventual consistency or <a href="http://www.dbms2.com/2010/05/01/ryw-read-your-writes-consistency/" >RYW consistency</a> story. The default number of copies of each datum is two, and 	they&#8217;re kept consistent via two-phase commit.</li>
<li>Clustrix Sierra is fully parallel, 	with no “head” node. I forgot to ask how it was determined which 	queries would be addressed to and/or controlled by which nodes, but 	I presume there&#8217;s some sort of a load-balancing scheme.</li>
<li>Clustrix says that because 	Clustrix Sierra uses MVCC (Multi-Version Concurrency Control), and 	thus reads and writes don&#8217;t block each other, global locks aren&#8217;t a 	major issue. (They&#8217;re rare or short or something – I have trouble 	seeing why they would be non-existent.)</li>
<li>Clustrix says there&#8217;s a second 	class of locks and latches that are purely local and short-lived, 	for B-tree indexes and the like. (I didn&#8217;t drill down into those 	either.) I guess this means Clustrix Sierra is B-tree-centric, which 	makes sense for an OLTP-oriented system.</li>
<li>Clustrix Sierra distributes data among nodes via consistent hashing (default), range partitioning, or “full distribution” (i.e., copying a – presumably small – table to each node). The choice of distribution plans is manual now; more automation is a future feature.</li>
<li>Clustrix Sierra&#8217;s CBO (Cost-Based 	Optimizer) is, as one would hope, distribution-aware.</li>
<li>Clustrix Sierra compiles query 	fragments and ships them off to the relevant nodes. A fragment might 	contain both instructions for SQL to be executed locally and for 	where data is to be sent next.</li>
<li>Clustrix says that Clustrix Sierra 	does data migration and redistribution (e.g., when you add a node) 	transparently online, and further says that in practice this doesn&#8217;t 	cause a performance hit.</li>
<li>As for Clustrix hardware:
<ul>
<li>Clustrix makes <a href="http://www.monashreport.com/2007/01/29/computing-appliances-trends/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">Type 	I appliances</a>.</li>
<li>A Clustrix node contains two quad-core chips, 32 gigs of RAM, and seven 160 GB solid-state drives.</li>
<li>Specifically, Clustrix is using 	Intel SSDs, with a SAS interface.</li>
<li>Clustrix says solid-state memory 	isn&#8217;t really essential to the product design; it&#8217;s just cheap in 	terms of $/IOPS (I/O Per Second).</li>
</ul>
</li>
<li>A minimum Clustrix configuration 	is 3 nodes, for redundancy. After that you can add nodes one at a 	time. Clustrix says it built a 20-node system in-house, leading me 	to suspect that customers don&#8217;t have anything bigger than 20 nodes 	either.</li>
<li>That 20-node Clustrix system was 	tested to show near-linear scalability. (In discussing this, 	Clustrix tends to forget to use the word “near”.)</li>
<li>Clustrix has partnered with 	somebody to provide global 4-hour-response support. As of now 	Clustrix seems to be active mainly in North America and Europe.</li>
<li>Clustrix is formed from the 	combination of two startups, which I&#8217;ve heard elsewhere were called 	Clustrix and Sprout. Exactly when the combination happened sounds a 	little different depending on who&#8217;s telling the story (one version 	has the predecessors still being separate well into 2008, but 	Clustrix implies the combination happened pretty much on Day 1).</li>
</ul>
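<p>Consistent hashing is what makes the online-redistribution claim plausible: adding a node should move only the keys that the new node takes over. Clustrix didn&#8217;t describe its implementation, so the Python sketch below is a generic textbook hash ring with virtual nodes. The node names, the 100-virtual-nodes-per-node setting, and the use of MD5 as the ring hash are all illustrative assumptions.</p>

```python
import bisect
import hashlib


def ring_hash(s: str) -> int:
    """64-bit position on the ring (MD5 is used for spread, not security)."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points = []              # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):   # each node claims many small arcs of the ring
            bisect.insort(self._points, (ring_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """A key belongs to the first node point at or after its position (wrapping around)."""
        i = bisect.bisect_left(self._points, (ring_hash(key), ""))
        return self._points[i % len(self._points)][1]


if __name__ == "__main__":
    keys = [f"row:{i}" for i in range(10_000)]
    ring = HashRing(["node1", "node2", "node3"])
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node4")
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    print(len(moved) / len(keys))               # roughly 0.25
    print({ring.node_for(k) for k in moved})    # {'node4'}: keys only move to the new node
    # Naive modulo placement, for contrast: going from 3 to 4 nodes moves about 75% of keys.
    print(sum(ring_hash(k) % 3 != ring_hash(k) % 4 for k in keys) / len(keys))
```

<p>Virtual nodes smooth out the share of the ring each physical node owns, so the new node takes roughly an equal slice from each of the existing ones.</p>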
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/12/the-clustrix-story/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Clustrix may be doing something interesting</title>
		<link>http://www.dbms2.com/2010/05/04/clustrix-may-be-doing-something-interesting/</link>
		<comments>http://www.dbms2.com/2010/05/04/clustrix-may-be-doing-something-interesting/#comments</comments>
		<pubDate>Wed, 05 May 2010 00:18:55 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustrix]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2053</guid>
		<description><![CDATA[Clustrix launched without briefing me or, at least so far as I can tell, anybody else who knows much about database technology. But Clustrix did post a somewhat crunchy, no-registration-required, white paper. Based on that, I get the impression:

Clustrix is making OLTP DBMS.
The core problem Clustrix tries to solve is scale-out, without necessarily giving up [...]]]></description>
			<content:encoded><![CDATA[<p>Clustrix launched without briefing me or, at least so far as I can tell, anybody else who knows much about database technology. But Clustrix did post a somewhat crunchy, no-registration-required, <a href="http://www.clustrix.com/wp-content/uploads/2010/04/clustrix-whitepaper-01-no-on-sql-mysql-object-key-value-store-database-scaling.pdf" onclick="javascript:pageTracker._trackPageview('/www.clustrix.com');">white paper</a>. Based on that, I get the impression:</p>
<ul>
<li>Clustrix is making OLTP DBMS.</li>
<li>The core problem Clustrix tries to solve is scale-out, without necessarily giving up SQL. (I couldn&#8217;t immediately tell whether Clustrix supports NoSQL-style key-value interfaces enthusiastically, grudgingly, or not at all.)</li>
<li>Unlike <a href="http://www.dbms2.com/2010/04/03/akiban-highlights/" >Akiban</a> or <a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/" >VoltDB</a>, Clustrix makes database appliances. The Clustrix software seems to assume a Clustrix appliance.</li>
<li>A key feature of Clustrix&#8217;s database appliances is that they rely on solid-state memory. I&#8217;m guessing that Clustrix appliances don&#8217;t even have disks, or that if they do the disks store some software or something, not actual data. (As <a href="http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/" >previously noted</a>, I agree with Oracle in thinking that <a href="http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/" >much of the progress in database technology this decade will come from proper design for solid-state memory</a>.)</li>
<li>Clustrix talks of things that sound like compiled queries and attempts to avoid locks. However, it doesn&#8217;t sound as extreme in these regards as VoltDB.</li>
<li>Clustrix also talks of things that sound like consistent hashing.</li>
<li>The brand name &#8220;Sierra&#8221; also shows up along with the brand name &#8220;Clustrix.&#8221;</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/04/clustrix-may-be-doing-something-interesting/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Thoughts on IBM&#8217;s anti-Oracle announcements</title>
		<link>http://www.dbms2.com/2010/04/07/ibm-anti-oracle-announcements/</link>
		<comments>http://www.dbms2.com/2010/04/07/ibm-anti-oracle-announcements/#comments</comments>
		<pubDate>Wed, 07 Apr 2010 15:28:15 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Solid-state memory]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1854</guid>
		<description><![CDATA[IBM is putting out a couple of press releases today that are obviously directed competitively at Oracle/Sun, and more specifically at Oracle&#8217;s Exadata-centric strategy. I haven&#8217;t been briefed, so I just have those to go on.
On the whole, the releases look pretty lame. Highlights seem to include:

Maybe a claim of enhanced data compression.
Otherwise, no obvious [...]]]></description>
			<content:encoded><![CDATA[<p>IBM is putting out a couple of press releases today that are obviously directed competitively at Oracle/Sun, and more specifically at Oracle&#8217;s <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >Exadata-centric strateg</a>y. I haven&#8217;t been briefed, so I just have those to go on.</p>
<p>On the whole, the releases look pretty lame. Highlights seem to include:</p>
<ul>
<li>Maybe a claim of enhanced data compression.</li>
<li>Otherwise, no obvious new technology except product packaging and bundling.</li>
<li>Aggressive plans to throw capital at the Sun channel to convert it to selling IBM gear. (A figure of $1/2 billion is mentioned, for financing.)</li>
</ul>
<p>Disappointingly, IBM shows a lot of confusion between:</p>
<ul>
<li>Text data</li>
<li>Machine-generated data such as that from sensors</li>
</ul>
<p>While both are highly important, those are <a href="http://www.dbms2.com/2010/01/17/three-broad-categories-of-data/" >very different things</a>. IBM has not in the past shown much impressive technology in either of those two areas, and based on these releases, I presume that trend is continuing.</p>
<p><em>Edits: </em></p>
<p><em>I see from press coverage that at least one new IBM model has some Fusion I/O solid-state memory boards in it. Makes sense.</em></p>
<p><em>A Twitter hashtag has a number of observations from the event. Not much substance I could detect except various kinds of <a href="http://twitter.com/#search?q=%23ibmsmartsys" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Oracle bashing</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/04/07/ibm-anti-oracle-announcements/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Notes on the evolution of OLTP database management systems</title>
		<link>http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/</link>
		<comments>http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/#comments</comments>
		<pubDate>Mon, 05 Apr 2010 08:22:03 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EnterpriseDB and Postgres Plus]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Mid-range]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1841</guid>
		<description><![CDATA[The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part).  OLTP (OnLine Transaction Processing) and general purpose DBMS startups, however, have not yet done as well, with [...]]]></description>
			<content:encoded><![CDATA[<p>The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part).  OLTP <span style="font-weight: normal;">(OnLine Transaction Processing) </span>and general purpose DBMS startups, however, have not yet done as well, with such success as there has been (MySQL, Intersystems Cache&#8217;, solidDB&#8217;s exit, etc.) generally accruing to products that originated in the 20th Century.</p>
<p>Nonetheless, OLTP/general-purpose data management startup activity has recently picked up, targeting what I see as some very real opportunities and needs. So as a jumping-off point for further writing, I thought it might be interesting to collect a few observations about the market in one place.  These include:</p>
<ul>
<li><span style="font-weight: normal;">Big-brand 	OLTP/general-purpose DBMS have more “stickiness” 	than analytic DBMS.</span></li>
<li><span style="font-weight: normal;">By 	number, most of an enterprise&#8217;s OLTP/general-purpose databases are low-volume and 	low-value. </span></li>
<li>Most 	interesting new OLTP/general-purpose data management products are <span style="font-style: normal;">either 	MySQL-based or NoSQL.</span></li>
<li>It&#8217;s not yet 	clear whether MySQL will prevail over MySQL forks, or vice-versa, or 	whether they will co-exist.</li>
<li>The era of 	silicon-centric relational DBMS is coming.</li>
<li>The emphasis 	on scale-out and reducing the cost of joins spans the NoSQL and 	SQL-based worlds.<em> </em></li>
<li><span style="font-weight: normal;">Users&#8217; 	instance on “free” could be a major problem for OLTP DBMS 	innovation. </span></li>
</ul>
<p style="margin-bottom: 0in;">I shall explain.<span id="more-1841"></span></p>
<p style="margin-bottom: 0in;"><strong>Big-brand OLTP/general-purpose DBMS have more “stickiness” than analytic DBMS.</strong></p>
<ul>
<li>OLTP 	applications are more complex than analytic ones, and hence more 	tightly wired into particular brands of DBMS. For example, 	third-party packaged OLTP applications are typically portable among 	only a few brands of DBMS. But third-party business intelligence 	tools, and the BI “applications” built in them, are more easily 	and widely portable.</li>
<li>Specific technical observations 	such as “OLTP apps tend to use stored procedures, which are 	DBMS-specific” or “OLTP apps tend to have lots and lots of 	tables” serve to underscore the first point.</li>
<li>An enterprise&#8217;s highest-value data 	is commonly the financial stuff handled by its core OLTP systems, so 	those are the last things they want to mess around with just to get 	some cost savings. Security, high availability, and so on are major 	considerations that can outweigh cost.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>By number, most of an enterprise&#8217;s OLTP/general-purpose databases are low-volume and low-value. </strong>Indeed, “OLTP” is often a misnomer, which is why I tend to go with “general-purpose” or some similarly wishy-washy phrase instead.</p>
<ul>
<li>In theory, this is a ripe area for 	what I&#8217;ve called <a href="http://www.dbms2.com/category/database-management-system/mid-range/" >mid-range DBMS</a>.</li>
<li>The big brand vendors try hard to 	keep as many of those databases for themselves as they can. 	Enterprise-wide license pricing helps. Going forward, so will 	virtualization/consolidation strategies, such as <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >Oracle&#8217;s 	Exadata-centric approach</a>.</li>
<li>A variety of mid-range DBMS alternatives beyond the big brands have technical merit, at least in some cases and configurations – MySQL, PostgreSQL, InterSystems Cach&#233;, and so on.</li>
<li>The only such mid-range DBMS 	alternative with much large enterprise business momentum, however, 	appears to be MySQL.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>&#8220;General-purpose&#8221; might be a better term than &#8220;OLTP&#8221; anyway.</strong></p>
<ul>
<li>I don&#8217;t have a link, but it&#8217;s widely agreed that over half of the processing on an &#8220;OLTP&#8221; enterprise app is commonly reporting and so on.</li>
<li>&#8220;Operational BI&#8221; is progressing by fits and starts, but it is progressing.</li>
<li>Anything customer-facing &#8212; web-based, call center, or otherwise &#8212; is likely to include a heavy dose of &#8220;real-time&#8221; analytic optimization.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Most interesting new OLTP/general-purpose data management products are <span style="font-style: normal;">either MySQL-based or NoSQL.</span></strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/" >VoltDB</a> is the main 	exception that jumps to mind.</li>
<li>This isn&#8217;t true in the analytic 	DBMS area, where Netezza, Greenplum, Aster, Vertica and others 	started from PostgreSQL&#8217;s code, APIs, or both.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>It&#8217;s not yet clear whether MySQL will prevail over MySQL forks, or vice-versa, or whether they will co-exist.</strong></p>
<ul>
<li>MySQL is a limited product without 	all the third-party storage engines that are being developed.</li>
<li><a href="http://www.dbms2.com/2009/12/14/oracle-mysql-storage-engine/" >Oracle&#8217;s promise of MySQL good 	behavior</a> has an expiration date.</li>
<li>None of the MySQL front-end 	alternatives are remotely mature yet.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>The era of silicon-centric relational DBMS is coming.</strong></p>
<ul>
<li>I think “silicon” means 	“solid-state memory” as much as or more than it means “RAM,” 	but that&#8217;s not yet certain.</li>
<li>What is pretty certain is that, 	thanks to Moore&#8217;s Law, some kind of silicon will increasingly 	replace disk.</li>
<li><a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >Oracle&#8217;s increasingly 	Flash-centric story</a> is a challenge to everybody.</li>
<li>RAM-centric VoltDB will launch 	fairly soon. (By the way, while VoltDB still has <a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/" >a lot in common 	with H-Store</a>, they&#8217;re not exactly the same thing. And <a href="http://bit.ly/9QxjV2." onclick="javascript:pageTracker._trackPageview('/bit.ly');">H-Store 	research</a> is progressing too.)</li>
<li><a href="http://rethinkdb.com/" onclick="javascript:pageTracker._trackPageview('/rethinkdb.com');">RethinkDB</a> is being developed, focused directly on solid-state memory. Based on the sparse information available online, RethinkDB sounds somewhat like a dumbed-down H-Store.</li>
<li>New disk-based vendors may never 	optimize their use of disk, instead targeting a solid-state future. 	(E.g., I think Akiban should and quite well might follow this path.)</li>
</ul>
<p style="margin-bottom: 0in; font-weight: normal;"><strong>The emphasis on scale-out and reducing the cost of joins spans the NoSQL and SQL-based worlds.</strong> We hear that from the <a href="http://www.dbms2.com/2010/03/14/nosql-taxonomy/" >NoSQL</a> guys all the time. But I also just heard it from <a href="http://www.dbms2.com/2010/04/03/akiban-highlights/" >Akiban</a>.</p>
<p style="margin-bottom: 0in;"><strong>Users&#8217; instance on “free” could be a major problem for OLTP DBMS innovation.</strong> Vendors of new OLTP data management technologies often feel obligated to open source their products, notwithstanding the historical lack of revenue in the open source OLTP DBMS market. As just one of many examples,  <a href="http://www.novaspivack.com/uncategorized/evri-ties-the-knot-with-twine" onclick="javascript:pageTracker._trackPageview('/www.novaspivack.com');">Nova Spivack</a> wrote:</p>
<blockquote>
<p style="margin-bottom: 0in;">I have recently seen some new graph data storage products that may provide the levels of scale and performance needed, but pricing has not been determined yet. In short, storage and retrieval of semantic graph datasets is a big unsolved challenge that is holding back the entire industry. We need federated database systems that can handle hundreds of billions to trillions of triples under high load conditions, in the cloud, on commodity hardware and open source software. Only then will it be affordable to make semantic applications and services at Web-scale.</p>
</blockquote>
<p style="margin-bottom: 0in;">I hear similar things from other startups, who evidently believe they need and/or are entitled to enjoy sophisticated, high-performance, zero-cost, specialized database management technology.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
