<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS2 -- DataBase Management System Services &#187; Theory and architecture</title>
	<atom:link href="http://www.dbms2.com/category/database-theory-practice/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Sun, 14 Mar 2010 23:24:45 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Toward a NoSQL taxonomy</title>
		<link>http://www.dbms2.com/2010/03/14/nosql-taxonomy/</link>
		<comments>http://www.dbms2.com/2010/03/14/nosql-taxonomy/#comments</comments>
		<pubDate>Sun, 14 Mar 2010 23:24:45 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Structured documents]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1708</guid>
		<description><![CDATA[I talked Friday with Dwight Merriman, founder of 10gen (the MongoDB company). He more or less convinced me of his definition of NoSQL systems, which in my adaptation goes:
NoSQL = HVSP (High Volume Simple Processing) without joins or explicit transactions
Within that realm, Dwight offered a two-part taxonomy of NoSQL systems, according to their data model [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I talked Friday with Dwight Merriman, founder of 10gen (the MongoDB company). He more or less convinced me of his definition of NoSQL systems, which in my adaptation goes:</p>
<p style="margin-bottom: 0in;"><strong>NoSQL = <a href="http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/" >HVSP (High Volume Simple Processing)</a> without joins or explicit transactions</strong></p>
<p style="margin-bottom: 0in;">Within that realm, Dwight offered a two-part taxonomy of NoSQL systems, according to their data model and replication/sharding strategy. I&#8217;d be happier, however, with at least three parts to the taxonomy:</p>
<ul>
<li>How data looks logically on a 	single node</li>
<li>How data is stored physically on a 	single node</li>
<li>How data is distributed, 	replicated, and reconciled across multiple nodes, and whether 	applications have to be aware of how the data is partitioned among 	nodes/shards.<span id="more-1708"></span></li>
</ul>
<p style="margin-bottom: 0in;">After talking with Dwight, and also with Cassandra project chair Jonathan Ellis, I feel I&#8217;m doing decently in understanding the first of those three areas. But there&#8217;s a long way yet to go on the other two.</p>
<p style="margin-bottom: 0in;">In Dwight&#8217;s opinion, as I understand it, NoSQL data models come in four general kinds.</p>
<ul>
<li><em><strong>Key-value stores,</strong></em><em> more or less pure.</em> I.e., they store keys+BLOBs (Binary Large 	OBjects), except that the “Large” part of “BLOB” may not 	come into play.</li>
<li><em><strong>Table-oriented,</strong></em><em> more or less. </em>The major examples here are Google&#8217;s BigTable, and 	Cassandra.</li>
<li><em><strong>Document-oriented,</strong></em><em> where a “document” is more like XML than free text. </em>MongoDB 	and CouchDB are the big examples here.</li>
<li><strong><em>Graph-oriented.</em> </strong><span style="font-weight: normal;">To 	date, this is the smallest area of the four. I&#8217;m reserving judgment 	as to whether I agree it&#8217;s properly included in HVSP and NoSQL.</span></li>
</ul>
<p style="margin-bottom: 0in;">As Dwight sees it, JSON (JavaScript Object Notation) is the emerging markup standard for the document-oriented data models, and to some extent the BLOB part of key-value models as well. Reasons seem to include:</p>
<ul>
<li>JSON is something web developers 	are likely to know anyway.</li>
<li>JSON, unlike XML, is schema-less. 	In the NoSQL world, that&#8217;s perceived as a good thing.</li>
<li>Perhaps for both these reasons, 	JSON is perceived as easier to use than XML.</li>
</ul>
<p style="margin-bottom: 0in;">Except as noted, I&#8217;m not aware of anything that solidly contradicts the above.</p>
<p style="margin-bottom: 0in;">Dwight went on to say that there are two main NoSQL replication/sharding models, in line with the seminal papers to which I <a href="http://www.dbms2.com/2010/03/12/some-nosql-links/" >previously linked</a>:</p>
<ul>
<li><em>Based on or resembling </em><em><strong>Dynamo.</strong></em> The core idea here is accepting <strong>eventual consistency</strong> among nodes as being good enough, even if that means you sometimes read stale data. The benefit is that <strong>you are never blocked from writing.</strong> By way of contrast, systems that enforce true inter-node consistency (think of a two-phase commit) can shut you down from writing if consistency guarantees aren&#8217;t being confirmed in a timely manner. Thus, in a Dynamo-like scheme you write data to multiple nodes, chosen via <strong>consistent hashing;</strong> then when the time comes you read one or more nodes, and hope that what you&#8217;re getting back is a correct result.</li>
<li><em>Based on or resembling </em><em><strong>BigTable.</strong></em> In this model you&#8217;re trying to keep the nodes fully consistent in the usual way, e.g. by synchronous replication. Indeed, what&#8217;s being kept consistent is both the data itself and metadata about the data&#8217;s location. Details surely vary a lot from implementation to implementation.</li>
</ul>
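<p style="margin-bottom: 0in;">To make the consistent-hashing idea in the Dynamo-like model a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular product does it: the <code>HashRing</code> class, the node names, the use of MD5, and the choice of 8 virtual points per node are all hypothetical. Each node is hashed onto a ring at several points, and a key is written to the first few distinct nodes found walking clockwise from the key&#8217;s own hash.</p>

```python
import bisect
import hashlib


def ring_position(text):
    """Map a string to an integer position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical illustration of consistent hashing with virtual nodes."""

    def __init__(self, nodes, vnodes=8):
        # Place every node at several points so that load spreads more evenly.
        self._ring = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def replicas(self, key, n=3):
        """Return up to n distinct nodes, walking clockwise from the key's position."""
        start = bisect.bisect(self._points, ring_position(key))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
            if len(found) == n:
                break
        return found


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:42")  # three distinct nodes that would hold this key
```

<p style="margin-bottom: 0in;">The same key always maps to the same nodes, and adding or removing a node only changes ownership for keys near that node&#8217;s points on the ring. What the sketch leaves out is the hard part, namely how replicas that disagree get reconciled.</p>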
<p style="margin-bottom: 0in;">I&#8217;m fuzzier on this stuff than on the data models, because to date nobody has ever explained to me how an actual live system (MongoDB, Cassandra, whatever) implements its replication strategy. Also, while I think that in both these models applications are allowed to be ignorant of the replication/sharding strategy, I&#8217;m not as sure of that as I&#8217;d like to be.</p>
<p style="margin-bottom: 0in;">If we stop here, we already have something useful. MongoDB has a document data model, and is in the BigTable-like replication camp, at least at first. Cassandra has a table-like data model, and is on the Dynamo-like eventual consistency side. But to say those are the only differences that matter would be like saying that all shared-disk RDBMS (e.g., Oracle and Sybase IQ) are essentially alike. That, of course, would be nonsense.</p>
<p style="margin-bottom: 0in;">So a third dimension needed in this taxonomy is how the systems actually bang data on and off of disk (or silicon, as the case may be). I don&#8217;t yet have an overview of that. I know something of how Cassandra does it, and will write about same in a future post, but that&#8217;s about it. So please stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/14/nosql-taxonomy/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Naming of the Foo</title>
		<link>http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/</link>
		<comments>http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/#comments</comments>
		<pubDate>Sat, 13 Mar 2010 22:47:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Mark Logic]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1703</guid>
		<description><![CDATA[Let&#8217;s start from some reasonable premises.

No technology category name is 	ever perfect.
It&#8217;s particularly hard to describe 	NoSQL (Not Only SQL) accurately, given the basic confusion as to 	what NoSQL is all about.
That said, it 	seems pretty clear that NoSQL is about making big websites (and 	perhaps other cloud-like installations) run and scale.
Dwight Merriman (founder/CEO of [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s start from some reasonable premises.</p>
<ul>
<li><a href="http://www.strategicmessaging.com/monashs-first-law-of-commercial-semantics-explained/2009/01/09/" onclick="javascript:pageTracker._trackPageview('/www.strategicmessaging.com');">No technology category name is 	ever perfect</a>.</li>
<li>It&#8217;s particularly hard to describe 	NoSQL (Not Only SQL) accurately, given <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >the basic confusion as to 	what NoSQL is all about</a>.</li>
<li>That said, it 	seems pretty clear that NoSQL is about making big websites (and 	perhaps other cloud-like installations) run and scale.</li>
<li>Dwight Merriman (founder/CEO of 	MongoDB vendor 10gen) is heading in the right direction when he says 	that the unifying ideas of NoSQL are that you do away with 	transactions and joins. But if he&#8217;s ever said something like “NoSQL 	is Foo without joins and transactions,” I don&#8217;t know what Foo is.</li>
<li><span style="font-style: normal;">Actually, 	I do know what Foo is – Foo is what happens when lots of people 	want to get small amounts each of information in or out of a 	database at the same time. I just don&#8217;t know what Foo is called.</span></li>
<li>Obviously, Foo is a lot like OLTP 	(OnLine Transaction Processing). However, it would be pretty silly 	for Foo to actually be OLTP, given that one of the core points of 	NoSQL is that you don&#8217;t have transactions.</li>
<li>It&#8217;s not just the “T” part of OLTP that&#8217;s fried. Calling something “OnLine” only makes sense as long as offline is an option, and offline transaction processing has been obsolete for a very long time.*</li>
</ul>
<p style="margin-bottom: 0in;"><em>*Sure, if you strain you can talk yourself into exceptions. But the point stands.</em></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">So we need a name for Foo, where Foo is what happens when</span><span style="font-style: normal;"><strong> lots of people want to get small amounts each of information in or out of a database at the same time.</strong></span><span style="font-style: normal;"> Thus, three major subcategories of more-or-less disk-based Foo are:</span></p>
<ul>
<li><span style="font-style: normal;">No-compromises 	ACID-compliant relational OLTP</span></li>
<li><span style="font-style: normal;">Sharded 	MySQL</span></li>
<li>NoSQL</li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">There may be some more purely memory-centric versions too, but let&#8217;s put those aside for the moment. </span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Absent a better idea, I can squeeze Foo into yet another four-letter acronym:</span></p>
<p style="margin-bottom: 0in;"><strong><span style="font-style: normal;">HVSP (High-Volume Simple Processing)</span></strong></p>
<p style="margin-bottom: 0in; font-style: normal;">That&#8217;s as imperfect as any other category name, and an awkward mouthful to boot. So I&#8217;d love to hear a better one; if you have such, please share it! In the meantime, I think “HVSP” has merit because:</p>
<ul>
<li><span style="font-style: normal;">The 	“Processing” part should be noncontroversial.</span></li>
<li>“<span style="font-style: normal;">High-Volume” 	is inherent to the challenge. If RDBMS scale well enough for your 	use case, using something less powerful is probably silly.*  	Similarly, while Oracle shines at high-volume OLTP workloads, there 	are many cheaper DBMS that do a fine job of OLTP at lower volumes.</span></li>
<li>“<span style="font-style: normal;">Simple” 	is the core principle of NoSQL systems, which drop joins and 	transactions as being too much foofarah.  That only makes sense at 	all under the assumption that you have bone-simple queries and 	updates, so that programming around the lack of joins and 	transactions isn&#8217;t all that much of a burden.</span></li>
<li><span style="font-style: normal;">Something 	similar is true of sharded MySQL.</span></li>
<li><span style="font-style: normal;">Less 	obviously, “simple” is a core principle of relational OLTP as 	well. The point of the relational model is to cap the complexity of 	data operations, or more precisely to hide that complexity from 	programmers.</span></li>
<li><span style="font-style: normal;">And 	overloading the word “simple” a bit, it&#8217;s fair to say that if 	you&#8217;re reading or writing one record at a time, you&#8217;re doing 	something relatively simple, at least as opposed to what you do in 	analytic processing. The OLTP vs. OLAP distinction is preserved in 	this name change.</span></li>
<li><span style="font-style: normal;">The whole thing matches my definition above, namely &#8220;what happens when lots of people want to get small amounts each of information in or out of a database at the same time.&#8221;</span></li>
</ul>
<p style="margin-bottom: 0in;"><em>*Assuming, of course, that rows-and-tables are a good metaphor for your data structure in the first place.</em></p>
<p style="margin-bottom: 0in; font-style: normal;">Systems I&#8217;m leaving out of the HVSP and hence also NoSQL categories include:</p>
<ul>
<li><span style="font-style: normal;"><strong>Hadoop 	and other batch-oriented MapReduce.</strong></span><span style="font-style: normal;"> Hadoop isn&#8217;t part of NoSQL. I&#8217;m pretty sure that </span><a href="http://twitter.com/mikeolson/status/10388695185" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Cloudera 	CEO Mike Olson</a><span style="font-style: normal;"> agrees with me.</span></li>
<li><span style="font-style: normal;"><span style="font-weight: normal;">More 	generally, </span></span><span style="font-style: normal;"><strong>non-SQL 	data stores that don&#8217;t meet the HVSP criteria.</strong></span><span style="font-style: normal;"> Dave Kellogg stretches things when he claims that <a href="http://www.kellblog.com/2010/03/10/ieee-computer-society-article-on-nosql-an-executive-level-overview/" onclick="javascript:pageTracker._trackPageview('/www.kellblog.com');">MarkLogic 	is a NoSQL system</a>. (But then, that was in a post where he 	seemingly praised </span><a href="http://www.dbms2.com/2009/12/11/nosql-q-and-a/" >a train wreck of an article</a><span style="font-style: normal;">.)</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">But hey – what good is a categorization if it doesn&#8217;t leave some things out?</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Some NoSQL links</title>
		<link>http://www.dbms2.com/2010/03/12/some-nosql-links/</link>
		<comments>http://www.dbms2.com/2010/03/12/some-nosql-links/#comments</comments>
		<pubDate>Fri, 12 Mar 2010 23:51:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Continuent]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Tokutek]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1692</guid>
		<description><![CDATA[I plan to post a few things soon about MongoDB, Cassandra, and NoSQL in general. So I&#8217;m poking around a bit reading stuff on the subjects. Here are some links I found.

A little over a year ago, Julian Browne put up a great post on Eric Brewer&#8217;s CAP conjecture/theorem, which provides much of the impetus [...]]]></description>
			<content:encoded><![CDATA[<p>I plan to post a few things soon about MongoDB, Cassandra, and NoSQL in general. So I&#8217;m poking around a bit reading stuff on the subjects. Here are some links I found.</p>
<ul>
<li>A little over a year ago, Julian Browne put up a great post on <a href="http://www.julianbrowne.com/article/viewer/brewers-cap-theorem" onclick="javascript:pageTracker._trackPageview('/www.julianbrowne.com');">Eric Brewer&#8217;s CAP conjecture/theorem</a>, which provides much of the impetus to relax the traditional requirement for atomicity/consistency.</li>
<li>Even more directly inspirational to NoSQL technology development were two seminal papers: Google&#8217;s on <a href="http://labs.google.com/papers/bigtable.html" onclick="javascript:pageTracker._trackPageview('/labs.google.com');">BigTable</a> and Amazon&#8217;s on <a href="http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf" onclick="javascript:pageTracker._trackPageview('/s3.amazonaws.com');">Dynamo</a>. (That said, I&#8217;m having trouble getting myself to actually read them from start to finish, especially since they&#8217;ve been superseded by subsequent technology development.)</li>
<li>10gen (the MongoDB guys) hosted a NoSQL conference yesterday. Much blogging has ensued. The best post I&#8217;ve seen so far was by <a href="http://blog.marcua.net/post/442594842/notes-from-nosql-live-boston-2010" onclick="javascript:pageTracker._trackPageview('/blog.marcua.net');">Adam Marcus</a>. I find the graph database notes near the bottom particularly interesting.</li>
<li>Mark Callaghan hit back against the <a href="http://mysqlha.blogspot.com/2010/03/plays-well-with-others.html" onclick="javascript:pageTracker._trackPageview('/mysqlha.blogspot.com');">NoSQL <span style="text-decoration: line-through;">movement</span> hype</a>, and in particular against the <a href="http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/" >MySQL/memcached is passe&#8217;</a> meme. On the other hand, he also bemoaned many failings of MySQL. On the third hand, he praised or at least expressed hope for a variety of MySQL-related technologies, including <a href="http://www.dbms2.com/2009/04/16/introduction-to-tokutek/" >Tokutek&#8217;s TokuDB</a> and <a href="http://www.dbms2.com/2009/09/03/continuent-on-clustering/" >Continuent&#8217;s Tungsten</a>.</li>
<li>In connection with that debate, Mark Rendle offered a <a href="http://blog.markrendle.net/2010/03/do-you-need-relational-database.html" onclick="javascript:pageTracker._trackPageview('/blog.markrendle.net');">funny rant</a>, mainly pro-NoSQL, in the style of a Socratic dialogue.</li>
<li>John Quinn of Digg recently described <a href="http://www.stumbleupon.com/su/5099Ti/about.digg.com/node/564" onclick="javascript:pageTracker._trackPageview('/www.stumbleupon.com');">Digg&#8217;s move from MySQL to Cassandra</a>, and outlined a lot of features Digg was adding to Cassandra, all of which it is open-sourcing.</li>
<li>The NoSQL guys maintain their own long <a href="http://nosql-database.org/links.html" onclick="javascript:pageTracker._trackPageview('/nosql-database.org');">list of NoSQL-related links</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/12/some-nosql-links/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Cassandra and the NoSQL scalable OLTP argument</title>
		<link>http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/</link>
		<comments>http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 19:01:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1675</guid>
		<description><![CDATA[Todd Hoff put up a provocative post on High Scalability called MySQL and Memcached: End of an Era? The post itself focuses on observations like:

Facebook invented and is adopting Cassandra.
Twitter is adopting Cassandra.
Digg is adopting Cassandra.
LinkedIn invented and is adopting Voldemort.
Gee, it seems as if the super-scalable website biz has moved beyond MySQL/Memcached.

But in addition, he [...]]]></description>
			<content:encoded><![CDATA[<p>Todd Hoff put up a provocative post on High Scalability called <a href="http://highscalability.com/blog/2010/2/26/mysql-and-memcached-end-of-an-era.html" onclick="javascript:pageTracker._trackPageview('/highscalability.com');">MySQL and Memcached: End of an Era?</a> The post itself focuses on observations like:</p>
<ul>
<li>Facebook invented and is adopting Cassandra.</li>
<li>Twitter is adopting Cassandra.</li>
<li>Digg is adopting Cassandra.</li>
<li>LinkedIn invented and is adopting Voldemort.</li>
<li>Gee, it seems as if the super-scalable website biz has moved beyond MySQL/Memcached.</li>
</ul>
<p>But in addition, he provides a lot of useful links, which DBMS-oriented folks such as myself might have previously overlooked. <span id="more-1675"></span>Following those trails gets one to, among other things:</p>
<ul>
<li>A September, 2009 post outlining <a href="http://about.digg.com/blog/looking-future-cassandra" onclick="javascript:pageTracker._trackPageview('/about.digg.com');">Digg&#8217;s reasons for moving to Cassandra</a>. The core idea is that joining two tables is expensive; it&#8217;s cheaper to store the results prejoined on disk. Details are provided.</li>
<li>A February, 2010 post outlining <a href="http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king" onclick="javascript:pageTracker._trackPageview('/nosql.mypopescu.com');">Twitter&#8217;s reasons for moving to Cassandra</a>. They boil down to &#8220;sufficiently scalable, sufficiently simple, sufficiently robust, robustly open source.&#8221;</li>
<li>A <a href="http://www.niallkennedy.com/blog/uploads/flickr_php.pdf" onclick="javascript:pageTracker._trackPageview('/www.niallkennedy.com');">Flickr slide presentation</a> saying &#8220;normalization is for wimps&#8221;. They seemed to be staying with MySQL, but lusting after XPath.</li>
<li>A nice <a href="http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/" onclick="javascript:pageTracker._trackPageview('/blog.evanweaver.com');">Cassandra technical overview</a> by Evan Weaver of Twitter.</li>
</ul>
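<p>The &#8220;store the results prejoined&#8221; idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the in-memory dictionaries stand in for tables or column families, and the names <code>record_vote</code> and <code>friends_who_voted</code> are hypothetical, not taken from Digg&#8217;s code. The write path does the join work once, fanning each vote out to everyone who lists the voter as a friend, so that the read path is a single key lookup.</p>

```python
from collections import defaultdict

# Normalized data: who lists whom as a friend (user -> set of friends).
friends = {
    "alice": {"bob", "carol"},
    "dave": {"bob"},
    "carol": set(),
}

# Denormalized, "prejoined" data: (user, item) -> friends of user who voted for item.
prejoined = defaultdict(set)


def users_who_list(voter):
    """Find every user who counts the voter among their friends."""
    return [user for user, their_friends in friends.items() if voter in their_friends]


def record_vote(voter, item):
    """Write path: pay the join cost once by fanning the vote out."""
    for user in users_who_list(voter):
        prejoined[(user, item)].add(voter)


def friends_who_voted(user, item):
    """Read path: one key lookup, no join."""
    return prejoined.get((user, item), set())


record_vote("bob", "story-1")
record_vote("carol", "story-1")
alice_view = friends_who_voted("alice", "story-1")  # bob and carol
```

<p>The tradeoff is the usual one: writes get more expensive and the data is stored redundantly, in exchange for cheap, predictable reads.</p>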
<p>I also recall seeing something that said &#8220;We have 13X as many queries as updates, so of course we should optimize for reads,&#8221; but I can&#8217;t find that now. The classical OLTP answer to that would probably be &#8220;Yeah, but by the time you&#8217;re two-phase-committing and integrity-checking all the parts of that update, it turns out updates are still what you should optimize for.&#8221; Well, what if the update is so simple that that&#8217;s no longer a valid argument?</p>
<p>There certainly seem to be some non-obvious technical choices being made here, with options being conflated that perhaps shouldn&#8217;t be. In particular, I wonder whether things are being written to cheap disk in a really fast way when it might be better to keep them in more expensive RAM or, perhaps better yet, solid-state memory. Perhaps then the functionality/performance tradeoff wouldn&#8217;t be so painful.</p>
<p>On the other hand, the designers of the world&#8217;s most scalable websites &#8212; e-commerce sites perhaps excepted &#8212; seem pretty unanimous in thinking it&#8217;s best to bake some database/integrity management into the applications, rather than offload it all to an RDBMS. Why? Because the transactions are so simple that hand-coding all that isn&#8217;t prohibitive. And of course because of their extreme performance and scalability needs.</p>
<p>I&#8217;m not sure on what basis one could argue that they&#8217;re wrong.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Another reason to expect number-crunching and big-data management to converge</title>
		<link>http://www.dbms2.com/2010/02/26/number-crunching-big-data-managementconverge/</link>
		<comments>http://www.dbms2.com/2010/02/26/number-crunching-big-data-managementconverge/#comments</comments>
		<pubDate>Fri, 26 Feb 2010 06:03:12 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1660</guid>
		<description><![CDATA[Dan Olds argues that Oracle is likely to pursue commercially-substantive high performance computing (HPC), emphasis mine:
I just don’t see Oracle abandoning HPC entirely. I think it may call it by some other name or describe it differently, but it will be in the high throughput computing business for the foreseeable future.
There are some interesting angles [...]]]></description>
			<content:encoded><![CDATA[<p>Dan Olds argues that <a href="http://www.theregister.co.uk/2010/02/25/oracle_sun/" onclick="javascript:pageTracker._trackPageview('/www.theregister.co.uk');">Oracle is likely to pursue commercially-substantive high performance computing</a> (HPC), emphasis mine:<span id="more-1660"></span></p>
<blockquote><p>I just don’t see Oracle abandoning HPC entirely. I think it may call it by some other name or describe it differently, but it will be <strong>in the high throughput computing business for the foreseeable future.</strong></p>
<p>There are some interesting angles for it to pursue. <strong>Many of its best commercial customers have sizeable HPC or HPC-like workloads</strong> that Oracle can now (with the addition of Sun) compete for. I don’t see it passing up those opportunities.</p>
<p>Oracle can also look to specialize on certain subsets of the market and provide more of a solution rather than piece parts. I wouldn’t be surprised to hear of it offering<strong> an Exadata-like system that is optimized for, say, seismic or financial services.</strong> In fact, Exadata as it stands today is a decent fit for financial service analytic workloads.</p>
<p>HPC can be a profitable business and, in a lot of organizations, it’s growing faster than traditional business processing. From Oracle’s perspective, what’s not to like?</p></blockquote>
<p>Now, except for the Exadata-in-financial-services comment, that&#8217;s not directly an argument for the convergence of number crunching and data management.  However, I think <a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/" >Netezza and Aster Data</a> are showing the way for that convergence. So, up to a point, is <a href="http://www.dbms2.com/2009/10/03/issues-in-scientific-data-management/" >the scientific-research community</a>. And of course the <a href="http://www.dbms2.com/2009/10/10/enterprises-using-hadoo/" >Hadoop</a> guys think they have the best way to that convergent future.</p>
<p>But if Dan Olds is right that the best technologies with which Oracle could pursue HPC and big-data processing aren&#8217;t all that far apart, then the chances that Oracle will indeed pursue their convergence are pretty high. And that would amount to critical mass for the trend.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/26/number-crunching-big-data-managementconverge/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chris Bird&#8217;s blog is brilliant, and update-in-place is increasingly passe&#8217;</title>
		<link>http://www.dbms2.com/2010/02/25/chris-bird-database-design-update-in-plac/</link>
		<comments>http://www.dbms2.com/2010/02/25/chris-bird-database-design-update-in-plac/#comments</comments>
		<pubDate>Thu, 25 Feb 2010 05:44:54 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1641</guid>
		<description><![CDATA[I wouldn&#8217;t say every post in Chris Bird&#8217;s occasionally-updated blog is brilliant. I wouldn&#8217;t even say every post is readable. But I&#8217;d still recommend his blog to just about anybody who reads here as, at a minimum, a consciousness-raiser.
One of the two posts inspiring me to mention this is a high-level one on &#8220;technical debt&#8221;, [...]]]></description>
			<content:encoded><![CDATA[<p>I wouldn&#8217;t say every post in Chris Bird&#8217;s occasionally-updated blog is brilliant. I wouldn&#8217;t even say every post is readable. But I&#8217;d still recommend his blog to just about anybody who reads here as, at a minimum, a consciousness-raiser.</p>
<p>One of the two posts inspiring me to mention this is a high-level one on &#8220;<a href="http://businessanditarchitecture.blogspot.com/2009/10/technical-debt.html" onclick="javascript:pageTracker._trackPageview('/businessanditarchitecture.blogspot.com');">technical debt</a>&#8221;, reminding us why things don&#8217;t always get done right the first time, and further reminding us that circling back to fix them sooner rather than later is usually wise. The other <a href="http://businessanditarchitecture.blogspot.com/2009/11/updates-harmful.html" onclick="javascript:pageTracker._trackPageview('/businessanditarchitecture.blogspot.com');">connects two observations</a> that individually have great merit (at least if you don&#8217;t take them to extremes):</p>
<ul>
<li>Update-in-place is passe&#8217;</li>
<li>So is elaborate up-front database design</li>
</ul>
<p>Specific points of interest here include:<span id="more-1641"></span></p>
<ul>
<li>Most data never gets changed after being written. Update-in-place doesn&#8217;t save all that much in storage hardware.</li>
<li>Update-in-place interferes with a lot of modern optimizations in analytic DBMS design.</li>
<li>Knowing what values data had in the past is interesting in and of itself.</li>
<li>So, potentially, is knowing what &#8220;dirty&#8221; data end-users &#8212; especially customers and prospects &#8212; decided to enter.</li>
<li>The &#8220;right&#8221; amount of data validation is application-dependent. For example, if data validation involves torturing your customers, maybe it&#8217;s not such a good idea. (Great observation by Chris.)</li>
<li>If you have the old data as well as the new, the harm of having &#8220;bad&#8221; updates is lessened. (Central connecting observation by Chris.)</li>
<li>People enter data inconsistently. MDM (Master Data Management) and data cleansing tools fix much (admittedly not all) of the harm. Computers are cheaper than people. You do the math.</li>
<li>Data is increasingly being managed in non-relational and/or non-persistent ways. Get used to it.</li>
<li>As the <a href="http://www.dbms2.com/2009/12/12/legit-nosql-key-value-store/" >NoSQL</a> guys point out, some of today&#8217;s most demanding applications have extremely simple schemas.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/25/chris-bird-database-design-update-in-plac/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Vertica 4.0</title>
		<link>http://www.dbms2.com/2010/02/22/vertica-4/</link>
		<comments>http://www.dbms2.com/2010/02/22/vertica-4/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 08:19:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1607</guid>
		<description><![CDATA[Vertica briefed me last month on its forthcoming Vertica 4.0 release. I think it&#8217;s fair to say that Vertica 4.0 is mainly a cleanup/catchup release, washing away some of the tradeoffs Vertica had previously made in support of its innovative DBMS architecture.
For starters, there&#8217;s a lot of new analytic functionality. This isn&#8217;t Aster/Netezza-style ambitious. Rather, [...]]]></description>
			<content:encoded><![CDATA[<p>Vertica briefed me last month on its forthcoming Vertica 4.0 release. I think it&#8217;s fair to say that Vertica 4.0 is mainly a cleanup/catchup release, washing away some of the tradeoffs Vertica had previously made in support of its innovative DBMS architecture.</p>
<p>For starters, there&#8217;s a lot of new analytic functionality. This isn&#8217;t Aster/Netezza-style ambitious. Rather, there&#8217;s a lot more SQL-99 functionality, plus some time series extensions of the sort that financial services firms – an important market for Vertica – need and love. Vertica did suggest a couple of these time series extensions are innovative, but I haven&#8217;t yet gotten detail about those.</p>
<p>Perhaps even more important, Vertica is cleaning up a lot of its previous SQL optimization and execution weirdnesses. In no particular order, I was told:<span id="more-1607"></span></p>
<ul>
<li>Vertica&#8217;s delete performance is up “literally” 30-100X, at least in the case of “large” deletes. Performance for “large” updates has been enhanced as well.</li>
<li>Vertica has finally cleaned up all vestiges of its prior <a href="http://www.dbms2.com/2007/10/23/vertica-star-snowflake-schema/" >bias to star schemas</a>. For example, Vertica concedes that its product previously would sometimes force a star execution plan that wasn&#8217;t really appropriate.</li>
<li>It is no longer the case that you need to define projections before you load a table into Vertica. This is now fully automatic.</li>
<li>Vertica 4.0 automatically redesigns the database when new nodes are added to the system.</li>
<li>When a database designer does hand-tune projections – and there&#8217;s no shame in this still being a possibility in Vertica 4.0 – that hand-tuning is now pulled back into the automatic generation/recommendation/whatever wizards for further projections. I.e., there&#8217;s a kind of DBA round-trip engineering going on.</li>
<li>Vertica used to require that tables being joined be identically “segmented” (I think this means distributed across nodes). That is no longer the case in 4.0.</li>
<li>In connection with this new-found flexibility, Vertica now supports full outer joins directly, rather than requiring the left outer join/right outer join/UNION kluge.</li>
<li>The Vertica 4.0 optimizer is smarter than its predecessor about things like predicate pushdown into subqueries, or exploiting commonality between predicates and partition keys.</li>
<li>There&#8217;s a fundamental change that I don&#8217;t understand very well in the basic unit of work of the Vertica execution engine. It sounds as if in the past all the disk-based data containers the query needed got opened at once and read into memory, whether or not there were enough RAM and CPU cores to handle them, and this problem has now been fixed.</li>
<li>Vertica always seemed to say that you could query immediately on new data, because even if it hadn&#8217;t hit disk yet – the ROS (Read-Optimized Store) – it was available in memory – the WOS (Write-Optimized Store). And queries were in essence federated between the ROS and WOS. But apparently it&#8217;s a new feature in Vertica 4.0 that you can read totally fresh data without locking. I confess to not understanding this very well either. (It has something to do with what Vertica calls “Epochs”.)</li>
<li>Temporary tables can now be created in Vertica on a local/session basis without any DDL. Making temporary tables easier and more performant is important for a variety of reasons:
<ul>
<li>MicroStrategy, Company V* et al. use lots of temp tables. E.g., Company V on Vertica has 3,000 permanent tables and 5,000-7,000 temporary ones.</li>
<li>Vertica rightly points out that temporary tables are also important for ELT (Extract/Load/Transform).</li>
<li>Vertica further says that single-node OEMs such as security appliance vendors use lots of temp tables.</li>
</ul>
</li>
</ul>
<p><em>*Company V = one of the more prominent vertical-market application providers.</em></p>
<p>In other Vertica highlights:</p>
<ul>
<li>It sounds as if 4.0 is the first Vertica release with what I would regard as serious workload management.</li>
<li>While Vertica has stored and retrieved Unicode since Vertica 3.5 or so, 4.0 will be the first Vertica release in which Unicode is sorted and collated properly.</li>
<li>Stored-procedure-like functionality is still a future for Vertica.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/22/vertica-4/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Open issues in database and analytic technology</title>
		<link>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/</link>
		<comments>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 22:04:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1507</guid>
		<description><![CDATA[The last part of my New England Database Summit talk was on open issues in database and analytic technology. This was closely intertwined with the previous section, and also relied on a lot that I&#8217;ve posted here. So I&#8217;ll just put up a few notes on that part, with lots of linkage to prior discussion [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">The last part of my <a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >New England Database Summit</a> talk was on open issues in database and analytic technology. This was closely intertwined with the <a href="http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/" >previous section</a>, and also relied on a lot that I&#8217;ve posted here. So I&#8217;ll just put up a few notes on that part, with lots of linkage to prior discussion of the same points.<span id="more-1507"></span></p>
<ul>
<li>The most important issue in 	database and analytic technology, in my opinion, isn&#8217;t technological 	at all – rather, it&#8217;s the legal and political steps needed to <a href="http://www.dbms2.com/2010/01/31/data-based-snooping-threat-libert/" > preserve liberty</a> in the face of advancing, intrusive 	technology.</li>
<li>Another important issue for 	society – and this one does involve a lot of technology – is 	scientific number crunching. In particular, <a href="http://www.dbms2.com/2009/10/03/issues-in-scientific-data-management/" >database technology for 	scientific computing</a> needs to be developed much further. I&#8217;ll have 	more to say on all this soon.</li>
<li>More generally, technology needs 	to keep advancing for parallel analytics. Fortunately, it is. Watch 	this space over the next few weeks.</li>
<li>Oracle has said, in effect, that <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" > its most important technological challenge of the decade</a> is getting 	<a href="http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/" >solid-state memory</a> right. I agree.</li>
<li>Data volumes will keep going up, 	up, up. Technology needs to keep evolving accordingly. Much of what 	I write is on that subject.</li>
<li>Data needs to be processed and analyzed at <a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >very 	different latencies</a>. And there&#8217;s much further to go in integrating 	disparate latencies.</li>
<li>Analytic database management in 	the cloud hasn&#8217;t been solved yet, especially for Big Data. Among the 	reasons are the difficulty of moving data into the cloud (unless it 	originated there), the slowness of moving it from node to node in 	shared-nothing architectures (which reduces the elasticity benefit), 	and above all the long and unpredictable latencies of interprocessor 	communication while queries are running (a key subject of discussion 	at the <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >Boston Big Data Summit</a>).</li>
<li>Better business intelligence user interfaces are increasingly available. I&#8217;m thinking particularly of approaches with buzzwords like <a href="http://www.dbms2.com/2008/08/04/qliktech-qlikview-update/" >visualization/interactive exploration</a> or <a href="http://www.texttechnologies.com/2007/08/03/the-case-for-inxight-awareness-server/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">faceted</a>. But they aren&#8217;t well-integrated into the overall analytic stack, as big BI vendors are trailing the smaller ones in this regard. (Part of the problem relates to my previous point.)</li>
<li>Application development over text 	search isn&#8217;t in the same league as application development over 	relational DBMS. The choices are mainly XML (e.g., <a href="http://www.texttechnologies.com/2008/04/29/mark-logic-viewed-as-a-different-kind-of-text-search-technology-vendor/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">MarkLogic</a>), SQL 	for text integrated into RDBMS (limited by the weakness of those 	integrations), and something like <a href="http://www.texttechnologies.com/2008/09/20/attivio-update/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">Attivio&#8217;s Java SDK</a>. There&#8217;s a 	major conceptual barrier in building those apps, namely the 	unpredictability of query results. Still, it should be possible to 	do better.</li>
<li>Similarly, text analytics and 	conventional analytics exist well side by side. They can even be in 	the same database and/or dashboard, although in practice that is 	limited by the strong <a href="http://www.texttechnologies.com/2008/10/24/attensity-update-2/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">SaaS focus of text mining vendors and users</a>. But analytic 	integration of them is really hard. Linguistic imprecision is, in my 	opinion, only the #2 reason for this difficulty. The #1 reason is 	that trends detected by text analytics are much less precise than 	trends on tabular data – e.g., a 50% increase in a certain kind of 	complaint may be no more significant than a 5% change in a revenue 	variable.</li>
<li>I&#8217;m increasingly persuaded that <a href="http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/" > graph analytics</a> can be handled without a graph-centric data model. 	But right now, it isn&#8217;t being handled well at all. Lots more needs 	to be done – although when it is, it will just exacerbate the 	privacy/liberty dangers that so concern me.</li>
</ul>
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Flash, other solid-state memory, and disk" href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, other solid-state memory, and disk</a></li>
<li><a title="Interesting trends in database and analytic technology" href="../2010/01/31/trends-database-aanalytic-technology/">Interesting trends in database and analytic technology</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Interesting trends in database and analytic technology</title>
		<link>http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/</link>
		<comments>http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 02:11:17 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1492</guid>
		<description><![CDATA[My project for the day is blogging based on my “Database and analytic technology: State of the union” talk of a few days ago. (I called it that because of when it was given, because it mixed prescriptive and descriptive elements, and because I wanted to call attention to the fact that I cover the [...]]]></description>
			<content:encoded><![CDATA[<p>My project for the day is blogging based on my “<a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >Database and analytic technology: </a><a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >State of the union</a>” talk of a few days ago. (I called it that because of when it was given, because it mixed prescriptive and descriptive elements, and because I wanted to call attention to the fact that I cover the <em>union</em> of database and analytic technologies – the <em>intersection</em> of those two sectors is an area of particular focus, but is far from the whole of my coverage.)</p>
<p>One section covered recent/ongoing/near-future trends that I thought were particularly interesting, including:<span id="more-1492"></span></p>
<p><strong>Simpler database technology,</strong> by which I mean DBMS that are:</p>
<ul>
<li>Easier 	to administer than market-leading systems &#8230;</li>
<li>… even if at the cost of being special-purpose</li>
<li>E.g.,
<ul>
<li>MySQL and older mid-tier RDBMS such as Progress</li>
<li>Many analytic DBMS and appliances, most notably Netezza&#8217;s</li>
</ul>
</li>
</ul>
<p>For general purpose or OLTP uses, I&#8217;m not a big fan of MySQL (not enough progress in making it industrial-strength), PostgreSQL (no good company behind it – I&#8217;m a non-fan of EnterpriseDB), or Ingres (open source or not, it&#8217;s an antiquated system that hasn&#8217;t been invested in as much as Oracle, DB2 or SQL Server).</p>
<p>But I get the impression there are a lot of contenders among small startups, featuring very new architectures for OLTP or general-purpose database management. VoltDB comes to mind. NimbusDB is finally within range of getting funded. Dan Weinreb told me Friday he knows of a bunch of others as well. And that&#8217;s all before we even get into the <a href="http://www.dbms2.com/2009/12/12/legit-nosql-key-value-store/" >NoSQL</a> kind of alternative.</p>
<p><strong>Flexible storage architectures.</strong> That&#8217;s starting out with an emphasis on hybrid columnar, as in the examples of <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/" >Vertica</a> and <a href="http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/" >Greenplum</a>. Oracle (to whom I&#8217;m under no NDA obligation) and other vendors (to whom I am) are going that way as well.</p>
<p><strong>Multi-tier database architectures,</strong> by which I mean at least two things:</p>
<ul>
<li>The database tier/server tier split of Exadata</li>
<li>Hybrid RAM/disk architectures, examples of which include
<ul>
<li>Vertica&#8217;s RAM-based write-optimized store</li>
<li><a href="http://www.dbms2.com/2009/10/18/introduction-to-sensage/" >Sensage&#8217;s CEP-in-the-DBMS</a></li>
<li>This in-memory analytics stuff we keep hearing about from the BI vendors</li>
<li>Any true in-memory/disk hybrid, such as the regrettably sidelined <a href="http://www.dbms2.com/2007/12/21/ibm-acquires-soliddb/" >solidDB</a></li>
<li>Smart thinking by numerous DBMS vendors about optimizing the use of RAM and/or Level 2 cache</li>
</ul>
</li>
</ul>
<p>Netezza is particularly interesting to watch in this regard because it:</p>
<ul>
<li>Had a pretty strict storage/other processing split in prior product generations and &#8230;</li>
<li>… <a href="http://www.dbms2.com/2009/07/30/netezza-new-product-family/" >ditched that in its latest generation</a> …</li>
<li>… which however is focused on optimizing the use of RAM cache</li>
</ul>
<p>Also noteworthy is Petascan, the stealth-mode – and therefore harder to watch right now <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> – company I keep teasing about, which makes a strong case for carrying the database/storage tier split into the flash/solid-state memory technology generation. <a href="../2009/04/20/calpont-update-you-read-it-here-first/">Calpont</a> also has a server/storage tier split, but that&#8217;s of mainly theoretical interest unless and until Calpont actually ships an MPP version of <a href="../2009/11/07/calponts-infinidb/">InfiniDB</a>.</p>
<p><strong>Cheaper parts,</strong> which have of course been a huge trend for decades. <a href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Solid-state memory</a> will soon conquer the world. Meanwhile, cheaper sensors drive that <a href="../2010/01/17/three-broad-categories-of-data/">machine-generated data</a> I keep talking about.</p>
<p>An ever-better understanding of <strong>scale-out technology,</strong> in several respects, including:</p>
<ul>
<li>Query, notably data movement for MPP DBMS</li>
<li>Update, especially minimalistic DBMS approaches, be they sharded MySQL or more NoSQLish</li>
<li>Number-crunching, especially via MapReduce and/or parallel analytic libraries integrated into DBMS</li>
</ul>
<p>Cool trends I touched on more briefly include:</p>
<ul>
<li>More data being available for analysis. This was a core theme of my <a href="http://www.dbms2.com/2009/07/30/netezza-enzee-universe/" >Enzee Universe keynote speeches</a>; there are also some notes on it in my 	post based on my <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >Boston Big Data Summit</a> talk.</li>
<li>More users being served by analytics. Ditto.</li>
<li>Data exploration/visualization, à la QlikView, Spotfire, or Tableau, and also the faceted stuff.</li>
<li>The democratization of data mining. But I&#8217;m not as sure of that one as of the others&#8230;</li>
</ul>
<p>One area I flat-out forgot to mention is <a href="http://www.dbms2.com/2009/06/08/the-future-of-data-marts/" >easy data mart spin-out</a>.</p>
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Flash, other solid-state memory, and disk" href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, other solid-state memory, and disk</a></li>
<li><a title="Open issues in database and analytic technology" href="../2010/02/01/open-issues-in-database-and-analytic-technology/">Open issues in database and analytic technology</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Flash, other solid-state memory, and disk</title>
		<link>http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/</link>
		<comments>http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 22:12:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1469</guid>
		<description><![CDATA[If there&#8217;s one subject on which the New England Database Summit changed or at least clarified my thinking,* it&#8217;s future storage technologies. Here&#8217;s what I now think:

Solid-state memory will soon be 	the right storage technology for a large fraction of databases, OLTP and analytic alike. I&#8217;m not sure whether the initial cutoff in 	database size [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">If there&#8217;s one subject on which the <a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >New England Database Summit</a> changed or at least clarified my thinking,* it&#8217;s future storage technologies. Here&#8217;s what I now think:</p>
<ul>
<li><strong>Solid-state memory will soon be 	the right storage technology for a large fraction of databases,</strong> OLTP and analytic alike. I&#8217;m not sure whether the initial cutoff in 	database size is best thought of as terabytes or 10s of terabytes, 	but it&#8217;s in that range. And it will increase over time, for the 	usual cheaper-parts reasons.</li>
<li><strong>That doesn&#8217;t necessarily mean 	flash.</strong> <a href="http://en.wikipedia.org/wiki/Phase-change_memory" onclick="javascript:pageTracker._trackPageview('/en.wikipedia.org');">PCM</a> (Phase-Change Memory) is coming down the pike, with perhaps 100X the 	durability of flash, in terms of the total number of writes it can 	tolerate. On the other hand, PCM has issues in the face of heat. 	More futuristically, IBM is also high on <a href="http://www.almaden.ibm.com/spinaps/research/sd/?racetrack" onclick="javascript:pageTracker._trackPageview('/www.almaden.ibm.com');">magnetic racetrack 	memory</a>. IBM likes the term <em>storage-class memory</em> to 	cover all this &#8212; which I find regrettable, since the acronym SCM is 	way overloaded already. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
<li><strong>Putting a disk controller in 	front of solid-state memory is really wasteful.</strong> It wreaks havoc 	on I/O rates.</li>
<li><strong>Generic PCIe interfaces don&#8217;t 	suffice either,</strong> in many analytic use cases. Their I/O is better, 	but still not good enough. (Doing better yet is where Petascan – 	the stealth-mode company I keep teasing about – comes in.)</li>
<li><strong>Disk will long be useful for 	very large databases.</strong> Kryder&#8217;s Law, about disk <strong>capacity,</strong> has at 	least as high an annual improvement as Moore&#8217;s Law shows for chip 	capacity, the <a href="http://www.dbms2.com/2010/01/31/the-disk-rotation-speed-bottleneck/" >disk rotation speed bottleneck</a> notwithstanding. Disk 	will long be much cheaper than silicon for data storage. And cheaper 	silicon in sensors will lead to ever more <a href="http://www.dbms2.com/2010/01/17/three-broad-categories-of-data/" >machine-generated data</a> that fills up a lot of disks.</li>
<li><strong>Disk will long be useful for 	archiving.</strong> Disk is the new tape.</li>
</ul>
<p style="margin-bottom: 0in;"><em>*When the first three people to the question microphone include both Mike Stonebraker and Dave DeWitt, your thinking tends to clarify in a hurry.</em></p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li><span style="font-style: normal;"><span style="font-weight: normal;">A 	<a href="http://drona.csa.iisc.ernet.in/%7Egopi/west10/HPCA-WEST-SCMandSoftware.pdf" onclick="javascript:pageTracker._trackPageview('/drona.csa.iisc.ernet.in');">slide 	deck by C. Mohan of IBM</a> similar to the one he presented at the 	NEDB Summit about storage-class memories.</span></span></li>
<li><span style="font-style: normal;"><span style="font-weight: normal;">A 	much more detailed <a href="http://www.usenix.org/events/fast/tutorials/T3.pdf" onclick="javascript:pageTracker._trackPageview('/www.usenix.org');">IBM 	presentation</a> on storage-class memories.</span></span></li>
<li><span style="font-style: normal;"><span style="font-weight: normal;"><a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >Oracle&#8217;s</a> and <a href="http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/" >Teradata&#8217;s</a> beliefs about the importance of solid-state memory.</span></span></li>
</ul>
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Interesting trends in database and analytic technology" href="../2010/01/31/trends-database-aanalytic-technology/">Interesting trends in database and analytic technology</a></li>
<li><a title="Open issues in database and analytic technology" href="../2010/02/01/open-issues-in-database-and-analytic-technology/">Open issues in database and analytic technology</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
