<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS2 -- DataBase Management System Services &#187; Michael Stonebraker</title>
	<atom:link href="http://www.dbms2.com/category/michael-stonebraker/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 18 Mar 2010 05:19:19 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Flash, other solid-state memory, and disk</title>
		<link>http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/</link>
		<comments>http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 22:12:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1469</guid>
		<description><![CDATA[If there&#8217;s one subject on which the New England Database Summit changed or at least clarified my thinking,* it&#8217;s future storage technologies. Here&#8217;s what I now think:

Solid-state memory will soon be 	the right storage technology for a large fraction of databases, OLTP and analytic alike. I&#8217;m not sure whether the initial cutoff in 	database size [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">If there&#8217;s one subject on which the <a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >New England Database Summit</a> changed or at least clarified my thinking,* it&#8217;s future storage technologies. Here&#8217;s what I now think:</p>
<ul>
<li><strong>Solid-state memory will soon be 	the right storage technology for a large fraction of databases,</strong> OLTP and analytic alike. I&#8217;m not sure whether the initial cutoff in 	database size is best thought of as terabytes or 10s of terabytes, 	but it&#8217;s in that range. And it will increase over time, for the 	usual cheaper-parts reasons.</li>
<li><strong>That doesn&#8217;t necessarily mean 	flash.</strong> <a href="http://en.wikipedia.org/wiki/Phase-change_memory" onclick="javascript:pageTracker._trackPageview('/en.wikipedia.org');">PCM</a> (Phase-Change Memory) is coming down the pike, with perhaps 100X the 	durability of flash, in terms of the total number of writes it can 	tolerate. On the other hand, PCM has issues in the face of heat. 	More futuristically, IBM is also high on <a href="http://www.almaden.ibm.com/spinaps/research/sd/?racetrack" onclick="javascript:pageTracker._trackPageview('/www.almaden.ibm.com');">magnetic racetrack 	memory</a>. IBM likes the term <em>storage-class memory</em> to 	cover all this &#8212; which I find regrettable, since the acronym SCM is 	way overloaded already. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
<li><strong>Putting a disk controller in 	front of solid-state memory is really wasteful.</strong> It wreaks havoc 	on I/O rates.</li>
<li><strong>Generic PCIe interfaces don&#8217;t 	suffice either,</strong> in many analytic use cases. Their I/O is better, 	but still not good enough. (Doing better yet is where Petascan – 	the stealth-mode company I keep teasing about – comes in.)</li>
<li><strong>Disk will long be useful for very large databases.</strong> Kryder&#8217;s Law, about disk <strong>capacity,</strong> shows at least as high an annual improvement rate as Moore&#8217;s Law does for chip capacity, the <a href="http://www.dbms2.com/2010/01/31/the-disk-rotation-speed-bottleneck/" >disk rotation speed bottleneck</a> notwithstanding. Disk will long be much cheaper than silicon for data storage. And cheaper silicon in sensors will lead to ever more <a href="http://www.dbms2.com/2010/01/17/three-broad-categories-of-data/" >machine-generated data</a> that fills up a lot of disks.</li>
<li><strong>Disk will long be useful for 	archiving.</strong> Disk is the new tape.</li>
</ul>
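<p style="margin-bottom: 0in;">To put the durability point in rough numbers: with ideal wear leveling, a device can absorb about (capacity times endurance cycles) of writes before it wears out. The Python sketch below does that arithmetic. Every input (1 TB capacity, 10 TB written per day, 10,000 cycles for flash) is a hypothetical placeholder. Only the roughly 100X PCM/flash ratio comes from the text above. The sketch also ignores write amplification, spare area, and data-retention effects.</p>
<pre><code>
```python
# Back-of-the-envelope device lifetime under ideal wear leveling.
# All inputs are hypothetical; only the ~100X PCM/flash ratio is from the post.

def lifetime_years(capacity_tb, endurance_cycles, writes_tb_per_day):
    """Years until every cell has been rewritten endurance_cycles times,
    assuming writes are spread evenly (perfect wear leveling) and there
    is no write amplification."""
    total_writable_tb = capacity_tb * endurance_cycles
    return total_writable_tb / writes_tb_per_day / 365.0

FLASH_CYCLES = 10_000             # hypothetical flash endurance
PCM_CYCLES = FLASH_CYCLES * 100   # "perhaps 100X" the durability of flash

flash = lifetime_years(capacity_tb=1.0, endurance_cycles=FLASH_CYCLES, writes_tb_per_day=10.0)
pcm = lifetime_years(capacity_tb=1.0, endurance_cycles=PCM_CYCLES, writes_tb_per_day=10.0)
print(f"flash: {flash:.1f} years, PCM: {pcm:.1f} years")
# prints "flash: 2.7 years, PCM: 274.0 years"
```
</code></pre>
<p style="margin-bottom: 0in;">Because lifetime scales linearly with endurance, the ratio between the two results is simply the assumed endurance ratio. The absolute figures depend entirely on the placeholder inputs.</p>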
<p style="margin-bottom: 0in;"><em>*When the first three people to the question microphone include both Mike Stonebraker and Dave DeWitt, your thinking tends to clarify in a hurry.</em></p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li><span style="font-style: normal;"><span style="font-weight: normal;">A <a href="http://drona.csa.iisc.ernet.in/%7Egopi/west10/HPCA-WEST-SCMandSoftware.pdf" onclick="javascript:pageTracker._trackPageview('/drona.csa.iisc.ernet.in');">slide deck by C. Mohan of IBM</a> about storage-class memories, similar to the one he presented at the NEDB Summit.</span></span></li>
<li><span style="font-style: normal;"><span style="font-weight: normal;">A 	much more detailed <a href="http://www.usenix.org/events/fast/tutorials/T3.pdf" onclick="javascript:pageTracker._trackPageview('/www.usenix.org');">IBM 	presentation</a> on storage-class memories.</span></span></li>
<li><span style="font-style: normal;"><span style="font-weight: normal;"><a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >Oracle&#8217;s</a> and <a href="http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/" >Teradata&#8217;s</a> beliefs about the importance of solid-state memory.<br />
</span></span></li>
</ul>
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Interesting trends in database and analytic technology" href="../2010/01/31/trends-database-aanalytic-technology/">Interesting trends in database and analytic technology</a></li>
<li><a title="Open issues in database and analytic technology" href="../2010/02/01/open-issues-in-database-and-analytic-technology/">Open issues in database and analytic technology</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>New England Database Summit (January 28, 2010)</title>
		<link>http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/</link>
		<comments>http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/#comments</comments>
		<pubDate>Wed, 25 Nov 2009 06:46:45 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1260</guid>
		<description><![CDATA[New England Database Day has now, in its third year, become a &#8220;Summit.&#8221;  It&#8217;s a nice event, providing an opportunity for academics and business folks to mingle.  The organizers are basically the local branch of the Mike Stonebraker research tree, with this year&#8217;s programming head being Daniel Abadi. It will be on Thursday, January 28, [...]]]></description>
			<content:encoded><![CDATA[<p>New England Database Day has now, in its third year, become a &#8220;Summit.&#8221;  It&#8217;s a nice event, providing an opportunity for academics and business folks to mingle.  The organizers are basically the local branch of the Mike Stonebraker research tree, with this year&#8217;s programming head being Daniel Abadi. It will be on Thursday, January 28, 2010, once again in the Stata Center at MIT. It would be reasonable to park in the venerable 4/5 Cambridge Center parking lot, especially if you&#8217;d like to eat at Legal Seafood afterwards.</p>
<p>So far there are two confirmed speakers &#8212; Raghu Ramakrishnan of Yahoo and me.  My talk title will be something like &#8220;Database and analytic technology: The state of the union&#8221;, with all wordplay intended.</p>
<p>There&#8217;s more information at <a href="http://db.csail.mit.edu/nedbday10/" onclick="javascript:pageTracker._trackPageview('/db.csail.mit.edu');">the official New England Database Summit website</a>. There&#8217;s also a post with similar information on <a href="http://dbmsmusings.blogspot.com/2009/11/deadlines-approaching-for-two-upcoming.html" onclick="javascript:pageTracker._trackPageview('/dbmsmusings.blogspot.com');">Daniel Abadi&#8217;s <em>DBMS Musings</em> blog</a>.</p>
<p><em>Edit after the event:</em></p>
<p><em><strong>Posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><em><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></em></li>
<li><em><a title="Flash, other solid-state memory, and disk" href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, other solid-state memory, and disk</a></em></li>
<li><em><a title="Interesting trends in database and analytic technology" href="../2010/01/31/trends-database-aanalytic-technology/">Interesting trends in database and analytic technology</a></em></li>
<li><em><a title="Open issues in database and analytic technology" href="../2010/02/01/open-issues-in-database-and-analytic-technology/">Open issues in database and analytic technology</a></em></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Three big myths about MapReduce</title>
		<link>http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/</link>
		<comments>http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/#comments</comments>
		<pubDate>Sun, 18 Oct 2009 16:14:37 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1135</guid>
		<description><![CDATA[Once again, I find myself writing and talking a lot about MapReduce.  But I suspect that MapReduce-related conversations would go better if we overcame three fairly common MapReduce myths:

MapReduce is something very new
MapReduce involves strict 	adherence to the Map-Reduce programming paradigm
MapReduce is a single technology

So let&#8217;s give it a try.
When Dave DeWitt and Mike [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Once again, I find myself writing and talking a lot about MapReduce.  But I suspect that MapReduce-related conversations would go better if we overcame three fairly common MapReduce myths:</p>
<ul>
<li>MapReduce is something very new</li>
<li>MapReduce involves strict 	adherence to the Map-Reduce programming paradigm</li>
<li>MapReduce is a single technology</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-1135"></span>So let&#8217;s give it a try.</p>
<p style="margin-bottom: 0in;">When Dave DeWitt and Mike Stonebraker leveled <a href="../2008/01/18/the-great-mapreduce-debate/">their famous blast at MapReduce</a>, many people thought they overstated their case. But one part of their story, the one both Mike and Dave say was most central to their case, was never effectively refuted: namely, the claim that these ideas aren&#8217;t particularly new. I haven&#8217;t actually read enough computer science literature to have an independent opinion on that issue. But I&#8217;ll say this: claims from companies such as <a href="../2009/10/18/introduction-to-sensage/">SenSage</a>, <a href="../2009/10/06/oracle-mapreduce/">Oracle</a>, or <a href="../2009/10/18/technical-introduction-to-splunk/">Splunk</a> that “We&#8217;ve been doing MapReduce all along” seem pretty credible to me.</p>
<p style="margin-bottom: 0in;">True, what those companies were doing may not have looked exactly like the instant-classic MapReduce programming paradigm. But the same is true of many things almost everybody would agree count as MapReduce.  In particular, it is often not the case that you alternate Map and Reduce steps, each of whose outputs is a set of simple &lt;Key, Value&gt; pairs, with data redistributed based on Key at every step.</p>
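<p style="margin-bottom: 0in;">For reference, the classic paradigm just described can be simulated in a single process. A Map step emits simple &lt;Key, Value&gt; pairs, a shuffle groups them by Key, and a Reduce step runs once per Key. This is a minimal sketch with hypothetical function names. A real system would run each phase in parallel and ship each Key&#8217;s group across the network to the node responsible for it.</p>
<pre><code>
```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Single-process simulation of classic Map -> shuffle-by-Key -> Reduce."""
    grouped = defaultdict(list)
    for record in records:
        # Map: each input record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: group values by key. In a real cluster this is where
            # data is redistributed so each key lands on one Reduce node.
            grouped[key].append(value)
    # Reduce: one call per distinct key, producing one value per key.
    return {key: reducer(key, values) for key, values in grouped.items()}

def word_count_map(line):
    for word in line.lower().split():
        yield word, 1

def word_count_reduce(word, counts):
    return sum(counts)

print(run_mapreduce(["the cat sat", "the cat ran"], word_count_map, word_count_reduce))
# prints {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```
</code></pre>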
<p style="margin-bottom: 0in;">Here are some examples of what I mean, drawn from <a href="http://www.asterdata.com/blog/index.php/2009/10/15/mastering-mapreduce/" onclick="javascript:pageTracker._trackPageview('/www.asterdata.com');">my recent MapReduce webinar</a>.</p>
<ul>
<li>If you do text indexing in MapReduce, your goal is to wind up with a text index. So at some point you Reduce to a pair &lt;WordName, {all the (DocumentID, offset) pairs for the whole corpus, suitably ordered}&gt;.  That&#8217;s a heckuva compound “Value”.</li>
<li>The goal of data mining is usually to estimate a rather small number of parameters based on a large overall data set, often (depending on the algorithm) in the form of a single vector. When you do that in MapReduce, you partition data among nodes and calculate something on each node that is structured more or less like your final vector. So when it comes time for the Reduce, you just ship all of your vectors, one per node, to a single Reduce node, and do the appropriate math. Redistribution based on Key would be quite pointless.</li>
<li>When you sessionize clickstream logs in MapReduce, you may have just as many output records as input records. However, they are now reformatted, and might have a SessionID appended. In those cases, Reduce isn&#8217;t doing much in the way of reduction.</li>
<li>And as happens in some <a href="../2009/08/04/verticas-version-of-mapreduce-integration/">Vertica-Hadoop</a> use cases around mortgage trading, MapReduce can sometimes even make data sets vastly larger.</li>
</ul>
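<p style="margin-bottom: 0in;">The first three patterns above can be sketched the same way. These are single-process Python simulations with hypothetical names and toy data, not how any of the vendors mentioned here actually implement them. The index builder reduces to one large compound value per word. The statistics example sends one small partial result per partition to a single combiner, with no redistribution by Key. Sessionization returns exactly as many records as it receives. The 1,800-second session gap is an assumed convention, not something taken from the post.</p>
<pre><code>
```python
from collections import defaultdict

# Pattern 1: text indexing. Reduce ends with ONE compound value per word.
def build_index(docs):
    postings = defaultdict(list)
    for doc_id, text in docs.items():
        # "Map": emit (word, (doc_id, offset)) for every token
        for offset, word in enumerate(text.split()):
            postings[word].append((doc_id, offset))
    # "Reduce": each word maps to all of its (doc_id, offset) pairs, ordered
    return {word: sorted(pairs) for word, pairs in postings.items()}

# Pattern 2: data mining. Each partition produces a small partial result
# shaped like the final answer; a single reducer combines them all.
def partial_stats(partition):
    return (len(partition), sum(partition))  # (count, sum) for a mean

def combine(partials):
    n = sum(count for count, _ in partials)
    total = sum(s for _, s in partials)
    return total / n

# Pattern 3: sessionization. One output record per input record, annotated.
def sessionize(clicks, gap):
    """clicks: (user, timestamp) pairs. A user's next click starts a new
    session if it comes more than `gap` seconds after the previous one."""
    out, last_seen, session_of = [], {}, {}
    next_id = 0
    for user, ts in sorted(clicks):
        if user not in last_seen or ts - last_seen[user] > gap:
            next_id += 1
            session_of[user] = next_id
        last_seen[user] = ts
        out.append((user, ts, session_of[user]))
    return out

print(build_index({"d1": "flash and disk", "d2": "disk is the new tape"})["disk"])
# prints [('d1', 2), ('d2', 0)]
print(combine([partial_stats([1, 2, 3]), partial_stats([4, 5])]))
# prints 3.0
print(sessionize([("u1", 0), ("u1", 100), ("u1", 5000), ("u2", 50)], gap=1800))
# prints [('u1', 0, 1), ('u1', 100, 1), ('u1', 5000, 2), ('u2', 50, 3)]
```
</code></pre>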
<p style="margin-bottom: 0in; font-style: normal;">By no means do I think this is a weakness of the MapReduce programming paradigm. Rather, I think it&#8217;s a MapReduce strength. But it&#8217;s not quite the way MapReduce has been promoted and explained to the IT public.</p>
<p style="margin-bottom: 0in; font-style: normal;">Finally: MapReduce, as commonly conceived, spans two different – albeit closely related – technology domains:</p>
<ul>
<li>Parallel 	programming</li>
<li>Distributed 	data management</li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;">For example, I imagine Greenplum&#8217;s and Vertica&#8217;s MapReduce/SQL combined syntaxes are very similar to each other. But Vertica&#8217;s data management implementation of MapReduce, which relies on Hadoop, is very different from Greenplum&#8217;s, which is tied into the Greenplum DBMS. Similarly, non-DBMS MapReduce implementations are commonly associated with distributed file systems, notably HDFS (Hadoop Distributed File System) or Google&#8217;s internal GFS (Google File System). In those systems, the parallel language execution part should be aware of how the distributed file management part works, but perhaps that awareness can be pretty lightweight.</p>
<p style="margin-bottom: 0in; font-style: normal;">Right now, this is a distinction pretty much without a difference. If you choose an implementation of MapReduce &#8212; like pure Hadoop (say in the Cloudera distribution) or Hadoop-Vertica or Aster Data&#8217;s SQL/MapReduce – you&#8217;re basically picking an entire technology stack. But those stacks are going to do a whole lot of changing and maturing in the near future – and as they do, it&#8217;s likely that projects will interact or even combine in all sorts of interesting ways.</p>
<p style="margin-bottom: 0in; font-style: normal;"><strong>Bottom line: There are a lot of different ways to exploit MapReduce-related technology.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/18/three-big-myths-about-mapreduce/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Introduction to the XLDB and SciDB projects</title>
		<link>http://www.dbms2.com/2009/09/12/xldb-scid/</link>
		<comments>http://www.dbms2.com/2009/09/12/xldb-scid/#comments</comments>
		<pubDate>Sat, 12 Sep 2009 19:54:51 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[eBay]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=883</guid>
		<description><![CDATA[Before I write anything else about the overlapping efforts known as XLDB and SciDB, I probably should explain and disambiguate what they are as best I can.  XLDB was organized and still is run by guys who want to solve a scientific problem in eXtremely Large DataBase Management, most especially Jacek Becla of SLAC [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Before I write anything else about the overlapping efforts known as <em>XLDB</em> and <em>SciDB</em>, I probably should explain and disambiguate what they are as best I can.  XLDB was organized and still is run by guys who want to solve a scientific problem in eXtremely Large DataBase Management, most especially Jacek Becla of SLAC (the organization previously known as Stanford Linear Accelerator Center). Becla&#8217;s original motivation was that he needs a DBMS to manage what will be 55 petabytes of raw image data and 100 petabytes of astronomical data total for <a href="http://www.lsst.org/lsst" onclick="javascript:pageTracker._trackPageview('/www.lsst.org');">LSST</a> (Large Synoptic Survey Telescope).<span id="more-883"></span></p>
<p style="margin-bottom: 0in;">XLDB more or less comprises:</p>
<ul>
<li>A series of what have now been three workshops: <span style="font-style: normal;"><a href="http://www-conf.slac.stanford.edu/xldb07/" onclick="javascript:pageTracker._trackPageview('/www-conf.slac.stanford.edu');">XLDB1 in 2007</a>, <a href="http://www-conf.slac.stanford.edu/xldb08/" onclick="javascript:pageTracker._trackPageview('/www-conf.slac.stanford.edu');">XLDB2 in 2008</a>, and <a href="http://www-conf.slac.stanford.edu/xldb09/default.htm" onclick="javascript:pageTracker._trackPageview('/www-conf.slac.stanford.edu');">XLDB3 in 2009</a></span> (the closest thing to a master link is probably the <a href="http://www-conf.slac.stanford.edu/xldb09/links.htm" onclick="javascript:pageTracker._trackPageview('/www-conf.slac.stanford.edu');">XLDB3 site&#8217;s related links page</a>). Participants have included, among others:
<ul>
<li>A lot of big-name 	database-oriented computer science researchers &#8212; Mike Stonebraker, 	Dave DeWitt, Martin Kersten, and numerous others</li>
<li>Academics responsible for 	scientific database management, especially but not only in the 	astronomy area</li>
<li>Some vendors (although vendor 	participation was cut back after XLDB1) &#8212; at XLDB3, which is the 	one I went to, the three vendor folks who actually talked were 	Stephen Brobst of Teradata, Luke Lonergan of Greenplum (who worked 	in scientific high performance computing earlier in his career), and 	Jeff Hammerbacher of Cloudera.</li>
<li>eBay and to some extent other 	large web companies</li>
<li>A European Union funding 	bureaucrat</li>
<li>Me</li>
</ul>
</li>
<li>An attempt to kick-start a broader movement, perhaps comprising (it&#8217;s not totally clear yet):
<ul>
<li>Computer science researchers 	interested in database issues</li>
<li>Database technology vendors</li>
<li>Scientific researchers (academic) 	who have very large or otherwise difficult database management 	problems</li>
<li>Scientific researchers 	(commercial) who have very large or otherwise difficult database 	management problems</li>
<li>Other commercial users who have 	very large database management problems</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;">The first result or spin-out from the XLDB effort seems to have been the <a href="http://www.scidb.org/" onclick="javascript:pageTracker._trackPageview('/www.scidb.org');">SciDB</a> project. This is an effort to build an open source DBMS called SciDB that will address <strong>some</strong> of the needs the XLDB effort is uncovering. (More on that in other posts.) Somewhat confusingly, <strong>all</strong> the use cases the XLDB group is collecting are currently being posted on SciDB&#8217;s website, apparently because it&#8217;s glitzier and healthier than, say, the excessively sparse XLDB wiki. Some SciDB development has happened, but no large sugar daddy has yet been found. (It&#8217;s a fairly open secret that eBay looked seriously and favorably at funding SciDB before the economic downturn hit.)</p>
<p style="margin-bottom: 0in;">Numerous big-name computer scientists are associated with SciDB, indeed more closely (it would seem) than with XLDB. That said, I&#8217;m guessing Dave DeWitt&#8217;s involvement in the open-source SciDB isn&#8217;t what it would be if he hadn&#8217;t gone to Microsoft. DeWitt actually skipped XLDB3, although he was in town for VLDB. (XLDB3 was back-to-back with VLDB 2009 in Lyon, France in late August.) Stonebraker just didn&#8217;t make the flight for either conference, due to the double-knee &#8220;upgrade&#8221; he had back in March.</p>
<p style="margin-bottom: 0in;">There&#8217;s a lot more to be said about the cross-discipline or science-specific requirements that researchers place on data management, but I&#8217;ll leave that for later and just get this posted as a start &#8212; assuming, of course, that <a href="http://www.dbms2.com/2009/09/12/availability-nightmares-continue/" >blog outages</a> permit. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> </p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li><a href="http://www-db.cs.wisc.edu/cidr/cidr2009/Paper_26.pdf" onclick="javascript:pageTracker._trackPageview('/www-db.cs.wisc.edu');">Paper 	laying out the SciDB project</a></li>
<li><a href="http://database.cs.brown.edu/projects/scidb/" onclick="javascript:pageTracker._trackPageview('/database.cs.brown.edu');">One 	version of a SciDB overview page</a>, with links to academic papers</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/12/xldb-scid/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>NoSQL?</title>
		<link>http://www.dbms2.com/2009/07/01/nosql-sql-alternative/</link>
		<comments>http://www.dbms2.com/2009/07/01/nosql-sql-alternative/#comments</comments>
		<pubDate>Wed, 01 Jul 2009 07:33:25 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Fox and MySpace]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=827</guid>
		<description><![CDATA[Eric Lai emailed today to ask what I thought about the NoSQL folks, and especially whether I thought their ideas were useful for enterprises in general, as opposed to just Web 2.0 companies. That was the first I heard of NoSQL, which seems to be a community discussing SQL alternatives popular among the cloud/big-web-company set, [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Eric Lai emailed today to ask what I thought about the <a href="http://blog.oskarsson.nu/2009/06/nosql-debrief.html" onclick="javascript:pageTracker._trackPageview('/blog.oskarsson.nu');">NoSQL</a> folks, and especially whether I thought their ideas were useful for enterprises in general, as opposed to just Web 2.0 companies. That was the first I heard of NoSQL, which seems to be a community discussing SQL alternatives popular among the cloud/big-web-company set, such as BigTable, Hadoop, Cassandra and so on. My short answers are:</p>
<ul>
<li>In most cases, no.</li>
<li>Most of these technologies are designed for simple, high-volume OLTP (OnLine Transaction Processing). Most large enterprises have an established way of doing OLTP, probably via relational database management systems. Why change?</li>
<li>MapReduce is an exception, in that 	it&#8217;s designed for analytics. MapReduce may be useful for 	enterprises. But where it is, it probably should be <a href="http://www.dbms2.com/2008/08/26/why-mapreduce-matters-to-sql-data-warehousing/" >integrated 	into an analytic DBMS</a>.</li>
<li>There&#8217;s one 	big countervailing factor to all these generalities &#8212; <em>schema 	flexibility.</em></li>
</ul>
<p style="margin-bottom: 0in;">As for the longer form, let me start by noting that there are two main kinds of reason for not liking SQL.  <span id="more-827"></span>First, you might be fine with the idea of a (somewhat) nonprocedural, schema-aware DML/DDL (Data Manipulation/Description Language), but just think another kind is better, or more suited to your use case.  If your reason is like that, you might favor alternatives such as:</p>
<ul>
<li>OLAP-based languages such as MDX.</li>
<li>XML-oriented languages.</li>
<li>&#8220;True&#8221; relational 	languages, because SQL deviated from the path of relational virtue 	under the corrupt influence of IBM &#8212; aka &#8220;Blue Babylon&#8221; &#8212; and the IT world has been 	languishing in sin ever since.</li>
</ul>
<p style="margin-bottom: 0in;">The second kind of reason for avoiding SQL is that you don&#8217;t like the idea of a separate schema-aware DML at all.  Possible reasons for this orientation include:</p>
<ul>
<li>You just like to program, and want 	to manipulate stored data the same way you do anything else. Thus, 	you are bothered by an &#8220;impedance mismatch&#8221; between SQL 	and your favorite programming languages.  This is real. It also has been overcome by many, many enterprises around the world.</li>
<li>You believe that more procedural 	alternatives are a better fit for cloud computing and extreme 	scale-out on failure-prone commodity hardware. <a href="http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/" >Facebook</a> made 	that case to me.  However, I have trouble thinking of very many 	enterprise scenarios where it applies, especially when one considers 	electricity costs and the like.</li>
<li>Your schemas change more quickly 	than your data architects can reasonably be expected to keep up 	with.  Facebook made that case to me too. Enterprise examples might 	include marketing campaigns and M&amp;A.  I&#8217;ve long thought this to 	be a legitimate, looming concern. But I don&#8217;t know that 	stripped-down DBMS are the way to address it.</li>
<li>You believe that SQL has severe 	processing overhead.  In most enterprise use cases, that would just 	be bogus.</li>
<li>You lack familiarity with SQL.</li>
</ul>
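<p style="margin-bottom: 0in;">Here is a small illustration of the impedance mismatch mentioned above. It is a Python sketch using the standard-library sqlite3 module and a hypothetical customer table. The same filter is written twice. The first version is in SQL, embedded as a string, and its results come back as tuples that must be mapped into program objects by hand. The second version is an ordinary list comprehension over objects. Object-relational mapping layers exist largely to automate the first kind of translation.</p>
<pre><code>
```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Customer:  # hypothetical program-side object
    id: int
    name: str
    region: str

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Acme", "East"), (2, "Globex", "West"), (3, "Initech", "East")])

# Style 1: the filter lives in a separate declarative language (SQL, as a
# string), and result rows come back as tuples to be mapped into objects.
rows = conn.execute(
    "SELECT id, name, region FROM customer WHERE region = ? ORDER BY id", ("East",)
).fetchall()
east = [Customer(*row) for row in rows]

# Style 2: "just program it" by loading objects and filtering in the host language.
all_customers = [Customer(*row) for row in
                 conn.execute("SELECT id, name, region FROM customer ORDER BY id")]
east_in_python = [c for c in all_customers if c.region == "East"]

print([c.name for c in east])
# prints ['Acme', 'Initech']
```
</code></pre>
<p style="margin-bottom: 0in;">The second style reads more naturally to a programmer. However, it moves every row into the application before filtering, which is part of why the trade-off is not as simple as it first looks.</p>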
<p style="margin-bottom: 0in;">That last point is not a joke. One of the weirder database architectures I know of is <a href="http://www.dbms2.com/2007/06/09/the-database-technology-of-guild-wars/" >the one underlying Guild Wars</a>.  Its developer &#8212; a brilliantly impressive guy &#8212; told me flat-out that he learned in college how to build a DBMS, but he didn&#8217;t learn how to develop for a conventional one.  This was instrumental in his decision to build an unconventional data management architecture that uses SQL Server as little more than a smart file manager.</p>
<p style="margin-bottom: 0in;">The questions of SQL performance and &#8212; often-unspecified &#8212; &#8220;overhead&#8221; are interesting to view through the lens of the <a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/" >H-Store/VoltDB</a> project. Mike Stonebraker et al.:</p>
<ul>
<li>Are building a scale-out-oriented 	OLTP DBMS that is meant to run in RAM, preserving data through 	replication to other servers&#8217; RAM more than through output to disk.</li>
<li>Believe that 95% of what a typical SQL DBMS does to manage OLTP is wasteful overhead.</li>
<li>Originally planned to not use SQL, 	but wound up going with SQL because alternatives were insufficiently 	performant.</li>
</ul>
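<p style="margin-bottom: 0in;">The first bullet can be illustrated with a toy in-memory store. In it, a write counts as durable once it has been applied to every live replica&#8217;s RAM, with no disk I/O at all. This is a hypothetical Python sketch of that general idea only, not H-Store or VoltDB code. It ignores concurrency, transactions, replica recovery, and network failures, and it loses data if every replica fails at once.</p>
<pre><code>
```python
class Node:
    """One server holding its data purely in RAM (a dict)."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

class ReplicatedStore:
    """Toy store: durability comes from copies in other servers' RAM,
    not from writing to disk."""
    def __init__(self, nodes):
        self.nodes = nodes

    def put(self, key, value):
        live = [n for n in self.nodes if n.alive]
        if not live:
            raise RuntimeError("no live replicas; the write cannot be made durable")
        # Apply the write in memory on every live replica before acknowledging.
        for node in live:
            node.data[key] = value
        return len(live)  # number of in-memory copies backing this write

    def get(self, key):
        # Any live replica can serve the read.
        for node in self.nodes:
            if node.alive and key in node.data:
                return node.data[key]
        raise KeyError(key)

nodes = [Node("a"), Node("b"), Node("c")]
store = ReplicatedStore(nodes)
print(store.put("acct:42", 100))  # prints 3
nodes[0].alive = False            # simulate losing one server
print(store.get("acct:42"))       # prints 100, served from a surviving replica
```
</code></pre>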
<p style="margin-bottom: 0in;">Mike himself, of course, has been all over the spectrum on SQL-like languages. First he favored QUEL vigorously over SQL for mainstream relational DBMS.  Then he led the charge to extend SQL in PostgreSQL, Illustra, et al. Then he actually staked out a contrarian position in the area of complex event/stream processing <span style="font-style: normal;">by favoring a SQL-like language in an area where other alternatives were better established &#8212; but that was at what turned into StreamBase, <a href="http://www.dbms2.com/2009/05/21/notes-on-cep-application-development/" >which now emphasizes visual programming over any kind of coding language</a>.</span></p>
<p style="margin-bottom: 0in;">I need to write much more about schema flexibility, but tonight &#8212; which will be my third straight night of &lt;&lt;8 hours&#8217; sleep &#8212; is not the time for that.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/07/01/nosql-sql-alternative/feed/</wfw:commentRss>
		<slash:comments>30</slash:comments>
		</item>
		<item>
		<title>There always seems to be a fire drill around MapReduce news</title>
		<link>http://www.dbms2.com/2009/04/14/there-always-seems-to-be-a-fire-drill-around-mapreduce-news/</link>
		<comments>http://www.dbms2.com/2009/04/14/there-always-seems-to-be-a-fire-drill-around-mapreduce-news/#comments</comments>
		<pubDate>Tue, 14 Apr 2009 06:03:43 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[About this blog]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=746</guid>
		<description><![CDATA[Last August I flew out to see my new clients at Greenplum. They told me they planned to roll out MapReduce in a few weeks, and asked for my help in publicizing it.  From their offices I went to dinner with non-clients Aster Data, who told me they&#8217;d gotten wind of a Greenplum MapReduce [...]]]></description>
			<content:encoded><![CDATA[<p>Last August I flew out to see my new clients at Greenplum. They told me they planned to roll out MapReduce in a few weeks, and asked for my help in publicizing it.  From their offices I went to dinner with non-clients Aster Data, who told me they&#8217;d gotten wind of a Greenplum MapReduce announcement and planned to come out ahead of it. A couple of hours later, Aster signed up as a client.  In something of a pickle &#8212; but not one of my own making &#8212; I knocked heads, and persuaded both vendors to <a href="http://www.dbms2.com/2008/08/25/mapreduce-links/" >announce MapReduce at the same time</a>, namely the following Monday.  Lots of publicity ensued for both vendors, and everybody was reasonably satisfied.</p>
<p style="margin-bottom: 0in;"><span id="more-746"></span>Last week I went back to California, visiting &#8212; among others &#8212; Greenplum, Aster, and eBay. Greenplum turns out to have a somewhat more skeptical view of MapReduce than they held <a href="http://www.dbms2.com/2009/03/07/three-greenplum-customers-applications-of-mapreduce/" >previously</a>.<span style="font-style: normal;"> Aster Data continues to be somewhat more bullish, a difference I attribute in part to a focus on slightly different customer segments.  (For the record, I probably put more weight on that reason than Aster itself does.) <a href="http://www.dbms2.com/2009/04/14/ebay-thinks-mpp-dbms-clobber-mapreduce/" >eBay seems even more negative on MapReduce</a>, if that is possible, than it </span>previously<span style="font-style: normal;"> was.  Also, I gathered I should talk with Hadoop-centric start-up Cloudera, and arranged to do so for this Tuesday, after which I planned to write a MapReduce update.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">This afternoon (Monday) I was at Vertica, with Mike Stonebraker in the meeting.  He mentioned that he and David DeWitt had a new SIGMOD paper that compared MapReduce unfavorably to parallel SQL DBMS, both Vertica and a row store &#8220;DBMS-X&#8221;. Mike offered to get me a copy next week, and I agreed to hold off on my MapReduce update until then.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">At 6:30 this evening, Eric Lai of Computerworld emailed me with a draft of the paper he&#8217;d gotten from DeWitt, with a request for comment.  He was submitting the <a href="http://www.computerworld.com/action/article.do?command=viewArticleBasic&amp;articleId=9131526" onclick="javascript:pageTracker._trackPageview('/www.computerworld.com');">story</a> at 8 pm.  I sent email back to Vertica saying &#8220;What the hell??????&#8221; (after editing my original draft of the third word in that) and set to work.  Later in the evening, coauthor Andy Pavlo posted a web page with the benchmark particulars, and eventually posted a link to the paper too. And I rushed out several <a href="http://www.dbms2.com/2009/04/14/stonebraker-dewitt-et-al-compare-mapreduce-to-dbms/" >related</a> blog posts.</span></p>
<p style="margin-bottom: 0in;">Frankly, my views on MapReduce are more balanced than today&#8217;s weary negativity would seem to imply.  But I didn&#8217;t have time to wait and lead off with an overview post reflecting that balance.  Stay tuned.</p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">There. I&#8217;ve vented. I feel better. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </span></p>
<p style="margin-bottom: 0in;">And by the way, I&#8217;m not angry at anybody.  Really.  What amazes me is how SNAFUed things manage to get <em>without</em> anybody doing anything particularly wrong.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/04/14/there-always-seems-to-be-a-fire-drill-around-mapreduce-news/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Stonebraker, DeWitt, et al. compare MapReduce to DBMS</title>
		<link>http://www.dbms2.com/2009/04/14/stonebraker-dewitt-et-al-compare-mapreduce-to-dbms/</link>
		<comments>http://www.dbms2.com/2009/04/14/stonebraker-dewitt-et-al-compare-mapreduce-to-dbms/#comments</comments>
		<pubDate>Tue, 14 Apr 2009 05:53:37 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=748</guid>
		<description><![CDATA[Along with five other coauthors &#8212; the lead author seems to be Andy Pavlo &#8212; famous MapReduce non-fans Mike Stonebraker and David DeWitt have posted a SIGMOD 2009 paper called &#8220;A Comparison of Approaches to Large-Scale Data Analysis.&#8221;  The heart of the paper is benchmarks of Hadoop, Vertica, and &#8220;DBMS-X&#8221; on identical clusters of [...]]]></description>
			<content:encoded><![CDATA[<p>Along with five other coauthors &#8212; the lead author seems to be Andy Pavlo &#8212; <a href="../2008/01/18/the-great-mapreduce-debate/">famous MapReduce non-fans Mike Stonebraker and David DeWitt</a> have posted a SIGMOD 2009 paper called &#8220;<a href="http://database.cs.brown.edu/sigmod09/" onclick="javascript:pageTracker._trackPageview('/database.cs.brown.edu');">A Comparison of Approaches to Large-Scale Data Analysis</a>.&#8221;  The heart of the paper is benchmarks of Hadoop, Vertica, and &#8220;DBMS-X&#8221; on identical clusters of 100 low-end nodes, across a series of tests including (if I understood correctly):</p>
<ul>
<li>A couple of different flavors of <span style="font-style: normal;">a Grep task</span> originally proposed in a Google MapReduce paper.</li>
<li>A database query on simulated clickstream data.</li>
<li>A join on the same clickstream data.</li>
<li>Two aggregations on the clickstream data.</li>
</ul>
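<p style="margin-bottom: 0in;">The Grep task is the simplest of these. In MapReduce terms it is a map-only job, and in SQL terms it is a single selection. The Python sketch below assumes a toy record layout (a key plus a value string) and the search pattern "XYZ". Both are illustrative choices, not the benchmark's exact parameters.</p>

```python
# Sketch of a Grep-style task as a map-only job. The record layout and the
# pattern "XYZ" are illustrative assumptions, not the benchmark's parameters.

PATTERN = "XYZ"


def grep_map(key, value):
    # Map phase: emit the record unchanged if its value contains the pattern.
    # No reduce phase is needed; the output is simply the matching records.
    if PATTERN in value:
        yield key, value


def run_map_only(records):
    out = []
    for key, value in records:
        out.extend(grep_map(key, value))
    return out


# The DBMS equivalent is a single selection, roughly:
#   SELECT * FROM data WHERE field LIKE '%XYZ%';
records = [("k0001", "abcXYZdef"), ("k0002", "no match here"), ("k0003", "XYZ")]
print(run_map_only(records))  # [('k0001', 'abcXYZdef'), ('k0003', 'XYZ')]
```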
<p style="margin-bottom: 0in;"><span id="more-748"></span>Both DBMS outshone Hadoop, and Vertica outperformed DBMS-X.  This was true both on the Grep task and on all the other DBMS-like tasks the authors specified. Reasons for the DBMS outdoing Hadoop included compression and optimization. Reasons for Vertica outdoing DBMS-X included the usual benefits of column stores.</p>
<p style="margin-bottom: 0in;">More precisely, both DBMS clobbered Hadoop on throughput.  Hadoop, however, had some advantages in load speed and the like.</p>
<p style="margin-bottom: 0in;">The paper also argues strenuously that for complex and/or team-oriented database programming, one is much better off using a DBMS rather than reinventing the software wheel. However, it concedes that for simple programming tasks, Hadoop may be easier and lighter-weight. For example, some of the benchmark tasks required user-defined functions (UDFs) or the equivalent, and those weren&#8217;t as easy to write in the DBMS as one might think.</p>
<p style="margin-bottom: 0in;">Frankly, the paper is less stridently anti-MapReduce than I expected based on the authorship, or on how Mike Stonebraker framed it to me when <em>he told me about it Monday afternoon.</em> That said, it is absolutely in line with the DeWitt/Stonebraker meme &#8220;MapReduce isn&#8217;t nearly as good for DBMS-style processing as a DBMS is.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/04/14/stonebraker-dewitt-et-al-compare-mapreduce-to-dbms/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>New England Database Day this Friday January 30</title>
		<link>http://www.dbms2.com/2009/01/26/new-england-database-day-this-friday-january-30/</link>
		<comments>http://www.dbms2.com/2009/01/26/new-england-database-day-this-friday-january-30/#comments</comments>
		<pubDate>Mon, 26 Jan 2009 21:04:01 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=669</guid>
		<description><![CDATA[Dan Weinreb, to whose opinions I usually give great weight, spoke very favorably of last year&#8217;s New England Database Day conference.  Well, this year&#8217;s is taking place on Friday.  It&#8217;s at MIT and it&#8217;s free, with easy registration.  A list of papers is here. 
It&#8217;s pretty obvious who&#8217;s running the show. Sam Madden&#8217;s name is given [...]]]></description>
			<content:encoded><![CDATA[<p>Dan Weinreb, to whose opinions I usually give great weight, spoke very favorably of last year&#8217;s New England Database Day conference.  Well, this year&#8217;s is taking place on Friday.  It&#8217;s at MIT and it&#8217;s free, with easy registration.  A list of papers is <a href="http://db.csail.mit.edu/nedbday09/htdocs/papers.php" onclick="javascript:pageTracker._trackPageview('/db.csail.mit.edu');">here</a>. </p>
<p>It&#8217;s pretty obvious who&#8217;s running the show. Sam Madden&#8217;s name is given as a contact; elsewhere it&#8217;s referred to as being organized by Madden and Mike Stonebraker.  Of the six identified papers, two or three have subjects or authors that could have come straight from Vertica&#8217;s <a href="http://www.databasecolumn.com/" onclick="javascript:pageTracker._trackPageview('/www.databasecolumn.com');">Database Column</a> blog.  But that hardly means the event will be one long Vertica commercial.  For example, the other papers include one from Netezza and one on Flash memory data access methods.</p>
<p>I really doubt I&#8217;ll make it to Cambridge in time for the 9:00 am opening remarks <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> , but I&#8217;ll try to swing by later on.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/01/26/new-england-database-day-this-friday-january-30/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Mike Stonebraker&#8217;s counterarguments to MapReduce&#8217;s popularity</title>
		<link>http://www.dbms2.com/2008/09/04/mike-stonebraker-mapreduce/</link>
		<comments>http://www.dbms2.com/2008/09/04/mike-stonebraker-mapreduce/#comments</comments>
		<pubDate>Thu, 04 Sep 2008 23:39:46 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=519</guid>
		<description><![CDATA[In response to recent posts I&#8217;ve done about MapReduce, Mike Stonebraker just got on the phone to give me his views.  His core claim, more or less, is that anything you can do in MapReduce you could already do in a parallel database that complies with SQL-92 and/or has PostgreSQL underpinnings.  In particular, [...]]]></description>
			<content:encoded><![CDATA[<p>In response to recent posts I&#8217;ve done about MapReduce, Mike Stonebraker just got on the phone to give me his views.  His core claim, more or less, is that anything you can do in MapReduce you could already do in a parallel database that complies with SQL-92 and/or has PostgreSQL underpinnings.  In particular, Mike says:<span id="more-519"></span></p>
<ul>
<li><strong><em>Map</em> functions can&#8217;t do anything you can&#8217;t also do in PostgreSQL user-defined functions</strong> (assuming, of course, PostgreSQL UDFs can be written in the language you want to use).</li>
<li><strong><em>Reduce</em> functions can&#8217;t do anything you can&#8217;t also do in PostgreSQL user-defined aggregates</strong> (with the same caveat).</li>
<li><strong><em>Map</em> and <em>Reduce</em> functions always write their result sets to disk.</strong> This can create a large performance loss.</li>
<li><strong><em>Map</em> and <em>Reduce</em> functions require new instances to be fired up to run them.</strong> This can also create a large performance loss.  (Without checking, I&#8217;m guessing that one is very implementation-specific.  I.e., even if it&#8217;s true of Hadoop, it may not be true of Greenplum&#8217;s or Aster Data&#8217;s MapReduce implementations.)</li>
<li>Mike and his associates are working on benchmarks that he believes will show that MapReduce performance is <strong>10X worse than parallel row-based SQL DBMS</strong>, and <strong>100X worse than columnar SQL DBMS</strong>.</li>
<li>MapReduce doesn&#8217;t play nicely with the SQL Analytics part of the SQL standard.</li>
<li>The one advantage Mike concedes to MapReduce &#8212; more graceful degradation when nodes fail &#8212; isn&#8217;t that important in the hardware configurations on which parallel analytic DBMS actually run today.  I.e., a Greenplum or Vertica installation is going to have nodes fail much more rarely than a Google data center will.</li>
</ul>
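<p style="margin-bottom: 0in;">Mike&#8217;s first two points can be illustrated with a small Python model. All names in it are hypothetical. A Map function plays the role of a per-row UDF. A Reduce function plays the role of a user-defined aggregate built from a state-transition function and a final function, which is the same shape PostgreSQL uses in <code>CREATE AGGREGATE</code> (<code>SFUNC</code>, <code>STYPE</code>, <code>FINALFUNC</code>). The model only illustrates the expressiveness claim. It says nothing about the performance points.</p>

```python
from collections import defaultdict


def extract_domain(url):
    # Per-row "Map" step, i.e. a scalar UDF: one input row in, one value out.
    return url.split("/")[2]


# A user-defined aggregate (average) expressed as (initial state, step, final).
AVG_INIT = (0, 0)                       # (running sum, row count)


def avg_step(state, x):                 # analogous to PostgreSQL's SFUNC
    return state[0] + x, state[1] + 1


def avg_final(state):                   # analogous to PostgreSQL's FINALFUNC
    total, n = state
    return total / n if n else None


def group_by_aggregate(rows, key_fn, init, step, final):
    # Roughly: SELECT key_fn(url), avg(ms) FROM rows GROUP BY 1
    # The grouping plus step/final functions play the "Reduce" role.
    states = defaultdict(lambda: init)
    for url, ms in rows:
        k = key_fn(url)
        states[k] = step(states[k], ms)
    return {k: final(s) for k, s in states.items()}


rows = [("http://a.com/x", 10), ("http://a.com/y", 30), ("http://b.org/", 5)]
print(group_by_aggregate(rows, extract_domain, AVG_INIT, avg_step, avg_final))
# {'a.com': 20.0, 'b.org': 5.0}
```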
<p>Bottom line:  <strong>Mike Stonebraker more than disagrees with the claim that <a href="http://www.dbms2.com/2008/08/26/why-mapreduce-matters-to-sql-data-warehousing/" >MapReduce is a valuable addition to SQL data warehousing</a></strong>, on somewhat different grounds than he emphasized in the <a href="http://www.dbms2.com/2008/01/18/the-great-mapreduce-debate/" >Great MapReduce Debate</a> last January.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/09/04/mike-stonebraker-mapreduce/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Microsoft is buying DATAllegro</title>
		<link>http://www.dbms2.com/2008/07/24/microsoft-is-buying-datallegro/</link>
		<comments>http://www.dbms2.com/2008/07/24/microsoft-is-buying-datallegro/#comments</comments>
		<pubDate>Thu, 24 Jul 2008 18:37:34 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[DATAllegro]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=467</guid>
		<description><![CDATA[I&#8217;ve long argued that:

Oracle 	and Microsoft are doomed in the data warehouse market unless they acquire MPP/shared-nothing data warehouse DBMS 	and/or data warehouse appliances.
DATAllegro 	is the ideal acquisition for either of them.

Microsoft has now validated my claim by agreeing to buy DATAllegro.  As you probably know, we&#8217;ve been covering DATAllegro extensively, as per the [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve long argued that:</p>
<ul>
<li><span style="color: #000080;"><span style="text-decoration: underline;"><a href="../2007/03/06/why-oracle-and-microsoft-will-lose-in-vldb-data-warehousing/">Oracle 	and Microsoft are doomed in the data warehouse market</a></span></span> <strong>unless</strong> they acquire MPP/shared-nothing data warehouse DBMS 	and/or data warehouse appliances.</li>
<li><span style="color: #000080;"><span style="text-decoration: underline;"><a href="../2007/04/11/deal-prospects-for-data-warehouse-dbms-vendors/">DATAllegro 	is the ideal acquisition</a></span></span> for either of them.</li>
</ul>
<p style="margin-bottom: 0in;">Microsoft has now validated my claim by agreeing to buy DATAllegro.  As you probably know, we&#8217;ve been covering DATAllegro extensively, as per the links listed below.</p>
<p style="margin-bottom: 0in;">Basic deal highlights include: <span id="more-467"></span></p>
<ul>
<li>A definitive agreement has been 	signed.</li>
<li>Deal closing is expected in a few 	weeks.</li>
<li>I got the impression that the undisclosed price is a nice step up from the <span style="color: #000080;"><span style="text-decoration: underline;"><a href="http://www.datallegro.com/pr/5_1_08_series_d_funding.asp" onclick="javascript:pageTracker._trackPageview('/www.datallegro.com');">Series D round that closed a few months ago</a></span></span>.</li>
<li>DATAllegro CEO Stuart Frost will 	run an engineering division, based at DATAllegro&#8217;s current 	headquarters, reporting into Microsoft&#8217;s SQL Server division.  He 	seems to be locked into staying at Microsoft for at least a couple 	of years.</li>
<li>DATAllegro&#8217;s software will be 	ported from its current Linux/Ingres stack to Windows/SQL Server.</li>
<li>The DATAllegro brand name will 	probably go away.</li>
<li>Everything else is either undisclosed or truly not yet decided.  In particular, there&#8217;s no word as to whether Stuart will run any part of what is now Microsoft.</li>
</ul>
<p style="margin-bottom: 0in;">To understand how DATAllegro will fit into Microsoft&#8217;s SQL Server product line, let&#8217;s start by reviewing aspects of DATAllegro&#8217;s product architecture:</p>
<ul>
<li>Each DATAllegro node except the 	head runs a full copy of Ingres, an OLTP DBMS, over Linux.</li>
<li>Thus, both data and SQL are 	shipped from node to node.  Like vendors of comparable systems, 	DATAllegro does a better job each release of either resolving 	queries on one node or shipping data from peer to peer, rather than 	sending all intermediate results up to the head node for further 	processing.</li>
<li>The whole thing runs on standard blades and EMC storage, plus Cisco InfiniBand.</li>
<li>The DATAllegro head node runs 	DATAllegro&#8217;s own SQL optimizer.  Thus in some ways each query is 	optimized twice – once overall in the head node, then again as 	different pieces of the query are run on different nodes.  	(Actually, it&#8217;s generally more accurate to say that each piece of 	the query is run once per node, but sometimes that isn&#8217;t literally 	true due to considerations of partitioning.)</li>
</ul>
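<p style="margin-bottom: 0in;">The division of labor between head and worker nodes can be sketched generically. The Python below shows two-phase aggregation, which is one common way for an MPP head node to avoid pulling raw rows upstream. Each worker computes a partial GROUP BY on its own partition, and the head node merges the small partial results. This is a hypothetical illustration of the general pattern, not DATAllegro&#8217;s implementation. Here each &#8220;node&#8221; is just a list of rows rather than an Ingres instance.</p>

```python
from collections import Counter


def node_partial_sum(rows):
    # Runs on each worker node: a local GROUP BY over that node's partition only.
    partial = Counter()
    for region, amount in rows:
        partial[region] += amount
    return partial


def head_node_query(partitions):
    # Head node: send the same query piece to every node, then merge the small
    # partial results instead of pulling every raw row up to the head.
    total = Counter()
    for rows in partitions:
        total.update(node_partial_sum(rows))   # Counter.update adds counts
    return dict(total)


partitions = [
    [("east", 5), ("west", 2)],   # node 1
    [("east", 1), ("north", 7)],  # node 2
    [("west", 4)],                # node 3
]
print(head_node_query(partitions))  # {'east': 6, 'west': 6, 'north': 7}
```

<p style="margin-bottom: 0in;">Only one small row per group per node travels to the head. Joins are harder, which is why peer-to-peer data shipping matters in such systems.</p>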
<p style="margin-bottom: 0in;">Feasibility work has already been done on the port to SQL Server. Stuart reports that the work so far indicates a significant speed-up, which he attributes to data warehouse performance optimizations present in SQL Server that are lacking in the less well-funded Ingres.  (Specifically mentioned were star joins and some sort of memory-centric capability.)  One interesting implication is that when DATAllegro&#8217;s optimizer is rewritten for the port, it will largely do <em>less </em>than it has been doing to date, since SQL Server needs less “help” in optimizing the single-node parts of queries than Ingres does.  The port will also of course involve changes to the file structures, due both to the change of DBMS and operating system; I got the sense that in this area, final decisions truly haven&#8217;t yet been made.</p>
<p style="margin-bottom: 0in;">And yes – Stuart now confesses that DATAllegro was designed for acquisition from the get-go, e.g. in the choice to incorporate a third-party OLTP DBMS.</p>
<p style="margin-bottom: 0in;"><em><strong>Related links (some about to go up)<br />
</strong></em></p>
<ul>
<li>Why DATAllegro could provide 	Microsoft with a true enterprise data warehouse sooner than you 	think</li>
<li>How will Oracle save its data 	warehouse business?</li>
<li>The data warehouse DBMS 	consolidation has begun</li>
<li>Our best cut at <span style="color: #000080;"><span style="text-decoration: underline;"><a href="../2007/10/25/datallegro-discloses-a-few-numbers/">DATAllegro&#8217;s 	numbers</a></span></span></li>
<li>Two <span style="color: #000080;"><span style="text-decoration: underline;"><a href="http://www.monash.com/whitepapers.html" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">white 	papers</a></span></span> (dated March and May of 2007) focusing on 	DATAllegro&#8217;s product architecture</li>
<li>Why MPP/shared-nothing 	architectures (which neither Oracle nor Microsoft SQL Server have) 	<span style="color: #000080;"><span style="text-decoration: underline;"><a href="../2007/03/06/why-oracle-and-microsoft-will-lose-in-vldb-data-warehousing/">will 	win in the data warehouse market</a></span></span></li>
<li>Why <span style="color: #000080;"><span style="text-decoration: underline;"><a href="../2008/06/28/response-to-rita-sallam-of-oracle/">Oracle&#8217;s 	counterarguments don&#8217;t hold water</a></span></span></li>
<li><span style="color: #000080;"><span style="text-decoration: underline;"><a href="../2007/10/12/three-ways-oracle-and-microsoft-could-go-mpp/">Three 	ways Oracle and Microsoft could go MPP</a></span></span></li>
<li><span style="color: #000080;"><span style="text-decoration: underline;"><a href="../2007/04/11/deal-prospects-for-data-warehouse-dbms-vendors/">Deal 	prospects in the data warehouse DBMS market</a></span></span></li>
<li><span style="color: #000080;"><span style="text-decoration: underline;"><a href="http://www.datallegro.com/" onclick="javascript:pageTracker._trackPageview('/www.datallegro.com');">DATAllegro&#8217;s 	web site</a></span></span></li>
<li><span style="color: #000080;"><span style="text-decoration: underline;"><a href="http://www.beyeblogs.com/DATAllegro/" onclick="javascript:pageTracker._trackPageview('/www.beyeblogs.com');">DATAllegro 	CEO Stuart Frost&#8217;s spirited blog</a></span></span></li>
<li>A general link to all our 	<span style="color: #000080;"><span style="text-decoration: underline;"><a href="../category/products-and-vendors/datallegro/">DATAllegro 	coverage</a></span></span></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/07/24/microsoft-is-buying-datallegro/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

<!-- Dynamic page generated in 0.335 seconds. -->
<!-- Cached page generated by WP-Super-Cache on 2010-03-18 15:38:26 -->
