<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; PostgreSQL</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/postgresql/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 02 Sep 2010 09:06:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>dbShards &#8212; a lot like an MPP OLTP DBMS based on MySQL or PostgreSQL</title>
		<link>http://www.dbms2.com/2010/07/28/dbshards/</link>
		<comments>http://www.dbms2.com/2010/07/28/dbshards/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 09:39:11 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2662</guid>
		<description><![CDATA[I talked yesterday w/ Cory Isaacson, who runs CodeFutures, makers of dbShards.  dbShards is a software layer that turns an ordinary DBMS (currently MySQL or PostgreSQL) into an MPP shared-nothing ACID-compliant OLTP DBMS. Technical highlights included:  

Despite heavy emphasis on the 	word “sharding,” dbShards&#8217;s scale-out is transparent to the 	application programmer. E.g., in dbShards [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I talked yesterday w/ Cory Isaacson, who runs CodeFutures, makers of dbShards.  dbShards is a software layer that turns an ordinary DBMS (currently MySQL or PostgreSQL) into an MPP shared-nothing ACID-compliant OLTP DBMS. Technical highlights included:  <span id="more-2662"></span></p>
<ul>
<li>Despite heavy emphasis on the 	word “sharding,” dbShards&#8217;s scale-out is transparent to the 	application programmer. E.g., in dbShards + MySQL, the APIs are more 	or less the same ones you&#8217;d expect for MySQL (JDBC, etc.)</li>
<li>If the DBMS underneath is 	ACID-compliant (e.g., MySQL + InnoDB), then the dbShards version is 	ACID-compliant too.</li>
<li>Beyond those basics, I forgot to 	check the fine details of dbShards&#8217; MySQL (or PostgreSQL) syntax 	support. <a href="http://highscalability.com/blog/2010/6/23/product-dbshards-share-nothing-shard-everything.html" onclick="javascript:pageTracker._trackPageview('/highscalability.com');">Todd 	Hoff, however, did not forget</a>.</li>
<li>dbShards keeps copies of each 	shard on two different servers, via asynchronous log-shipping. This 	allows for failover in both planned and unplanned outages.</li>
<li>dbShards wants you to distribute 	big tables among shards via a “shard key,” which is a lot like 	the distribution key in MPP analytic DBMS. You&#8217;re encouraged to 	replicate small, low-update-volume tables across each shard.</li>
<li>Cory says that dbShards has good 	join performance when – you guessed it! – everything being joined 	is co-located shard-by-shard, because the tables were distributed on 	the same shard key and/or replicated across each shard. Cory can&#8217;t 	imagine why you&#8217;d want to do an inner join under any other 	circumstances.</li>
<li>The basic dbShards query execution model is: A query comes in; it&#8217;s parsed; a shard key is automagically detected (one hopes); the “global configuration file” is checked to see which shard to ship the work off to. I forgot to ask whether lookup was done via a hash table (the obvious guess) or something else. The programmer can put hints in the code comments to direct the sharding, but Cory asserts those aren&#8217;t needed very often.</li>
<li>Cory says that insert performance 	with dbShards + MySQL + InnoDB is 1500-3000 inserts per shard per 	second, scaling almost linearly with the number of shards. I forgot 	to ask how many shards this had been tested for.</li>
<li>If you want blazing dbShards performance, Cory&#8217;s base-case figure is 25 gigabytes of data per node, so that the most commonly used indexes can camp out in memory. (I forgot to ask what kind of hardware he was assuming per node.) This is if you&#8217;re going to be doing joins or aggregations. If it&#8217;s just single-row inserts and updates, or if your performance requirements are lower, you can go with 10X that figure.</li>
<li>Cory tells stories wherein going from an unsharded database to 4 or so shards took database re-indexing time down 50X or more. Apparently, the time for such tasks can grow exponentially or even super-exponentially with database size under InnoDB. (That said, I&#8217;d be surprised if all large InnoDB users suffered from that problem to the same degree.)</li>
<li>dbShards&#8217; customer workloads are all &gt;= 50% reads. This reflects dbShards&#8217; design priorities.</li>
<li>As long as it can be in charge, 	dbShards is happy to interface to whatever kind of database backup 	software you want to use on a node by node basis. (dbShards wants to 	drive your backup software for you so that it can be sure the 	replicas are handled properly.)</li>
<li>It&#8217;s “fairly common” for 	dbShards to be paired with memcached. I forgot to ask whether 	memcached typically lived on its own pool of servers, or on the same 	pool that runs dbShards.</li>
<li>Future DBMS options under 	consideration for dbShards include Oracle and (unspecified) 	in-memory.</li>
</ul>
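<p style="margin-bottom: 0in;">For illustration only, here is a minimal Python sketch of the routing step in the query execution model above. It assumes the hash-based lookup that is merely the obvious guess, not something Cory confirmed. The shard count, table names, and function names are all hypothetical; none of this is dbShards code or a dbShards API.</p>
<pre><code>import hashlib

SHARD_COUNT = 4                         # hypothetical number of shards
REPLICATED_TABLES = {"country_codes"}   # hypothetical small table copied to every shard


def shard_for(shard_key):
    """Map a shard key to a shard number using a stable hash (an assumption)."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % SHARD_COUNT


def route(table, shard_key=None):
    """Return the list of shard numbers a statement on `table` must visit."""
    if table in REPLICATED_TABLES:
        # Every shard holds a full copy, so a read can go to any one of them.
        return [0]
    if shard_key is None:
        # No shard key detected in the query: fan out to every shard.
        return list(range(SHARD_COUNT))
    return [shard_for(shard_key)]


# Two big tables distributed on the same key (a customer id of 7).
customer_shards = route("customers", 7)
order_shards = route("orders", 7)
</code></pre>
<p style="margin-bottom: 0in;">Because both tables hash on the same key, a customer row and that customer&#8217;s order rows land on the same shard, which is the co-location the join bullet above depends on. A query with no detectable shard key has to visit every shard.</p>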
<p style="margin-bottom: 0in;">Business highlights for CodeFutures and dbShards include:</p>
<ul>
<li>dbShards&#8217; price is 	$5000/server/year, including support and OEMed MySQL, with stated 	quantity discounts up to 40%.</li>
<li>dbShards cloud pricing is 	different (on a usage basis).</li>
<li>dbShards has 6 or so customers, 	half each on-premises and in the cloud. One of them is Facebook. (Those &#8220;100s&#8221; of customers mentioned on the dbShards website are for a fairly unrelated product.)</li>
<li>CodeFutures has been at this 2 ½ 	years or so. There is no venture capital in the company.</li>
<li>Early dbShards deals have evidently involved a fair amount of professional services.</li>
<li>Counting contractors, CodeFutures has 10-12 people; headcount has been as high as 15.</li>
<li>Target dbShards customers are as you&#8217;d expect. Cory says he&#8217;s actually been more successful getting early-adopter money out of Web companies than Wall Street firms.</li>
<li>There are a couple of dbShards 	PostgreSQL customers for greenfield applications. Most dbShards 	customers and prospects, however, are looking to scale out existing 	apps.</li>
<li>Despite its connection to open source DBMS, there&#8217;s nothing open source about dbShards itself.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/28/dbshards/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Stakeholder-facing analytics</title>
		<link>http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/</link>
		<comments>http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/#comments</comments>
		<pubDate>Sat, 15 May 2010 07:58:05 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Fox and MySpace]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2149</guid>
		<description><![CDATA[There&#8217;s a point I keep making in speeches, and used to keep making in white papers, yet have almost never spelled out in this blog. Let me now (somewhat) correct the oversight.
Analytic technology isn&#8217;t only for you. It&#8217;s also for your customers, citizens, and other stakeholders.
I am not referring here to what is well understood [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a point I keep making in speeches, and used to keep making in white papers, yet have almost never spelled out in this blog. Let me now (somewhat) correct the oversight.</p>
<p><strong>Analytic technology isn&#8217;t only for you. It&#8217;s also for your customers, citizens, and other stakeholders.</strong></p>
<p>I am <strong>not</strong> referring here to what is well understood to be an important, fast-growing activity &#8212; providing data and its analysis to customers as your primary or only business &#8212; nor to the related business of taking people&#8217;s data, crunching it for them, and giving them results. That combined sector &#8212; which I am pretty much alone in aggregating into one and calling <a href="http://www.dbms2.com/category/analytics-technologies/data-mart-warehouse-outsourcing/" >data mart outsourcing</a> &#8212; is one of the top several vertical markets for a lot of the analytic DBMS vendors I write about. Rather, I&#8217;m talking about enterprises that gather data for some primary purpose, and have discovered that a good <strong>secondary</strong> use of the data is to reflect it back to stakeholders, often the same ones who provided or created it in the first place.</p>
<p>For now I&#8217;ll call this category <strong>stakeholder-facing analytics,</strong> as the shorter phrase &#8220;stakeholder analytics&#8221; would be ambiguous.* I first picked up the idea early this decade from Information Builders, for whom it had become something of a specialty. I&#8217;ve been asking analytics vendors for examples of stakeholder-facing analytics ever since, and a number have been able to comply. But the whole thing is in its early days even so; almost any sufficiently large enterprise should be more active in stakeholder-facing analytics than it currently is.<br />
<span id="more-2149"></span><br />
<em>*Comments as to what the category</em> should<em> be called are welcome below.</em></p>
<p>Examples of stakeholder-facing analytics include:</p>
<ul>
<li>Enterprises report back on the business customers do with them. For example:
<ul>
<li>Credit card companies provide reports on spending back to their credit card holders, especially small businesses.</li>
<li>So do office supply retailers.</li>
<li>Brokerage firms provide reporting back to their small-institution customers.</li>
</ul>
</li>
<li>Governments expose information to their citizens online.
<ul>
<li>In an early example, New York City restaurant ratings were put online.</li>
<li><a href="http://sec.gov/edgar/searchedgar/companysearch.html" onclick="javascript:pageTracker._trackPageview('/sec.gov');">Putting SEC filings online</a> has been a huge success.</li>
<li>The Obama Administration has committed to putting <a href="http://www.data.gov/catalog" onclick="javascript:pageTracker._trackPageview('/www.data.gov');">large amounts of information</a> online.</li>
</ul>
</li>
<li>Regulated companies (such as utilities) could be required to put data online directly, without even using the government as an intermediary.</li>
<li>Some part of Fox &#8212; perhaps MySpace Music? &#8212; offers free access to a PostgreSQL extract from <a href="http://www.dbms2.com/2009/03/05/fox-interactive-medias-multi-hundred-terabyte-database-running-on-greenplum/" >its Greenplum database</a> to each of its largest advertisers.</li>
<li>Google Analytics offers some basic BI for free to website owners everywhere.</li>
<li>Anybody from web hosting companies to public utilities could open their kimonos and allow their customers to track adherence to actual or implied SLAs (Service Level Agreements) in areas such as uptime, length of outage, responsiveness, and the like.</li>
</ul>
<p>So what cool examples do you have of stakeholder-facing analytics?*</p>
<p><em>*Yes, this is an invitation to drop links to case studies into the comment thread below. </em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Notes on the evolution of OLTP database management systems</title>
		<link>http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/</link>
		<comments>http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/#comments</comments>
		<pubDate>Mon, 05 Apr 2010 08:22:03 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EnterpriseDB and Postgres Plus]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Mid-range]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1841</guid>
		<description><![CDATA[The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part).  OLTP (OnLine Transaction Processing) and general purpose DBMS startups, however, have not yet done as well, with [...]]]></description>
			<content:encoded><![CDATA[<p>The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part).  OLTP <span style="font-weight: normal;">(OnLine Transaction Processing) </span>and general purpose DBMS startups, however, have not yet done as well, with such success as there has been (MySQL, Intersystems Cache&#8217;, solidDB&#8217;s exit, etc.) generally accruing to products that originated in the 20th Century.</p>
<p>Nonetheless, OLTP/general-purpose data management startup activity has recently picked up, targeting what I see as some very real opportunities and needs. So as a jumping-off point for further writing, I thought it might be interesting to collect a few observations about the market in one place.  These include:</p>
<ul>
<li><span style="font-weight: normal;">Big-brand 	OLTP/general-purpose DBMS have more “stickiness” 	than analytic DBMS.</span></li>
<li><span style="font-weight: normal;">By 	number, most of an enterprise&#8217;s OLTP/general-purpose databases are low-volume and 	low-value. </span></li>
<li>Most 	interesting new OLTP/general-purpose data management products are <span style="font-style: normal;">either 	MySQL-based or NoSQL.</span></li>
<li>It&#8217;s not yet 	clear whether MySQL will prevail over MySQL forks, or vice-versa, or 	whether they will co-exist.</li>
<li>The era of 	silicon-centric relational DBMS is coming.</li>
<li>The emphasis on scale-out and reducing the cost of joins spans the NoSQL and SQL-based worlds.</li>
<li><span style="font-weight: normal;">Users&#8217; insistence on “free” could be a major problem for OLTP DBMS innovation. </span></li>
</ul>
<p style="margin-bottom: 0in;">I shall explain.<span id="more-1841"></span></p>
<p style="margin-bottom: 0in;"><strong>Big-brand OLTP/general-purpose DBMS have more “stickiness” than analytic DBMS.</strong></p>
<ul>
<li>OLTP 	applications are more complex than analytic ones, and hence more 	tightly wired into particular brands of DBMS. For example, 	third-party packaged OLTP applications are typically portable among 	only a few brands of DBMS. But third-party business intelligence 	tools, and the BI “applications” built in them, are more easily 	and widely portable.</li>
<li>Specific technical observations 	such as “OLTP apps tend to use stored procedures, which are 	DBMS-specific” or “OLTP apps tend to have lots and lots of 	tables” serve to underscore the first point.</li>
<li>An enterprise&#8217;s highest-value data 	is commonly the financial stuff handled by its core OLTP systems, so 	those are the last things they want to mess around with just to get 	some cost savings. Security, high availability, and so on are major 	considerations that can outweigh cost.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>By number, most of an enterprise&#8217;s OLTP/general-purpose databases are low-volume and low-value. </strong>Indeed, “OLTP” is often a misnomer, which is why I tend to go with “general-purpose” or some similarly wishy-washy phrase instead.</p>
<ul>
<li>In theory, this is a ripe area for 	what I&#8217;ve called <a href="http://www.dbms2.com/category/database-management-system/mid-range/" >mid-range DBMS</a>.</li>
<li>The big brand vendors try hard to 	keep as many of those databases for themselves as they can. 	Enterprise-wide license pricing helps. Going forward, so will 	virtualization/consolidation strategies, such as <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >Oracle&#8217;s 	Exadata-centric approach</a>.</li>
<li>A variety of mid-range DBMS 	alternatives beyond the big brands have technical merit, at least in 	some cases and configurations – MySQL, PostgreSQL, Intersystems 	Cache&#8217;, and so on.</li>
<li>The only such mid-range DBMS 	alternative with much large enterprise business momentum, however, 	appears to be MySQL.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>&#8220;General-purpose&#8221; might be a better term than &#8220;OLTP&#8221; anyway.</strong></p>
<ul>
<li>I don&#8217;t have a link, but it&#8217;s widely agreed that over half of the processing on an &#8220;OLTP&#8221; enterprise app is commonly reporting and so on.</li>
<li>&#8220;Operational BI&#8221; is progressing by fits and starts, but it is progressing.</li>
<li>Anything customer-facing &#8212; web-based, call center, or otherwise &#8212; is likely to include a heavy dose of &#8220;real-time&#8221; analytic optimization.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Most interesting new OLTP/general-purpose data management products are <span style="font-style: normal;">either MySQL-based or NoSQL.</span></strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/" >VoltDB</a> is the main 	exception that jumps to mind.</li>
<li>This isn&#8217;t true in the analytic 	DBMS area, where Netezza, Greenplum, Aster, Vertica and others 	started from PostgreSQL&#8217;s code, APIs, or both.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>It&#8217;s not yet clear whether MySQL will prevail over MySQL forks, or vice-versa, or whether they will co-exist.</strong></p>
<ul>
<li>MySQL is a limited product without 	all the third-party storage engines that are being developed.</li>
<li><a href="http://www.dbms2.com/2009/12/14/oracle-mysql-storage-engine/" >Oracle&#8217;s promise of MySQL good 	behavior</a> has an expiration date.</li>
<li>None of the MySQL front-end 	alternatives are remotely mature yet.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>The era of silicon-centric relational DBMS is coming.</strong></p>
<ul>
<li>I think “silicon” means 	“solid-state memory” as much as or more than it means “RAM,” 	but that&#8217;s not yet certain.</li>
<li>What is pretty certain is that, 	thanks to Moore&#8217;s Law, some kind of silicon will increasingly 	replace disk.</li>
<li><a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >Oracle&#8217;s increasingly 	Flash-centric story</a> is a challenge to everybody.</li>
<li>RAM-centric VoltDB will launch 	fairly soon. (By the way, while VoltDB still has <a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/" >a lot in common 	with H-Store</a>, they&#8217;re not exactly the same thing. And <a href="http://bit.ly/9QxjV2." onclick="javascript:pageTracker._trackPageview('/bit.ly');">H-Store 	research</a> is progressing too.)</li>
<li><span style="font-style: normal;"><a href="http://rethinkdb.com/" onclick="javascript:pageTracker._trackPageview('/rethinkdb.com');">RethinkDB</a> is being de</span>veloped, focused directly on solid-state memory. 	Based on the sparse information available online, RethinkDB sounds 	somewhat like a dumbed-down H-Store.</li>
<li>New disk-based vendors may never 	optimize their use of disk, instead targeting a solid-state future. 	(E.g., I think Akiban should and quite well might follow this path.)</li>
</ul>
<p style="margin-bottom: 0in; font-weight: normal;"><strong>The emphasis on scale-out and reducing the cost of joins spans the NoSQL and SQL-based worlds.</strong> We hear that from the <a href="http://www.dbms2.com/2010/03/14/nosql-taxonomy/" >NoSQL</a> guys all the time. But I also just heard it from <a href="http://www.dbms2.com/2010/04/03/akiban-highlights/" >Akiban</a>.</p>
<p style="margin-bottom: 0in;"><strong>Users&#8217; insistence on “free” could be a major problem for OLTP DBMS innovation.</strong> Vendors of new OLTP data management technologies often feel obligated to open source their products, notwithstanding the historical lack of revenue in the open source OLTP DBMS market. As just one of many examples, <a href="http://www.novaspivack.com/uncategorized/evri-ties-the-knot-with-twine" onclick="javascript:pageTracker._trackPageview('/www.novaspivack.com');">Nova Spivack</a> wrote:</p>
<blockquote>
<p style="margin-bottom: 0in;">I have recently seen some new graph data storage products that may provide the levels of scale and performance needed, but pricing has not been determined yet. In short, storage and retrieval of semantic graph datasets is a big unsolved challenge that is holding back the entire industry. We need federated database systems that can handle hundreds of billions to trillions of triples under high load conditions, in the cloud, on commodity hardware and open source software. Only then will it be affordable to make semantic applications and services at Web-scale.</p>
</blockquote>
<p style="margin-bottom: 0in;">I hear similar things from other startups, who evidently believe they need and/or are entitled to enjoy sophisticated, high-performance, zero-cost, specialized database management technology.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Greenplum Single-Node Edition &#8212; sometimes free is a real cool price</title>
		<link>http://www.dbms2.com/2009/10/19/greenplum-free-single-node-edition/</link>
		<comments>http://www.dbms2.com/2009/10/19/greenplum-free-single-node-edition/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 13:25:41 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EnterpriseDB and Postgres Plus]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Scientific research]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1158</guid>
		<description><![CDATA[Greenplum is announcing today that you can run Greenplum software on a single 8-core commodity server, free.  First and foremost, that&#8217;s a strong statement that Greenplum wants enterprises to pay it for Greenplum&#8217;s parallelization/“private cloud” capabilities. Second, it may be an attractive gift to a variety of folks who want to extract insight from [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Greenplum is announcing today that you can run Greenplum software on a single 8-core commodity server, free.  First and foremost, that&#8217;s a strong statement that Greenplum wants enterprises to pay it for Greenplum&#8217;s parallelization/“<a href="../2009/06/08/the-future-of-data-marts/">private cloud</a>” capabilities. Second, it may be an attractive gift to a variety of folks who want to extract insight from terabyte-scale databases of various kinds.</p>
<p style="margin-bottom: 0in;">Greenplum Single-Node Edition:</p>
<ul>
<li>Is free of charge, although you 	can buy support.</li>
<li>Has no restrictions on use, 	production or otherwise.</li>
<li>Has no restrictions on database 	size.</li>
<li>Is closed-source.</li>
</ul>
<p style="margin-bottom: 0in;">For those who want free, terabyte-scale data warehousing software, Greenplum Single-Node Edition may be quite appealing, considering that the main available alternatives are:</p>
<ul>
<li>General-purpose open-source DBMS, 	such as PostgreSQL and MySQL (lacking analytic DBMS performance and 	features)</li>
<li>Infobright Community Edition (the 	other best choice – <a href="../2009/10/14/infobright-notes/">Infobright&#8217;s 	commercial sales success</a> indicates the solidity of Infobright&#8217;s 	technology)</li>
<li>Rough research-project code and other questionable open source offerings</li>
<li>Crippleware from other commercial 	analytic DBMS vendors (e.g., <a href="../2009/10/19/teradata-partners-2009/">Teradata</a>)</li>
</ul>
<p style="margin-bottom: 0in;">For example, comparing PostgreSQL-based Greenplum with PostgreSQL itself, Greenplum offers:</p>
<ul>
<li>The ability to scale out queries 	across all cores in your box (and no, pgpool is not a serious 	alternative)</li>
<li>Storage alternatives such as 	columnar (I am told that EnterpriseDB recently stopped funding a 	project for a PostgreSQL columnar option)</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-1158"></span>Greenplum would surely also argue that its software is superior to PostgreSQL in parallel load, compression, MapReduce integration, and general fit-and-finish. I imagine that in some (perhaps not all) cases it would be right. PostgreSQL&#8217;s main technical advantages over Greenplum would probably lie in the area of datatype extensibility.</p>
<p style="margin-bottom: 0in;">The main target users for Greenplum&#8217;s Single-Node Edition are obviously <strong>individual enterprise power users or very small analytic teams.</strong> I.e., it&#8217;s people with a data mart need that a central data warehouse isn&#8217;t meeting. Potential benefits to Greenplum include:</p>
<ul>
<li>Adding value to its <a href="../2009/06/08/the-future-of-data-marts/">Enterprise 	Data Cloud</a> story</li>
<li>Seeding the market for future 	enterprise sales</li>
<li>Depriving competitors of revenue, 	perhaps at enterprises too small to ever be paying Greenplum 	customers</li>
</ul>
<p style="margin-bottom: 0in;">In addition, I see free Greenplum as a charity offering that could be appealing to <a href="http://" onclick="javascript:pageTracker._trackPageview('/');">scientists</a> who face PostgreSQL performance limitations.</p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li><a href="http://www.greenplum.com/news/252/388/Greenplum-Introduces-Free-Greenplum-Database-Edition-for-Data-Analysts/d,press-releases/" onclick="javascript:pageTracker._trackPageview('/www.greenplum.com');">Greenplum 	Free Single-Node Edition press release</a> (I&#8217;m quoted)</li>
<li><a href="http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/" onclick="javascript:pageTracker._trackPageview('/www.mysqlperformanceblog.com');">MySQL 	Performance blog on MonetDB and Infobright community edition</a></li>
<li><a href="http://archives.postgresql.org/pgsql-general/2009-03/msg01227.php" onclick="javascript:pageTracker._trackPageview('/archives.postgresql.org');">PostgreSQL&#8217;s 	restriction to one core per query</a></li>
<li><a href="http://www.infobright.org/Forums/viewthread/1141/" onclick="javascript:pageTracker._trackPageview('/www.infobright.org');">Infobright&#8217;s 	restriction to one core per query</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/19/greenplum-free-single-node-edition/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>HadoopDB</title>
		<link>http://www.dbms2.com/2009/09/13/hadoopdb/</link>
		<comments>http://www.dbms2.com/2009/09/13/hadoopdb/#comments</comments>
		<pubDate>Sun, 13 Sep 2009 04:59:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Structured documents]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=890</guid>
		<description><![CDATA[Despite a thoughtful heads-up from Daniel Abadi at the time of his original posting about HadoopDB, I&#8217;m just getting around to writing about it now.  HadoopDB is a research project carried out by a couple of Abadi&#8217;s students.  Further research is definitely planned. But it seems too early to say that HadoopDB will [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Despite a thoughtful heads-up from Daniel Abadi at the time of <a href="http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html" onclick="javascript:pageTracker._trackPageview('/dbmsmusings.blogspot.com');">his original posting about HadoopDB</a>, I&#8217;m just getting around to writing about it now.  HadoopDB is a research project carried out by a couple of Abadi&#8217;s students.  Further research is definitely planned. But it seems too early to say that HadoopDB will ever get past the &#8220;research and oh by the way the code is open sourced&#8221; stage and become a real code line &#8212; whether commercialized, open source, or both.</p>
<p style="margin-bottom: 0in;">The basic idea of HadoopDB is to put copies of a DBMS at different nodes of a grid, and use Hadoop to parcel work among them. Major benefits when compared with massively parallel DBMS are said to be:</p>
<ul>
<li>Open/cheap/free</li>
<li><a href="http://www.dbms2.com/2009/09/13/fault-tolerant-queries/" >Query fault-tolerance</a></li>
<li><span style="font-style: normal;">The 	related concept of tolerating node degradation that isn&#8217;t an 	outright node failure.</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">HadoopDB has actually been built with PostgreSQL. That version achieved performance well below that of a commercial DBMS &#8220;DBX&#8221;, where X=2. Column-store guru Abadi has repeatedly signaled his intention to try out HadoopDB with </span><a href="http://www.dbms2.com/2009/08/04/vectorwise-ingres-and-monetdb/" >VectorWise</a><span style="font-style: normal;"> at the nodes instead.  (Recall that VectorWise is shared-everything.) It will be interesting to see how that configuration performs.</span></p>
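<p style="margin-bottom: 0in;">To make the basic idea concrete, here is a toy Python sketch of the pattern: each node runs an ordinary SQL aggregate over its own slice of the data, and a coordinator merges the partial results. It is not HadoopDB code. In-memory SQLite stands in for the per-node PostgreSQL instances, plain Python functions stand in for Hadoop, and the table, column, and function names are all hypothetical.</p>
<pre><code>import sqlite3


def make_node(rows):
    """Create one node: a private single-node database holding a slice of the data."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE visits (site TEXT, hits INTEGER)")
    db.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    return db


def local_query(db):
    """Map-like step: push the aggregation down into the node's own SQL engine."""
    return db.execute("SELECT site, SUM(hits) FROM visits GROUP BY site").fetchall()


def merge(partials):
    """Reduce-like step: combine the per-node partial sums into global totals."""
    totals = {}
    for partial in partials:
        for site, hits in partial:
            totals[site] = totals.get(site, 0) + hits
    return totals


nodes = [
    make_node([("a.com", 3), ("b.com", 1)]),
    make_node([("a.com", 2), ("c.com", 5)]),
]
totals = merge(local_query(db) for db in nodes)
</code></pre>
<p style="margin-bottom: 0in;">The point of the design is that the heavy lifting happens inside each node&#8217;s DBMS, while the framework only parcels out work and combines small partial results. Retrying a failed or slow node&#8217;s piece of the query is the framework&#8217;s job, and this toy does not attempt it.</p>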
<p style="margin-bottom: 0in;"><span style="font-style: normal;">The real opportunity for HadoopDB, however, in my opinion may lie elsewhere.<span id="more-890"></span> Rather than trying to compete with parallel relational DBMS, HadoopDB might do more good parallelizing more specialized kinds of database engines. How about, for example, a massively parallel XML manager to compete with MarkLogic? Or a massively parallel array processor other than the still-nascent </span><a href="http://www.dbms2.com/2009/09/12/xldb-scid/" >SciDB</a>? <span style="font-style: normal;">Or, even more to the point, something that parallelizes a yet-more-specialized scientific data management engine? That kind of area is where I suspect the potential for HadoopDB really lives.</span></p>
<p style="margin-bottom: 0in;">
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/13/hadoopdb/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>What could or should make Oracle/MySQL antitrust concerns go away?</title>
		<link>http://www.dbms2.com/2009/09/10/what-could-or-should-make-oraclemysql-antitrust-concerns-go-away/</link>
		<comments>http://www.dbms2.com/2009/09/10/what-could-or-should-make-oraclemysql-antitrust-concerns-go-away/#comments</comments>
		<pubDate>Thu, 10 Sep 2009 14:53:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Mid-range]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=879</guid>
		<description><![CDATA[When the Oracle/MySQL deal was first announced, I wrote:
I can probably come up with business practices that could make things very hard on Oracle/MySQL competitors &#8230; but I haven’t found a compelling antitrust trigger on my first pass over the subject.
Subsequently, there&#8217;s been a lot of discussion about whether or not Oracle can use control [...]]]></description>
			<content:encoded><![CDATA[<p>When the Oracle/MySQL deal was first announced, I <a href="http://www.dbms2.com/2009/04/20/should-the-oraclemysql-combo-face-antitrust-opposition/" >wrote</a>:</p>
<blockquote><p>I can probably come up with business practices that could make things very hard on Oracle/MySQL competitors &#8230; but I haven’t found a compelling antitrust trigger on my first pass over the subject.</p></blockquote>
<p>Subsequently, there&#8217;s been <a href="http://www.dbms2.com/2009/05/15/mysql-fork-open-database-alliance-gpl/" >a lot of</a> <a href="http://www.dbms2.com/2009/05/22/yet-more-on-mysql-forks-and-storage-engines/" >discussion</a> about whether or not Oracle can use control of MySQL to make life difficult for third-party MySQL storage engine vendors.</p>
<p>Now the European Commission <a href="http://www.nytimes.com/2009/09/04/technology/companies/04oracle.html" onclick="javascript:pageTracker._trackPageview('/www.nytimes.com');">is delaying the Oracle/Sun deal, explicitly because of Oracle/MySQL antitrust fears</a>.  That is, the European Commission wants to be reassured that an Oracle takeover of MySQL won&#8217;t unduly impinge upon the future availability of open source/low cost DBMS alternatives.  This raises the natural question:</p>
<p><strong>What could Oracle do to assure concerned parties that its ownership of MySQL won&#8217;t unduly hamper open-source-based DBMS competition?</strong></p>
<p>I think that&#8217;s indeed the crucial question. The Oracle/Sun deal has enough momentum at this point that it both should and will be allowed to happen &#8212; perhaps with safeguards &#8212; rather than banned outright. <strong>If  you have concerns about Oracle&#8217;s pending acquisition of MySQL, you should speak up and outline what kinds of regulatory safeguards would alleviate the problems you foresee.</strong></p>
<p>More or less obvious possibilities include:</p>
<ul>
<li><strong>Divest MySQL.</strong> This is obviously an extreme measure, but it surely would work.</li>
<li><strong>Provide some money and trademark rights to MySQL forkers.</strong> If MariaDB and Drizzle were put into strong competitive positions with MySQL today, it&#8217;s hard to see how regulators could object to any future maneuverings Oracle might envision with the GPLed side of MySQL.</li>
<li><strong>Offer a standard, attractive, long-term deal to MySQL bundlers. </strong>The commercial/non-GPL version of MySQL is a requirement for appliance vendors (surely), OEM vendors (probably), and storage engine vendors (maybe &#8212; I disagree, but I&#8217;m evidently in the minority).</li>
<li><strong>Strengthen PostgreSQL. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </strong> Realistically, that&#8217;s not going to be part of any Oracle/MySQL resolution, so I&#8217;ll leave it as a subject for another time.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/10/what-could-or-should-make-oraclemysql-antitrust-concerns-go-away/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Continuent on clustering</title>
		<link>http://www.dbms2.com/2009/09/03/continuent-on-clustering/</link>
		<comments>http://www.dbms2.com/2009/09/03/continuent-on-clustering/#comments</comments>
		<pubDate>Thu, 03 Sep 2009 13:46:56 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Continuent]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=876</guid>
		<description><![CDATA[Robert Hodges, CTO of my client Continuent, put up a blog post laying out his and Continuent&#8217;s views on database clustering. Continuent offers Tungsten, its third try at database clustering technology, targeted at MySQL, PostgreSQL, and perhaps Oracle. Unlike Continuent&#8217;s more ambitious, second-generation product, Tungsten offers single-master replication, which in Robert&#8217;s view allows for great [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Robert Hodges, CTO of my client Continuent, put up <a href="http://scale-out-blog.blogspot.com/2009/09/future-of-database-clustering.html" onclick="javascript:pageTracker._trackPageview('/scale-out-blog.blogspot.com');">a blog post</a> laying out his and Continuent&#8217;s views on database clustering. Continuent offers Tungsten, its third try at database clustering technology, targeted at MySQL, PostgreSQL, and perhaps Oracle. Unlike Continuent&#8217;s more ambitious, second-generation product, Tungsten offers single-master replication, which in Robert&#8217;s view allows for great ease of deployment and administration (he likes the phrase “bone-simple”).</p>
<p style="margin-bottom: 0in;">The downside to Continuent Tungsten&#8217;s stripped-down architecture is that it doesn&#8217;t solve the most extreme performance scale-out problems.  Instead, Continuent focuses on the other big benefits of keeping your data in more than one place, namely high availability and data loss prevention (i.e., backup).</p>
<p style="margin-bottom: 0in;">Continuent has been around for a number of years, starting out in Finland but now being based in Silicon Valley. For most purposes, however, it&#8217;s reasonable to think of Continuent and Tungsten as start-up efforts.</p>
<p style="margin-bottom: 0in;">As you might guess from the references to Finland and MySQL, Continuent&#8217;s products are open source, or at least have open source versions. I&#8217;m still a little fuzzy as to which features are open sourced and which are not. For that matter, I&#8217;m still unclear as to Tungsten&#8217;s feature list overall &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/03/continuent-on-clustering/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>What are the best choices for scaling Postgres?</title>
		<link>http://www.dbms2.com/2009/07/29/scaling-postgres-choices/</link>
		<comments>http://www.dbms2.com/2009/07/29/scaling-postgres-choices/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 06:16:02 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[EnterpriseDB and Postgres Plus]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=849</guid>
		<description><![CDATA[I have a client who wants to build a new application with peak update volume of several million transactions per hour.  (Their base business is data mart outsourcing, but now they&#8217;re building update-heavy technology as well.) They have a small budget.  They&#8217;ve been a MySQL shop in the past, but would prefer to contract [...]]]></description>
			<content:encoded><![CDATA[<p>I have a client who wants to build a new application with peak update volume of several million transactions per hour.  (Their base business is data mart outsourcing, but now they&#8217;re building update-heavy technology as well.) They have a small budget.  They&#8217;ve been a MySQL shop in the past, but would prefer to contract (not eliminate) their use of MySQL rather than expand it.</p>
<p>My client actually signed a deal for EnterpriseDB&#8217;s Postgres Plus Advanced Server and GridSQL, but unwound the transaction quickly. (They say EnterpriseDB was very gracious about the reversal.) There seem to have been two main reasons for the flip-flop.  First, it seems that EnterpriseDB&#8217;s version of Postgres isn&#8217;t up to PostgreSQL&#8217;s 8.4 feature set yet, although EnterpriseDB&#8217;s timetable for catching up might have been tolerable. But GridSQL apparently is further behind yet, with no timetable for up-to-date PostgreSQL compatibility.  That was the dealbreaker.</p>
<p>The current base-case plan is to use generic open source PostgreSQL, with scale-out achieved via hand sharding, Hibernate, or &#8230; ??? Experience and thoughts along those lines would be much appreciated.</p>
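<p>For readers unfamiliar with the term, hand sharding at its simplest means the application itself decides which PostgreSQL instance owns each row. The following is a minimal sketch of that routing step only, under stated assumptions: the connection strings are placeholders, <code>shard_for</code> is a hypothetical helper, and hashing the key with SHA-256 is just one reasonable choice, not anything my client has committed to.</p>
<pre><code class="language-python">import hashlib

# Hypothetical connection strings; host and database names are placeholders.
SHARDS = [
    "postgresql://db0.example.com/app",
    "postgresql://db1.example.com/app",
    "postgresql://db2.example.com/app",
    "postgresql://db3.example.com/app",
]


def shard_for(key, shards=SHARDS):
    """Map a sharding key (e.g. a customer ID) to one shard.

    A cryptographic digest is used instead of the built-in hash() so that
    every application process routes the same key to the same shard.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


if __name__ == "__main__":
    for customer_id in (1001, 1002, 1003):
        print(customer_id, shard_for(customer_id))
</code></pre>
<p>The catch with simple modulo routing is that adding a shard remaps most keys, so production schemes usually put a lookup table or consistent hashing between the key and the server. Cross-shard queries and transactions are the other hard part, and this sketch does not address them.</p>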
<p>Another option for OLTP performance and scale-out is of course memory-centric options such as <a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/" >VoltDB</a> or <a href="http://www.dbms2.com/2009/07/28/the-groovy-sql-switch/" >the Groovy SQL Switch</a>.  But this client&#8217;s database is terabyte-scale, so hardware costs could be an issue, as of course could be product maturity.</p>
<p>By the way, a large fraction of these updates will be actual changes, as opposed to new records, in case that matters.  I expect that the schema being updated will be very simple &#8212; i.e., clearly simpler than in a classic order entry scenario.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/07/29/scaling-postgres-choices/feed/</wfw:commentRss>
		<slash:comments>30</slash:comments>
		</item>
		<item>
		<title>Greenplum update &#8212; Release 3.3 and so on</title>
		<link>http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/</link>
		<comments>http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/#comments</comments>
		<pubDate>Fri, 05 Jun 2009 13:17:51 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Pricing]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=799</guid>
		<description><![CDATA[I visited Greenplum in early April, and talked with them again last night. As I noted in a separate post, there are a couple of subjects I won&#8217;t write about today. But that still leaves me free to cover a number of other points about Greenplum, including:

After much prodding, Greenplum 	finally gave me clear list [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I visited Greenplum in early April, and talked with them again last night. As I noted in <a href="http://www.dbms2.com/2009/06/05/greenplum-june-2009-announcements/" >a separate post</a>, there are a couple of subjects I won&#8217;t write about today. But that still leaves me free to cover a number of other points about Greenplum, including:<span id="more-799"></span></p>
<ul>
<li>After much prodding, Greenplum finally gave me clear <strong>list pricing.</strong> Greenplum <strong>perpetual licenses</strong> list at $16K/core or $70K/terabyte.  <strong>Annual maintenance</strong> is 22% of purchase price. Alternatively, one can buy an <strong>annual subscription</strong> on either basis, at 50% of the perpetual license purchase price.  Of course, that&#8217;s just list. Quantity discounts are <em>de rigueur.</em></li>
<li>Greenplum had 	<strong>about 65 paying customers</strong> at the end of Q1. I&#8217;ve forgotten how that jibes with a figure of <a href="http://www.dbms2.com/2008/08/25/greenplum-is-in-the-big-leagues/" >50 customers last August</a>.</li>
<li>Greenplum 	claims <strong>rich functionality in standard SQL.</strong> In particular, 	Greenplum says &#8220;lots&#8221; of customers are using SQL 2003 	OLAP.  Greenplum further says it has “comprehensive” SQL-92 and 	–99 support.</li>
<li>Greenplum 	Release 3.3 has  &#8220;more flexible&#8221; <strong>compression,</strong> which 	Greenplum bravely asserts is now fairly close to columnar 	compression in effectiveness. (Aster Data and other row-based 	vendors make similar claims.)</li>
<li>Greenplum Release 3.3 contains a few <strong>performance enhancements for analytics</strong>, fixing an OLAP edge case that wasn&#8217;t previously parallelized &#8212; relevant buzzwords include grouping, aggregates, and DISTINCT, apparently in combination with each other &#8212; and speeding up sorts.</li>
<li>Greenplum&#8217;s 	<strong>data loading </strong>story goes something like this:
<ul>
<li>Greenplum has 	an <em>external tables</em> facility that, in principle, could be used 	to index and query on tables outside Greenplum. It&#8217;s almost never 	actually used for that.  However, <em>external tables</em> is the main 	way to load data into Greenplum from another relational DBMS.</li>
<li>A huge benefit 	of loading Greenplum via <em>external tables </em>is that <strong>you can 	load in parallel without passing the data through the master node.</strong></li>
<li>Another benefit is that you can do <strong>ETL</strong> by building a view on 	the foreign database, then loading that view verbatim into 	Greenplum. (I guess this is an exception to Greenplum&#8217;s ELT 	orientation.)</li>
<li>In addition, Greenplum has something called <em>Scatter/Gather</em> which puts daemons on the hosts for flat files, allowing the <strong>files to be loaded into Greenplum in parallel.</strong></li>
<li>Like many data 	warehouse DBMS vendors, Greenplum tells you that if update volumes 	are high enough, you should bang them into something else and then 	feed the data warehouse in microbatches.  Greenplum&#8217;s 	recommendations for the &#8220;something else&#8221; are PostgreSQL or 	file systems. Apparently, this is happening at some Greenplum telco 	customers. In one case, latency is only 15 seconds.</li>
</ul>
</li>
<li>In general, 	Greenplum asserts that very little work is done at the Greenplum 	master node, and the Greenplum master node isn&#8217;t a bottleneck.</li>
<li>Greenplum 	proudly promises that its customers will never have to do 	dump/restore for any release, even the big more-than-point ones that 	only come around every few years.</li>
<li>Greenplum 	added some Greenplum-awareness features to the <strong>pgAdmin III Postgres 	administration tool,</strong> which seems to be the most widely used tool 	with Greenplum today.</li>
<li>Greenplum says 	it has <strong>10-gigabit switches</strong> running in the lab, but doesn&#8217;t need 	them. For now it&#8217;s sticking with its &#8220;handful of commodity 	1-gigabit switches&#8221; strategy.</li>
<li>Greenplum <strong>MapReduce</strong> news and 	commentary include:
<ul>
<li>I&#8217;ve only ever gotten a single 	clear example of Greenplum MapReduce production use. But multiple 	Greenplum users are actively developing in MapReduce, judging by 	their dialog with the company.</li>
<li>Greenplum 3.3 has some MapReduce 	ease-of-use/programming upgrades, in low-glitz areas such as 	error-handling.</li>
<li>Greenplum&#8217;s current MapReduce language support is: Perl, Python, R, and C. Java didn&#8217;t make it into Release 3.3.</li>
<li>Greenplum agrees with the 	MapReduce skeptics&#8217; claim that you can in principle do anything in 	UDFs (User-Defined Functions) you can in MapReduce, but believes 	that sometimes doing it in MapReduce turns out to be easier.</li>
</ul>
</li>
</ul>
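<p style="margin-bottom: 0in;">To put the list prices above in perspective, here is a rough sketch of how the perpetual and subscription options compare over time. It uses only the quoted list figures and makes two assumptions Greenplum did not confirm: that maintenance is billed every year starting with the first, and that the subscription needs no separate maintenance fee. The 32-core system is hypothetical.</p>
<pre><code class="language-python"># List prices as quoted above; actual deals are discounted.
PER_CORE = 16_000       # perpetual license, dollars per core
PER_TERABYTE = 70_000   # perpetual license, dollars per terabyte
MAINTENANCE_PCT = 22    # annual maintenance, percent of purchase price
SUBSCRIPTION_PCT = 50   # annual subscription, percent of perpetual price


def perpetual_cost(license_price, years):
    """Up-front license plus maintenance, assumed to be billed every year."""
    return license_price + license_price * MAINTENANCE_PCT * years // 100


def subscription_cost(license_price, years):
    """Annual subscription, assumed to need no separate maintenance fee."""
    return license_price * SUBSCRIPTION_PCT * years // 100


def breakeven_years():
    """Years after which the perpetual license becomes the cheaper option."""
    return 100 / (SUBSCRIPTION_PCT - MAINTENANCE_PCT)


if __name__ == "__main__":
    license_price = 32 * PER_CORE  # hypothetical 32-core system
    for years in (3, 4):
        print(years, perpetual_cost(license_price, years),
              subscription_cost(license_price, years))
    print(round(breakeven_years(), 1))
</code></pre>
<p style="margin-bottom: 0in;">Under those assumptions the crossover comes at about 3.6 years. For the 32-core example, three years cost $768,000 on subscription versus $849,920 perpetual, while four years cost $1,024,000 versus $962,560. Discounting will move the absolute numbers but not the crossover, provided both options are discounted by the same percentage.</p>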
<p style="margin-bottom: 0in;">
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Yet more on MySQL forks and storage engines</title>
		<link>http://www.dbms2.com/2009/05/22/yet-more-on-mysql-forks-and-storage-engines/</link>
		<comments>http://www.dbms2.com/2009/05/22/yet-more-on-mysql-forks-and-storage-engines/#comments</comments>
		<pubDate>Fri, 22 May 2009 06:15:34 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=790</guid>
		<description><![CDATA[The issue of MySQL forks and their possible effect on closed-source storage engine vendors continues to get attention.  The underlying question is:
Suppose Oracle wants to make life difficult for third-party storage engine vendors via its incipient control of MySQL?  Can the storage engine vendors insulate themselves from this risk by working with a MySQL fork?
As [...]]]></description>
			<content:encoded><![CDATA[<p>The issue of MySQL forks and their possible effect on closed-source storage engine vendors continues to get attention.  The underlying question is:</p>
<p><strong>Suppose Oracle wants to make life difficult for third-party storage engine vendors via its incipient control of MySQL?  Can the storage engine vendors insulate themselves from this risk by working with a MySQL fork?</strong></p>
<p><span id="more-790"></span>As laid out most clearly in a comment thread to a previous post*, Mike Hogan (CEO of ScaleDB) believes <a href="http://www.dbms2.com/2009/05/15/mysql-fork-open-database-alliance-gpl/" >closed-source storage engine vendors can use a MySQL fork without running afoul of the GPL</a>. In a nutshell, what he proposes is an in-between layer of software, itself open-sourced, that on one side interfaces with MySQL, and on the other side talks cleanly enough to storage engines that it doesn&#8217;t infect them with the GPL.</p>
<p><em>*For some reason, the identical comments have also appeared one by one on <a rel="nofollow" href="http://redmonk.com/sogrady/2009/05/14/open-database-allianc/" onclick="javascript:pageTracker._trackPageview('/redmonk.com');">a Stephen O&#8217;Grady blog post</a>, with links back to the original thread. I did not actually post the comments attributed to me, so I presume there&#8217;s some <a href="http://twitter.com/sogrady/status/1871754631" onclick="javascript:pageTracker._trackPageview('/twitter.com');">automated</a> process going on.</em></p>
<p>The most natural way for such software to be created would obviously be in connection with the new Open Database Alliance.  So <a href="http://blogs.the451group.com/opensource/2009/05/21/are-closed-source-mysql-storage-engines-compatible-with-mariadb/" onclick="javascript:pageTracker._trackPageview('/blogs.the451group.com');">Matthew Aslett of the 451 Group</a> asked the ODA&#8217;s two founding CEOs &#8212; Peter Zaitsev of Percona and Monty Widenius of Monty Program AB &#8212; what they thought on the subject.  He got rather different-sounding answers.  Zaitsev in effect said &#8220;Yes, Mike Hogan&#8217;s idea probably works, but one never knows for sure when there are lawyers involved.&#8221; Widenius in effect said &#8220;Nope. A license would have to be purchased from MySQL.&#8221;</p>
<p>On a first reading that all looks discouraging, but let&#8217;s probe further. In particular, let&#8217;s invoke the open source community&#8217;s famous distinction between two kinds of freedoms: <strong>&#8220;Free as in speech and free as in beer.&#8221; </strong>&#8220;Free as in speech&#8221; takes care of most technical fears &#8212; <strong>there&#8217;s no way Oracle can directly stop forkers from creating their own version of MySQL. </strong></p>
<p>True, third-party storage vendors might have to compete with Oracle&#8217;s own storage engines, where Oracle has four kinds of competitive advantage:</p>
<ul>
<li>General business clout</li>
<li>A whole lot of database development expertise</li>
<li>The opportunity to build tight hooks between MySQL&#8217;s generic front end and Oracle&#8217;s own preferred back end(s)</li>
<li>Alternatively, the opportunity to foot-drag on MySQL development and thus sabotage the third-party storage engine market altogether</li>
</ul>
<p>But the first two of those points are exactly what independent DBMS vendors already have to deal with when they compete with Oracle, many of them quite successfully (especially in the analytic DBMS market). Ditto, really, for the third one. And the fourth is exactly what forking takes care of.</p>
<p>The situation for &#8220;Free as in beer&#8221; is not quite as clean.  Could Oracle successfully charge a financial &#8220;tax&#8221; on every closed-source MySQL storage engine sale, prohibitive or otherwise &#8212; even the ones running with forked MySQL rather than Oracle&#8217;s code line?  Mike Hogan says No, Monty Widenius says Yes, and Peter Zaitsev isn&#8217;t sure. I find Hogan&#8217;s argument fairly persuasive, but he and I are probably in the minority.</p>
<p>So let&#8217;s suppose Widenius&#8217; pessimistic view is correct. Right now it seems that MySQL charges a non-prohibitive tax those engine vendors are perfectly happy to pay.  If Oracle made drastic increases in those charges, it could face all sorts of PR, business, and even legal adverse consequences. So <strong>the risk that Oracle cripples the MySQL storage engine vendors strictly through licensing fees seems relatively low.</strong></p>
<p>One last question complicates all this even further &#8212; <strong>why would the storage engine vendors want to rely on MySQL-compatible front ends indefinitely anyway?</strong> The whole point of specialized storage engines is to do things very differently from generic MySQL, so I&#8217;m not clear on what kind of user technical skills argument there is.  For most uses, the Postgres interface is as good as or better than MySQL&#8217;s. Perhaps a third open source front-end flavor could eventually become popular.  And by the way, Monty Widenius himself <a href="http://monty-says.blogspot.com/2009/05/open-database-alliance-founded.html?showComment=1242286980000#c1484957191355306994" onclick="javascript:pageTracker._trackPageview('/monty-says.blogspot.com');">wrote</a>:</p>
<blockquote><p>The reason we decided on the name &#8220;Open Database Alliance&#8221; was to be able to include companies and people working on all other open source database in the Alliance.</p>
<p>This is work in progress. We will make this clear on the Alliance web site ASAP.</p></blockquote>
<p>Interesting times indeed.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/05/22/yet-more-on-mysql-forks-and-storage-engines/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
	</channel>
</rss>
