<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; MySQL</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/mysql/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 02 Sep 2010 09:06:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>More on NoSQL and HVSP (or OLRP)</title>
		<link>http://www.dbms2.com/2010/08/26/nosql-hvsp-olrp/</link>
		<comments>http://www.dbms2.com/2010/08/26/nosql-hvsp-olrp/#comments</comments>
		<pubDate>Thu, 26 Aug 2010 09:10:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[Basho and Riak]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Clustrix]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Riptano]]></category>
		<category><![CDATA[Schooner]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Tokutek]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2907</guid>
		<description><![CDATA[Since posting last Wednesday morning that I&#8217;m looking into NoSQL and HVSP, I&#8217;ve had a lot of conversations, including with (among others):

Dwight Merriman of 10gen (MongoDB)
Damien Katz of Couchio (CouchDB)
Matt Pfeil of Riptano (Cassandra)
Todd Lipcon of Cloudera (HBase committer)
Tony Falco of Basho (Riak)
John Busch of Schooner
Ori Herrnstadt of Akiban

By no means do I have time [...]]]></description>
			<content:encoded><![CDATA[<p>Since posting last Wednesday morning that <a href="http://www.dbms2.com/2010/08/18/nosql-hvsp-adoption/" >I&#8217;m looking into NoSQL and HVSP</a>, I&#8217;ve had a lot of conversations, including with (among others):</p>
<ul>
<li>Dwight Merriman of 10gen (MongoDB)</li>
<li>Damien Katz of Couchio (CouchDB)</li>
<li>Matt Pfeil of <a href="http://www.dbms2.com/2010/07/06/riptano-and-cassandra-adoption/" >Riptano</a> (Cassandra)</li>
<li>Todd Lipcon of Cloudera (HBase committer)</li>
<li>Tony Falco of Basho (Riak)</li>
<li>John Busch of Schooner</li>
<li>Ori Herrnstadt of <a href="http://www.dbms2.com/2010/04/03/akiban-highlights/" >Akiban</a></li>
</ul>
<p><span id="more-2907"></span>By no means do I have time to do these conversations justice, in terms of giving them the write-ups and/or immediate follow-up that they deserve. Indeed, I&#8217;ll leave for vacation Saturday morning with my 2000-word NoSQL article still unwritten. So I&#8217;ll dump as many observations as I can into one or a few posts now, and play catch-up later as circumstances allow.</p>
<p>In no particular order:</p>
<ul>
<li>A number of NoSQL offerings have had more uptake to date than most of the scale-out SQL offerings have.</li>
<li>&#8220;Document-oriented&#8221; NoSQL projects CouchDB and MongoDB have probably had the most users get into production, but perhaps for pretty small systems.</li>
<li>Cassandra and HBase &#8212; the column-group-architecture guys &#8212; have probably had the most bang-in-lots-of-writes <a href="http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/" >HVSP</a> production uptake.*</li>
<li>I didn&#8217;t talk customer count with Schooner, but the decently-stocked <a href="http://www.schoonerinfotech.com/customers" onclick="javascript:pageTracker._trackPageview('/www.schoonerinfotech.com');">Schooner customer page</a> suggests Schooner may be something of an exception to these generalities.</li>
<li>A lot of these companies are in the low-to-mid-teens of employees.</li>
<li>The SQL-oriented companies, despite having fewer or no customers, often seem to have more money. (One reason I get the impression SQL guys have more money is, frankly, that more  of them are talking about engaging <a href="http://www.monash.com/advantage.html" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">my services</a>.)
<ul>
<li>Schooner cites $20 million in VC.</li>
<li><a href="http://www.dbms2.com/2010/05/12/the-clustrix-story/" >Clustrix</a> cites a figure close to that.</li>
<li>Basho cites $10 million, plus <a href="http://www.masshightech.com/stories/2010/08/02/daily35-Basho-rejects-VC-takes-late-friends-and-family-round.html" onclick="javascript:pageTracker._trackPageview('/www.masshightech.com');">a new round of $1.5 or $2 or $2.5 million</a>. The new round is at a  lowered valuation.</li>
<li>That same site says <a href="http://www.dbms2.com/2009/04/16/introduction-to-tokutek/" >Tokutek</a> finally was able to <a href="http://www.masshightech.com/stories/2010/08/16/daily47-Database-software-firm-Tokutek-lands-28M.html" onclick="javascript:pageTracker._trackPageview('/www.masshightech.com');">raise some VC</a>. Congrats!</li>
</ul>
</li>
<li>It&#8217;s only a two-company trend, but I was pleased to hear that both 10gen/MongoDB and Akiban were seeing Drupal as a major use case or potential use case. No word on rescuing WordPress from its MySQL implementation, alas, but it seems that a Drupal site typically has 40-200+ tables, while a WordPress one has 10ish.</li>
<li>Another trend I think I&#8217;m seeing is serious object-oriented apps banging things straight into a simple back end. <a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >Workday</a> is a huge example of that. Akiban hopes to do something similar with Hibernate.</li>
<li>Stability and maturity are still issues for many of these products. E.g., HBase isn&#8217;t even in Release 1.0 yet. Ditto Cassandra, and surely many of the others. Unsurprisingly, <a href="http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html" onclick="javascript:pageTracker._trackPageview('/blog.mikiobraun.de');">making Cassandra stable is still a challenge</a>.</li>
</ul>
<p><em>*As is common for terms I suggest, the &#8220;HVSP&#8221; name is not getting any traction. What do you think of Marton Trencseni&#8217;s suggestion of <a href="http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/#comment-182138" >OLRP, for OnLine Request Processing</a>?</em></p>
<p>One thing that makes following this area interesting is that so many projects are open source, which puts a lot of information out in the wild. I hardly have time to read the mailing list for each project; but the people I talk with do, and often they sorta kinda remember something somebody else posted one or several months back. As just one example, the mailing lists are said to confirm:</p>
<ul>
<li>Contrary to rumor, <a href="http://twitter.com/eventcloudpro/status/17872687577" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Facebook hasn&#8217;t moved in-box search off of Cassandra</a>.</li>
<li>Apparently, however, it&#8217;s true that <a href="http://www.dbms2.com/2008/07/21/project-cassandra-facebook-open-sourced-quasi-dbms/" >Cassandra inventor Facebook</a> has stopped working on Cassandra, and Facebook&#8217;s core Cassandra developers have shifted over to HBase.</li>
</ul>
<p>Also, figuring out usage of open source software can be &#8230; interesting.</p>
<ul>
<li> People who use open source software don&#8217;t have to reveal themselves, as there&#8217;s no purchase transaction to kick things off.</li>
<li>On the other hand, if they&#8217;re serious enough in their use, they often do.
<ul>
<li>There are two main ways to get tech support for open source software &#8212; the community or a company that sells support &#8212; and both ways let the main support-selling company know that one is a user.</li>
<li>Some folks even add themselves to open lists of users, for example these rather long lists for <a href="http://wiki.apache.org/hadoop/Hbase/PoweredBy" onclick="javascript:pageTracker._trackPageview('/wiki.apache.org');">HBase</a> and <a href="http://wiki.apache.org/couchdb/CouchDB_in_the_wild" onclick="javascript:pageTracker._trackPageview('/wiki.apache.org');">CouchDB</a>.</li>
<li>Or they show up at conferences. For example, <a href="http://twitter.com/spyced/status/21490457839" onclick="javascript:pageTracker._trackPageview('/twitter.com');">two</a> <a href="http://twitter.com/spyced/status/21675203015" onclick="javascript:pageTracker._trackPageview('/twitter.com');">tweets</a> from Riptano founder Jonathan Ellis suggest at least 30 production Cassandra users were represented at a recent event. That&#8217;s more detail than his colleague Matt Pfeil wanted to give me when we talked. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ul>
</li>
</ul>
<p>OK. This post has gotten pretty long, even without me saying anything resembling an overview of any of the seven companies I listed up top, or of their products&#8217; adoption. So I&#8217;ll just publish this now, and edit in links below to follow-on posts if and when they become available.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/26/nosql-hvsp-olrp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How should somebody teach themselves database and programming skills?</title>
		<link>http://www.dbms2.com/2010/07/29/how-should-somebody-teach-themselves-programming-skills/</link>
		<comments>http://www.dbms2.com/2010/07/29/how-should-somebody-teach-themselves-programming-skills/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 07:36:57 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Microstrategy]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open source]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2677</guid>
		<description><![CDATA[From time to time,  I get in a conversation with somebody who is:

Unemployed, underemployed, or otherwise desirous of having more commercial skills.
Not a programmer, but desirous of having some technical skills.
Astute enough to realize s/he will never be a serious techie.

I generally have two models in mind when guiding such a person:

Analytics/business intelligence/stats.
Website building.

Those are [...]]]></description>
			<content:encoded><![CDATA[<p>From time to time,  I get in a conversation with somebody who is:</p>
<ul>
<li>Unemployed, underemployed, or otherwise desirous of having more commercial skills.</li>
<li>Not a programmer, but desirous of having some technical skills.</li>
<li>Astute enough to realize s/he will never be a serious techie.</li>
</ul>
<p>I generally have two models in mind when guiding such a person:</p>
<ul>
<li>Analytics/business intelligence/stats.</li>
<li>Website building.</li>
</ul>
<p>Those are both useful skill sets for people who aren&#8217;t full-time techies, the first perhaps best for those who are more quantitative and big-company-friendly, the second perhaps better for the creative and/or rebellious types.</p>
<p>So what SPECIFICALLY should one guide them to do? My initial thoughts include:  <span id="more-2677"></span></p>
<ul>
<li>Learning Java is overkill for most of these people.</li>
<li>Learning C++ is overkill for ALL of these people. If you&#8217;re not out to be a hardcore engineer, the &#8220;advantages&#8221; of C++ over Java are pointless.</li>
<li>They all should learn some SQL.</li>
<li>MySQL is the most accessible DBMS against which to learn SQL. They should download a (free) copy and install it on their PC.</li>
<li>But I have no idea which books or websites they should use to learn SQL.</li>
<li>While at first blush it sounds like overkill, downloading and installing the free version of Microstrategy 9 is a good way to learn about BI and also the analytic side of SQL.</li>
<li>The first thing you learn in an app dev tool used to be and probably still is how to do a master-detail form. That would cover the other side of learning SQL. But what would be a good choice of tool? (Preferably free, as building serious OLTP apps is probably not what these people will want to do.)</li>
<li>One idea I had is that the website-oriented ones should learn how to modify WordPress, by which I really mean modifying WordPress themes. That would involve learning PHP, SQL, and HTML/CSS, which seems like a great place to start.</li>
<li>But I have no idea which books or websites they should use to learn PHP.</li>
<li>I also have no idea which books or websites they should use to learn CSS &#8212; or for that matter even basic HTML.</li>
<li>If they want to take the analytics route, I assume R is the way to go. Thoughts?</li>
<li>Python isn&#8217;t the ideal language for much of anything, but it&#8217;s an easily accessible &#8220;first language&#8221;. Umm, is that a good way to go, or would PHP be a better choice?</li>
<li>Any other ideas?</li>
</ul>
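<p>For anybody unfamiliar with the term, a master-detail form shows one parent record (the master) together with its child rows (the details). Here is a minimal sketch of the SQL underneath one. To keep it self-contained it uses Python&#8217;s built-in sqlite3 module rather than MySQL, and the table and column names are hypothetical, chosen only for illustration.</p>

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL
    );
    CREATE TABLE order_lines (
        line_id  INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(order_id),
        item     TEXT NOT NULL,
        quantity INTEGER NOT NULL
    );
    INSERT INTO orders VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO order_lines VALUES
        (1, 1, 'widget', 3),
        (2, 1, 'gadget', 1),
        (3, 2, 'widget', 5);
    """
)


def master_detail(order_id):
    """Return the master row and its detail rows for one order."""
    master = conn.execute(
        "SELECT order_id, customer FROM orders WHERE order_id = ?",
        (order_id,),
    ).fetchone()
    details = conn.execute(
        "SELECT item, quantity FROM order_lines "
        "WHERE order_id = ? ORDER BY line_id",
        (order_id,),
    ).fetchall()
    return master, details
```

<p>Calling master_detail(1) fetches the order row plus its lines, which is roughly what a form tool would run each time the user moves to a new master record.</p>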
<p>For anybody who pitches in &#8212; thanks!! I hope to get enough useful answers so as to keep editing this post with people&#8217;s ideas.</p>
<p><em>Edit:</em> Suggestions have started to come in on Twitter. A couple of folks are saying that HTML is a good place to start. Hard to argue with that, although it&#8217;s hardly where one should finish. There also was <a href="http://twitter.com/labsji/status/19813118791" onclick="javascript:pageTracker._trackPageview('/twitter.com');">a vote for Yahoo YQL</a>, and of course for a vendor&#8217;s own product.</p>
<p>Some great points are in the comments below, including the idea that you should pick an actual, fun, small project to build to get you started. (A site built in WordPress or Mambo would be a pretty obvious choice for such a project, come to think of it.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/29/how-should-somebody-teach-themselves-programming-skills/feed/</wfw:commentRss>
		<slash:comments>32</slash:comments>
		</item>
		<item>
		<title>dbShards &#8212; a lot like an MPP OLTP DBMS based on MySQL or PostgreSQL</title>
		<link>http://www.dbms2.com/2010/07/28/dbshards/</link>
		<comments>http://www.dbms2.com/2010/07/28/dbshards/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 09:39:11 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2662</guid>
		<description><![CDATA[I talked yesterday w/ Cory Isaacson, who runs CodeFutures, makers of dbShards.  dbShards is a software layer that turns an ordinary DBMS (currently MySQL or PostgreSQL) into an MPP shared-nothing ACID-compliant OLTP DBMS. Technical highlights included:  

Despite heavy emphasis on the 	word “sharding,” dbShards&#8217;s scale-out is transparent to the 	application programmer. E.g., in dbShards [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I talked yesterday w/ Cory Isaacson, who runs CodeFutures, makers of dbShards.  dbShards is a software layer that turns an ordinary DBMS (currently MySQL or PostgreSQL) into an MPP shared-nothing ACID-compliant OLTP DBMS. Technical highlights included:  <span id="more-2662"></span></p>
<ul>
<li>Despite heavy emphasis on the 	word “sharding,” dbShards&#8217;s scale-out is transparent to the 	application programmer. E.g., in dbShards + MySQL, the APIs are more 	or less the same ones you&#8217;d expect for MySQL (JDBC, etc.)</li>
<li>If the DBMS underneath is 	ACID-compliant (e.g., MySQL + InnoDB), then the dbShards version is 	ACID-compliant too.</li>
<li>Beyond those basics, I forgot to 	check the fine details of dbShards&#8217; MySQL (or PostgreSQL) syntax 	support. <a href="http://highscalability.com/blog/2010/6/23/product-dbshards-share-nothing-shard-everything.html" onclick="javascript:pageTracker._trackPageview('/highscalability.com');">Todd 	Hoff, however, did not forget</a>.</li>
<li>dbShards keeps copies of each 	shard on two different servers, via asynchronous log-shipping. This 	allows for failover in both planned and unplanned outages.</li>
<li>dbShards wants you to distribute 	big tables among shards via a “shard key,” which is a lot like 	the distribution key in MPP analytic DBMS. You&#8217;re encouraged to 	replicate small, low-update-volume tables across each shard.</li>
<li>Cory says that dbShards has good 	join performance when – you guessed it! – everything being joined 	is co-located shard-by-shard, because the tables were distributed on 	the same shard key and/or replicated across each shard. Cory can&#8217;t 	imagine why you&#8217;d want to do an inner join under any other 	circumstances.</li>
<li>The basic dbShards query execution model is: A query comes in; it&#8217;s parsed; a shard key is automagically detected (one hopes); the “global configuration file” is checked to see which shard to ship the work off to. I forgot to ask whether lookup was done via a hash table (the obvious guess) or something else. The programmer can put hints in the code comments to direct the sharding, but Cory asserts those aren&#8217;t needed very often.</li>
<li>Cory says that insert performance 	with dbShards + MySQL + InnoDB is 1500-3000 inserts per shard per 	second, scaling almost linearly with the number of shards. I forgot 	to ask how many shards this had been tested for.</li>
<li>If you want blazing dbShards performance, Cory&#8217;s base-case figure is 25 gigabytes of data per node, so that the most commonly used indexes can camp out in memory. (I forgot to ask what kind of hardware he was assuming per node.) This is if you&#8217;re going to be doing joins or aggregations. If it&#8217;s just single-row inserts and updates, or if your performance requirements are lower, you can go with 10X that figure.</li>
<li>Cory tells stories wherein going 	from an unsharded database to 4 or so shards took database 	re-indexing time down 50X or more.  Apparently, such tasks can be 	exponential or even super-exponential with database size over 	InnoDB. (That said, I&#8217;d be surprised if all large InnoDB users 	suffered from that problem to the same degree.)</li>
<li>dbShards&#8217; customer workloads are 	all &gt;= 50% reads. This is reflective of dbShards&#8217; design 	priorities.</li>
<li>As long as it can be in charge, 	dbShards is happy to interface to whatever kind of database backup 	software you want to use on a node by node basis. (dbShards wants to 	drive your backup software for you so that it can be sure the 	replicas are handled properly.)</li>
<li>It&#8217;s “fairly common” for 	dbShards to be paired with memcached. I forgot to ask whether 	memcached typically lived on its own pool of servers, or on the same 	pool that runs dbShards.</li>
<li>Future DBMS options under 	consideration for dbShards include Oracle and (unspecified) 	in-memory.</li>
</ul>
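<p>To make the shard-key routing described above concrete, here is a minimal sketch. It is not dbShards code. It assumes the hash-based lookup that I called the obvious guess, and it also assumes that a statement with no detectable shard key fans out to every shard. The shard names, the table names, and the functions shard_for and route are all hypothetical.</p>

```python
import hashlib

# Hypothetical stand-in for the "global configuration file":
# the ordered list of shards that big tables are distributed across.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]

# Hypothetical small, low-update-volume tables replicated on every shard.
REPLICATED_TABLES = {"country_codes"}


def shard_for(shard_key):
    """Map a shard key value to one shard with a stable hash (assumed scheme)."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def route(table, shard_key=None):
    """Return the list of shards a single-table statement would be sent to."""
    if table in REPLICATED_TABLES:
        # Every shard holds a full copy, so any one shard can answer a read.
        return [SHARDS[0]]
    if shard_key is None:
        # Assumption: with no shard key detected, the work goes to all shards.
        return list(SHARDS)
    return [shard_for(shard_key)]
```

<p>Because the shard depends only on the key value, route("orders", 1001) and route("order_lines", 1001) pick the same single shard. That is the co-location that lets a join on the shard key run shard-by-shard.</p>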
<p style="margin-bottom: 0in;">Business highlights for CodeFutures and dbShards include:</p>
<ul>
<li>dbShards&#8217; price is 	$5000/server/year, including support and OEMed MySQL, with stated 	quantity discounts up to 40%.</li>
<li>dbShards cloud pricing is 	different (on a usage basis).</li>
<li>dbShards has 6 or so customers, 	half each on-premises and in the cloud. One of them is Facebook. (Those &#8220;100s&#8221; of customers mentioned on the dbShards website are for a fairly unrelated product.)</li>
<li>CodeFutures has been at this 2 ½ 	years or so. There is no venture capital in the company.</li>
<li>Early dbShards deals have evidently involved a fair amount of professional services.</li>
<li>Counting contractors, CodeFutures has 10-12 people, a headcount that has been as high as 15.</li>
<li>Target dbShards customers are as you&#8217;d expect. Cory says he&#8217;s actually been more successful getting early-adopter money out of Web companies than out of Wall Street firms.</li>
<li>There are a couple of dbShards 	PostgreSQL customers for greenfield applications. Most dbShards 	customers and prospects, however, are looking to scale out existing 	apps.</li>
<li>Despite its connection to open source DBMS, there&#8217;s nothing open source about dbShards itself.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/28/dbshards/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Yet more on the GPL, WordPress themes, and the implications for MySQL storage engines</title>
		<link>http://www.dbms2.com/2010/07/23/gpl-wordpress-themes-mysql-storage-engines/</link>
		<comments>http://www.dbms2.com/2010/07/23/gpl-wordpress-themes-mysql-storage-engines/#comments</comments>
		<pubDate>Fri, 23 Jul 2010 04:53:56 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open source]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2616</guid>
		<description><![CDATA[The debate I wrote about a few days ago over whether or not the WordPress theme called Thesis needed to be GPLed has been resolved in practice &#8211; it will be. More precisely, the parts that WordPress developers and the Free Software Foundation said need to be GPLed will be GPLed, while the rest won&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>The debate I wrote about a few days ago over <a href="http://www.dbms2.com/2010/07/17/mysql-gpl-storage-engine-wordpress-theme/" >whether or not the WordPress theme called Thesis needed to be GPLed</a> has been resolved in practice &#8211; it will be. More precisely, <a href="http://thenextweb.com/socialmedia/2010/07/22/wordpress-vs-thesis-the-battle-is-over/" onclick="javascript:pageTracker._trackPageview('/thenextweb.com');">the parts that WordPress developers and the Free Software Foundation said need to be GPLed will be GPLed</a>, while the rest won&#8217;t be, those parts being, in essence, the more &#8220;artistic&#8221; elements.</p>
<p>A consensus seems to have emerged that Thesis had actually copied beyond-fair-use amounts of WordPress code, which if true was Game Over. Beyond that, however, both sides of the strongly-viral-GPL debate scored some points.  <span id="more-2616"></span></p>
<ul>
<li>Public pressure, FUD, etc. in favor of the GPL clearly were successful.</li>
<li>Mark Jaquith carefully explained <a href="http://markjaquith.wordpress.com/2010/07/17/why-wordpress-themes-are-derivative-of-wordpress/" onclick="javascript:pageTracker._trackPageview('/markjaquith.wordpress.com');">how tightly integrated WordPress and WordPress themes are</a>. So we don&#8217;t necessarily have much of a precedent for more hands-off integration.</li>
<li>Previous precedents were dredged up on both sides. More on that below.</li>
<li>Once again, <strong>the claims about the GPL were not tested in court.</strong> Various open source developers and lawyers have stated their opinions as to what their rights are, but the courts were not given the opportunity in this case to (dis)agree on <strong>the core issue</strong> &#8230;</li>
<li>&#8230; which everybody seems to be correctly agreeing is: <strong>&#8220;When is one software program a &#8216;derivative work&#8217; of another one, in the copyright-law sense of &#8216;derivative work&#8217;?&#8221;</strong></li>
</ul>
<p>The pro-GPL argument on that last point would probably boil down, colloquially, to &#8220;Well, is it one program or two? If it&#8217;s one, then it clearly is derivative of the big part that&#8217;s copyrighted under the GPL. If it&#8217;s two, then the creator of the second one is home free &#8212; but c&#8217;mon, now, it&#8217;s really just one.&#8221; That, in turn, would be supported by Dan Weinreb&#8217;s point from <a href="http://www.dbms2.com/2009/04/21/i-dont-see-why-the-gpl-would-be-a-major-barrier-to-a-useful-mysql-fork/" >a previous MySQL storage engine/GPL comment thread</a> (which also has a lot of applicability to the WordPress theme case) &#8211; <strong>you really don&#8217;t want to make the user do two installs, so you really do want to include the GPLed code in your package.</strong> And by the way &#8212; so far the product packaging by the MySQL storage engine vendors* is in line with Dan&#8217;s observation.</p>
<p><em>*Infobright, Akiban, Tokutek, Calpont, et al.</em></p>
<p>One point I haven&#8217;t seen discussed much yet is this:</p>
<p><strong>Suppose a MySQL storage engine vendor integrated with a forked, GPLed MySQL, and then didn&#8217;t obey the GPL. Who would have standing to sue them? </strong>It&#8217;s obvious that the developers of the forked MySQL would. But it&#8217;s not at all obvious that Oracle would. A derivative of a derivative of a copyrighted work is NOT necessarily a derivative of the original. (Think about it.) Unless Oracle could prove that the MySQL storage engine really did happen to be a derivative of MySQL Classic, I don&#8217;t know why Boogeyman Oracle would have standing to sue.</p>
<p>Finally, those precedents.</p>
<ul>
<li>A couple of comments on <a href="http://www.dbms2.com/2010/07/17/mysql-gpl-storage-engine-wordpress-theme/" >my earlier post</a> point out that the Linux community tends to be pretty tolerant of proprietary code that links tightly into Linux. Oh, they may not like it &#8212; but in most cases they neither sue nor believe they successfully could.</li>
<li>Wikipedia cites <a href="http://en.wikipedia.org/wiki/GPL#The_GPL_in_court" onclick="javascript:pageTracker._trackPageview('/en.wikipedia.org');">some cases in which the GPL has been successfully enforced</a>.</li>
<li>One of the earliest GPL controversies was over what sounds like a MySQL storage engine &#8212; <a href="http://www.nusphere.com/products/library/gemini.pdf" onclick="javascript:pageTracker._trackPageview('/www.nusphere.com');">Progress Software&#8217;s NuSphere</a>, developed in connection with MySQL AB. Litigation ensued, and before the case was settled, the judge wrote &#8220;<a href="http://pacer.mad.uscourts.gov/dc/opinions/saris/pdf/progress%20software.pdf" onclick="javascript:pageTracker._trackPageview('/pacer.mad.uscourts.gov');">After hearing, MySQL seems to have the better argument here, but the matter is one of fair dispute.</a>&#8221;</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/23/gpl-wordpress-themes-mysql-storage-engines/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>New insights into the GPL vs. MySQL storage engine debates</title>
		<link>http://www.dbms2.com/2010/07/17/mysql-gpl-storage-engine-wordpress-theme/</link>
		<comments>http://www.dbms2.com/2010/07/17/mysql-gpl-storage-engine-wordpress-theme/#comments</comments>
		<pubDate>Sun, 18 Jul 2010 02:12:45 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2604</guid>
		<description><![CDATA[Around the time of Oracle&#8217;s acquisition of Sun and hence MySQL, there was a lot of discussion as to whether MySQL&#8217;s GPL license could inhibit MySQL storage engine vendors from selling their products without MySQL code (e.g., with MySQL-fork front-ends).  I argued No. Most people, however, seemed to think &#8220;Yes, and even if the matter [...]]]></description>
			<content:encoded><![CDATA[<p>Around the time of Oracle&#8217;s acquisition of Sun and hence MySQL, there was a lot of discussion as to whether MySQL&#8217;s GPL license could inhibit MySQL storage engine vendors from selling their products without MySQL code (e.g., with MySQL-fork front-ends).  I <a href="http://www.dbms2.com/2009/04/21/i-dont-see-why-the-gpl-would-be-a-major-barrier-to-a-useful-mysql-fork/" >argued</a> <a href="http://www.dbms2.com/2009/05/22/yet-more-on-mysql-forks-and-storage-engines/" >No</a>. Most people, however, seemed to think &#8220;Yes, and even if the matter isn&#8217;t clear, the threat of nasty lawyers creates enough FUD to be a practical market problem for the storage engine vendors.&#8221; Based on those concerns, I eventually took the position that <a href="http://www.dbms2.com/2009/09/10/what-could-or-should-make-oraclemysql-antitrust-concerns-go-away/" >Oracle should be inhibited for antitrust reasons from invoking its real or alleged GPL rights to mess with the MySQL storage engine vendors</a>. Oracle&#8217;s agreement with the EU <a href="http://www.dbms2.com/2009/12/14/oracle-mysql-storage-engine/" >alleviated that concern</a>, except that there was an annoying time limit on the alleviation.</p>
<p>Now a similar can of worms has been opened in a related technology area &#8212; <strong>WordPress</strong> and <strong>WordPress themes</strong>. Since many bloggers use WordPress, this has gotten a lot of attention, and some interesting new insights have emerged. <span id="more-2604"></span></p>
<p><em>Um, in case you didn&#8217;t know: WordPress is the software that runs blogs such as this, and it&#8217;s a GPLed open source project. However, the user interface &#8212; look, feel, and behavior alike &#8212; is determined by separate</em> themes <em>that one usually gets from third parties (WordPress ships with a few default choices).</em></p>
<p>It started when Matt Mullenweg went after the makers of an unfree theme, Thesis, wielding <a href="http://wordpress.org/news/2009/07/themes-are-gpl-too/" onclick="javascript:pageTracker._trackPageview('/wordpress.org');">a legal opinion from the Software Freedom Law Center</a>. The gist of the SFLC&#8217;s argument seems to be:</p>
<blockquote><p>They are derivative of WordPress because every part of them is determined by the content of the WordPress functions they call. As works of authorship, they are designed only to be combined with WordPress into a larger work.</p></blockquote>
<p>And of course the point of the GPL is that if you create a derivative work of something GPLed, you have to GPL it yourself.</p>
<p>However, <em>Perpetual Beta</em> pointed out that, under the rules of copyright law as expressed in a court case known as Galoob, <a href="http://perpetualbeta.com/release/2009/11/why-the-gpl-does-not-apply-to-premium-wordpress-themes/" onclick="javascript:pageTracker._trackPageview('/perpetualbeta.com');">depending on another program does not make something a derivative work</a>. This is actually blindingly obvious, as in the example of any program that runs on top of an operating system. Or for more examples see <a href="http://scripting.com/stories/2010/07/16/areWordpressThemesNecessar.html" onclick="javascript:pageTracker._trackPageview('/scripting.com');">Dave Winer</a> on the point.</p>
<p><em>Perpetual Beta</em> further argued that, even if it were a derivative work, <a href="http://perpetualbeta.com/release/2009/11/why-the-gpl-does-not-apply-to-premium-wordpress-themes/" onclick="javascript:pageTracker._trackPageview('/perpetualbeta.com');">fair use would let one copy it anyway</a>. I.e., if you&#8217;re engaging in &#8220;fair use,&#8221; you&#8217;re entitled to do what otherwise would be a copyright violation. Good point. The GPL license says in effect &#8220;You only are allowed to use this material (in certain ways) if you do as we say about your own work,&#8221; so that is defeated if the Fair Use Doctrine lets you say &#8220;Um, actually, I&#8217;m using this without your permission, so buzz off.&#8221;</p>
<p>GPL advocates can pontificate all they want about certain uses of GPLed code violating their license terms. But if either of these arguments holds up &#8212; and it looks to me like both do &#8212; <strong>a program that invokes GPLed code is not subject to the GPL just based on those invocations. </strong>And that, in turn, would more than imply that <strong>MySQL storage engine vendors could use GPLed MySQL-compatible front-ends without being under any GPL obligation themselves.<br />
</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/17/mysql-gpl-storage-engine-wordpress-theme/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Infobright&#8217;s Release 3.4</title>
		<link>http://www.dbms2.com/2010/06/27/infobright-release-3-4/</link>
		<comments>http://www.dbms2.com/2010/06/27/infobright-release-3-4/#comments</comments>
		<pubDate>Sun, 27 Jun 2010 15:09:41 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2413</guid>
		<description><![CDATA[Infobright called a couple weeks ago to discuss, among other subjects, its subsequently-released Infobright Release 3.4. I made no effort to distinguish between community/open source and professional/chargeable editions, but leaving that aside, it seems fair to characterize Infobright 3.4 as having two overlapping primary themes:

Performance and bottleneck 	cleanup.
“Omigod, you 	mean you didn&#8217;t have that feature [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Infobright called a couple weeks ago to discuss, among other subjects, its subsequently-released Infobright Release 3.4. I made no effort to distinguish between community/open source and professional/chargeable editions, but leaving that aside, it seems fair to characterize Infobright 3.4 as having two overlapping primary themes:</p>
<ul>
<li>Performance and <a href="http://www.dbms2.com/2009/08/21/bottleneck-whack-a-mole/" >bottleneck 	cleanup</a>.</li>
<li>“Omigod, you 	mean you didn&#8217;t have that feature before?” cleanup.</li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;">That said, the traditional release for cleaning up the last huge gaps in an analytic DBMS product seems to have become 4.0; recent examples include <a href="http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/" >Aster Data</a>, <a href="http://www.dbms2.com/2010/02/22/vertica-4/" >Vertica</a> and <a href="http://www.dbms2.com/2010/04/12/greenplumchorus/" >Greenplum</a>. Infobright seems on track to be another example of that rule.</p>
<p style="margin-bottom: 0in; font-style: normal;"><em>Ack. Now that I&#8217;ve said that, other vendors are going to be tempted to accelerate their numbering so as to reach the 4.0 mark sooner &#8230;</em></p>
<p style="margin-bottom: 0in;">A lot of Infobright performance enhancements are in the vein “We used to rely on generic MySQL for that, but now we do it ourselves, and it works a lot better.” Examples include:  <span id="more-2413"></span></p>
<ul>
<li>Infobright now does DELETEs all at 	once, vs. the previous row-by-row way. This makes DELETE performance 	similar to SELECT performance, when previously there was a big 	difference.</li>
<li>Ditto, if I understood correctly, 	INSERTs and UPDATEs.</li>
<li>Each release, Infobright covers 	more SQL functionality itself and passes less through to the generic 	MySQL engine.</li>
<li>UTF-8 Unicode data can now be 	loaded via Infobright&#8217;s parallel loader. Previously, you had to use 	MySQL&#8217;s load.</li>
</ul>
<p style="margin-bottom: 0in;">Infobright has also added workload management in 3.4, and this is intertwined with multicore parallelization, apparently because the workload manager decides when a query should use multiple cores to execute. Infobright further says that multi-user INSERT performance has increased a lot more than single-user, but I have forgotten why that is.</p>
<p style="margin-bottom: 0in;">Infobright now streams data back to the client faster. E.g., unless there&#8217;s some good reason not to, partial query results are pipelined back as they become available.* Finally, loading data no longer locks tables from being read (in my notes I wasn&#8217;t sure whether that was a current or future feature, but Infobright&#8217;s marketing seems to indicate it&#8217;s current). For some reason, Infobright is positioning this as a major, innovative feature.</p>
<p style="margin-bottom: 0in;"><em>*A good reason not to do this would be an aggregate that requires full materialization of the table before Infobright can carry it out.</em></p>
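<p style="margin-bottom: 0in;">The difference between a result that can be pipelined and one that cannot is easy to sketch with Python generators. This is only an illustration of the general idea, not Infobright&#8217;s implementation; the table contents and the function names <code>scan</code>, <code>filter_rows</code> and <code>total_amount</code> are made up for the example.</p>

```python
from typing import Iterable, Iterator


def scan(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield rows one at a time, as a table scan would."""
    for row in rows:
        yield row


def filter_rows(rows: Iterable[dict], min_amount: int) -> Iterator[dict]:
    """Pipelined operator: each qualifying row is emitted immediately."""
    for row in rows:
        if row["amount"] >= min_amount:
            yield row


def total_amount(rows: Iterable[dict]) -> int:
    """Blocking operator: no result exists until every input row is consumed."""
    total = 0
    for row in rows:
        total += row["amount"]
    return total


# Hypothetical five-row table, invented for this sketch.
TABLE = [{"id": i, "amount": i * 10} for i in range(1, 6)]

# Pipelined: the client can read the first match before the scan finishes.
stream = filter_rows(scan(TABLE), min_amount=30)
first = next(stream)

# Blocking: the sum is known only after the whole input has been read.
grand_total = total_amount(scan(TABLE))
```

<p style="margin-bottom: 0in;">The filter hands back its first row while most of the table is still unread, whereas the aggregate has nothing to return until the scan is complete.</p>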
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/27/infobright-release-3-4/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>The Clustrix story</title>
		<link>http://www.dbms2.com/2010/05/12/the-clustrix-story/</link>
		<comments>http://www.dbms2.com/2010/05/12/the-clustrix-story/#comments</comments>
		<pubDate>Wed, 12 May 2010 08:53:48 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Clustrix]]></category>
		<category><![CDATA[Emulation, transparency, portability]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2096</guid>
		<description><![CDATA[After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included:    

Nothing in my 	original short post about Clustrix was actually incorrect.
Clustrix plans to reveal actual 	production “name-brand” customers soon.
The name of Clustrix&#8217;s software, 	or at least the guts thereof, is Sierra.
Clustrix&#8217;s products have actually 	been in general availability since last [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included:    <span id="more-2096"></span></p>
<ul>
<li>Nothing in <a href="../2010/05/04/clustrix-may-be-doing-something-interesting/">my 	original short post about Clustrix</a> was actually incorrect.</li>
<li>Clustrix plans to reveal actual 	production “name-brand” customers soon.</li>
<li>The name of Clustrix&#8217;s software, 	or at least the guts thereof, is Sierra.</li>
<li>Clustrix&#8217;s products have actually 	been in general availability since last quarter, with some versions 	at customer sites for 2 years. Development started 3 ½ years ago.</li>
<li>Clustrix says its technology is 	for OLTP systems, which it calls “non-batch/non-analytic,” with 	mixed read/write workloads. All Clustrix&#8217;s example target markets 	are “internet verticals,” such as photo sharing, gaming, social 	media, e-commerce, etc.</li>
<li>Clustrix&#8217;s heart is in SQL, as is 	most of its customer base. Clustrix Sierra&#8217;s key-value-store option 	has little or no performance advantage over Clustrix Sierra&#8217;s SQL 	option, nor any other advantage over SQL that came up in discussion.</li>
<li>Clustrix Sierra is 	“wire-compatible” with MySQL, but doesn&#8217;t use MySQL code; 	Clustrix wrote all the code itself.</li>
<li>Clustrix asserts that Clustrix 	Sierra supports the “vast majority” of MySQL features. Examples 	of MySQL features Clustrix doesn&#8217;t support at this time are 	full-text search and geospatial indexing.</li>
<li>Indeed, Clustrix claims Clustrix 	Sierra can be used to replace MySQL with few or zero changes to 	existing applications.</li>
<li>I specifically asked about 	referential integrity, which has a poor performance reputation in 	MySQL. Besides saying they supported it, Clustrix said that some 	customers actually use referential integrity in some of their less 	active tables.</li>
<li>Clustrix Sierra is fully 	ACID-compliant, with no eventual consistency or <a href="http://www.dbms2.com/2010/05/01/ryw-read-your-writes-consistency/" >RYW consistency</a> story. The default number of copies of each datum is two, and 	they&#8217;re kept consistent via two-phase commit.</li>
<li>Clustrix Sierra is fully parallel, 	with no “head” node. I forgot to ask how it was determined which 	queries would be addressed to and/or controlled by which nodes, but 	I presume there&#8217;s some sort of a load-balancing scheme.</li>
<li>Clustrix says that because 	Clustrix Sierra uses MVCC (Multi-Version Concurrency Control), and 	thus reads and writes don&#8217;t block each other, global locks aren&#8217;t a 	major issue. (They&#8217;re rare or short or something – I have trouble 	seeing why they would be non-existent.)</li>
<li>Clustrix says there&#8217;s a second 	class of locks and latches that are purely local and short-lived, 	for B-tree indexes and the like. (I didn&#8217;t drill down into those 	either.) I guess this means Clustrix Sierra is B-tree-centric, which 	makes sense for an OLTP-oriented system.</li>
<li>Clustrix Sierra distributes data among nodes via consistent hashing (default), range partitioning, or “full distribution” (i.e., copying a – presumably small – table to each node). The choice of distribution plans is manual now; more automation is a future feature.</li>
<li>Clustrix Sierra&#8217;s CBO (Cost-Based 	Optimizer) is, as one would hope, distribution-aware.</li>
<li>Clustrix Sierra compiles query 	fragments and ships them off to the relevant nodes. A fragment might 	contain both instructions for SQL to be executed locally and for 	where data is to be sent next.</li>
<li>Clustrix says that Clustrix Sierra 	does data migration and redistribution (e.g., when you add a node) 	transparently online, and further says that in practice this doesn&#8217;t 	cause a performance hit.</li>
<li>As for Clustrix hardware:
<ul>
<li>Clustrix makes <a href="http://www.monashreport.com/2007/01/29/computing-appliances-trends/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">Type 	I appliances</a>.</li>
<li>A Clustrix node contains 2 quad-core chips, 32 gigs of RAM, and seven 160 GB solid-state drives.</li>
<li>Specifically, Clustrix is using 	Intel SSDs, with a SAS interface.</li>
<li>Clustrix says solid-state memory isn&#8217;t really essential to the product design; it&#8217;s just cheap in terms of $/IOPS (I/O operations Per Second).</li>
</ul>
</li>
<li>A minimum Clustrix configuration 	is 3 nodes, for redundancy. After that you can add nodes one at a 	time. Clustrix says it built a 20-node system in-house, leading me 	to suspect that customers don&#8217;t have anything bigger than 20 nodes 	either.</li>
<li>That 20-node Clustrix system was 	tested to show near-linear scalability. (In discussing this, 	Clustrix tends to forget to use the word “near”.)</li>
<li>Clustrix has partnered with 	somebody to provide global 4-hour-response support. As of now 	Clustrix seems to be active mainly in North America and Europe.</li>
<li>Clustrix is formed from the 	combination of two startups, which I&#8217;ve heard elsewhere were called 	Clustrix and Sprout. Exactly when the combination happened sounds a 	little different depending on who&#8217;s telling the story (one version 	has the predecessors still being separate well into 2008, but 	Clustrix implies the combination happened pretty much on Day 1).</li>
</ul>
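<p style="margin-bottom: 0in;">For readers who haven&#8217;t met consistent hashing, here is a minimal sketch of the general technique with two copies of each datum, matching the default mentioned above. Clustrix did not describe its implementation to me, so everything here is an assumption made for illustration: the MD5-based ring, the virtual nodes, the class name <code>HashRing</code>, and the node and key names.</p>

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a stable position on the hash ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (an illustrative assumption)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place several virtual points for the node on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self.ring, (ring_position(f"{node}#{i}"), node))

    def owners(self, key, copies=2):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect(self.ring, (ring_position(key), ""))
        found = []
        for offset in range(len(self.ring)):
            node = self.ring[(start + offset) % len(self.ring)][1]
            if node not in found:
                found.append(node)
            if len(found) == copies:
                break
        return found


# Hypothetical three-node cluster (the stated minimum), then a fourth node added.
ring = HashRing(["node1", "node2", "node3"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.owners(k) for k in keys}
ring.add_node("node4")
after = {k: ring.owners(k) for k in keys}
moved = sum(1 for k in keys if before[k][0] != after[k][0])
```

<p style="margin-bottom: 0in;">The property that matters for adding nodes one at a time is that a key&#8217;s primary owner either stays put or moves to the new node; data never shuffles between the existing nodes. In this sketch <code>moved</code> counts the keys that changed primary owner.</p>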
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/12/the-clustrix-story/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Notes on the evolution of OLTP database management systems</title>
		<link>http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/</link>
		<comments>http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/#comments</comments>
		<pubDate>Mon, 05 Apr 2010 08:22:03 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EnterpriseDB and Postgres Plus]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Mid-range]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1841</guid>
		<description><![CDATA[The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part).  OLTP (OnLine Transaction Processing) and general purpose DBMS startups, however, have not yet done as well, with [...]]]></description>
			<content:encoded><![CDATA[<p>The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part).  OLTP <span style="font-weight: normal;">(OnLine Transaction Processing) </span>and general purpose DBMS startups, however, have not yet done as well, with such success as there has been (MySQL, Intersystems Cache&#8217;, solidDB&#8217;s exit, etc.) generally accruing to products that originated in the 20th Century.</p>
<p>Nonetheless, OLTP/general-purpose data management startup activity has recently picked up, targeting what I see as some very real opportunities and needs. So as a jumping-off point for further writing, I thought it might be interesting to collect a few observations about the market in one place.  These include:</p>
<ul>
<li><span style="font-weight: normal;">Big-brand 	OLTP/general-purpose DBMS have more “stickiness” 	than analytic DBMS.</span></li>
<li><span style="font-weight: normal;">By 	number, most of an enterprise&#8217;s OLTP/general-purpose databases are low-volume and 	low-value. </span></li>
<li>Most 	interesting new OLTP/general-purpose data management products are <span style="font-style: normal;">either 	MySQL-based or NoSQL.</span></li>
<li>It&#8217;s not yet 	clear whether MySQL will prevail over MySQL forks, or vice-versa, or 	whether they will co-exist.</li>
<li>The era of 	silicon-centric relational DBMS is coming.</li>
<li>The emphasis 	on scale-out and reducing the cost of joins spans the NoSQL and 	SQL-based worlds.<em> </em></li>
<li><span style="font-weight: normal;">Users&#8217; insistence on “free” could be a major problem for OLTP DBMS innovation. </span></li>
</ul>
<p style="margin-bottom: 0in;">I shall explain.<span id="more-1841"></span></p>
<p style="margin-bottom: 0in;"><strong>Big-brand OLTP/general-purpose DBMS have more “stickiness” than analytic DBMS.</strong></p>
<ul>
<li>OLTP 	applications are more complex than analytic ones, and hence more 	tightly wired into particular brands of DBMS. For example, 	third-party packaged OLTP applications are typically portable among 	only a few brands of DBMS. But third-party business intelligence 	tools, and the BI “applications” built in them, are more easily 	and widely portable.</li>
<li>Specific technical observations 	such as “OLTP apps tend to use stored procedures, which are 	DBMS-specific” or “OLTP apps tend to have lots and lots of 	tables” serve to underscore the first point.</li>
<li>An enterprise&#8217;s highest-value data 	is commonly the financial stuff handled by its core OLTP systems, so 	those are the last things they want to mess around with just to get 	some cost savings. Security, high availability, and so on are major 	considerations that can outweigh cost.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>By number, most of an enterprise&#8217;s OLTP/general-purpose databases are low-volume and low-value. </strong>Indeed, “OLTP” is often a misnomer, which is why I tend to go with “general-purpose” or some similarly wishy-washy phrase instead.</p>
<ul>
<li>In theory, this is a ripe area for 	what I&#8217;ve called <a href="http://www.dbms2.com/category/database-management-system/mid-range/" >mid-range DBMS</a>.</li>
<li>The big brand vendors try hard to 	keep as many of those databases for themselves as they can. 	Enterprise-wide license pricing helps. Going forward, so will 	virtualization/consolidation strategies, such as <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >Oracle&#8217;s 	Exadata-centric approach</a>.</li>
<li>A variety of mid-range DBMS 	alternatives beyond the big brands have technical merit, at least in 	some cases and configurations – MySQL, PostgreSQL, Intersystems 	Cache&#8217;, and so on.</li>
<li>The only such mid-range DBMS 	alternative with much large enterprise business momentum, however, 	appears to be MySQL.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>&#8220;General-purpose&#8221; might be a better term than &#8220;OLTP&#8221; anyway.</strong></p>
<ul>
<li>I don&#8217;t have a link, but it&#8217;s widely agreed that over half of the processing on an &#8220;OLTP&#8221; enterprise app is commonly reporting and so on.</li>
<li>&#8220;Operational BI&#8221; is progressing by fits and starts, but it is progressing.</li>
<li>Anything customer-facing &#8212; web-based, call center, or otherwise &#8212; is likely to include a heavy dose of &#8220;real-time&#8221; analytic optimization.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Most interesting new OLTP/general-purpose data management products are <span style="font-style: normal;">either MySQL-based or NoSQL.</span></strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/" >VoltDB</a> is the main 	exception that jumps to mind.</li>
<li>This isn&#8217;t true in the analytic 	DBMS area, where Netezza, Greenplum, Aster, Vertica and others 	started from PostgreSQL&#8217;s code, APIs, or both.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>It&#8217;s not yet clear whether MySQL will prevail over MySQL forks, or vice-versa, or whether they will co-exist.</strong></p>
<ul>
<li>MySQL is a limited product without 	all the third-party storage engines that are being developed.</li>
<li><a href="http://www.dbms2.com/2009/12/14/oracle-mysql-storage-engine/" >Oracle&#8217;s promise of MySQL good 	behavior</a> has an expiration date.</li>
<li>None of the MySQL front-end 	alternatives are remotely mature yet.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>The era of silicon-centric relational DBMS is coming.</strong></p>
<ul>
<li>I think “silicon” means 	“solid-state memory” as much as or more than it means “RAM,” 	but that&#8217;s not yet certain.</li>
<li>What is pretty certain is that, 	thanks to Moore&#8217;s Law, some kind of silicon will increasingly 	replace disk.</li>
<li><a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >Oracle&#8217;s increasingly 	Flash-centric story</a> is a challenge to everybody.</li>
<li>RAM-centric VoltDB will launch 	fairly soon. (By the way, while VoltDB still has <a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/" >a lot in common 	with H-Store</a>, they&#8217;re not exactly the same thing. And <a href="http://bit.ly/9QxjV2." onclick="javascript:pageTracker._trackPageview('/bit.ly');">H-Store 	research</a> is progressing too.)</li>
<li><span style="font-style: normal;"><a href="http://rethinkdb.com/" onclick="javascript:pageTracker._trackPageview('/rethinkdb.com');">RethinkDB</a> is being de</span>veloped, focused directly on solid-state memory. 	Based on the sparse information available online, RethinkDB sounds 	somewhat like a dumbed-down H-Store.</li>
<li>New disk-based vendors may never 	optimize their use of disk, instead targeting a solid-state future. 	(E.g., I think Akiban should and quite well might follow this path.)</li>
</ul>
<p style="margin-bottom: 0in; font-weight: normal;"><strong>The emphasis on scale-out and reducing the cost of joins spans the NoSQL and SQL-based worlds.</strong> We hear that from the <a href="http://www.dbms2.com/2010/03/14/nosql-taxonomy/" >NoSQL</a> guys all the time. But I also just heard it from <a href="http://www.dbms2.com/2010/04/03/akiban-highlights/" >Akiban</a>.</p>
<p style="margin-bottom: 0in;"><strong>Users&#8217; insistence on “free” could be a major problem for OLTP DBMS innovation.</strong> Vendors of new OLTP data management technologies often feel obligated to open source their products, notwithstanding the historical lack of revenue in the open source OLTP DBMS market. As just one of many examples,  <a href="http://www.novaspivack.com/uncategorized/evri-ties-the-knot-with-twine" onclick="javascript:pageTracker._trackPageview('/www.novaspivack.com');">Nova Spivack</a> wrote:</p>
<blockquote>
<p style="margin-bottom: 0in;">I have recently seen some new graph data storage products that may provide the levels of scale and performance needed, but pricing has not been determined yet. In short, storage and retrieval of semantic graph datasets is a big unsolved challenge that is holding back the entire industry. We need federated database systems that can handle hundreds of billions to trillions of triples under high load conditions, in the cloud, on commodity hardware and open source software. Only then will it be affordable to make semantic applications and services at Web-scale.</p>
</blockquote>
<p style="margin-bottom: 0in;">I hear similar things from other startups, who evidently believe they need and/or are entitled to enjoy sophisticated, high-performance, zero-cost, specialized database management technology.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Akiban highlights</title>
		<link>http://www.dbms2.com/2010/04/03/akiban-highlights/</link>
		<comments>http://www.dbms2.com/2010/04/03/akiban-highlights/#comments</comments>
		<pubDate>Sat, 03 Apr 2010 05:36:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1809</guid>
		<description><![CDATA[Akiban responded quickly to my complaints about its communication style, and I chatted for a couple of hours with senior Akiban techies Ori Herrnstadt, Peter Beaman and Jack Orenstein. It&#8217;s still early days for Akiban product development, so some details haven&#8217;t been determined yet, and others I just haven&#8217;t yet pinned down. Still, I know [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><strong><span style="font-weight: normal;">Akiban responded quickly to my </span></strong><a href="http://www.dbms2.com/2010/03/22/akibanakiba/" ><strong><span style="font-weight: normal;">complaints</span></strong></a><strong><span style="font-weight: normal;"> about its communication style, and I chatted for a couple of hours with senior Akiban techies Ori Herrnstadt, Peter Beaman and Jack Orenstein. It&#8217;s still early days for Akiban product development, so some details haven&#8217;t been determined yet, and others I just haven&#8217;t yet pinned down. Still, I know a lot more than I did a day ago. Highlights of my talk with Akiban included:<span id="more-1809"></span></span></strong></p>
<ul>
<li><strong><span style="font-weight: normal;">Akiban 	is basically in the business of making OLTP (OnLine Transaction 	Processing) DBMS.</span></strong></li>
<li><strong><span style="font-weight: normal;">That 	said, Akiban does not necessarily aspire to offer the DBMS that has 	the best update efficiency or throughput. In particular, the Akiban 	DBMS stores every datum twice, even before replication (and indexes) 	are taken into account.</span></strong></li>
<li><strong><span style="font-weight: normal;">Akiban 	wants to store everything as third normal form relational databases. 	I didn&#8217;t ask whether 3NF is a hard requirement, a Really Good Idea 	if you want Akiban to run fast (that&#8217;s the one I&#8217;d guess), or merely 	a general design assumption.</span></strong></li>
<li><strong><span style="font-weight: normal;">Akiban 	characterizes its core differentiators/value proposition as being:</span></strong>
<ul>
<li><strong><span style="font-weight: normal;">Scale-out</span></strong></li>
<li><strong><span style="font-weight: normal;">No 	need to pay the traditional cost of joins</span></strong></li>
</ul>
</li>
<li><strong><span style="font-weight: normal;">Thus, 	Akiban is telling something like a </span></strong><a href="http://www.dbms2.com/2010/03/14/nosql-taxonomy/" ><strong><span style="font-weight: normal;">NoSQL</span></strong></a><strong><span style="font-weight: normal;"> story.</span></strong></li>
<li><strong><span style="font-weight: normal;">However, 	Akiban offers SQL.</span></strong></li>
<li><strong><span style="font-weight: normal;">Specifically, 	Akiban offers SQL through a MySQL front end. However, the choice of 	front-end could change (Drizzle?), and non-relational front-ends 	(object?)* could eventually also be offered. </span></strong></li>
<li><strong><span style="font-weight: normal;">Akiban&#8217;s 	first target market is SaaS providers, specifically ones that have 	true multitenancy issues. More generally, Akiban is pursuing 	cloud/private cloud applications with lots of tables. (Ori talks of 	a few thousand tables as being a small number.) At least at first, 	Akiban is conceding the market for huge-volume, scale-out, 	no-expensive-join web databases to the NoSQL contenders.</span></strong></li>
<li><strong><span style="font-weight: normal;">Akiban 	has been in prototyping/development of some kind for several years. 	However, Akiban got its first angel funding early last year and its 	first venture funding late in 2009, so development only ramped up 	recently.</span></strong></li>
<li><strong><span style="font-weight: normal;">Ori 	tells a version of the rather common “Everything I need to know in 	life I learned in the Israeli Army, and now I&#8217;m commercializing it” 	story. However, I didn&#8217;t get the sense that Akiban is necessarily a 	direct extension of a specific Israeli military project.</span></strong></li>
</ul>
<p style="margin-bottom: 0in;"><strong><em><span style="font-weight: normal;">* A lot of Boston-area DBMS developers have significant non-relational experience. E.g., Jack Orenstein was an Object Design founder, and Peter Beaman used to work for Intersystems, both object-oriented DBMS vendors.</span></em></strong></p>
<p style="margin-bottom: 0in;"><strong><span style="font-weight: normal;">Akiban technical highlights include:</span></strong></p>
<ul>
<li><strong><span style="font-weight: normal;">Somewhat 	confusingly, Akiban databases are divided into “groups” of 	tables. The point of Akiban groups is:</span></strong>
<ul>
<li><strong><span style="font-weight: normal;">Many-to-many 	relationships exist only within Akiban groups, not among tables in 	different groups.</span></strong></li>
<li><strong><span style="font-weight: normal;">Tables 	within Akiban groups are kind of pre-joined; more precisely, data is 	organized physically in a way that anticipates joins.</span></strong></li>
<li><strong><span style="font-weight: normal;">Thus, 	most Akiban joins can be executed without the cost of traditional 	join algorithms.</span></strong></li>
</ul>
</li>
<li><strong><span style="font-weight: normal;">One 	copy of the data Akiban stores is, in effect, clustered by object. 	E.g., a customer and her orders are stored together, or a patient 	and the records of her doctor visits. That&#8217;s how Akiban anticipates 	most joins.</span></strong></li>
<li><strong><span style="font-weight: normal;">The 	other copy of Akiban data is stored in columns (I&#8217;m not sure if this 	part is strictly columnar or more hybrid row/column), which are 	ordered consistently. In particular, they&#8217;re in an order dictated by 	the organization of the other copy of the data, whatever that means. 	Akiban&#8217;s goal is for this copy of the data to support reporting, 	operational BI, etc.</span></strong></li>
<li><strong><span style="font-weight: normal;">Akiban 	relies heavily on its optimizer to determine data layout, probably more than conventional DBMS do.</span></strong></li>
<li><strong><span style="font-weight: normal;">In 	essence, Akiban has a MySQL front-end and a storage engine back end, 	each running on its own hardware cluster, with each node of one 	cluster talking to each node of the other. </span></strong></li>
<li><strong><span style="font-weight: normal;">I 	gather that Akiban distributes data among nodes clustered according 	to, in effect, object identifier. Presumably, inter-node joins are 	rare. But we didn&#8217;t discuss distribution, replication, or other 	scale-out issues in any detail. Indeed, I gathered that significant 	parts of all that weren&#8217;t built yet, and perhaps even not yet 	architected.<br />
</span></strong></li>
</ul>
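<p style="margin-bottom: 0in;">To make &#8220;clustered by object&#8221; concrete, here is a rough sketch of how interleaving parent and child rows in one sorted key space turns a join into a single range scan. It is my own illustration of the general idea, not Akiban&#8217;s design; the key layout, the table tags 0 and 1, the helper names <code>put</code> and <code>customer_with_orders</code>, and the sample rows are all hypothetical.</p>

```python
import bisect

# One sorted key space holds a whole "group" of tables.
# Key layout (an assumption for this sketch):
#   (customer_id, 0)            -> the customer row
#   (customer_id, 1, order_id)  -> one of that customer's orders
keys = []  # sorted list of keys
rows = {}  # key -> row


def put(key, row):
    """Insert or overwrite a row, keeping the key list sorted."""
    if key not in rows:
        bisect.insort(keys, key)
    rows[key] = row


def customer_with_orders(customer_id):
    """Fetch a customer and her orders with one contiguous range scan."""
    start = bisect.bisect_left(keys, (customer_id,))
    stop = bisect.bisect_left(keys, (customer_id + 1,))
    return [rows[k] for k in keys[start:stop]]


# Rows arrive in arbitrary order but end up physically adjacent per customer.
put((2, 0), {"table": "customer", "name": "Bob"})
put((1, 1, 100), {"table": "order", "item": "disk"})
put((1, 0), {"table": "customer", "name": "Alice"})
put((2, 1, 101), {"table": "order", "item": "tape"})
put((1, 1, 102), {"table": "order", "item": "ssd"})

alice_and_orders = customer_with_orders(1)
```

<p style="margin-bottom: 0in;">Because the customer row and her order rows sort next to each other, the &#8220;join&#8221; is just a slice of the key space; no hash or merge join is needed. Joins that cross groups would not get this benefit.</p>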
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/04/03/akiban-highlights/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Some NoSQL links</title>
		<link>http://www.dbms2.com/2010/03/12/some-nosql-links/</link>
		<comments>http://www.dbms2.com/2010/03/12/some-nosql-links/#comments</comments>
		<pubDate>Fri, 12 Mar 2010 23:51:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Continuent]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Tokutek]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1692</guid>
		<description><![CDATA[I plan to post a few things soon about MongoDB, Cassandra, and NoSQL in general. So I&#8217;m poking around a bit reading stuff on the subjects. Here are some links I found.

A little over a year ago, Julian Browne put up a great post on Eric Brewer&#8217;s CAP conjecture/theorem, which provides much of the impetus [...]]]></description>
			<content:encoded><![CDATA[<p>I plan to post a few things soon about MongoDB, Cassandra, and NoSQL in general. So I&#8217;m poking around a bit reading stuff on the subjects. Here are some links I found.<span id="more-1692"></span></p>
<ul>
<li>A little over a year ago, Julian Browne put up a great post on <a href="http://www.julianbrowne.com/article/viewer/brewers-cap-theorem" onclick="javascript:pageTracker._trackPageview('/www.julianbrowne.com');">Eric Brewer&#8217;s CAP conjecture/theorem</a>, which provides much of the impetus to relax the traditional requirement for atomicity/consistency.</li>
<li>Even more directly inspirational to NoSQL technology development were two seminal papers: Google&#8217;s on <a href="http://labs.google.com/papers/bigtable.html" onclick="javascript:pageTracker._trackPageview('/labs.google.com');">BigTable</a> and Amazon&#8217;s on <a href="http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf" onclick="javascript:pageTracker._trackPageview('/s3.amazonaws.com');">Dynamo</a>. (That said, I&#8217;m having trouble getting myself to actually read them from start to finish, especially since they&#8217;ve been superseded by subsequent technology development.)</li>
<li>10gen (the MongoDB guys) hosted a NoSQL conference yesterday. Much blogging has ensued. The best post I&#8217;ve seen so far was by <a href="http://blog.marcua.net/post/442594842/notes-from-nosql-live-boston-2010" onclick="javascript:pageTracker._trackPageview('/blog.marcua.net');">Adam Marcus</a>. I find the graph database notes near the bottom particularly interesting.</li>
<li>Mark Callaghan hit back against the <a href="http://mysqlha.blogspot.com/2010/03/plays-well-with-others.html" onclick="javascript:pageTracker._trackPageview('/mysqlha.blogspot.com');">NoSQL <span style="text-decoration: line-through;">movement</span> hype</a>, and in particular against the &#8216;<a href="http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/" >MySQL/memcached is pass&#233;</a>&#8217; meme. On the other hand, he also bemoaned many failings of MySQL. On the third hand, he praised or at least expressed hope for a variety of MySQL-related technologies, including <a href="http://www.dbms2.com/2009/04/16/introduction-to-tokutek/" >Tokutek&#8217;s TokuDB</a> and <a href="http://www.dbms2.com/2009/09/03/continuent-on-clustering/" >Continuent&#8217;s Tungsten</a>.</li>
<li>In connection with that debate, Mark Rendle offered a <a href="http://blog.markrendle.net/2010/03/do-you-need-relational-database.html" onclick="javascript:pageTracker._trackPageview('/blog.markrendle.net');">funny rant</a>, mainly pro-NoSQL, in the style of a Socratic dialogue.</li>
<li>John Quinn of Digg recently described <a href="http://www.stumbleupon.com/su/5099Ti/about.digg.com/node/564" onclick="javascript:pageTracker._trackPageview('/www.stumbleupon.com');">Digg&#8217;s move from MySQL to Cassandra</a>, and outlined a lot of features Digg was adding to Cassandra, all of which it is open-sourcing.</li>
<li>The NoSQL guys maintain their own long <a href="http://nosql-database.org/links.html" onclick="javascript:pageTracker._trackPageview('/nosql-database.org');">list of NoSQL-related links</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/12/some-nosql-links/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

