<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; PostgreSQL</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/postgresql/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 12:22:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Hadapt is moving forward</title>
		<link>http://www.dbms2.com/2011/11/08/hadapt-is-moving-forward/</link>
		<comments>http://www.dbms2.com/2011/11/08/hadapt-is-moving-forward/#comments</comments>
		<pubDate>Tue, 08 Nov 2011 05:40:10 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5609</guid>
		<description><![CDATA[I&#8217;ve talked with my clients at Hadapt a couple of times recently. News highlights include: The Hadapt 1.0 product is going &#8220;Early Access&#8221; today. General availability of Hadapt 1.0 is targeted for an officially unspecified time frame, but it&#8217;s soon. Hadapt raised a nice round of venture capital. Hadapt added Sharmila Mulligan to the board. [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve talked with my clients at Hadapt a couple of times recently. News highlights include:</p>
<ul>
<li>The Hadapt 1.0 product is going &#8220;Early Access&#8221; today.</li>
<li>General availability of Hadapt 1.0 is targeted for an officially unspecified time frame, but it&#8217;s soon.</li>
<li>Hadapt raised a nice round of venture capital.</li>
<li>Hadapt added Sharmila Mulligan to the board.</li>
<li>Dave Kellogg is in the picture too, albeit not as involved as Sharmila.</li>
<li>Hadapt has moved the company to Cambridge, which is preferable to Yale environs for obvious reasons. (First location = space they&#8217;re borrowing from their investors at Bessemer.)</li>
<li>Headcount is in the low teens, with a target of doubling fast.</li>
</ul>
<p>The <a href="../../../../../2011/07/06/hadapt-update/">Hadapt product story</a> hasn&#8217;t changed significantly from what it was before. Specific points I can add include:   <span id="more-5609"></span></p>
<ul>
<li>With one exception to date, Hadapt beta customers have used PostgreSQL as the underlying DBMS, rather than some faster columnar system.</li>
<li>Sure, you want to process data on the nodes where it resides on the cluster. But if each copy is replicated 3X or so, that gives you good flexibility to be adaptive by deciding which of the three copies you&#8217;ll operate against.</li>
<li>In Hadapt Version 1.0, scheduling and workload management are pretty much Hadoop&#8217;s. However &#8230;</li>
<li>&#8230; an improvement in scheduling is being actively researched.</li>
<li>In general, Hadapt&#8217;s design philosophy for executing SQL is to use MapReduce to get data to the proper nodes, while using the underlying DBMS for node-specific operations such as:
<ul>
<li>Initial retrieval from disk.</li>
<li>Joins and aggregations on data residing at (or visiting) a specific node.</li>
</ul>
</li>
</ul>
<p>A very busy Daniel Abadi also took the time to walk me through how Hadapt does joins. More precisely, what we discussed about joins includes some of the last features being added to Hadapt 1.0; many of the pieces are still missing from early-access Hadapt 1.0, and some may even slip out of the Hadapt 1.0 GA version. As Dan tells it, there are five kinds of joins in Hadapt:</p>
<ul>
<li><strong>Co-partitioned join.</strong> Both tables being joined happen to be partitioned on the join key. Happy happy joy joy. The tables are joined locally on each node, with the results aggregated via MapReduce.</li>
<li><strong>Directed join</strong>. One of the tables being joined happens to be partitioned on the join key. MapReduce distributes the other table along the join key, joins happen locally, and MapReduce does the rest.</li>
<li><strong>Broadcast join.</strong> One of the tables is broadcast in its entirety to every node. Joins then happen locally, and MapReduce does the rest.</li>
<li><strong>Split semijoin. </strong>One of the tables is projected to the join key and a row ID, and then distributed via MapReduce. Joins then happen locally. Later on, the joined rows are completed with the help of a second projection on the first table. MapReduce does the rest.</li>
<li><strong>Distributed/parallel hash join. </strong>Sometimes, Hadapt indeed joins just as Hadoop/Hive would.</li>
</ul>
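<p>To make the five join kinds concrete, here is a minimal sketch of how a planner might choose among them. It is not Hadapt&#8217;s actual logic: the <code>Table</code> class, the <code>choose_join</code> function, and both thresholds are hypothetical, and the rules for when to prefer a split semijoin over a plain distributed hash join are my own assumption.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Table:
    name: str
    rows: int
    avg_row_bytes: int
    partition_key: Optional[str] = None  # column the table is partitioned on


def choose_join(left: Table, right: Table, join_key: str,
                broadcast_max_rows: int = 100_000,
                wide_row_bytes: int = 1_000) -> str:
    """Name the join kind to use. Thresholds are hypothetical, not Hadapt's."""
    left_fits = left.partition_key == join_key
    right_fits = right.partition_key == join_key
    if left_fits and right_fits:
        return "co-partitioned"   # both sides are already local to each node
    if left_fits or right_fits:
        return "directed"         # redistribute only the other table
    smaller = min(left, right, key=lambda t: t.rows)
    if broadcast_max_rows >= smaller.rows:
        return "broadcast"        # cheap to copy the small table to every node
    if smaller.avg_row_bytes >= wide_row_bytes:
        return "split semijoin"   # ship only join key + row ID at first
    return "distributed hash"     # repartition both sides, as Hive would
```

<p>For example, two large tables that are both partitioned on <code>customer_id</code> and joined on that column come back as a co-partitioned join, while a 500-row lookup table joined to an unpartitioned fact table comes back as a broadcast join.</p>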
<p>Highlights of Hadapt&#8217;s performance story include:</p>
<ul>
<li>Dan contends that using a DBMS rather than HDFS (Hadoop Distributed File System) for I/O always gives a performance advantage.</li>
<li>DBMS local-node join performance can be presumed to be superior as well.</li>
<li>Of course, Dan also thinks that using a columnar DBMS would extend Hadapt&#8217;s performance advantage further, but most of the specifics of what Hadapt has told me about why they don&#8217;t routinely use a columnar DBMS yet are NDA.</li>
<li>Even beta Hadapt/PostgreSQL outperforms Hadoop/Hive by almost 10X at Hadapt&#8217;s relatively small number of beta customer sites.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/08/hadapt-is-moving-forward/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Hadapt update</title>
		<link>http://www.dbms2.com/2011/07/06/hadapt-update/</link>
		<comments>http://www.dbms2.com/2011/07/06/hadapt-update/#comments</comments>
		<pubDate>Wed, 06 Jul 2011 23:43:49 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4925</guid>
		<description><![CDATA[I met with the Hadapt guys today.  I think I can be a bit crisper than before in positioning Hadapt and its use cases, namely: Hadapt is additional software on a cluster that also runs fully functional Hadoop/HDFS. (Cloudera Hadoop more than straight-from-Apache Hadoop to date, but that&#8217;s not a requirement.) The cluster also runs [...]]]></description>
			<content:encoded><![CDATA[<p>I met with the Hadapt guys today.  I think I can be a bit crisper than before in positioning Hadapt and its use cases, namely:</p>
<ul>
<li>Hadapt is additional software on a cluster that also runs fully functional Hadoop/HDFS. (Cloudera Hadoop more than straight-from-Apache Hadoop to date, but that&#8217;s not a requirement.)</li>
<li>The cluster also runs a DBMS on every node, such as PostgreSQL or one of Infobright/Vectorwise.</li>
<li>Hadapt&#8217;s software manages parallel SQL queries by distributing them to the DBMS living on each node. Hadapt says that the resulting query performance far outshines Hive&#8217;s.</li>
<li>Hadapt further says that, by exploiting the partner DBMS, its SQL functionality outpaces Hive&#8217;s as well.</li>
<li>Target Hadapt use cases are centered around keeping <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated</a> or other <a href="http://www.dbms2.com/2011/05/17/poly-structured-database/">poly-structured</a> data in Hadoop, and extracting, enhancing, or otherwise deriving some of it to live in the relational store.</li>
<li>In particular, Hadapt seems like an interesting choice when you want to use that relational data as you work on other data that&#8217;s still in HDFS, or if you want to keep using the relational data in other kinds of MapReduce jobs.</li>
<li>That all fits well with my thoughts about the importance of <a href="http://www.dbms2.com/2011/05/30/another-category-of-derived-data/">derived data</a>.</li>
</ul>
<p>Other evolution from <a href="http://www.dbms2.com/2011/03/23/hadapt-commercialized-hadoopdb/">what  I wrote about Hadapt a few months ago</a> includes:</p>
<ul>
<li>Hadapt  is in beta now.</li>
<li>Hadapt has added adult supervision in the form  of <a href="http://www.hadapt.com/wickline-announcement/">Philip Wickline</a>,  late of Endeca.</li>
</ul>
<p>In other news, Hadapt is our newest client.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/06/hadapt-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hadapt (commercialized HadoopDB)</title>
		<link>http://www.dbms2.com/2011/03/23/hadapt-commercialized-hadoopdb/</link>
		<comments>http://www.dbms2.com/2011/03/23/hadapt-commercialized-hadoopdb/#comments</comments>
		<pubDate>Wed, 23 Mar 2011 12:35:52 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4042</guid>
		<description><![CDATA[The HadoopDB company Hadapt is finally launching, based on the HadoopDB project, albeit with code rewritten from scratch. As you may recall, the core idea of HadoopDB is to put a DBMS on every node, and use MapReduce to talk to the whole database. The idea is to get the same SQL/MapReduce integration as you [...]]]></description>
			<content:encoded><![CDATA[<p>The HadoopDB company Hadapt is finally launching, based on <a href="../../../../../2009/09/13/hadoopdb/">the HadoopDB project</a>, albeit with code rewritten from scratch. As you may recall, the core idea of HadoopDB is to put a DBMS on every node, and use MapReduce to talk to the whole database. The idea is to get the same SQL/MapReduce integration as you get if you use Hive, but with much better performance* and perhaps somewhat better SQL functionality.** Advantages vs. a DBMS-based <a href="../../../../../2011/02/24/analytic-platforms/">analytic platform</a> that includes MapReduce &#8212; e.g. Aster Data &#8212; are less clear.  <span id="more-4042"></span></p>
<p><em>*At least if the underlying DBMS is a fast one. Hadapt likes <a href="../../../../../2010/06/11/ingres-vectorwise-technical-highlights/">VectorWise</a> for that purpose, and <a href="http://gigaom.com/cloud/making-hadoop-work-in-more-places-with-hadapt/">is showing performance comparisons</a> that assume VectorWise is underneath.</em></p>
<p><em>**It seems that Hadapt in the future is assured of having more SQL coverage than Hive does today.</em></p>
<p>It&#8217;s still early days for the Hadapt company. Funding is on the angel level. There seem to be six employees &#8212; Yale professor Daniel Abadi, CEO Justin Borgman, Chief Scientist Kamil Bajda-Pawlikowski,* and three other coders. The Hadapt product will go into beta at an unspecified future time; there currently are a couple of alpha users/design partners. The Hadapt company, a Yale spin-off, obviously needs to move from Connecticut soon. I wasn&#8217;t able to detect any particular outside experience in the form of directors or advisors. And <a href="http://www.strategicmessaging.com/public-and-analyst-relations-an-example-of-epic-fail/2011/03/22/">Hadapt&#8217;s marketing efforts are still somewhat ragged</a>. So basically, the reasons for believing in Hadapt pretty much boil down to:</p>
<ul>
<li>Daniel Abadi is a star.**</li>
<li>Hadapt&#8217;s own tests show that Hadapt is a whole lot faster than Hive.</li>
</ul>
<p><em>*Bajda-Pawlikowski is one of the two Abadi students who did the HadoopDB work. It turns out he had numerous years of coding experience before entering graduate school. (The other student, Azza Abouzeid, is pursuing an academic career.)</em></p>
<p><em>**Vertica was built around Daniel&#8217;s C-Store Ph.D. thesis. He was involved in <a href="../../../../../2008/02/19/h-store-architecture/">H-Store</a> as well. He has <a href="http://dbmsmusings.blogspot.com/">a really good blog</a>. He&#8217;s a really nice guy. Etc.</em></p>
<p>As you might have guessed from the name, the Hadapt guys are proud that their technology is &#8220;adaptive,&#8221; which communicates their fond belief that Hadapt&#8217;s query optimization and planning are more modern and cool than other folks&#8217; query planning and optimization. In particular, Daniel suggested that Hadapt is more thoughtful than most DBMS are about looking at the size of intermediate result sets and  then replanning queries accordingly.</p>
<p>However, the really cool adaptivity point is that Hadapt watches the performance of individual nodes, and takes that into account in query replanning. Daniel asserts, credibly, that this is a Really Good Feature to have in cloud and/or virtualized environments, where Hadapt might not have full control and use of its nodes. I&#8217;d add that it could also give Hadapt a lot of flexibility to be run on clusters of non-identical machines.</p>
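<p>A rough sketch of the node-watching idea, under loud assumptions: each piece of data is available on several nodes (as with HDFS replication), and the planner keeps a running average of how long each node has recently taken. The <code>NodeMonitor</code> class, its smoothing factor, and the choice of an exponential moving average are all mine for illustration; nothing here is Hadapt&#8217;s code.</p>

```python
from typing import Dict, List


class NodeMonitor:
    """Tracks a smoothed response time per node (hypothetical helper)."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha                  # weight given to the newest sample
        self.avg_ms: Dict[str, float] = {}  # node name -> smoothed latency

    def record(self, node: str, elapsed_ms: float) -> None:
        """Fold a new timing sample into the node's moving average."""
        prev = self.avg_ms.get(node)
        if prev is None:
            self.avg_ms[node] = elapsed_ms
        else:
            self.avg_ms[node] = self.alpha * elapsed_ms + (1 - self.alpha) * prev

    def pick(self, replicas: List[str]) -> str:
        """Choose the replica holder that has recently been fastest.

        Nodes with no samples yet count as 0 ms, so they get tried first.
        """
        return min(replicas, key=lambda node: self.avg_ms.get(node, 0.0))
```

<p>If a node that used to be fast starts reporting slow timings, say because a noisy neighbor landed on the same virtualized host, its average climbs and <code>pick</code> shifts later work to another replica.</p>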
<p>On the negative side, Hadapt will not at first have any awareness of how its underlying DBMS are optimized; it will plan for VectorWise the same way it does for PostgreSQL. In that regard, this is a DATAllegro 1.0 story. If I understood correctly, Hadapt has specific connectors for a couple of DBMS (probably exactly those two), and can also talk JDBC to anything. PostgreSQL was apparently 5X faster than MySQL when tested (with either ISAM or InnoDB); Daniel snorted about, for example, MySQL&#8217;s apparent fondness for nested-loop joins over hybrid hash. On the other hand, he was more circumspect about his reasons for favoring VectorWise over, to name another open source columnar DBMS, Infobright.</p>
<p>And finally, a couple of other points:</p>
<ul>
<li>Hadapt will be closed source, although it will of course rely on large amounts of other people&#8217;s open source software. Pay no attention to the importance Daniel previously ascribed to HadoopDB&#8217;s open source nature.</li>
<li>Hadapt decompresses data before moving it from node to node, and also before doing non-SQL MapReduce operations on it. Pay no attention to the years Daniel spent insisting columnar DBMS absolutely must operate on data in compressed form.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/03/23/hadapt-commercialized-hadoopdb/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Notes, links, and comments January 20, 2011</title>
		<link>http://www.dbms2.com/2011/01/20/notes-links-and-comments-january-20-2010/</link>
		<comments>http://www.dbms2.com/2011/01/20/notes-links-and-comments-january-20-2010/#comments</comments>
		<pubDate>Thu, 20 Jan 2011 11:35:24 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[About this blog]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[GIS and geospatial]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[MongoDB and 10gen]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3508</guid>
		<description><![CDATA[I haven&#8217;t done a pure notes/links/comments post for a while. Let&#8217;s fix that now. (A bunch of saved-up links, however, did find their way into my recent privacy threats overview.) First and foremost, the fourth annual New England Database Summit (nee &#8220;Day&#8221;) is next week, specifically Friday, January 28. As per my posts in previous [...]]]></description>
			<content:encoded><![CDATA[<p>I haven&#8217;t done a pure notes/links/comments post for a while. Let&#8217;s fix that now. <em>(A bunch of saved-up links, however, did find their way into my recent <a href="http://www.dbms2.com/2011/01/10/privacy-dangers-an-overview/">privacy threats overview</a>.)</em></p>
<p>First and foremost, the fourth annual <a href="http://db.csail.mit.edu/nedbday11/">New England Database Summit</a> (nee &#8220;Day&#8221;) is next week, specifically Friday, January 28. As per my posts in <a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/">previous</a> <a href="http://www.dbms2.com/2009/01/26/new-england-database-day-this-friday-january-30/">years</a>, I think well of the event, which has a friendly, gathering-of-the-clan flavor. Registration is free, but the organizers would prefer that you register online by the end of this week, if you would be so kind.</p>
<p><em>The two things potentially wrong with the New England Database Summit are parking and the rush hour drive home afterwards. I would listen with interest to any suggestions about dinner plans. </em></p>
<p>One thing I hope to figure out at the Summit or before is what the hell is going on on Vertica&#8217;s blog or, for that matter, at <strong>Vertica.</strong> The recent Mike Stonebraker post that spawned a lot of <a href="http://www.dbms2.com/2011/01/12/mike-stonebraker-on-real-column-stores/">discussion and commentary</a> has disappeared. Meanwhile, Vertica has had three consecutive heads of marketing leave the company since June, and I don&#8217;t know who to talk to there any more.  <span id="more-3508"></span></p>
<p>Speaking of blog problems, we&#8217;ve had performance/reliability glitches here again. Melissa Bradshaw determined that the problem was an apparently activated WP Super Cache not actually caching anything. We should be OK now, so please let me know if there are further difficulties. One interesting step &#8212; it turns out that there&#8217;s <a href="http://wordpress.org/extend/plugins/sqlmon/">a WordPress plug-in that does automatic EXPLAINs</a> (if you&#8217;re the blog administrator).</p>
<p>Another interesting <a href="http://voltdb.com/blog/clarifications-cap-theorem-and-data-related-errors">Mike Stonebraker post</a> can be found (at least for now) over on the VoltDB blog. He continued his assault on the <strong>CAP Theorem, </strong>arguing that availability is an exaggerated concern when there are bug- or other human-error-driven kinds of outages, and also arguing that the concept of &#8220;partition tolerance&#8221; is misguided. Commenters pushed back, pointing out that in geographically distributed scenarios, the CAP Theorem sense of partitioning is quite a legitimate concern.</p>
<p>When I posted <a href="../2010/12/30/examples-and-definition-of-machine-generated-data/">an expansive definition of machine-generated data</a> a few weeks ago, Daniel Abadi shot back advocating a narrower one (see the comment thread, which includes a link to his thoughtful post). The disagreement boils down to conflicting intuitions as to whether the machine-data/true-human-data ratio will keep growing rapidly, in hybrid cases such as web logs or social gaming.</p>
<p>Dave McClure recently offered a survey of <a href="http://blog.500startups.com/2011/01/15/top-10-tech-investing-trends-for-2011/">hot startup investing themes</a>. High on his list were location-based services, which is a reminder to us all that geo-spatial data is becoming much more important. Ray Wang is savvy enough to understand <a href="http://blog.softwareinsider.org/2011/01/17/mondays-musings-why-im-unplugging-from-location-based-services-until-the-privacy-issue-is-resolved/">the privacy dangers location-based services cause</a>, but influential though Ray is, his view will probably remain in the minority. Machine-generated data and video each also make appearances on Dave&#8217;s  list.</p>
<p>And wait! I have even more links for you!  Several are taken from Thomas Houston&#8217;s  choices for <a href="http://www.switched.com/2010/12/30/best-technology-writing-of-2010/">The  Best Tech Writing of 2010</a>. He chose well. I recommend sampling his  list further.</p>
<ul>
<li>In <a href="http://www.nytimes.com/2011/01/02/business/02speed.html">an  article about new electronic exchanges</a>, the <em>New York Times</em> shared some numbers &#8212; 56% of trading volume &#8220;high speed&#8221; in stocks, 1/3  or so when looking at domestic futures, .1 milliseconds to do a NASDAQ  trade, 13 milliseconds for a trade that involves Chicago/NYC  communication, 60 milliseconds for NYC/Frankfurt. Slashdot offers <a href="http://hardware.slashdot.org/story/11/01/03/2127257/NJ-Server-Farms-Remake-the-US-Financial-Markets">photos  and other context</a>.</li>
<li>James Taylor caught up with once-hot <a href="http://jtonedm.com/2010/12/14/update-kxen/">KXEN</a>, and  evidently got the impression KXEN was focusing a lot of its efforts on  the tedious, time-consuming data-preparation side of modeling.</li>
<li><a href="http://innocuous.org/articles/2011/01/03/toddler-science-and-big-data/">Richard  Tibbetts</a> is being pretty funny on his blog.</li>
<li>(Slashdot) <a href="http://linux.slashdot.org/story/10/12/27/2025258/Putin-Orders-Russian-Move-To-GNULinux">The  Russian government seems to be getting into open source software in a  big way</a>. Well, <strong>PostgreSQL</strong> is already big in Russia (close to 1  million installations, I was once told), so this might conceivably add  some energy to its development.</li>
<li>In <a href="http://www.theregister.co.uk/2011/01/07/drupal_7_released/">Drupal  7</a>, Drupal now has &#8220;a built-in test environment, version upgrade  manager, and a database  abstraction layer for use with MariaDB, SQL  Server, MongoDB, Oracle,  MySQL, PostgreSQL, and SQLite.&#8221; That may  explain how <strong>MongoDB</strong> can hope to further penetrate the Drupal market.</li>
<li>The Boston Phoenix argues that <a href="http://thephoenix.com/Boston/news/113481-infopocalypse-the-cost-of-too-much-data/?page=1#TOPCONTENT">government  lacks the manpower, budget, and expertise to keep up with its  responsibilities in preserving and exposing information</a>. Fixing that  problem sounds like a pretty worthy open source development effort to  me.</li>
</ul>
<p>Finally:</p>
<ul>
<li>Clay Shirky reminded us that <a href="http://www.wired.com/magazine/2010/12/ff_ai_essay_airevolution/">modern    machine learning is what replaced old-style AI</a>.</li>
<li>Nominally reviewing a book he obviously disdains, Garry Kasparov &#8212; in my opinion the most admirable world chess champion ever &#8212; <a href="http://www.nybooks.com/articles/archives/2010/feb/11/the-chess-master-and-the-computer/">surveyed computer chess</a> in a quick, nontechnical way. The whole thing is a bit wordy even so, so I&#8217;ll quote one part:</li>
</ul>
<blockquote><p>In 2005, the online chess-playing site Playchess.com hosted  what it  called a “freestyle” chess tournament in which anyone could  compete in  teams with other players or computers. &#8230; The surprise came  at the conclusion of the event. The winner  was revealed to be not a  grandmaster with a state-of-the-art PC but a pair of  amateur American chess players  using three computers at the same time.  Their skill at manipulating and  “coaching” their computers to look very  deeply into positions  effectively counteracted the superior chess  understanding of their  grandmaster opponents and the greater  computational power of other  participants. Weak human + machine +  better process was superior to a  strong computer alone and, more  remarkably, superior to a strong human +  machine + inferior process.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/01/20/notes-links-and-comments-january-20-2010/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>dbShards &#8212; a lot like an MPP OLTP DBMS based on MySQL or PostgreSQL</title>
		<link>http://www.dbms2.com/2010/07/28/dbshards/</link>
		<comments>http://www.dbms2.com/2010/07/28/dbshards/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 09:39:11 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2662</guid>
		<description><![CDATA[I talked yesterday w/ Cory Isaacson, who runs CodeFutures, makers of dbShards. dbShards is a software layer that turns an ordinary DBMS (currently MySQL or PostgreSQL) into an MPP shared-nothing ACID-compliant OLTP DBMS. Technical highlights included:  Despite heavy emphasis on the word “sharding,” dbShards&#8217;s scale-out is transparent to the application programmer. E.g., in dbShards + [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I talked yesterday w/ Cory Isaacson, who runs CodeFutures, makers of dbShards.  dbShards is a software layer that turns an ordinary DBMS (currently MySQL or PostgreSQL) into an MPP shared-nothing ACID-compliant OLTP DBMS. Technical highlights included:  <span id="more-2662"></span></p>
<ul>
<li>Despite heavy emphasis on the word “sharding,” dbShards&#8217; scale-out is transparent to the application programmer. E.g., in dbShards + MySQL, the APIs are more or less the same ones you&#8217;d expect for MySQL (JDBC, etc.).</li>
<li>If the DBMS underneath is 	ACID-compliant (e.g., MySQL + InnoDB), then the dbShards version is 	ACID-compliant too.</li>
<li>Beyond those basics, I forgot to 	check the fine details of dbShards&#8217; MySQL (or PostgreSQL) syntax 	support. <a href="http://highscalability.com/blog/2010/6/23/product-dbshards-share-nothing-shard-everything.html">Todd 	Hoff, however, did not forget</a>.</li>
<li>dbShards keeps copies of each 	shard on two different servers, via asynchronous log-shipping. This 	allows for failover in both planned and unplanned outages.</li>
<li>dbShards wants you to distribute 	big tables among shards via a “shard key,” which is a lot like 	the distribution key in MPP analytic DBMS. You&#8217;re encouraged to 	replicate small, low-update-volume tables across each shard.</li>
<li>Cory says that dbShards has good 	join performance when – you guessed it! – everything being joined 	is co-located shard-by-shard, because the tables were distributed on 	the same shard key and/or replicated across each shard. Cory can&#8217;t 	imagine why you&#8217;d want to do an inner join under any other 	circumstances.</li>
<li>The basic dbShards query execution model is: A query comes in; it&#8217;s parsed; a shard key is automagically detected (one hopes); the “global configuration file” is checked to see which shard to ship the work off to. I forgot to ask whether lookup was done via a hash table (the obvious guess) or something else. The programmer can put hints in the code comments to direct the sharding, but Cory asserts those aren&#8217;t needed very often.</li>
<li>Cory says that insert performance 	with dbShards + MySQL + InnoDB is 1500-3000 inserts per shard per 	second, scaling almost linearly with the number of shards. I forgot 	to ask how many shards this had been tested for.</li>
<li>If you want blazing dbShards performance, Cory&#8217;s base-case figure is 25 gigabytes of data per node, so that the most commonly used indexes can camp out in memory. (I forgot to ask what kind of hardware he was assuming per node.) This is if you&#8217;re going to be doing joins or aggregations. If it&#8217;s just single-row inserts and updates, or if your performance requirements are lower, you can go with 10X that figure.</li>
<li>Cory tells stories wherein going 	from an unsharded database to 4 or so shards took database 	re-indexing time down 50X or more.  Apparently, such tasks can be 	exponential or even super-exponential with database size over 	InnoDB. (That said, I&#8217;d be surprised if all large InnoDB users 	suffered from that problem to the same degree.)</li>
<li>dbShards&#8217; customer workloads are 	all &gt;= 50% reads. This is reflective of dbShards&#8217; design 	priorities.</li>
<li>As long as it can be in charge, 	dbShards is happy to interface to whatever kind of database backup 	software you want to use on a node by node basis. (dbShards wants to 	drive your backup software for you so that it can be sure the 	replicas are handled properly.)</li>
<li>It&#8217;s “fairly common” for 	dbShards to be paired with memcached. I forgot to ask whether 	memcached typically lived on its own pool of servers, or on the same 	pool that runs dbShards.</li>
<li>Future DBMS options under 	consideration for dbShards include Oracle and (unspecified) 	in-memory.</li>
</ul>
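<p>The routing step in that execution model can be sketched as follows. This assumes the “obvious guess” of hash-based lookup, which I did not confirm; the function names, the use of MD5, and the <code>replicated_tables</code> set are hypothetical stand-ins, not anything from dbShards or its configuration file.</p>

```python
import hashlib
from typing import List, Set


def shard_for(shard_key: object, num_shards: int) -> int:
    """Map a shard key to a shard number with a stable hash (assumed scheme)."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(table: str, shard_key: object, num_shards: int,
          replicated_tables: Set[str]) -> List[int]:
    """Return the shards that a single-table read could be sent to."""
    if table in replicated_tables:
        # Small, low-update tables are copied to every shard,
        # so any shard can answer a read against them.
        return list(range(num_shards))
    # Big tables live on exactly one shard per key value.
    return [shard_for(shard_key, num_shards)]
```

<p>Because the shard number depends only on the key value, rows from two big tables that share a shard key land on the same shard, which is what makes the co-located joins Cory describes possible.</p>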
<p style="margin-bottom: 0in;">Business highlights for CodeFutures and dbShards include:</p>
<ul>
<li>dbShards&#8217; price is 	$5000/server/year, including support and OEMed MySQL, with stated 	quantity discounts up to 40%.</li>
<li>dbShards cloud pricing is 	different (on a usage basis).</li>
<li>dbShards has 6 or so customers, 	half each on-premises and in the cloud. One of them is Facebook. (Those &#8220;100s&#8221; of customers mentioned on the dbShards website are for a fairly unrelated product.)</li>
<li>CodeFutures has been at this 2 ½ 	years or so. There is no venture capital in the company.</li>
<li>Early dbShards deals have evidently involved a fair amount of professional services.</li>
<li>Counting contractors, CodeFutures has 10-12 people; the headcount has been as high as 15.</li>
<li>Target dbShards customers are as 	you&#8217;d expect. Cory says he&#8217;s actually been more successful getting 	early-adopter money out of Web companies than Wall Street firms.</li>
<li>There are a couple of dbShards 	PostgreSQL customers for greenfield applications. Most dbShards 	customers and prospects, however, are looking to scale out existing 	apps.</li>
<li>Despite its connection to open source DBMS, there&#8217;s nothing open source about dbShards itself.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/28/dbshards/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Stakeholder-facing analytics</title>
		<link>http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/</link>
		<comments>http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/#comments</comments>
		<pubDate>Sat, 15 May 2010 07:58:05 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Fox and MySpace]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2149</guid>
		<description><![CDATA[There&#8217;s a point I keep making in speeches, and used to keep making in white papers, yet have almost never spelled out in this blog. Let me now (somewhat) correct the oversight. Analytic technology isn&#8217;t only for you. It&#8217;s also for your customers, citizens, and other stakeholders. I am not referring here to what is [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a point I keep making in speeches, and used to keep making in white papers, yet have almost never spelled out in this blog. Let me now (somewhat) correct the oversight.</p>
<p><strong>Analytic technology isn&#8217;t only for you. It&#8217;s also for your customers, citizens, and other stakeholders.</strong></p>
<p>I am <strong>not</strong> referring here to what is well understood to be an important, fast-growing activity &#8212; providing data and its analysis to customers as your primary or only business &#8212; nor to the related business of taking people&#8217;s data, crunching it for them, and giving them results. That combined sector &#8212; which I am pretty much alone in aggregating into one and calling <a href="http://www.dbms2.com/category/analytics-technologies/data-mart-warehouse-outsourcing/">data mart outsourcing</a> &#8212; is one of the top several vertical markets for a lot of the analytic DBMS vendors I write about. Rather, I&#8217;m talking about enterprises that gather data for some primary purpose, and have discovered that a good <strong>secondary</strong> use of the data is to reflect it back to stakeholders, often the same ones who provided or created it in the first place.</p>
<p>For now I&#8217;ll call this category <strong>stakeholder-facing analytics,</strong> as the shorter phrase &#8220;stakeholder analytics&#8221; would be ambiguous.* I first picked up the idea early this decade from Information Builders, for whom it had become something of a specialty. I&#8217;ve been asking analytics vendors for examples of stakeholder-facing analytics ever since, and a number have been able to comply. But the whole thing is in its early days even so; almost any sufficiently large enterprise should be more active in stakeholder-facing analytics than it currently is.<br />
<span id="more-2149"></span><br />
<em>*Comments as to what the category</em> should<em> be called are welcome below.</em></p>
<p>Examples of stakeholder-facing analytics include:</p>
<ul>
<li>Enterprises report back on the business customers do with them. For example:
<ul>
<li>Credit card companies provide reports on spending back to their credit card holders, especially small businesses.</li>
<li>So do office supply retailers.</li>
<li>Brokerage firms provide reporting back to their small-institution customers.</li>
</ul>
</li>
<li>Governments expose information to their citizens online.
<ul>
<li>In an early example, New York City restaurant ratings were put online.</li>
<li><a href="http://sec.gov/edgar/searchedgar/companysearch.html">Putting SEC filings online</a> has been a huge success.</li>
<li>The Obama Administration has committed to putting <a href="http://www.data.gov/catalog">large amounts of information</a> online.</li>
</ul>
</li>
<li>Regulated companies (such as utilities) could be required to put data online directly, without even using the government as an intermediary.</li>
<li>Some part of Fox &#8212; perhaps MySpace Music? &#8212; offers free access to a PostgreSQL extract from <a href="http://www.dbms2.com/2009/03/05/fox-interactive-medias-multi-hundred-terabyte-database-running-on-greenplum/">its Greenplum database</a> to each of its largest advertisers.</li>
<li>Google Analytics offers some basic BI for free to website owners everywhere.</li>
<li>Anybody from web hosting companies to public utilities could open their kimonos and allow their customers to track adherence to actual or implied SLAs (Service Level Agreements) in areas such as uptime, length of outage, responsiveness, and the like.</li>
</ul>
<p>So what cool examples do you have of stakeholder-facing analytics?*</p>
<p><em>*Yes, this is an invitation to drop links to case studies into the comment thread below. </em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Notes on the evolution of OLTP database management systems</title>
		<link>http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/</link>
		<comments>http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/#comments</comments>
		<pubDate>Mon, 05 Apr 2010 08:22:03 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EnterpriseDB and Postgres Plus]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Mid-range]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1841</guid>
		<description><![CDATA[The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part).  OLTP (OnLine Transaction Processing) and general purpose DBMS startups, however, have not yet done as well, with [...]]]></description>
			<content:encoded><![CDATA[<p>The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part).  OLTP <span style="font-weight: normal;">(OnLine Transaction Processing) </span>and general purpose DBMS startups, however, have not yet done as well, with such success as there has been (MySQL, Intersystems Cache&#8217;, solidDB&#8217;s exit, etc.) generally accruing to products that originated in the 20th Century.</p>
<p>Nonetheless, OLTP/general-purpose data management startup activity has recently picked up, targeting what I see as some very real opportunities and needs. So as a jumping-off point for further writing, I thought it might be interesting to collect a few observations about the market in one place.  These include:</p>
<ul>
<li><span style="font-weight: normal;">Big-brand 	OLTP/general-purpose DBMS have more “stickiness” 	than analytic DBMS.</span></li>
<li><span style="font-weight: normal;">By 	number, most of an enterprise&#8217;s OLTP/general-purpose databases are low-volume and 	low-value. </span></li>
<li>Most 	interesting new OLTP/general-purpose data management products are <span style="font-style: normal;">either 	MySQL-based or NoSQL.</span></li>
<li>It&#8217;s not yet 	clear whether MySQL will prevail over MySQL forks, or vice-versa, or 	whether they will co-exist.</li>
<li>The era of 	silicon-centric relational DBMS is coming.</li>
<li>The emphasis 	on scale-out and reducing the cost of joins spans the NoSQL and 	SQL-based worlds.<em> </em></li>
<li><span style="font-weight: normal;">Users&#8217; insistence on “free” could be a major problem for OLTP DBMS innovation. </span></li>
</ul>
<p style="margin-bottom: 0in;">I shall explain.<span id="more-1841"></span></p>
<p style="margin-bottom: 0in;"><strong>Big-brand OLTP/general-purpose DBMS have more “stickiness” than analytic DBMS.</strong></p>
<ul>
<li>OLTP 	applications are more complex than analytic ones, and hence more 	tightly wired into particular brands of DBMS. For example, 	third-party packaged OLTP applications are typically portable among 	only a few brands of DBMS. But third-party business intelligence 	tools, and the BI “applications” built in them, are more easily 	and widely portable.</li>
<li>Specific technical observations 	such as “OLTP apps tend to use stored procedures, which are 	DBMS-specific” or “OLTP apps tend to have lots and lots of 	tables” serve to underscore the first point.</li>
<li>An enterprise&#8217;s highest-value data 	is commonly the financial stuff handled by its core OLTP systems, so 	those are the last things they want to mess around with just to get 	some cost savings. Security, high availability, and so on are major 	considerations that can outweigh cost.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>By number, most of an enterprise&#8217;s OLTP/general-purpose databases are low-volume and low-value. </strong>Indeed, “OLTP” is often a misnomer, which is why I tend to go with “general-purpose” or some similarly wishy-washy phrase instead.</p>
<ul>
<li>In theory, this is a ripe area for 	what I&#8217;ve called <a href="http://www.dbms2.com/category/database-management-system/mid-range/">mid-range DBMS</a>.</li>
<li>The big brand vendors try hard to 	keep as many of those databases for themselves as they can. 	Enterprise-wide license pricing helps. Going forward, so will 	virtualization/consolidation strategies, such as <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/">Oracle&#8217;s 	Exadata-centric approach</a>.</li>
<li>A variety of mid-range DBMS 	alternatives beyond the big brands have technical merit, at least in 	some cases and configurations – MySQL, PostgreSQL, Intersystems 	Cache&#8217;, and so on.</li>
<li>The only such mid-range DBMS 	alternative with much large enterprise business momentum, however, 	appears to be MySQL.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>&#8220;General-purpose&#8221; might be a better term than &#8220;OLTP&#8221; anyway.</strong></p>
<ul>
<li>I don&#8217;t have a link, but it&#8217;s widely agreed that over half of the processing on an &#8220;OLTP&#8221; enterprise app is commonly reporting and so on.</li>
<li>&#8220;Operational BI&#8221; is progressing by fits and starts, but it is progressing.</li>
<li>Anything customer-facing &#8212; web-based, call center, or otherwise &#8212; is likely to include a heavy dose of &#8220;real-time&#8221; analytic optimization.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Most interesting new OLTP/general-purpose data management products are <span style="font-style: normal;">either MySQL-based or NoSQL.</span></strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/">VoltDB</a> is the main 	exception that jumps to mind.</li>
<li>This isn&#8217;t true in the analytic 	DBMS area, where Netezza, Greenplum, Aster, Vertica and others 	started from PostgreSQL&#8217;s code, APIs, or both.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>It&#8217;s not yet clear whether MySQL will prevail over MySQL forks, or vice-versa, or whether they will co-exist.</strong></p>
<ul>
<li>MySQL is a limited product without 	all the third-party storage engines that are being developed.</li>
<li><a href="http://www.dbms2.com/2009/12/14/oracle-mysql-storage-engine/">Oracle&#8217;s promise of MySQL good 	behavior</a> has an expiration date.</li>
<li>None of the MySQL front-end 	alternatives are remotely mature yet.</li>
</ul>
<p style="margin-bottom: 0in;"><strong>The era of silicon-centric relational DBMS is coming.</strong></p>
<ul>
<li>I think “silicon” means 	“solid-state memory” as much as or more than it means “RAM,” 	but that&#8217;s not yet certain.</li>
<li>What is pretty certain is that, 	thanks to Moore&#8217;s Law, some kind of silicon will increasingly 	replace disk.</li>
<li><a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/">Oracle&#8217;s increasingly 	Flash-centric story</a> is a challenge to everybody.</li>
<li>RAM-centric VoltDB will launch 	fairly soon. (By the way, while VoltDB still has <a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/">a lot in common 	with H-Store</a>, they&#8217;re not exactly the same thing. And <a href="http://bit.ly/9QxjV2.">H-Store 	research</a> is progressing too.)</li>
<li><span style="font-style: normal;"><a href="http://rethinkdb.com/">RethinkDB</a> is being de</span>veloped, focused directly on solid-state memory. 	Based on the sparse information available online, RethinkDB sounds 	somewhat like a dumbed-down H-Store.</li>
<li>New disk-based vendors may never 	optimize their use of disk, instead targeting a solid-state future. 	(E.g., I think Akiban should and quite well might follow this path.)</li>
</ul>
<p style="margin-bottom: 0in; font-weight: normal;"><strong>The emphasis on scale-out and reducing the cost of joins spans the NoSQL and SQL-based worlds.</strong> We hear that from the <a href="http://www.dbms2.com/2010/03/14/nosql-taxonomy/">NoSQL</a> guys all the time. But I also just heard it from <a href="http://www.dbms2.com/2010/04/03/akiban-highlights/">Akiban</a>.</p>
<p style="margin-bottom: 0in;"><strong>Users&#8217; insistence on “free” could be a major problem for OLTP DBMS innovation.</strong> Vendors of new OLTP data management technologies often feel obligated to open source their products, notwithstanding the historical lack of revenue in the open source OLTP DBMS market. As just one of many examples, <a href="http://www.novaspivack.com/uncategorized/evri-ties-the-knot-with-twine">Nova Spivack</a> wrote:</p>
<blockquote>
<p style="margin-bottom: 0in;">I have recently seen some new graph data storage products that may provide the levels of scale and performance needed, but pricing has not been determined yet. In short, storage and retrieval of semantic graph datasets is a big unsolved challenge that is holding back the entire industry. We need federated database systems that can handle hundreds of billions to trillions of triples under high load conditions, in the cloud, on commodity hardware and open source software. Only then will it be affordable to make semantic applications and services at Web-scale.</p>
</blockquote>
<p style="margin-bottom: 0in;">I hear similar things from other startups, who evidently believe they need and/or are entitled to enjoy sophisticated, high-performance, zero-cost, specialized database management technology.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/04/05/oltp-database-management-systems-2/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Greenplum Single-Node Edition &#8212; sometimes free is a real cool price</title>
		<link>http://www.dbms2.com/2009/10/19/greenplum-free-single-node-edition/</link>
		<comments>http://www.dbms2.com/2009/10/19/greenplum-free-single-node-edition/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 13:25:41 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EnterpriseDB and Postgres Plus]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Scientific research]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1158</guid>
		<description><![CDATA[Greenplum is announcing today that you can run Greenplum software on a single 8-core commodity server, free. First and foremost, that&#8217;s a strong statement that Greenplum wants enterprises to pay it for Greenplum&#8217;s parallelization/“private cloud” capabilities. Second, it may be an attractive gift to a variety of folks who want to extract insight from terabyte-scale [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Greenplum is announcing today that you can run Greenplum software on a single 8-core commodity server, free.  First and foremost, that&#8217;s a strong statement that Greenplum wants enterprises to pay it for Greenplum&#8217;s parallelization/“<a href="../2009/06/08/the-future-of-data-marts/">private cloud</a>” capabilities. Second, it may be an attractive gift to a variety of folks who want to extract insight from terabyte-scale databases of various kinds.</p>
<p style="margin-bottom: 0in;">Greenplum Single-Node Edition:</p>
<ul>
<li>Is free of charge, although you 	can buy support.</li>
<li>Has no restrictions on use, 	production or otherwise.</li>
<li>Has no restrictions on database 	size.</li>
<li>Is closed-source.</li>
</ul>
<p style="margin-bottom: 0in;">For those who want free, terabyte-scale data warehousing software, Greenplum Single-Node Edition may be quite appealing, considering that the main available alternatives are:</p>
<ul>
<li>General-purpose open-source DBMS, 	such as PostgreSQL and MySQL (lacking analytic DBMS performance and 	features)</li>
<li>Infobright Community Edition (the 	other best choice – <a href="../2009/10/14/infobright-notes/">Infobright&#8217;s 	commercial sales success</a> indicates the solidity of Infobright&#8217;s 	technology)</li>
<li>Rough research-project code and other questionable open source offerings</li>
<li>Crippleware from other commercial 	analytic DBMS vendors (e.g., <a href="../2009/10/19/teradata-partners-2009/">Teradata</a>)</li>
</ul>
<p style="margin-bottom: 0in;">For example, comparing PostgreSQL-based Greenplum with PostgreSQL itself, Greenplum offers:</p>
<ul>
<li>The ability to scale out queries 	across all cores in your box (and no, pgpool is not a serious 	alternative)</li>
<li>Storage alternatives such as 	columnar (I am told that EnterpriseDB recently stopped funding a 	project for a PostgreSQL columnar option)</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-1158"></span>Greenplum would surely also argue that its software is superior to PostgreSQL in parallel load, compression, MapReduce integration, and general fit-and-finish. I imagine that in some (perhaps not all) cases it would be right. PostgreSQL&#8217;s main technical advantages over Greenplum would probably lie in the area of datatype extensibility.</p>
<p style="margin-bottom: 0in;">The main target users for Greenplum&#8217;s Single-Node Edition are obviously <strong>individual enterprise power users or very small analytic teams.</strong> I.e., it&#8217;s people with a data mart need that a central data warehouse isn&#8217;t meeting. Potential benefits to Greenplum include:</p>
<ul>
<li>Adding value to its <a href="../2009/06/08/the-future-of-data-marts/">Enterprise 	Data Cloud</a> story</li>
<li>Seeding the market for future 	enterprise sales</li>
<li>Depriving competitors of revenue, 	perhaps at enterprises too small to ever be paying Greenplum 	customers</li>
</ul>
<p style="margin-bottom: 0in;">In addition, I see free Greenplum as a charity offering that could be appealing to <a href="http://">scientists</a> who face PostgreSQL performance limitations.</p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li><a href="http://www.greenplum.com/news/252/388/Greenplum-Introduces-Free-Greenplum-Database-Edition-for-Data-Analysts/d,press-releases/">Greenplum 	Free Single-Node Edition press release</a> (I&#8217;m quoted)</li>
<li><a href="http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/">MySQL 	Performance blog on MonetDB and Infobright community edition</a></li>
<li><a href="http://archives.postgresql.org/pgsql-general/2009-03/msg01227.php">PostgreSQL&#8217;s 	restriction to one core per query</a></li>
<li><a href="http://www.infobright.org/Forums/viewthread/1141/">Infobright&#8217;s 	restriction to one core per query</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/19/greenplum-free-single-node-edition/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>HadoopDB</title>
		<link>http://www.dbms2.com/2009/09/13/hadoopdb/</link>
		<comments>http://www.dbms2.com/2009/09/13/hadoopdb/#comments</comments>
		<pubDate>Sun, 13 Sep 2009 04:59:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Structured documents]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=890</guid>
		<description><![CDATA[Despite a thoughtful heads-up from Daniel Abadi at the time of his original posting about HadoopDB, I&#8217;m just getting around to writing about it now. HadoopDB is a research project carried out by a couple of Abadi&#8217;s students. Further research is definitely planned. But it seems too early to say that HadoopDB will ever get [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Despite a thoughtful heads-up from Daniel Abadi at the time of <a href="http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html">his original posting about HadoopDB</a>, I&#8217;m just getting around to writing about it now.  HadoopDB is a research project carried out by a couple of Abadi&#8217;s students.  Further research is definitely planned. But it seems too early to say that HadoopDB will ever get past the &#8220;research and oh by the way the code is open sourced&#8221; stage and become a real code line &#8212; whether commercialized, open source, or both.</p>
<p style="margin-bottom: 0in;">The basic idea of HadoopDB is to put copies of a DBMS at different nodes of a grid, and use Hadoop to parcel work among them. Major benefits when compared with massively parallel DBMS are said to be:</p>
<ul>
<li>Open/cheap/free</li>
<li><a href="http://www.dbms2.com/2009/09/13/fault-tolerant-queries/">Query fault-tolerance</a></li>
<li><span style="font-style: normal;">The 	related concept of tolerating node degradation that isn&#8217;t an 	outright node failure.</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">HadoopDB has actually been built with PostgreSQL. That version achieved performance well below that of a commercial DBMS &#8220;DBX&#8221;, where X=2. Column-store guru Abadi has repeatedly signaled his intention to try out HadoopDB with </span><a href="http://www.dbms2.com/2009/08/04/vectorwise-ingres-and-monetdb/">VectorWise</a><span style="font-style: normal;"> at the nodes instead.  (Recall that VectorWise is shared-everything.) It will be interesting to see how that configuration performs.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">The real opportunity for HadoopDB, however, in my opinion may lie elsewhere.<span id="more-890"></span> Rather than trying to compete with parallel relational DBMS, HadoopDB might do more good parallelizing more specialized kinds of database engines. How about, for example, a massively parallel XML manager to compete with MarkLogic? Or a massively parallel array processor other than the still-nascent </span><a href="http://www.dbms2.com/2009/09/12/xldb-scid/">SciDB</a>? <span style="font-style: normal;">Or, even more to the point, something that parallelizes a yet-more-specialized scientific data management engine? That kind of area is where I suspect the potential for HadoopDB really lives.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/13/hadoopdb/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>What could or should make Oracle/MySQL antitrust concerns go away?</title>
		<link>http://www.dbms2.com/2009/09/10/what-could-or-should-make-oraclemysql-antitrust-concerns-go-away/</link>
		<comments>http://www.dbms2.com/2009/09/10/what-could-or-should-make-oraclemysql-antitrust-concerns-go-away/#comments</comments>
		<pubDate>Thu, 10 Sep 2009 14:53:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Mid-range]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=879</guid>
		<description><![CDATA[When the Oracle/MySQL deal was first announced, I wrote: I can probably come up with business practices that could make things very hard on Oracle/MySQL competitors &#8230; but I haven’t found a compelling antitrust trigger on my first pass over the subject. Subsequently, there&#8217;s been a lot of discussion about whether or not Oracle can [...]]]></description>
			<content:encoded><![CDATA[<p>When the Oracle/MySQL deal was first announced, I <a href="http://www.dbms2.com/2009/04/20/should-the-oraclemysql-combo-face-antitrust-opposition/">wrote</a>:</p>
<blockquote><p>I can probably come up with business practices that could make things very hard on Oracle/MySQL competitors &#8230; but I haven’t found a compelling antitrust trigger on my first pass over the subject.</p></blockquote>
<p>Subsequently, there&#8217;s been <a href="http://www.dbms2.com/2009/05/15/mysql-fork-open-database-alliance-gpl/">a lot of</a> <a href="http://www.dbms2.com/2009/05/22/yet-more-on-mysql-forks-and-storage-engines/">discussion</a> about whether or not Oracle can use control of MySQL to make life difficult for third-party MySQL storage engine vendors.</p>
<p>Now the European Commission <a href="http://www.nytimes.com/2009/09/04/technology/companies/04oracle.html">is delaying the Oracle/Sun deal, explicitly because of Oracle/MySQL antitrust fears</a>.  That is, the European Commission wants to be reassured that an Oracle takeover of MySQL won&#8217;t unduly impinge upon the future availability of open source/low cost DBMS alternatives.  This raises the natural question:</p>
<p><strong>What could Oracle do to assure concerned parties that its ownership of MySQL won&#8217;t unduly hamper open-source-based DBMS competition?</strong></p>
<p>I think that&#8217;s indeed the crucial question. The Oracle/Sun deal has enough momentum at this point that it both should and will be allowed to happen &#8212; perhaps with safeguards &#8212; rather than banned outright. <strong>If  you have concerns about Oracle&#8217;s pending acquisition of MySQL, you should speak up and outline what kinds of regulatory safeguards would alleviate the problems you foresee.</strong></p>
<p>More or less obvious possibilities include:</p>
<ul>
<li><strong>Divest MySQL.</strong> This is obviously an extreme measure, but it surely would work.</li>
<li><strong>Provide some money and trademark rights to MySQL forkers.</strong> If MariaDB and Drizzle were put into strong competitive positions with MySQL today, it&#8217;s hard to see how regulators could object to any future maneuverings Oracle might envision with the GPLed side of MySQL.</li>
<li><strong>Offer a standard, attractive, long-term deal to MySQL bundlers. </strong>The commercial/non-GPL version of MySQL is a requirement for appliance vendors (surely), OEM vendors (probably), and storage engine vendors (maybe &#8212; I disagree, but I&#8217;m evidently in the minority).</li>
<li><strong>Strengthen PostgreSQL. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </strong> Realistically, that&#8217;s not going to be part of any Oracle/MySQL resolution, so I&#8217;ll leave it as a subject for another time.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/10/what-could-or-should-make-oraclemysql-antitrust-concerns-go-away/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>

