<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; MapReduce</title>
	<atom:link href="http://www.dbms2.com/category/parallelization/mapreduce/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 12:22:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Hadoop-related market categorization</title>
		<link>http://www.dbms2.com/2012/02/07/hadoop-related-market-categorization/</link>
		<comments>http://www.dbms2.com/2012/02/07/hadoop-related-market-categorization/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 06:49:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5914</guid>
		<description><![CDATA[I wasn&#8217;t the only one to be dubious about Forrester Research&#8217;s Hadoop taxonomy (or lack thereof). GigaOm&#8217;s Derrick Harris was as well, and offered a much superior approach of his own. In Derrick&#8217;s view, there&#8217;s Hadoop, Hadoop distributions, Hadoop management, and Hadoop applications. Taking those out of order, and recalling that no market categorization is [...]]]></description>
			<content:encoded><![CDATA[<p>I wasn&#8217;t the only one to be <a href="http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/">dubious about Forrester Research&#8217;s Hadoop taxonomy</a> (or lack thereof). GigaOm&#8217;s Derrick Harris was as well, and offered <a href="http://gigaom.com/cloud/what-it-really-means-when-someone-says-hadoop/">a much superior approach of his own</a>. In Derrick&#8217;s view, there&#8217;s Hadoop, Hadoop distributions, Hadoop management, and Hadoop applications. Taking those out of order, and recalling that <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">no market categorization is ever precise</a>:</p>
<ul>
<li>&#8220;Hadoop applications&#8221; is a catch-all category. Since Derrick offered suitable caveats around the label, I&#8217;m fine with what he said.</li>
<li>Hadoop management software commonly comes in the form of suites. Derrick&#8217;s discussion was solid.</li>
<li>Derrick seems to want to define &#8220;Hadoop&#8221; as being whatever is in the relevant Apache projects. Cool. He does seem to wind up on both sides of the &#8220;MapR and DataStax put Hadoop MapReduce on top of something that isn&#8217;t HDFS &#8212; so is that Hadoop or isn&#8217;t it?&#8221; question, but that&#8217;s a tough ambiguity to avoid.</li>
<li>Derrick could have been a little clearer on the subject of Hadoop distributions.</li>
</ul>
<p>Let&#8217;s drill down into that last one. Derrick refers to Hadoop distributions as &#8220;products&#8221; that:</p>
<blockquote><p>package a set of Hadoop projects (MapReduce, Hive, Sqoop, Pig, etc.) in a  way that in theory makes them integrate more naturally, and to run both  smoothly and securely.</p></blockquote>
<p>While that&#8217;s a reasonable recitation of the idea&#8217;s benefits, I&#8217;d rather say that a &#8220;distribution&#8221; of open source software comprises:<span id="more-5914"></span></p>
<ul>
<li>Open source software, in selected versions.</li>
<li>(Possibly) additional code.</li>
<li>(Likely) documentation.</li>
<li>(Possibly) legal assurances such as intellectual property indemnification.</li>
</ul>
<p>In the case of Hadoop:</p>
<ul>
<li> The version selection is a relatively big deal. There are a lot of Hadoop sub-projects. There&#8217;s been some splitting and forking and recombination. Testing a specific set of  point releases for integration and bugs is a non-trivial user benefit.</li>
<li>The additional code is generally focused on installation or whatever, because the rest is bundled into separately identified management software. Even so, because of the large number of moving parts, this is a good thing to have.</li>
<li>What&#8217;s more, in the case of Cloudera, using a particular distribution (theirs) is a prerequisite to getting the most widely adopted Hadoop management software (also theirs), which in turn is required if you want the industry&#8217;s most widely adopted Hadoop support (ditto). Similar things are apt to be true of rival distributions.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/07/hadoop-related-market-categorization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Comments on the 2012 Forrester Wave: Enterprise Hadoop Solutions</title>
		<link>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/</link>
		<comments>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 05:16:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[MapR]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Pentaho]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5886</guid>
		<description><![CDATA[Forrester has released its Q1 2012 Forrester Wave: Enterprise Hadoop Solutions. (Googling turns up a direct link, but in case that doesn&#8217;t prove stable, here also is a registration-required link from IBM&#8217;s Conor O&#8217;Mahony.) My comments include: The Forrester Wave&#8217;s relative vendor rankings are meaningless, in that the document compares apples, peaches, almonds, and peanuts. [...]]]></description>
			<content:encoded><![CDATA[<p>Forrester has released its Q1 2012 Forrester Wave: Enterprise Hadoop Solutions. (Googling turns up a <a href="http://www.forrester.com/rb/go?docid=60755&amp;oid=1-K07LCA&amp;action=5">direct link</a>, but in case that doesn&#8217;t prove stable, here also is <a href="http://database-diary.com/2012/02/02/get-a-free-copy-of-the-forrester-wave-for-enterprise-hadoop-solutions/">a registration-required link from IBM&#8217;s Conor O&#8217;Mahony</a>.) My comments include:</p>
<ul>
<li>The Forrester Wave&#8217;s <strong>relative vendor rankings are meaningless,</strong> in that the document compares apples, peaches, almonds, and peanuts. Apparently, it covers any vendor that incorporates a distribution of Apache Hadoop MapReduce into something it offers, and that supplied at least two (not necessarily full production) references for it.</li>
<li>The Forrester Wave for &#8220;enterprise Hadoop&#8221; contradicts itself on the subject of Hortonworks.
<ul>
<li>The Forrester Wave for &#8220;enterprise Hadoop&#8221; is correct when it says <strong>&#8220;Hortonworks &#8230; has Hadoop training and professional services offerings that are still embryonic.&#8221;</strong></li>
</ul>
<ul>
<li>Peculiarly, the Forrester Wave for &#8220;enterprise Hadoop&#8221; also says &#8220;Hortonworks offers an impressive Hadoop professional services portfolio&#8221;. Hortonworks will likely win one or more nice partnership deals with vendors in adjacent fields, but even so its professional services capabilities are &#8230; well, a good word might be &#8220;embryonic&#8221;.</li>
</ul>
</li>
<li><a href="http://www.dbms2.com/2011/02/11/comments-on-the-2011-forrester-wave-for-enterprise-data-warehouse-platforms/">Forrester Waves always seem to have weird implicit definitions of &#8220;data warehousing&#8221;</a>. This one is no exception.</li>
<li>Forrester gave top marks in &#8220;Functionality&#8221; to 11 of 13 &#8220;enterprise Hadoop&#8221; vendors. This seems odd.</li>
<li>I don&#8217;t know why MapR, which doesn&#8217;t like HDFS (Hadoop Distributed File System), got top marks in &#8220;Subproject integration&#8221;.</li>
<li>Forrester gave top marks in &#8220;Storage&#8221; to Datameer. It also gave higher marks to MapR than to EMC Greenplum, even though EMC Greenplum&#8217;s technology is a superset of MapR&#8217;s. Very strange. <em>(Edit: Actually, as per a comment below, there is some uncertainty about the EMC/MapR relationship.)</em></li>
<li>Forrester gave higher marks in &#8220;Acceleration and optimization&#8221; to Hortonworks than to Cloudera and IBM, and higher marks yet to Pentaho. Very odd.</li>
<li>I&#8217;m not sure what Forrester is calling a &#8220;Distributed EDW file store connector&#8221;, but it sounds like something that Cloudera has provided via partnership to a number of analytic DBMS vendors.</li>
<li>Forrester&#8217;s &#8220;Strategy&#8221; rankings seem to correlate to a metric of &#8220;We&#8217;re a large enough vendor to go in N directions at once&#8221;, for various values of N.</li>
<li>Forrester is correct to rank Cloudera&#8217;s &#8220;Adoption&#8221; as being stronger than EMC/Greenplum&#8217;s or MapR&#8217;s. But Hortonworks&#8217; strong mark for &#8220;Adoption&#8221; baffles me.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Notes on the Oracle Big Data Appliance</title>
		<link>http://www.dbms2.com/2012/01/10/notes-on-the-oracle-big-data-appliance/</link>
		<comments>http://www.dbms2.com/2012/01/10/notes-on-the-oracle-big-data-appliance/#comments</comments>
		<pubDate>Wed, 11 Jan 2012 01:32:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Pricing]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5809</guid>
		<description><![CDATA[Oracle announced its Big Data Appliance. Specs may be found in the Oracle Big Data Appliance press release. Beyond that: The most important software on the Oracle Big Data Appliance is a full set of Cloudera Enterprise code. Oracle will do Tier 1 Cloudera/Hadoop support, while Cloudera handles Tiers 2 and 3. The key spec [...]]]></description>
			<content:encoded><![CDATA[<p>Oracle announced its Big Data Appliance. Specs may be found in <a href="http://www.oracle.com/us/corporate/press/1453721">the Oracle Big Data Appliance press release</a>. Beyond that:</p>
<ul>
<li>The most important software on the Oracle Big Data Appliance is a full set of <a href="../2012/01/10/a-couple-of-links-explaining-cloudera-manager/">Cloudera Enterprise</a> code. Oracle will do Tier 1 Cloudera/Hadoop support, while Cloudera handles Tiers 2 and 3.</li>
<li>The key spec ratios are 1 core/4 GB RAM/3 TB raw disk. That&#8217;s reasonably in line with <a href="http://www.dbms2.com/2011/06/04/hardware-for-hadoop/">Cloudera figures I published in June, 2011</a>.</li>
<li>This is really Oracle&#8217;s <a href="http://www.dbms2.com/2012/01/08/big-data-terminology-and-positioning/">multi-structured big data appliance</a>. Oracle&#8217;s relational big data appliance is Exadata, which has been out for years and has comparable capacity to Oracle&#8217;s new &#8220;Big Data Appliance.&#8221; (<a href="http://www.eweek.com/c/a/IT-Infrastructure/Oracle-Launches-ClouderaPowered-Big-Data-Appliance-172364/">Chris Preimesberger</a> made a similar point.)</li>
<li>The Oracle Big Data Appliance list price is $450,000 for 18 12-core servers, plus $54,000/year maintenance.
<ul>
<li>That&#8217;s around $25,000 per server (and associated storage).</li>
<li>That&#8217;s also around $2,000/core.</li>
<li>That&#8217;s also around $700/TB of raw spinning disk, before <a href="http://www.dbms2.com/2011/07/06/hadoop-hardware-and-compression/">compression</a>.</li>
<li>None of those per-unit figures sounds ridiculous &#8230;</li>
<li>&#8230; but because of Oracle&#8217;s appliance configuration there&#8217;s indeed a hefty minimum initial purchase.</li>
</ul>
</li>
</ul>
<p><a href="http://www.zdnet.com/blog/btl/oracle-rolls-out-big-data-play-with-aggressive-price-cloudera/66529"><span id="more-5809"></span>Peter Goldmacher</a> argues that, because of size and price point, the Oracle Big Data appliance is targeted for high-end deployments rather than starter/test/development set-ups. To first approximation, that makes sense, in that:</p>
<ul>
<li>The Oracle Big Data Appliance is in the petabyte range for data capacity, and &#8230;</li>
<li>&#8230; <a href="http://www.dbms2.com/2011/07/06/petabyte-hadoop-clusters/">the number of petabyte-scale Hadoop deployments is in the low tens</a>, and &#8230;</li>
<li>&#8230; many of those aren&#8217;t at Oracle shops anyway.</li>
</ul>
<p>Surely the Oracle Big Data Appliance isn&#8217;t designed for the 4-8 node play-with-Hadoop crowd.</p>
<p>On the other hand, if you&#8217;re at a big, committed Oracle shop, and you want to do your first serious Hadoop deployment, why not go with the Oracle Big Data Appliance? You probably could save money with an alternative approach, but if your employers are committed to Oracle, saving money is surely not their greatest concern. Overpay by a bit; make your management happy with the Oracle logo; get Hadoop on your resume; prosper. That seems like a winning plan all the way around.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/10/notes-on-the-oracle-big-data-appliance/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>A couple of links explaining Cloudera Manager</title>
		<link>http://www.dbms2.com/2012/01/10/a-couple-of-links-explaining-cloudera-manager/</link>
		<comments>http://www.dbms2.com/2012/01/10/a-couple-of-links-explaining-cloudera-manager/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 22:23:22 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5798</guid>
		<description><![CDATA[Predictably, I wasn&#8217;t pre-briefed on the details of Oracle&#8217;s Big Data Appliance announcement today, and an inquiry to partner Cloudera doesn&#8217;t happen to have been immediately answered.* But anyhow, it&#8217;s clear from coverage by Larry Dignan and Derrick Harris that Oracle&#8217;s Big Data Appliance includes: Some version of Cloudera Manager (I&#8217;m guessing more or less [...]]]></description>
			<content:encoded><![CDATA[<p>Predictably, I wasn&#8217;t pre-briefed on the details of Oracle&#8217;s Big Data Appliance announcement today, and an inquiry to partner Cloudera doesn&#8217;t happen to have been immediately answered.* But anyhow, it&#8217;s clear from coverage by <a href="http://www.zdnet.com/blog/btl/oracle-rolls-out-big-data-play-with-aggressive-price-cloudera/66529">Larry Dignan</a> and <a href="http://gigaom.com/cloud/cloudera-brings-the-hadoop-to-oracles-big-data-appliance/">Derrick Harris</a> that Oracle&#8217;s Big Data Appliance includes:</p>
<ul>
<li>Some version of Cloudera Manager (I&#8217;m guessing more or less the best one).*</li>
<li>Some version of Apache Hadoop (I&#8217;m guessing the same distribution that Cloudera prefers to use).*</li>
<li>Some kind of support.</li>
</ul>
<p>In other words, it&#8217;s a lot like getting Cloudera Enterprise,* plus some hardware, plus some other stuff.</p>
<p><em>*Edit: About 2 minutes after I posted this, I got email from Cloudera CEO Mike Olson. Yes, the Oracle Big Data Appliance bundles Cloudera Enterprise.</em></p>
<p>That raises a question that recurs anyway: <strong>What exactly is Cloudera Manager?</strong> <span id="more-5798"></span>When asked, I&#8217;ve always tended to mumble something like: <strong>Um, it&#8217;s management stuff. </strong>There&#8217;s an overview on <a href="http://www.cloudera.com/products-services/tools/">the Cloudera Manager product page</a>, but it doesn&#8217;t really say much, even if you click on the Data Sheet link. More helpful, I think, is <a href="http://www.cloudera.com/blog/2011/12/cloudera-manager-3-7-released/">a December post on Cloudera&#8217;s busy blog</a>. Technically, the post is about the new features in the Cloudera Manager 3.7 point release, but more generally it helps to explain what Cloudera Manager does, in areas such as (and these bullet points are all direct quotes):</p>
<ul>
<li> Automated Hadoop Deployment</li>
<li> Centralized Management</li>
<li> Configuration Management</li>
<li> Service Monitoring</li>
<li> Log Search</li>
<li> Events and Alerts</li>
<li> Configuration versioning and Audit trails</li>
<li> Activity Monitoring</li>
<li> Operational Reports</li>
</ul>
<p>Taken together,<strong> those two Cloudera links do a pretty good job of explaining Cloudera Manager, and illustrating why a Hadoop user would want to have either Cloudera Manager or a similar competitive offering.</strong></p>
<p><em>Edit: The day after I originally made this post, Cloudera put up another post <a href="http://www.cloudera.com/blog/2012/01/cloudera-manager-thank-you-customers/">directly explaining what Cloudera Manager is about</a>.<br />
</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/10/a-couple-of-links-explaining-cloudera-manager/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hadapt is moving forward</title>
		<link>http://www.dbms2.com/2011/11/08/hadapt-is-moving-forward/</link>
		<comments>http://www.dbms2.com/2011/11/08/hadapt-is-moving-forward/#comments</comments>
		<pubDate>Tue, 08 Nov 2011 05:40:10 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5609</guid>
		<description><![CDATA[I&#8217;ve talked with my clients at Hadapt a couple of times recently. News highlights include: The Hadapt 1.0 product is going &#8220;Early Access&#8221; today. General availability of Hadapt 1.0 is targeted for an officially unspecified time frame, but it&#8217;s soon. Hadapt raised a nice round of venture capital. Hadapt added Sharmila Mulligan to the board. [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve talked with my clients at Hadapt a couple of times recently. News highlights include:</p>
<ul>
<li>The Hadapt 1.0 product is going &#8220;Early Access&#8221; today.</li>
<li>General availability of Hadapt 1.0 is targeted for an officially unspecified time frame, but it&#8217;s soon.</li>
<li>Hadapt raised a nice round of venture capital.</li>
<li>Hadapt added Sharmila Mulligan to the board.</li>
<li>Dave Kellogg is in the picture too, albeit not as involved as Sharmila.</li>
<li>Hadapt has moved the company to Cambridge, which is preferable to Yale environs for obvious reasons. (First location = space they&#8217;re borrowing from their investors at Bessemer.)</li>
<li>Headcount is in the low teens, with a target of doubling fast.</li>
</ul>
<p>The <a href="../../../../../2011/07/06/hadapt-update/">Hadapt product story</a> hasn&#8217;t changed significantly from what it was before. Specific points I can add include:   <span id="more-5609"></span></p>
<ul>
<li>With one exception to date, Hadapt beta customers have used PostgreSQL as the underlying DBMS, rather than some faster columnar system.</li>
<li>Sure, you want to process data on the nodes where it resides on the cluster. But if the data is replicated 3X or so, that gives you good flexibility to be adaptive by deciding which of the three copies you&#8217;ll operate against.</li>
<li>In Hadapt Version 1.0, scheduling and workload management are pretty much Hadoop&#8217;s. However &#8230;</li>
<li>&#8230; an improvement in scheduling is being actively researched.</li>
<li>In general, Hadapt&#8217;s design philosophy for executing SQL is to use MapReduce to get data to the proper nodes, while using the underlying DBMS for node-specific operations such as:
<ul>
<li>Initial retrieval from disk.</li>
<li>Joins and aggregations on data residing at (or visiting) a specific node.</li>
</ul>
</li>
</ul>
<p>A very busy Daniel Abadi also took the time to walk me through how Hadapt does joins. More precisely, what we discussed about joins includes some of the last features being added to Hadapt 1.0; many of the pieces are still missing from early-access Hadapt 1.0, and some may even slip out of the Hadapt 1.0 GA version. As Dan tells it, there are five kinds of joins in Hadapt:</p>
<ul>
<li><strong>Co-partitioned join.</strong> Both tables being joined happen to be partitioned on the join key. Happy happy joy joy. The tables are joined locally on each node, with the results aggregated via MapReduce.</li>
<li><strong>Directed join</strong>. One of the tables being joined happens to be partitioned on the join key. MapReduce distributes the other table along the join key, joins happen locally, and MapReduce does the rest.</li>
<li><strong>Broadcast join.</strong> One of the tables is broadcast in its entirety to every node. Joins then happen locally, and MapReduce does the rest.</li>
<li><strong>Split semijoin. </strong>One of the tables is projected to the join key and a row ID, and then distributed via MapReduce. Joins then happen locally. Later on, the joined rows are completed with the help of a second projection on the first table. MapReduce does the rest.</li>
<li><strong>Distributed/parallel hash join. </strong>Sometimes, Hadapt indeed joins just as Hadoop/Hive would.</li>
</ul>
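<p>To make the difference between the first and third join types concrete, here is a minimal Python sketch. It is not Hadapt&#8217;s code: the function names (<code>partition</code>, <code>local_join</code>, <code>co_partitioned_join</code>, <code>broadcast_join</code>) are hypothetical, &#8220;nodes&#8221; are just in-memory lists, and the final MapReduce aggregation step is reduced to concatenating per-node results.</p>

```python
from collections import defaultdict

def partition(rows, key, num_nodes):
    """Hash-partition rows across num_nodes by the value of `key`."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[hash(row[key]) % num_nodes].append(row)
    return nodes

def local_join(left, right, key):
    """Plain in-memory hash join of two row lists on `key` (one node's work)."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def co_partitioned_join(left_nodes, right_nodes, key):
    """Both tables are already partitioned on the join key: join node by node."""
    out = []
    for left, right in zip(left_nodes, right_nodes):
        out.extend(local_join(left, right, key))
    return out

def broadcast_join(left_nodes, right_rows, key):
    """Send the whole right table to every node, then join locally."""
    out = []
    for left in left_nodes:
        out.extend(local_join(left, right_rows, key))
    return out

orders = [{"cust": 1, "amt": 10}, {"cust": 2, "amt": 20}, {"cust": 1, "amt": 5}]
customers = [{"cust": 1, "name": "a"}, {"cust": 2, "name": "b"}]

co = co_partitioned_join(partition(orders, "cust", 3),
                         partition(customers, "cust", 3), "cust")
bc = broadcast_join(partition(orders, "cust", 3), customers, "cust")
print(len(co), len(bc))
```

<p>Both calls return the same joined rows. What differs is data movement: the co-partitioned case moves nothing before joining, while the broadcast case copies one whole table to every node, which is presumably only attractive when that table is small.</p>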
<p>Highlights of Hadapt&#8217;s performance story include:</p>
<ul>
<li>Dan contends that using a DBMS rather than HDFS (Hadoop Distributed File System) for I/O always gives a performance advantage.</li>
<li>DBMS local-node join performance can be presumed to be superior as well.</li>
<li>Of course, Dan also thinks that using a columnar DBMS would extend Hadapt&#8217;s performance advantage further, but most of the specifics of what Hadapt has told me about why they don&#8217;t routinely use a columnar DBMS yet are NDA.</li>
<li>Even beta Hadapt/PostgreSQL outperforms Hadoop/Hive by almost 10X at Hadapt&#8217;s relatively small number of beta customer sites.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/08/hadapt-is-moving-forward/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>MarkLogic&#8217;s Hadoop connector</title>
		<link>http://www.dbms2.com/2011/11/03/marklogic-hadoop-connector/</link>
		<comments>http://www.dbms2.com/2011/11/03/marklogic-hadoop-connector/#comments</comments>
		<pubDate>Fri, 04 Nov 2011 00:58:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[MarkLogic]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5585</guid>
		<description><![CDATA[It&#8217;s time to circle back to a subject I skipped when I otherwise wrote about MarkLogic 5: MarkLogic&#8217;s new Hadoop connector. Most of what&#8217;s confusing about the MarkLogic Hadoop Connector lies in two pairs of options it presents you: Hadoop can talk XQuery to MarkLogic. But alternatively, Hadoop can use a long-established simple(r) Java API [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s time to circle back to a subject I skipped when I otherwise wrote about <a href="http://www.dbms2.com/2011/11/01/marklogic-version-5/">MarkLogic 5</a>: MarkLogic&#8217;s new Hadoop connector.</p>
<p>Most of what&#8217;s confusing about the MarkLogic Hadoop Connector lies in two pairs of options it presents you:</p>
<ul>
<li>Hadoop can talk XQuery to MarkLogic. But alternatively, Hadoop can use a long-established simple(r) Java API for streaming documents into or out of a MarkLogic database.</li>
<li>Hadoop can make requests to MarkLogic in MarkLogic&#8217;s normal mode of operation, namely to address any node in the MarkLogic cluster, which then serves as a &#8220;head&#8221; node for the duration of that particular request. But alternatively, Hadoop can use a long-standing MarkLogic option to circumvent the whole DBMS cluster and only talk to one specific MarkLogic node.</li>
</ul>
<p>Otherwise, the whole thing is just what you would think:</p>
<ul>
<li>Hadoop can read from and write to MarkLogic, in parallel at both ends.</li>
<li>If Hadoop is just writing to MarkLogic, there&#8217;s a good chance the process is properly called &#8220;ETL.&#8221;</li>
<li>If Hadoop is reading a lot from MarkLogic, there&#8217;s a good chance the process is properly called &#8220;batch analytics.&#8221;</li>
</ul>
<p>MarkLogic said that it wrote this Hadoop connector itself.</p>
<p><span id="more-5585"></span>When I realized MarkLogic was claiming the ability to seamlessly integrate short-request and batch analytic processing, I asked about workload management. I gathered that:</p>
<ul>
<li>MarkLogic believes that MarkLogic 5 does a great job of granular workload monitoring.</li>
<li>However, MarkLogic doesn&#8217;t have a strong workload management administrative interface. Rather, you may have to do workload management programmatically.</li>
</ul>
<p>Overall, I think the MarkLogic Hadoop connector could prove pretty useful. The first question I ask somebody who wants to process relational data in Hadoop is &#8220;Why not just an analytic RDBMS?&#8221; But the natural use cases for MarkLogic are often ones in which you might as well do your analytics in Hadoop, including a 4 billion Word/PDF/image document insurance-industry example I recently encountered, and for which <a href="../../../../../2011/10/10/text-data-management-part-2-general-and-short-request/">I favor MarkLogic over MongoDB and straight Hadoop alike</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/03/marklogic-hadoop-connector/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>IBM is buying parallelization expert Platform Computing</title>
		<link>http://www.dbms2.com/2011/10/11/ibm-is-buying-parallelization-expert-platform-computing/</link>
		<comments>http://www.dbms2.com/2011/10/11/ibm-is-buying-parallelization-expert-platform-computing/#comments</comments>
		<pubDate>Tue, 11 Oct 2011 16:13:05 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Scientific research]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5473</guid>
		<description><![CDATA[IBM is acquiring Platform Computing, a company with which I had one briefing, last August. Quick background includes:  Platform Computing started ~20 years ago. Platform Computing claimed close to $100 million in revenue and &#62;500 people. (This is Platform Computing&#8217;s most famous splash to date.) Platform Computing technology underlies SAS Institute&#8217;s preferred method of parallelization, [...]]]></description>
			<content:encoded><![CDATA[<p>IBM is acquiring Platform Computing, a company with which I had one briefing, last August. Quick background includes:  <span id="more-5473"></span></p>
<ul>
<li>Platform Computing started ~20 years ago.</li>
<li>Platform Computing claimed close to $100 million in revenue and &gt;500 people.</li>
<li><strong>(This is Platform Computing&#8217;s most famous splash to date.)</strong> Platform Computing technology underlies SAS Institute&#8217;s preferred method of parallelization, which may variously be called:
<ul>
<li>SAS Grid Manager (the more or less official brand name).</li>
<li><a href="../../../../../2011/04/21/sas-hpa-does-make-sense-after-all/">SAS HPA</a> (High Performance Analytics), sort of an alternate brand name.</li>
<li>MPI (Message Passing Interface), the industry&#8217;s name for the underlying semantics/syntax/API.</li>
</ul>
</li>
<li>Platform Computing&#8217;s original business was scientific grid computing.</li>
<li>Platform Computing&#8217;s second major business was its &#8220;Symphony&#8221; product line. According to Platform Computing, Symphony:
<ul>
<li>Debuted 6-7 years ago.</li>
<li>Is more commercially oriented.</li>
<li>Is what supports SAS HPA.</li>
<li>SAS aside, has been sold to Wall Street and so on.</li>
<li>Is sometimes used in conjunction with <a href="../../../../../2011/08/25/renaming-cep-or-not/">CEP/streaming</a>, mainly for backtesting.</li>
<li>Can be used to build global (parallel) persistent memory for R.</li>
</ul>
</li>
<li><strong>(This is probably why IBM is buying Platform Computing.)</strong> Platform Computing has a new MapReduce offering that:
<ul>
<li>Is based on Symphony.</li>
<li>Shipped last July, except that early access was a couple months before that.</li>
<li>Is focused on:
<ul>
<li>Lowering the latency of MapReduce.</li>
<li>Consolidating multiple MapReduce use cases into one high(er)-utilization cluster.</li>
<li>Offering workload management in support of those goals.</li>
<li>Reliability, availability, predictability, puppies, kittens, and apple pie.</li>
</ul>
</li>
</ul>
</li>
<li>The offering is most specifically a MapReduce run-time engine, with other stuff beyond that.</li>
</ul>
<p>Unfortunately, I&#8217;m not precisely clear as to how tied this offering is to Hadoop, although using it with Hadoop is at least the base case. Platform Computing did say:</p>
<ul>
<li>It can support multiple virtual Hadoop clusters, which can be grown or shrunk at will.</li>
<li>Non-Hadoop workloads can be mixed in.</li>
</ul>
<p>Platform Computing said that key technical benefits of this offering included:</p>
<ul>
<li><strong>1-3 seconds to start a job, vs. 40-50 in generic Hadoop.</strong></li>
<li>Automatic recovery of JobTracker nodes.</li>
<li>Failover for NameNodes.</li>
<li>Workload management that:
<ul>
<li>Manages all of CPU, I/O, and RAM (this is quickly becoming an industry standard level of capability, although I&#8217;m judging more by the standards of the analytic DBMS world).</li>
<li>Monitors but doesn&#8217;t actively manage network resources.</li>
<li>Can reprioritize jobs that are in flight. (Also an industry-standard capability.)</li>
</ul>
</li>
</ul>
<p>This conflation of scientific, commercial analytic, streaming, and MapReduce is right in IBM&#8217;s philosophical wheelhouse. I base that comment on, among other factors:</p>
<ul>
<li>How IBM positions &#8220;BigInsights&#8221;.</li>
<li>IBM&#8217;s &#8220;smart consolidation&#8221; picture/pitch (which I really should get around to posting).</li>
<li>The fuss IBM makes about Watson, Blue Gene, and so on.</li>
</ul>
<p>The IBM acquisition probably obviates a lot of Platform Computing&#8217;s previous business comments, but at the time they included:</p>
<ul>
<li>POCs (Proofs of Concept):
<ul>
<li>Mainly in financial services, government, and telecom.</li>
<li>At both existing customers and new prospects.</li>
<li>Typically running 30-50 nodes, 2-50 terabytes.* The smallest databases evidently tended to be at financial services firms.</li>
</ul>
</li>
<li>Pricing that was starting out at:
<ul>
<li>Perpetual license: $3450/server, 21% annual maintenance after the first year.</li>
<li>Subscription: $2070/server annually, or $3070 with HDFS support bundled in.</li>
</ul>
</li>
</ul>
<p><em><strong>*1 terabyte or less per node</strong> is probably the lowest data-per-node figure I&#8217;ve heard for anything Hadoop-like &#8212; even below Hadapt, and well below what <a href="../../../../../2011/07/06/hadoop-hardware-and-compression/">Cloudera and Hortonworks</a> usually see.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/10/11/ibm-is-buying-parallelization-expert-platform-computing/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Cloudera versus Hortonworks</title>
		<link>http://www.dbms2.com/2011/10/04/cloudera-versus-hortonworks/</link>
		<comments>http://www.dbms2.com/2011/10/04/cloudera-versus-hortonworks/#comments</comments>
		<pubDate>Tue, 04 Oct 2011 15:50:49 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5409</guid>
		<description><![CDATA[A few weeks ago I wrote: The other big part of Hortonworks’ story is the claim that it holds the axe in Apache Hadoop development. and &#8230; just how dominant Hortonworks really is in core Hadoop development is a bit unclear. Meanwhile, Cloudera people seem to be leading a number of Hadoop companion or sub-projects, [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago <a href="http://www.dbms2.com/2011/09/12/hadoop-notes/">I wrote</a>:</p>
<blockquote><p>The other big part of <a href="../2011/07/10/cloudera-and-hortonworks/">Hortonworks’ story</a> is the claim that it holds the axe in Apache Hadoop development.</p></blockquote>
<p>and</p>
<blockquote><p>&#8230; just how dominant Hortonworks really is in core Hadoop development is a bit unclear. Meanwhile, Cloudera people seem to be leading a number of Hadoop companion or sub-projects, including the first two I can think of that relate to Hadoop integration or connectivity, namely Sqoop and Flume. So I’m not persuaded that the “we know this stuff better” part of the Hortonworks partnering story really holds up.</p></blockquote>
<p>Now Mike Olson &#8212; CEO of my client Cloudera &#8212; has posted <a href="http://www.cloudera.com/blog/2011/10/the-community-effect/">his analysis of the matter</a>, in response to <a href="http://www.hortonworks.com/the-yahoo-effect/">an earlier Hortonworks post</a> asserting its claims. In essence, Mike argues:</p>
<ul>
<li>It&#8217;s ridiculous to say any one company, e.g. Hortonworks, has a controlling position in Hadoop development.</li>
<li>Such diversity is a Very Good Thing.</li>
<li>Cloudera folks now contribute and always have contributed to Hadoop at a higher rate than Hortonworks folks.</li>
<li>If you consider just core Hadoop projects &#8212; the most favorable way of counting from a Hortonworks standpoint &#8212; Hortonworks has a lead, but not all that big a one.</li>
</ul>
<p><span id="more-5409"></span>I think Hortonworks likes to make the argument &#8220;But our contributions, on average, are more important than Cloudera&#8217;s contributions.&#8221; That claim perhaps aside, Cloudera&#8217;s argument looks persuasive.</p>
<p>Anyhow, the main bases for deciding whose enterprise support for Hadoop to buy &#8212; Cloudera&#8217;s or Hortonworks&#8217; &#8212; are probably:</p>
<ul>
<li><strong>Who is even offering it? <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </strong> Hortonworks, last I checked, wasn&#8217;t yet &#8212; Yahoo perhaps excepted &#8212; although it&#8217;s a near-term roadmap item for them to start doing so.</li>
<li><strong>Whose is better?</strong> Even when Hortonworks does offer enterprise support, it will lack experience with the support process. (To some extent, that could be worked around by providing money-losingly inefficient support at first.)</li>
<li><strong>Who bundles more useful proprietary software with their support? </strong>Unless you think the code in Cloudera Enterprise is 100% worthless, Cloudera wins that one.</li>
<li><strong>Price.</strong> I have no idea how that one will shake out.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/10/04/cloudera-versus-hortonworks/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Some notes on Hadoop (mainly) and appliances</title>
		<link>http://www.dbms2.com/2011/09/23/hadoop-appliances/</link>
		<comments>http://www.dbms2.com/2011/09/23/hadoop-appliances/#comments</comments>
		<pubDate>Fri, 23 Sep 2011 19:59:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapR]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[eBay]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5341</guid>
		<description><![CDATA[1. EMC Greenplum has evolved its appliance product line. As I read that, the latest announcement boils down to saying that you can neatly network together various Greenplum appliances in quarter-rack increments. If you take a quarter rack each of four different things, then Greenplum says &#8220;Hooray! Our appliance is all-in-one!&#8221; Big whoop. 2. That [...]]]></description>
			<content:encoded><![CDATA[<p>1. <a href="http://www.greenplum.com/products/greenplum-dca">EMC Greenplum has evolved its appliance product line</a>. As I read that, the latest announcement boils down to saying that you can neatly network together various Greenplum appliances in quarter-rack increments. If you take a quarter rack each of four different things, then Greenplum says &#8220;Hooray! Our appliance is all-in-one!&#8221; Big whoop.</p>
<p>2. That said, the Hadoop part of EMC&#8217;s story is based on MapR, which so far as I can tell is actually a pretty good Hadoop implementation. More precisely, MapR makes strong claims about performance and so on, and Apache Hadoop folks don&#8217;t reply &#8220;MapR is full of &amp;#$!&#8221; Rather, they say &#8220;We&#8217;re going to close the gap with MapR a lot faster than the MapR folks like to think &#8212; and by the way, guys, thanks for the butt-kick.&#8221; A lot more precision about MapR may be found in this <a href="http://www.slideshare.net/mcsrivas/design-scale-and-performance-of-maprs-distribution-for-hadoop">M. C. Srivas SlideShare</a>.</p>
<p>3. On its latest earnings call, Oracle clearly <a href="http://seekingalpha.com/article/294885-oracle-s-ceo-discusses-q1-2012-results-earnings-call-transcript?part=qanda">said it would introduce a Hadoop appliance</a>, versus just <a href="../../../../../2011/06/24/forthcoming-oracle-appliances/">hinting at a Hadoop appliance</a> the prior quarter. The money quote was:  <span id="more-5341"></span></p>
<blockquote><p>Finally, big data or the searching of large amounts of data using Hadoop. After Hadoop finishes filtering the data, the place you want to put that data is an Oracle Database, and that&#8217;s what a lot of our customers are doing. And we are exploiting the trend, the big data technology and the big data trend, if you prefer, by building a Hadoop appliance that attaches to the Oracle Exadata database or any Oracle Database for that matter. But you don&#8217;t have to buy our Hadoop appliance if you can use whatever servers you want running Hadoop, and we provide the interface between Hadoop and the Oracle Database.</p></blockquote>
<p>In other words, Oracle is saying &#8220;We&#8217;d like to sell you a Hadoop appliance, but you can run Hadoop in some other way and we&#8217;ll coexist with it just fine.&#8221; That makes sense; refusing to coexist with Hadoop is not exactly a realistic option.</p>
<p>4. Back in June, I expressed <a href="../../../../../2011/06/02/why-you-would-want-an-appliance-and-when-you-wouldnt/">great skepticism about the idea of a Hadoop appliance</a>. There was at least partial pushback in the comment thread from both Amr Awadallah and Eric Baldeschwieler. Oops.</p>
<p>Their reasoning seems to be centered around matters of installation, administration, and general packaging.</p>
<p>5. A month ago I noted aggressive near-term plans for <a href="../../../../../2011/08/21/hadoop-evolution/">Apache Hadoop evolution</a>. As noted above, one reason this is needed is competition from folks like MapR. Also, I note that:</p>
<ul>
<li>Three years ago, Oliver Ratzesberger&#8217;s group at eBay complained that <a href="../../../../../2008/10/15/ebay-doesnt-love-mapreduce/">CPU utilization running Hadoop was at 18%</a>.</li>
<li><a href="../../../../../2011/08/21/hadoop-evolution/#comment-241679">Now Oliver uses a figure of 10-15%</a>, and attributes an even lower figure to &#8212; I&#8217;m guessing here &#8212; Yahoo. (Another possibility might be Facebook.)</li>
<li>In between, eBay became one of the biggest and most prominent users of Hadoop.</li>
</ul>
<p>The moral of eBay&#8217;s Hadoop adventures, as I see it, is neither &#8220;Hadoop sucks!&#8221; nor &#8220;Hadoop doesn&#8217;t suck!&#8221;; rather, it&#8217;s that there&#8217;s a lot of scope for Hadoop to operate differently in the future than it does today.</p>
<p><em>Similarly, whatever throughput Yahoo does or doesn&#8217;t get, it clearly has adopted Hadoop at the expense of the <a href="../../../../../2008/05/29/yahoo-scales-web-analytics-database-petabyte/">columnar-in-Postgres</a> system it previously was so proud of.</em></p>
<p>Also, there has been a claim going around that &#8212; notwithstanding NameNode&#8217;s status as a single point of Hadoop failure &#8212; no Hadoop installation has ever lost data due to a NameNode failure. The folks at MapR beg to differ, and sent over <a href="https://issues.apache.org/jira/browse/HDFS-1539">some</a> <a href="http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201107.mbox/%3CCAFUA3X2R_wH9GGGseUVSXVNVZQ+dBjZKDn0_pmDO8U31C05tMw@mail.gmail.com%3E">links</a> that sure seem to say the opposite.</p>
<p>6. Since we&#8217;ve just established that Hadoop will change, rapidly and pretty fundamentally, what exactly is the benefit of an appliance that is &#8220;balanced&#8221; for Hadoop usage today?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/23/hadoop-appliances/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Hadoop notes</title>
		<link>http://www.dbms2.com/2011/09/12/hadoop-notes/</link>
		<comments>http://www.dbms2.com/2011/09/12/hadoop-notes/#comments</comments>
		<pubDate>Mon, 12 Sep 2011 09:03:52 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Health care]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MapR]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5218</guid>
		<description><![CDATA[I visited California recently, and chatted with numerous companies involved in Hadoop &#8212; Cloudera, Hortonworks, MapR, DataStax, Datameer, and more. I&#8217;ll defer further Hadoop technical discussions for now &#8212; my target to restart them is later this month &#8212; but that still leaves some other issues to discuss, namely adoption and partnering. The total number [...]]]></description>
			<content:encoded><![CDATA[<p>I visited California recently, and chatted with numerous companies involved in Hadoop &#8212; Cloudera, Hortonworks, MapR, DataStax, Datameer, and more. I&#8217;ll defer further <a href="../../../../../2011/08/21/hadoop-evolution/">Hadoop technical discussions</a> for now &#8212; my target to restart them is later this month &#8212; but that still leaves some other issues to discuss, namely adoption and partnering.</p>
<p>The total number of enterprises in the world paying subscription and license fees that they would regard as being for &#8220;Hadoop or something Hadoop-related&#8221; probably is not much over 100 right now, but I&#8217;d expect to see pretty rapid growth. Beyond that, let&#8217;s divide customers into three groups:</p>
<ul>
<li>Internet businesses.</li>
<li>Traditional enterprises&#8217; internet operations.</li>
<li>Traditional enterprises&#8217; other operations.</li>
</ul>
<p>Hadoop vendors, in different mixes, claim to be doing well in all three segments. Even so, almost all use cases involve some kind of <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>, with one exception being a credit card vendor crunching a large database of transaction details. Multiple kinds of machine-generated data come into play &#8212; web/network/mobile device logs, financial trade data, scientific/experimental data, and more. In particular, pharmaceutical research got some mentions, which makes sense, in that it&#8217;s one area of scientific research that actually enjoys fat for-profit research budgets.</p>
<p><span id="more-5218"></span>On the partnering side, I heard things about a Hortonworks conference call that do not seem to have been contradicted by my visit to Hortonworks. Namely, Hortonworks promised prospective partners, such as analytic DBMS vendors, hardware vendors, or large system integrators, that it wouldn&#8217;t compete with them, in that Hortonworks pledged not to introduce its own products for at least two years. This is presumably targeted most directly at <a href="../../../../../2010/10/10/partnering-with-cloudera/">Cloudera</a>, which has lots of partners, but also some <a href="../../../../../2010/06/30/cloudera-enterprise-hadoop-evolution/">proprietary code</a> of its own. MapR, I&#8217;d think, would be the #2 target, but that&#8217;s just speculation.</p>
<p>The other big part of <a href="../../../../../2011/07/10/cloudera-and-hortonworks/">Hortonworks&#8217; story</a> is the claim that it holds the axe in Apache Hadoop development. Nobody doubts that a large fraction of the work on Hadoop&#8217;s core projects was done by Yahoo employees. Many of those indeed moved to Hortonworks; others left Yahoo earlier; Hadoop creator Doug Cutting is actually at Cloudera. So just how dominant Hortonworks really is in core Hadoop development is a bit unclear. Meanwhile, Cloudera people seem to be leading a number of Hadoop companion or sub-projects, including the first two I can think of that relate to Hadoop integration or connectivity, namely Sqoop and Flume. So I&#8217;m not persuaded that the &#8220;we know this stuff better&#8221; part of the Hortonworks partnering story really holds up.</p>
<p>What I am persuaded of is that the Hadoop platform competition is a good thing. Whichever vendors and projects win will be healthier from having had to outcompete worthy alternatives.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/12/hadoop-notes/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

