<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Cloudera</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/cloudera/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:21:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Hadoop-related market categorization</title>
		<link>http://www.dbms2.com/2012/02/07/hadoop-related-market-categorization/</link>
		<comments>http://www.dbms2.com/2012/02/07/hadoop-related-market-categorization/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 06:49:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5914</guid>
		<description><![CDATA[I wasn&#8217;t the only one to be dubious about Forrester Research&#8217;s Hadoop taxonomy (or lack thereof). GigaOm&#8217;s Derrick Harris was as well, and offered a much superior approach of his own. In Derrick&#8217;s view, there&#8217;s Hadoop, Hadoop distributions, Hadoop management, and Hadoop applications. Taking those out of order, and recalling that no market categorization is [...]]]></description>
			<content:encoded><![CDATA[<p>I wasn&#8217;t the only one to be <a href="http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/">dubious about Forrester Research&#8217;s Hadoop taxonomy</a> (or lack thereof). GigaOm&#8217;s Derrick Harris was as well, and offered <a href="http://gigaom.com/cloud/what-it-really-means-when-someone-says-hadoop/">a much superior approach of his own</a>. In Derrick&#8217;s view, there&#8217;s Hadoop, Hadoop distributions, Hadoop management, and Hadoop applications. Taking those out of order, and recalling that <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">no market categorization is ever precise</a>:</p>
<ul>
<li>&#8220;Hadoop applications&#8221; is a catch-all category. Since Derrick offered suitable caveats around the label, I&#8217;m fine with what he said.</li>
<li>Hadoop management software commonly comes in the form of suites. Derrick&#8217;s discussion was solid.</li>
<li>Derrick seems to want to define &#8220;Hadoop&#8221; as being whatever is in the relevant Apache projects. Cool. He does seem to wind up on both sides of the &#8220;MapR and DataStax put Hadoop MapReduce on top of something that isn&#8217;t HDFS &#8212; so is that Hadoop or isn&#8217;t it?&#8221; question, but that&#8217;s a tough ambiguity to avoid.</li>
<li>Derrick could have been a little clearer on the subject of Hadoop distributions.</li>
</ul>
<p>Let&#8217;s drill down into that last one. Derrick refers to Hadoop distributions as &#8220;products&#8221; that:</p>
<blockquote><p>package a set of Hadoop projects (MapReduce, Hive, Sqoop, Pig, etc.) in a  way that in theory makes them integrate more naturally, and to run both  smoothly and securely.</p></blockquote>
<p>While that&#8217;s a reasonable recitation of the idea&#8217;s benefits, I&#8217;d rather say that a &#8220;distribution&#8221; of open source software comprises:<span id="more-5914"></span></p>
<ul>
<li>Open source software, in selected versions.</li>
<li>(Possibly) additional code.</li>
<li>(Likely) documentation.</li>
<li>(Possibly) legal assurances such as intellectual property indemnification.</li>
</ul>
<p>In the case of Hadoop:</p>
<ul>
<li> The version selection is a relatively big deal. There are a lot of Hadoop sub-projects. There&#8217;s been some splitting and forking and recombination. Testing a specific set of point releases for integration and bugs is a non-trivial user benefit.</li>
<li>The additional code is generally focused on installation or whatever, because the rest is bundled into separately identified management software. Even so, because of the large number of moving parts, this is a good thing to have.</li>
<li>What&#8217;s more, in the case of Cloudera, using a particular distribution (theirs) is a prerequisite to getting the most widely adopted Hadoop management software (also theirs), which in turn is required if you want the industry&#8217;s most widely adopted Hadoop support (ditto). Similar things are apt to be true of rival distributions.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/07/hadoop-related-market-categorization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Comments on the 2012 Forrester Wave: Enterprise Hadoop Solutions</title>
		<link>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/</link>
		<comments>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 05:16:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[MapR]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Pentaho]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5886</guid>
		<description><![CDATA[Forrester has released its Q1 2012 Forrester Wave: Enterprise Hadoop Solutions. (Googling turns up a direct link, but in case that doesn&#8217;t prove stable, here also is a registration-required link from IBM&#8217;s Conor O&#8217;Mahony.) My comments include: The Forrester Wave&#8217;s relative vendor rankings are meaningless, in that the document compares apples, peaches, almonds, and peanuts. [...]]]></description>
			<content:encoded><![CDATA[<p>Forrester has released its Q1 2012 Forrester Wave: Enterprise Hadoop Solutions. (Googling turns up a <a href="http://www.forrester.com/rb/go?docid=60755&amp;oid=1-K07LCA&amp;action=5">direct link</a>, but in case that doesn&#8217;t prove stable, here also is <a href="http://database-diary.com/2012/02/02/get-a-free-copy-of-the-forrester-wave-for-enterprise-hadoop-solutions/">a registration-required link from IBM&#8217;s Conor O&#8217;Mahony</a>.) My comments include:</p>
<ul>
<li>The Forrester Wave&#8217;s <strong>relative vendor rankings are meaningless,</strong> in that the document compares apples, peaches, almonds, and peanuts. Apparently, it covers any vendor that incorporates a distribution of Apache Hadoop MapReduce into something it offers, and that offered at least two (not necessarily full production) references for same.</li>
<li>The Forrester Wave for &#8220;enterprise Hadoop&#8221; contradicts itself on the subject of Hortonworks.
<ul>
<li>The Forrester Wave for &#8220;enterprise Hadoop&#8221; is correct when it says <strong>&#8220;Hortonworks &#8230; has Hadoop training and professional services offerings that are still embryonic.&#8221;</strong></li>
</ul>
<ul>
<li>Peculiarly, the Forrester Wave for &#8220;enterprise Hadoop&#8221; also says &#8220;Hortonworks offers an impressive Hadoop professional services portfolio&#8221;. Hortonworks will likely win one or more nice partnership deals with vendors in adjacent fields, but even so its professional services capabilities are &#8230; well, a good word might be &#8220;embryonic&#8221;.</li>
</ul>
</li>
<li><a href="http://www.dbms2.com/2011/02/11/comments-on-the-2011-forrester-wave-for-enterprise-data-warehouse-platforms/">Forrester Waves always seem to have weird implicit definitions of &#8220;data warehousing&#8221;</a>. This one is no exception.</li>
<li>Forrester gave top marks in &#8220;Functionality&#8221; to 11 of 13 &#8220;enterprise Hadoop&#8221; vendors. This seems odd.</li>
<li>I don&#8217;t know why MapR, which doesn&#8217;t like HDFS (Hadoop Distributed File System), got top marks in &#8220;Subproject integration&#8221;.</li>
<li>Forrester gave top marks in &#8220;Storage&#8221; to Datameer. It also gave higher marks to MapR than to EMC Greenplum, even though EMC Greenplum&#8217;s technology is a superset of MapR&#8217;s. Very strange. <em>(Edit: Actually, as per a comment below, there is some uncertainty about the EMC/MapR relationship.)</em></li>
<li>Forrester gave higher marks in &#8220;Acceleration and optimization&#8221; to Hortonworks than to Cloudera and IBM, and higher marks yet to Pentaho. Very odd.</li>
<li>I&#8217;m not sure what Forrester is calling a &#8220;Distributed EDW file store connector&#8221;, but it sounds like something that Cloudera has provided via partnership to a number of analytic DBMS vendors.</li>
<li>Forrester&#8217;s &#8220;Strategy&#8221; rankings seem to correlate to a metric of &#8220;We&#8217;re a large enough vendor to go in N directions at once&#8221;, for various values of N.</li>
<li>Forrester is correct to rank Cloudera&#8217;s &#8220;Adoption&#8221; as being stronger than EMC/Greenplum&#8217;s or MapR&#8217;s. But Hortonworks&#8217; strong mark for &#8220;Adoption&#8221; baffles me.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Notes on the Oracle Big Data Appliance</title>
		<link>http://www.dbms2.com/2012/01/10/notes-on-the-oracle-big-data-appliance/</link>
		<comments>http://www.dbms2.com/2012/01/10/notes-on-the-oracle-big-data-appliance/#comments</comments>
		<pubDate>Wed, 11 Jan 2012 01:32:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Pricing]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5809</guid>
		<description><![CDATA[Oracle announced its Big Data Appliance. Specs may be found in the Oracle Big Data Appliance press release. Beyond that: The most important software on the Oracle Big Data Appliance is a full set of Cloudera Enterprise code. Oracle will do Tier 1 Cloudera/Hadoop support, while Cloudera handles Tiers 2 and 3. The key spec [...]]]></description>
			<content:encoded><![CDATA[<p>Oracle announced its Big Data Appliance. Specs may be found in <a href="http://www.oracle.com/us/corporate/press/1453721">the Oracle Big Data Appliance press release</a>. Beyond that:</p>
<ul>
<li>The most important software on the Oracle Big Data Appliance is a full set of <a href="../2012/01/10/a-couple-of-links-explaining-cloudera-manager/">Cloudera Enterprise</a> code. Oracle will do Tier 1 Cloudera/Hadoop support, while Cloudera handles Tiers 2 and 3.</li>
<li>The key spec ratios are 1 core/4 GB RAM/3 TB raw disk. That&#8217;s reasonably in line with <a href="http://www.dbms2.com/2011/06/04/hardware-for-hadoop/">Cloudera figures I published in June, 2011</a>.</li>
<li>This is really Oracle&#8217;s <a href="http://www.dbms2.com/2012/01/08/big-data-terminology-and-positioning/">multi-structured big data appliance</a>. Oracle&#8217;s relational big data appliance is Exadata, which has been out for years and has comparable capacity to Oracle&#8217;s new &#8220;Big Data Appliance.&#8221; (<a href="http://www.eweek.com/c/a/IT-Infrastructure/Oracle-Launches-ClouderaPowered-Big-Data-Appliance-172364/">Chris Preimesberger</a> made a similar point.)</li>
<li>The Oracle Big Data Appliance list price is $450,000 for 18 12-core servers, plus $54,000/year maintenance.
<ul>
<li>That&#8217;s around $25,000 per server (and associated storage).</li>
<li>That&#8217;s also around $2,000/core.</li>
<li>That&#8217;s also around $700/TB of spinning disk, before <a href="http://www.dbms2.com/2011/07/06/hadoop-hardware-and-compression/">compression</a>.</li>
<li>None of those per-unit figures sounds ridiculous &#8230;</li>
<li>&#8230; but because of Oracle&#8217;s appliance configuration there&#8217;s indeed a hefty minimum initial purchase.</li>
</ul>
</li>
</ul>
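<p>The per-unit figures can be re-derived from the list price and the spec ratios quoted above. The following is a minimal sketch of that arithmetic, assuming the 1 core/3 TB raw disk ratio holds uniformly across all 18 servers; the variable names are illustrative only and maintenance fees are left out.</p>

```python
# Figures quoted in the post: list price, server count, cores per server.
LIST_PRICE_USD = 450_000
SERVERS = 18
CORES_PER_SERVER = 12
# Spec ratio quoted in the post: 3 TB of raw disk per core (assumed uniform).
RAW_TB_PER_CORE = 3

# Totals for the whole appliance.
total_cores = SERVERS * CORES_PER_SERVER
total_raw_tb = total_cores * RAW_TB_PER_CORE

# Per-unit prices, before compression and excluding annual maintenance.
price_per_server = LIST_PRICE_USD / SERVERS
price_per_core = LIST_PRICE_USD / total_cores
price_per_raw_tb = LIST_PRICE_USD / total_raw_tb

print(f"cores: {total_cores}, raw disk: {total_raw_tb} TB")
print(f"per server: ${price_per_server:,.0f}")
print(f"per core:   ${price_per_core:,.0f}")
print(f"per raw TB: ${price_per_raw_tb:,.0f}")
```

<p>Under those assumptions the appliance works out to 216 cores and 648 TB of raw disk, i.e. somewhat under a petabyte before compression.</p>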
<p><a href="http://www.zdnet.com/blog/btl/oracle-rolls-out-big-data-play-with-aggressive-price-cloudera/66529"><span id="more-5809"></span>Peter Goldmacher</a> argues that, because of size and price point, the Oracle Big Data appliance is targeted for high-end deployments rather than starter/test/development set-ups. To first approximation, that makes sense, in that:</p>
<ul>
<li>The Oracle Big Data Appliance is in the petabyte range for data capacity, and &#8230;</li>
<li>&#8230; <a href="http://www.dbms2.com/2011/07/06/petabyte-hadoop-clusters/">the number of petabyte-scale Hadoop deployments is in the low tens</a>, and &#8230;</li>
<li>&#8230; many of those aren&#8217;t at Oracle shops anyway.</li>
</ul>
<p>Surely the Oracle Big Data Appliance isn&#8217;t designed for the 4-8 node play-with-Hadoop crowd.</p>
<p>On the other hand, if you&#8217;re at a big, committed Oracle shop, and you want to do your first serious Hadoop deployment, why not go with the Oracle Big Data Appliance? You probably could save money with an alternative approach &#8212; but if your employers are committed to Oracle, saving money is surely not their greatest concern. Overpay by a bit; make your management happy with the Oracle logo; get Hadoop on your resume; prosper. That seems like a winning plan all the way around.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/10/notes-on-the-oracle-big-data-appliance/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>A couple of links explaining Cloudera Manager</title>
		<link>http://www.dbms2.com/2012/01/10/a-couple-of-links-explaining-cloudera-manager/</link>
		<comments>http://www.dbms2.com/2012/01/10/a-couple-of-links-explaining-cloudera-manager/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 22:23:22 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5798</guid>
		<description><![CDATA[Predictably, I wasn&#8217;t pre-briefed on the details of Oracle&#8217;s Big Data Appliance announcement today, and an inquiry to partner Cloudera doesn&#8217;t happen to have been immediately answered.* But anyhow, it&#8217;s clear from coverage by Larry Dignan and Derrick Harris that Oracle&#8217;s Big Data Appliance includes: Some version of Cloudera Manager (I&#8217;m guessing more or less [...]]]></description>
			<content:encoded><![CDATA[<p>Predictably, I wasn&#8217;t pre-briefed on the details of Oracle&#8217;s Big Data Appliance announcement today, and an inquiry to partner Cloudera doesn&#8217;t happen to have been immediately answered.* But anyhow, it&#8217;s clear from coverage by <a href="http://www.zdnet.com/blog/btl/oracle-rolls-out-big-data-play-with-aggressive-price-cloudera/66529">Larry Dignan</a> and <a href="http://gigaom.com/cloud/cloudera-brings-the-hadoop-to-oracles-big-data-appliance/">Derrick Harris</a> that Oracle&#8217;s Big Data Appliance includes:</p>
<ul>
<li>Some version of Cloudera Manager (I&#8217;m guessing more or less the best one).*</li>
<li>Some version of Apache Hadoop (I&#8217;m guessing the same distribution that Cloudera prefers to use).*</li>
<li>Some kind of support.</li>
</ul>
<p>In other words, it&#8217;s a lot like getting Cloudera Enterprise,* plus some hardware, plus some other stuff.</p>
<p><em>*Edit: About 2 minutes after I posted this, I got email from Cloudera CEO Mike Olson. Yes, the Oracle Big Data Appliance bundles Cloudera Enterprise.</em></p>
<p>That raises an anyway recurring question: <strong>What exactly is Cloudera Manager?</strong> <span id="more-5798"></span>When asked, I&#8217;ve always tended to mumble something like: <strong>Um, it&#8217;s management stuff. </strong>There&#8217;s an overview on <a href="http://www.cloudera.com/products-services/tools/">the Cloudera Manager product page</a>, but it doesn&#8217;t really say much, even if you click on the Data Sheet link. More helpful, I think, is <a href="http://www.cloudera.com/blog/2011/12/cloudera-manager-3-7-released/">a December post on Cloudera&#8217;s busy blog</a>. Technically, the post is about the new features in the Cloudera Manager 3.7 point release, but more generally it helps to explain what Cloudera Manager does, in areas such as (and these bullet points are all direct quotes):</p>
<ul>
<li> Automated Hadoop Deployment</li>
<li> Centralized Management</li>
<li> Configuration Management</li>
<li> Service Monitoring</li>
<li> Log Search</li>
<li> Events and Alerts</li>
<li> Configuration versioning and Audit trails</li>
<li> Activity Monitoring</li>
<li> Operational Reports</li>
</ul>
<p>Taken together,<strong> those two Cloudera links do a pretty good job of explaining Cloudera Manager, and illustrating why a Hadoop user would want to have either Cloudera Manager or a similar competitive offering.</strong></p>
<p><em>Edit: The day after I originally made this post, Cloudera put up another post <a href="http://www.cloudera.com/blog/2012/01/cloudera-manager-thank-you-customers/">directly explaining what Cloudera Manager is about</a>.<br />
</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/10/a-couple-of-links-explaining-cloudera-manager/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NoSQL notes</title>
		<link>http://www.dbms2.com/2011/10/23/nosql-notes/</link>
		<comments>http://www.dbms2.com/2011/10/23/nosql-notes/#comments</comments>
		<pubDate>Mon, 24 Oct 2011 04:20:27 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Basho and Riak]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[MongoDB and 10gen]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parallelization]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5522</guid>
		<description><![CDATA[Last week I visited with James Phillips of Couchbase, Max Schireson and Eliot Horowitz of 10gen, and Todd Lipcon, Eric Sammer, and Omer Trajman of Cloudera. I guess it&#8217;s time for a round-up NoSQL post. Views of the NoSQL market horse race are reasonably consistent, with perhaps some elements of “Where you stand depends upon [...]]]></description>
			<content:encoded><![CDATA[<p>Last week I visited with James Phillips of Couchbase, Max Schireson and Eliot Horowitz of 10gen, and Todd Lipcon, Eric Sammer, and Omer Trajman of Cloudera. I guess it&#8217;s time for a round-up NoSQL post. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Views of the NoSQL market horse race are reasonably consistent, with perhaps some elements of “Where you stand depends upon where you sit.”</p>
<ul>
<li>As James tells it, NoSQL is simply a three-horse race between Couchbase, MongoDB, and Cassandra.</li>
<li>Max would include HBase on the list.</li>
<li>Further, Max pointed out that metrics such as job listings suggest MongoDB has the most development activity, and Couchbase/Membase/CouchDB perhaps have less.</li>
<li>The Cloudera guys remarked on some serious HBase adopters.*</li>
<li>Everybody I spoke with agreed that Riak had little current market presence, although some Basho guys could surely be found who&#8217;d disagree.</li>
</ul>
<p><span id="more-5522"></span><em>*I hope to do a separate post on HBase adoption soon. In connection with that, any info on HBase adoption by Facebook (said to be very heavy), Twitter, et al. would be much appreciated.</em></p>
<p>The reasons for using NoSQL of course are, in some order, <a href="../../../../../2011/07/31/dynamic-fixed-schema-databases/">dynamic schemas</a>, scale-out, and open source. <a href="http://www.dbms2.com/2011/10/23/transparent-relational-oltp-scale-out/">I find the scale-out argument somewhat bogus</a>,* but the data model one is very real. Depending on whom you talk with, the most important point about dynamic schemas may actually be that they’re changeable, or it may just be that you don’t have to specify a schema at the time of initial application design. MongoDB gets particular praise as a good platform on which to throw something together quickly, although predictions as to how far the application will then scale may differ depending on whether you’re talking with, say, Max or Todd.</p>
<p><em>*It’s fair to say that NoSQL systems are more proven in scale-out than most relational DBMS. Even so, I would cringe at any line of reasoning that concluded one should adopt NoSQL because it is more mature than relational alternatives.</em></p>
<p>Finally, I was perhaps too extreme when <a href="../../../../../2011/10/20/more-notes-on-oracle-nosql/">I suggested there was no good reason for Oracle to have adopted the major key/minor key approach it took in its NoSQL offering</a>. Todd offered a reason why that approach – which he characterized as similar to Project Voldemort’s – could make sense:</p>
<ul>
<li>If you have some kind of global secondary index, it’s hard to maintain that index consistently without what amounts to distributed transactions.</li>
<li>If you want to avoid the overhead of those, one alternative is a column-group system such as HBase or Cassandra. Those have no indexes at all, except in the sense that a column is its own index.</li>
<li>Another alternative is to load as much indexing information as you can into the key of a key-value store.</li>
</ul>
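<p>To make the last alternative concrete, here is a minimal sketch of a major key/minor key layout. It is a toy in-memory store written for illustration, not the API of Oracle NoSQL, Project Voldemort, or any other product; the class <code>TinyOrderedKV</code>, its methods, and the sample keys are all hypothetical. The idea is that everything under one major key sorts together, so a lookup by attributes packed into the minor key is a local range scan rather than a trip through a global secondary index.</p>

```python
import bisect


class TinyOrderedKV:
    """Toy ordered key-value store keyed by (major, minor). Illustration only."""

    def __init__(self):
        self._keys = []   # sorted list of (major, minor) tuples
        self._data = {}   # (major, minor) -> value

    def put(self, major, minor, value):
        """Insert or overwrite one record; minor is a tuple of strings."""
        key = (major, minor)
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def scan(self, major, minor_prefix=()):
        """Return (minor, value) pairs under one major key whose minor key
        starts with minor_prefix, in sorted minor-key order."""
        start = bisect.bisect_left(self._keys, (major, minor_prefix))
        results = []
        for key in self._keys[start:]:
            same_major = key[0] == major
            same_prefix = key[1][:len(minor_prefix)] == minor_prefix
            if not (same_major and same_prefix):
                break  # keys are sorted, so matches are contiguous
            results.append((key[1], self._data[key]))
        return results


store = TinyOrderedKV()
# The minor key carries the "index" information: record type, then date.
store.put("user:42", ("order", "2011-10-20"), {"total": 30})
store.put("user:42", ("order", "2011-10-23"), {"total": 12})
store.put("user:42", ("profile",), {"name": "Ann"})
store.put("user:7", ("order", "2011-09-12"), {"total": 5})

for minor, value in store.scan("user:42", ("order",)):
    print(minor, value)
```

<p>In a distributed store built this way, one would presumably partition on the major key alone, so that a scan like the one above touches a single node. The trade-off is that queries which do not start from a major key get no help from this scheme.</p>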
<p>I’d be interested to learn about the Couchbase and MongoDB answers to that challenge.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/10/23/nosql-notes/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Cloudera versus Hortonworks</title>
		<link>http://www.dbms2.com/2011/10/04/cloudera-versus-hortonworks/</link>
		<comments>http://www.dbms2.com/2011/10/04/cloudera-versus-hortonworks/#comments</comments>
		<pubDate>Tue, 04 Oct 2011 15:50:49 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5409</guid>
		<description><![CDATA[A few weeks ago I wrote: The other big part of Hortonworks’ story is the claim that it holds the axe in Apache Hadoop development. and &#8230; just how dominant Hortonworks really is in core Hadoop development is a bit unclear. Meanwhile, Cloudera people seem to be leading a number of Hadoop companion or sub-projects, [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago <a href="http://www.dbms2.com/2011/09/12/hadoop-notes/">I wrote</a>:</p>
<blockquote><p>The other big part of <a href="../2011/07/10/cloudera-and-hortonworks/">Hortonworks’ story</a> is the claim that it holds the axe in Apache Hadoop development.</p></blockquote>
<p>and</p>
<blockquote><p>&#8230; just how dominant Hortonworks really is in core Hadoop  development is a bit unclear. Meanwhile, Cloudera people seem to be  leading a number of Hadoop companion or sub-projects, including the  first two I can think of that relate to Hadoop integration or  connectivity, namely Sqoop and Flume. So I’m not persuaded that the “we  know this stuff better” part of the Hortonworks partnering story really  holds up.</p></blockquote>
<p>Now Mike Olson &#8212; CEO of my client Cloudera &#8212; has posted <a href="http://www.cloudera.com/blog/2011/10/the-community-effect/">his analysis of the matter</a>, in response to <a href="http://www.hortonworks.com/the-yahoo-effect/">an earlier Hortonworks post</a> asserting its claims. In essence, Mike argues:</p>
<ul>
<li>It&#8217;s ridiculous to say any one company, e.g. Hortonworks, has a controlling position in Hadoop development.</li>
<li>Such diversity is a Very Good Thing.</li>
<li>Cloudera folks now contribute and always have contributed to Hadoop at a higher rate than Hortonworks folks.</li>
<li>If you consider just core Hadoop projects &#8212; the most favorable way of counting from a Hortonworks standpoint &#8212; Hortonworks has a lead, but not all that big of one.</li>
</ul>
<p><span id="more-5409"></span>I think Hortonworks likes to make the argument &#8220;But our contributions, on average, are more important than Cloudera&#8217;s contributions.&#8221; That claim perhaps aside, Cloudera&#8217;s argument looks persuasive.</p>
<p>Anyhow, the main bases for deciding whose enterprise support for Hadoop to buy &#8212; Cloudera&#8217;s or Hortonworks&#8217; &#8212; are probably:</p>
<ul>
<li><strong>Who is even offering it? <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </strong> Hortonworks, last I checked, wasn&#8217;t yet &#8212; Yahoo perhaps excepted &#8212; although it&#8217;s a near-term roadmap item for them to start doing so.</li>
<li><strong>Whose is better?</strong> Even when Hortonworks does offer enterprise support, it will lack experience at the support process. (To some extent, that could be worked around by providing money-losingly inefficient support at first.)</li>
<li><strong>Who bundles more useful proprietary software with their support? </strong>Unless you think the code in Cloudera Enterprise is 100% worthless, Cloudera wins that one.</li>
<li><strong>Price.</strong> I have no idea how that one will shake out.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/10/04/cloudera-versus-hortonworks/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Hadoop notes</title>
		<link>http://www.dbms2.com/2011/09/12/hadoop-notes/</link>
		<comments>http://www.dbms2.com/2011/09/12/hadoop-notes/#comments</comments>
		<pubDate>Mon, 12 Sep 2011 09:03:52 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Health care]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MapR]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5218</guid>
		<description><![CDATA[I visited California recently, and chatted with numerous companies involved in Hadoop &#8212; Cloudera, Hortonworks, MapR, DataStax, Datameer, and more. I&#8217;ll defer further Hadoop technical discussions for now &#8212; my target to restart them is later this month &#8212; but that still leaves some other issues to discuss, namely adoption and partnering. The total number [...]]]></description>
			<content:encoded><![CDATA[<p>I visited California recently, and chatted with numerous companies involved in Hadoop &#8212; Cloudera, Hortonworks, MapR, DataStax, Datameer, and more. I&#8217;ll defer further <a href="../../../../../2011/08/21/hadoop-evolution/">Hadoop technical discussions</a> for now &#8212; my target to restart them is later this month &#8212; but that still leaves some other issues to discuss, namely adoption and partnering.</p>
<p>The total number of enterprises in the world paying subscription and license fees that they would regard as being for &#8220;Hadoop or something Hadoop-related&#8221; probably is not much over 100 right now, but I&#8217;d expect to see pretty rapid growth. Beyond that, let&#8217;s divide customers into three groups:</p>
<ul>
<li>Internet businesses.</li>
<li>Traditional enterprises&#8217; internet operations.</li>
<li>Traditional enterprises&#8217; other operations.</li>
</ul>
<p>Hadoop vendors, in different mixes, claim to be doing well in all three segments. Even so, almost all use cases involve some kind of <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>, with one exception being a credit card vendor crunching a large database of transaction details. Multiple kinds of machine-generated data come into play &#8212; web/network/mobile device logs, financial trade data, scientific/experimental data, and more. In particular, pharmaceutical research got some mentions, which makes sense, in that it&#8217;s one area of scientific research that actually enjoys fat for-profit research budgets.</p>
<p><span id="more-5218"></span>On the partnering side, I heard things about a Hortonworks conference call that do not seem to have been contradicted by my visit to Hortonworks. Namely, Hortonworks promised prospective partners, such as analytic DBMS vendors, hardware vendors, or large system integrators, that it wouldn&#8217;t compete with them, in that Hortonworks pledges not to introduce its own products for at least two years. This is presumably targeted most directly at <a href="../../../../../2010/10/10/partnering-with-cloudera/">Cloudera</a>, which has lots of partners, but also some <a href="../../../../../2010/06/30/cloudera-enterprise-hadoop-evolution/">proprietary code</a> of its own. MapR, I&#8217;d think, would be the #2 target, but that&#8217;s just speculation.</p>
<p>The other big part of <a href="../../../../../2011/07/10/cloudera-and-hortonworks/">Hortonworks&#8217; story</a> is the claim that it holds the axe in Apache Hadoop development. Nobody doubts that a large fraction of the work on Hadoop&#8217;s core projects was done by Yahoo employees. Many of those indeed moved to Hortonworks; others left Yahoo earlier; Hadoop creator Doug Cutting is actually at Cloudera. So just how dominant Hortonworks really is in core Hadoop development is a bit unclear. Meanwhile, Cloudera people seem to be leading a number of Hadoop companion or sub-projects, including the first two I can think of that relate to Hadoop integration or connectivity, namely Sqoop and Flume. So I&#8217;m not persuaded that the &#8220;we know this stuff better&#8221; part of the Hortonworks partnering story really holds up.</p>
<p>What I am persuaded of is that the Hadoop platform competition is a good thing. Whichever vendors and projects win will be healthier from having had to outcompete worthy alternatives.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/12/hadoop-notes/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>HBase is not broken</title>
		<link>http://www.dbms2.com/2011/07/18/hbase-is-not-broken/</link>
		<comments>http://www.dbms2.com/2011/07/18/hbase-is-not-broken/#comments</comments>
		<pubDate>Mon, 18 Jul 2011 05:25:27 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4990</guid>
		<description><![CDATA[It turns out that my impression that HBase is broken was unfounded, in at least two ways. The smaller is that something wrong with the HBase/Hadoop interface or Hadoop&#8217;s HBase support cannot necessarily be said to be wrong with HBase (especially since HBase is no longer a Hadoop subproject). The bigger reason is that, according [...]]]></description>
			<content:encoded><![CDATA[<p>It turns out that my impression that <a href="http://www.dbms2.com/2011/07/10/hadoop-futures-and-enhancements/">HBase is broken</a> was unfounded, in at least two ways. The smaller is that something wrong with the HBase/Hadoop interface or Hadoop&#8217;s HBase support cannot necessarily be said to be wrong with HBase (especially since HBase is no longer a Hadoop subproject). The bigger reason is that, according to consensus, <strong>HBase has worked pretty well since the 0.90 release</strong> in January of this year.</p>
<p>After Michael Stack of StumbleUpon beat me up for a while,* Omer Trajman of Cloudera was kind enough to walk me through HBase usage. He is informed largely by 18 Cloudera customers using HBase, plus a handful of other well-known HBase users such as Facebook, StumbleUpon, and Yahoo. Of the 18 Cloudera customers Omer was thinking of, 15 are in HBase production, one is in HBase &#8220;early production&#8221;, one is still doing R&amp;D in the area of HBase, and one is a classified government customer not providing such details.<span id="more-4990"></span></p>
<p><em>*Just kidding &#8212; he was actually extremely gentle.</em></p>
<p>In the use cases that Omer offered, what&#8217;s stored in HBase is almost always <strong>records of web or network activity. </strong>Specific examples included clickstream information (at 5 different ad companies), crash reports (at Mozilla), and messages (at Facebook). Sometimes the data gets into Hadoop twice &#8212; once excerpted via HBase and once as part of a full log &#8212; and may even live in two different Hadoop clusters.</p>
<p>What&#8217;s served out from HBase in Omer&#8217;s examples is usually <a href="../../../../../2011/06/19/investigative-analytics-derived-data/">derived data</a>, such as a user profile, an ad selection, a text index, etc. That makes sense, not least because if you&#8217;re going to keep enhancing your data, schema-free programming &#8212; which HBase offers &#8212; looks ever more appealing. Omer further said that there are a growing number of cases in which HBase is being used to serve up reference data for batch MapReduce jobs, but he didn&#8217;t have specifics. A counterexample to the derived data emphasis would be, if I understood correctly, a case where HBase manages shopping carts.</p>
<p>I haven&#8217;t put much effort into unearthing open source or other third-party HBase-based projects, but two examples are OpenTSDB (Time Series DataBase) and Lily CMS (Content Management System). <em>(Edit: But see the comment about Lily below.)</em></p>
<p>Omer is perhaps my top go-to guy on <a href="../../../../../2011/07/06/petabyte-hadoop-clusters/">database and cluster sizes</a>, so of course I asked him for HBase metrics as well. He responded (approximately) that Cloudera HBase customer installations average 20-30 nodes, but that half a dozen are in the 100-200 node range.</p>
<p>Finally, there&#8217;s the matter of latency. As a general rule, the HBase users Omer sees are using HBase with at least several minutes&#8217; latency. (Again, that shopping cart case would seem to be a counterexample.) So, for example, the data recorded when you click on a page isn&#8217;t immediately applied toward tweaking your profile to determine which ad you&#8217;ll see next &#8212; but it might come into play after you spend a few minutes reading the page you&#8217;re on. Naturally, Omer knows of efforts to use HBase with lower latency yet, and I won&#8217;t be surprised if already-working examples of low-latency HBase show up in the comment thread to this post.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/18/hbase-is-not-broken/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Hadoop futures and enhancements</title>
		<link>http://www.dbms2.com/2011/07/10/hadoop-futures-and-enhancements/</link>
		<comments>http://www.dbms2.com/2011/07/10/hadoop-futures-and-enhancements/#comments</comments>
		<pubDate>Mon, 11 Jul 2011 03:14:24 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapR]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Zettaset]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4944</guid>
		<description><![CDATA[Hadoop is immature technology. As such, it naturally offers much room for improvement in both industrial-strengthness and performance. And since Hadoop is booming, multiple efforts are underway to fill those gaps. For example: Cloudera&#8217;s proprietary code is focused on management, set-up, etc. The &#8220;Phase 1&#8221; plans Hortonworks shared with me for Apache Hadoop are focused [...]]]></description>
			<content:encoded><![CDATA[<p>Hadoop is immature technology. As such, it naturally offers much room for improvement in both <strong>industrial-strengthness</strong> and <strong>performance.</strong> And since Hadoop is booming, multiple efforts are underway to fill those gaps. For example:</p>
<ul>
<li>Cloudera&#8217;s proprietary code is focused on management, set-up, etc.</li>
<li>The &#8220;Phase 1&#8221; plans Hortonworks shared with me for Apache Hadoop are focused on industrial-strengthness, as are significant parts of &#8220;Phase 2&#8221;.*</li>
<li>MapR tells a performance story versus generic Apache Hadoop HDFS and MapReduce. (One aspect of same is just C++ vs. Java.)</li>
<li>So does <a href="../../../../../2011/07/06/hadapt-update/">Hadapt</a>, but mainly vs. Hive.</li>
<li>Cloudera also tells me there&#8217;s a potential 4-5X performance improvement in Hive coming down the pike from what amounts to an optimizer rewrite.</li>
</ul>
<p>(Zettaset belongs in the discussion too, but made an unfortunate choice of embargo date.)</p>
<p><span id="more-4944"></span><em>*Hortonworks, <a href="http://www.dbms2.com/2011/07/10/cloudera-and-hortonworks/">a new Hadoop company spun out of Yahoo</a>,</em><em> graciously permitted me to post a <a href="http://www.monash.com/uploads/Hortonworks-Apache-Hadoop-July-2011.pptx">slide deck</a> outlining an Apache Hadoop roadmap. Phase 1 refers to stuff that is underway more or less now. Phase 2 is scheduled for alpha in October, 2011, with production availability not too late in 2012.</em></p>
<p>You&#8217;ve probably heard some <strong>single point of failure</strong> fuss. Hadoop NameNodes can crash, which wouldn&#8217;t cause data loss, but would shut down the cluster for a little while. It&#8217;s hard to come up with real-life stories in which this has been a problem; still, it&#8217;s something that should be fixed, and everybody (including the Apache Hadoop folks, as part of Phase 2) has a favored solution. A more serious problem is that Hadoop is currently bad for <strong>small updates,</strong> because:</p>
<ul>
<li>Hadoop&#8217;s fundamental paradigm assumes batch processing.</li>
<li>Both major workarounds to allow small updates are broken:
<ul>
<li>HBase is seriously buggy, to the point that it sometimes loses data.</li>
<li>Storing each update in a separate file runs afoul of a practical limit of 70-100 million files.</li>
</ul>
</li>
</ul>
<p><strong>File-count limits</strong> also get blamed for a second problem, in that there may not be enough intermediate files allowed for your Reduce steps, necessitating awkward and perhaps poorly-performing MapReduce workarounds. Anyhow, the Phase 2 Apache Hadoop roadmap features a serious <strong>HBase rewrite.</strong> I&#8217;m less clear as to where things stand with respect to file-count limits.</p>
<p><em>Edits: As per the comments below, I should perhaps have referred to HBase&#8217;s HDFS underpinnings rather than HBase itself. Anyhow, some details are in the slides. Please also see my follow-up post on <a href="http://www.dbms2.com/2011/07/18/hbase-is-not-broken/">how well HBase is indeed doing</a>.<br />
</em></p>
<p>The other big area for Hadoop improvement is <strong>modularity, pluggability, and coexistence</strong>, on both the <strong>storage</strong> and <strong>application execution</strong> tiers. For example:</p>
<ul>
<li>Greenplum/MapR and Hadapt both think you should have HDFS file management and relational DBMS coexisting on the same storage nodes. (I agree.)</li>
<li>Part of what Hortonworks calls &#8220;Phase 2&#8221; sets out to ensure that Hadoop can properly manage <a href="../2010/08/16/vertica-flash-temp-space/">temp space</a> and so on next to HDFS.</li>
<li>Perhaps HBase won&#8217;t always assume HDFS.</li>
<li>DataStax thinks you should <a href="http://www.dbms2.com/2011/03/23/datastax-cassandrafs-hadoop-brisk/">blend HDFS and Cassandra</a>.</li>
</ul>
<p>Meanwhile, Pig and Hive need to come closer together. Often you want to stream data into Hadoop. The argument that <a href="http://www.dbms2.com/2011/04/21/sas-hpa-does-make-sense-after-all/">MPI trumps MapReduce</a> does, in certain use cases, make sense. Apache Hadoop &#8220;Phase 2&#8221; and beyond are charted to accommodate some of those possibilities too.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/10/hadoop-futures-and-enhancements/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Cloudera and Hortonworks</title>
		<link>http://www.dbms2.com/2011/07/10/cloudera-and-hortonworks/</link>
		<comments>http://www.dbms2.com/2011/07/10/cloudera-and-hortonworks/#comments</comments>
		<pubDate>Mon, 11 Jul 2011 03:13:36 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4939</guid>
		<description><![CDATA[My clients at Cloudera have been around for a while, in effect positioned as &#8220;the Hadoop company.&#8221; Their business, in a nutshell, consists of: Packaging up a Cloudera distribution of Apache Hadoop. This distribution doesn&#8217;t have proprietary code; it&#8217;s just packaged by Cloudera from Apache projects (with a decent minority of the code happening to [...]]]></description>
			<content:encoded><![CDATA[<p>My clients at Cloudera have been around for a while, in effect positioned as &#8220;the Hadoop company.&#8221; Their business, in a nutshell, consists of:</p>
<ul>
<li>Packaging up <strong>a Cloudera distribution of Apache Hadoop.</strong> This distribution doesn&#8217;t have proprietary code; it&#8217;s just packaged by Cloudera from Apache projects (with a decent minority of the code happening to have been contributed by Cloudera engineers).</li>
<li>Paid subscription <strong>support for Apache Hadoop</strong> and, in connection with that &#8230;</li>
<li>&#8230;  <strong>proprietary software</strong> that all support customers automatically get. There are two points to this proprietary software:
<ul>
<li>It adds value for the customer.</li>
<li>It makes Cloudera&#8217;s support job easier.</li>
</ul>
</li>
<li><strong>Professional services</strong> around Hadoop.</li>
<li><strong>Training and conferences</strong> around Hadoop, which probably don&#8217;t generate all that much money, but are great marketing in terms of visibility, thought leadership, and lead generation.</li>
</ul>
<p><strong>Hortonworks</strong> spun out of Yahoo last week, with parts of the Cloudera business model, namely Hadoop support, training, and I guess conferences. Hortonworks emphatically rules out professional services, and says that it will contribute all code back to Apache Hadoop. Hortonworks does grudgingly admit that it might get into the proprietary software business at some point &#8212; but evidently hopes that day will never actually come.</p>
<p><span id="more-4939"></span>Hortonworks&#8217; two main initial marketing messages &#8212; and there&#8217;s some synergy between these &#8212; boil down to:</p>
<ul>
<li>Open source purism</li>
<li>&#8220;We have most of the Hadoop developers, so we&#8217;re better&#8221;*</li>
</ul>
<p>Frankly, the open source purism part sounds like doubletalk to me, in that Hortonworks has trouble articulating what supposedly-less-pure Cloudera does wrong that Hortonworks will do better. However, I&#8217;ve been hearing for a long time that Yahoo&#8217;s MapReduce developers feel very strongly about open source, so perhaps this is in part an emotional issue for them. More substantively, it fits well with the pro-Hortonworks story I&#8217;ve outlined below.</p>
<p><em>*&#8221;We have most of the Hadoop developers&#8221; seems fairly defensible, give or take dueling definitions of &#8220;committer,&#8221; &#8220;core developer,&#8221; &#8220;patch&#8221; or for that matter &#8220;Hadoop.&#8221;</em></p>
<p>The other branch of the Hortonworks marketing message can be lampooned as &#8220;We&#8217;re the right folks to identify your bugs, since we&#8217;re probably the ones who put them there in the first place.&#8221; More darkly, that pitch could be &#8220;If you want the bugs fixed that bother you, we&#8217;re the ones who have control over whether or not that happens.&#8221; Well, maybe. But I also see Cloudera having a couple of years&#8217; experience supporting Hadoop, as well as shipping some code that perhaps makes Hadoop more supportable.</p>
<p>That&#8217;s the skeptical view. <strong>A more favorable view of Hortonworks&#8217; prospects </strong>would go something like this:</p>
<ul>
<li>One version of Apache Hadoop is plenty.</li>
<li>Cloudera (and arguably other Hadoop platform software vendors) sell capabilities that will soon be eclipsed by core Apache Hadoop. Folks should just please wait.</li>
<li>Now that Hortonworks is an independent company focused on the task, it will speedily solve the packaging problems that have made Cloudera&#8217;s Hadoop distribution (perceived to be) necessary.</li>
<li>Yahoo and IBM both back Hortonworks&#8217; approach. That&#8217;s got to count for something.</li>
<li>Apache Hadoop will be quickly enhanced, and Hortonworks will be driving the enhancements. Hortonworks simply is the top Hadoop authority.</li>
</ul>
<p>We&#8217;ll see. Cloudera&#8217;s been around for a couple years, has smart people, and by definition has no technical inferiority to Hortonworks (since it has access to all Hortonworks&#8217; code). What&#8217;s more, it will be a long time before Hadoop technology is so mature that there&#8217;s nothing left to do; add-on software should long prove to be useful. As for &#8220;We&#8217;re purer about open source than the other guys&#8221; &#8212; well, I&#8217;m dubious that that will turn out to be a great marketing message.</p>
<p>And so I think Cloudera is the early favorite in the competition. But perhaps Hadoop users will be able to play Cloudera and Hortonworks off against each other in price negotiations. Perhaps, notwithstanding <a href="../../../../../2011/06/02/why-you-would-want-an-appliance-and-when-you-wouldnt/">my skepticism about Hadoop appliances</a>, some hardware vendors will play them against each other for appliance partnerships.</p>
<p>Meanwhile, whatever else happens, I&#8217;m pretty psyched about <a href="http://www.dbms2.com/2011/07/10/hadoop-futures-and-enhancements/">some enhancements the Hortonworks folks plan to lead for Hadoop</a>.</p>
<p><strong><em>Related links</em></strong></p>
<ul>
<li>A <a href="http://www.monash.com/uploads/Hortonworks-Apache-Hadoop-July-2011.pptx">Hortonworks/Apache Hadoop slide deck</a> Hortonworks graciously allowed me to post</li>
<li>Cloudera&#8217;s post about its recent <a href="http://www.cloudera.com/blog/2011/07/the-only-full-lifecycle-management-for-apache-hadoop-introducing-cloudera-enterprise-3-5-and-scm-express/">3.5 release of Cloudera Enterprise</a></li>
<li>Pros and cons of <a href="http://www.softwarememories.com/2011/07/10/when-professional-services-and-software-mix/">professional services efforts at young software companies</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/10/cloudera-and-hortonworks/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>

