<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Aster Data</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/aster-data-warehouse/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:17:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Hope for a new PostgreSQL era?</title>
		<link>http://www.dbms2.com/2011/11/23/hope-for-a-new-postgresql-era/</link>
		<comments>http://www.dbms2.com/2011/11/23/hope-for-a-new-postgresql-era/#comments</comments>
		<pubDate>Wed, 23 Nov 2011 14:18:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[EnterpriseDB and Postgres Plus]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[salesforce.com]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5728</guid>
		<description><![CDATA[In a comedy of briefing errors, I&#8217;m not too clear on the details of my client salesforce.com&#8217;s new PostgreSQL-as-a-service offering, nor exactly on what my clients at VMware are bringing to the PostgreSQL virtualization/cloud party. That said: PostgreSQL is good technology. MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL in some [...]]]></description>
			<content:encoded><![CDATA[<p>In a comedy of briefing errors, I&#8217;m not too clear on the details of my client <a href="http://gigaom.com/cloud/heroku-launches-sql-database-as-a-service/">salesforce.com&#8217;s new PostgreSQL-as-a-service offering</a>, nor exactly on what my clients at VMware are bringing to the PostgreSQL virtualization/cloud party. That said:</p>
<ul>
<li>PostgreSQL is good technology.</li>
<li>MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL in some ways.  (Database extensibility if nothing else.)</li>
<li>PostgreSQL has a lot of users. (Many of them in academia and/or Russia.)</li>
<li>Neither EnterpriseDB (which now calls itself &#8220;The enterprise PostgreSQL company&#8221;) nor the PostgreSQL community leadership have covered themselves with stewardship glory.</li>
<li>A significant number of interesting DBMS products can be regarded as PostgreSQL forks (e.g. Greenplum, Aster Data nCluster, Netezza if you squint, and Vertica if you stand on your head*).</li>
<li>PostgreSQL advancement is not dead. For example, <a href="../../../../../2011/11/08/hadapt-is-moving-forward/">Hadapt beta users are running actual PostgreSQL on many nodes each</a>.</li>
<li><a href="../../../../../2009/12/14/oracle-mysql-storage-engine/">There&#8217;s no assurance that Oracle will be a benevolent MySQL steward forever</a>. (Specifically, Oracle&#8217;s &#8220;Play nicely with others&#8221; antitrust commitments expire in 2014.)</li>
</ul>
<p>So I think it would be cool if one or the other big company put significant wood behind the PostgreSQL arrow.</p>
<p><em>*While Vertica was originally released using little or no PostgreSQL code &#8212; reports varied &#8212; it featured high degrees of PostgreSQL compatibility.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/23/hope-for-a-new-postgresql-era/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Highlights of a busy news week</title>
		<link>http://www.dbms2.com/2011/09/26/highlights-of-a-busy-news-week/</link>
		<comments>http://www.dbms2.com/2011/09/26/highlights-of-a-busy-news-week/#comments</comments>
		<pubDate>Mon, 26 Sep 2011 05:50:35 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5372</guid>
		<description><![CDATA[I put up 14 posts over the past week, so perhaps you haven&#8217;t had a chance yet to read them all. Highlights included: My most important post of the week was a general guide to IT vendor strategy. That one has already spawned discussion at many companies, from the tiny to the multi-billion-dollar. The best [...]]]></description>
			<content:encoded><![CDATA[<p>I put up 14 posts over the past week, so perhaps you haven&#8217;t had a chance yet to read them all. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Highlights included:</p>
<ul>
<li>My most important post of the week was a general <a href="http://www.strategicmessaging.com/strategy-for-it-vendors-a-worksheet/2011/09/18/">guide to IT vendor strategy</a>. That one has already spawned discussion at many companies, from the tiny to the multi-billion-dollar.</li>
<li>The best comment thread of the week was probably on my post about <a href="http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/">scale-out relational OLTP choices</a>, in which people discussed the merits of various particular alternatives.</li>
<li>I recommended that people strongly consider attending <a href="http://www.dbms2.com/2011/09/20/xldb-the-one-conference-i-like-to-go-to/">XLDB 5 in Menlo Park on October 18-19</a>.</li>
</ul>
<p>Most of the posts, however, were reactions to news events. In particular:</p>
<ul>
<li>Teradata announced that <a href="http://www.dbms2.com/2011/09/22/teradata-columnar-compression/">Teradata 14 will be hybrid-columnar</a>, more in Vertica&#8217;s way than in Greenplum&#8217;s or Aster Data&#8217;s. (Pay no attention to the <em>Wall Street Journal&#8217;s</em> apparent belief that <a href="http://www.dbms2.com/2011/09/22/hybrid-columnar-soundbites/">no other analytic DBMS is hybrid-columnar at all</a>.)</li>
<li>Aster announced the unsurprising news that there will be a Teradata Aster appliance. Also, <a href="http://www.dbms2.com/2011/09/22/aster-database-release-5-and-teradata-aster-appliance/">Aster talked about greater analytic flexibility in the forthcoming Aster 5.0</a>.</li>
<li>With Oracle OpenWorld coming up, Oracle decided to get some of its announcing out of the way early. In particular, it announced the <a href="http://www.dbms2.com/2011/09/21/oracle-database-appliance-soundbites/">Oracle Database Appliance</a>, which is small-business-friendly hardware for running the Oracle DBMS. However, the Oracle Database Appliance doesn&#8217;t seem to do much about the complexity of running the Oracle DBMS software.</li>
<li>In <a href="http://www.dbms2.com/2011/09/23/hadoop-appliances/">a catch-all Hadoop post</a>, I noted that:
<ul>
<li>Oracle has now clearly said it has a Hadoop appliance coming, no doubt next week at OpenWorld.</li>
<li>I still can&#8217;t see why Hadoop appliances would succeed, but a lot of smart folks seem to disagree with me.</li>
<li>Greenplum announced what looks like a nice but unimportant little product upgrade.</li>
<li>It&#8217;s a really good thing that previously reported plans to revamp Hadoop are underway.</li>
</ul>
</li>
<li>DataStax announced that <a href="http://www.dbms2.com/2011/09/22/datastax-pivots-back-to-its-original-strategy/">it really is a Cassandra company after all</a>. Pay no attention to previous marketing that seemed to put DataStax in the same Hadoop-alternative category as, say, MapR.</li>
<li><a href="../2011/09/25/ingres-actian/">Ingres has changed its name to Actian</a>. The announcement seems like a confession that Ingres and VectorWise are going nowhere.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/26/highlights-of-a-busy-news-week/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Workload management and RAM</title>
		<link>http://www.dbms2.com/2011/09/25/workload-management-and-ram/</link>
		<comments>http://www.dbms2.com/2011/09/25/workload-management-and-ram/#comments</comments>
		<pubDate>Sun, 25 Sep 2011 05:04:35 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5354</guid>
		<description><![CDATA[Closing out my recent round of Teradata-related posts, here&#8217;s a little anomaly: Teradata is proud that Teradata 14&#8217;s workload management now explicitly manages I/O, to go with Teradata&#8217;s long-standing management of CPU. Teradata&#8217;s WLM still does not explicitly manage RAM. Aster is proud that Aster 5&#8217;s workload management now explicitly manages RAM, to go along [...]]]></description>
			<content:encoded><![CDATA[<p>Closing out my recent round of Teradata-related posts, here&#8217;s a little anomaly:</p>
<ul>
<li>Teradata is proud that <a href="../../../../../2011/09/22/teradata-columnar-compression/">Teradata 14&#8217;s</a> workload management now explicitly manages I/O, to go with Teradata&#8217;s long-standing management of CPU. Teradata&#8217;s WLM still does not explicitly manage RAM.</li>
<li>Aster is proud that <a href="../../../../../2011/09/22/aster-database-release-5-and-teradata-aster-appliance/">Aster 5&#8217;s workload management now explicitly manages RAM</a>, to go along with <a href="../../../../../2009/10/30/aster-data-application-server-ncluster/">the WLM capabilities Aster has had for a while managing CPU and I/O</a>. Aster&#8217;s Tasso Argyros believes this is an important capability, at least in some edge cases.</li>
<li>Mike Pilcher of SAND emailed me that SAND&#8217;s WLM capabilities to explicitly manage CPU, I/O, and RAM are very well-received by the marketplace.</li>
</ul>
<p><span id="more-5354"></span>One would think that Teradata&#8217;s workload management is more sophisticated and powerful than Aster Data&#8217;s.* So I asked Scott Gnau what gives (he was pretty much the ideal guy to comment, since he runs development for Teradata and oversees Teradata&#8217;s Aster acquisition as well).</p>
<p><em>*Except, of course, that Aster was a pioneer in having workload management cover all kinds of analytic processes, rather than just traditional database requests.</em></p>
<p>Scott&#8217;s main response was that Aster&#8217;s system was much more consumptive of RAM than Teradata&#8217;s; indeed, he reminded me that in the very old days, Teradata could make do with as little as 4 megabytes. Scott also did not argue when I suggested that Aster&#8217;s not-just-database analytic processes might require large amounts of RAM as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/25/workload-management-and-ram/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Hybrid-columnar soundbites</title>
		<link>http://www.dbms2.com/2011/09/22/hybrid-columnar-soundbites/</link>
		<comments>http://www.dbms2.com/2011/09/22/hybrid-columnar-soundbites/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 18:06:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5326</guid>
		<description><![CDATA[Busy couple of days talking with reporters. A few notes on hybrid-columnar analytic DBMS, all backed up by yesterday&#8217;s post on Teradata columnar: Oracle does not actually offer columnar I/O; the other three systems do. But see the &#8220;I won&#8217;t be surprised&#8221; part in yesterday&#8217;s Teradata post. Aster does not offer columnar compression; the other [...]]]></description>
			<content:encoded><![CDATA[<p>Busy couple of days talking with reporters. A few notes on hybrid-columnar analytic DBMS, all backed up by <a href="http://www.dbms2.com/2011/09/22/teradata-columnar-compression/">yesterday&#8217;s post on Teradata columnar</a>:</p>
<ul>
<li>Oracle does not actually offer columnar I/O; the other three systems do. But see the &#8220;I won&#8217;t be surprised&#8221; part in yesterday&#8217;s Teradata post.</li>
<li>Aster does not offer columnar compression; the other three do.</li>
<li>EMC Greenplum and Teradata offer different kinds of ways to mix column and row storage in the same table; each has its advantages.</li>
<li>Teradata generally has a more mature and capable offering than EMC Greenplum, for most purposes, whichever way you choose to organize your tables.</li>
</ul>
<p><em>Edit: The <a href="http://online.wsj.com/article/BT-CO-20110921-715547.html">Wall Street Journal</a> got this wrong, writing that Teradata was the first-ever hybrid columnar system. Specifically, they wrote</em></p>
<blockquote><p><em>While columnar technology has been around for years, Teradata says its product is unique because it allows users to include both columns and rows in the same database.</em></p></blockquote>
<p><em>Googling on &#8220;Teradata To Unveil New Analytics Product To Speed Business Adoption&#8221; might get you around the paywall to see the offending piece.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/22/hybrid-columnar-soundbites/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Aster Database Release 5 and Teradata Aster appliance</title>
		<link>http://www.dbms2.com/2011/09/22/aster-database-release-5-and-teradata-aster-appliance/</link>
		<comments>http://www.dbms2.com/2011/09/22/aster-database-release-5-and-teradata-aster-appliance/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 05:56:45 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5304</guid>
		<description><![CDATA[It was obviously just a matter of time before there would be an Aster appliance from Teradata and some tuned bidirectional Teradata-Aster connectivity. These have now been announced. I didn&#8217;t notice anything particularly surprising in the details of either. About the biggest excitement is that Aster is traditionally a Red Hat shop, but for the [...]]]></description>
			<content:encoded><![CDATA[<p>It was obviously just a matter of time before there would be an Aster appliance from Teradata and some tuned bidirectional Teradata-Aster connectivity. These have now been announced. I didn&#8217;t notice anything particularly surprising in the details of either. About the biggest excitement is that Aster is traditionally a Red Hat shop, but for the purposes of appliance delivery has now embraced SUSE Linux.</p>
<p>Along with the announcements comes updated positioning such as:</p>
<ul>
<li>Better SQL than the MapReduce alternatives have.</li>
<li>Better MapReduce than the SQL alternatives have.</li>
<li>Easy(ier) way to do complex analytics on <a href="../../../../../2011/05/15/what-to-do-about-unstructured-data/">multi-structured data</a>. (Aster has embraced that term.)</li>
</ul>
<p>and of course</p>
<ul>
<li>Now also with Teradata&#8217;s beautifully engineered hardware and system management software!</li>
</ul>
<p><span id="more-5304"></span>As might also be expected, the announcements are accompanied by pictures along the lines of &#8220;There are your various data sources; there&#8217;s Teradata; there&#8217;s Aster; there&#8217;s Hadoop; look at all the nice arrows connecting them!&#8221;</p>
<p>Teradata Aster further decided it was time for a 5.0 DBMS release. Highlights include:</p>
<ul>
<li>Aster&#8217;s SQL-MapReduce has more flexible inputs. Specifically, if you view SQL-MapReduce as steroid-enhanced table functions, those functions can now each have multiple tables as input. Aster is rightly positioning this as the key feature of the Aster 5.0 release.</li>
<li>Workload management now explicitly manages not only CPU and I/O, but also RAM. That surely makes it safer to use algorithms which aggressively create temporary data structures. And the allocation is dynamic, in that it can be throttled back if workloads require.</li>
<li>There&#8217;s more SQL functionality &#8212; I think this is minor, as Aster seems to have had pretty good SQL coverage already.</li>
<li>Performance has been improved; i.e., <a href="../../../../../2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a> has progressed in multiple ways. One improvement Aster thinks is cutting-edge is a hybrid kind of join that starts out as a hash join, then reverts to a merge join if it has to spill out of memory (e.g., if the available RAM is throttled back).</li>
</ul>
<p>Also, Aster is always expanding its library of <a href="../../../../../2010/06/27/lots-of-aster-data-analytic-packages/">prebuilt analytic functions/packages</a> &#8212; often in connection with specific customer engagements &#8212; and took this opportunity to mention numerous recent or near-future additions to the list.</p>
<p>Part of Aster&#8217;s motivation in making multiple input tables available to its parallel analytic functions seems to be to allow the use of intermediate result sets alongside raw data. In some ways, this seems to be an alternative to <a href="../../../../../2011/04/21/sas-hpa-does-make-sense-after-all/">the MPI-based approach favored by SAS</a>, and highlights limitations of the vanilla MapReduce paradigm. The specific examples given were k-means clustering and &#8212; which I&#8217;d never heard of before &#8212; SAX pattern matching.</p>
<p>For an example of two true data tables being used as inputs, Aster offered a case of advertising attribution, with the data being about impressions and also conversions. Frankly, I suspect a &#8220;join them all and let MapReduce sort them out&#8221; strategy would also work for that application; if you join on something like Customer_ID, just how big would the result set really be? Even so, we can imagine other cases in which messy boundaries for graphs or time series make that strategy unappealing, and &#8212; you read it here first! &#8212; <a href="../../../../../2011/09/08/aster-data-business-trends/">Aster&#8217;s target use cases are focused on time series and graphs</a>.</p>
<p>And finally: Whenever I ask the Aster folks &#8220;So, how big are Aster databases that are actually in production?&#8221;, they try to convince me that this is the wrong thing to ask. But &#8212; without actually answering the question &#8212; they did say:</p>
<ul>
<li>The new Teradata Aster appliance has been tested to a couple hundred terabytes.</li>
<li>They are very confident about scaling Aster to a few hundred terabytes.</li>
<li>They don&#8217;t have much in the way of proof in the 1 petabyte range.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/22/aster-database-release-5-and-teradata-aster-appliance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Aster Data business trends</title>
		<link>http://www.dbms2.com/2011/09/08/aster-data-business-trends/</link>
		<comments>http://www.dbms2.com/2011/09/08/aster-data-business-trends/#comments</comments>
		<pubDate>Thu, 08 Sep 2011 05:33:56 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[Liberty and privacy]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5204</guid>
		<description><![CDATA[Last month, I reviewed with the Aster Data folks which markets they were targeting and selling into, subsequent to acquisition by their new orange overlords. The answers aren&#8217;t what they used to be. Aster no longer focuses much on what it used to call frontline (i.e., low-latency, operational) applications; those are of course a key [...]]]></description>
			<content:encoded><![CDATA[<p>Last month, I reviewed with the Aster Data folks which markets they were targeting and selling into, subsequent to <a href="../../../../../2011/03/04/teradata-aster-data-ncluster/">acquisition</a> by their new orange overlords. The answers aren&#8217;t what they used to be. Aster no longer focuses much on what it used to call <a href="../../../../../2008/10/22/aster-data-systems-ncluster/">frontline</a> (i.e., low-latency, operational) applications; those are of course a key strength for Teradata. Rather, Aster focuses on <a href="../../../../../2011/03/03/investigative-analytics/">investigative analytics</a> &#8212; they&#8217;ve long <a href="../../../../../2011/02/12/upcoming-webinar-on-investigative-analytics/">endorsed</a> my use of the term &#8212; and on the batch run/scoring kinds of applications that inform operational systems.</p>
<p><span id="more-5204"></span>Also, Aster no longer focuses much on the general internet industry where it got its earliest sales, its <a href="../../../../../2011/09/05/zynga-linkedin-data-warehous/">continued success at LinkedIn</a> and a recent win at <span style="text-decoration: line-through;">an (NDA) fairly-big-name internet new account</span> <em>Razorfish</em> notwithstanding. That said, the first target market Aster did share with me was &#8220;digital marketing optimization,&#8221; which includes &#8220;marketing optimization&#8221; (duh), search engine optimization (SEO), clickstream analysis, and the like. Also, Aster is going after &#8220;data scientists&#8221; in general, and that&#8217;s a term I&#8217;m still seeing used most frequently in the internet area.</p>
<p><em>I&#8217;m seeing ever more granularity as companies break down internet-related market segments. DataStax showed me a chart last week of 15 different market segments it had sold into, and at least 14 were in some way internet-related.</em></p>
<p>Rather, if Aster is to name three industries in which it has pleasingly strong sales traction, it would say manufacturing (which in Teradata lingo includes resource extraction), financial services (including insurance), and retail. A cynic might note that that breakdown, like many similar ones, adds up to fairly large swaths of the economy and the computer market, but never mind that part. (Other firms might have thrown in telecommunications and health care as well, to get even more coverage.)</p>
<p>Two of Aster&#8217;s other favorite application areas are social network analysis/influencer identification and &#8212; which is analytically very similar &#8212; fraud detection/prevention. Taken together, that&#8217;s a whole lot of graph analysis. And I note with interest that the influencer identification stuff does NOT seem to be concentrated in telecom, which is the traditional sector one would imagine it being used in; all those call records are a lovely source of graph edges. Rather, the influencers seem to be identified from sources such as social media and credit card data.</p>
<p><em>Once again, this kind of thing gives me privacy jitters.</em></p>
<p>The match between Aster&#8217;s favorite industries and application areas is pretty much as you might expect &#8212; fraud in financial services, influencer analysis in retailing (and probably consumer financial services too), and digital marketing in both. As for manufacturing, the opportunities there seem to be focused on machine-generated data. That would be at least in high-tech manufacturing (I bet especially in flow-oriented stuff such as semiconductor fab) and oil/gas. Smart grid opportunities don&#8217;t seem to have arisen yet for Aster the way they have for a couple other vendors.</p>
<p>As for general Aster business trends, I think they&#8217;re good, while Aster would perhaps want to portray them as very good. Aster named a couple of impressive joint Teradata/Aster wins under NDA, but only a couple. Ramping up sales headcount is proving challenging, and some sales leadership turnover probably hasn&#8217;t helped. I do believe Aster&#8217;s spin that this is a matter of somebody being promoted quickly to a bigger job, and am optimistic about the current team &#8212; still, such moves tend to have at least short-term cost.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/08/aster-data-business-trends/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data management at Zynga and LinkedIn</title>
		<link>http://www.dbms2.com/2011/09/05/zynga-linkedin-data-warehous/</link>
		<comments>http://www.dbms2.com/2011/09/05/zynga-linkedin-data-warehous/#comments</comments>
		<pubDate>Mon, 05 Sep 2011 08:49:04 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Zynga]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5159</guid>
		<description><![CDATA[Mike Driscoll and his Metamarkets colleagues organized a bit of a bash Thursday night. Among the many folks I chatted with were Ken Rudin of Zynga, Sam Shah of LinkedIn, and D. J. Patil, late of LinkedIn. I now know more about analytic data management at Zynga and LinkedIn, plus some bonus stuff on LinkedIn&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>Mike Driscoll and his <a href="http://www.metamarketsgroup.com/">Metamarkets</a> colleagues organized a bit of a <a href="http://yfrog.com/h8msmkqj">bash</a> Thursday night. Among the many folks I chatted with were Ken Rudin of Zynga, Sam Shah of LinkedIn, and D. J. Patil, late of LinkedIn. I now know more about analytic data management at Zynga and LinkedIn, plus some bonus stuff on LinkedIn&#8217;s People You May Know application. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>It&#8217;s blindingly obvious that Zynga is one of <a href="../../../../../2011/06/20/columnar-dbms-vendor-customer-metrics/">Vertica&#8217;s petabyte-scale customers</a>, given that Zynga sends 5 TB/day of data into Vertica, and keeps that data for about a year; that works out to roughly 1.8 petabytes of raw data. (Zynga may retain even more data going forward; in particular, Zynga regrets ever having thrown out the first month of data for any game it&#8217;s tried to launch.) This is game actions, for the most part, rather than log files; true logs generally go into Splunk.</p>
<p><em>I don&#8217;t know whether the missing data is completely thrown away, or just stashed on inaccessible tapes somewhere.</em></p>
<p>I found two aspects of the Zynga story particularly interesting. First, those 5 TB/day are going straight into Vertica (from, I presume, <a href="http://www.dbms2.com/2010/08/18/nosql-hvsp-adoption/">memcached/Membase/Couchbase</a>), as Zynga decided that sending the data to some kind of log first was more trouble than it&#8217;s worth. Second, there&#8217;s Zynga&#8217;s approach to analytic database design. Highlights of that include: <span id="more-5159"></span></p>
<ul>
<li>Data is divided into two parts. One part has a pretty ordinary schema; the other is just stored as a huge list of name-value pairs. (This is much like <a href="../../../../../2010/10/06/ebay-followup-greenplum-out-teradata-10-petabytes-hadoop-has-some-value-and-more/">eBay</a>&#8217;s approach with its Teradata-based Singularity, except that eBay puts the name-value pairs into long character strings.) About half the data is in each part, but I don&#8217;t think that&#8217;s by deliberate choice.</li>
<li>Zynga adds data into the real schema when it&#8217;s clear it will be needed for a while. This isn&#8217;t a matter of query volumes, for the most part; rather, it&#8217;s when Zynga&#8217;s tests (e.g. of new games?) have determined that the data will keep being collected and used for a while.</li>
<li>Zynga only adds columns to its analytic database; it never goes through the more complex process of deleting them.</li>
</ul>
<p>Just as Zynga is one of Vertica&#8217;s flagship accounts, LinkedIn is one of Aster Data&#8217;s. Specifically, before leaving LinkedIn for Aster, Jonathan Goldman built LinkedIn&#8217;s People You May Know feature in Aster nCluster. This was long ago, and I&#8217;m not sure how sophisticated his use of <a href="../../../../../2009/03/07/three-greenplum-customers-applications-of-mapreduce/">SQL and MapReduce</a> would be in today&#8217;s terms; for example, I was told he didn&#8217;t use &#8220;nPath or anything like that.&#8221; <em>(Edit: See the comments below for clarifications from Jonathan.) </em>Anyhow, LinkedIn has replaced Aster for PYMK with Hadoop, and in my opinion is getting much better results.</p>
<p>That, from an Aster standpoint, is the bad news. The good news is that LinkedIn is happily using Aster nCluster for several other applications; LinkedIn folks don&#8217;t seem to regret throwing out* Greenplum for Aster; and they also seem to have a very high opinion of Jonathan and his work while he was there.</p>
<p><em>*And <a href="http://www.dbms2.com/2010/10/06/ebay-followup-greenplum-out-teradata-10-petabytes-hadoop-has-some-value-and-more/">this time</a> that is indeed the phrase that was used. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </em></p>
<p>One thing that astonished me is that LinkedIn PYMK is based only on data innate to LinkedIn (as opposed to imported email addresses, the results of web crawls, and so on). Given that, I am at a loss to explain how it suggested a couple of old friends, to whom I have no discernable chain of connection. Yes, we were at Harvard at the same time, but if that&#8217;s all it was, there would be a huge number of false positives I&#8217;m not actually seeing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/05/zynga-linkedin-data-warehous/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Eight kinds of analytic database (Part 1)</title>
		<link>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/</link>
		<comments>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 08:17:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Buying processes]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4868</guid>
		<description><![CDATA[Analytic data management technology has blossomed, leading to many questions along the lines of &#8220;So which products should I use for which category of problem?&#8221; The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for &#8220;big data&#8221; is little help. Let&#8217;s try eight categories instead. While no categorization [...]]]></description>
			<content:encoded><![CDATA[<p>Analytic data management technology has blossomed, leading to many questions along the lines of &#8220;So which products should I use for which category of problem?&#8221; The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for &#8220;big data&#8221; is little help.</p>
<p>Let&#8217;s try eight categories instead. While <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">no categorization is ever perfect</a>, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need &#8212; and in most cases you&#8217;ll need several &#8212; is a great early step in your analytic technology planning.  <span id="more-4868"></span></p>
<p><strong><em>Enterprise data warehouse</em></strong> (Full or partial)</p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, but especially operational</li>
<li><em>Likely use styles:</em> All</li>
<li><em>Canonical example:</em> Central EDW for a big enterprise</li>
<li><em>Stresses:</em> Concurrency, reliability, workload management</li>
</ul>
<p>The enterprise data warehouse (EDW) ideal says that you copy all your data into one place, and drive all decision-making from there. <a href="../../../../../2011/06/21/its-official-the-grand-central-edw-will-never-happen/">Full EDWs are pipedreams</a>. Still, a partial EDW makes sense for most large enterprises, and many indeed already have one. The first product lines to consider for classical EDWs are Teradata, DB2, Exadata, and maybe Microsoft SQL Server, especially if you&#8217;re going to stress concurrency and/or operational use cases.</p>
<p><strong><em>Traditional data mart</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All</li>
<li><em>Likely use styles:</em> Business intelligence, budgeting/consolidation, investigative</li>
<li><em>Examples:</em> Reporting servers, planning/consolidation servers, anything MOLAP, etc.</li>
<li><em>Stresses:</em> Performance, concurrency, TCO</li>
</ul>
<p>Whether or not you have something like an enterprise data warehouse, it&#8217;s common to have lighter-weight data marts as well. A traditional data mart might drive reports and dashboards. Or it might be specialized for budgeting, planning, and/or consolidation.  Some <a href="../../../../../2011/03/03/investigative-analytics/">investigative analytics</a> may be in the mix as well.</p>
<p>Any DBMS that can support an EDW can also support a data mart, but it may not be the most cost-effective way to do so. Columnar DBMS might have more attractive performance and TCO (Total Cost of Ownership); the same goes for Netezza. Some of them &#8212; e.g. Sybase IQ and <a href="../../../../../2011/06/20/vertica-release-5/">Vertica</a> &#8212; have excellent track records in concurrent usage as well. <a href="../../../../../2011/05/29/when-to-use-relational-database-management-system/">Ted Codd</a> pushed what amounts to MOLAP (Multidimensional OnLine Analytic Processing) systems for these use cases. But relational DBMS commonly do a better job, which is one reason most major MOLAP products have wound up at RDBMS companies.</p>
<p><strong><em>Investigative data mart &#8212; agile</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, especially customer-centric</li>
<li><em>Likely use styles</em>: Investigative</li>
<li><em>Canonical example:</em> A few analysts getting a few TB to examine</li>
<li><em>Stresses:</em> Ease of setup/load, ease of admin, price/performance</li>
</ul>
<p>Besides the traditional data mart, there are at least two other kinds. Both are focused on investigative analytics, but they&#8217;re differentiated by database size.</p>
<p>If you have just a few analysts,* looking at no more than a few terabytes of data (perhaps even just some gigabytes) &#8212; and if that data is &#8220;single-subject&#8221; and fairly homogeneous &#8212; your watchwords should be &#8220;cheap&#8221;, &#8220;easy&#8221;, and &#8220;fast&#8221;. You don&#8217;t need to invest in much hardware, in expensive software, or in much administrative effort (the analysts can be their own DBAs), nor should you endure much set-up time. Just grab a product, grab some data, and start running queries (or extracts into the statistical tool of your choice).</p>
<p><em>*If you have dozens or even hundreds of analysts hitting the same database, you&#8217;re probably back to the more concurrency-oriented scenarios outlined above.</em></p>
<p>Infobright is often cost-effective among columnar analytic DBMS. Other vendors might cut you a price break as well. If you have multiple terabytes of data, don&#8217;t rule out Netezza&#8217;s lowest-end products (even if they&#8217;d really rather sell you something bigger). Or, if you&#8217;re in the sub-terabyte range, maybe you can get by with an in-memory BI tool such as QlikView, and not do anything special on the DBMS side at all.</p>
<p><strong><em>Investigative data mart &#8212; big</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, especially customer-centric, logs, financial trade, scientific</li>
<li><em>Likely use styles</em>: Investigative</li>
<li><em>Canonical example:</em> Single-subject 20 TB &#8211; 20 PB relational database</li>
<li><em>Stresses:</em> Performance, scale-out, analytic functionality</li>
</ul>
<p>But if you&#8217;re looking at tens of terabytes of relational data, or even more, you really do have a &#8220;big data&#8221; problem. Performance and scalability are major challenges, usually best addressed by MPP (Massively Parallel Processing) systems, such as Netezza, Vertica, Aster Data, ParAccel, Teradata, or Greenplum. Performance POCs (Proofs Of Concept) are a big part of the buying process. Vendor price negotiations are crucial too.</p>
<p><em>Actually, in the low tens of terabytes you might be able to get away with a shared-disk system that has excellent compression &#8212; e.g., columnar products like Sybase IQ, Infobright, or SAND, rather than just Vertica and ParAccel.</em></p>
<p>Assuming you have affordable, scalable query performance, the competitive differentiator can switch to additional analytic functionality. Aster, Netezza, ParAccel, Vertica, and Greenplum either offer full <a href="../../../../../2011/02/24/analytic-platforms/">analytic platforms</a>, or seem to be on the path to doing so. Teradata, which now owns Aster Data, offers substantial built-in analytic capability in its traditional products as well, and the same goes for Sybase IQ.</p>
<p><em>Continued in <a href="http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/">Part 2</a>,</em><em> where we cover some of the more difficult use cases.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Notes and links, June 15, 2011</title>
		<link>http://www.dbms2.com/2011/06/15/notes-and-links-june-15-2011/</link>
		<comments>http://www.dbms2.com/2011/06/15/notes-and-links-june-15-2011/#comments</comments>
		<pubDate>Wed, 15 Jun 2011 11:07:32 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[1010data]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4722</guid>
		<description><![CDATA[Five things:  Back in April, Steve Miller suggested that approximate BI could be a growing trend, gaining speed at the expense of (often false anyway) precision. That idea of course goes well with Infobright&#8217;s recently released Rough Query feature, and also with Datameer&#8217;s year-earlier pitch. Aster Data (now a Teradata company) is positioning itself as [...]]]></description>
			<content:encoded><![CDATA[<p>Five things:  <span id="more-4722"></span></p>
<p>Back in April, Steve Miller suggested that <a href="http://www.information-management.com/blogs/business_intelligence_big_data_analytics_approximate_BI-10020170-1.html">approximate BI</a> could be a growing trend, gaining speed at the expense of (often false anyway) precision. That idea of course goes well with Infobright&#8217;s recently released <a href="../../../../../2011/06/14/infobright-4-0/">Rough Query</a> feature, and also with <a href="../../../../../2010/04/16/introduction-to-datameer/">Datameer&#8217;s year-earlier pitch</a>.</p>
<p>Aster Data (now a Teradata company) is positioning itself as <a href="http://www.asterdata.com/blog/2011/06/13/multi-structured-data-platform-capabilities-required-for-big-data-analytics/">analyzing multi-structured data</a> &#8212; which is my second-choice term, behind the more precise but odder-sounding &#8220;<a href="../../../../../2011/05/17/poly-structured-database/">poly-structured</a>.&#8221; I hope &#8220;poly-structured&#8221; wins, and plan to keep using it myself; but I recognize that &#8220;multi-structured&#8221; may actually be the one that prevails.</p>
<p>Barbara Darrow wrote a great piece on <a href="http://searchdatacenter.techtarget.com/news/2240036530/Oracle-pitches-cut-rate-Exadata-hardware-to-boost-sales">Oracle Exadata pricing</a>. Highlights include:</p>
<ul>
<li>Routine Oracle software discounts are high.</li>
<li>Exadata discounts are higher.</li>
<li>Big/referenceable customers get the best Exadata discounts. The term &#8220;extremely deep&#8221; was used. (I&#8217;ve also heard that from Oracle competitors, with the term &#8220;free&#8221; even coming up, hyperbolically or otherwise.)</li>
<li>Oracle&#8217;s hardware maintenance pricing is forcing users to trash Sun gear, even when it&#8217;s working. One guy told the story of literally crying as the Sun boxes were taken away.</li>
<li>Oracle&#8217;s 22% of license maintenance fee goes up to 27% after two years. I didn&#8217;t know that.</li>
</ul>
<p>Oracle has been making considerable messaging fuss around a win in Japan, where <a href="../../../../../2011/02/02/exadata-notes/">Softbank replaced years-old Teradata systems with vastly less new Exadata gear</a>. I blogged that this is hardly an apples-to-apples comparison. During <a href="../../../../../2011/05/03/oracle-exadata-business-technology/">my visit last April</a>, Oracle pushed back, in particular pointing out that the Softbank division that awarded the deal was very separate from the one that was an Oracle reseller. But Monday Teradata shared with me a counter-pushback, asserting that during the recent worldwide recession, Softbank assigned its underemployed systems integration division to do internal projects &#8212; including the data warehouse upgrade. I.e., Teradata stands by its claim that this replacement was strongly influenced by the Softbank/Oracle partnership.</p>
<p>If you&#8217;re analytically inclined, Kx Systems has some interesting ideas, manifested in kdb+ and so on. A <a href="http://queue.acm.org/detail.cfm?id=1531242">2009 ACM article</a> seems as good a starting point as any, the company&#8217;s website probably aside. Confusingly, <a href="http://kx.com/index.php">Kx</a> is a small company that evidently does most of its selling through a couple of much larger partners. Also, 1010data happens to be built on an older version of Kx&#8217;s technology.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/06/15/notes-and-links-june-15-2011/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Alternatives for Hadoop/MapReduce data storage and management</title>
		<link>http://www.dbms2.com/2011/05/14/hadoop-mapreduce-data-storage-management/</link>
		<comments>http://www.dbms2.com/2011/05/14/hadoop-mapreduce-data-storage-management/#comments</comments>
		<pubDate>Sat, 14 May 2011 05:00:52 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[MongoDB and 10gen]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Parallelization]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4438</guid>
		<description><![CDATA[There&#8217;s been a flurry of announcements recently in the Hadoop world. Much of it has been concentrated on Hadoop data storage and management. This is understandable, since HDFS (Hadoop Distributed File System) is quite a young (i.e. immature) system, with much strengthening and Bottleneck Whack-A-Mole remaining in its future. Known HDFS and Hadoop data storage [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s been a flurry of announcements recently in the Hadoop world. Much of it has been concentrated on Hadoop data storage and management. This is understandable, since HDFS (Hadoop Distributed File System) is quite a young (i.e. immature) system, with much strengthening and <a href="../../../../../2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a> remaining in its future.</p>
<p>Known HDFS and Hadoop data storage and management issues include but are not limited to:</p>
<ul>
<li>Hadoop is run by a master node, and specifically a namenode, that&#8217;s a single point of failure.</li>
<li>HDFS compression could be better.</li>
<li>HDFS likes to store three copies of everything, whereas many DBMS and file systems are satisfied with two.</li>
<li>Hive (the canonical way to do SQL joins and so on in Hadoop) is slow.</li>
</ul>
<p>Different entities have different ideas about how such deficiencies should be addressed.  <span id="more-4438"></span></p>
<p>For most practical purposes, <strong>Yahoo&#8217;s</strong> and <strong>IBM&#8217;s</strong> views about Hadoop have converged. Yahoo and IBM both believe that Hadoop data storage should be advanced solely through the <strong>Apache</strong> Hadoop open source process. In particular:</p>
<ul>
<li>IBM and Yahoo both talk of the great undesirability of Hadoop &#8220;forking&#8221; like Unix did.</li>
<li>Yahoo appeared on stage at IBM&#8217;s analyst event this week to reinforce the meeting-of-the-minds, even though there&#8217;s no IBM/Yahoo customer relationship involved.</li>
<li>IBM has disclaimed any intention of providing its own Hadoop distribution, but even so is committed to selling lots of <a href="http://www-01.ibm.com/software/data/bigdata/enterprise.html">IBM InfoSphere BigInsights</a>, which incorporates Apache Hadoop.*</li>
<li><a href="http://developer.yahoo.com/blogs/hadoop/posts/2011/01/announcement-yahoo-focusing-on-apache-hadoop-discontinuing-the-yahoo-distribution-of-hadoop/">Yahoo has stopped offering its own Hadoop distribution</a>, period.</li>
</ul>
<p><em>*IBM is emphatic about ruling out marketing terms whose connotation it doesn&#8217;t like. IBM&#8217;s Hadoop distribution isn&#8217;t a &#8220;distribution,&#8221; because that might make it sound too proprietary; IBM&#8217;s Oracle emulation offering <a href="../../../../../2009/04/24/ibms-oracle-emulation-strategy-reconsidered/#comment-118444">isn&#8217;t an &#8220;emulation&#8221; offering</a>, because that might make it sound too slow; and <a href="../../../../../2009/05/13/ibm-system-s-infosphere-streams-processing/">IBM&#8217;s CEP product InfoSphere Streams isn&#8217;t a &#8220;CEP&#8221; product</a>, because that might make it sound too non-functional.</em></p>
<p><strong>Cloudera</strong> can probably be regarded as part of the Yahoo/IBM camp, some stern looks from IBM in Cloudera&#8217;s direction notwithstanding. <a href="../../../../../2010/06/30/cloudera-enterprise-hadoop-evolution/">Cloudera Enterprise</a> &#8212; also an embrace-and-extend offering &#8212; remains the obvious choice for enterprise Hadoop users; meanwhile, nobody has convinced me of any bogosity in <a href="http://www.cloudera.com/hadoop/">the &#8220;no forking&#8221; claim Cloudera makes for its free/open source Hadoop distribution</a>. Indeed, when I visited Cloudera a couple of weeks ago, Mike Olson showed me a slide demonstrating that Cloudera might be supplanting Yahoo as the biggest ongoing contributor to Apache Hadoop.</p>
<p><strong>EMC&#8217;s Data Computing Division, </strong>n&#233;e <strong>Greenplum,</strong> made a lot of Hadoop noise this week. Unlike Yahoo, IBM, and Cloudera, EMC really is forking Hadoop. <a href="../../../../../2011/04/05/comments-on-emc-greenplum/">I&#8217;m not talking with the EMC/Greenplum folks</a> these days, but the whole thing was covered from various angles by <a href="http://www.computerworld.com/s/article/9216541/EMC_unveils_Hadoop_appliance_BI_software">Lucas Mearian</a>, <a href="http://www.informationweek.com/news/software/info_management/229403178">Doug Henschen</a>, <a href="http://gigaom.com/cloud/emc-hadoop/">Derrick Harris</a>, and <a href="http://davidmenninger.ventanaresearch.com/2011/05/12/emc-enters-elephant-race-with-hadoop/">Dave Menninger</a>.</p>
<p>Another option is to entirely replace HDFS with a DBMS, whether distributed or just instanced at each node. <strong>DataStax</strong> is doing that with <a href="../../../../../2011/03/23/datastax-cassandrafs-hadoop-brisk/">Cassandra-based Brisk</a>; <strong><a href="../../../../../2011/03/23/hadapt-commercialized-hadoopdb/">Hadapt</a></strong> plans to do that with PostgreSQL and VectorWise <em>(edit: As per the comment below, Hadapt only plans a partial replacement of HDFS);</em> and <a href="../../../../../2011/04/17/netezza-twinfin-i-class-overview/">Netezza&#8217;s analytic platform</a> has a Hadoop-over-<strong>Netezza</strong> option as well. Mike Olson objects to such implementations being called &#8220;Hadoop&#8221;; but trademark issues aside, those vendors plan to support a broad variety of Hadoop-compatible tools. <strong>Aster Data</strong> has long taken that approach one step further, by offering an enhanced version of MapReduce &#8212; aka <a href="../../../../../2009/12/02/mapreduce-for-complex-analytics-webina/">SQL/MapReduce</a> &#8212; over its nCluster DBMS. And <a href="../../../../../2011/04/04/the-mongodb-story/"><strong>10gen</strong> offers a more primitive form of MapReduce with MongoDB</a>, but probably wouldn&#8217;t position it as addressing a &#8220;MapReduce market&#8221; at all.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/05/14/hadoop-mapreduce-data-storage-management/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
	</channel>
</rss>

