<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; EMC</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/emc/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 12:22:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Comments on the 2012 Forrester Wave: Enterprise Hadoop Solutions</title>
		<link>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/</link>
		<comments>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 05:16:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[MapR]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Pentaho]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5886</guid>
		<description><![CDATA[Forrester has released its Q1 2012 Forrester Wave: Enterprise Hadoop Solutions. (Googling turns up a direct link, but in case that doesn&#8217;t prove stable, here also is a registration-required link from IBM&#8217;s Conor O&#8217;Mahony.) My comments include: The Forrester Wave&#8217;s relative vendor rankings are meaningless, in that the document compares apples, peaches, almonds, and peanuts. [...]]]></description>
			<content:encoded><![CDATA[<p>Forrester has released its Q1 2012 Forrester Wave: Enterprise Hadoop Solutions. (Googling turns up a <a href="http://www.forrester.com/rb/go?docid=60755&amp;oid=1-K07LCA&amp;action=5">direct link</a>, but in case that doesn&#8217;t prove stable, here also is <a href="http://database-diary.com/2012/02/02/get-a-free-copy-of-the-forrester-wave-for-enterprise-hadoop-solutions/">a registration-required link from IBM&#8217;s Conor O&#8217;Mahony</a>.) My comments include:</p>
<ul>
<li>The Forrester Wave&#8217;s <strong>relative vendor rankings are meaningless,</strong> in that the document compares apples, peaches, almonds, and peanuts. Apparently, it covers any vendor that incorporates a distribution of Apache Hadoop MapReduce into something it offers, and that offered at least two (not necessarily full-production) references for it.</li>
<li>The Forrester Wave for &#8220;enterprise Hadoop&#8221; contradicts itself on the subject of Hortonworks.
<ul>
<li>The Forrester Wave for &#8220;enterprise Hadoop&#8221; is correct when it says <strong>&#8220;Hortonworks &#8230; has Hadoop training and professional services offerings that are still embryonic.&#8221;</strong></li>
<li>Peculiarly, the Forrester Wave for &#8220;enterprise Hadoop&#8221; also says &#8220;Hortonworks offers an impressive Hadoop professional services portfolio&#8221;. Hortonworks will likely win one or more nice partnership deals with vendors in adjacent fields, but even so its professional services capabilities are &#8230; well, a good word might be &#8220;embryonic&#8221;.</li>
</ul>
</li>
<li><a href="http://www.dbms2.com/2011/02/11/comments-on-the-2011-forrester-wave-for-enterprise-data-warehouse-platforms/">Forrester Waves always seem to have weird implicit definitions of &#8220;data warehousing&#8221;</a>. This one is no exception.</li>
<li>Forrester gave top marks in &#8220;Functionality&#8221; to 11 of 13 &#8220;enterprise Hadoop&#8221; vendors. This seems odd.</li>
<li>I don&#8217;t know why MapR, which doesn&#8217;t like HDFS (Hadoop Distributed File System), got top marks in &#8220;Subproject integration&#8221;.</li>
<li>Forrester gave top marks in &#8220;Storage&#8221; to Datameer. It also gave higher marks to MapR than to EMC Greenplum, even though EMC Greenplum&#8217;s technology is a superset of MapR&#8217;s. Very strange. <em>(Edit: Actually, as per a comment below, there is some uncertainty about the EMC/MapR relationship.)</em></li>
<li>Forrester gave higher marks in &#8220;Acceleration and optimization&#8221; to Hortonworks than to Cloudera and IBM, and higher marks yet to Pentaho. Very odd.</li>
<li>I&#8217;m not sure what Forrester is calling a &#8220;Distributed EDW file store connector&#8221;, but it sounds like something that Cloudera has provided via partnership to a number of analytic DBMS vendors.</li>
<li>Forrester&#8217;s &#8220;Strategy&#8221; rankings seem to correlate to a metric of &#8220;We&#8217;re a large enough vendor to go in N directions at once&#8221;, for various values of N.</li>
<li>Forrester is correct to rank Cloudera&#8217;s &#8220;Adoption&#8221; as being stronger than EMC/Greenplum&#8217;s or MapR&#8217;s. But Hortonworks&#8217; strong mark for &#8220;Adoption&#8221; baffles me.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Analytic trends in 2012: Q&amp;A</title>
		<link>http://www.dbms2.com/2011/11/21/analytic-trends-in-2012-qa/</link>
		<comments>http://www.dbms2.com/2011/11/21/analytic-trends-in-2012-qa/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 11:00:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Tableau Software]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5692</guid>
		<description><![CDATA[As a new year approaches, it&#8217;s the season for lists, forecasts and general look-ahead. Press interviews of that nature have already begun. And so I&#8217;m working on a trilogy of related posts, all based on an inquiry about hot analytic trends for 2012. This post is a moderately edited form of an actual interview. Two [...]]]></description>
			<content:encoded><![CDATA[<p>As a new year approaches, it&#8217;s the season for lists, forecasts and general look-ahead. Press interviews of that nature have already begun. And so I&#8217;m working on a trilogy of related posts, all based on an inquiry about hot analytic trends for 2012.</p>
<p>This post is a moderately edited form of an actual interview. Two other posts cover analytic trends to watch (planned) and <a href="http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/">analytic vendor execution challenges to watch</a> (already up).</p>
<p><span id="more-5692"></span><strong>Question</strong>: What do you think will happen next year with the Tableaus of the world?</p>
<p><strong>Answer:</strong></p>
<ul>
<li>I think adoption of flexible-visualization business intelligence tools will continue to be rapid.</li>
<li>I think enterprise-friendly features will be increasingly important as a basis of competition.</li>
</ul>
<p><strong>Question</strong>: What do you mean by &#8220;enterprise-friendly&#8221;?</p>
<p><strong>Answer</strong>: An example would be <a href="http://www.dbms2.com/2011/11/16/qlikview-collaborative-business-intelligence/">QlikTech no longer forcing you to use their native ETL</a>, but rather working with Informatica and soon other third-party products. Also important can be:</p>
<ul>
<li>Database size.</li>
<li>Concurrency.</li>
<li>A full-featured development cycle for analytic applications.</li>
</ul>
<p><strong>Question</strong>: What does HP have to do to be relevant in analytics/data warehousing?</p>
<p><strong>Answer</strong>: Avoid stupidity. HP Vertica is already relevant.</p>
<p><strong>Question</strong>: OK. But what can HP do to build on Vertica?</p>
<p><strong>Answer</strong>: HP &#8212; which botched Exadata 1 hardware &#8212; could do a good job with SAP HANA or other kinds of appliance products.</p>
<p>However:</p>
<ul>
<li>I don&#8217;t think trying to force Vertica beyond its natural growth &#8212; <a href="http://www.dbms2.com/2011/04/16/unpacking-the-emc-greenplum-q1-sales-disaster-rumors/">the way EMC is with Greenplum</a> &#8212; is necessarily a good idea. Natural growth in Vertica&#8217;s case is plenty fast anyway.</li>
<li>Obviously, making good Vertica hardware would be nice. But being hardware-independent is crucial to Vertica, not least because of cloud deployment, an option many buyers want to at least have in their hip pockets.</li>
</ul>
<p><strong>Question</strong>: You expressed some skepticism toward mobile BI/use cases. Why so?</p>
<p><strong>Answer</strong>: The form factor hurts functionality a lot, so it&#8217;s only worthwhile in cases where timeliness is key.</p>
<p>And without more refined alert-setting functionality, it&#8217;s hard to think of that many cases.</p>
<p><em>Note: My views on mobile BI haven&#8217;t changed much since <a href="../../../../../2010/07/15/mobile-business-intelligence/">July, 2010</a>.</em></p>
<p><strong>Question</strong>: What about the idea of an enterprise being able to pay per drink to run jobs on an analytic cluster? Do you expect that concept to have any legs in 2012?</p>
<p><strong>Answer</strong>: While other kinds of SaaS (Software as a Service) BI might make sense, remote computing BI that focuses on hardware cost sharing is problematic. Moving data in and out of the cluster is a big part of the overall cost, at least if you plan to process it only occasionally once it gets there. I haven&#8217;t seen a plan yet that gets around that point.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/21/analytic-trends-in-2012-qa/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Some notes on Hadoop (mainly) and appliances</title>
		<link>http://www.dbms2.com/2011/09/23/hadoop-appliances/</link>
		<comments>http://www.dbms2.com/2011/09/23/hadoop-appliances/#comments</comments>
		<pubDate>Fri, 23 Sep 2011 19:59:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapR]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[eBay]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5341</guid>
		<description><![CDATA[1. EMC Greenplum has evolved its appliance product line. As I read that, the latest announcement boils down to saying that you can neatly network together various Greenplum appliances in quarter-rack increments. If you take a quarter rack each of four different things, then Greenplum says &#8220;Hooray! Our appliance is all-in-one!&#8221; Big whoop. 2. That [...]]]></description>
			<content:encoded><![CDATA[<p>1. <a href="http://www.greenplum.com/products/greenplum-dca">EMC Greenplum has evolved its appliance product line</a>. As I read that, the latest announcement boils down to saying that you can neatly network together various Greenplum appliances in quarter-rack increments. If you take a quarter rack each of four different things, then Greenplum says &#8220;Hooray! Our appliance is all-in-one!&#8221; Big whoop.</p>
<p>2. That said, the Hadoop part of EMC&#8217;s story is based on MapR, which so far as I can tell is actually a pretty good Hadoop implementation. More precisely, MapR makes strong claims about performance and so on, and Apache Hadoop folks don&#8217;t reply &#8220;MapR is full of &amp;#$!&#8221; Rather, they say &#8220;We&#8217;re going to close the gap with MapR a lot faster than the MapR folks like to think &#8212; and by the way, guys, thanks for the butt-kick.&#8221; A lot more precision about MapR may be found in this <a href="http://www.slideshare.net/mcsrivas/design-scale-and-performance-of-maprs-distribution-for-hadoop">M. C. Srivas SlideShare</a>.</p>
<p>3. On its latest earnings call, Oracle clearly <a href="http://seekingalpha.com/article/294885-oracle-s-ceo-discusses-q1-2012-results-earnings-call-transcript?part=qanda">said it would introduce a Hadoop appliance</a>, versus just <a href="../../../../../2011/06/24/forthcoming-oracle-appliances/">hinting at a Hadoop appliance</a> the prior quarter. The money quote was:  <span id="more-5341"></span></p>
<blockquote><p>Finally, big data or the searching of large amounts of data using Hadoop. After Hadoop finishes filtering the data, the place you want to put that data is an Oracle Database, and that&#8217;s what a lot of our customers are doing. And we are exploiting the trend, the big data technology and the big data trend, if you prefer, by building a Hadoop appliance that attaches to the Oracle Exadata database or any Oracle Database for that matter. But you don&#8217;t have to buy our Hadoop appliance if you can use whatever servers you want running Hadoop, and we provide the interface between Hadoop and the Oracle Database.</p></blockquote>
<p>In other words, Oracle is saying &#8220;We&#8217;d like to sell you a Hadoop appliance, but you can run Hadoop in some other way and we&#8217;ll coexist with it just fine.&#8221; That makes sense; refusing to coexist with Hadoop is not exactly a realistic option.</p>
<p>4. Back in June, I expressed <a href="../../../../../2011/06/02/why-you-would-want-an-appliance-and-when-you-wouldnt/">great skepticism about the idea of a Hadoop appliance</a>. There was at least partial pushback in the comment thread from both Amr Awadallah and Eric Baldeschwieler. Oops.</p>
<p>Their reasoning seems to be centered around matters of installation, administration, and general packaging.</p>
<p>5. A month ago I noted aggressive near-term plans for <a href="../../../../../2011/08/21/hadoop-evolution/">Apache Hadoop evolution</a>. As noted above, one reason this is needed is competition from folks like MapR. Also, I note that:</p>
<ul>
<li>Three years ago, Oliver Ratzesberger&#8217;s group at eBay complained that <a href="../../../../../2008/10/15/ebay-doesnt-love-mapreduce/">CPU utilization running Hadoop was at 18%</a>.</li>
<li><a href="../../../../../2011/08/21/hadoop-evolution/#comment-241679">Now Oliver uses a figure of 10-15%</a>, and attributes an even lower figure to &#8212; I&#8217;m guessing here &#8212; Yahoo. (Another possibility might be Facebook.)</li>
<li>In between, eBay became one of the biggest and most prominent users of Hadoop.</li>
</ul>
<p>The moral of eBay&#8217;s Hadoop adventures, as I see it, is neither &#8220;Hadoop sucks!&#8221; nor &#8220;Hadoop doesn&#8217;t suck!&#8221;; rather, it&#8217;s that there&#8217;s a lot of scope for Hadoop to operate differently in the future than it does today.</p>
<p><em>Similarly, whatever throughput Yahoo does or doesn&#8217;t get, it clearly has adopted Hadoop at the expense of the <a href="../../../../../2008/05/29/yahoo-scales-web-analytics-database-petabyte/">columnar-in-Postgres</a> system it previously was so proud of.</em></p>
<p>Also, there has been a claim going around that &#8212; notwithstanding NameNode&#8217;s status as a single point of Hadoop failure &#8212;  no Hadoop installation has ever lost data due to a NameNode failure. The folks at MapR beg to differ, and sent over <a href="https://issues.apache.org/jira/browse/HDFS-1539">some</a> <a href="http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201107.mbox/%3CCAFUA3X2R_wH9GGGseUVSXVNVZQ+dBjZKDn0_pmDO8U31C05tMw@mail.gmail.com%3E">links</a> that sure seem to say the opposite.</p>
<p>6. Since we&#8217;ve just established that Hadoop will change, rapidly and pretty fundamentally, what exactly is the benefit of an appliance that is &#8220;balanced&#8221; for Hadoop usage today?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/23/hadoop-appliances/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Patent nonsense: Parallel Iron/HDFS edition</title>
		<link>http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/</link>
		<comments>http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/#comments</comments>
		<pubDate>Fri, 10 Jun 2011 08:10:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[EMC]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4633</guid>
		<description><![CDATA[Alan Scott commented with concern about Parallel Iron&#8217;s patent lawsuit attacking HDFS (Hadoop Distributed File System), filed in &#8212; where else? &#8212; Eastern Texas. The patent in question &#8212; US 7,415,565 &#8212; seems to in essence cover any shared-nothing block storage that exploits a &#8220;configurable switch fabric&#8221;; indeed, it&#8217;s more oriented to OLTP (OnLine Transaction [...]]]></description>
			<content:encoded><![CDATA[<p>Alan Scott commented with concern about <a href="../../../../../2011/06/05/hadoop-confusion-from-forrester-research/#comment-226611">Parallel Iron&#8217;s patent lawsuit attacking HDFS</a> (Hadoop Distributed File System), filed in &#8212; where else? &#8212; Eastern Texas. <a href="http://www.patents.com/us-7415565.html">The patent in question</a> &#8212; US 7,415,565 &#8212; seems to in essence cover any shared-nothing block storage that exploits a &#8220;configurable switch fabric&#8221;; indeed, it&#8217;s more oriented to OLTP (OnLine Transaction Processing) than to analytics. For example, the Background section starts: <span id="more-4633"></span></p>
<blockquote><p>The present invention relates to data storage, and more particularly, to methods and systems for a high throughput storage device.</p>
<p>A form of on-line transaction processing (OLTP) applications requiring a high number of data block reads or writes are called H-OLTP applications. A large server or mainframe or several servers typically host an H-OLTP application. Typically, these applications involve the use of a real time operating system, a relational database, optical fiber based networking, distributed communications facilities to a user community, and the application itself. Storage solutions for these applications use a combination of mechanical disk drives and cached memory under stored program control. The techniques for the storage management of H-OLTP applications can use redundant file storage algorithms on multiple disk drives, memory cache replications, data coherency algorithms, and/or load balancing.</p></blockquote>
<p>and ends</p>
<blockquote><p>It would be desirable for large capacity storage to provide sufficient throughput for high-volume, real-time applications, especially, for example in emerging applications in financial, defense, research, customer management, and homeland security areas.</p></blockquote>
<p>The independent claims are:</p>
<blockquote><p>1. A storage system comprising: one or more memory sections, including one or more memory devices including storage locations that store data, and a memory section controller that provides addresses to the memory devices, the addresses identifying storage locations for a memory device, wherein the memory devices use the provided addresses to perform a function selected from the set of reading out and writing data to/from the memory devices; and one or more switches, comprising a configurable switch fabric, that receive a data request including a data block identifier and switch the data request to one or more of the memory sections determined by applying the data block identifier to an algorithm that selectively configures operation of the switch fabric, the data block identifier identifying a set of storage locations; wherein the memory sections to which the data request was switched forward the received data block identifier to its memory section controller which maps the data block identifier to a set of addresses for the storage locations identified by the data block identifier, and provides the set of addresses to one or more of the memory section&#8217;s memory devices.<br />
&#8230;<br />
16. A method for use in a storage system, comprising: storing data in storage locations in a memory device; receiving by a switch comprising a configurable switch fabric, a data request including a data block identifier; the switch switching the data request to a memory section including the memory device determined by applying the data block identifier to an algorithm that selectively configures operation of the switch, the data block identifier identifying a set of storage locations in the memory device; forwarding the received data block identifier to a memory section controller; the memory section controller mapping the data block identifier to a set of addresses for the storage locations identified by the data block identifier; and the memory section controller providing the set of addresses to the memory device; and the memory device using the provided addresses to perform a function selected from the set of reading and writing data to/from the memory device.<br />
&#8230;<br />
26. A storage system, comprising: means for storing, including: means for storing data in storage locations, the means for storing data in storage locations including means for reading data stored in the storage locations using an address; means for controlling the means for storing, the means for controlling including: means for mapping a data block identifier to a set of addresses, means for providing the addresses to the means for storing data in storage locations, the addresses identifying storage locations; means for switching, including means for receiving a data request including a data block identifier; means for switching the data request based on the data block identifier to a means for storing determined by applying the data block identifier to an algorithm that selectively configures operation of the means for switching, the data block identifier identifying a set of storage locations in the means for storing data in storage locations; and means for forwarding the received data block identifier to the means for storing.</p>
<p>27. A storage hub comprising a memory section, including a memory device including storage locations that store data, and a memory section controller that provides an address to the memory device, the address identifying a storage location, wherein the memory device uses the provided address to write data into the memory device; and a switch, comprising a configurable switch fabric, that receives a data request including a data block identifier and transmits the data request to the memory section determined by applying the data block identifier to an algorithm that selectively configures operation of the switch fabric, and that receives write data associated with the data request and transmits the write data to the determined memory section; wherein the memory section forwards the received data block identifier to the memory section controller, which determines from the data block identifier the address of a storage location and provides the address to the memory device, and the memory device stores the write data at the address.</p></blockquote>
<p>My one thought that could have led to the patent making sense was that maybe the term &#8220;configurable switch fabric&#8221; was defined in some particularly limited way. But noooo. Indeed, the term is not defined in the patent&#8217;s body at all; rather, the patent says (somewhat ungrammatically):</p>
<blockquote><p>The switches 22 may be any type of switch using any type of switch fabric, such as, for example, a time division multiplexed fabric or a space division multiplexed fabric. As used herein, the term &#8220;switch fabric&#8221; the physical interconnection architecture that directs data from an incoming interface to an outgoing interface. For example, the switches 22 may be a Fibre Channel switch, an ATM switch, a switched fast Ethernet switch, a switched FDDI switch, or any other type of switch. The switches 22 may also include a controller (not shown) for controlling the switch.</p></blockquote>
<p>I would be shocked if this patent held up upon reexamination. (If it did, EMC would pretty much be out of business, or at least vulnerable to a considerable cashectomy.) This is a particularly strong example of my belief that <a href="../../../../../2010/03/23/software-innovation-patent/">performance-enhancement software patents are always bogus</a>. What&#8217;s more, it seems strange to worry about this patent&#8217;s effect on HDFS in any case, because if you&#8217;re that much of a patent wimp, you probably don&#8217;t want to run afoul of <a href="../../../../../2010/02/11/google-mapreduce-patent/">Google&#8217;s (also bogus) MapReduce patent</a> in the first place.</p>
<p>On the whole, I&#8217;m somewhat more sympathetic to the idea of <a href="../../../../../2011/05/14/hadoop-mapreduce-data-storage-management/">replacing HDFS underneath Hadoop</a> than my clients at Cloudera or IBM would wish me to be. But the Parallel Iron patent is not a serious reason in support of such a change.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Alternatives for Hadoop/MapReduce data storage and management</title>
		<link>http://www.dbms2.com/2011/05/14/hadoop-mapreduce-data-storage-management/</link>
		<comments>http://www.dbms2.com/2011/05/14/hadoop-mapreduce-data-storage-management/#comments</comments>
		<pubDate>Sat, 14 May 2011 05:00:52 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[MongoDB and 10gen]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Parallelization]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4438</guid>
		<description><![CDATA[There&#8217;s been a flurry of announcements recently in the Hadoop world. Much of it has been concentrated on Hadoop data storage and management. This is understandable, since HDFS (Hadoop Distributed File System) is quite a young (i.e. immature) system, with much strengthening and Bottleneck Whack-A-Mole remaining in its future. Known HDFS and Hadoop data storage [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s been a flurry of announcements recently in the Hadoop world. Much of it has been concentrated on Hadoop data storage and management. This is understandable, since HDFS (Hadoop Distributed File System) is quite a young (i.e. immature) system, with much strengthening and <a href="../../../../../2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a> remaining in its future.</p>
<p>Known HDFS and Hadoop data storage and management issues include but are not limited to:</p>
<ul>
<li>Hadoop is run by a master node, specifically a NameNode, that&#8217;s a single point of failure.</li>
<li>HDFS compression could be better.</li>
<li>HDFS likes to store three copies of everything, whereas many DBMS and file systems are satisfied with two.</li>
<li>Hive (the canonical way to do SQL joins and so on in Hadoop) is slow.</li>
</ul>
<p>Different entities have different ideas about how such deficiencies should be addressed.  <span id="more-4438"></span></p>
<p>For most practical purposes, <strong>Yahoo&#8217;s</strong> and <strong>IBM&#8217;s</strong> views about Hadoop have converged. Yahoo and IBM both believe that Hadoop data storage should be advanced solely through the <strong>Apache</strong> Hadoop open source process. In particular:</p>
<ul>
<li>IBM and Yahoo both talk of the great undesirability of Hadoop &#8220;forking&#8221; like Unix did.</li>
<li>Yahoo appeared on stage at IBM&#8217;s analyst event this week to reinforce the meeting-of-the-minds, even though there&#8217;s no IBM/Yahoo customer relationship involved.</li>
<li>IBM has disclaimed any intention of providing its own Hadoop distribution, but even so is committed to selling lots of <a href="http://www-01.ibm.com/software/data/bigdata/enterprise.html">IBM InfoSphere BigInsights</a>, which incorporates Apache Hadoop.*</li>
<li><a href="http://developer.yahoo.com/blogs/hadoop/posts/2011/01/announcement-yahoo-focusing-on-apache-hadoop-discontinuing-the-yahoo-distribution-of-hadoop/">Yahoo has stopped offering its own Hadoop distribution</a>, period.</li>
</ul>
<p><em>*IBM is emphatic about ruling out marketing terms whose connotation it doesn&#8217;t like. IBM&#8217;s Hadoop distribution isn&#8217;t a &#8220;distribution,&#8221; because that might make it sound too proprietary; IBM&#8217;s Oracle emulation offering <a href="../../../../../2009/04/24/ibms-oracle-emulation-strategy-reconsidered/#comment-118444">isn&#8217;t an &#8220;emulation&#8221; offering</a>, because that might make it sound too slow; and <a href="../../../../../2009/05/13/ibm-system-s-infosphere-streams-processing/">IBM&#8217;s CEP product InfoSphere Streams isn&#8217;t a &#8220;CEP&#8221; product</a>, because that might make it sound too non-functional.</em></p>
<p><strong>Cloudera</strong> can probably be regarded as part of the Yahoo/IBM camp, some stern looks from IBM in Cloudera&#8217;s direction notwithstanding. <a href="../../../../../2010/06/30/cloudera-enterprise-hadoop-evolution/">Cloudera Enterprise</a> &#8212; also an embrace-and-extend offering &#8212; remains the obvious choice for enterprise Hadoop users; meanwhile, nobody has convinced me of any bogosity in <a href="http://www.cloudera.com/hadoop/">the &#8220;no forking&#8221; claim Cloudera makes for its free/open source Hadoop distribution</a>. Indeed, when I visited Cloudera a couple of weeks ago, Mike Olson showed me a slide demonstrating that Cloudera might be supplanting Yahoo as the biggest ongoing contributor to Apache Hadoop.</p>
<p><strong>EMC&#8217;s Data Computing Division,</strong> n&#233;e <strong>Greenplum,</strong> made a lot of Hadoop noise this week. Unlike Yahoo, IBM, and Cloudera, EMC really is forking Hadoop. <a href="../../../../../2011/04/05/comments-on-emc-greenplum/">I&#8217;m not talking with the EMC/Greenplum folks</a> these days, but the whole thing was covered from various angles by <a href="http://www.computerworld.com/s/article/9216541/EMC_unveils_Hadoop_appliance_BI_software">Lucas Mearian</a>, <a href="http://www.informationweek.com/news/software/info_management/229403178">Doug Henschen</a>, <a href="http://gigaom.com/cloud/emc-hadoop/">Derrick Harris</a>, and <a href="http://davidmenninger.ventanaresearch.com/2011/05/12/emc-enters-elephant-race-with-hadoop/">Dave Menninger</a>.</p>
<p>Another option is to entirely replace HDFS with a DBMS, whether distributed or just instanced at each node. <strong>DataStax</strong> is doing that with <a href="../../../../../2011/03/23/datastax-cassandrafs-hadoop-brisk/">Cassandra-based Brisk</a>; <strong><a href="../../../../../2011/03/23/hadapt-commercialized-hadoopdb/">Hadapt</a></strong> plans to do that with PostgreSQL and VectorWise <em>(edit: As per the comment below, Hadapt only plans a partial replacement of HDFS);</em> and <a href="../../../../../2011/04/17/netezza-twinfin-i-class-overview/">Netezza&#8217;s analytic platform</a> has a Hadoop-over-<strong>Netezza</strong> option as well. Mike Olson objects to such implementations being called &#8220;Hadoop&#8221;; but trademark issues aside, those vendors plan to support a broad variety of Hadoop-compatible tools. <strong>Aster Data</strong> has long taken that approach one step further, by offering an enhanced version of MapReduce &#8212; aka <a href="../../../../../2009/12/02/mapreduce-for-complex-analytics-webina/">SQL/MapReduce</a> &#8212; over its nCluster DBMS. And <a href="../../../../../2011/04/04/the-mongodb-story/"><strong>10gen</strong> offers a more primitive form of MapReduce with MongoDB</a>, but probably wouldn&#8217;t position it as addressing a &#8220;MapReduce market&#8221; at all.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/05/14/hadoop-mapreduce-data-storage-management/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>In-memory, parallel, not-in-database SAS HPA does make sense after all</title>
		<link>http://www.dbms2.com/2011/04/21/sas-hpa-does-make-sense-after-all/</link>
		<comments>http://www.dbms2.com/2011/04/21/sas-hpa-does-make-sense-after-all/#comments</comments>
		<pubDate>Thu, 21 Apr 2011 08:23:41 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[SAS Institute]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4343</guid>
		<description><![CDATA[I talked with SAS about its new approach to parallel modeling. The two key points are: SAS no longer plans to go as far with in-database modeling as it previously intended. Rather, SAS plans to run in RAM on MPP DBMS appliances, exploiting MPI (Message Passing Interface). The whole thing is called SAS HPA (High-Performance [...]]]></description>
			<content:encoded><![CDATA[<p>I talked with SAS about its <a href="../../../../../2011/03/13/so-how-many-columns-can-a-single-table-have-anyway/">new approach to parallel modeling</a>. The two key points are:</p>
<ul>
<li><strong>SAS no longer plans to go as far with in-database modeling as it previously intended.</strong></li>
<li>Rather, <strong>SAS plans to run in RAM on MPP DBMS appliances,</strong> exploiting MPI (Message Passing Interface).</li>
</ul>
<p>The whole thing is called SAS HPA (High-Performance Analytics), in an obvious reference to HPC (High-Performance Computing). It will run initially on RAM-heavy appliances from Teradata and EMC Greenplum.</p>
<p>A lot of what&#8217;s going on here is that SAS found it annoyingly difficult to parallelize modeling within the framework of a massively parallel DBMS such as Teradata. Notes on that aspect include:</p>
<ul>
<li><strong>SAS wasn&#8217;t exploiting the capabilities of individual DBMS to their fullest;</strong> rather, it was looking for an approach that would work across multiple brands of DBMS. Thus, for example, the fact that Aster&#8217;s analytic platform architecture is more flexible or powerful than Teradata&#8217;s didn&#8217;t help much with making SAS run within the Aster nCluster database.</li>
<li>Notwithstanding everything else, <strong>SAS did make a certain set of modeling procedures run in-database.</strong></li>
<li><strong>SAS&#8217; previous plans to run in-database modeling in Aster and/or Netezza DBMS may never come to fruition.</strong></li>
</ul>
<p><span id="more-4343"></span>SAS&#8217; problems developing in-database modeling stem from, in essence, the limitations of UDFs (User Defined Functions). So why weren&#8217;t, for example, <a href="../../../../../2009/08/02/teradata-13-focuses-on-advanced-analytic-performance/">Teradata&#8217;s 2009 enhancements to its UDF capabilities</a> enough? The clearest example SAS gave me is that, while <a href="../../../../../2011/03/13/so-how-many-columns-can-a-single-table-have-anyway/">database tables are commonly limited to something on the order of 1000 columns</a> (their figure as well as mine), SAS might need 50-100,000 columns. One reason seems to be interactions between variables; SAS used the word &#8220;multiplied&#8221; a few times, but even so was coy about whether this could simply be regarded as quadratic terms in a regression. Another reason seems to be that in some cases, every value in a column spawns a new column in an intermediate table/array; indeed, this seems to be going on in the previously discussed case of <a href="../../../../../2011/04/06/so-can-logistic-regression-be-parallelized-or-not/">logistic regression</a>.</p>
<p>SAS code will be launched by the DBMS/data warehouse appliances, so potentially it can run under their native workload management. Teradata presumably has enough workload management richness to exploit that; EMC Greenplum, as of my August 2010 notes, probably did not.</p>
<p>SAS was gracious enough to let me post its slide deck, in both <a href="http://www.monash.com/uploads/SAS_HPA_2011-Shorter.pdf">shorter</a> and <a href="http://www.monash.com/uploads/SAS_HPA_2011-Longer.pdf">longer</a> versions. Due to a technical glitch during the call, I neither looked at the slides nor took notes. I think the biggest loss from those difficulties is that I didn&#8217;t learn what the futures at the end of the longer deck were all about.</p>
<p><strong><em>Related links</em></strong></p>
<ul>
<li><a href="http://www.dbms2.com/2011/04/21/application-areas-for-sas-hpa/">Application areas for SAS HPA</a> (April, 2011)</li>
<li><a href="../../../../../2010/05/15/further-clarifying-in-database-mpp-sas/">SAS&#8217; MPP story as of May, 2010</a></li>
<li><a href="../../../../../2007/10/10/sas-goes-mpp-on-teradata-first/">SAS&#8217; plans to run in-database on Teradata</a> (October, 2007)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/04/21/sas-hpa-does-make-sense-after-all/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Unpacking the EMC Greenplum Q1 sales disaster rumors</title>
		<link>http://www.dbms2.com/2011/04/16/unpacking-the-emc-greenplum-q1-sales-disaster-rumors/</link>
		<comments>http://www.dbms2.com/2011/04/16/unpacking-the-emc-greenplum-q1-sales-disaster-rumors/#comments</comments>
		<pubDate>Sat, 16 Apr 2011 19:21:11 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4304</guid>
		<description><![CDATA[A well-connected tipster believes: EMC Greenplum&#8217;s* revenue target for Q1 had been $35 million. Actual EMC Greenplum revenue for Q1 was $3 million, or maybe it was $8 million. EMC Greenplum had 75 sales teams trying to generate this revenue. In the past I might have called Greenplum for clarification, but they&#8217;re not knocking themselves [...]]]></description>
			<content:encoded><![CDATA[<p>A well-connected tipster believes:</p>
<ul>
<li>EMC Greenplum&#8217;s* revenue target for Q1 had been $35 million.</li>
<li>Actual EMC Greenplum revenue for Q1 was $3 million, or maybe it was $8 million.</li>
<li>EMC Greenplum had 75 sales teams trying to generate this revenue.</li>
</ul>
<p>In the past I might have called Greenplum for clarification, but <a href="http://www.dbms2.com/2011/04/05/comments-on-emc-greenplum/">they&#8217;re not knocking themselves out to inform me</a> these days, nor to inspire me with confidence in what they say.  <span id="more-4304"></span></p>
<p><em>*I&#8217;m in the large majority that refers to the EMC Data Computing Division as &#8220;Greenplum&#8221; or &#8220;EMC Greenplum.&#8221;</em></p>
<p>Let&#8217;s unpack that a bit.</p>
<p>First, it makes a huge difference whether we&#8217;re talking about:</p>
<ul>
<li>All EMC sales Greenplum can be said to influence.</li>
<li>All Greenplum software and appliance hardware.</li>
<li>New Greenplum software revenue, plus recognized subscription revenue.</li>
</ul>
<p>Indeed, pre-EMC Greenplum got a considerable fraction of its revenue on a <a href="http://www.dbms2.com/2009/10/18/greenplum-customer-notes/">subscription</a> basis. One implication is that &#8220;license revenue&#8221; and &#8220;new-sale license revenue&#8221; aren&#8217;t the same figure. Another is that the difference in immediate revenue between an appliance sale and a software-only subscription is drastic (8X alone for the difference between quarterly subscription and perpetual license fee, times another factor for the inclusion of hardware).</p>
<p>I&#8217;m also having a bit of trouble swallowing that supposed $35 million target. If we recall that the quota for the sum is always less than the sum of the quotas, we&#8217;re talking about perhaps a $500-600K quota per team. That could be reasonable or even low for a fully productive team that&#8217;s selling hardware and software together (even in a Q1). But if there really are anywhere near that many Greenplum sales teams, then a large majority are really new. And data warehouse appliances (more so than just analytic DBMS) have long sales cycles.</p>
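<p>The back-of-the-envelope arithmetic behind that per-team figure can be sketched in a few lines of Python. The $35 million target and the 75 teams are the tipster&#8217;s rumored numbers; the variable names are mine, and the two per-team quotas are simply the ends of my rough estimate of $500,000 to $600,000.</p>
<pre>
```python
target = 35_000_000  # rumored Q1 revenue target, in dollars
teams = 75           # rumored number of sales teams

# If the target were split evenly, with no over-assignment at all:
even_split = target / teams
print(f"even split per team: ${even_split:,.0f}")

# Individual quotas normally add up to more than the overall target.
for quota in (500_000, 600_000):
    total = quota * teams
    overassignment = total / target - 1
    print(f"quota ${quota:,}: quotas sum to ${total:,} ({overassignment:.0%} over target)")
```
</pre>
<p>An even split comes to about $467,000 per team, so quotas of $500,000 to $600,000 correspond to the individual quotas summing to roughly 7% to 29% more than the overall target.</p>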
<p><strong>Bottom line:</strong> I haven&#8217;t heard anything that suggests <a href="http://www.dbms2.com/2010/10/13/emc-greenplum-data-computing-appliance/">EMC Greenplum&#8217;s storage-vs.-DBMS strategic war</a> is going well. But I also wouldn&#8217;t assume things are quite as grim as rumors suggest.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/04/16/unpacking-the-emc-greenplum-q1-sales-disaster-rumors/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Comments on EMC Greenplum</title>
		<link>http://www.dbms2.com/2011/04/05/comments-on-emc-greenplum/</link>
		<comments>http://www.dbms2.com/2011/04/05/comments-on-emc-greenplum/#comments</comments>
		<pubDate>Wed, 06 Apr 2011 00:57:28 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[SAS Institute]]></category>
		<category><![CDATA[Solid-state memory]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4163</guid>
		<description><![CDATA[I am annoyed with my former friends at Greenplum, who took umbrage at a brief sentence I wrote in October, namely &#8220;eBay has thrown out Greenplum&#8221;. Their reaction included: EMC Greenplum no longer uses my services. EMC Greenplum no longer briefs me. EMC Greenplum reneged on a commitment to fund an effort in the area [...]]]></description>
			<content:encoded><![CDATA[<p>I am annoyed with my former friends at Greenplum, who took umbrage at a brief sentence I wrote in October, namely &#8220;<a href="../../../../../2010/10/06/ebay-followup-greenplum-out-teradata-10-petabytes-hadoop-has-some-value-and-more/">eBay has thrown out Greenplum</a>&#8221;. Their reaction included:</p>
<ul>
<li>EMC Greenplum no longer uses my services.</li>
<li>EMC Greenplum no longer briefs me.</li>
<li>EMC Greenplum reneged on a commitment to fund an effort in the area of privacy.</li>
</ul>
<p>The last one really hurt, because in trusting them, I put in quite a bit of effort, and discussed their promise with quite a few other people.</p>
<p><em><span id="more-4163"></span>Yes, that five-word sentence really seems to have been the problem. I&#8217;ve heard that from more than one source.</em></p>
<p>I think the rest is overwrought too, and not just because I regret the loss of revenue, or of what seemed to be a warm, friendly, hug-laden, and sushi-intensive relationship with Scott Yara and some other folks. At various times, on the subject of its eBay installation:</p>
<ul>
<li>Greenplum overoptimistically told me that eBay&#8217;s Teradata installation would be replaced with Greenplum gear.</li>
<li>Greenplum exaggerated the pace of its eBay installation; unfortunately, I believed them, and later had to publish a <a href="../../../../../2009/03/02/named-customer-silliness/">retraction</a>.</li>
<li>Greenplum neglected to tell me when eBay had its Greenplum equipment removed.</li>
</ul>
<p>Now the same Scott Yara who hovered over me for months in marketing micromanagement before I broke the news of <a href="../../../../../2009/04/30/ebays-two-enormous-data-warehouses/">the Greenplum and Teradata eBay installations</a> &#8212; he could do that because the whole discussion started out under NDA &#8212; doesn&#8217;t answer my email. Evidently, Greenplum thinks it&#8217;s OK to repeatedly be misleading, but doesn&#8217;t think it&#8217;s OK if my nuance is one they disagree with.</p>
<p><em>The most entertaining example I recall of Greenplum BS was when CTO Luke Lonergan told 50+ academics at the 2009 XLDB that Greenplum had 10 customers with half a petabyte each of data. I followed him out of the room and said &#8220;10 customers &#8212; half a petabyte each &#8212; I presume that&#8217;s for sufficiently small values of &#8216;one half&#8217;?&#8221; We eventually settled on a value of &#8220;one half&#8221; in the 0.2 range &#8212; which is actually a pretty impressive claim in itself.</em></p>
<p>Be all that as it may, EMC Greenplum has a couple of press releases out on which I&#8217;ve been asked to comment. One is a deal with <a href="http://www.greenplum.com/news/345/388/SAS-to-offer-high-performance-analytics-on-EMC-Greenplum-database-appliance/d,press-releases/">SAS</a>, less impressive than SAS&#8217; deals with Teradata and Aster Data in that it offers no actual in-database modeling. Yes, it sounds like modeling on the same nodes where the data sits, but it sounds less desirable than true in-database modeling in that:</p>
<ul>
<li>You can only get great performance if the amount of data modeled is small enough to fit into RAM.</li>
<li>Integration with other database processing, MapReduce, etc. may be limited.</li>
</ul>
<p>Also, <a href="http://www.greenplum.com/news/346/388/EMC-Expands-Greenplum-Big-Data-Analytics-Appliance-Family/d,press-releases/">EMC Greenplum expanded its line of appliances</a>, to include one that seems optimized for price-per-terabyte and one with solid-state drives. So far, that&#8217;s very standard stuff. There&#8217;s also a new data loading appliance, which seems to catch up with Aster Data&#8217;s 2008 strategy of having <a href="../../../../../2008/09/05/mpp-data-warehouse-nodes/">separate nodes for bulk loading</a>.</p>
<p><em>Ironically, when <a href="../../../../../2010/10/10/partnering-with-cloudera/">Aster moved away from a total reliance on that strategy,</a> it was becoming more Greenplum-like. As is so often the case, it seems that different vendors&#8217; feature sets are converging.</em></p>
<p>Meanwhile, the last I heard about Greenplum&#8217;s previously very strategic <a href="../../../../../2010/04/12/greenplumchorus/">Chorus</a> effort is that it&#8217;s being revamped. I don&#8217;t get the impression it&#8217;s nearly as central to Greenplum&#8217;s strategy as it used to be.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/04/05/comments-on-emc-greenplum/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Updating our vendor client disclosures</title>
		<link>http://www.dbms2.com/2011/02/28/updating-our-vendor-client-disclosures/</link>
		<comments>http://www.dbms2.com/2011/02/28/updating-our-vendor-client-disclosures/#comments</comments>
		<pubDate>Mon, 28 Feb 2011 08:03:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[About this blog]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[MarkLogic]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Schooner Information Technology]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Tableau Software]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3906</guid>
		<description><![CDATA[From time to time, I disclose our vendor client lists. Another iteration is below. To be clear: This is a list of Monash Advantage members. All our vendor clients are Monash Advantage members, unless &#8230; &#8230; we work with them primarily in their capacity as technology users. (A large fraction of our user clients happen [...]]]></description>
			<content:encoded><![CDATA[<p>From time to time, I <a href="http://www.monashreport.com/2010/01/06/updating-our-disclosures/">disclose</a> our vendor client lists. Another iteration is below. To be clear:</p>
<ul>
<li>This is a list of <a href="http://www.monash.com/advantage.html"><strong><em>Monash Advantage</em></strong></a> members.</li>
<li>All our vendor clients are <strong><em>Monash Advantage</em></strong> members, unless &#8230;</li>
<li>&#8230; we work with them primarily in their capacity as technology users. (A large fraction of our user clients happen to be SaaS vendors.)</li>
<li>We do not usually disclose our user clients.</li>
<li>We do not usually disclose our venture capital clients, nor those who invest in publicly-traded securities.</li>
<li>Included in the list below are two expired <strong><em>Monash Advantage</em></strong> members who haven&#8217;t said they will renew, as mentioned in <a href="http://www.strategicmessaging.com/money-analyst-attention-and-implied-analyst-endorsement/2011/02/28/">my recent post on analyst bias</a>. (You can probably imagine a couple of reasons for that obfuscation.)</li>
</ul>
<p>With that said, our vendor client disclosures at this time are:</p>
<ul>
<li>Aster Data</li>
<li>Cloudera</li>
<li>CodeFutures/dbShards</li>
<li>Couchbase</li>
<li>EMC/Greenplum</li>
<li>Endeca</li>
<li>IBM/Netezza</li>
<li>Infobright</li>
<li>Intel</li>
<li>MarkLogic</li>
<li>ParAccel</li>
<li>QlikTech</li>
<li>salesforce.com/database.com</li>
<li>SAND Technology</li>
<li>SAP/Sybase</li>
<li>Schooner Information Technology</li>
<li>Skytide</li>
<li>Splunk</li>
<li>Teradata</li>
<li>Vertica</li>
</ul>
<p><span id="more-3906"></span>That list includes the two I&#8217;m obfuscating, plus one more who just emailed to say a signed renewal contract is arriving this week. It does not include others who, less concretely, have said they will sign up soon.</p>
<p>Also, I guess there&#8217;s a bit of a gray area for Tableau. As far as I&#8217;m concerned, I&#8217;m doing <a href="http://www.dbms2.com/2011/02/12/upcoming-webinar-on-investigative-analytics/">an upcoming co-sponsored webinar</a> just for <em><strong>Monash Advantage</strong></em> member Aster Data. Indeed, I declined to contract with or bill Tableau directly for its share,  because I had no good way to do that paperwork. But even so, Tableau is a cosponsor, was involved in the planning discussions and, behind the scenes, is surely footing part of the bill.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/02/28/updating-our-vendor-client-disclosures/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Comments on the 2011 Forrester Wave for Enterprise Data Warehouse Platforms</title>
		<link>http://www.dbms2.com/2011/02/11/comments-on-the-2011-forrester-wave-for-enterprise-data-warehouse-platforms/</link>
		<comments>http://www.dbms2.com/2011/02/11/comments-on-the-2011-forrester-wave-for-enterprise-data-warehouse-platforms/#comments</comments>
		<pubDate>Fri, 11 Feb 2011 06:09:58 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3838</guid>
		<description><![CDATA[The Forrester Wave: Enterprise Data Warehouse Platforms, Q1 2011 is now out,* hot on the heels of the Gartner Magic Quadrant. Unfortunately, this particular Forrester Wave is riddled with inaccuracy.  *At the time of this writing, I don&#8217;t have a link to a free version of the full report. At the time of this writing, [...]]]></description>
			<content:encoded><![CDATA[<p>The Forrester Wave: Enterprise Data Warehouse Platforms, Q1 2011 is now out,* hot on the heels of the <a href="../../../../../2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/">Gartner Magic Quadrant</a>. Unfortunately, this particular Forrester Wave is riddled with inaccuracy.  <span id="more-3838"></span></p>
<p><em>*At the time of this writing, I don&#8217;t have a link to a free version of the full report, but the 2011 Forrester Wave for Enterprise Data Warehouse Platforms graphic can be found <a href="http://www.teradata.com/t/News-Releases/2011/Independent-Analyst-Firm-Declares-Teradata-the-Most-Scalable-Flexible-Cloud-Capable-EDW-Solution-in-Todays-Market/">here</a>.</em></p>
<p>One example of the confusion pervading the 2011 Forrester Wave for Enterprise Data Warehouse Platforms lies in a list of three supposed trends.</p>
<ul>
<li>The Forrester Wave somehow<strong> conflates SaaS and MPP processing, </strong>tying them both to the term &#8220;cloud.&#8221; (In reality, the SaaS/cloud and MPP/cloud equations depend on two rather different word-senses for &#8220;cloud&#8221;.)</li>
<li>The Forrester Wave then conflates EDWs, analytic computing systems, and application servers, the latter perhaps because of <a href="../../../../../2009/10/30/aster-data-application-server-ncluster/">the &#8220;data-application server&#8221; product category name Aster Data floated</a>. The Forrester Wave also <strong>conflates investigative analytics with low-latency operational processes</strong> that exploit investigative analytics&#8217; results.</li>
<li>The Forrester Wave then<strong> conflates social media, &#8220;unstructured data&#8221; </strong>(by which it seems at one point to mean text and at another point to also mean logs), <strong>solid-state drives, and a whole bunch of other technologies</strong> (especially but not only low-latency ones) into another supposed single trend.</li>
</ul>
<p>Some of the sillier specific claims in the Forrester Wave for Enterprise Data Warehouse Platforms include:</p>
<ul>
<li><strong>According to the 2011 Forrester Wave for Enterprise Data Warehouse Platforms, Netezza has hybrid row/columnar persistence, </strong>while most other vendors cited don&#8217;t. To recycle an old Larry Ellison joke, somebody obviously has a better pharmacist than I do. It&#8217;s tough to imagine how anybody who understands <a href="../../../../../2011/02/06/columnar-compression-database-storage/">columnar storage</a> could at all believe Netezza currently offers it.</li>
<li><strong>According to the 2011 Forrester Wave for Enterprise Data Warehouse Platforms, EMC/Greenplum is limited in the hardware it supports.</strong> Actually, Greenplum runs on pretty much any commodity Intel hardware, just like any other software-only DBMS does.</li>
<li><strong>According to the 2011 Forrester Wave for Enterprise Data Warehouse Platforms, Teradata, Sybase, and others are differentiated in their Hadoop support.</strong> Actually, Hadoop support of various forms is a checkmark item for analytic DBMS vendors.</li>
<li><strong>According to the 2011 Forrester Wave for Enterprise Data Warehouse Platforms, Oracle, Teradata, and others are differentiated in their cloud/SaaS support.</strong> Actually, having some kind of public cloud offering is a checkmark item; use of same is quite a different matter.</li>
<li><strong>The 2011 Forrester Wave for Enterprise Data Warehouse Platforms calls out EMC Greenplum for special praise in mixed workload management.</strong> <a href="../../../../../2010/08/09/emc-greenplum/">Greenplum will probably be fine in concurrency and workload management</a>, but implying it&#8217;s a leader is overstated.</li>
<li><strong>According to the 2011 Forrester Wave for Enterprise Data Warehouse Platforms, Vertica has not made a significant investment in real-time technologies</strong> (despite doing a lot of work with StreamBase and selling a lot into the algorithmic trading market). I disagree.</li>
<li>Also <strong>according to the 2011 Forrester Wave for Enterprise Data Warehouse Platforms, Vertica has not made a significant investment in in-memory technology,</strong> despite the fact that all its updates pass through Vertica&#8217;s in-memory, query-responsive &#8220;Write-Optimized Store.&#8221; I disagree.</li>
</ul>
<p>Even leaving aside the errors that obviously riddled the Forrester Wave for Enterprise Data Warehouse Platforms&#8217; underlying 56-row matrix, I dispute the whole premise of the exercise. I&#8217;m not a big fan of overarching scorecard-based rankings, because the right choice of product varies so much by use case. For example:</p>
<ul>
<li>If you&#8217;re a smallish enterprise that can realistically do OLTP and data warehousing on the same instance of your DBMS, Oracle and Microsoft blow away everybody else mentioned.</li>
<li>If <a href="../../../../../2011/02/06/columnar-compression-database-storage/">columnar compression</a> methods work really well for your use case, Vertica or maybe Oracle Exadata might shine.</li>
<li>If you typically only retrieve a few columns from a wide table, so that columnar I/O is what you care most about, Vertica, Sybase, or even EMC Greenplum might shine. (The decidedly non-columnar Netezza and Oracle Exadata approaches to predicate pushdown might or might not excel as well.)</li>
<li>If your database is above a certain size, some of the alternatives (such as Sybase IQ or non-Exadata Oracle) should be taken off the table.</li>
<li>If you have a highly concurrent mixed workload, nobody else is as proven as Teradata.</li>
<li>If you don&#8217;t want to invest much in database administration, Oracle is about the last vendor you should consider, and Netezza might be the first.</li>
</ul>
<p>More excusable is some terminological confusion in the Forrester Wave for Enterprise Data Warehouse Platforms, the essence of which is this:</p>
<p>Notwithstanding its name, <strong>the Forrester Wave for Enterprise Data Warehouse Platforms isn&#8217;t just talking about what are called enterprise data warehouses (EDWs), but rather a broader range of analytic database management systems and use cases</strong>. These include:</p>
<ul>
<li>What are classically called operational data stores (the focus on &#8220;Next-Best Actions&#8221; suggests those are included).</li>
<li><a href="../../../../../2011/01/24/analytic-computing-system/">Analytic platforms/analytic computing systems</a> (the high-level mentions of MapReduce, predictive modeling integration, and so on suggest they&#8217;re in too).</li>
<li>Reporting data marts (some of the vendors cited might not make the minimum count threshold unless those are included too).</li>
</ul>
<p>Indeed, the definition provided of &#8220;EDW&#8221; basically boils down to &#8220;runs SQL, is tuned in some way for analytics, has a cost-based or other query optimizer, and isn&#8217;t tied to a specific application.&#8221;</p>
<p>Frankly, I think <a href="../../../../../2011/01/24/do-we-still-need-edws/">classical EDWs have their problems</a>, and are not necessarily the best way to address the numerous <a href="../../../../../2011/01/03/the-six-useful-things-you-can-do-with-analytic-technology/">use cases for analytic DBMS technology</a>. And <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">product category names are commonly problematic</a> anyhow. So I don&#8217;t much mind this overloading of the EDW term. But in one respect I think the Forrester Wave overdoes its inclusiveness &#8212; it includes things that aren&#8217;t actually DBMS, and then marks down just about every product cited for being a real DBMS rather than some sort of above-DBMS layer, at least when those things are sold by SAP. I&#8217;ve never agreed with the idea that SAP&#8217;s BW/BWA products should be included in a comparison with the other products cited in the Forrester Wave at all, and SAP HANA doesn&#8217;t change my mind.</p>
<p>One last thing &#8212; I&#8217;m suspicious of the Forrester Wave for Enterprise Data Warehouse Platforms&#8217; comments on <strong>data warehouse appliance prices.</strong> However, they are hard to judge without knowing whether Forrester was using the term &#8220;raw data&#8221; in its usual sense, or actually means &#8220;user data&#8221;, and also without knowing whether Forrester is talking about list or &#8220;street&#8221; pricing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/02/11/comments-on-the-2011-forrester-wave-for-enterprise-data-warehouse-platforms/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

