<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Data warehouse appliances</title>
	<atom:link href="http://www.dbms2.com/category/database-management-system/data-warehouse-appliances/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 22:51:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Comments on the analytic DBMS industry and Gartner&#8217;s Magic Quadrant for same</title>
		<link>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/</link>
		<comments>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 17:17:32 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Exasol]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[illuminate Solutions]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5926</guid>
		<description><![CDATA[This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the 2010, 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying: In general, I regard Gartner Magic [...]]]></description>
			<content:encoded><![CDATA[<p>This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the <a href="http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/">2010</a>, <a href="../../../../../2010/02/10/gartner-magic-quadrant-data-warehouse-2009-2010/">2009</a>, <a href="../../../../../2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">2008</a>, <a href="../../../../../2007/10/19/gartner-2007-magic-quadrant-for-data-warehouse-database-management-systems/">2007</a>, and <a href="../../../../../2006/10/03/vendor-segmentation-for-data-warehouse-dbms/">2006</a> Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying:</p>
<ul>
<li>In general, I regard Gartner Magic Quadrants as a bad use of good research.</li>
<li>Illustrating the uselessness of &#8212; or at least poor execution on &#8212; the  overall quadrant metaphor, a large majority of the vendors covered are  lined up near the line x = y, each outpacing the one below in both of  the quadrant&#8217;s dimensions.</li>
<li>I find fewer specifics to disagree with in this Gartner Magic Quadrant than in previous years&#8217; versions. Two factors jump to mind as possible reasons:
<ul>
<li>This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is somewhat less ambitious than others; while it gives as much company detail as its predecessors, it doesn&#8217;t add as much discussion of overall trends. So there&#8217;s less to (potentially) disagree with.</li>
<li><a href="http://www.dbms2.com/2010/12/28/evolving-definitions-and-technology-categories-for-2011/">Merv Adrian is now at Gartner</a>.</li>
</ul>
</li>
<li>Whatever the problems may be with Gartner&#8217;s approach, the whole thing comes out better than do <a href="http://www.dbms2.com/2011/02/11/comments-on-the-2011-forrester-wave-for-enterprise-data-warehouse-platforms/">Forrester&#8217;s failed imitations</a>.</li>
</ul>
<p><em>*At the time of this posting, I don&#8217;t yet have a link. However, I expect that to change quickly, and I plan to edit this paragraph accordingly. If nothing else, I hope people will drop links into the comment thread. </em></p>
<p>Specific company comments, roughly in the order of Gartner&#8217;s approximately single-dimensional ranking, include: <span id="more-5926"></span></p>
<ul>
<li>The Gartner Magic Quadrant&#8217;s comments on Teradata seem pretty fair. I don&#8217;t think I&#8217;m much in disagreement when I say:
<ul>
<li>Teradata has the richest, most mature analytic DBMS offering.</li>
<li>Teradata has an outstanding track record both for <a href="http://www.dbms2.com/2011/09/24/confusion-about-teradatas-big-customers/">managing large data volumes</a> and for high-concurrency mixed workloads.</li>
<li>Aster Data was a cool Teradata acquisition, even if Teradata/Aster synergies or integration have been nominal to date.</li>
<li>Teradata still needs to get out of its own way in marketing, positioning, packaging, and/or defining its premium-priced system vs. its more moderately-priced alternatives. Indeed, as necessary as this approach may have been to fending off encroachments by Netezza and others, what Teradata really needs to do is evolve to a more pick-your-own-node-combination mix-match kind of offering.</li>
</ul>
</li>
<li>Gartner has talked with a lot of Oracle Exadata users who say that the product works; Gartner has also stopped beating Oracle up for <a href="http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/">its previous policy of almost never doing onsite POCs (Proofs of Concept)</a>; both parts of that ring true with me. But Gartner also rightly dings Oracle for various issues in cost and cumbersomeness. Overall, while I agree there are organizations for which Oracle should indeed be a top-ranked choice, there are many others who shouldn&#8217;t put Oracle on their short list.</li>
<li>Third in the Gartner MQ rankings is IBM.
<ul>
<li>Gartner gets so caught up in reciting the names of various IBM product offerings that it neglects to say much good about DB2 itself. (I tend to have a similar problem.)</li>
<li>But Gartner does mention concurrency as a strength. I agree, especially if we presume that that was a reference to DB2 rather than Netezza.</li>
<li>Gartner cites Netezza&#8217;s post-acquisition annual growth rate as 30%. Gartner seems to think this is a good number. I disagree, but in Netezza&#8217;s defense, it has had to endure IBM&#8217;s post-acquisition on-boarding process.</li>
</ul>
</li>
<li>Arguably fourth in the Gartner Data Warehouse Magic Quadrant rankings is EMC/Greenplum.
<ul>
<li>In general, Gartner likes the taste of Greenplum Kool-Aid.</li>
<li>Gartner neglects to ding Greenplum for concurrency challenges, which I view as an oversight given Gartner&#8217;s general stress on that area.</li>
<li>Gartner does ding Greenplum for support challenges.</li>
<li>Gartner neglects to praise Greenplum for true <a href="http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/">hybrid row/columnar data management</a>, a feature shared by <a href="http://www.dbms2.com/2011/09/22/teradata-columnar-compression/">Teradata</a> and <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/">Vertica</a>, among others, but not by <a href="http://www.dbms2.com/2011/02/06/columnar-compression-database-storage/">Oracle</a>, DB2, or Netezza.</li>
<li>Gartner located a half-petabyte Greenplum database. This doesn&#8217;t surprise me, even though Greenplum has frequently made exaggerated claims about large-size database successes in the past.</li>
<li>Gartner reports a &gt;400 figure for Greenplum customers, which is plausible.</li>
</ul>
</li>
<li>In its first deviation from strict one-dimensional rank ordering, the Gartner Magic Quadrant ranks Sybase ahead of Greenplum in completeness of vision but behind in &#8220;ability to execute&#8221;.
<ul>
<li>If that were the other way around, it might make more sense. Greenplum promises anything and everything you might ever want for analytic data management or the associated analysis; but Sybase has vastly more analytic DBMS users than Greenplum does, running a variety of demanding workloads.</li>
<li>Gartner appears to think that Sybase IQ requires less database administration than I think it does.</li>
<li>Gartner seems concerned that SAP will position HANA and Sybase ASE as, between them, the only DBMS you&#8217;ll ever need, casting doubt on Sybase IQ&#8217;s future. I wouldn&#8217;t worry about that if you have a problem you want to solve today.</li>
</ul>
</li>
<li>The Gartner Magic Quadrant for Data Warehouse Database Management Systems ranks Microsoft sixth overall, despite noting that there isn&#8217;t a single production reference for Microsoft&#8217;s Parallel Data Warehouse. In support of this ranking, it cites, for example, the compression feature, which distinguishes Microsoft SQL Server from no other product on the list except Kognitio. If you have such an undemanding data warehousing problem that many different analytic DBMS could meet your needs, there&#8217;s a good chance Microsoft SQL Server can also do the job; and if you&#8217;ve bought into the Microsoft technology stack, you might as well keep going down that path. Otherwise, I don&#8217;t know why somebody should adopt Microsoft&#8217;s offering at this time.</li>
<li>Seventh along the main diagonal path in the Gartner Magic Quadrant is HP Vertica. I&#8217;d rank Vertica higher than that, but in fairness I note two execution concerns. First, HP has a lousy track record, both in acquisitions and in data warehousing/analytics. Second, Vertica is bad about answering my email. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Anyhow, Gartner doesn&#8217;t seem to have given Vertica credit either for <a href="http://www.dbms2.com/2011/06/20/columnar-dbms-vendor-customer-metrics/">its full customer count or for the multiple petabyte-scale databases Vertica runs</a>.</li>
<li>1010data is an outlier, with Gartner noting that it only partly fits in with other &#8220;Data Warehousing Database Management&#8221; companies, and hence more or less confessing that including 1010data in the Magic Quadrant is somewhat arbitrary. Stuff like that is bound to happen, given <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">the inherent difficulties of defining market categories</a>. Anyhow, my thoughts on 1010data include:
<ul>
<li>I&#8217;m nervous about the fact that 1010data doesn&#8217;t actually control its own DBMS technology, but rather relies on old code from the small private company KX Systems.</li>
</ul>
<ul>
<li> There are three main reasons to consider 1010data:
<ul>
<li>You want to enter the data mart outsourcing business in a casual way, and you like its SaaS offering.</li>
<li>You want to engage in <a href="http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/">stakeholder-facing analytics</a> in a casual way, and you like its SaaS offering.</li>
<li>You love 1010data&#8217;s particular set of interactive analytic features and performance.</li>
</ul>
</li>
</ul>
</li>
<li>Back to the main path winding along the Gartner Magic Quadrant main diagonal &#8212; next up is ParAccel. While I question some of the peripheral comments, I agree with Gartner&#8217;s core messages that:
<ul>
<li>ParAccel, the product, is blazingly fast in certain use cases.</li>
<li>ParAccel, the company, is dangerously small.</li>
</ul>
</li>
<li>Eighth on the Gartner MQ&#8217;s main path is Kognitio. This is too high. Kognitio positions itself as offering in-memory DBMS, yet stubbornly refuses to do any kind of data compression. That&#8217;s an awful combination of choices. As for using Kognitio&#8217;s data warehousing SaaS offering &#8212; why would you do that, when more modern products are available on a SaaS/cloud basis as well?</li>
<li>Ninth in the Gartner Magic Quadrant main rankings is SAND.
<ul>
<li>The SAND section is not a triumph of Gartner accuracy. For example:
<ul>
<li><a href="http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/">Gartner completely missed the errors in SAND&#8217;s reported customer counts</a>.</li>
<li>Gartner refers to SAND as being &#8220;in existence for approximately nine years&#8221;, which is too low by at least a factor of 2.</li>
<li>Gartner says &#8220;SAND is a privately held company&#8221;, even though <a href="http://itmarketstrategy.com/2009/06/07/sand-technology-a-risky-bet/">Merv knows better than that</a>.</li>
</ul>
</li>
<li>Otherwise, Gartner&#8217;s opinion on SAND seems to boil down to &#8220;Interesting technology and ideas, but dangerously small company.&#8221; I agree.</li>
</ul>
</li>
<li>Tenth and too low in the Gartner MQ main rankings is Infobright.
<ul>
<li>At least by some metrics (e.g. customer count), Infobright isn&#8217;t as dangerously small as ParAccel, SAND, Kognitio, et al.</li>
<li>That said, Infobright is small and focused on <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>. So I wouldn&#8217;t be confident in Infobright&#8217;s future technology path for human-generated data use cases.</li>
<li>Infobright&#8217;s performance is uneven &#8212; blazing in cases where the Knowledge Grid helps, but not necessarily stellar by analytic DBMS standards when full table scans are called for.</li>
<li>I agree with Gartner that the possibility of Oracle/MySQL future shenanigans is a concern. But while the energy behind MySQL forking efforts doesn&#8217;t seem too great right now, I&#8217;d expect them to revive and offer a successful escape path if it seemed Oracle was indeed going to play hardball.</li>
<li>Also, given that it&#8217;s already an open source vendor, there are various kinds of assurances Infobright could give that would also help alleviate customer concerns.</li>
</ul>
</li>
<li>Actian, formerly Ingres, took a big tumble in Gartner&#8217;s rankings versus last year, when I simply wrote &#8220;<a href="http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/">What Gartner said in connection with <strong>Ingres</strong> is too inaccurate to deserve detailed attention</a>.&#8221; I&#8217;m even a little harsher about <a href="http://www.dbms2.com/2011/09/25/ingres-actian/">Ingres/Actian&#8217;s DBMS products and prospects</a> than Gartner is, but at least now we&#8217;re in the same ballpark.</li>
<li>Along with Infobright, ParAccel, and SAND, <a href="http://www.dbms2.com/2011/11/12/exasol-update/">Exasol</a> appears to be another of the &#8220;good columnar technology/small company&#8221; crowd. As with other such products, one should be careful about fit-and-finish features that are missing today, as there is no assurance they&#8217;ll be added in a timely manner going forward.</li>
<li>illuminate Solutions, which was on last year&#8217;s Gartner list, <a href="http://www.dbms2.com/2012/01/16/has-illuminate-solutions-joined-the-choir-invisible/">now appears to be an ex-company</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A couple of links explaining Cloudera Manager</title>
		<link>http://www.dbms2.com/2012/01/10/a-couple-of-links-explaining-cloudera-manager/</link>
		<comments>http://www.dbms2.com/2012/01/10/a-couple-of-links-explaining-cloudera-manager/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 22:23:22 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5798</guid>
		<description><![CDATA[Predictably, I wasn&#8217;t pre-briefed on the details of Oracle&#8217;s Big Data Appliance announcement today, and an inquiry to partner Cloudera doesn&#8217;t happen to have been immediately answered.* But anyhow, it&#8217;s clear from coverage by Larry Dignan and Derrick Harris that Oracle&#8217;s Big Data Appliance includes: Some version of Cloudera Manager (I&#8217;m guessing more or less [...]]]></description>
			<content:encoded><![CDATA[<p>Predictably, I wasn&#8217;t pre-briefed on the details of Oracle&#8217;s Big Data Appliance announcement today, and an inquiry to partner Cloudera doesn&#8217;t happen to have been immediately answered.* But anyhow, it&#8217;s clear from coverage by <a href="http://www.zdnet.com/blog/btl/oracle-rolls-out-big-data-play-with-aggressive-price-cloudera/66529">Larry Dignan</a> and <a href="http://gigaom.com/cloud/cloudera-brings-the-hadoop-to-oracles-big-data-appliance/">Derrick Harris</a> that Oracle&#8217;s Big Data Appliance includes:</p>
<ul>
<li>Some version of Cloudera Manager (I&#8217;m guessing more or less the best one).*</li>
<li>Some version of Apache Hadoop (I&#8217;m guessing the same distribution that Cloudera prefers to use).*</li>
<li>Some kind of support.</li>
</ul>
<p>In other words, it&#8217;s a lot like getting Cloudera Enterprise,* plus some hardware, plus some other stuff.</p>
<p><em>*Edit: About 2 minutes after I posted this, I got email from Cloudera CEO Mike Olson. Yes, the Oracle Big Data Appliance bundles Cloudera Enterprise.</em></p>
<p>That raises a recurring question: <strong>What exactly is Cloudera Manager?</strong> <span id="more-5798"></span>When asked, I&#8217;ve always tended to mumble something like: <strong>Um, it&#8217;s management stuff. </strong>There&#8217;s an overview on <a href="http://www.cloudera.com/products-services/tools/">the Cloudera Manager product page</a>, but it doesn&#8217;t really say much, even if you click on the Data Sheet link. More helpful, I think, is <a href="http://www.cloudera.com/blog/2011/12/cloudera-manager-3-7-released/">a December post on Cloudera&#8217;s busy blog</a>. Technically, the post is about the new features in the Cloudera Manager 3.7 point release, but more generally it helps to explain what Cloudera Manager does, in areas such as (and these bullet points are all direct quotes):</p>
<ul>
<li> Automated Hadoop Deployment</li>
<li> Centralized Management</li>
<li> Configuration Management</li>
<li> Service Monitoring</li>
<li> Log Search</li>
<li> Events and Alerts</li>
<li> Configuration versioning and Audit trails</li>
<li> Activity Monitoring</li>
<li> Operational Reports</li>
</ul>
<p>Taken together,<strong> those two Cloudera links do a pretty good job of explaining Cloudera Manager, and illustrating why a Hadoop user would want to have either Cloudera Manager or a similar competitive offering.</strong></p>
<p><em>Edit: The day after I originally made this post, Cloudera put up another post <a href="http://www.cloudera.com/blog/2012/01/cloudera-manager-thank-you-customers/">directly explaining what Cloudera Manager is about</a>.<br />
</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/10/a-couple-of-links-explaining-cloudera-manager/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Some big-vendor execution questions, and why they matter</title>
		<link>http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/</link>
		<comments>http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 11:01:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cognos]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5704</guid>
		<description><![CDATA[When I drafted a list of key analytics-sector issues in honor of look-ahead season, the first item was &#8220;execution of various big vendors&#8217; ambitious initiatives&#8221;.  By &#8220;execute&#8221; I mean mainly: &#8220;Deliver products that really meet customers&#8217; desires and needs.&#8221; &#8220;Successfully convince them that you&#8217;re doing so &#8230;&#8221; &#8220;&#8230; at an attractive overall cost.&#8221; Vendors mentioned [...]]]></description>
			<content:encoded><![CDATA[<p>When I drafted a list of key analytics-sector issues in honor of <a href="http://www.dbms2.com/2011/11/21/analytic-trends-in-2012-qa/">look-ahead season</a>, the first item was &#8220;execution of various big vendors&#8217; ambitious initiatives&#8221;.  By &#8220;execute&#8221; I mean mainly:</p>
<ul>
<li>&#8220;Deliver products that really meet customers&#8217; desires and needs.&#8221;</li>
<li> &#8220;Successfully convince them that you&#8217;re doing so &#8230;&#8221;</li>
<li>&#8220;&#8230; at an attractive overall cost.&#8221;</li>
</ul>
<p>Vendors mentioned here are Oracle, SAP, HP, and IBM. Anybody smaller got left out due to the length of this post. Among the bigger omissions were:</p>
<ul>
<li>salesforce.com (multiple subjects).</li>
<li><a href="../../../../../2011/04/21/sas-hpa-does-make-sense-after-all/">SAS HPA</a>.</li>
<li><a href="../../../../../2011/08/21/hadoop-evolution/">The evolution of Hadoop</a>.</li>
</ul>
<p><span id="more-5704"></span><strong>A (lingering) issue for SAP and Oracle alike</strong></p>
<p>As I noted in January of this year, <a href="../../../../../2011/01/03/the-six-useful-things-you-can-do-with-analytic-technology/">integration of business intelligence into operational apps is making very slow progress</a>. Even so, it&#8217;s a huge part of the apparent strategy at SAP and Oracle alike, as well it should be. Much of the benefit from automating routine desk work has already happened. The areas ripest for exploitation are the ones where analytics are part of the equation.</p>
<p>Given the lack of tangible progress, why do I think this is a genuine area of Oracle and SAP emphasis? Three reasons of many are:</p>
<ul>
<li>Why else did SAP buy Business Objects?</li>
<li>If they&#8217;re not trying to <a href="../../../../../2011/03/30/short-request-and-analytic-processing/">integrate operational apps and analytics</a>, why else does SAP&#8217;s emphasis on HANA make sense?</li>
<li>Without business intelligence in the picture, how does Oracle&#8217;s integrated-stack story promise any direct user benefits?*</li>
</ul>
<p><em>*As opposed to IT concerns &#8212; integration, administration, TCO (Total Cost of Ownership), etc.</em></p>
<p>After so many years of disappointment, I&#8217;m not going to forecast 2012 as a pivotal year for <strong>the integration of business intelligence into operational applications.</strong> But if one of SAP or Oracle ever does get a significant BI/operational app integration advantage over the other, it could be a major competitive advantage in those application market segments that are still up for grabs. It also is an opportunity for both vendors to gain BI market share in their respective application customer bases.</p>
<p><strong>A more urgent issue for SAP</strong></p>
<p>SAP has put huge amounts of credibility on the line for HANA, the integration of two different and not particularly mature in-memory database technologies. So far, it is difficult to find evidence that HANA is robust enough for widespread adoption. Whether or not SAP can fix that is a huge open question, which could have significant impact on the course of several technology areas: applications, business intelligence, in-memory DBMS, and maybe even hardware.</p>
<p>Based on current information, which is admittedly partial, I&#8217;m a short-term pessimist on HANA. Longer-term, I&#8217;m on record as saying that <a href="../../../../../2011/05/23/databases-ram/">traditional databases will eventually wind up in RAM</a>. SAP will surely get that technology right some day, whether or not the way it does so has anything to do with present-day HANA code.</p>
<p><strong>Four more issues for Oracle </strong></p>
<p>Oracle&#8217;s ambitions are near-endless, and so, therefore, is its list of execution challenges. Four in the analytics area that I find particularly interesting are:</p>
<ul>
<li><strong>True hybrid columnar DBMS.</strong> <a href="../../../../../2011/09/22/teradata-columnar-compression/">I was guessing that Oracle, like Teradata, would announce true hybrid columnar the week of Oracle OpenWorld</a>. I was wrong. But if Oracle can&#8217;t bring out true hybrid columnar DBMS functionality relatively soon, Exadata will lose credibility as a competitor to more specialized analytic DBMS.</li>
<li><strong>Oracle Exalytics.</strong> With Exalytics in the mix, Oracle&#8217;s technology stack has HANA-like potential. But will Exalytics even ship in 2012? (I think so.) Will it be good for much in the first release? (I&#8217;m skeptical.)</li>
<li><strong>Oracle&#8217;s Big Data Appliance</strong>. I&#8217;m skeptical both about <a href="../../../../../2011/10/20/more-notes-on-oracle-nosql/">Oracle&#8217;s NoSQL product</a> &#8212; <a href="http://www.infoworld.com/d/data-explosion/first-look-oracle-nosql-database-179107">a favorable InfoWorld review</a> notwithstanding &#8212; and <a href="../../../../../2011/09/23/hadoop-appliances/">Hadoop appliances</a>. But if I&#8217;m wrong, and Oracle can successfully embrace/extend the new non-relational paradigms, then it really might regain control over the evolution of data management.</li>
<li><strong><a href="../../../../../2011/10/18/oracle-is-buying-endeca/">Oracle&#8217;s Endeca acquisition</a></strong> &#8212; will Oracle prove me wrong and integrate Endeca effectively into its overall analytic product line? If it does, we might finally see effective text (and eventually speech) navigation of enterprise software. (But as with all Oracle issues cited here, this is something that probably won&#8217;t amount to much in 2012 even if it does later go well.)</li>
</ul>
<p><strong>Three issues for IBM</strong></p>
<p>Like Oracle, IBM is a huge company with many ambitions and hence many execution challenges. The biggest of those is surely: <strong>How effective can IBM be at selling outside its existing customer base?</strong> I don&#8217;t hear as much competitively about IBM DataStage, IBM SPSS or now IBM Netezza as I did when their vendors were independent companies. Even Cognos may not be much of an exception to the rule, although it has its own large customer base outside of IBM&#8217;s traditional one. (To lesser extents, the same is of course true of Netezza and numerous other IBM acquisitions.)</p>
<p>Another general issue for IBM is <strong>substantively integrating its various product lines,</strong> at least to the extent that makes sense. DB2/Netezza integration sounds good, but even that is a matter more of product marketing (the admirable part of that discipline) than of actual technology. Other integrations (e.g. Cognos/DB2 in various bundles) have tended toward the dubious side.*</p>
<p><em>*I&#8217;m still waiting for IBM to get back to me with examples of how Cognos/DB2 joint tuning amounts to anything. It&#8217;s been more than a year, so I&#8217;m glad I didn&#8217;t hold my breath.</em></p>
<p>In a somewhat narrower vein, I wonder: <strong><a href="../../../../../2011/11/10/cep-streaming-catchup/">Will IBM be able to gain traction for InfoSphere Streams</a>? </strong>And if so, when and where will the traction be?</p>
<p><strong>Will HP screw up Vertica?</strong></p>
<p>Vertica has a very attractive product offering. It&#8217;s perhaps <a href="../../../../../2011/06/20/columnar-dbms-vendor-customer-metrics/">the most scalable analytic DBMS outside of Teradata</a>, running on the hardware of your reasonable choice.  It&#8217;s also the one I recommend most often to clients in the 1-50 terabyte range.</p>
<p>So far HP doesn&#8217;t seem to have done much to leadfoot Vertica. (About all I&#8217;ve heard from competitors is that Vertica seems to have faded somewhat in the financial services market, and there could be multiple explanations if that is indeed true.) But if HP Vertica does somehow manage to botch things, opportunities will open up for a range of columnar analytic DBMS competitors.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Analytic trends in 2012: Q&amp;A</title>
		<link>http://www.dbms2.com/2011/11/21/analytic-trends-in-2012-qa/</link>
		<comments>http://www.dbms2.com/2011/11/21/analytic-trends-in-2012-qa/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 11:00:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Tableau Software]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5692</guid>
		<description><![CDATA[As a new year approaches, it&#8217;s the season for lists, forecasts and general look-ahead. Press interviews of that nature have already begun. And so I&#8217;m working on a trilogy of related posts, all based on an inquiry about hot analytic trends for 2012. This post is a moderately edited form of an actual interview. Two [...]]]></description>
			<content:encoded><![CDATA[<p>As a new year approaches, it&#8217;s the season for lists, forecasts and general look-ahead. Press interviews of that nature have already begun. And so I&#8217;m working on a trilogy of related posts, all based on an inquiry about hot analytic trends for 2012.</p>
<p>This post is a moderately edited form of an actual interview. Two other posts cover analytic trends to watch (planned) and <a href="http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/">analytic vendor execution challenges to watch</a> (already up).</p>
<p><span id="more-5692"></span><strong>Question</strong>: What do you think will happen next year with the Tableaus of the world?</p>
<p><strong>Answer:</strong></p>
<ul>
<li>I think adoption of flexible-visualization business intelligence tools will continue to be rapid.</li>
<li>I think enterprise-friendly features will be increasingly important as a basis of competition.</li>
</ul>
<p><strong>Question</strong>: What do you mean by &#8220;enterprise-friendly&#8221;?</p>
<p><strong>Answer</strong>: An example would be <a href="http://www.dbms2.com/2011/11/16/qlikview-collaborative-business-intelligence/">QlikTech no longer forcing you to use their native ETL</a>, but rather working with Informatica and soon other third-party products. Also important can be:</p>
<ul>
<li>Database size.</li>
<li>Concurrency.</li>
<li>A full-featured development cycle for analytic applications.</li>
</ul>
<p><strong>Question</strong>: What does HP have to do to be relevant in analytics/data warehousing?</p>
<p><strong>Answer</strong>: Avoid stupidity. HP Vertica is already relevant.</p>
<p><strong>Question</strong>: OK. But what can HP do to build on Vertica?</p>
<p><strong>Answer</strong>: HP &#8212; which botched Exadata 1 hardware &#8212; could do a good job with SAP HANA or other kinds of appliance products.</p>
<p>However:</p>
<ul>
<li>I don&#8217;t think trying to force Vertica beyond its natural growth &#8212; <a href="http://www.dbms2.com/2011/04/16/unpacking-the-emc-greenplum-q1-sales-disaster-rumors/">the way EMC is with Greenplum</a> &#8212; is necessarily a good idea. Natural growth in Vertica&#8217;s case is plenty fast anyway.</li>
<li>Obviously, making good Vertica hardware would be nice. But being hardware-independent is crucial to Vertica, not least because of cloud deployment, an option many buyers want to at least have in their hip pockets.</li>
</ul>
<p><strong>Question</strong>: You expressed some skepticism toward mobile BI/use cases. Why so?</p>
<p><strong>Answer</strong>: The form factor hurts functionality a lot, so it&#8217;s only worthwhile in cases where timeliness is key.</p>
<p>And without more refined alert-setting functionality, it&#8217;s hard to think of that many cases.</p>
<p><em>Note: My views on mobile BI haven&#8217;t changed much since <a href="../../../../../2010/07/15/mobile-business-intelligence/">July, 2010</a>.</em></p>
<p><strong>Question</strong>: What about the idea of an enterprise being able to pay-per-drink to run jobs on an analytic cluster? Do you expect that concept to have any legs in 2012?</p>
<p><strong>Answer</strong>: While other kinds of SaaS (Software as a Service) BI might make sense, remote computing BI that focuses on hardware cost sharing is problematic. Moving data in and out of the cluster is a big part of the overall cost, at least if you plan to process it only occasionally once it gets there. I haven&#8217;t seen a plan yet that gets around that point.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/21/analytic-trends-in-2012-qa/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Some notes on Hadoop (mainly) and appliances</title>
		<link>http://www.dbms2.com/2011/09/23/hadoop-appliances/</link>
		<comments>http://www.dbms2.com/2011/09/23/hadoop-appliances/#comments</comments>
		<pubDate>Fri, 23 Sep 2011 19:59:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapR]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[eBay]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5341</guid>
		<description><![CDATA[1. EMC Greenplum has evolved its appliance product line. As I read that, the latest announcement boils down to saying that you can neatly network together various Greenplum appliances in quarter-rack increments. If you take a quarter rack each of four different things, then Greenplum says &#8220;Hooray! Our appliance is all-in-one!&#8221; Big whoop. 2. That [...]]]></description>
			<content:encoded><![CDATA[<p>1. <a href="http://www.greenplum.com/products/greenplum-dca">EMC Greenplum has evolved its appliance product line</a>. As I read that, the latest announcement boils down to saying that you can neatly network together various Greenplum appliances in quarter-rack increments. If you take a quarter rack each of four different things, then Greenplum says &#8220;Hooray! Our appliance is all-in-one!&#8221; Big whoop.</p>
<p>2. That said, the Hadoop part of EMC&#8217;s story is based on MapR, which so far as I can tell is actually a pretty good Hadoop implementation. More precisely, MapR makes strong claims about performance and so on, and Apache Hadoop folks don&#8217;t reply &#8220;MapR is full of &amp;#$!&#8221; Rather, they say &#8220;We&#8217;re going to close the gap with MapR a lot faster than the MapR folks like to think &#8212; and by the way, guys, thanks for the butt-kick.&#8221; A lot more precision about MapR may be found in this <a href="http://www.slideshare.net/mcsrivas/design-scale-and-performance-of-maprs-distribution-for-hadoop">M. C. Srivas SlideShare</a>.</p>
<p>3. On its latest earnings call, Oracle clearly <a href="http://seekingalpha.com/article/294885-oracle-s-ceo-discusses-q1-2012-results-earnings-call-transcript?part=qanda">said it would introduce a Hadoop appliance</a>, versus just <a href="../../../../../2011/06/24/forthcoming-oracle-appliances/">hinting at a Hadoop appliance</a> the prior quarter. The money quote was:  <span id="more-5341"></span></p>
<blockquote><p>Finally, big data or the searching of large amounts of data using Hadoop. After Hadoop finishes filtering the data, the place you want to put that data is an Oracle Database, and that&#8217;s what a lot of our customers are doing. And we are exploiting the trend, the big data technology and the big data trend, if you prefer, by building a Hadoop appliance that attaches to the Oracle Exadata database or any Oracle Database for that matter. But you don&#8217;t have to buy our Hadoop appliance if you can use whatever servers you want running Hadoop, and we provide the interface between Hadoop and the Oracle Database.</p></blockquote>
<p>In other words, Oracle is saying &#8220;We&#8217;d like to sell you a Hadoop appliance, but you can run Hadoop in some other way and we&#8217;ll coexist with it just fine.&#8221; That makes sense; refusing to coexist with Hadoop is not exactly a realistic option.</p>
<p>4. Back in June, I expressed <a href="../../../../../2011/06/02/why-you-would-want-an-appliance-and-when-you-wouldnt/">great skepticism about the idea of a Hadoop appliance</a>. There was at least partial pushback in the comment thread from both Amr Awadallah and Eric Baldeschwieler. Oops.</p>
<p>Their reasoning seems to be centered around matters of installation, administration, and general packaging.</p>
<p>5. A month ago I noted aggressive near-term plans for <a href="../../../../../2011/08/21/hadoop-evolution/">Apache Hadoop evolution</a>. As noted above, one reason this is needed is competition from folks like MapR. Also, I note that:</p>
<ul>
<li>Three years ago, Oliver Ratzesberger&#8217;s group at eBay complained that <a href="../../../../../2008/10/15/ebay-doesnt-love-mapreduce/">CPU utilization running Hadoop was at 18%</a>.</li>
<li><a href="../../../../../2011/08/21/hadoop-evolution/#comment-241679">Now Oliver uses a figure of 10-15%</a>, and attributes an even lower figure to &#8212; I&#8217;m guessing here &#8212; Yahoo. (Another possibility might be Facebook.)</li>
<li>In between, eBay became one of the biggest and most prominent users of Hadoop.</li>
</ul>
<p>The moral of eBay&#8217;s Hadoop adventures, as I see it, is neither &#8220;Hadoop sucks!&#8221; nor &#8220;Hadoop doesn&#8217;t suck!&#8221;; rather, it&#8217;s that there&#8217;s a lot of scope for Hadoop to operate differently in the future than it does today.</p>
<p><em>Similarly, whatever throughput Yahoo does or doesn&#8217;t get, it clearly has adopted Hadoop at the expense of the <a href="../../../../../2008/05/29/yahoo-scales-web-analytics-database-petabyte/">columnar-in-Postgres</a> system it previously was so proud of.</em></p>
<p>Also, there has been a claim going around that &#8212; notwithstanding NameNode&#8217;s status as a single point of Hadoop failure &#8212;  no Hadoop installation has ever lost data due to a NameNode failure. The folks at MapR beg to differ, and sent over <a href="https://issues.apache.org/jira/browse/HDFS-1539">some</a> <a href="http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201107.mbox/%3CCAFUA3X2R_wH9GGGseUVSXVNVZQ+dBjZKDn0_pmDO8U31C05tMw@mail.gmail.com%3E">links</a> that sure seem to say the opposite.</p>
<p>6. Since we&#8217;ve just established that Hadoop will change, rapidly and pretty fundamentally, what exactly is the benefit of an appliance that is &#8220;balanced&#8221; for Hadoop usage today?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/23/hadoop-appliances/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Aster Database Release 5 and Teradata Aster appliance</title>
		<link>http://www.dbms2.com/2011/09/22/aster-database-release-5-and-teradata-aster-appliance/</link>
		<comments>http://www.dbms2.com/2011/09/22/aster-database-release-5-and-teradata-aster-appliance/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 05:56:45 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5304</guid>
		<description><![CDATA[It was obviously just a matter of time before there would be an Aster appliance from Teradata and some tuned bidirectional Teradata-Aster connectivity. These have now been announced. I didn&#8217;t notice anything particularly surprising in the details of either. About the biggest excitement is that Aster is traditionally a Red Hat shop, but for the [...]]]></description>
			<content:encoded><![CDATA[<p>It was obviously just a matter of time before there would be an Aster appliance from Teradata and some tuned bidirectional Teradata-Aster connectivity. These have now been announced. I didn&#8217;t notice anything particularly surprising in the details of either. About the biggest excitement is that Aster is traditionally a Red Hat shop, but for the purposes of appliance delivery has now embraced SUSE Linux.</p>
<p>Along with the announcements comes updated positioning such as:</p>
<ul>
<li>Better SQL than the MapReduce alternatives have.</li>
<li>Better MapReduce than the SQL alternatives have.</li>
<li>Easy(ier) way to do complex analytics on <a href="../../../../../2011/05/15/what-to-do-about-unstructured-data/">multi-structured data</a>. (Aster has embraced that term.)</li>
</ul>
<p>and of course</p>
<ul>
<li>Now also with Teradata&#8217;s beautifully engineered hardware and system management software!</li>
</ul>
<p><span id="more-5304"></span>As might also be expected, the announcements are accompanied by pictures along the lines of &#8220;There are your various data sources; there&#8217;s Teradata; there&#8217;s Aster; there&#8217;s Hadoop; look at all the nice arrows connecting them!&#8221;</p>
<p>Teradata Aster further decided it was time for a 5.0 DBMS release. Highlights include:</p>
<ul>
<li>Aster&#8217;s SQL-MapReduce has more flexible inputs. Specifically, if you view SQL-MapReduce as steroid-enhanced table functions, those functions can now each have multiple tables as input. Aster is rightly positioning this as the key feature of the Aster 5.0 release.</li>
<li>Workload management now explicitly manages not only CPU and I/O, but also RAM. That surely makes it safer to use algorithms which aggressively create temporary data structures. And the allocation is dynamic, in that it can be throttled back if workloads require.</li>
<li>There&#8217;s more SQL functionality &#8212; I think this is minor, as Aster seems to have had pretty good SQL coverage already.</li>
<li>Performance has been improved; i.e., <a href="../../../../../2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a> has progressed in multiple ways. One improvement Aster thinks is cutting-edge is a hybrid kind of join that tries to be a hash, then reverts to a merge if it has to spill out of memory. (E.g., if the available RAM is throttled back.)</li>
</ul>
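<p>To make the hybrid join idea concrete, here is a minimal Python sketch. It is not Aster&#8217;s actual implementation. It assumes rows are (key, value) pairs, and it uses a hypothetical <code>max_build_rows</code> limit as a stand-in for whatever RAM budget workload management allows. The hash join gives up once its build side outgrows that limit, and the caller then falls back to a sort-merge join.</p>

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def hash_join(left, right, max_build_rows):
    """Try an in-memory hash join; return None if the build side gets too big.

    max_build_rows is a hypothetical stand-in for a RAM budget.
    """
    table = defaultdict(list)
    for n, (key, value) in enumerate(left, start=1):
        if n > max_build_rows:
            return None  # would spill out of memory: let the caller fall back
        table[key].append(value)
    return [(key, lv, rv) for key, rv in right for lv in table.get(key, [])]


def _grouped(rows):
    """Sort rows by key and collect the values for each key."""
    by_key = sorted(rows, key=itemgetter(0))
    return [(k, [v for _, v in grp]) for k, grp in groupby(by_key, key=itemgetter(0))]


def merge_join(left, right):
    """Sort-merge join: sort both inputs by key, then walk them in step."""
    lgroups, rgroups = _grouped(left), _grouped(right)
    out = []
    i = j = 0
    while i != len(lgroups) and j != len(rgroups):
        lk, lvals = lgroups[i]
        rk, rvals = rgroups[j]
        if lk == rk:
            out.extend((lk, lv, rv) for lv in lvals for rv in rvals)
            i += 1
            j += 1
        elif lk > rk:
            j += 1  # right side is behind: advance it
        else:
            i += 1  # left side is behind: advance it
    return out


def hybrid_join(left, right, max_build_rows):
    """Prefer the hash join; revert to the merge join if it would not fit."""
    rows = hash_join(left, right, max_build_rows)
    if rows is None:
        return "merge", merge_join(left, right)
    return "hash", rows
```

<p>Calling <code>hybrid_join</code> on the same inputs with a generous limit and then a tight one should return the same joined rows (possibly in a different order) via the two different strategies. A real engine would presumably reuse the partially built hash table or spill partitions to disk rather than start over, which this sketch does not attempt.</p>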
<p>Also, Aster is always expanding its library of <a href="../../../../../2010/06/27/lots-of-aster-data-analytic-packages/">prebuilt analytic functions/packages</a> &#8212; often in connection with specific customer engagements &#8212; and took this opportunity to mention numerous recent or near-future additions to the list.</p>
<p>Part of Aster&#8217;s motivation in making multiple input tables available to its parallel analytic functions seems to be to allow the use of intermediate result sets alongside raw data. In some ways, this seems to be an alternative to <a href="../../../../../2011/04/21/sas-hpa-does-make-sense-after-all/">the MPI-based approach favored by SAS</a>, and highlights limitations of the vanilla MapReduce paradigm. The specific examples given were k-means clustering and &#8212; which I&#8217;d never heard of before &#8212; SAX pattern matching.</p>
<p>For an example of two true data tables being used as inputs, Aster offered a case of advertising attribution, with the data being about impressions and also conversions. Frankly, I suspect a &#8220;join them all and let MapReduce sort them out&#8221; strategy would also work for that application; if you join on something like Customer_ID, just how big would the result set really be? Even so, we can imagine other cases in which messy boundaries for graphs or time series make that strategy unappealing, and &#8212; you read it here first! &#8212; <a href="../../../../../2011/09/08/aster-data-business-trends/">Aster&#8217;s target use cases are focused on time series and graphs</a>.</p>
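<p>One rough way to put a number on &#8220;how big would the result set really be?&#8221; is to note that an equi-join on a key produces, for each key value, the product of its row counts on the two sides. The Python sketch below computes that total. The function name and the sample customer IDs are made up for illustration and do not come from Aster.</p>

```python
from collections import Counter


def join_result_size(left_keys, right_keys):
    """Number of rows an equi-join would emit, given each side's join keys."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Each key contributes (rows on the left) * (rows on the right).
    return sum(n * right_counts[key] for key, n in left_counts.items())


# Hypothetical data: one Customer_ID per impression and per conversion.
impression_customers = [1, 1, 1, 2, 3]
conversion_customers = [1, 2, 2, 4]
joined_rows = join_result_size(impression_customers, conversion_customers)
```

<p>With these made-up inputs the join emits 5 rows: customer 1 contributes 3 &#215; 1, customer 2 contributes 1 &#215; 2, and customers 3 and 4 contribute nothing. The result only blows up when many customers have large counts on both sides.</p>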
<p>And finally: Whenever I ask the Aster folks &#8220;So, how big are Aster databases that are actually in production?&#8221;, they try to convince me that this is the wrong thing to ask. But &#8212; without actually answering the question &#8212; they did say:</p>
<ul>
<li>The new Teradata Aster appliance has been tested to a couple hundred terabytes.</li>
<li>They are very confident about scaling Aster to a few hundred terabytes.</li>
<li>They don&#8217;t have much in the way of proof in the 1 petabyte range.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/22/aster-database-release-5-and-teradata-aster-appliance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Eight kinds of analytic database (Part 2)</title>
		<link>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/</link>
		<comments>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 08:18:18 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Buying processes]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Rainstor]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[SenSage]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4867</guid>
		<description><![CDATA[In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I&#8217;ll cover four more kinds of analytic database &#8212; even newer, for the most part, with a use case/product short list [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/">Part 1</a> of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I&#8217;ll cover four more kinds of analytic database &#8212; even newer, for the most part, with a use case/product short list match that is even less clear.  <span id="more-4867"></span></p>
<p><strong><em>Bit bucket</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included: </em>Logs, other technical/external</li>
<li><em>Likely use styles:</em> Staging/ETL, investigative</li>
<li><em>Canonical example: </em>Log files in a Hadoop cluster</li>
<li><em>Stresses:</em> TCO, scale-out, transform/big-query performance, ETL functionality</li>
</ul>
<p>With the explosion of <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a> has come the need for a place to put it all, sometimes called the <a href="../../../../../2011/06/04/dirty-data-stored-dirt-cheap/">big bit bucket</a>. This is like the investigative data mart for big databases, but more <a href="../../../../../2011/05/17/poly-structured-database/">poly-structured</a>. In some cases it is focused on data staging and transformation; but it can also be used for analysis in place.</p>
<p>The list of candidate technologies to run your bit bucket starts with Hadoop and Splunk.</p>
<p><strong><em>Archival data store</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included: </em>Operational, CDR (call detail record), security log</li>
<li><em>Likely use styles:</em> Archival, reporting (for compliance), possibly also investigative</li>
<li><em>Examples:</em> Any long-term detailed historical store</li>
<li><em>Stresses: </em>TCO, compression, scale-out, performance (if multi-use)</li>
</ul>
<p>Analytic DBMS vendors have been insulting each other with the claim &#8220;that&#8217;s just an archival data store,&#8221; dating back at least to the first time Greenplum was deployed on an underpowered Sun Thumper system. Perhaps only <a href="../../../../../2010/06/11/rainstor-update/">Rainstor</a> truly embraces the archival positioning, and I&#8217;ve become pretty dubious about their technical claims and their company alike.</p>
<p>Still, there&#8217;s a legitimate need for data stores, especially relational analytic DBMS, that:</p>
<ul>
<li>Store data cheaply, with high rates of compression.</li>
<li>Have decent performance if you do want to query the data.</li>
<li>May have archiving/compliance-specific features as well.</li>
</ul>
<p>Along with Rainstor, SAND and SenSage have at least partially targeted that use case. In addition, appliance vendors such as Teradata and Netezza try to have an archive-oriented product version in their lineups.</p>
<p><strong><em>Outsourced data mart</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All</li>
<li><em>Likely use styles:</em> Traditional BI, investigative analytics, staging/ETL</li>
<li><em>Examples:</em> Advertising tracking, SaaS CRM</li>
<li><em>Stresses:</em> Performance, TCO, reliability, concurrency</li>
</ul>
<p>Much of what happens in analytic database management can also be outsourced. Some applications that run via SaaS (Software as a Service) are analytic. I&#8217;ve had three different clients whose main business is picking marketing targets in various vertical segments; others who wanted to add analytics to what were historically OLTP applications; and others yet who just offered online business intelligence. Also, if your fundamental business is gathering data and reselling it to a variety of user organizations, that&#8217;s an analytic data management challenge. The possibilities expand from there.</p>
<p>Data outsourcers are in the IT business, and so their IT development is &#8212; hopefully! &#8212; more serious and less politically encumbered than at many conventional enterprises. Thus, legacy systems and master data management issues are commonly less prevalent, or at least more aggressively disposed of. The same, up to a point, goes for vendor politics.* <a href="../../../../../2011/06/26/what-to-think-about-before-you-make-a-technology-decision/">Multitenancy</a> is commonly an issue, as is running in the cloud.</p>
<p><em>*Even so, there&#8217;s often That Guy who doesn&#8217;t want to migrate away from Oracle, no matter what.</em></p>
<p>Vertica gets the nod in a number of these cases; it&#8217;s cloud-friendly, and often the problem is naturally columnar. Other columnar products can be good choices too, with added brownie points for Infobright if the shop is MySQL-oriented anyway. Running Netezza or other appliances makes sense mainly if you&#8217;re pretty sure you want to keep operating your own data centers, but some data outsourcers are just fine with that assumption.</p>
<p><strong><em>Operational analytic(s) server</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> Customer-centric, log, financial trade</li>
<li><em>Likely use styles:</em> Advanced operational analytics</li>
<li><em>Examples:</em>
<ul>
<li>Lower latency: Web or call-center personalization, anti-fraud</li>
<li>Higher latency: Customer profiling, Basel 3 risk analysis</li>
</ul>
</li>
<li><em>Stresses:</em> Performance, reliability, analytic functionality, perhaps concurrency</li>
</ul>
<p>Even with eight different choices, I need a &#8220;catch-all&#8221; category; this is it.</p>
<p>Suppose you want to do reasonably sophisticated analytics, then use the results in operations. This is the classical challenge in <a href="../../../../../2011/03/30/short-request-and-analytic-processing/">integrating short-request and analytic processing</a>. There are multiple ways to tackle it, embodying different trade-offs in cost, convenience, or analytic accuracy. If the platform on which you want to run your investigative analytics also has the reliability and concurrency appropriate for mission-critical operations, you&#8217;re set. Otherwise, you may want to pipe <a href="../../../../../2010/11/29/data-that-is-derived-augmented-enhanced-adjusted-or-cooked/">derived data</a> into a more &#8220;industrial-strength&#8221; DBMS, ideally the one that runs your operational apps anyway.</p>
<p>Another option is to integrate a limited amount of analytics immediately into your short-request processing system. For example, as bad as they are at the kinds of queries that require joins, NoSQL systems are often fast at simple aggregations. As MapReduce/NoSQL integrations mature, that option may not require pumping the data anywhere else for deeper analytics; even if it does, at least you&#8217;re starting out with the data in a convenient bit bucket.</p>
<p>Streaming/CEP-centric architectures could come into play as well. And it goes on from there. The possibilities in this last category are just too varied to generalize about.</p>
<p><em>So did I get them all? Or are there yet other analytic data management use cases that don&#8217;t fit into my eight categories?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Eight kinds of analytic database (Part 1)</title>
		<link>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/</link>
		<comments>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 08:17:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Buying processes]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4868</guid>
		<description><![CDATA[Analytic data management technology has blossomed, leading to many questions along the lines of &#8220;So which products should I use for which category of problem?&#8221; The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for &#8220;big data&#8221; is little help. Let&#8217;s try eight categories instead. While no categorization [...]]]></description>
			<content:encoded><![CDATA[<p>Analytic data management technology has blossomed, leading to many questions along the lines of &#8220;So which products should I use for which category of problem?&#8221; The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for &#8220;big data&#8221; is little help.</p>
<p>Let&#8217;s try eight categories instead. While <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">no categorization is ever perfect</a>, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need &#8212; and in most cases you&#8217;ll need several &#8212; is a great early step in your analytic technology planning.  <span id="more-4868"></span></p>
<p><strong><em>Enterprise data warehouse</em></strong> (Full or partial)</p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, but especially operational</li>
<li><em>Likely use styles:</em> All</li>
<li><em>Canonical example:</em> Central EDW for a big enterprise</li>
<li><em>Stresses:</em> Concurrency, reliability, workload management</li>
</ul>
<p>The enterprise data warehouse (EDW) ideal says that you copy all your data into one place, and drive all decision-making from there. <a href="../../../../../2011/06/21/its-official-the-grand-central-edw-will-never-happen/">Full EDWs are pipedreams</a>. Still, a partial EDW makes sense for most large enterprises, and many indeed already have one. The first product lines to consider for classical EDWs are Teradata, DB2, Exadata, and maybe Microsoft SQL Server, especially if you&#8217;re going to stress concurrency and/or operational use cases.</p>
<p><strong><em>Traditional data mart</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All</li>
<li><em>Likely use styles:</em> Business intelligence, budgeting/consolidation, investigative</li>
<li><em>Examples:</em> Reporting servers, planning/consolidation servers, anything MOLAP, etc.</li>
<li><em>Stresses:</em> Performance, concurrency, TCO</li>
</ul>
<p>Whether or not you have something like an enterprise data warehouse, it&#8217;s common to have lighter-weight data marts as well. A traditional data mart might drive reports and dashboards. Or it might be specialized for budgeting, planning, and/or consolidation.  Some <a href="../../../../../2011/03/03/investigative-analytics/">investigative analytics</a> may be in the mix as well.</p>
<p>Any DBMS that can support an EDW can also support a data mart, but it may not be the most cost-effective way to do so. Columnar DBMS might have more attractive performance and TCO (Total Cost of Ownership); the same goes for Netezza. Some of them &#8212; e.g. Sybase IQ and <a href="../../../../../2011/06/20/vertica-release-5/">Vertica</a> &#8212; have excellent track records in concurrent usage as well. <a href="../../../../../2011/05/29/when-to-use-relational-database-management-system/">Ted Codd</a> pushed what amounts to MOLAP (Multidimensional OnLine Analytic Processing) systems for these use cases. But relational DBMS commonly do a better job, which is one reason most major MOLAP products have wound up at RDBMS companies.</p>
<p><strong><em>Investigative data mart &#8212; agile</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, especially customer-centric</li>
<li><em>Likely use styles</em>: Investigative</li>
<li><em>Canonical example:</em> A few analysts getting a few TB to examine</li>
<li><em>Stresses:</em> Ease of setup/load, ease of admin, price/performance</li>
</ul>
<p>Besides the traditional data mart, there are at least two other kinds. Both are focused on investigative analytics, but they&#8217;re differentiated by database size.</p>
<p>If you have just a few analysts,* looking at no more than a few terabytes of data (perhaps even just some gigabytes) &#8212; and if that data is &#8220;single-subject&#8221; and fairly homogeneous &#8212; your watchwords should be &#8220;cheap&#8221;, &#8220;easy&#8221;, and &#8220;fast&#8221;. You don&#8217;t need to invest in much hardware, in expensive software, or in much administrative effort (the analysts can be their own DBAs), nor should you endure much set-up time. Just grab a product, grab some data, and start running queries (or extracts into the statistical tool of your choice).</p>
<p><em>*If you have dozens or even hundreds of analysts hitting the same database, you&#8217;re probably back to the more concurrency-oriented scenarios outlined above.</em></p>
<p>Among columnar analytic DBMS, Infobright is often a cost-effective choice. Other vendors might cut you a price break as well. If you have multiple terabytes of data, don&#8217;t rule out Netezza&#8217;s lowest-end products (even if they&#8217;d really rather sell you something bigger). Or, if you&#8217;re in the sub-terabyte range, maybe you can get by with an in-memory BI tool such as QlikView, and not do anything special on the DBMS side at all.</p>
<p><strong><em>Investigative data mart &#8212; big</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, especially customer-centric, logs, financial trade, scientific</li>
<li><em>Likely use styles:</em> Investigative</li>
<li><em>Canonical example:</em> Single-subject 20 TB &#8211; 20 PB relational database</li>
<li><em>Stresses:</em> Performance, scale-out, analytic functionality</li>
</ul>
<p>But if you&#8217;re looking at tens of terabytes of relational data, or even more, you really do have a &#8220;big data&#8221; problem. Performance and scalability are major challenges, usually best addressed by MPP (Massively Parallel Processing) systems, such as Netezza, Vertica, Aster Data, ParAccel, Teradata, or Greenplum. Performance POCs (Proofs Of Concept) are a big part of the buying process. Vendor price negotiations are crucial too.</p>
<p><em>Actually, in the low tens of terabytes you might be able to get away with a shared-disk system that has excellent compression &#8212; e.g., columnar products like Sybase IQ, Infobright, or SAND, rather than just Vertica and ParAccel.</em></p>
<p>Assuming you have affordable, scalable query performance, the competitive differentiator can switch to additional analytic functionality. Aster, Netezza, ParAccel, Vertica, and Greenplum either offer full <a href="../../../../../2011/02/24/analytic-platforms/">analytic platforms</a>, or seem to be on the path to doing so. Teradata, which now owns Aster Data, offers substantial built-in analytic capability in its traditional products as well, and the same goes for Sybase IQ.</p>
<p><em>Continued in <a href="http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/">Part 2</a>, where we cover some of the more difficult use cases.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>What to think about BEFORE you make a technology decision</title>
		<link>http://www.dbms2.com/2011/06/26/what-to-think-about-before-you-make-a-technology-decision/</link>
		<comments>http://www.dbms2.com/2011/06/26/what-to-think-about-before-you-make-a-technology-decision/#comments</comments>
		<pubDate>Sun, 26 Jun 2011 18:51:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Buying processes]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4835</guid>
		<description><![CDATA[When you are considering technology selection or strategy, there are a lot of factors that can each have bearing on the final decision &#8212; a whole lot. Below is a very partial list. In almost any IT decision, there are a number of environmental constraints that need to be acknowledged. Organizations may have standard vendors, [...]]]></description>
			<content:encoded><![CDATA[<p>When you are considering technology selection or strategy, there are a lot of factors that can each have bearing on the final decision &#8212; a whole lot. Below is a very partial list.</p>
<p>In almost any IT decision, there are a number of <strong>environmental constraints</strong> that need to be acknowledged. Organizations may have <strong>standard vendors</strong>, favored vendors, or simply vendors who give them <a href="../../../../../2011/06/24/observations-on-oracle-pricing/">particularly deep discounts</a>. <strong>Legacy systems</strong> are in place, application and system alike, and may or may not be open to replacement. Enterprises may have on-premise or off-premise preferences; SaaS (Software as a Service) vendors probably have <strong>multitenancy</strong> concerns. Your organization can determine which aspects of your system you&#8217;d ideally like to see be tightly <strong>integrated </strong>with each other, and which you&#8217;d prefer to keep only loosely coupled. You may have biases for or against <strong>open-source software.</strong> You may be pro- or anti-<strong>appliance.</strong> Some applications have a substantial need for elastic scaling. And some kinds of issues cut across multiple areas, such as <strong>budget</strong>, <strong>timeframe, security, </strong>or<strong> trained personnel.</strong></p>
<p>Multitenancy is particularly interesting, because it has numerous implications. <span id="more-4835"></span>If you&#8217;re a SaaS vendor supporting multiple customers, you must keep each customer&#8217;s data inaccessible to other users* &#8212; even if you offer high levels of flexibility or customization. You probably also want to keep data logically partitioned by user, in a way that the DBMS recognizes; you may also want each such partition to stay together for caching purposes, especially if no one customer occupies a large part of your database. Administratively, you need a way to measure customer-specific metrics of the sort that might go into SLAs (Service-Level Agreements).</p>
<p><em>*Of course, there are exceptions. One of my clients is a SaaS vendor facilitating commerce; the whole point of their app is to let two different customers see and update the same records.</em></p>
<p>Getting more specific now, I&#8217;m usually called upon to <a href="http://www.monash.com/adviseusers.html">advise users</a> in two categories &#8212; those that already know they want to upgrade analytic functionality, and those that quickly realize they do once I remind them of it. Even so, many organizations struggle with the question &#8220;What do you want to do analytically?&#8221; It&#8217;s tough to blame them, for the question is distressingly circular; <strong>a big part of analytics is figuring out which kinds of analytics are worth doing.</strong> Also, SaaS vendors often struggle with the same question for a different reason, responding &#8220;Well, we know we&#8217;ve only been giving them basic stuff to date. What else do you think they would like?&#8221;</p>
<p>There&#8217;s no perfect solution to those difficulties, but a good way to start the evaluation is by assessing:</p>
<ul>
<li>The<strong> nature and value of your decisions that analytics could reasonably affect.</strong></li>
<li>Your <strong>realistic scope for automation of analytic decisions.</strong></li>
<li>The <strong>number and training of your &#8220;full-time analysts&#8221;</strong> &#8212; statisticians, SQL jocks who can program, SQL jocks who can&#8217;t really program, full-time users of BI tools, whatever.</li>
<li>The <strong>number and training of your &#8220;part-time analysts&#8221;</strong> &#8212; normal business users who can get something out of a dashboard, and perhaps even drill down into it.</li>
</ul>
<p>That should at least tell you which broad categories of analytics you want to engage in, and roughly how advanced in those areas you should try to be.</p>
<p><em>Basic business intelligence/dashboarding? Surely. Visualization-centric BI? If nothing else, it demos well. Basic predictive modeling? Hmm, are you sure nobody will want that? Advanced predictive modeling? Um, are you sure your users can handle that, or that the results will be worth the investment?</em></p>
<p>When I talk with users, there&#8217;s usually a data management problem in the mix too. In such cases, I quickly ask about <strong>data-related metrics</strong>, starting with database size, ingest volumes (batch, if relevant, but especially continuous), and simultaneous query load/concurrent user count. Similarly important are requirements for various kinds of <a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/">latency</a>, the big two being <strong>query response time</strong> and <strong>how long it takes for data to first be available for query.</strong> Less numeric questions in a similar vein boil down to &#8220;What kinds of requests will you make against the database, in what volume?&#8221;</p>
<p><em>And this loops back to the analytic-user inventory. Suppose you had a near-real-time dashboard &#8212; would anybody actually look at it minute to minute?</em></p>
<p>Specialized metrics I request when considering analytic DBMS include &#8220;How many columns are there in your widest table?&#8221; and &#8220;How many joins &#8212; or lines of SQL &#8212; are there in your most complex query?&#8221;, both of which are tools for assessing &#8220;Is your use case naturally columnar?&#8221;. Another, more general <strong>&#8220;natural structure of data&#8221;</strong> kind of consideration is what structure the data is in before it gets to the database being discussed; candidates include relational batch, XML stream, log file, and many more.</p>
<p>Also crucial are requirements for <strong><a href="http://www.dbms2.com/2010/05/01/ryw-read-your-writes-consistency/">consistency</a>, availability, </strong>and<strong> data integrity.</strong> Those tell you your needs in <strong>high availability </strong>and<strong> disaster recovery,</strong> and perhaps even how picky you have to be about your brands of hardware, software, or cloud/hosting provider. They also indicate how much you should care about relational or ACID properties, and where you should come down on <a href="http://www.dbms2.com/2010/03/12/some-nosql-links/">CAP Theorem</a> trade-offs.</p>
<p><em>I could go on even longer, but those seem like a pretty good set of initial questions with which to start discussions of data management, data integration, and analytic tools and architectures. What do you think I left out? And what do you think I could make substantially clearer by just adding a few more words? Any comments will be much appreciated.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/06/26/what-to-think-about-before-you-make-a-technology-decision/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Forthcoming Oracle appliances</title>
		<link>http://www.dbms2.com/2011/06/24/forthcoming-oracle-appliances/</link>
		<comments>http://www.dbms2.com/2011/06/24/forthcoming-oracle-appliances/#comments</comments>
		<pubDate>Fri, 24 Jun 2011 06:44:56 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4822</guid>
		<description><![CDATA[Edit: I checked with Oracle, and it&#8217;s indeed TimesTen that&#8217;s supposed to be the basis of this new appliance, as per a comment below. That would be less cool, alas. Oracle seems to have said on yesterday&#8217;s conference call that Oracle OpenWorld (first week in October) will feature appliances based on Tangosol and Hadoop. As I [...]]]></description>
			<content:encoded><![CDATA[<p><em>Edit: I checked with Oracle, and it&#8217;s indeed TimesTen that&#8217;s supposed to be the basis of this new appliance, as per a comment below. That would be less cool, alas.</em></p>
<p>Oracle seems to have said on yesterday&#8217;s conference call that Oracle OpenWorld (first week in October) will feature appliances based on Tangosol and Hadoop. As I post this, <a href="http://seekingalpha.com/article/276425-oracle-s-ceo-discusses-q4-2011-results-earnings-call-transcript?part=qanda">the Seeking Alpha transcript of Oracle&#8217;s call</a> is riddled with typos. Bolded comments below are by me. <span id="more-4822"></span></p>
<blockquote><p>Well, we&#8217;re planning to add a couple of appliances and announcing them this fall. One appliance, that should surprise you is a large memory addition to Exadata for analytics and memory, so we continue to invest. We thought that would &#8212; we&#8217;ve been the leader of in-memory database technology ever since we bought Tungsten. <strong>I presume that&#8217;s a typo for <a href="../../../../../2007/03/25/oracle-tangosol-objects-caching-and-disruption/">&#8220;Tangosol&#8221;</a>. And it sort of denigrates Oracle TimesTen.</strong> And that&#8217;s for both for transactions and for preprocessing. We are, as memories become cheaper and larger scale, we&#8217;ve changed as much of our algorithms and this in-memory analytics accelerator is going to be, again, coming out and we&#8217;ll be announcing it in the fall at Oracle OpenWorld.</p></blockquote>
<p>That part, especially in connection with the last sentence of the next quote, sounds almost as if Tangosol will be positioned as a kind of memory-centric object-oriented DBMS, albeit with Oracle as its persistence layer. Well, I favor both <a href="../../../../../2011/05/23/databases-ram/">in-memory</a> and <a href="../../../../../2011/05/21/object-oriented-database-management-systems-oodbms/">object-oriented</a> DBMS, and especially the intersection of those two categories. So in principle this could be a very cool product. Exploiting that coolness, however, may require one heck of a missionary sell.</p>
<blockquote><p>In addition, attaching to our Exalogic box, there&#8217;s a lot of misunderstanding about what&#8217;s a dupe is, and is it a replacement for database.<strong> I presume &#8220;a dupe&#8221; is a typo for &#8220;Hadoop&#8221;</strong>. So the dupe is not a replacement for database. It&#8217;s an adjunct to the database, which we think, is very, very important. It really is a tool for Java programmers. And we&#8217;re the world leader in Java technology and we are building a big data accelerator to attach to our Exalogic box, which comes out also this fall. The big data accelerator includes some of the standard open source heavy software, HTFF, the heavy file system and a number of other pieces, but also some Oracle components that we think can dramatically speed up the entire math-produced process. <strong>I presume that&#8217;s a series of typos for &#8220;HDFS&#8221; and &#8220;MapReduce&#8221;.</strong> And will be particularly attractive to Java programmers who are the ones, who asked for &#8212; aspire to do. There are some interesting applications they do, ETL is one. Log processing is another. <strong>Those last two sentences are more evidence for the theory that this is about Hadoop. Besides, I spoke with somebody who listened to the call. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </strong> We&#8217;re going to have a lot of those features, functions and prebuilt applications in our big data accelerator. So, Oracle has always followed database technology trends, whether it&#8217;s object databases, in-memory databases and kept up with this technology and some, quite often led on innovation.</p></blockquote>
<p>And that part sounds as if Oracle will announce a Hadoop appliance, positioning it more as a Java software accelerator than a place to store cheap data. Be the positioning as it may, my <a href="../../../../../2011/06/02/why-you-would-want-an-appliance-and-when-you-wouldnt/">objections to the idea of a Hadoop appliance</a> still stand, although <a href="../../../../../2011/06/02/why-you-would-want-an-appliance-and-when-you-wouldnt/#comment-228238">Amr Awadallah&#8217;s counterarguments</a> make sense as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/06/24/forthcoming-oracle-appliances/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

