<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Vertica Systems</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/vertica-systems/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:17:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Hope for a new PostgreSQL era?</title>
		<link>http://www.dbms2.com/2011/11/23/hope-for-a-new-postgresql-era/</link>
		<comments>http://www.dbms2.com/2011/11/23/hope-for-a-new-postgresql-era/#comments</comments>
		<pubDate>Wed, 23 Nov 2011 14:18:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[EnterpriseDB and Postgres Plus]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[salesforce.com]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5728</guid>
		<description><![CDATA[In a comedy of briefing errors, I&#8217;m not too clear on the details of my client salesforce.com&#8217;s new PostgreSQL-as-a-service offering, nor exactly on what my clients at VMware are bringing to the PostgreSQL virtualization/cloud party. That said: PostgreSQL is good technology. MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL in some [...]]]></description>
			<content:encoded><![CDATA[<p>In a comedy of briefing errors, I&#8217;m not too clear on the details of my client <a href="http://gigaom.com/cloud/heroku-launches-sql-database-as-a-service/">salesforce.com&#8217;s new PostgreSQL-as-a-service offering</a>, nor exactly on what my clients at VMware are bringing to the PostgreSQL virtualization/cloud party. That said:</p>
<ul>
<li>PostgreSQL is good technology.</li>
<li>MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL in some ways.  (Database extensibility if nothing else.)</li>
<li>PostgreSQL has a lot of users. (Many of them in academia and/or Russia.)</li>
<li>Neither EnterpriseDB (which now calls itself &#8220;The enterprise PostgreSQL company&#8221;) nor the PostgreSQL community leadership has covered itself with stewardship glory.</li>
<li>A significant number of interesting DBMS products can be regarded as PostgreSQL forks (e.g. Greenplum, Aster Data nCluster, Netezza if you squint, and Vertica if you stand on your head*).</li>
<li>PostgreSQL advancement is not dead. For example, <a href="../../../../../2011/11/08/hadapt-is-moving-forward/">Hadapt beta users are running actual PostgreSQL on many nodes each</a>.</li>
<li><a href="../../../../../2009/12/14/oracle-mysql-storage-engine/">There&#8217;s no assurance that Oracle will be a benevolent MySQL steward forever</a>. (Specifically, Oracle&#8217;s &#8220;Play nicely with others&#8221; antitrust commitments expire in 2014.)</li>
</ul>
<p>So I think it would be cool if one or the other big company put significant wood behind the PostgreSQL arrow.</p>
<p><em>*While Vertica was originally released using little or no PostgreSQL code &#8212; reports varied &#8212; it featured high degrees of PostgreSQL compatibility.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/23/hope-for-a-new-postgresql-era/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Some big-vendor execution questions, and why they matter</title>
		<link>http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/</link>
		<comments>http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 11:01:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cognos]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5704</guid>
		<description><![CDATA[When I drafted a list of key analytics-sector issues in honor of look-ahead season, the first item was &#8220;execution of various big vendors&#8217; ambitious initiatives&#8221;.  By &#8220;execute&#8221; I mean mainly: &#8220;Deliver products that really meet customers&#8217; desires and needs.&#8221; &#8220;Successfully convince them that you&#8217;re doing so &#8230;&#8221; &#8220;&#8230; at an attractive overall cost.&#8221; Vendors mentioned [...]]]></description>
			<content:encoded><![CDATA[<p>When I drafted a list of key analytics-sector issues in honor of <a href="http://www.dbms2.com/2011/11/21/analytic-trends-in-2012-qa/">look-ahead season</a>, the first item was &#8220;execution of various big vendors&#8217; ambitious initiatives&#8221;.  By &#8220;execute&#8221; I mean mainly:</p>
<ul>
<li>&#8220;Deliver products that really meet customers&#8217; desires and needs.&#8221;</li>
<li>&#8220;Successfully convince them that you&#8217;re doing so &#8230;&#8221;</li>
<li>&#8220;&#8230; at an attractive overall cost.&#8221;</li>
</ul>
<p>Vendors mentioned here are Oracle, SAP, HP, and IBM. Anybody smaller got left out due to the length of this post. Among the bigger omissions were:</p>
<ul>
<li>salesforce.com (multiple subjects).</li>
<li><a href="../../../../../2011/04/21/sas-hpa-does-make-sense-after-all/">SAS HPA</a>.</li>
<li><a href="../../../../../2011/08/21/hadoop-evolution/">The evolution of Hadoop</a>.</li>
</ul>
<p><span id="more-5704"></span><strong>A (lingering) issue for SAP and Oracle alike</strong></p>
<p>As I noted in January of this year, <a href="../../../../../2011/01/03/the-six-useful-things-you-can-do-with-analytic-technology/">integration of business intelligence into operational apps is making very slow progress</a>. Even so, it&#8217;s a huge part of the apparent strategy at SAP and Oracle alike, as well it should be. Much of the benefit from automating routine desk work has already happened. The areas ripest for exploitation are the ones where analytics are part of the equation.</p>
<p>Given the lack of tangible progress, why do I think this is a genuine area of Oracle and SAP emphasis? Three reasons of many are:</p>
<ul>
<li>Why else did SAP buy Business Objects?</li>
<li>If they&#8217;re not trying to <a href="../../../../../2011/03/30/short-request-and-analytic-processing/">integrate operational apps and analytics</a>, why else does SAP&#8217;s emphasis on HANA make sense?</li>
<li>Without business intelligence in the picture, how does Oracle&#8217;s integrated-stack story promise any direct user benefits?*</li>
</ul>
<p><em>*As opposed to IT concerns &#8212; integration, administration, TCO (Total Cost of Ownership), etc.</em></p>
<p>After so many years of disappointment, I&#8217;m not going to forecast 2012 as a pivotal year for <strong>the integration of business intelligence into operational applications.</strong> But if either SAP or Oracle ever does get a significant lead over the other in BI/operational app integration, that could be a major competitive advantage in those application market segments that are still up for grabs. Such integration is also an opportunity for both vendors to gain BI market share in their respective application customer bases.</p>
<p><strong>A more urgent issue for SAP</strong></p>
<p>SAP has put huge amounts of credibility on the line for HANA, the integration of two different and not particularly mature in-memory database technologies. So far, it is difficult to find evidence that HANA is robust enough for widespread adoption. Whether or not SAP can fix that is a huge open question, which could have significant impact on the course of several technology areas: applications, business intelligence, in-memory DBMS, and maybe even hardware.</p>
<p>Based on current information, which is admittedly partial, I&#8217;m a short-term pessimist on HANA. Longer-term, I&#8217;m on record as saying that <a href="../../../../../2011/05/23/databases-ram/">traditional databases will eventually wind up in RAM</a>. SAP will surely get that technology right some day, whether or not the way it does so has anything to do with present-day HANA code.</p>
<p><strong>Four more issues for Oracle </strong></p>
<p>Oracle&#8217;s ambitions are near-endless, and so also therefore is its list of execution challenges. Four in the analytics area that I find particularly interesting are:</p>
<ul>
<li><strong>True hybrid columnar DBMS.</strong> <a href="../../../../../2011/09/22/teradata-columnar-compression/">I was guessing that Oracle, like Teradata, would announce true hybrid columnar the week of Oracle OpenWorld</a>. I was wrong. But if Oracle can&#8217;t bring out true hybrid columnar DBMS functionality relatively soon, Exadata will lose credibility as a competitor to more specialized analytic DBMS.</li>
<li><strong>Oracle Exalytics.</strong> With Exalytics in the mix, Oracle&#8217;s technology stack has HANA-like potential. But will Exalytics even ship in 2012? (I think so.) Will it be good for much in the first release? (I&#8217;m skeptical.)</li>
<li><strong>Oracle&#8217;s Big Data Appliance</strong>. I&#8217;m skeptical both about <a href="../../../../../2011/10/20/more-notes-on-oracle-nosql/">Oracle&#8217;s NoSQL product</a> &#8212; <a href="http://www.infoworld.com/d/data-explosion/first-look-oracle-nosql-database-179107">a favorable InfoWorld review</a> notwithstanding &#8212; and <a href="../../../../../2011/09/23/hadoop-appliances/">Hadoop appliances</a>. But if I&#8217;m wrong, and Oracle can successfully embrace/extend the new non-relational paradigms, then it really might regain control over the evolution of data management.</li>
<li><strong><a href="../../../../../2011/10/18/oracle-is-buying-endeca/">Oracle&#8217;s Endeca acquisition</a></strong> &#8212; will Oracle prove me wrong and integrate Endeca effectively into its overall analytic product line? If it does, we might finally see effective text (and eventually speech) navigation of enterprise software. (But as with all Oracle issues cited here, this is something that probably won&#8217;t amount to much in 2012 even if it does later go well.)</li>
</ul>
<p><strong>Three issues for IBM</strong></p>
<p>Like Oracle, IBM is a huge company with many ambitions and hence many execution challenges. The biggest of those is surely: <strong>How effective can IBM be at selling outside its existing customer base?</strong> I don&#8217;t hear as much competitively about IBM DataStage, IBM SPSS or now IBM Netezza as I did when their vendors were independent companies. Even Cognos may not be much of an exception to the rule, although it has its own large customer base outside of IBM&#8217;s traditional one. (To lesser extents, the same is of course true of Netezza and numerous other IBM acquisitions.)</p>
<p>Another general issue for IBM is <strong>substantively integrating its various product lines,</strong> at least to the extent that makes sense. DB2/Netezza integration sounds good, but even that is more a matter of product marketing (the admirable part of that discipline) than of actual technology. Other integrations (e.g. Cognos/DB2 in various bundles) have tended toward the dubious side.*</p>
<p><em>*I&#8217;m still waiting for IBM to get back to me with examples of how Cognos/DB2 joint tuning amounts to anything. It&#8217;s been more than a year, so I&#8217;m glad I didn&#8217;t hold my breath.</em></p>
<p>In a somewhat narrower vein, I wonder: <strong><a href="../../../../../2011/11/10/cep-streaming-catchup/">Will IBM be able to gain traction for InfoSphere Streams</a>? </strong>And if so, when and where will the traction be?</p>
<p><strong>Will HP screw up Vertica?</strong></p>
<p>Vertica has a very attractive product offering. It&#8217;s perhaps <a href="../../../../../2011/06/20/columnar-dbms-vendor-customer-metrics/">the most scalable analytic DBMS outside of Teradata</a>, running on the hardware of your reasonable choice.  It&#8217;s also the one I recommend most often to clients in the 1-50 terabyte range.</p>
<p>So far HP doesn&#8217;t seem to have done much to leadfoot Vertica. (About all I&#8217;ve heard from competitors is that Vertica seems to have faded somewhat in the financial services market, and there could be multiple explanations if that is indeed true.) But if HP Vertica does somehow manage to botch things, opportunities will open up for a range of columnar analytic DBMS competitors.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Analytic trends in 2012: Q&amp;A</title>
		<link>http://www.dbms2.com/2011/11/21/analytic-trends-in-2012-qa/</link>
		<comments>http://www.dbms2.com/2011/11/21/analytic-trends-in-2012-qa/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 11:00:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Tableau Software]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5692</guid>
		<description><![CDATA[As a new year approaches, it&#8217;s the season for lists, forecasts and general look-ahead. Press interviews of that nature have already begun. And so I&#8217;m working on a trilogy of related posts, all based on an inquiry about hot analytic trends for 2012. This post is a moderately edited form of an actual interview. Two [...]]]></description>
			<content:encoded><![CDATA[<p>As a new year approaches, it&#8217;s the season for lists, forecasts and general look-ahead. Press interviews of that nature have already begun. And so I&#8217;m working on a trilogy of related posts, all based on an inquiry about hot analytic trends for 2012.</p>
<p>This post is a moderately edited form of an actual interview. Two other posts cover analytic trends to watch (planned) and <a href="http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/">analytic vendor execution challenges to watch</a> (already up).</p>
<p><span id="more-5692"></span><strong>Question</strong>: What do you think will happen next year with the Tableaus of the world?</p>
<p><strong>Answer:</strong></p>
<ul>
<li>I think adoption of flexible-visualization business intelligence tools will continue to be rapid.</li>
<li>I think enterprise-friendly features will be increasingly important as a basis of competition.</li>
</ul>
<p><strong>Question</strong>: What do you mean by &#8220;enterprise-friendly&#8221;?</p>
<p><strong>Answer</strong>: An example would be <a href="http://www.dbms2.com/2011/11/16/qlikview-collaborative-business-intelligence/">QlikTech no longer forcing you to use their native ETL</a>, but rather working with Informatica and soon other third-party products. Also important can be:</p>
<ul>
<li>Database size.</li>
<li>Concurrency.</li>
<li>A full-featured development cycle for analytic applications.</li>
</ul>
<p><strong>Question</strong>: What does HP have to do to be relevant in analytics/data warehousing?</p>
<p><strong>Answer</strong>: Avoid stupidity. HP Vertica is already relevant.</p>
<p><strong>Question</strong>: OK. But what can HP do to build on Vertica?</p>
<p><strong>Answer</strong>: HP &#8212; which botched Exadata 1 hardware &#8212; could do a good job with SAP HANA or other kinds of appliance products.</p>
<p>However:</p>
<ul>
<li>I don&#8217;t think trying to force Vertica beyond its natural growth &#8212; <a href="http://www.dbms2.com/2011/04/16/unpacking-the-emc-greenplum-q1-sales-disaster-rumors/">the way EMC is with Greenplum</a> &#8212; is necessarily a good idea. Natural growth in Vertica&#8217;s case is plenty fast anyway.</li>
<li>Obviously, making good Vertica hardware would be nice. But being hardware-independent is crucial to Vertica, not least because of cloud deployment, an option many buyers want to at least have in their hip pockets.</li>
</ul>
<p><strong>Question</strong>: You expressed some skepticism toward mobile BI/use cases. Why so?</p>
<p><strong>Answer</strong>: The form factor hurts functionality a lot, so it&#8217;s only worthwhile in cases where timeliness is key.</p>
<p>And without more refined alert-setting functionality, it&#8217;s hard to think of that many cases.</p>
<p><em>Note: My views on mobile BI haven&#8217;t changed much since <a href="../../../../../2010/07/15/mobile-business-intelligence/">July, 2010</a>.</em></p>
<p><strong>Question</strong>: What about the idea of an enterprise being able to pay-per-drink to run jobs on an analytic cluster? Do you expect that concept to have any legs in 2012?</p>
<p><strong>Answer</strong>: While other kinds of SaaS (Software as a Service) BI might make sense, remote computing BI that focuses on hardware cost sharing is problematic. Moving data in and out of the cluster is a big part of the overall cost, at least if you plan to process it only occasionally once it gets there. I haven&#8217;t seen a plan yet that gets around that point.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/21/analytic-trends-in-2012-qa/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Vertica Community Edition</title>
		<link>http://www.dbms2.com/2011/10/18/vertica-community-edition/</link>
		<comments>http://www.dbms2.com/2011/10/18/vertica-community-edition/#comments</comments>
		<pubDate>Tue, 18 Oct 2011 15:48:10 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5491</guid>
		<description><![CDATA[The press release announcing Vertica&#8217;s Community Edition is a bit vague. And indeed, much of what I know about Vertica Community Edition is along the lines of &#8220;This is what I think will happen, but of course it could still change.&#8221; That said, I believe: Vertica Community Edition has all of regular Vertica&#8217;s features. However [...]]]></description>
			<content:encoded><![CDATA[<p>The press release announcing <a href="http://www.vertica.com/news/press/vertica-announces-community-edition-version-of-vertica-analytic-database/">Vertica&#8217;s Community Edition</a> is a bit vague. And indeed, much of what I know about Vertica Community Edition is along the lines of &#8220;This is what I think will happen, but of course it could still change.&#8221; That said, I believe:</p>
<ul>
<li>Vertica Community Edition has all of regular Vertica&#8217;s features. However &#8230;</li>
<li>&#8230; HP Vertica reserves the right to open a feature gap in future releases.</li>
<li>The license restriction on Vertica Community Edition is that you&#8217;re limited to 1 terabyte of data, and 3 nodes. I imagine that&#8217;s for one production copy, and you&#8217;re perfectly free to also set up mirrors for test, development, disaster recovery, and so on. However &#8230;</li>
<li>&#8230; HP Vertica would be annoyed if you stuck a free copy of Vertica on each of 50 nodes and managed the whole thing via, say, Hadapt.</li>
<li>HP Vertica plans to be very generous with true academic researchers, suspending or waiving limits on database size and node count. Not coincidentally, Vertica Community Edition is being announced at <a href="http://www.dbms2.com/2011/09/20/xldb-the-one-conference-i-like-to-go-to/">XLDB</a>, where Vertica is also a top-level sponsor. (I introduced Vertica and XLDB&#8217;s Jacek Becla to each other as soon as I heard about Vertica&#8217;s Community Edition plans.)</li>
<li>The only support available for Vertica Community Edition is through forums. This could change.</li>
</ul>
<p>I&#8217;m a big supporter of the Vertica Community Edition idea, for four reasons:</p>
<ul>
<li>It should now be easier to download and evaluate Vertica.</li>
<li>Vertica Community Edition could be a big help to academic researchers.</li>
<li>Vertica could now be more appealing to some of the &#8220;Omigod, we&#8217;re outgrowing Oracle Standard Edition and we don&#8217;t want to pay up for Oracle Enterprise Edition/Exadata&#8221; crowd.</li>
<li>People are under the impression that what Vertica actually charges today resembles its <a href="http://www.dbms2.com/2009/04/25/vertica-pricing-and-customer-metrics/">long-ago list prices</a>. This announcement may help puncture Vertica&#8217;s outdated pricing image.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/10/18/vertica-community-edition/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>HP systems soundbites</title>
		<link>http://www.dbms2.com/2011/09/22/hp-systems-soundbites/</link>
		<comments>http://www.dbms2.com/2011/09/22/hp-systems-soundbites/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 17:44:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Exadata]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5314</guid>
		<description><![CDATA[It is widely rumored that there will be a leadership change at HP (Meg Whitman in, Leo Apotheker out). In connection with that, I found myself holding forth on points such as: HP needs to make outstanding enterprise systems again. They fell away from that target under Mark Hurd, but they surely can hit it [...]]]></description>
			<content:encoded><![CDATA[<p>It is widely rumored that there will be a leadership change at HP (Meg Whitman in, Leo Apotheker out). In connection with that, I found myself holding forth on points such as:</p>
<ul>
<li>HP needs to make outstanding enterprise systems again.</li>
<li>They fell away from that target under Mark Hurd, but they surely can hit it again, based on the remnants of DEC (Digital Equipment Corporation), Tandem, the higher-end part of Compaq, and of course the original HP systems group.</li>
<li>In particular:
<ul>
<li>Rumors say that Oracle Exadata 1 boxes, made by HP, were much lower quality than Exadata 2 boxes made by Sun.</li>
<li>HP Neoview was a waste of good engineering talent.</li>
<li>I&#8217;d like to see a few excellent Vertica appliances.</li>
<li>I hope the SAP HANA appliances go well, whenever HANA finally becomes a serious product.</li>
<li>The general move from disk to solid-state memory should offer some opportunities.</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/22/hp-systems-soundbites/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Vertica projections &#8212; an overview</title>
		<link>http://www.dbms2.com/2011/09/07/vertica-projections/</link>
		<comments>http://www.dbms2.com/2011/09/07/vertica-projections/#comments</comments>
		<pubDate>Thu, 08 Sep 2011 03:09:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5195</guid>
		<description><![CDATA[Partially at my suggestion, Vertica has blogged a three-part series explaining the &#8220;projections&#8221; that are central to a Vertica database. This is important, because in Vertica projections play the roles that in many analytic DBMS might be filled by base tables, indexes, AND materialized views. Highlights include: A Vertica projection can contain: All the columns [...]]]></description>
			<content:encoded><![CDATA[<p>Partially at my suggestion, Vertica has blogged a <a href="http://www.vertica.com/2011/09/01/the-power-of-projections-part-1/">three</a>-<a href="http://www.vertica.com/2011/09/02/the-power-of-projections-part-2/">part</a> <a href="http://www.vertica.com/2011/09/06/the-power-of-projections-part-3/">series</a> explaining the &#8220;projections&#8221; that are central to a Vertica database. This is important, because in Vertica projections play the roles that in many analytic DBMS might be filled by base tables, indexes, AND materialized views. Highlights include:</p>
<ul>
<li>A Vertica projection can contain:
<ul>
<li>All the columns in a table.</li>
<li>Some of the columns in a table.</li>
<li>A prejoin among tables.</li>
</ul>
</li>
<li>Vertica projections are updated and maintained just as base tables are. (I.e., there&#8217;s no kind of batch lag.)</li>
<li>You can import the same logical schema you use elsewhere. Vertica puts no constraints on your logical schema. <em>Note: Vertica has been claiming good support for all logical schemas since <a href="http://www.dbms2.com/2010/02/22/vertica-4/">Vertica 4.0</a> came out in early 2010.</em></li>
<li>Vertica (the product) will automatically generate a physical schema for you &#8212; i.e. a set of projections &#8212; that Vertica (the company) thinks will do a great job for you. <em>Note: That also dates back to <a href="http://www.dbms2.com/2010/02/22/vertica-4/">Vertica 4.0</a>.</em></li>
<li>Vertica claims that queries are very fast even when you haven&#8217;t created projections explicitly for them. <em>Note: While the extent to which this is true may be a matter of dispute, competitors clearly overreach when they make assertions like &#8220;every major Vertica query needs a projection prebuilt for it.&#8221;</em></li>
<li>On the other hand, it is advisable to build projections (automatically or manually) that optimize performance of certain parts of your query load.</li>
</ul>
<p>The blog posts contain a lot more than that, of course, both rah-rah and technical detail, including reminders of other Vertica advantages (compression, no logging, etc.). If you&#8217;re interested in analytic DBMS, they&#8217;re worth a look.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/07/vertica-projections/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Data management at Zynga and LinkedIn</title>
		<link>http://www.dbms2.com/2011/09/05/zynga-linkedin-data-warehous/</link>
		<comments>http://www.dbms2.com/2011/09/05/zynga-linkedin-data-warehous/#comments</comments>
		<pubDate>Mon, 05 Sep 2011 08:49:04 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Zynga]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5159</guid>
		<description><![CDATA[Mike Driscoll and his Metamarkets colleagues organized a bit of a bash Thursday night. Among the many folks I chatted with were Ken Rudin of Zynga, Sam Shah of LinkedIn, and D. J. Patil, late of LinkedIn. I now know more about analytic data management at Zynga and LinkedIn, plus some bonus stuff on LinkedIn&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>Mike Driscoll and his <a href="http://www.metamarketsgroup.com/">Metamarkets</a> colleagues organized a bit of a <a href="http://yfrog.com/h8msmkqj">bash</a> Thursday night. Among the many folks I chatted with were Ken Rudin of Zynga, Sam Shah of LinkedIn, and D. J. Patil, late of LinkedIn. I now know more about analytic data management at Zynga and LinkedIn, plus some bonus stuff on LinkedIn&#8217;s People You May Know application. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>It&#8217;s blindingly obvious that Zynga is one of <a href="../../../../../2011/06/20/columnar-dbms-vendor-customer-metrics/">Vertica&#8217;s petabyte-scale customers</a>, given that Zynga sends 5 TB/day of data into Vertica, and keeps that data for about a year. (Zynga may retain even more data going forward; in particular, Zynga regrets ever having thrown out the first month of data for any game it&#8217;s tried to launch.) This data consists mostly of game actions rather than log files; true logs generally go into Splunk.</p>
<p><em>I don&#8217;t know whether the missing data is completely thrown away, or just stashed on inaccessible tapes somewhere.</em></p>
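<p>As a rough check on the petabyte-scale arithmetic, here is a minimal sketch. The 5 TB/day and one-year figures come from the paragraph above; the decimal units, the 365-day window, and the lack of any allowance for compression or replication are assumptions made only for illustration.</p>

```python
# Back-of-envelope check of the "petabyte-scale" claim.
# Inputs from the post: about 5 TB/day ingested, kept for about a year.
# Assumptions (not from the post): decimal units (1 PB = 1000 TB), a 365-day
# retention window, and no allowance for compression or replication.

TB_PER_DAY = 5
RETENTION_DAYS = 365
TB_PER_PB = 1000


def retained_petabytes(tb_per_day, retention_days):
    """Return the raw volume retained, in petabytes."""
    return tb_per_day * retention_days / TB_PER_PB


if __name__ == "__main__":
    print(retained_petabytes(TB_PER_DAY, RETENTION_DAYS))
```

<p>Under those assumptions the retained raw volume comes out somewhat under 2 petabytes.</p>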
<p>I found two aspects of the Zynga story particularly interesting. First, those 5 TB/day are going straight into Vertica (from, I presume, <a href="http://www.dbms2.com/2010/08/18/nosql-hvsp-adoption/">memcached/Membase/Couchbase</a>), as Zynga decided that sending the data to some kind of log first was more trouble than it&#8217;s worth. Second, there&#8217;s Zynga&#8217;s approach to analytic database design. Highlights of that include: <span id="more-5159"></span></p>
<ul>
<li>Data is divided into two parts. One part has a pretty ordinary schema; the other is just stored as a huge list of name-value pairs. (This is much like <a href="../../../../../2010/10/06/ebay-followup-greenplum-out-teradata-10-petabytes-hadoop-has-some-value-and-more/">eBay</a>&#8217;s approach with its Teradata-based Singularity, except that eBay puts the name-value pairs into long character strings.) About half the data is in each part, but I don&#8217;t think that&#8217;s by deliberate choice.</li>
<li>Zynga adds data into the real schema when it&#8217;s clear it will be needed for a while. This isn&#8217;t a matter of query volumes, for the most part; rather, it&#8217;s when Zynga&#8217;s tests (e.g. of new games?) have determined that the data will keep being collected and used for a while.</li>
<li>Zynga only adds columns to its analytic database; it never goes through the more complex process of deleting them.</li>
</ul>
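<p>The two-part design in the list above can be sketched roughly as follows. This is a hypothetical illustration only: the class, method, and attribute names are invented here, and nothing in it reflects the actual Zynga schema or any Vertica API. In particular, copying old name-value pairs into a newly added column is a choice made for the sketch, not something reported above.</p>

```python
# Hypothetical sketch of a two-part event store: a fixed set of real columns
# plus a catch-all list of name-value pairs. All names are illustrative.

class HybridEventStore:
    def __init__(self, columns):
        self.columns = list(columns)   # the "real schema"; it only ever grows
        self.rows = []                 # one dict per event, keyed by column
        self.pairs = []                # (event_id, name, value) triples

    def insert(self, event_id, attributes):
        """Route known attributes to columns and everything else to pairs."""
        row = {"event_id": event_id}
        for name, value in attributes.items():
            if name in self.columns:
                row[name] = value
            else:
                self.pairs.append((event_id, name, value))
        self.rows.append(row)

    def promote(self, name):
        """Add a column for an attribute that will be needed for a while.

        Existing name-value pairs for that attribute are copied into the
        new column (an assumption of this sketch). Columns are never dropped.
        """
        if name in self.columns:
            return
        self.columns.append(name)
        by_id = {row["event_id"]: row for row in self.rows}
        remaining = []
        for event_id, pair_name, value in self.pairs:
            if pair_name == name:
                by_id[event_id][name] = value
            else:
                remaining.append((event_id, pair_name, value))
        self.pairs = remaining


if __name__ == "__main__":
    store = HybridEventStore(["user_id", "game"])
    store.insert(1, {"user_id": 42, "game": "farm", "crop": "corn"})
    store.promote("crop")
    store.insert(2, {"user_id": 7, "game": "farm", "crop": "wheat"})
    print(store.columns, store.rows, store.pairs)
```

<p>The sketch deliberately has no way to remove a column, mirroring the add-only policy described above.</p>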
<p>Just as Zynga is one of Vertica&#8217;s flagship accounts, LinkedIn is one of Aster Data&#8217;s. Specifically, before leaving LinkedIn for Aster, Jonathan Goldman built LinkedIn&#8217;s People You May Know feature in Aster nCluster. This was long ago, and I&#8217;m not sure how sophisticated his use of <a href="../../../../../2009/03/07/three-greenplum-customers-applications-of-mapreduce/">SQL and MapReduce</a> would be in today&#8217;s terms; for example, I was told he didn&#8217;t use &#8220;nPath or anything like that.&#8221; <em>(Edit: See the comments below for clarifications from Jonathan.) </em>Anyhow, LinkedIn has replaced Aster for PYMK with Hadoop, and in my opinion is getting much better results.</p>
<p>That, from an Aster standpoint, is the bad news. The good news is that LinkedIn is happily using Aster nCluster for several other applications; LinkedIn folks don&#8217;t seem to regret throwing out* Greenplum for Aster; and they also seem to have a very high opinion of Jonathan and his work while he was there.</p>
<p><em>*And <a href="http://www.dbms2.com/2010/10/06/ebay-followup-greenplum-out-teradata-10-petabytes-hadoop-has-some-value-and-more/">this time</a> that is indeed the phrase that was used. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </em></p>
<p>One thing that astonished me is that LinkedIn PYMK is based only on data innate to LinkedIn (as opposed to imported email addresses, the results of web crawls, and so on). Given that, I am at a loss to explain how it suggested a couple of old friends, to whom I have no discernible chain of connection. Yes, we were at Harvard at the same time, but if that&#8217;s all it was, there would be a huge number of false positives I&#8217;m not actually seeing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/05/zynga-linkedin-data-warehous/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>HP/Autonomy sound bites</title>
		<link>http://www.dbms2.com/2011/08/18/hp-autonomy-vertica/</link>
		<comments>http://www.dbms2.com/2011/08/18/hp-autonomy-vertica/#comments</comments>
		<pubDate>Fri, 19 Aug 2011 02:09:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Structured documents]]></category>
		<category><![CDATA[Text]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5101</guid>
		<description><![CDATA[HP has announced that: HP is buying Autonomy. HP is pulling back from WebOS. HP may spin off its PC business altogether. On a high level, this means: HP is doubling down on enterprise IT. HP is taking a more software-centric approach to the enterprise IT business. HP is backing away from the consumer electronics [...]]]></description>
			<content:encoded><![CDATA[<p>HP has announced that:</p>
<ul>
<li>HP is buying Autonomy.</li>
<li>HP is pulling back from WebOS.</li>
<li>HP may spin off its PC business altogether.</li>
</ul>
<p>On a high level, this means:</p>
<ul>
<li>HP is doubling down on enterprise IT.</li>
<li>HP is taking a more software-centric approach to the enterprise IT business.</li>
<li>HP is backing away from the consumer electronics business.</li>
<li>HP in particular is backing away from the generic desktop/laptop PC business, which may with only moderate exaggeration be regarded as:
<ul>
<li>The intersection of the enterprise IT and consumer electronics businesses.</li>
<li>The least attractive sector of each.</li>
</ul>
</li>
</ul>
<p><a href="http://www.texttechnologies.com/category/vendors/autonomy/">My coverage of Autonomy</a> isn&#8217;t exactly current, but I don&#8217;t know of anything that contradicts long-time competitor* Dave Kellogg&#8217;s <a href="http://kellblog.com/2011/08/18/hp-rumored-to-be-buying-uks-autonomy-for-10b/">skeptical view of Autonomy</a>. Autonomy is a collection of businesses involved in the management, search, and retrieval of <a href="../../../../../2011/05/17/poly-structured-database/">poly-structured data</a>, in some cases with strong market share, but even so not necessarily with the strongest of reputations for technology or technology momentum. Autonomy started from a text search engine and a Bayesian search algorithm on top of that, which did a decent job for many customers. But if there&#8217;s been much in the way of impressive enhancement over the past 8-10 years, I&#8217;ve missed the news.</p>
<p><em>*Dave, of course, was CEO of MarkLogic.</em></p>
<p>Questions obviously arise about how the Autonomy acquisition relates to other HP businesses. My early thoughts include:  <span id="more-5101"></span></p>
<ul>
<li>HP has clearly signaled that it intends to pursue and focus on the data management business. Thus, we can anticipate marketing messages spanning Autonomy and <a href="../../../../../2011/06/20/vertica-release-5/">Vertica</a>. It may be helpful to recall that Vertica plays nicely with both <a href="../../../../../2010/10/12/vertica-hadoop-connector-integration/">Hadoop</a> and <a href="../../../../../2011/04/14/attensity-update/">Attensity</a>.</li>
<li>The first two natural tuck-in acquisitions I can think to add are Attensity and <a href="../../../../../2011/04/05/whither-marklogic/">MarkLogic</a>.</li>
<li>One place I&#8217;d look for synergy is with HP&#8217;s system management software business. HP has previously acquired its way into a strong position there. If you add in knowledge of how many kinds of data are used, you have a chance to set yourself apart in the system management area.</li>
<li>I had enough trouble advising Vertica about how to explain what they do in terms that HP&#8217;s hardware sales force can comfortably embrace. I think I did OK with that. But Autonomy? Youch. On the other hand, &#8230;</li>
<li>&#8230; HP is run by guys from SAP (Leo Apotheker) and Oracle (Ray Lane), both of whom have dealt with similarly tough sales challenges before. But even at best, HP&#8217;s sales force organization, commission structure, and training are going to consume a lot of attention at the very highest levels of HP.</li>
<li>Autonomy manages documents electronically. HP prints them. The markets where that seems synergistic, however, are fairly specialized or small. (E.g., equipment for printing on demand.) Perhaps there&#8217;s some grand joint venture possibility with Xerox here, antitrust permitting.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/08/18/hp-autonomy-vertica/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Hadoop hardware and compression</title>
		<link>http://www.dbms2.com/2011/07/06/hadoop-hardware-and-compression/</link>
		<comments>http://www.dbms2.com/2011/07/06/hadoop-hardware-and-compression/#comments</comments>
		<pubDate>Wed, 06 Jul 2011 05:09:10 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Zettaset]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4899</guid>
		<description><![CDATA[A month ago, I posted about typical Hadoop hardware. After talking today with Eric Baldeschwieler of Hortonworks, I have an update. I also learned some things from Eric and from Brian Christian of Zettaset about Hadoop compression. First the compression part. Eric thinks 6-10X compression is common for &#8220;curated&#8221; Hadoop data &#8212; i.e., the data [...]]]></description>
			<content:encoded><![CDATA[<p>A month ago, I posted about <a href="../../../../../2011/06/04/hardware-for-hadoop/">typical Hadoop hardware</a>. After talking today with Eric Baldeschwieler of Hortonworks, I have an update. I also learned some things from Eric and from Brian Christian of Zettaset about Hadoop compression.</p>
<p>First the compression part. Eric thinks 6-10X compression is common for &#8220;curated&#8221; Hadoop data &#8212; i.e., the data that actually gets used a lot. Brian used an overall figure of 6-8X, and told of a specific customer who had 6X or a little more. By way of comparison, it sounds as if the kinds of data involved are like what <a href="../../../../../2008/09/24/vertica-finally-spells-out-its-compression-claims/">Vertica claimed 10-60X compression</a> for almost three years ago.</p>
<p>Eric also made an excellent point about low-value <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>. I was suggesting that as Moore&#8217;s Law made sensor networks ever more affordable:  <span id="more-4899"></span></p>
<ul>
<li>There would be lots more data thrown off.</li>
<li>A lot of it would be repetitive &#8220;I&#8217;m fine; nothing to report&#8221; kinds of events.</li>
<li>It would be a good idea to filter this low-value information out rather than permanently storing it.</li>
</ul>
<p>Eric retorted that such data compresses extremely well. He was, of course, correct. If you have a long sequence or other large amount of identical data, and the right compression algorithms* &#8212; yeah, that compresses really well.</p>
<p><em>*Think run-length encoding (RLE), delta, or tokenization with variable-length tokens.</em></p>
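<p>To make the point concrete, here is a minimal run-length encoding sketch. It is illustrative only: the sample readings are made up, and real Hadoop or DBMS codecs are considerably more elaborate than this.</p>

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of identical consecutive values into (value, count)."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


# Made-up sensor feed: mostly "nothing to report", with one exception.
readings = ["OK"] * 1000 + ["ALERT"] + ["OK"] * 500

runs = rle_encode(readings)
# 1,501 readings collapse to three runs:
# [("OK", 1000), ("ALERT", 1), ("OK", 500)]
```

<p>The longer the stretches of identical events, the fewer runs there are to store, which is why repetitive machine-generated data can be cheap to keep.</p>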
<p>While I was at it, I asked Eric what might be typical for Hadoop temp/working space. He said at Yahoo it was getting down to 1/4 of the disk, from a previous figure of 1/3.</p>
<p>Anyhow, Yahoo&#8217;s most recent standard Hadoop nodes feature:</p>
<ul>
<li>8-12 cores</li>
<li>48 gigabytes of RAM</li>
<li>12 disks of 2 or 3 TB each</li>
</ul>
<p>If you divide 12 by 3 for standard Hadoop redundancy, and take off 1/4 for temp space, then you have 6-9 TB/node. Multiply that by a compression factor of 6-10X, at least for the &#8220;curated data,&#8221; and you get to 36-90 TB of user data per node.</p>
<p>As an alternative, suppose we take a point figure from <a href="http://www.dbms2.com/2011/06/04/hardware-for-hadoop/">Cloudera&#8217;s ranges</a> of 16 TB of spinning disk per node (8 spindles, 2 TB/disk). Divide by 3 for redundancy as before. Go with the 6X compression figure. Lop off 1/3 for temp space. That more conservative calculation leaves us a bit over 20 TB/node, which is probably a more typical figure among today&#8217;s Hadoop users.</p>
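<p>Both back-of-the-envelope calculations can be checked with a few lines of arithmetic. This sketch assumes 3-way replication in both cases (the Cloudera-based figure of a bit over 20 TB only works out that way), and the function name <code>user_data_per_node</code> is just a label for this example.</p>

```python
def user_data_per_node(raw_tb, replication, temp_fraction, compression):
    """Estimate TB of uncompressed user data one Hadoop node can hold."""
    # Raw disk, divided by the replication factor, minus temp/working space.
    usable_tb = raw_tb / replication * (1 - temp_fraction)
    # Compression lets each usable TB hold that many TB of user data.
    return usable_tb * compression


# Yahoo-style node: 12 disks of 2 or 3 TB, 1/4 temp space, 6-10X compression.
yahoo_low = user_data_per_node(12 * 2, 3, 0.25, 6)    # 36.0
yahoo_high = user_data_per_node(12 * 3, 3, 0.25, 10)  # 90.0

# Cloudera-style point figure: 8 spindles of 2 TB, 1/3 temp space, 6X compression.
cloudera = user_data_per_node(8 * 2, 3, 1 / 3, 6)     # roughly 21.3
```

<p>The spread between roughly 21 TB and 90 TB per node comes from every input moving at once: disk size, temp space, and the assumed compression factor.</p>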
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/06/hadoop-hardware-and-compression/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Eight kinds of analytic database (Part 2)</title>
		<link>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/</link>
		<comments>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 08:18:18 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Buying processes]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Rainstor]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[SenSage]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4867</guid>
		<description><![CDATA[In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I&#8217;ll cover four more kinds of analytic database &#8212; even newer, for the most part, with a use case/product short list [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/">Part 1</a> of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I&#8217;ll cover four more kinds of analytic database &#8212; even newer, for the most part, with a use case/product short list match that is even less clear.  <span id="more-4867"></span></p>
<p><strong><em>Bit bucket</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included: </em>Logs, other technical/external</li>
<li><em>Likely use styles:</em> Staging/ETL, investigative</li>
<li><em>Canonical example: </em>Log files in a Hadoop cluster<em> </em></li>
<li><em>Stresses:</em> TCO, scale-out, transform/big-query performance, ETL functionality</li>
</ul>
<p>With the explosion of <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a> has come the need for a place to put it all, sometimes called the <a href="../../../../../2011/06/04/dirty-data-stored-dirt-cheap/">big bit bucket</a>. This is like the investigative data mart for big databases, but more <a href="../../../../../2011/05/17/poly-structured-database/">poly-structured</a>. In some cases it is focused on data staging and transformation; but it can also be used for analysis in place.</p>
<p>The list of candidate technologies to run your bit bucket starts with Hadoop and Splunk.</p>
<p><strong><em>Archival data store</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included: </em>Operational, CDR (call detail record), security log</li>
<li><em>Likely use styles:</em> Archival, reporting (for compliance), possibly also investigative</li>
<li><em>Examples:</em> Any long-term detailed historical store</li>
<li><em>Stresses: </em>TCO, compression, scale-out, performance (if multi-use)<em> </em></li>
</ul>
<p>Analytic DBMS vendors have been insulting each other with the claim &#8220;that&#8217;s just an archival data store,&#8221; dating back at least to the first time Greenplum was deployed on an underpowered Sun Thumper system. Perhaps only <a href="../../../../../2010/06/11/rainstor-update/">Rainstor</a> truly embraces the archival positioning, and I&#8217;ve become pretty dubious about their technical claims and their company alike.</p>
<p>Still, there&#8217;s a legitimate need for data stores &#8212; especially relational analytic DBMS &#8212; that:</p>
<ul>
<li>Store data cheaply, with high rates of compression.</li>
<li>Have decent performance if you do want to query the data.</li>
<li>May have archiving/compliance-specific features as well.</li>
</ul>
<p>Along with Rainstor, SAND and SenSage have at least partially targeted that use case. In addition, appliance vendors such as Teradata and Netezza try to have an archive-oriented product version in their lineups.</p>
<p><strong><em>Outsourced data mart</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All</li>
<li><em>Likely use styles:</em> Traditional BI, investigative analytics, staging/ETL</li>
<li><em>Examples:</em> Advertising tracking, SaaS CRM</li>
<li><em>Stresses:</em> Performance, TCO, reliability, concurrency</li>
</ul>
<p>Much of what happens in analytic database management can also be outsourced. Some applications that run via SaaS (Software as a Service) are analytic. I&#8217;ve had three different clients whose main business is picking marketing targets in various vertical segments; others who wanted to add analytics to what were historically OLTP applications; and others yet who just offered online business intelligence. Also, if your fundamental business is gathering data and reselling it to a variety of user organizations, that&#8217;s an analytic data management challenge. The possibilities expand from there.</p>
<p>Data outsourcers are in the IT business, and so their IT development is &#8212; hopefully! &#8212; more serious and less politically encumbered than at many conventional enterprises. Thus, legacy systems and master data management issues are commonly less prevalent, or at least more aggressively disposed of. The same, up to a point, goes for vendor politics.*  <a href="../../../../../2011/06/26/what-to-think-about-before-you-make-a-technology-decision/">Multitenancy</a> is commonly an issue, as is running in the cloud.<em> </em></p>
<p><em>*Even so, there&#8217;s often That Guy who doesn&#8217;t want to migrate away from Oracle, no matter what.<strong> </strong></em></p>
<p>Vertica gets the nod in a number of these cases; it&#8217;s cloud-friendly, and often the problem is naturally columnar. Other columnar products can be good choices too, with added brownie points for Infobright if the shop is MySQL-oriented anyway. Running Netezza or other appliances makes sense mainly if you&#8217;re pretty sure you want to keep operating your own data centers, but some data outsourcers are just fine with that assumption.</p>
<p><strong><em>Operational analytic(s) server</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> Customer-centric, log, financial trade</li>
<li><em>Likely use styles:</em> Advanced operational analytics</li>
<li><em>Examples:</em>
<ul>
<li>Lower latency: Web or call-center personalization, anti-fraud</li>
<li>Higher latency: Customer profiling, Basel 3 risk analysis</li>
</ul>
</li>
<li><em>Stresses:</em> Performance, reliability, analytic functionality, perhaps concurrency</li>
</ul>
<p>Even with eight different choices, I need a &#8220;catch-all&#8221; category; this is it.</p>
<p>Suppose you want to do reasonably sophisticated analytics, then use the results in operations. This is the classical challenge in <a href="../../../../../2011/03/30/short-request-and-analytic-processing/">integrating short-request and analytic processing</a>. There are multiple ways to tackle it, embodying different trade-offs in cost, convenience, or analytic accuracy. If the platform on which you want to run your investigative analytics also has the reliability and concurrency appropriate for mission-critical operations, you&#8217;re set. Otherwise, you may want to pipe <a href="../../../../../2010/11/29/data-that-is-derived-augmented-enhanced-adjusted-or-cooked/">derived data</a> into a more &#8220;industrial-strength&#8221; DBMS, ideally the one that runs your operational apps anyway.</p>
<p>Another option is to integrate a limited amount of analytics immediately into your short-request processing system. For example, as bad as they are at the kinds of queries that require joins, NoSQL systems are often fast at simple aggregations. As MapReduce/NoSQL integrations mature, that option may not require pumping the data anywhere else for deeper analytics; even if it does, at least you&#8217;re starting out with the data in a convenient bit bucket.</p>
<p>Streaming/CEP-centric architectures could come into play as well. And it goes on from there. The possibilities in this last category are just too varied to generalize about.</p>
<p><em>So did I get them all? Or are there yet other analytic data management use cases that don&#8217;t fit into my eight categories?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

