<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Columnar database management</title>
	<atom:link href="http://www.dbms2.com/category/database-theory-practice/columnar-database-management/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 22:51:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Comments on the analytic DBMS industry and Gartner&#8217;s Magic Quadrant for same</title>
		<link>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/</link>
		<comments>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 17:17:32 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Exasol]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[illuminate Solutions]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5926</guid>
		<description><![CDATA[This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the 2010, 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying: In general, I regard Gartner Magic [...]]]></description>
			<content:encoded><![CDATA[<p>This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the <a href="http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/">2010</a>, <a href="../../../../../2010/02/10/gartner-magic-quadrant-data-warehouse-2009-2010/">2009</a>, <a href="../../../../../2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">2008</a>, <a href="../../../../../2007/10/19/gartner-2007-magic-quadrant-for-data-warehouse-database-management-systems/">2007</a>, and <a href="../../../../../2006/10/03/vendor-segmentation-for-data-warehouse-dbms/">2006</a> Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying:</p>
<ul>
<li>In general, I regard Gartner Magic Quadrants as a bad use of good research.</li>
<li>A large majority of the vendors covered are lined up near the line x = y, each outpacing the one below in both of the quadrant&#8217;s dimensions. That illustrates the uselessness of &#8212; or at least poor execution on &#8212; the overall quadrant metaphor.</li>
<li>I find fewer specifics to disagree with in this Gartner Magic Quadrant than in previous years&#8217; versions. Two factors jump to mind as possible reasons:
<ul>
<li>This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is somewhat less ambitious than others; while it gives as much company detail as its predecessors, it doesn&#8217;t add as much discussion of overall trends. So there&#8217;s less to (potentially) disagree with.</li>
<li><a href="http://www.dbms2.com/2010/12/28/evolving-definitions-and-technology-categories-for-2011/">Merv Adrian is now at Gartner</a>.</li>
</ul>
</li>
<li>Whatever the problems may be with Gartner&#8217;s approach, the whole thing comes out better than do <a href="http://www.dbms2.com/2011/02/11/comments-on-the-2011-forrester-wave-for-enterprise-data-warehouse-platforms/">Forrester&#8217;s failed imitations</a>.</li>
</ul>
<p><em>*At the time of this posting, I don&#8217;t yet have a link. However, I expect that to change quickly, and I plan to edit this paragraph accordingly. If nothing else, I hope people will drop links into the comment thread. </em></p>
<p>Specific company comments, following the approximate single-dimensional rank ordering implicit in Gartner&#8217;s quadrant, include: <span id="more-5926"></span></p>
<ul>
<li>The Gartner Magic Quadrant&#8217;s comments on Teradata seem pretty fair. I don&#8217;t think I&#8217;m much in disagreement when I say:
<ul>
<li>Teradata has the richest, most mature analytic DBMS offering.</li>
<li>Teradata has an outstanding track record both for <a href="http://www.dbms2.com/2011/09/24/confusion-about-teradatas-big-customers/">managing large data volumes</a> and for high-concurrency mixed workloads.</li>
<li>Aster Data was a cool Teradata acquisition, even if Teradata/Aster synergies or integration have been nominal to date.</li>
<li>Teradata still needs to get out of its own way in marketing, positioning, packaging, and/or defining its premium-priced system vs. its more moderately priced alternatives. Indeed, as necessary as that tiered product line may have been for fending off encroachments by Netezza and others, what Teradata really needs to do is evolve to a more pick-your-own-node-combination, mix-and-match kind of offering.</li>
</ul>
</li>
<li>Gartner has talked with a lot of Oracle Exadata users who say that the product works; Gartner has also stopped beating Oracle up for <a href="http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/">its previous policy of almost never doing onsite POCs (Proofs of Concept)</a>; both parts of that ring true with me. But Gartner also rightly dings Oracle for various issues in cost and cumbersomeness. Overall, while I agree there are organizations for which Oracle should indeed be a top-ranked choice, there are many others who shouldn&#8217;t put Oracle on their short list.</li>
<li>Third in the Gartner MQ rankings is IBM.
<ul>
<li>Gartner gets so caught up in reciting the names of various IBM product offerings that it neglects to say much good about DB2 itself. (I tend to have a similar problem.)</li>
<li>But Gartner does mention concurrency as a strength. I agree, especially if we presume that that was a reference to DB2 rather than Netezza.</li>
<li>Gartner cites Netezza&#8217;s post-acquisition annual growth rate as 30%. Gartner seems to think this is a good number. I disagree, but in Netezza&#8217;s defense, it has had to endure IBM&#8217;s post-acquisition on-boarding process.</li>
</ul>
</li>
<li>Arguably fourth in the Gartner Data Warehouse Magic Quadrant rankings is EMC/Greenplum.
<ul>
<li>In general, Gartner likes the taste of Greenplum Kool-Aid.</li>
<li>Gartner neglects to ding Greenplum for concurrency challenges, which I view as an oversight given Gartner&#8217;s general stress on that area.</li>
<li>Gartner does ding Greenplum for support challenges.</li>
<li>Gartner neglects to praise Greenplum for true <a href="http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/">hybrid row/columnar data management</a>, a feature shared by <a href="http://www.dbms2.com/2011/09/22/teradata-columnar-compression/">Teradata</a> and <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/">Vertica</a>, among others, but not by <a href="http://www.dbms2.com/2011/02/06/columnar-compression-database-storage/">Oracle</a>, DB2, or Netezza.</li>
<li>Gartner located a half-petabyte Greenplum database. This doesn&#8217;t surprise me, even though Greenplum has frequently made exaggerated claims about large-size database successes in the past.</li>
<li>Gartner reports a &gt;400 figure for Greenplum customers, which is plausible.</li>
</ul>
</li>
<li>In its first deviation from strict one-dimensional rank ordering, the Gartner Magic Quadrant ranks Sybase ahead of Greenplum in completeness of vision but behind in &#8220;ability to execute&#8221;.
<ul>
<li>If that were the other way around, it might make more sense. Greenplum promises anything and everything you might ever want for analytic data management or the associated analysis; but Sybase has vastly more analytic DBMS users than Greenplum does, running a variety of demanding workloads.</li>
<li>Gartner appears to think that Sybase IQ requires less database administration than I think it does.</li>
<li>Gartner seems concerned that SAP will position HANA and Sybase ASE as, between them, the only DBMS you&#8217;ll ever need, casting doubt on Sybase IQ&#8217;s future. I wouldn&#8217;t worry about that if you have a problem you want to solve today.</li>
</ul>
</li>
<li>The Gartner Magic Quadrant for Data Warehouse Database Management Systems ranks Microsoft sixth overall, despite noting that there isn&#8217;t a single production reference for Microsoft&#8217;s Parallel Data Warehouse. In support of this ranking, it cites, for example, the compression feature, which distinguishes Microsoft SQL Server from no other product on the list except Kognitio. If you have such an undemanding data warehousing problem that many different analytic DBMS could meet your needs, there&#8217;s a good chance Microsoft SQL Server can also do the job; and if you&#8217;ve bought into the Microsoft technology stack, you might as well keep going down that path. Otherwise, I don&#8217;t know why somebody should adopt Microsoft&#8217;s offering at this time.</li>
<li>Seventh along the main diagonal path in the Gartner Magic Quadrant is HP Vertica. I&#8217;d rank Vertica higher than that, but in fairness I note two execution concerns. First, HP has a lousy track record, both in acquisitions and in data warehousing/analytics. Second, Vertica is bad about answering my email. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Anyhow, Gartner doesn&#8217;t seem to have given Vertica credit either for <a href="http://www.dbms2.com/2011/06/20/columnar-dbms-vendor-customer-metrics/">its full customer count or for the multiple petabyte-scale databases Vertica runs</a>.</li>
<li>1010data is an outlier, with Gartner noting that it only partly fits in with other &#8220;Data Warehousing Database Management&#8221; companies, and hence kind of confessing that including 1010data in the Magic Quadrant is somewhat arbitrary. Stuff like that is bound to happen, given <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">the inherent difficulties of defining market categories</a>. Anyhow, my thoughts on 1010data include:
<ul>
<li>I&#8217;m nervous about the fact that 1010data doesn&#8217;t actually control its own DBMS technology, but rather relies on old code from the small private company KX Systems.</li>
<li>There are three main reasons to consider 1010data:
<ul>
<li>You want to enter the data mart outsourcing business in a casual way, and you like its SaaS offering.</li>
<li>You want to engage in <a href="http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/">stakeholder-facing analytics</a> in a casual way, and you like its SaaS offering.</li>
<li>You love 1010data&#8217;s particular set of interactive analytic features and performance.</li>
</ul>
</li>
</ul>
</li>
<li>Back to the main path winding along the Gartner Magic Quadrant main diagonal &#8212; next up is ParAccel. While I question some of the peripheral comments, I agree with Gartner&#8217;s core messages that:
<ul>
<li>ParAccel, the product, is blazingly fast in certain use cases.</li>
<li>ParAccel, the company, is dangerously small.</li>
</ul>
</li>
<li>Eighth on the Gartner MQ&#8217;s main path is Kognitio. This is too high. Kognitio positions itself as offering an in-memory DBMS, yet stubbornly refuses to do any kind of data compression. That&#8217;s an awful combination of choices, since compression is a key way to fit more data into a given amount of RAM. As for using Kognitio&#8217;s data warehousing SaaS offering &#8212; why would you do that, when more modern products are available on a SaaS/cloud basis as well?</li>
<li>Ninth in the Gartner Magic Quadrant main rankings is SAND.
<ul>
<li>The SAND section is not a triumph of Gartner accuracy. For example:
<ul>
<li><a href="http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/">Gartner completely missed the errors in SAND&#8217;s reported customer counts</a>.</li>
<li>Gartner refers to SAND as being &#8220;in existence for approximately nine years&#8221;, which is too low by at least a factor of 2.</li>
<li>Gartner says &#8220;SAND is a privately held company&#8221;, even though <a href="http://itmarketstrategy.com/2009/06/07/sand-technology-a-risky-bet/">Merv knows better than that</a>.</li>
</ul>
</li>
<li>Otherwise, Gartner&#8217;s opinion on SAND seems to boil down to &#8220;Interesting technology and ideas, but dangerously small company.&#8221; I agree.</li>
</ul>
</li>
<li>Tenth and too low in the Gartner MQ main rankings is Infobright.
<ul>
<li>At least by some metrics (e.g. customer count), Infobright isn&#8217;t as dangerously small as ParAccel, SAND, Kognitio, et al.</li>
<li>That said, Infobright is small and focused on <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>. So I wouldn&#8217;t be confident in Infobright&#8217;s future technology path for human-generated data use cases.</li>
<li>Infobright&#8217;s performance is uneven &#8212; blazing in cases where the Knowledge Grid helps, but not necessarily stellar by analytic DBMS standards when full table scans are called for.</li>
<li>I agree with Gartner that the possibility of future Oracle/MySQL shenanigans is a concern. But while the energy behind MySQL forking efforts doesn&#8217;t seem too great right now, I&#8217;d expect those efforts to revive and offer a successful escape path if Oracle seemed about to play hardball.</li>
<li>Also, given that it&#8217;s already an open source vendor, there are various kinds of assurances Infobright could give that would also help alleviate customer concerns.</li>
</ul>
</li>
<li>Actian, formerly Ingres, took a big tumble in Gartner&#8217;s rankings versus last year, when I simply wrote &#8220;<a href="http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/">What Gartner said in connection with <strong>Ingres</strong> is too inaccurate to deserve detailed attention</a>.&#8221; I&#8217;m even a little harsher about <a href="http://www.dbms2.com/2011/09/25/ingres-actian/">Ingres/Actian&#8217;s DBMS products and prospects</a> than Gartner is, but at least now we&#8217;re in the same ballpark.</li>
<li>Along with Infobright, ParAccel, and SAND, <a href="http://www.dbms2.com/2011/11/12/exasol-update/">Exasol</a> appears to be another of the &#8220;good columnar technology/small company&#8221; crowd. As with other such products, one should be careful about fit-and-finish features that are missing today, as there is no assurance they&#8217;ll be added in a timely manner going forward.</li>
<li>illuminate Solutions, which was on last year&#8217;s Gartner list, <a href="http://www.dbms2.com/2012/01/16/has-illuminate-solutions-joined-the-choir-invisible/">now appears to be an ex-company</a>.</li>
</ul>
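<p>To make the Knowledge Grid point above a bit more concrete, here is a minimal Python sketch of the general idea behind block-level metadata: keep a min and max per block of a column, skip blocks the metadata rules out, count blocks the metadata fully covers without reading them, and scan only the rest. It is a hypothetical illustration, not Infobright&#8217;s implementation; the block size, function names, and data are assumptions chosen for readability.</p>
<pre>
```python
from dataclasses import dataclass

# Hypothetical sketch of min/max block metadata for pruning column scans.
# Illustrates the general idea only; it is not Infobright's Knowledge Grid code.

BLOCK_ROWS = 4  # tiny for readability; real systems use much larger blocks


@dataclass
class BlockMeta:
    lo: int  # smallest value in the block
    hi: int  # largest value in the block


def build_blocks(values, block_rows=BLOCK_ROWS):
    """Split a column into fixed-size blocks and record min/max for each one."""
    blocks = [values[i:i + block_rows] for i in range(0, len(values), block_rows)]
    return blocks, [BlockMeta(min(b), max(b)) for b in blocks]


def count_in_range(blocks, metas, lo, hi):
    """Count rows with lo <= value <= hi.

    Returns (matching_rows, blocks_actually_scanned).
    """
    count = 0
    scanned = 0
    for block, meta in zip(blocks, metas):
        if meta.hi < lo or meta.lo > hi:
            continue  # metadata proves no row can match: skip the block
        if lo <= meta.lo and meta.hi <= hi:
            count += len(block)  # metadata proves every row matches: no read needed
            continue
        scanned += 1  # metadata is inconclusive: read the block's values
        count += sum(1 for v in block if lo <= v <= hi)
    return count, scanned


# Ordered data (e.g. timestamps): metadata alone answers the query.
print(count_in_range(*build_blocks(list(range(1, 13))), 5, 8))  # (4, 0)

# Shuffled data: every block is inconclusive, so all three get scanned.
shuffled = [9, 1, 12, 5, 3, 10, 6, 11, 2, 8, 4, 7]
print(count_in_range(*build_blocks(shuffled), 5, 8))  # (4, 3)
```
</pre>
<p>The first call shows why this kind of scheme can shine on machine-generated data that arrives roughly in order; the second shows how, when values are scattered, every block becomes a candidate and the query degenerates into a full scan.</p>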
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Some big-vendor execution questions, and why they matter</title>
		<link>http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/</link>
		<comments>http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 11:01:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cognos]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5704</guid>
		<description><![CDATA[When I drafted a list of key analytics-sector issues in honor of look-ahead season, the first item was &#8220;execution of various big vendors&#8217; ambitious initiatives&#8221;.  By &#8220;execute&#8221; I mean mainly: &#8220;Deliver products that really meet customers&#8217; desires and needs.&#8221; &#8220;Successfully convince them that you&#8217;re doing so &#8230;&#8221; &#8220;&#8230; at an attractive overall cost.&#8221; Vendors mentioned [...]]]></description>
			<content:encoded><![CDATA[<p>When I drafted a list of key analytics-sector issues in honor of <a href="http://www.dbms2.com/2011/11/21/analytic-trends-in-2012-qa/">look-ahead season</a>, the first item was &#8220;execution of various big vendors&#8217; ambitious initiatives&#8221;.  By &#8220;execute&#8221; I mean mainly:</p>
<ul>
<li>&#8220;Deliver products that really meet customers&#8217; desires and needs.&#8221;</li>
<li> &#8220;Successfully convince them that you&#8217;re doing so &#8230;&#8221;</li>
<li>&#8220;&#8230; at an attractive overall cost.&#8221;</li>
</ul>
<p>Vendors mentioned here are Oracle, SAP, HP, and IBM. Anybody smaller got left out due to the length of this post. Among the bigger omissions were:</p>
<ul>
<li>salesforce.com (multiple subjects).</li>
<li><a href="../../../../../2011/04/21/sas-hpa-does-make-sense-after-all/">SAS HPA</a>.</li>
<li><a href="../../../../../2011/08/21/hadoop-evolution/">The evolution of Hadoop</a>.</li>
</ul>
<p><span id="more-5704"></span><strong>A (lingering) issue for SAP and Oracle alike</strong></p>
<p>As I noted in January of this year, <a href="../../../../../2011/01/03/the-six-useful-things-you-can-do-with-analytic-technology/">integration of business intelligence into operational apps is making very slow progress</a>. Even so, it&#8217;s a huge part of the apparent strategy at SAP and Oracle alike, as well it should be. Much of the benefit from automating routine desk work has already happened. The areas ripest for exploitation are the ones where analytics are part of the equation.</p>
<p>Given the lack of tangible progress, why do I think this is a genuine area of Oracle and SAP emphasis? Three reasons of many are:</p>
<ul>
<li>Why else did SAP buy Business Objects?</li>
<li>If they&#8217;re not trying to <a href="../../../../../2011/03/30/short-request-and-analytic-processing/">integrate operational apps and analytics</a>, why else does SAP&#8217;s emphasis on HANA make sense?</li>
<li>Without business intelligence in the picture, how does Oracle&#8217;s integrated-stack story promise any direct user benefits?*</li>
</ul>
<p><em>*As opposed to IT concerns &#8212; integration, administration, TCO (Total Cost of Ownership), etc.</em></p>
<p>After so many years of disappointment, I&#8217;m not going to forecast 2012 as a pivotal year for <strong>the integration of business intelligence into operational applications.</strong> But if one of SAP or Oracle ever does pull significantly ahead of the other in BI/operational app integration, that could be a major competitive advantage in those application market segments that are still up for grabs. It is also an opportunity for both vendors to gain BI market share in their respective application customer bases.</p>
<p><strong>A more urgent issue for SAP</strong></p>
<p>SAP has put huge amounts of credibility on the line for HANA, the integration of two different and not particularly mature in-memory database technologies. So far, it is difficult to find evidence that HANA is robust enough for widespread adoption. Whether or not SAP can fix that is a huge open question, which could have significant impact on the course of several technology areas: applications, business intelligence, in-memory DBMS, and maybe even hardware.</p>
<p>Based on current information, which is admittedly partial, I&#8217;m a short-term pessimist on HANA. Longer-term, I&#8217;m on record as saying that <a href="../../../../../2011/05/23/databases-ram/">traditional databases will eventually wind up in RAM</a>. SAP will surely get that technology right some day, whether or not the way it does so has anything to do with present-day HANA code.</p>
<p><strong>Four more issues for Oracle </strong></p>
<p>Oracle&#8217;s ambitions are near-endless, and so, therefore, is its list of execution challenges. Four in the analytics area that I find particularly interesting are:</p>
<ul>
<li><strong>True hybrid columnar DBMS.</strong> <a href="../../../../../2011/09/22/teradata-columnar-compression/">I was guessing that Oracle, like Teradata, would announce true hybrid columnar the week of Oracle OpenWorld</a>. I was wrong. But if Oracle can&#8217;t bring out true hybrid columnar DBMS functionality relatively soon, Exadata will lose credibility as a competitor to more specialized analytic DBMS.</li>
<li><strong>Oracle Exalytics.</strong> With Exalytics in the mix, Oracle&#8217;s technology stack has HANA-like potential. But will Exalytics even ship in 2012? (I think so.) Will it be good for much in the first release? (I&#8217;m skeptical.)</li>
<li><strong>Oracle&#8217;s Big Data Appliance</strong>. I&#8217;m skeptical both about <a href="../../../../../2011/10/20/more-notes-on-oracle-nosql/">Oracle&#8217;s NoSQL product</a> &#8212; <a href="http://www.infoworld.com/d/data-explosion/first-look-oracle-nosql-database-179107">a favorable InfoWorld review</a> notwithstanding &#8212; and <a href="../../../../../2011/09/23/hadoop-appliances/">Hadoop appliances</a>. But if I&#8217;m wrong, and Oracle can successfully embrace/extend the new non-relational paradigms, then it really might regain control over the evolution of data management.</li>
<li><strong><a href="../../../../../2011/10/18/oracle-is-buying-endeca/">Oracle&#8217;s Endeca acquisition</a></strong> &#8212; will Oracle prove me wrong and integrate Endeca effectively into its overall analytic product line? If it does, we might finally see effective text (and eventually speech) navigation of enterprise software. (But as with all Oracle issues cited here, this is something that probably won&#8217;t amount to much in 2012 even if it does later go well.)</li>
</ul>
<p><strong>Three issues for IBM</strong></p>
<p>Like Oracle, IBM is a huge company with many ambitions and hence many execution challenges. The biggest of those is surely: <strong>How effective can IBM be at selling outside its existing customer base?</strong> I don&#8217;t hear as much competitively about IBM DataStage, IBM SPSS, or now IBM Netezza as I did when their vendors were independent companies. Even Cognos may not be much of an exception to the rule, although it has its own large customer base outside of IBM&#8217;s traditional one. (To lesser extents, the same is of course true of Netezza and numerous other IBM acquisitions.)</p>
<p>Another general issue for IBM is <strong>substantively integrating its various product lines,</strong> at least to the extent that makes sense. DB2/Netezza integration sounds good, but even that is more a matter of product marketing (the admirable part of that discipline) than of actual technology. Other integrations (e.g. Cognos/DB2 in various bundles) have tended toward the dubious side.*</p>
<p><em>*I&#8217;m still waiting for IBM to get back to me with examples of how Cognos/DB2 joint tuning amounts to anything. It&#8217;s been more than a year, so I&#8217;m glad I didn&#8217;t hold my breath.</em></p>
<p>In a somewhat narrower vein, I wonder: <strong><a href="../../../../../2011/11/10/cep-streaming-catchup/">Will IBM be able to gain traction for InfoSphere Streams</a>? </strong>And if so, when and where will the traction be?</p>
<p><strong>Will HP screw up Vertica?</strong></p>
<p>Vertica has a very attractive product offering. It&#8217;s perhaps <a href="../../../../../2011/06/20/columnar-dbms-vendor-customer-metrics/">the most scalable analytic DBMS outside of Teradata</a>, running on the hardware of your reasonable choice.  It&#8217;s also the one I recommend most often to clients in the 1-50 terabyte range.</p>
<p>So far HP doesn&#8217;t seem to have done much to leadfoot Vertica. (About all I&#8217;ve heard from competitors is that Vertica seems to have faded somewhat in the financial services market, and there could be multiple explanations if that is indeed true.) But if HP Vertica does somehow manage to botch things, opportunities will open up for a range of columnar analytic DBMS competitors.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Clarifying SAND&#8217;s customer metrics, positioning and technical story</title>
		<link>http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/</link>
		<comments>http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/#comments</comments>
		<pubDate>Sun, 13 Nov 2011 02:45:36 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5669</guid>
		<description><![CDATA[Talking with my clients at SAND can be confusing. That said: I need to revise my figures for SAND&#8217;s customer count way downward. SAND finally has a reasonably clear positioning. SAND&#8217;s product actually seems to have a lot of features. A few months ago, I wrote: SAND Technology reported &#62;600 total customers, including &#62;100 direct. [...]]]></description>
			<content:encoded><![CDATA[<p>Talking with my clients at SAND can be confusing. That said:</p>
<ul>
<li>I need to revise my figures for SAND&#8217;s customer count way downward.</li>
<li>SAND finally has a reasonably clear positioning.</li>
<li>SAND&#8217;s product actually seems to have a lot of features.</li>
</ul>
<p>A few months ago, I wrote:</p>
<blockquote><p>SAND Technology reported &gt;600 total customers, including &gt;100 direct.</p></blockquote>
<p>Upon talking with the company, I need to revise that figure downward, from &gt; 600 to 15.</p>
<p><span id="more-5669"></span><em>One embarrassing point: SAND is a client, and I view it as part of my job to save clients from that kind of inadvertent misstatement.</em></p>
<p>It turns out that SAND has a very impressive customer &#8212; Dunnhumby, a data mart outsourcer with 200 terabytes of data in SAND, 30 or so incoming data streams, 400 or so nodes &#8230; and 600 or so end customers, all of which SAND was counting as OEM end customers for its DBMS. But I, other industry observers, and other vendors generally don&#8217;t count that way.</p>
<p>Besides Dunnhumby, SAND has 14 other customers on maintenance, with &lt; 1 terabyte of data each. Until recently, SAND had a couple dozen more customers than that, but it <a href="http://www.sand.com/sand-technology-announces-sale-sap-ilm-product-line/">sold its SAP-oriented archiving/near-line storage product line to Informatica</a>.</p>
<p>I still don&#8217;t know where the &#8220;&gt; 100 direct&#8221; part came from.</p>
<p>After the sale of its other product line, SAND is squarely in the market for analytic DBMS. SAND&#8217;s sales efforts seem to be focused on <a href="http://www.dbms2.com/2011/03/03/investigative-analytics/">investigative analytics</a>, although some of its existing users seem to be more focused on <a href="http://www.dbms2.com/2011/11/08/terminology-operational-analytics/">operational analytics</a>. Most specifically, SAND is trying to focus on &#8220;people data&#8221; &#8212; customer loyalty, health care, etc. &#8212; rather than purely <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>, with the paradigmatic target application being personalized marketing.</p>
<p>SAND technical highlights include:</p>
<ul>
<li>SAND sells a columnar analytic DBMS.</li>
<li>The SAND DBMS operates on bitmaps, with heavy use of run-length encoding on the bitmaps. Bitmaps are used for everything except BLOBs (Binary Large OBjects).</li>
<li>Actual data compression also comes into play, e.g. as result sets are being assembled. This is based on a true global dictionary &#8212; multiple columns are tokenized together.</li>
<li>Indeed, SAND can decompose columns and tokenize their parts (e.g. time stamps).</li>
<li>SAND&#8217;s workload management sees RAM and CPU, but not explicitly I/O.</li>
<li>SAND lets you pin certain tables or even table segments in RAM if you want to.</li>
</ul>
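<p>The bitmap and global-dictionary points above can be illustrated with a small Python sketch under a simplified, assumed model: every distinct value across the tokenized columns gets one integer token, each (column, token) pair gets a 0/1 bitmap stored as (bit, run length) pairs, and two predicates are combined by ANDing the compressed runs directly, without expanding them. This is not SAND&#8217;s actual format; all function and column names are hypothetical.</p>
<pre>
```python
# Hypothetical sketch (not SAND's code): a global dictionary shared by several
# columns, plus one run-length-encoded (RLE) bitmap per (column, token).


def build_global_dictionary(columns):
    """Give each distinct value ONE integer token, shared across all columns."""
    dictionary = {}
    for values in columns.values():
        for v in values:
            dictionary.setdefault(v, len(dictionary))
    return dictionary


def rle_bitmap(bits):
    """Encode a list of 0/1 bits as (bit, run_length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [tuple(r) for r in runs]


def build_bitmaps(columns, dictionary):
    """One RLE bitmap per (column name, token) that occurs in that column."""
    bitmaps = {}
    for name, values in columns.items():
        tokens = [dictionary[v] for v in values]
        for tok in set(tokens):
            bitmaps[(name, tok)] = rle_bitmap([1 if t == tok else 0 for t in tokens])
    return bitmaps


def rle_and(a, b):
    """AND two equal-length RLE bitmaps run by run, without expanding them."""
    out = []
    i = j = 0
    left_a = a[0][1] if a else 0
    left_b = b[0][1] if b else 0
    while i < len(a) and j < len(b):
        step = min(left_a, left_b)
        bit = a[i][0] & b[j][0]
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)  # extend the current output run
        else:
            out.append((bit, step))
        left_a -= step
        left_b -= step
        if left_a == 0:
            i += 1
            left_a = a[i][1] if i < len(a) else 0
        if left_b == 0:
            j += 1
            left_b = b[j][1] if j < len(b) else 0
    return out


columns = {
    "home_region": ["east", "east", "east", "west", "west", "west"],
    "ship_region": ["east", "east", "west", "west", "west", "west"],
}
d = build_global_dictionary(columns)
print(d)  # {'east': 0, 'west': 1}: same token for "west" in both columns
bm = build_bitmaps(columns, d)
both_west = rle_and(bm[("home_region", d["west"])], bm[("ship_region", d["west"])])
print(both_west)  # [(0, 3), (1, 3)]: rows 4-6 have "west" in both columns
```
</pre>
<p>Because the dictionary is shared, &#8220;west&#8221; has the same token in both columns, so a cross-column comparison can work on small integers rather than on the original strings.</p>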
<p>SAND&#8217;s update story is straightforward &#8212; when data comes in, all the columns and bitmaps are updated as needed. Still, since SAND is columnar, you wouldn&#8217;t expect true updates in place, and you&#8217;d be right. Rather, there&#8217;s a lock-free story involving MVCC (MultiVersion Concurrency Control) and garbage collection. The MVCC is also exploited for a kind of time travel, and further for some kind of virtual data mart capability.</p>
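<p>Here is a minimal Python sketch of the general MVCC pattern just described, assuming a simple key/value model rather than SAND&#8217;s columnar storage. Writes append new timestamped versions instead of updating in place, reads name a timestamp (which is what makes time travel possible), and garbage collection reclaims versions that no remaining snapshot can see. The class and method names are hypothetical, and the sketch is single-threaded, so it says nothing about how SAND achieves lock-freedom.</p>
<pre>
```python
import itertools

# Hypothetical, single-threaded sketch of MVCC with "time travel" reads and
# garbage collection. It omits the lock-free machinery a real DBMS would need.


class VersionedStore:
    def __init__(self):
        self._clock = itertools.count(1)  # monotonically increasing commit timestamps
        self._versions = {}  # key -> list of (commit_ts, value), oldest first

    def write(self, key, value):
        """Never overwrite in place: append a new version with a fresh timestamp."""
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        return ts

    def read(self, key, as_of):
        """Time travel: newest value committed at or before `as_of`, else None."""
        result = None
        for ts, value in self._versions.get(key, []):
            if ts > as_of:
                break
            result = value
        return result

    def garbage_collect(self, oldest_snapshot):
        """Drop versions that no snapshot at or after `oldest_snapshot` can see."""
        for key, versions in self._versions.items():
            visible = [i for i, (ts, _) in enumerate(versions) if ts <= oldest_snapshot]
            if visible:
                # Keep the newest version the oldest snapshot sees, plus all newer ones.
                self._versions[key] = versions[visible[-1]:]


store = VersionedStore()
t1 = store.write("cust42.tier", "silver")
t2 = store.write("cust42.tier", "gold")
print(store.read("cust42.tier", as_of=t1))  # silver (reading the past)
print(store.read("cust42.tier", as_of=t2))  # gold
store.garbage_collect(oldest_snapshot=t2)
print(store.read("cust42.tier", as_of=t1))  # None: that version was reclaimed
```
</pre>
<p>The last line shows the trade-off: time travel only reaches back as far as garbage collection allows.</p>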
<p>SAND&#8217;s parallelization story is a bit complicated.</p>
<ul>
<li>SAND has, or at least has the potential for, <a href="../../../../../2008/09/05/mpp-data-warehouse-nodes/">node specialization</a>, with database and storage nodes being different.</li>
<li>In principle, disks are specific to storage nodes, and it&#8217;s a configuration option as to whether a database node sees one, some, or all storage nodes.</li>
<li>In practice, only Dunnhumby among SAND&#8217;s customers operates on other than a shared-disk basis. Dunnhumby&#8217;s configuration is mixed/matched among various SAND sharing options.</li>
</ul>
<p>SAND is proud of its PMML (Predictive Model Markup Language) scoring capabilities, but otherwise hasn&#8217;t shipped much in the way of <a href="../../../../../2011/02/24/analytic-platforms/">analytic platform</a> capabilities. That said, work is underway, under the code name UQL, on a user-defined table function capability that can also query external tables, fire off MapReduce jobs, and so on.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Exasol update</title>
		<link>http://www.dbms2.com/2011/11/12/exasol-update/</link>
		<comments>http://www.dbms2.com/2011/11/12/exasol-update/#comments</comments>
		<pubDate>Sun, 13 Nov 2011 02:37:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exasol]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5661</guid>
		<description><![CDATA[I last wrote about Exasol in 2008. After talking with the team Friday, I&#8217;m fixing that now. The general theme was as you&#8217;d expect: Since last we talked, Exasol has added some new management, put some effort into sales and marketing, got some customers, kept enhancing the product and so on. Top-level points included: Exasol&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p><a href="../../../../../2008/08/16/exasol-technical-briefing/">I last wrote about Exasol in 2008</a>. After talking with the team Friday, I&#8217;m fixing that now. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  The general theme was as you&#8217;d expect: Since last we talked, Exasol has added some new management, put some effort into sales and marketing, got some customers, kept enhancing the product and so on.</p>
<p>Top-level points included:</p>
<ul>
<li>Exasol&#8217;s technical philosophy is substantially the same as before, albeit not with as extreme a focus on fitting everything in RAM.</li>
<li>Exasol believes its flagship DBMS EXASolution has great performance on a load-and-go basis.</li>
<li>Exasol has 25 EXASolution customers, all in Germany.*</li>
<li>5 of those are &#8220;cloud&#8221; customers, at hosting providers engaged by Exasol.</li>
<li>EXASolution database sizes now range from the low 100s of gigabytes up to 30 terabytes.</li>
<li>Pretty much the whole company is in Nuremberg.</li>
</ul>
<p><span id="more-5661"></span><em>*That excludes some money from Hitachi. Exasol&#8217;s Hitachi partnership is still in limbo, an apparent casualty of the world economic crisis.</em></p>
<p>On the technical side:</p>
<ul>
<li>As noted in my 2008 post, EXASolution is a columnar, no-head-node MPP (Massively Parallel Processing) DBMS.</li>
<li>The main way EXASolution compresses data is via dictionary/tokenization. A typical compression ratio is 5:1 (before mirroring and so on), within an overall range of 2:1 to 10:1.</li>
<li>EXASolution writes incoming data to in-memory blocks that are smaller than its otherwise preferred block size of 1/2 to 5 megabytes. These small blocks are sent to disk, where they are eventually merged. Exasol insists that write performance has always been fully satisfactory to customers to date.</li>
<li>EXASolution doesn&#8217;t have much in the way of performance tuning knobs. Exasol says they aren&#8217;t needed, and says that one really can start an EXASolution POC (Proof of Concept) in a day or so.</li>
<li>EXASolution doesn&#8217;t have much in the way of workload management capabilities, except what&#8217;s automagic (e.g., short query bias). However, it does collect statistics you can query via your favorite BI tool.</li>
<li>EXASolution doesn&#8217;t have much in the way of <a href="../../../../../2011/02/24/analytic-platforms/">analytic platform</a> capabilities, although there is some Lua-based scripting. However, there&#8217;s something NDA in the analytic platform area Coming Soon.*</li>
</ul>
<p>In general, the whole thing sounds somewhat like ParAccel, at least at a high level.</p>
<p><em>*Exasol is not and never has been our client, but we can keep secrets for them even so.</em></p>
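<p>For readers who want to see where ratios like Exasol&#8217;s come from, here is a toy sketch of dictionary/tokenization compression. It assumes fixed-width integer tokens of ceil(log2(cardinality)) bits, counts dictionary storage, and ignores block headers and mirroring; it is not Exasol&#8217;s actual encoding, and the function names are hypothetical.</p>

```python
import math
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[Dict[str, int], List[int]]:
    """Replace each value by an integer token; return (dictionary, tokens)."""
    dictionary: Dict[str, int] = {}
    tokens = []
    for v in values:
        # the first occurrence of a value gets the next unused token
        tokens.append(dictionary.setdefault(v, len(dictionary)))
    return dictionary, tokens


def compression_ratio(values: List[str]) -> float:
    """Raw bytes divided by (bit-packed token bytes + dictionary bytes)."""
    dictionary, tokens = dictionary_encode(values)
    raw_bytes = sum(len(v.encode("utf-8")) for v in values)
    bits_per_token = max(1, math.ceil(math.log2(len(dictionary))))
    token_bytes = math.ceil(len(tokens) * bits_per_token / 8)
    dict_bytes = sum(len(v.encode("utf-8")) for v in dictionary)
    return raw_bytes / (token_bytes + dict_bytes)


cities = ["Nuremberg", "Munich", "Berlin", "Hamburg"] * 250
print(round(compression_ratio(cities), 1))
```

<p>A four-value column like this one compresses far better than 5:1; columns with many distinct values pull the average down, which is consistent with the 2:1 to 10:1 range quoted above.</p>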
<p>Naturally, Exasol believes EXASolution has fine concurrency, with at least one customer routinely running 2000 concurrent users, 200 concurrent sessions (via connection pooling), and 5-10 concurrent queries. Another customer has 3500 Cognos users. 1-200 concurrent queries appears to be the record peak load. Anyhow, Exasol says that plans to offer real workload management could be accelerated if a need were discovered.</p>
<p>Exasol says it almost never loses POCs, but admits that it competes fairly rarely against Vertica and ParAccel, no doubt for reasons of geography. Exasol boasts one visible Sybase IQ replacement (Sony Music).</p>
<p>While Exasol&#8217;s sales to date have been in Germany, there are plans to change that soon. At least one sales cycle is well underway in Eastern Europe. Offices in other Germanic countries are planned. Existing customers are planning to deploy additional copies outside Germany. Discussions are underway regarding other geographies, e.g. English-speaking ones.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/12/exasol-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oracle is buying Endeca</title>
		<link>http://www.dbms2.com/2011/10/18/oracle-is-buying-endeca/</link>
		<comments>http://www.dbms2.com/2011/10/18/oracle-is-buying-endeca/#comments</comments>
		<pubDate>Tue, 18 Oct 2011 16:09:56 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Endeca]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5502</guid>
		<description><![CDATA[Oracle is buying Endeca. The official talking points for the deal aren&#8217;t a perfect match for Endeca&#8217;s actual technology, but so be it. In that post, I wrote: &#8230; the Endeca paradigm is really to help you make your way through a structured database, where different portions of the database have different structures. Thus, at [...]]]></description>
			<content:encoded><![CDATA[<p>Oracle is buying Endeca. The <a href="http://www.oracle.com/us/corporate/press/517791">official talking points for the deal</a> aren&#8217;t a perfect match for <a href="http://www.dbms2.com/2011/04/18/endeca-topics/">Endeca&#8217;s actual technology</a>, but so be it.</p>
<p>In that post, I wrote:</p>
<blockquote><p>&#8230; the Endeca paradigm is really to help you make your way through a  structured database, where different portions of the database have  different structures. Thus, at various points in your journey, it  automagically provides you a list of choices as to where you could go  next.</p></blockquote>
<p>That kind of thing could help Oracle with apps like <a href="http://www.dbms2.com/2011/07/27/mongodb-users-and-use-cases/">the wireless telco product catalog deal MongoDB got</a>.</p>
<p>Going back to the Endeca-post quote well, Endeca itself said:</p>
<blockquote><p>Inside the MDEX Engine there is no overarching schema; each data record  carries its own metadata. This enables the rapid combination of a wide  range of structured and unstructured content into Latitude’s unified  data model. Once inside, the MDEX Engine derives common dimensions and  metrics from the available metadata, instantly exposing each for  high-performance refinement and analysis in the <a href="http://www.endeca.com/en/products/endeca_latitude/technology-overview/Discovery-Framework.html">Discovery Framework</a>.  Have a new data source? Simply add it and the MDEX Engine will create  new relationships where possible. Changes in source data schema? No  problem, adjustments on the fly are easy.</p></blockquote>
<p>And I pointed out that the MDEX engine was a columnar DBMS.</p>
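<p>To illustrate the &#8220;no overarching schema&#8221; idea in the quote, here is a toy sketch in which each record carries its own attributes, &#8220;dimensions&#8221; are derived from whatever attributes are common enough, and the choices for where to go next are computed as facet counts. The 50% threshold and all names are hypothetical; this is not how the MDEX Engine is implemented.</p>

```python
from collections import Counter
from typing import Any, Dict, List

Record = Dict[str, Any]  # each record carries its own metadata (attribute names)


def derive_dimensions(records: List[Record], min_share: float = 0.5) -> List[str]:
    """Treat an attribute as a dimension if at least `min_share` of records carry it."""
    counts = Counter(key for record in records for key in record)
    threshold = min_share * len(records)
    return sorted(key for key, n in counts.items() if n >= threshold)


def refine(records: List[Record], **filters: Any) -> List[Record]:
    """Narrow the current result set by attribute=value choices."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


def facet_counts(records: List[Record], dimension: str) -> Counter:
    """The 'where could I go next' list: value counts for one dimension."""
    return Counter(r[dimension] for r in records if dimension in r)


catalog = [
    {"type": "phone", "brand": "A", "color": "black"},
    {"type": "phone", "brand": "B", "color": "white"},
    {"type": "plan", "brand": "A", "minutes": 500},
    {"type": "case", "color": "black"},
]
print(derive_dimensions(catalog))
print(facet_counts(refine(catalog, type="phone"), "brand"))
```

<p>Adding a record with a brand-new attribute requires no schema change; it simply becomes a dimension once enough records carry it.</p>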
<p>Meanwhile, <a href="http://www.dbms2.com/2011/09/22/hybrid-columnar-soundbites/">Oracle&#8217;s own columnar DBMS efforts have been disappointing</a>. Endeca could be an intended answer to that. However, while Oracle&#8217;s track record with standalone DBMS acquisitions is admirable (DEC RDB, MySQL, etc.), Oracle&#8217;s track record of integrating DBMS acquisitions into the Oracle product itself is not so good. (Express? Essbase? The text product line? None of that has gone particularly well.)</p>
<p>So <strong>while I would expect Endeca&#8217;s flagship e-commerce shopping engine products to flourish under Oracle&#8217;s ownership, I would be cautious about the integration of Endeca&#8217;s core technology into the Oracle product line.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/10/18/oracle-is-buying-endeca/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Hybrid-columnar soundbites</title>
		<link>http://www.dbms2.com/2011/09/22/hybrid-columnar-soundbites/</link>
		<comments>http://www.dbms2.com/2011/09/22/hybrid-columnar-soundbites/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 18:06:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5326</guid>
		<description><![CDATA[Busy couple of days talking with reporters. A few notes on hybrid-columnar analytic DBMS, all backed up by yesterday&#8217;s post on Teradata columnar: Oracle does not actually offer columnar I/O; the other three systems do. But see the &#8220;I won&#8217;t be surprised&#8221; part in yesterday&#8217;s Teradata post. Aster does not offer columnar compression; the other [...]]]></description>
			<content:encoded><![CDATA[<p>Busy couple of days talking with reporters. A few notes on hybrid-columnar analytic DBMS, all backed up by <a href="http://www.dbms2.com/2011/09/22/teradata-columnar-compression/">yesterday&#8217;s post on Teradata columnar</a>:</p>
<ul>
<li>Oracle does not actually offer columnar I/O; the other three systems do. But see the &#8220;I won&#8217;t be surprised&#8221; part in yesterday&#8217;s Teradata post.</li>
<li>Aster does not offer columnar compression; the other three do.</li>
<li>EMC Greenplum and Teradata offer different ways to mix column and row storage in the same table; each has its advantages.</li>
<li>Teradata generally has a more mature and capable offering than EMC Greenplum for most purposes, whichever way you choose to organize your tables.</li>
</ul>
<p><em>Edit: The <a href="http://online.wsj.com/article/BT-CO-20110921-715547.html">Wall Street Journal</a> got this wrong, writing that Teradata was the first-ever hybrid columnar system. Specifically, they wrote:</em></p>
<blockquote><p><em>While columnar technology has been around for years, Teradata says its  product is unique because it allows users to include both columns and  rows in the same database.</em></p></blockquote>
<p><em>Googling on &#8220;Teradata To Unveil New Analytics Product To Speed Business Adoption&#8221; might get you around the paywall to see the offending piece.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/22/hybrid-columnar-soundbites/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Teradata Columnar and Teradata 14 compression</title>
		<link>http://www.dbms2.com/2011/09/22/teradata-columnar-compression/</link>
		<comments>http://www.dbms2.com/2011/09/22/teradata-columnar-compression/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 05:25:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Rainstor]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5296</guid>
		<description><![CDATA[Teradata is pre-announcing Teradata 14, for delivery by the end of this year, where by &#8220;Teradata 14&#8221; I mean the latest version of the DBMS that drives the classic Teradata product line. Teradata 14&#8217;s flagship feature is Teradata Columnar, a hybrid-columnar offering that follows in the footsteps of Greenplum (now part of EMC) and Aster [...]]]></description>
			<content:encoded><![CDATA[<p>Teradata is pre-announcing Teradata 14, for delivery by the end of this year, where by &#8220;Teradata 14&#8221; I mean the latest version of the DBMS that drives the classic Teradata product line. Teradata 14&#8217;s flagship feature is Teradata Columnar, a hybrid-columnar offering that follows in the footsteps of <a href="../../../../../2009/10/14/greenplum-hybrid-columnar/">Greenplum</a> (now part of EMC) and <a href="../../../../../2010/09/15/aster-data-ncluster-version-4-6/">Aster Data</a> (now part of Teradata).</p>
<p>The basic idea of Teradata Columnar is:</p>
<ul>
<li>Each table can be stored in Teradata in row format, column format, or a mix.</li>
<li>You can do almost anything with a Teradata columnar table that you can do with a row-based one.</li>
<li>If you choose column storage, you also get some new compression choices.</li>
</ul>
<p><span id="more-5296"></span>The &#8220;mix&#8221; option is like Vertica&#8217;s <a href="../../../../../2009/08/04/flexstore-and-the-rest-of-vertica-35/">FlexStore</a>, in that different columns (e.g. different components of a street address) can be grouped into a mini-row, even if you otherwise choose to store that table in a columnar way. Teradata does not at this time offer the Greenplum or Aster way of mixing rows and columns, whereby some of the rows in a table can be stored in a column-store way, while other rows are stored in entire-row row-store solidarity.</p>
<p>Thus, Teradata Columnar gives you many of the basic I/O and compression benefits of columnar DBMS, along with all the usual Teradata goodness of concurrency, workload management, system management, SQL support, and so on. By way of comparison:</p>
<ul>
<li>Similar things are true of Greenplum&#8217;s offering (except for the parts about concurrency, advanced workload management, and so on).</li>
<li>Aster doesn&#8217;t have columnar compression.</li>
<li>Oracle has <a href="../../../../../2011/02/06/columnar-compression-database-storage/">columnar compression but no true columnar storage</a>.*</li>
</ul>
<p>Also, as I noted above, Teradata mixes rows and columns in a different way than Aster or EMC Greenplum do.</p>
<p><em>*However, I won&#8217;t be surprised if Oracle soon announces true hybrid-columnar as well. I originally heard about Teradata Columnar and Oracle&#8217;s efforts to develop true hybrid-columnar storage the same week, 23 months ago.</em></p>
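<p>The column-grouping (&#8220;mix&#8221;) option described above can be pictured with a small sketch: a table is stored as column partitions, some of which hold a single column and some of which hold a group of columns as mini-rows, and a scan touches only the partitions it needs. This is an illustration under assumed, hypothetical names, not Teradata DDL or internals.</p>

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]
Partitions = Dict[Tuple[str, ...], List[tuple]]


def partition_columns(rows: List[Row], groups: Sequence[Tuple[str, ...]]) -> Partitions:
    """Store each column group as its own container of mini-rows.

    A one-column group is pure column storage; a multi-column group keeps
    those columns together, like a little row store inside a columnar table.
    """
    return {group: [tuple(row[c] for c in group) for row in rows] for group in groups}


def scan(partitions: Partitions, wanted: Sequence[str]) -> Tuple[List[tuple], int]:
    """Return (rows of the wanted columns, number of column groups read)."""
    touched = [g for g in partitions if any(c in g for c in wanted)]
    n_rows = len(next(iter(partitions.values())))
    out = []
    for i in range(n_rows):
        values: Dict[str, object] = {}
        for g in touched:
            # row i sits at the same position in every partition
            values.update(zip(g, partitions[g][i]))
        out.append(tuple(values[c] for c in wanted))
    return out, len(touched)


rows = [
    {"id": 1, "street": "Main St", "city": "Boston", "zip": "02110", "amount": 10},
    {"id": 2, "street": "Elm St", "city": "Nuremberg", "zip": "90402", "amount": 25},
]
parts = partition_columns(rows, [("id",), ("street", "city", "zip"), ("amount",)])
```

<p>A query on <code>amount</code> reads one partition; a query needing a city drags in the whole address group, which is the trade-off the grouping choice controls.</p>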
<p>Going hybrid-columnar is a big deal. Aster Data, for example, told me that a considerable fraction of all its workloads ran faster with columnar than row-based storage.* And it&#8217;s of extra importance to a vendor that, like Teradata, needs to play catch-up in the compression derby.</p>
<p><em>*Anything in which the queries eliminated more than half or so of the columns (60%, if I recall correctly, but it was definitely an approximate figure). That pretty much means any query except full and near-full table scans.</em></p>
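<p>Why should there be a break-even point at all? A back-of-envelope I/O model helps, assuming equal-width columns and a hypothetical per-column overhead factor (extra seeks, tuple reconstruction) for column storage. The factor of 2.5 below is chosen only so the break-even lands at the roughly 60% figure Aster cited; it is not a measured number.</p>

```python
def row_store_cost(n_rows: int, n_cols: int) -> float:
    """A row store reads every column of every row it scans."""
    return n_rows * n_cols


def column_store_cost(n_rows: int, cols_read: int, overhead: float) -> float:
    """A column store reads only needed columns, at an assumed per-column overhead."""
    return n_rows * cols_read * overhead


def break_even_fraction_eliminated(overhead: float) -> float:
    """Column store wins when cols_read / n_cols < 1 / overhead,
    i.e. when more than 1 - 1/overhead of the columns are eliminated."""
    return 1 - 1 / overhead


print(break_even_fraction_eliminated(2.5))
```

<p>Under this model, a query touching 2 of 10 columns costs half as much in column storage, while one touching 5 of 10 costs more, matching the shape (though not necessarily the details) of Aster&#8217;s observation.</p>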
<p>Teradata&#8217;s columnar compression story is pretty complicated. To quote from a forthcoming press release:</p>
<blockquote><p>Teradata automatically chooses from among six types of compression: run length, dictionary, trim, delta on mean, null and UTF8 based on the column demographics.</p></blockquote>
<p>The trickiest words in that are &#8220;automatic&#8221; and &#8220;dictionary&#8221;. Teradata divides column-store data into &#8220;column containers&#8221; of, say, 8 KB. (Current thinking is 8 KB default, 65 KB maximum, but that could change by the time of product release.) By default, Teradata software decides separately for each column container which compression algorithm(s) to use. It can even change its mind dynamically over time, as the contents of the container change.</p>
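<p>Per-container automatic selection can be sketched as &#8220;estimate the encoded size under each candidate scheme, keep the smallest.&#8221; The byte costs below are hypothetical estimates, only four of the six schemes are modeled (trim, null and UTF8 are omitted), and Teradata&#8217;s real heuristics are not public in this post.</p>

```python
import math
from statistics import mean
from typing import Dict, List


def rle_size(values: List[int]) -> int:
    """Bytes as (value, run_length) pairs, assuming 8 + 4 bytes per run."""
    runs = 1 + sum(1 for a, b in zip(values, values[1:]) if a != b)
    return runs * 12


def dictionary_size(values: List[int]) -> int:
    """Bit-packed tokens plus an assumed 8 bytes per dictionary entry."""
    distinct = len(set(values))
    bits = max(1, math.ceil(math.log2(distinct)))
    return math.ceil(len(values) * bits / 8) + 8 * distinct


def delta_on_mean_size(values: List[int]) -> int:
    """Each value stored as a signed offset from the container's mean."""
    center = round(mean(values))
    widest = max(abs(v - center) for v in values)
    bits = widest.bit_length() + 1  # +1 for the sign bit
    return 8 + math.ceil(len(values) * bits / 8)


def choose_encoding(container: List[int]) -> str:
    """Pick the cheapest scheme for one (non-empty) column container."""
    sizes: Dict[str, int] = {
        "none": 8 * len(container),
        "run_length": rle_size(container),
        "dictionary": dictionary_size(container),
        "delta_on_mean": delta_on_mean_size(container),
    }
    return min(sizes, key=sizes.get)
```

<p>Because the choice is recomputed per container, a column can use run-length encoding in one stretch of the table and delta-on-mean in another, and the choice can be revisited as a container&#8217;s contents change.</p>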
<p>What I find weird about Teradata&#8217;s columnar dictionary compression is that the dictionary is container-specific. One benefit versus having a more global dictionary is that, since each dictionary holds fewer distinct items, compression tokens can each be shorter. (Token length, in bits, grows roughly as the base-2 logarithm of the dictionary&#8217;s cardinality.) Another benefit is that smaller dictionaries are faster to search. The obvious offsetting drawback is that a larger and more global dictionary has the potential to compress various items that wind up being left uncompressed in this smaller-scale scheme.</p>
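<p>The token-length arithmetic can be checked with a short sketch that counts token bits only, ignoring dictionary storage and the items a local scheme might leave uncompressed. The container layout is a hypothetical example: 256 containers, each holding 4 of 1,024 distinct values overall.</p>

```python
import math
from typing import List


def token_bits(cardinality: int) -> int:
    """Fixed-width token size needed to distinguish `cardinality` values."""
    return max(1, math.ceil(math.log2(cardinality)))


def global_dictionary_bits(containers: List[List[str]]) -> int:
    """One dictionary for the whole column: every token is sized for all values."""
    distinct = {v for c in containers for v in c}
    n_values = sum(len(c) for c in containers)
    return n_values * token_bits(len(distinct))


def local_dictionary_bits(containers: List[List[str]]) -> int:
    """One dictionary per container: tokens are sized for that container only."""
    return sum(len(c) * token_bits(len(set(c))) for c in containers)


containers = [[f"v{k * 4 + j}" for j in range(4)] * 16 for k in range(256)]
print(global_dictionary_bits(containers), local_dictionary_bits(containers))
```

<p>Here the per-container tokens are 2 bits instead of 10, a 5x saving on token storage, which is the benefit described above; the benefit shrinks as each container&#8217;s own cardinality approaches the column&#8217;s.</p>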
<p>Other notes about Teradata compression include:</p>
<ul>
<li>Teradata has for a while had a more manual form of dictionary compression.</li>
<li>Teradata also has block-level compression.</li>
<li>You can do block-level compression even on top of the columnar compression described above.</li>
<li>The Teradata/Rainstor partnership for archiving-level compression that Rainstor made so much fuss about doesn&#8217;t seem to actually be happening; Teradata seems content with the other compression choices it offers.</li>
</ul>
<p>And finally, Teradata 14 extends <a href="../../../../../2008/10/14/teradata-virtual-storage/">Teradata Virtual Storage</a> with a feature called Compress on Cold. The idea is that &#8220;cold&#8221; data can safely get (extra) compression &#8212; that block-level stuff &#8212; automatically. If the data heats up again (e.g. by becoming relevant for a while to the latest year-over-year comparisons), it can just as automatically be decompressed. Teradata thinks this is significantly better than the alternative of making manual compression choices based on not-so-granular range partitions.</p>
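<p>A minimal sketch of temperature-driven block compression, assuming per-block read counters over an observation window and hypothetical hot/cold thresholds. Python&#8217;s <code>zlib</code> stands in for Teradata&#8217;s block-level compression, and none of these names come from Teradata.</p>

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    data: bytes
    compressed: bool = False
    recent_reads: int = 0  # reads during the current observation window


def read_block(block: Block) -> bytes:
    """Count the access and return the block's uncompressed contents."""
    block.recent_reads += 1
    return zlib.decompress(block.data) if block.compressed else block.data


def rebalance(blocks: List[Block], cold_threshold: int = 2, hot_threshold: int = 10) -> None:
    """Compress blocks that went cold, decompress blocks that heated up,
    then start a new observation window."""
    for b in blocks:
        if not b.compressed and b.recent_reads < cold_threshold:
            b.data = zlib.compress(b.data)
            b.compressed = True
        elif b.compressed and b.recent_reads >= hot_threshold:
            b.data = zlib.decompress(b.data)
            b.compressed = False
        b.recent_reads = 0


payload = b"2010-01-01,store_17,42.00\n" * 200
hot, cold = Block(payload), Block(payload)
for _ in range(20):
    read_block(hot)
rebalance([hot, cold])
```

<p>The granularity is the block, not the range partition, so only the data that is actually cold pays the decompression cost on access.</p>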
<p>Unsurprisingly, Teradata lacks some features and benefits found in certain columnar-first analytic DBMS. One biggie is that, absent clever workarounds such as Vertica&#8217;s in-memory write-optimized store, columnar DBMS have a single-row-update performance problem, because you are putting the information in many places on disk rather than just one. I generally take it for granted that a columnar-first vendor has such a workaround. Row-based vendors gone columnar, however, are a different story. Teradata et al. are also likely to decompress data and reassemble it into full rows as soon as it hits RAM, which forfeits the potential benefit of having less data per row clogging up cache.*<em> (Edit: As per Todd Walter&#8217;s comments below, this is not accurate &#8212; and that&#8217;s a potentially important feature.)</em></p>
<p><em>*Late decompression actually depends on columnar compression, not columnar storage, and hence can also be enjoyed by row-based DBMS such as </em><a href="../../../../../2010/06/21/netezza-ibm-db2-compression/"><em>DB2</em></a><em>. </em></p>
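<p>The write-optimized-store workaround for the single-row-update problem mentioned above can be sketched as an in-memory row buffer that is flushed to the column arrays in batches, with queries reading both. This is loosely modeled on the idea, not on Vertica&#8217;s implementation; the class name and flush threshold are hypothetical.</p>

```python
from typing import Dict, List


class HybridStore:
    """Rows land in an in-memory buffer; a batch flush appends them column by column."""

    def __init__(self, columns: List[str], flush_at: int = 1000) -> None:
        self.columns = columns
        self.flush_at = flush_at
        self.write_buffer: List[Dict[str, object]] = []  # row-oriented, in RAM
        self.column_store: Dict[str, List[object]] = {c: [] for c in columns}
        self.flushes = 0

    def insert(self, row: Dict[str, object]) -> None:
        # one cheap append, instead of one write per column per row
        self.write_buffer.append(row)
        if len(self.write_buffer) >= self.flush_at:
            self.flush()

    def flush(self) -> None:
        # each column structure is touched once per batch, not once per row
        for c in self.columns:
            self.column_store[c].extend(r[c] for r in self.write_buffer)
        self.write_buffer.clear()
        self.flushes += 1

    def column(self, name: str) -> List[object]:
        """Queries see flushed data plus still-buffered rows."""
        return self.column_store[name] + [r[name] for r in self.write_buffer]


s = HybridStore(["id", "amount"], flush_at=3)
for i in range(5):
    s.insert({"id": i, "amount": i * 10})
```
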
<p>To use Teradata Columnar, you need to be using round-robin data distribution rather than, say, hash. Teradata jargon for this is NoPI, where the &#8220;PI&#8221; stands for Primary Index.* Drawbacks to that include:</p>
<ul>
<li>You don&#8217;t get the hash distribution benefit of saving a data redistribution step on joins whose join key happens to be the same as the hash key.</li>
<li>In Teradata-land, NoPI implies append-only, so you incur the garbage collection/compaction that implies.</li>
</ul>
<p>However, that&#8217;s a physical append-only; you can still do logical updates.</p>
<p><em>*PI is not to be confused with PPI, which stands for Partitioned Primary Index, and is Teradata&#8217;s name for range (or case-statement-based) partitioning. PPI works just fine with Teradata Columnar. As of Teradata 14, you can do PPI up to 62 levels deep.</em></p>
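<p>The join-redistribution drawback of NoPI can be illustrated with a sketch comparing hash placement on the join key against round-robin placement. It assumes 4 parallel units (AMPs, in Teradata terms) and uses <code>zlib.crc32</code> as a stand-in hash; Teradata&#8217;s actual row-hash function is different, and the function names are hypothetical.</p>

```python
import zlib
from typing import Dict, List, Tuple

N_AMPS = 4  # hypothetical number of parallel units
Row = Tuple[str, int]  # (join_key, payload)


def hash_amp(key: str) -> int:
    # crc32 is deterministic across runs, unlike Python's built-in str hash
    return zlib.crc32(key.encode()) % N_AMPS


def distribute_by_hash(rows: List[Row]) -> Dict[int, List[Row]]:
    placement: Dict[int, List[Row]] = {amp: [] for amp in range(N_AMPS)}
    for row in rows:
        placement[hash_amp(row[0])].append(row)
    return placement


def distribute_round_robin(rows: List[Row]) -> Dict[int, List[Row]]:
    placement: Dict[int, List[Row]] = {amp: [] for amp in range(N_AMPS)}
    for i, row in enumerate(rows):
        placement[i % N_AMPS].append(row)
    return placement


def rows_needing_redistribution(placement: Dict[int, List[Row]]) -> int:
    """Rows not already on the unit that hashing their join key would pick."""
    return sum(1 for amp, rows in placement.items() for row in rows if hash_amp(row[0]) != amp)


orders = [(f"cust{i % 50}", i) for i in range(400)]
```

<p>With hash placement on the join key, zero rows move before a join on that key; with round-robin placement, most rows typically have to be shipped to another unit first.</p>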
<p>The Teradata folks also sent along a slide deck laying out parts of the <a href="http://www.monash.com/uploads/Teradata-Columnar-September-2011.ppt">Teradata Columnar</a> story. But it&#8217;s not one of the better Teradata decks I&#8217;ve ever posted.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/22/teradata-columnar-compression/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Vertica projections &#8212; an overview</title>
		<link>http://www.dbms2.com/2011/09/07/vertica-projections/</link>
		<comments>http://www.dbms2.com/2011/09/07/vertica-projections/#comments</comments>
		<pubDate>Thu, 08 Sep 2011 03:09:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5195</guid>
		<description><![CDATA[Partially at my suggestion, Vertica has blogged a three-part series explaining the &#8220;projections&#8221; that are central to a Vertica database. This is important, because in Vertica projections play the roles that in many analytic DBMS might be filled by base tables, indexes, AND materialized views. Highlights include: A Vertica projection can contain: All the columns [...]]]></description>
			<content:encoded><![CDATA[<p>Partially at my suggestion, Vertica has blogged a <a href="http://www.vertica.com/2011/09/01/the-power-of-projections-part-1/">three</a>-<a href="http://www.vertica.com/2011/09/02/the-power-of-projections-part-2/">part</a> <a href="http://www.vertica.com/2011/09/06/the-power-of-projections-part-3/">series</a> explaining the &#8220;projections&#8221; that are central to a Vertica database. This is important, because in Vertica projections play the roles that in many analytic DBMS might be filled by base tables, indexes, AND materialized views. Highlights include:</p>
<ul>
<li>A Vertica projection can contain:
<ul>
<li>All the columns in a table.</li>
<li>Some of the columns in a table.</li>
<li>A prejoin among tables.</li>
</ul>
</li>
<li>Vertica projections are updated and maintained just as base tables are. (I.e., there&#8217;s no kind of batch lag.)</li>
<li>You can import the same logical schema you use elsewhere. Vertica puts no constraints on your logical schema. <em>Note: Vertica has been claiming good support for all logical schemas since <a href="http://www.dbms2.com/2010/02/22/vertica-4/">Vertica 4.0</a> came out in early 2010.</em></li>
<li>Vertica (the product) will automatically generate a physical schema for you &#8212; i.e. a set of projections &#8212; that Vertica (the company) thinks will do a great job for you. <em>Note: That also dates back to <a href="http://www.dbms2.com/2010/02/22/vertica-4/">Vertica 4.0</a>.</em></li>
<li>Vertica claims that queries are very fast even when you haven&#8217;t created projections explicitly for them. <em>Note: While the extent to which this is true may be a matter of dispute, competitors clearly overreach when they make assertions like &#8220;every major Vertica query needs a projection prebuilt for it.&#8221;</em></li>
<li>On the other hand, it is advisable to build projections (automatically or manually) that optimize performance of certain parts of your query load.</li>
</ul>
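<p>The projection concepts in the list above can be sketched as sorted, physically stored copies of some or all of a table&#8217;s columns, all maintained on every load, plus a rule for picking one per query. The selection rule here is a hypothetical simplification; Vertica&#8217;s own tooling and optimizer make these choices in their own ways.</p>

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

Row = Dict[str, object]


@dataclass
class Projection:
    name: str
    columns: List[str]     # all or some of the table's columns
    sort_order: List[str]  # physical sort order of this stored copy
    rows: List[Row]


def build_projection(name: str, table_rows: List[Row], columns: List[str],
                     sort_order: List[str]) -> Projection:
    rows = [{c: r[c] for c in columns} for r in table_rows]
    rows.sort(key=lambda r: tuple(r[c] for c in sort_order))
    return Projection(name, columns, sort_order, rows)


def insert_row(projections: List[Projection], row: Row) -> None:
    """Every projection is maintained on each load; there is no batch refresh lag."""
    for p in projections:
        key = tuple(row[c] for c in p.sort_order)
        keys = [tuple(r[c] for c in p.sort_order) for r in p.rows]
        p.rows.insert(bisect.bisect_right(keys, key), {c: row[c] for c in p.columns})


def pick_projection(projections: List[Projection], needed: Sequence[str],
                    filter_col: str) -> Optional[Projection]:
    """Prefer a covering projection sorted on the filter column; else the narrowest one."""
    covering = [p for p in projections if set(needed) <= set(p.columns)]
    if not covering:
        return None
    sorted_on_filter = [p for p in covering if p.sort_order and p.sort_order[0] == filter_col]
    return min(sorted_on_filter or covering, key=lambda p: len(p.columns))


sales = [
    {"date": "2011-09-02", "region": "east", "product": "A", "amount": 5},
    {"date": "2011-09-01", "region": "west", "product": "B", "amount": 7},
]
super_proj = build_projection("sales_super", sales, ["date", "region", "product", "amount"], ["date"])
by_region = build_projection("sales_by_region", sales, ["region", "amount"], ["region"])
```

<p>A full-column projection (like <code>sales_super</code>) means any query can run without a tailored projection; narrower, differently sorted ones are the optional performance boost.</p>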
<p>The blog posts contain a lot more than that, of course, both rah-rah and technical detail, including reminders of other Vertica advantages (compression, no logging, etc.). If you&#8217;re interested in analytic DBMS, they&#8217;re worth a look.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/07/vertica-projections/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Virtual data marts in Sybase IQ</title>
		<link>http://www.dbms2.com/2011/08/26/virtual-data-marts-in-sybase-iq/</link>
		<comments>http://www.dbms2.com/2011/08/26/virtual-data-marts-in-sybase-iq/#comments</comments>
		<pubDate>Sat, 27 Aug 2011 04:11:46 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5142</guid>
		<description><![CDATA[I made a few remarks about Sybase IQ 15.3 when it became generally available in July. Now that I&#8217;ve had a current briefing, I&#8217;ll make a few more. The key enhancement in Sybase IQ 15.3 is distributed query &#8212; what others might call parallel query &#8212; aka PlexQ. A Sybase IQ query can now be [...]]]></description>
			<content:encoded><![CDATA[<p>I made a <a href="../../../../../2011/07/07/sybase-iq-soundbites/">few remarks about Sybase IQ 15.3</a> when it became generally available in July. Now that I&#8217;ve had a current briefing, I&#8217;ll make a few more.</p>
<p>The key enhancement in Sybase IQ 15.3 is distributed query &#8212; what others might call parallel query &#8212; aka PlexQ. A Sybase IQ query can now be distributed among many nodes, all talking to the same SAN (Storage-Area Network). Any Sybase IQ node can take the responsibility of being the &#8220;leader&#8221; for that particular query.</p>
<p>In itself, this isn&#8217;t that impressive; all the same things could have been said about pre-Exadata Oracle.* But PlexQ goes somewhat further than just removing a bottleneck from Sybase IQ. Notably, Sybase has rolled out a <strong>virtual data mart</strong> capability. Highlights of the Sybase IQ virtual data mart story include:   <span id="more-5142"></span></p>
<ul>
<li>A virtual data mart takes minutes for a DBA to set up.</li>
<li>A virtual data mart has a number of &#8220;logical&#8221; servers and disk volumes.</li>
<li>A virtual data mart can include data from the core Sybase IQ database, plus additional data that might not have passed data warehouse bureaucratic muster. (Perhaps even more than <a href="../../../../../2010/10/06/ebay-followup-greenplum-out-teradata-10-petabytes-hadoop-has-some-value-and-more/">Teradata</a>, Sybase sees this as being the primary virtual data mart use case.)</li>
<li>Sybase IQ virtual data marts seem to be the mechanism for certain aspects of workload management. For example, they seem to be the only way to extend workload management to <a href="../../../../../2010/05/23/sybase-iq-15/">Sybase IQ&#8217;s in-database analytics</a>.</li>
</ul>
<p><em>*Of course, as a robust columnar DBMS, Sybase IQ lacks the fatal data warehousing drawbacks of pre-Exadata Oracle: I/O limitations, and the unnatural acts of database administration they induce.</em></p>
<p>Sybase is also proud of the elasticity of its new architecture, but seems no more able than I to come up with a use case in which anybody would much care.</p>
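<p>The distributed-query (PlexQ) arrangement described above, where every node sees the same shared storage and any node can lead a given query, can be sketched as a scatter/gather aggregate. Threads stand in for nodes, the round-robin segment assignment is an assumption, and nothing here reflects Sybase&#8217;s actual internals.</p>

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def partial_sum(segments: List[List[int]]) -> Tuple[int, int]:
    """Work done by one node: (sum, count) over the segments it was given."""
    total = sum(sum(s) for s in segments)
    count = sum(len(s) for s in segments)
    return total, count


def distributed_avg(shared_segments: List[List[int]], nodes: List[str], leader: str) -> float:
    """`leader` (any node, since all see the same storage) splits the work and merges results."""
    if leader not in nodes:
        raise ValueError(f"unknown leader node: {leader}")
    assignments: Dict[str, List[List[int]]] = {n: [] for n in nodes}
    for i, seg in enumerate(shared_segments):
        assignments[nodes[i % len(nodes)]].append(seg)
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(partial_sum, assignments.values()))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count  # merging (sum, count) pairs keeps AVG correct


segs = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
print(distributed_avg(segs, ["n1", "n2", "n3"], leader="n2"))
```

<p>Because storage is shared, work assignment is a scheduling decision rather than a data-placement one, which is also what makes adding or removing nodes (the elasticity Sybase mentions) relatively cheap.</p>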
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/08/26/virtual-data-marts-in-sybase-iq/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Sybase IQ soundbites</title>
		<link>http://www.dbms2.com/2011/07/07/sybase-iq-soundbites/</link>
		<comments>http://www.dbms2.com/2011/07/07/sybase-iq-soundbites/#comments</comments>
		<pubDate>Thu, 07 Jul 2011 16:27:28 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4933</guid>
		<description><![CDATA[Sybase made a total hash of the timing of this week&#8217;s press release. I got annoyed after they promised to inform me of the new embargo time, then broke the promise. Other people got annoyed earlier than that. So be it. Below is the draft of a post I was holding, with brackets added around [...]]]></description>
			<content:encoded><![CDATA[<p><em>Sybase made a total hash of the timing of this week&#8217;s press release. I got annoyed after they promised to inform me of the new embargo time, then broke the promise. Other people got annoyed earlier than that. </em></p>
<p><em>So be it. Below is the draft of a post I was holding, with brackets added around one word that is no longer accurate.</em></p>
<p>I don&#8217;t write enough about Sybase IQ. That said, I offered a couple of quotes to a reporter [yesterday] in connection with the general availability of Sybase IQ 15.3. Lightly edited, they go:</p>
<ul>
<li>&#8220;Shared-everything MPP&#8221; isn&#8217;t a total contradiction in terms. It&#8217;s great for adding in concurrent users. And there&#8217;s little doubt that Sybase IQ can support robust access to databases 10s of terabytes in size.</li>
<li>As I first noted a couple of years ago, <a href="../../../../../2009/06/08/the-future-of-data-marts/">virtual data marts are a good idea</a>. Too few vendors are making it easy to spin them out. They let departments start doing analytics very quickly, yet allow IT to keep partial control.</li>
</ul>
<p>Beyond that, I should note:</p>
<ul>
<li>Sybase IQ is the classic choice for what I call <a href="http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/">traditional data marts</a>.</li>
<li>Sybase IQ is a leader in <a href="http://www.dbms2.com/2011/06/20/temporal-data-time-series-and-imprecise-predicates/">temporal functionality</a>, which is not coincidental to its presence in the financial services market.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/07/sybase-iq-soundbites/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

