<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Microsoft and SQL*Server</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/microsoft-sqlserver/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:17:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Comments on the analytic DBMS industry and Gartner&#8217;s Magic Quadrant for same</title>
		<link>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/</link>
		<comments>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 17:17:32 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Exasol]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[illuminate Solutions]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5926</guid>
		<description><![CDATA[This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the 2010, 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying: In general, I regard Gartner Magic [...]]]></description>
			<content:encoded><![CDATA[<p>This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the <a href="http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/">2010</a>, <a href="../../../../../2010/02/10/gartner-magic-quadrant-data-warehouse-2009-2010/">2009</a>, <a href="../../../../../2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">2008</a>, <a href="../../../../../2007/10/19/gartner-2007-magic-quadrant-for-data-warehouse-database-management-systems/">2007</a>, and <a href="../../../../../2006/10/03/vendor-segmentation-for-data-warehouse-dbms/">2006</a> Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying:</p>
<ul>
<li>In general, I regard Gartner Magic Quadrants as a bad use of good research.</li>
<li>Illustrating the uselessness of &#8212; or at least poor execution on &#8212; the  overall quadrant metaphor, a large majority of the vendors covered are  lined up near the line x = y, each outpacing the one below in both of  the quadrant&#8217;s dimensions.</li>
<li>I find fewer specifics to disagree with in this Gartner Magic Quadrant than in previous years&#8217; versions. Two factors jump to mind as possible reasons:
<ul>
<li>This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is somewhat less ambitious than others; while it gives as much company detail as its predecessors, it doesn&#8217;t add as much discussion of overall trends. So there&#8217;s less to (potentially) disagree with.</li>
<li><a href="http://www.dbms2.com/2010/12/28/evolving-definitions-and-technology-categories-for-2011/">Merv Adrian is now at Gartner</a>.</li>
</ul>
</li>
<li>Whatever the problems may be with Gartner&#8217;s approach, the whole thing comes out better than do <a href="http://www.dbms2.com/2011/02/11/comments-on-the-2011-forrester-wave-for-enterprise-data-warehouse-platforms/">Forrester&#8217;s failed imitations</a>.</li>
</ul>
<p><em>*At the time of this posting, I don&#8217;t yet have a link. However, I expect that to change quickly, and I plan to edit this paragraph accordingly. If nothing else, I hope people will drop links into the comment thread. </em></p>
<p>Specific company comments, roughly in line with Gartner&#8217;s approximately single-dimensional rank ordering, include: <span id="more-5926"></span></p>
<ul>
<li>The Gartner Magic Quadrant&#8217;s comments on Teradata seem pretty fair. I don&#8217;t think I&#8217;m much in disagreement when I say:
<ul>
<li>Teradata has the richest, most mature analytic DBMS offering.</li>
<li>Teradata has an outstanding track record both for <a href="http://www.dbms2.com/2011/09/24/confusion-about-teradatas-big-customers/">managing large data volumes</a> and for high-concurrency mixed workloads.</li>
<li>Aster Data was a cool Teradata acquisition, even if Teradata/Aster synergies or integration have been nominal to date.</li>
<li>Teradata still needs to get out of its own way in marketing, positioning, packaging, and/or defining its premium-priced system vs. its more moderately-priced alternatives. Indeed, as necessary as this approach may have been to fending off encroachments by Netezza and others, what Teradata really needs to do is evolve to a more pick-your-own-node-combination mix-match kind of offering.</li>
</ul>
</li>
<li>Gartner has talked with a lot of Oracle Exadata users who say that the product works; Gartner has also stopped beating Oracle up for <a href="http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/">its previous policy of almost never doing onsite POCs (Proofs of Concept)</a>; both parts of that ring true with me. But Gartner also rightly dings Oracle for various issues in cost and cumbersomeness. Overall, while I agree there are organizations for which Oracle should indeed be a top-ranked choice, there are many others who shouldn&#8217;t put Oracle on their short list.</li>
<li>Third in the Gartner MQ rankings is IBM.
<ul>
<li>Gartner gets so caught up in reciting the names of various IBM product offerings that it neglects to say much good about DB2 itself. (I tend to have a similar problem.)</li>
<li>But Gartner does mention concurrency as a strength. I agree, especially if we presume that that was a reference to DB2 rather than Netezza.</li>
<li>Gartner cites Netezza&#8217;s post-acquisition annual growth rate as 30%. Gartner seems to think this is a good number. I disagree, but in Netezza&#8217;s defense, it has had to endure IBM&#8217;s post-acquisition on-boarding process.</li>
</ul>
</li>
<li>Arguably fourth in the Gartner Data Warehouse Magic Quadrant rankings is EMC/Greenplum.
<ul>
<li>In general, Gartner likes the taste of Greenplum Kool-Aid.</li>
<li>Gartner neglects to ding Greenplum for concurrency challenges, which I view as an oversight given Gartner&#8217;s general stress on that area.</li>
<li>Gartner does ding Greenplum for support challenges.</li>
<li>Gartner neglects to praise Greenplum for true <a href="http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/">hybrid row/columnar data management</a>, a feature shared by <a href="http://www.dbms2.com/2011/09/22/teradata-columnar-compression/">Teradata</a> and <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/">Vertica</a>, among others, but not by <a href="http://www.dbms2.com/2011/02/06/columnar-compression-database-storage/">Oracle</a>, DB2, or Netezza.</li>
<li>Gartner located a half-petabyte Greenplum database. This doesn&#8217;t surprise me, even though Greenplum has frequently made exaggerated claims about large-size database successes in the past.</li>
<li>Gartner reports a &gt;400 figure for Greenplum customers, which is plausible.</li>
</ul>
</li>
<li>In its first deviation from strict one-dimensional rank ordering, the Gartner Magic Quadrant ranks Sybase ahead of Greenplum in completeness of vision but behind in &#8220;ability to execute&#8221;.
<ul>
<li>If that were the other way around, it might make more sense. Greenplum promises anything and everything you might ever want for analytic data management or the associated analysis; but Sybase has vastly more analytic DBMS users than Greenplum does, running a variety of demanding workloads.</li>
<li>Gartner appears to think that Sybase IQ requires less database administration than I think it does.</li>
<li>Gartner seems concerned that SAP will position HANA and Sybase ASE as, between them, the only DBMS you&#8217;ll ever need, casting doubt on Sybase IQ&#8217;s future. I wouldn&#8217;t worry about that if you have a problem you want to solve today.</li>
</ul>
</li>
<li>The Gartner Magic Quadrant for Data Warehouse Database Management Systems ranks Microsoft sixth overall, despite noting that there isn&#8217;t a single production reference for Microsoft&#8217;s Parallel Data Warehouse. In support of this ranking, it cites, for example, the compression feature, which distinguishes Microsoft SQL Server from no other product on the list except Kognitio. If you have such an undemanding data warehousing problem that many different analytic DBMS could meet your needs, there&#8217;s a good chance Microsoft SQL Server can also do the job; and if you&#8217;ve bought into the Microsoft technology stack, you might as well keep going down that path. Otherwise, I don&#8217;t know why somebody should adopt Microsoft&#8217;s offering at this time.</li>
<li>Seventh along the main diagonal path in the Gartner Magic Quadrant is HP Vertica. I&#8217;d rank Vertica higher than that, but in fairness I note two execution concerns. First, HP has a lousy track record, both in acquisitions and in data warehousing/analytics. Second, Vertica is bad about answering my email. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Anyhow, Gartner doesn&#8217;t seem to have given Vertica credit either for <a href="http://www.dbms2.com/2011/06/20/columnar-dbms-vendor-customer-metrics/">its full customer count or for the multiple petabyte-scale databases Vertica runs</a>.</li>
<li>1010data is an outlier, with Gartner noting that it only partly fits in with other &#8220;Data Warehousing Database Management&#8221; companies, and hence kind of confessing that 1010data&#8217;s presence on the Magic Quadrant is somewhat arbitrary. Stuff like that is bound to happen, given <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">the inherent difficulties of defining market categories</a>. Anyhow, my thoughts on 1010data include:
<ul>
<li>I&#8217;m nervous about the fact that 1010data doesn&#8217;t actually control its own DBMS technology, but rather relies on old code from the small private company KX Systems.</li>
</ul>
<ul>
<li> There are three main reasons to consider 1010data:
<ul>
<li>You want to enter the data mart outsourcing business in a casual way, and you like its SaaS offering.</li>
<li>You want to engage in <a href="http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/">stakeholder-facing analytics</a> in a casual way, and you like its SaaS offering.</li>
<li>You love 1010data&#8217;s particular set of interactive analytic features and performance.</li>
</ul>
</li>
</ul>
</li>
<li>Back to the main path winding along the Gartner Magic Quadrant main diagonal &#8212; next up is ParAccel. While I question some of the peripheral comments, I agree with Gartner&#8217;s core messages that:
<ul>
<li>ParAccel, the product, is blazingly fast in certain use cases.</li>
<li>ParAccel, the company, is dangerously small.</li>
</ul>
</li>
<li>Eighth on the Gartner MQ&#8217;s main path is Kognitio. This is too high. Kognitio positions itself as offering in-memory DBMS, yet stubbornly refuses to do any kind of data compression. That&#8217;s an awful combination of choices. As for using Kognitio&#8217;s data warehousing SaaS offering &#8212; why would you do that, when more modern products are available on a SaaS/cloud basis as well?</li>
<li>Ninth in the Gartner Magic Quadrant main rankings is SAND.
<ul>
<li>The SAND section is not a triumph of Gartner accuracy. For example:
<ul>
<li><a href="http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/">Gartner completely missed the errors in SAND&#8217;s reported customer counts</a>.</li>
<li>Gartner refers to SAND as being &#8220;in existence for approximately nine years&#8221;, which is too low by at least a factor of 2.</li>
<li>Gartner says &#8220;SAND is a privately held company&#8221;, even though <a href="http://itmarketstrategy.com/2009/06/07/sand-technology-a-risky-bet/">Merv knows better than that</a>.</li>
</ul>
</li>
<li>Otherwise, Gartner&#8217;s opinion on SAND seems to boil down to &#8220;Interesting technology and ideas, but dangerously small company.&#8221; I agree.</li>
</ul>
</li>
<li>Tenth and too low in the Gartner MQ main rankings is Infobright.
<ul>
<li>At least by some metrics (e.g. customer count), Infobright isn&#8217;t as dangerously small as ParAccel, SAND, Kognitio, et al.</li>
<li>That said, Infobright is small and focused on <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>. So I wouldn&#8217;t be confident in Infobright&#8217;s future technology path for human-generated data use cases.</li>
<li>Infobright&#8217;s performance is uneven &#8212; blazing in cases where the Knowledge Grid helps, but not necessarily stellar by analytic DBMS standards when full table scans are called for.</li>
<li>I agree with Gartner that the possibility of Oracle/MySQL future shenanigans is a concern. But while the energy behind MySQL forking efforts doesn&#8217;t seem too great right now, I&#8217;d expect them to revive and offer a successful escape path if it seemed Oracle was going to indeed play hardball.</li>
<li>Also, given that it&#8217;s already an open source vendor, there are various kinds of assurances Infobright could give that would also help alleviate customer concerns.</li>
</ul>
</li>
<li>Actian, formerly Ingres, took a big tumble in Gartner&#8217;s rankings versus last year, when I simply wrote &#8220;<a href="http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/">What Gartner said in connection with <strong>Ingres</strong> is too inaccurate to deserve detailed attention</a>.&#8221; I&#8217;m even a little harsher about <a href="http://www.dbms2.com/2011/09/25/ingres-actian/">Ingres/Actian&#8217;s DBMS products and prospects</a> than Gartner is, but at least now we&#8217;re in the same ballpark.</li>
<li>Along with Infobright, ParAccel, and SAND, <a href="http://www.dbms2.com/2011/11/12/exasol-update/">Exasol</a> appears to be another of the &#8220;good columnar technology/small company&#8221; crowd. As with other such products, one should be careful about fit-and-finish features that are missing today, as there is no assurance they&#8217;ll be added in a timely manner going forward.</li>
<li>illuminate Solutions, which was on last year&#8217;s Gartner list, <a href="http://www.dbms2.com/2012/01/16/has-illuminate-solutions-joined-the-choir-invisible/">now appears to be an ex-company</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Microsoft SQL Server 2012 and enterprise database choices in general</title>
		<link>http://www.dbms2.com/2012/01/24/microsoft-sql-server-2012/</link>
		<comments>http://www.dbms2.com/2012/01/24/microsoft-sql-server-2012/#comments</comments>
		<pubDate>Tue, 24 Jan 2012 14:42:34 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Mid-range]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5859</guid>
		<description><![CDATA[Microsoft is launching SQL Server 2012 on March 7. An IM chat with a reporter resulted, and went something like this. Reporter: [Care to comment]? CAM: SQL Server is an adequate product if you don&#8217;t mind being locked into the Microsoft stack. For example, the ColumnStore feature is very partial, given that it can&#8217;t be [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.sqlserverlaunch.com/ww/Home">Microsoft is launching SQL Server 2012 on March 7</a>. An IM chat with a reporter resulted, and went something like this.</p>
<p><strong>Reporter: [Care to comment]?</strong><br />
<strong>CAM:</strong> SQL Server is an adequate product if you don&#8217;t mind being locked into the Microsoft stack. For example, the ColumnStore feature is very partial, given that <a href="http://msdn.microsoft.com/en-us/library/gg492088%28v=sql.110%29.aspx#Update">it can&#8217;t be updated</a>; but Oracle doesn&#8217;t have columnar storage at all.</p>
<p><strong>Reporter: Is the lock-in overall worse than IBM DB2, Oracle?</strong><br />
<strong>CAM:</strong> Microsoft locks you into an operating system, so yes.</p>
<p><strong>Reporter: Is this release something larger Oracle or IBM shops could consider as a lower-cost alternative in a co-habitation scenario, in the event they&#8217;re mulling whether to buy more Oracle or IBM licenses?</strong><br />
<strong>CAM:</strong> If they have a strong Microsoft-stack investment already, sure. Otherwise, why?</p>
<p><strong>Reporter: [How about] just cost?</strong><br />
<strong>CAM:</strong> DB2 works just as well to keep Oracle honest as SQL Server does, and without a major operating system commitment. For analytic databases you want an analytic DBMS or appliance anyway.</p>
<p>Best is to have one major vendor of OLTP/general-purpose DBMS, a web DBMS, a DBMS for disposable projects (that may be the same as one of the first two), plus however many different analytic data stores you need to get the job done.</p>
<p>By &#8220;web DBMS&#8221; I mean MySQL, NewSQL, or NoSQL. Actually, you might need more than one product in that area.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/24/microsoft-sql-server-2012/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Eight kinds of analytic database (Part 1)</title>
		<link>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/</link>
		<comments>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 08:17:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Buying processes]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4868</guid>
		<description><![CDATA[Analytic data management technology has blossomed, leading to many questions along the lines of &#8220;So which products should I use for which category of problem?&#8221; The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for &#8220;big data&#8221; is little help. Let&#8217;s try eight categories instead. While no categorization [...]]]></description>
			<content:encoded><![CDATA[<p>Analytic data management technology has blossomed, leading to many questions along the lines of &#8220;So which products should I use for which category of problem?&#8221; The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for &#8220;big data&#8221; is little help.</p>
<p>Let&#8217;s try eight categories instead. While <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">no categorization is ever perfect</a>, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need &#8212; and in most cases you&#8217;ll need several &#8212; is a great early step in your analytic technology planning.  <span id="more-4868"></span></p>
<p><strong><em>Enterprise data warehouse</em></strong> (Full or partial)</p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, but especially operational</li>
<li><em>Likely use styles:</em> All</li>
<li><em>Canonical example:</em> Central EDW for a big enterprise</li>
<li><em>Stresses:</em> Concurrency, reliability, workload management</li>
</ul>
<p>The enterprise data warehouse (EDW) ideal says that you copy all your data into one place, and drive all decision-making from there. <a href="../../../../../2011/06/21/its-official-the-grand-central-edw-will-never-happen/">Full EDWs are pipedreams</a>. Still, a partial EDW makes sense for most large enterprises, and many indeed already have one. The first product lines to consider for classical EDWs are Teradata, DB2, Exadata, and maybe Microsoft SQL Server, especially if you&#8217;re going to stress concurrency and/or operational use cases.</p>
<p><strong><em>Traditional data mart</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All</li>
<li><em>Likely use styles:</em> Business intelligence, budgeting/consolidation, investigative</li>
<li><em>Examples:</em> Reporting servers, planning/consolidation servers, anything MOLAP, etc.</li>
<li><em>Stresses:</em> Performance, concurrency, TCO</li>
</ul>
<p>Whether or not you have something like an enterprise data warehouse, it&#8217;s common to have lighter-weight data marts as well. A traditional data mart might drive reports and dashboards. Or it might be specialized for budgeting, planning, and/or consolidation.  Some <a href="../../../../../2011/03/03/investigative-analytics/">investigative analytics</a> may be in the mix as well.</p>
<p>Any DBMS that can support an EDW can also support a data mart, but it may not be the most cost-effective way to do so. Columnar DBMS might have more attractive performance and TCO (Total Cost of Ownership); the same goes for Netezza. Some of them &#8212; e.g. Sybase IQ and <a href="../../../../../2011/06/20/vertica-release-5/">Vertica</a> &#8212; have excellent track records in concurrent usage as well. <a href="../../../../../2011/05/29/when-to-use-relational-database-management-system/">Ted Codd</a> pushed what amounts to MOLAP (Multidimensional OnLine Analytic Processing) systems for these use cases. But relational DBMS commonly do a better job, which is one reason most major MOLAP products have wound up at RDBMS companies.</p>
<p><strong><em>Investigative data mart &#8212; agile</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, especially customer-centric</li>
<li><em>Likely use styles</em>: Investigative</li>
<li><em>Canonical example:</em> A few analysts getting a few TB to examine</li>
<li><em>Stresses:</em> Ease of setup/load, ease of admin, price/performance</li>
</ul>
<p>Besides the traditional data mart, there are at least two other kinds. Both are focused on investigative analytics, but they&#8217;re differentiated by database size.</p>
<p>If you have just a few analysts,* looking at no more than a few terabytes of data (perhaps even just some gigabytes) &#8212; and if that data is &#8220;single-subject&#8221; and fairly homogeneous &#8212; your watchwords should be &#8220;cheap&#8221;, &#8220;easy&#8221;, and &#8220;fast&#8221;. You don&#8217;t need to invest in much hardware, in expensive software, or in much administrative effort (the analysts can be their own DBAs), nor should you endure much set-up time. Just grab a product, grab some data, and start running queries (or extracts into the statistical tool of your choice).</p>
<p><em>*If you have dozens or even hundreds of analysts hitting the same database, you&#8217;re probably back to the more concurrency-oriented scenarios outlined above.</em></p>
<p>Infobright is often cost-effective among columnar analytic DBMS. Other vendors might cut you a price break as well. If you have multiple terabytes of data, don&#8217;t rule out Netezza&#8217;s lowest-end products (even if they&#8217;d really rather sell you something bigger). Or, if you&#8217;re in the sub-terabyte range, maybe you can get by with an in-memory BI tool such as QlikView, and not do anything special on the DBMS side at all.</p>
<p><strong><em>Investigative data mart &#8212; big</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, especially customer-centric, logs, financial trade, scientific</li>
<li><em>Likely use styles</em>: Investigative</li>
<li><em>Canonical example:</em> Single-subject 20 TB &#8211; 20 PB relational database</li>
<li><em>Stresses:</em> Performance, scale-out, analytic functionality</li>
</ul>
<p>But if you&#8217;re looking at tens of terabytes of relational data, or even more, you really do have a &#8220;big data&#8221; problem. Performance and scalability are major challenges, usually best addressed by MPP (Massively Parallel Processing) systems, such as Netezza, Vertica, Aster Data, ParAccel, Teradata, or Greenplum. Performance POCs (Proofs Of Concept) are a big part of the buying process. Vendor price negotiations are crucial too.</p>
<p><em>Actually, in the low tens of terabytes you might be able to get away with a shared-disk system that has excellent compression &#8212; e.g., columnar products like Sybase IQ, Infobright, or SAND, rather than just Vertica and ParAccel.</em></p>
<p>Assuming you have affordable, scalable query performance, the competitive differentiator can switch to additional analytic functionality. Aster, Netezza, ParAccel, Vertica, and Greenplum either offer full <a href="../../../../../2011/02/24/analytic-platforms/">analytic platforms</a>, or seem to be on the path to doing so. Teradata, which now owns Aster Data, offers substantial built-in analytic capability in its traditional products as well, and the same goes for Sybase IQ.</p>
<p><em>Continued in <a href="http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/">Part 2</a>, where we cover some of the more difficult use cases.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Comments on the Gartner 2010/2011 Data Warehouse Database Management Systems Magic Quadrant</title>
		<link>http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/</link>
		<comments>http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/#comments</comments>
		<pubDate>Sat, 05 Feb 2011 15:49:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[1010data]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Workload management]]></category>
		<category><![CDATA[illuminate Solutions]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3744</guid>
		<description><![CDATA[The Gartner 2010 Data Warehouse Database Management Systems Magic Quadrant is out. I shall now comment, just as I did to varying degrees on the 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants. Note: Links to Gartner Magic Quadrants tend to be unstable. Please alert me if any problems arise; [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.gartner.com/technology/media-products/reprints/teradata/vol3/article1/article1.html">Gartner 2010 Data Warehouse Database Management Systems Magic Quadrant</a> is out. I shall now comment, just as I did to varying degrees on the <a href="../../../../../2010/02/10/gartner-magic-quadrant-data-warehouse-2009-2010/">2009</a>, <a href="../../../../../2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">2008</a>, <a href="../../../../../2007/10/19/gartner-2007-magic-quadrant-for-data-warehouse-database-management-systems/">2007</a>, and <a href="../../../../../2006/10/03/vendor-segmentation-for-data-warehouse-dbms/">2006</a> Gartner Data Warehouse Database Management System Magic Quadrants.</p>
<p><em>Note: Links to Gartner Magic Quadrants tend to be unstable. Please alert me if any problems arise; I&#8217;ll edit accordingly.</em></p>
<p>In <a href="../../../../../2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">my comments on the 2008 Gartner Data Warehouse Database Management Systems Magic Quadrant</a>, I observed that <strong>Gartner&#8217;s &#8220;completeness of vision&#8221; scores were generally pretty reasonable,</strong> but their<strong> &#8220;ability to execute&#8221; rankings were somewhat bizarre;</strong> the same remains true this year. For example, Gartner ranks Ingres higher by that metric than Vertica, Aster Data, ParAccel, or Infobright. Yet each of those companies is growing nicely and delivering products that meet serious cutting-edge analytic DBMS needs, neither of which has been true of Ingres since about 1987.  <span id="more-3744"></span></p>
<p>The general list of &#8220;market forces, end-user expectations and vendors&#8217; resulting solution approaches&#8221; at the top of the 2010 Gartner Data Warehouse Database Management System Magic Quadrant article is a mixed bag. Following Gartner&#8217;s order, I&#8217;ll address those first, and particular companies cited afterwards. Specific items and comments include:</p>
<ul>
<li><strong>&#8220;Increased demand for optimization techniques and performance enhancement.&#8221;</strong> Gartner seems to be saying that data warehouse DBMS buyers want lists of specific, esoteric performance features. Well, buyers always want their DBMS to run fast, and they&#8217;d like the products to be mature enough to have been through a few rounds of <a href="../../../../../2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a>, but otherwise I&#8217;m not sure I&#8217;d put that at the top of my list.</li>
<li><strong>&#8220;The argument made by purchasing departments that buying power increases when dealing with a single, incumbent vendor.&#8221;</strong> I agree that <a href="../../../../../2011/02/02/exadata-notes/">vendor consolidation and account control</a> are a huge part of the Oracle, Microsoft, IBM and even Teradata stories. (Vertica can prove it&#8217;s 10X more price-performant than Oracle and still not get the business.) But it&#8217;s not just about price negotiations; once annual maintenance is included, one has to squint pretty hard to see Oracle as a low-cost alternative. Also important is reducing the total number of product-specific skill-sets needed on the IT staff.</li>
<li><strong>&#8220;Prepackaged, prebalanced warehouse environments delivered using data warehouse appliances.&#8221;</strong> Yep. To varying extents, Oracle, Microsoft, Teradata, and IBM are all committed to designed-hardware strategies.</li>
<li><strong>&#8220;Expectations for the delivery of on-site POCs.&#8221;</strong> Honestly, not as many buyers insist on on-site Proofs of Concept as should. Still, Oracle is shameful in its reluctance to do them. (Teradata tries to avoid them too, for obvious reasons of expense, but is much more gracious about capitulating when the buyer insists.)</li>
<li><strong>&#8220;Cost controls and data warehouse performance management.&#8221;</strong> See next comment.</li>
<li><strong>&#8220;Demands for delivering a fully mixed workload.&#8221;</strong> I&#8217;d have phrased the workload management and administrative tools points rather differently than this, but so be it.</li>
<li><strong>&#8220;Demands for departmental analytics delivered quickly via data marts.&#8221;</strong> Agreed. Data-mart-only installations are a huge part of the analytic DBMS market. <a href="../../../../../2009/06/08/the-future-of-data-marts/">Data mart spin-out</a> is also important.</li>
<li><strong>&#8220;Wider indexing and fast performance within clusters of data, delivered via column-based solutions.&#8221;</strong> This bizarrely seems to conflate column stores and parallel processing (both of which are of course highly important).</li>
<li><strong>&#8220;A wave of new data warehouse implementers seeking fast-track, low-risk delivery.&#8221;</strong> Well, yes. Netezza noticed that quite some years ago. And by now the <a href="../../../../../2010/04/12/enterprise-data-warehouse-edw-myt/">long-gestation EDW (Enterprise Data Warehouse)</a> is widely disliked.</li>
<li><strong>&#8220;Global organizations seeking distributed solutions as potential architecture.&#8221;</strong> If this is the MPP point, it&#8217;s oddly phrased. If this is a suggestion that data warehouses should be partitioned across wide-area networks, it&#8217;s just plain odd. If it&#8217;s a reiteration that departments like to control their own data marts, I agree. And if it&#8217;s a comment on keep-data-in-the-country privacy laws, it could be the most prescient thing Donald Feinberg has said in many years.</li>
</ul>
<p>Long though it is, that list of general items and issues for the 2010 Gartner Data Warehouse Database Management System Magic Quadrant has some gaps. Most glaringly, I don&#8217;t see any references to <a href="../../../../../2011/01/24/analytic-computing-system/">advanced analytics</a> in general, or even to the specific case of <a href="../../../../../2010/05/15/further-clarifying-in-database-mpp-sas/">integrated predictive analytics</a>. There&#8217;s also nothing about solid-state memory or other storage-technology considerations, although in fairness it&#8217;s still early days for much of what vendors conceive of as competitive differentiation in those respects.</p>
<p>Here are some vendor-specific comments on the 2010 Gartner Data Warehouse Database Management System Magic Quadrant:</p>
<ul>
<li>It&#8217;s pretty bizarre to compare <strong>1010data</strong> to database.com or Microsoft Azure. Kognitio would be a better choice. So would cloud-hosted instances of Vertica, Aster Data nCluster, or others.</li>
<li>Gartner&#8217;s comments on <strong>Aster Data</strong> and nCluster are actually pretty reasonable.</li>
<li>Gartner&#8217;s comments on <strong>EMC/Greenplum</strong> are a bit Kool-Aid-drinky, and don&#8217;t account for the inevitable flailing that occurs right after an acquisition. But otherwise they&#8217;re pretty reasonable.</li>
<li>I don&#8217;t take <strong>IBM&#8217;s</strong> super-comprehensive-all-inclusive architectural stories as seriously as Gartner does.</li>
<li>I don&#8217;t take <strong>Netezza&#8217;s</strong> small stable of OEM partners as seriously as Gartner does. I also don&#8217;t share Gartner&#8217;s optimism for the continuation of Netezza&#8217;s NEC partnership in the face of IBM&#8217;s Netezza ownership.</li>
<li>I&#8217;m even more skeptical about <a href="../../../../../2008/03/27/the-illuminate-guys-have-a-cto-blog/">illuminate</a> than Gartner is.</li>
<li>I&#8217;m delighted that Gartner has adopted my phrase <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a> (<strong>Infobright</strong> is one of several firms pushing that one).</li>
<li>&#8220;Only open-source column-store DBMS&#8221; is a bit exaggerated, but Infobright is indeed the only one with serious traction, or offered by a serious analytic DBMS vendor.</li>
<li>What Gartner said in connection with <strong>Ingres</strong> is too inaccurate to deserve detailed attention.</li>
<li>While Gartner&#8217;s write-up of <strong>Kognitio</strong> is a bit confused, that&#8217;s excusable. Kognitio&#8217;s strategy changes often.</li>
<li>I&#8217;m not persuaded by the claim of low <strong>Microsoft</strong> TCO. The days when Microsoft&#8217;s tools were vastly better than the competition&#8217;s are long gone. And using an OLTP DBMS for data warehousing generally takes more people effort than using something more purpose-built.</li>
<li>Gartner is right to ding <strong>Oracle</strong> for high prices, high people costs, and unwillingness to do onsite POCs.</li>
<li>Gartner is right that <strong>Exadata</strong> is a huge improvement over non-Exadata Oracle data warehousing.</li>
<li>Gartner is right to suggest that Exadata can easily handle data warehouses over 20 terabytes in size, but wrong to suggest that software-only Oracle also can. Just because the pain is less than it was with earlier releases of Oracle doesn&#8217;t mean it isn&#8217;t still bad.</li>
<li>Gartner&#8217;s comments on <strong>ParAccel</strong> are pretty reasonable.</li>
<li>Gartner&#8217;s comments on compression in connection with <strong>SAND</strong> make no technical sense (tokenization is a key form of columnar compression, not an alternative to it). Also, SAP&#8217;s acquisition of Sybase is a business challenge for SAND, not a technical one.</li>
<li>Unless I&#8217;m forgetting something, <strong>Sybase IQ</strong> has no more in-database data mining than any other Fuzzy Logix partner does.</li>
<li>Gartner failed to note that, like other DBMS dating back to the 1990s and before, Sybase IQ is more complex to administer than some newer products are.</li>
<li>Gartner&#8217;s take on <strong>Teradata</strong> is pretty reasonable.</li>
<li>Gartner&#8217;s take on <strong>Vertica</strong>, while sloppy, is basically sensible. However, Gartner failed to note that Vertica is a laggard in non-query analytics. (I am sure those deficiencies are being addressed, but Vertica&#8217;s competitors are moving ahead as well.)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>Sound bites on HP/Microsoft and Neoview</title>
		<link>http://www.dbms2.com/2011/01/19/sound-bites-on-hpmicrosoft-and-neoview/</link>
		<comments>http://www.dbms2.com/2011/01/19/sound-bites-on-hpmicrosoft-and-neoview/#comments</comments>
		<pubDate>Wed, 19 Jan 2011 17:05:57 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3611</guid>
		<description><![CDATA[HP and Microsoft put out a press release.  Three new appliances are being announced, and we&#8217;re being reminded of at least one past announcement. I wasn&#8217;t briefed, and wouldn&#8217;t want to comment on, say, price/performance or feature particulars. That said: HP Neoview seems pretty dead. I haven&#8217;t heard a single favorable reference to HP Neoview [...]]]></description>
			<content:encoded><![CDATA[<p>HP and <a href="http://www.microsoft.com/Presspass/press/2011/jan11/01-18HPMSAppliancesPR.mspx">Microsoft</a> put out a <a href="http://www.hp.com/hpinfo/newsroom/press/2011/110119xa.html">press release</a>.  Three new appliances are being announced, and we&#8217;re being reminded of at least one past announcement. I wasn&#8217;t briefed, and wouldn&#8217;t want to comment on, say, price/performance or feature particulars. That said:</p>
<ul>
<li>HP Neoview seems pretty dead.</li>
<li>I haven&#8217;t heard a single favorable reference to HP Neoview since I remarked in March, 2010 that &#8220;<a href="http://www.dbms2.com/2010/03/19/some-business-trends-in-the-data-warehouse-market/">HP Neoview is reeling</a>.&#8221;</li>
<li>A reporter asked me &#8220;What went wrong?&#8221; Well, almost any new analytic DBMS/appliance product will compete mainly on two things in its early days &#8212; price/performance (or absolute performance), and just how (im)mature it initially is. (Aster Data may be the only prominent exception to that rule.) Presumably, HP Neoview did badly by those metrics.</li>
<li>HP Neoview was widely conjectured to be a pet project of ousted former HP CEO <a href="http://www.dbms2.com/2010/09/07/soundbites-about-mark-hurd-joining-oracle/">Mark Hurd</a>.</li>
<li>Nobody tells me of competing with Microsoft SQL Server 2008 Parallel Data Warehouse either (i.e. Madison/<a href="http://www.dbms2.com/2009/09/30/facts-and-rumors/">DATAllegro</a>). Thus, in particular, I haven&#8217;t heard any reason to believe there&#8217;s anything good about the technology, especially now that the ever-upbeat Stuart Frost has left Microsoft. I&#8217;m conjecturing that Parallel Data Warehouse is focused heavily on the existing Microsoft installed base.</li>
<li>Speaking of Aster &#8212; even under NDA, they won&#8217;t tell me or give me any useful hints as to who their undisclosed strategic investor is. Well, HP has a long history of investing in sometimes-competing DBMS vendors (back to Oracle and Informix), and a good reason to keep quiet (reluctance to admit the end of Neoview). Hmm &#8230;</li>
<li>The consolidation appliance in the HP/Microsoft announcement is a clear response to <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/">Oracle&#8217;s Exadata strategy</a>, or (which is probably more accurate) to the same market opportunity Oracle identified.</li>
<li>I couldn&#8217;t quite figure out whether the cheap data warehouse appliance included Microsoft PowerPivot support, but that would make sense if it did.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/01/19/sound-bites-on-hpmicrosoft-and-neoview/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Netezza and IBM DB2 approaches to compression</title>
		<link>http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/</link>
		<comments>http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/#comments</comments>
		<pubDate>Mon, 21 Jun 2010 12:05:47 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2320</guid>
		<description><![CDATA[Thursday, I spent 3 ½ hours talking with 10 of Netezza&#8217;s more senior engineers. Friday, I talked for 1 ½ hours with IBM Fellow and DB2 Chief Architect Tim Vincent, and we agreed we needed at least 2 hours more. In both cases, the compression part of the discussion seems like a good candidate to [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Thursday, <a href="http://www.dbms2.com/2010/06/21/netezza-database-software-technology-overview/">I spent 3 ½ hours talking with 10 of Netezza&#8217;s more senior engineers</a>. Friday, I talked for 1 ½ hours with IBM Fellow and DB2 Chief Architect Tim Vincent, and we agreed we needed at least 2 hours more. In both cases, the compression part of the discussion seems like a good candidate to split out into a separate post. So here goes.</p>
<p style="margin-bottom: 0in;">When you sell a row-based DBMS, as Netezza and IBM do, there are a couple of approaches you can take to compression. First, you can compress the blocks of rows that your DBMS naturally stores. Second, you can compress the data in a column-aware way. Both Netezza and IBM have chosen completely column-oriented compression, with no block-based techniques entering the picture to my knowledge. But that&#8217;s about as far as the similarity between Netezza and IBM compression goes.  <span id="more-2320"></span></p>
<p style="margin-bottom: 0in;"><strong>IBM&#8217;s basic DB2 compression strategy</strong> is remarkably simple. In every table (not column), or in each range partition of a range-partitioned table, <strong>the 4096 most common* values are identified; these are all encoded into 12-bit strings</strong> (12 bits being exactly enough to distinguish 2<sup>12</sup> = 4096 values). And that&#8217;s that. This has been happening since DB2 9.1, released 4 ½ years ago. DB2&#8217;s compression persists through logs, buffer pools (i.e., RAM cache), and so on. In DB2 9.7, the most recent release, IBM extended the compression to a few areas it hadn&#8217;t reached before, such as log-based replication, native XML, and CLOBs (Character Large OBjects) that happen not to be too big.</p>
<p style="margin-bottom: 0in;"><em>*Actually, I&#8217;d presume it&#8217;s not exactly the “most common”; there surely is some minimum length of a value to be encoded, or some bias toward length. Also, the determination of what to encode is probably a little imprecise. E.g., I forgot to ask whether the choice of values ever changes as data gets updated.</em></p>
<p style="margin-bottom: 0in;">The sophisticated part of DB2&#8217;s simple compression strategy is its breadth of applicability; DB2 compression can apply to:</p>
<ul>
<li>Values in columns (numeric, character, whatever)</li>
<li>Substrings of values in columns</li>
<li>Groups of columns (e.g., city/state/zip code)</li>
</ul>
<p style="margin-bottom: 0in;">Except for the 4096 values limit, that sounds at least as flexible as the <a href="http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/">Rainstor/Clearpace compression approach</a>.</p>
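<p style="margin-bottom: 0in;">To make the 4096-value, 12-bit idea concrete, here is a minimal Python sketch of frequency-based dictionary encoding. It is an illustration only, not IBM&#8217;s algorithm: the function names, the tagged-tuple output format, and the sample values are all assumptions, and it ignores the length bias, substrings, and column groups discussed above.</p>

```python
from collections import Counter

CODE_BITS = 12
MAX_ENTRIES = 2 ** CODE_BITS  # 4096 dictionary slots fit in a 12-bit code


def build_dictionary(values, max_entries=MAX_ENTRIES):
    """Assign small integer codes to the most frequent values (hypothetical helper)."""
    counts = Counter(values)
    most_common = counts.most_common(max_entries)
    return {value: code for code, (value, _count) in enumerate(most_common)}


def encode(values, dictionary):
    """Replace dictionary hits with ("code", n); leave misses as ("raw", value)."""
    return [
        ("code", dictionary[v]) if v in dictionary else ("raw", v)
        for v in values
    ]


def decode(encoded, dictionary):
    """Invert encode() by looking codes up in the reversed dictionary."""
    by_code = {code: value for value, code in dictionary.items()}
    return [by_code[payload] if kind == "code" else payload
            for kind, payload in encoded]


# Illustrative data: values drawn from several columns of one table.
table_values = ["Boston", "MA", "02134", "Boston", "MA", "02135",
                "Acton", "MA", "01720"]
dictionary = build_dictionary(table_values)
encoded = encode(table_values, dictionary)
```

<p style="margin-bottom: 0in;">Because the dictionary in this sketch is built per table rather than per column, a value such as a state code gets a single entry no matter which column it appears in.</p>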
<p style="margin-bottom: 0in;"><strong>Netezza,</strong> unlike IBM, takes a grab-bag approach to compression – try out a bunch of techniques, see which work best, and incorporate those in the product. <a href="http://www.enzeecommunity.com/blogs/nzblog/2008/05/15/issue-19-the-compress-engine-the-netezza-philosophy">Netezza first introduced compression a couple of years ago,</a> for numeric columns only, especially integer.  Techniques used in Netezza numeric compression include but are not limited to:</p>
<ul>
<li>Delta compression, wherein you store the increment between a value and its predecessor rather than a whole new value.</li>
<li>Ways of indicating that a value or increment was just the same as in the row before.</li>
</ul>
<p style="margin-bottom: 0in;">This was via something called Compress Engine,* now being renamed to Compress Engine 1. Netezza&#8217;s new Compress Engine 2 improves on what Netezza did in Compress Engine 1 for numeric data, most notably by trimming away excess field length. (Netezza says it got 28% better compression on a test data set with almost no character strings, primarily from that enhancement.) Further, Netezza Compress Engine 2 adds new compression techniques, allowing it to handle VARCHAR (i.e., character strings) as well.</p>
<p style="margin-bottom: 0in;"><em>*Fortunately, the original name or at least description of “Compiled Tables” is retreating ever more from view.</em></p>
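<p style="margin-bottom: 0in;">The two numeric techniques listed above can be sketched in a few lines of Python. This is a generic illustration under my own assumptions (function names, the SAME marker, and the sample readings are hypothetical); it says nothing about how Netezza actually lays out bits on disk.</p>

```python
SAME = object()  # hypothetical marker meaning "same as the entry before"


def delta_encode(values):
    """Keep the first value, then store each value's increment over its predecessor."""
    if not values:
        return []
    deltas = [values[0]]
    for previous, current in zip(values, values[1:]):
        deltas.append(current - previous)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values = []
    total = 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values


def mark_repeats(items):
    """Replace any entry equal to the one before it with the SAME marker."""
    marked = list(items[:1])
    for previous, current in zip(items, items[1:]):
        marked.append(SAME if current == previous else current)
    return marked


def unmark_repeats(marked):
    """Expand SAME markers back into the repeated entry."""
    items = []
    for item in marked:
        items.append(items[-1] if item is SAME else item)
    return items


readings = [1000, 1001, 1002, 1003, 1010]
deltas = delta_encode(readings)   # small increments instead of full values
marked = mark_repeats(deltas)     # repeated increments collapse to SAME
```

<p style="margin-bottom: 0in;">The saving comes from the increments and markers needing far fewer bits than the full values would, which is why this works best on sorted or slowly changing columns.</p>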
<p style="margin-bottom: 0in;">Netezza&#8217;s Compress Engine 2 has two ways to compress character fields/text strings – <strong>prefix compression</strong> and <strong>Huffman coding</strong>. By way of contrast, Netezza tested suffix compression and decided it wasn&#8217;t beneficial enough to bother messing with.</p>
<ul>
<li>The idea behind prefix compression is that if two strings start with the same characters, for the second one you only have to record the part that&#8217;s different. Prefix compression has a lot of the same merits as delta compression; like delta compression, it works best on sorted columns. (An example of where prefix compression makes obvious sense is URLs, which tend to all start in similar ways.)</li>
<li>In Netezza&#8217;s version of Huffman coding, the alphabet is encoded symbol-by-symbol, with more common characters getting codes of shorter length. These codes are chosen on a column-by-column basis. (I presume the “/” character gets a shorter code in a URL column than it would, for example, in one that stored addresses.)</li>
</ul>
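<p style="margin-bottom: 0in;">As a rough illustration of the prefix compression idea described above, the following Python sketch stores each string in a sorted column as a pair: how many leading characters it shares with the previous string, plus the remaining suffix. The function names and the sample URLs are assumptions for the example, not anything Netezza has described.</p>

```python
import os  # os.path.commonprefix compares strings character by character


def prefix_encode(sorted_strings):
    """Encode each string as (shared_prefix_length, remaining_suffix)."""
    encoded = []
    previous = ""
    for s in sorted_strings:
        shared = len(os.path.commonprefix([previous, s]))
        encoded.append((shared, s[shared:]))
        previous = s
    return encoded


def prefix_decode(encoded):
    """Rebuild each string from the previous one plus the stored suffix."""
    strings = []
    previous = ""
    for shared, suffix in encoded:
        s = previous[:shared] + suffix
        strings.append(s)
        previous = s
    return strings


urls = sorted([
    "http://www.dbms2.com/2010/",
    "http://www.monash.com/",
    "http://www.dbms2.com/2009/",
])
encoded_urls = prefix_encode(urls)
```

<p style="margin-bottom: 0in;">Sorting first is what makes neighboring strings share long prefixes; on an unsorted column the shared lengths would mostly be short and little would be saved.</p>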
<p style="margin-bottom: 0in;">While I didn&#8217;t ask explicitly, it seems pretty obvious that Compress Engine 2&#8217;s functionality is a strict superset of Compress Engine 1&#8217;s. <a href="http://www.dbms2.com/2010/06/21/netezza-silicon-balance/">Netezza is going to run Compress Engines 1 and 2 side by side</a>, but expects pages to move from Compress Engine 1&#8217;s purview to Compress Engine 2&#8217;s as part of the new “table grooming” process.</p>
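<p style="margin-bottom: 0in;">For readers who want to see per-column Huffman coding in action, here is a textbook Python sketch. It builds one code table from the characters of a single column, so that common characters get shorter bit strings. This is the standard algorithm, not Netezza&#8217;s implementation; the function names and the sample column are assumptions.</p>

```python
import heapq
from collections import Counter


def huffman_codes(column_values):
    """Build a {character: bit string} table from one column's characters."""
    counts = Counter("".join(column_values))
    if not counts:
        return {}
    if len(counts) == 1:
        return {symbol: "0" for symbol in counts}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(weight, index, {symbol: ""})
            for index, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prepending one more bit to each side.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode_text(text, codes):
    """Concatenate the bit strings for each character."""
    return "".join(codes[ch] for ch in text)


def decode_text(bits, codes):
    """Walk the bits, emitting a character whenever a full code is matched."""
    by_code = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in by_code:
            out.append(by_code[current])
            current = ""
    return "".join(out)


url_column = ["http://a.com/x/y", "http://b.com/z/w"]
url_codes = huffman_codes(url_column)
```

<p style="margin-bottom: 0in;">In this sample column “/” is the most frequent character, so it receives one of the shortest codes. A column of street addresses would produce a different table, which is the point of choosing codes column by column.</p>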
<p><em><strong>Related links</strong></em></p>
<ul>
<li>IBM kindly permitted me to post some of <a href="http://www.monash.com/uploads/ibm-db2-compression-june-2010.pdf">its slides in the area of compression</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/cc280464.aspx">Microsoft SQL Server seems to rely on prefix and dictionary compression</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Notes on SciDB and scientific data management</title>
		<link>http://www.dbms2.com/2010/05/22/scidb-and-scientific-database-management/</link>
		<comments>http://www.dbms2.com/2010/05/22/scidb-and-scientific-database-management/#comments</comments>
		<pubDate>Sat, 22 May 2010 08:04:24 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[GIS and geospatial]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[SciDB]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[eBay]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2178</guid>
		<description><![CDATA[I firmly believe that, as a community, we should look for ways to support scientific data management and related analytics. That&#8217;s why, for example, I went to XLDB3 in Lyon, France at my own expense. Eight months ago, I wrote about issues in scientific data management. Here&#8217;s some of what has transpired since then. The [...]]]></description>
			<content:encoded><![CDATA[<p>I firmly believe that, as a community, we should look for ways to support scientific data management and related analytics. That&#8217;s why, for example, I went to XLDB3 in Lyon, France at my own expense. Eight months ago, I wrote about <a href="http://www.dbms2.com/2009/10/03/issues-in-scientific-data-management/">issues in scientific data management</a>. Here&#8217;s some of what has transpired since then.</p>
<p>The main new activity I know of has been in the open source <a href="http://www.scidb.org/">SciDB</a> project.   <span id="more-2178"></span></p>
<ul>
<li>A company called Zetics has been started to commercialize SciDB. As of now, the entire staff seems to be CEO Marilyn Matz, techie Paul Brown, and part of Mike Stonebraker. Marilyn says Zetics has some venture capital, but even under NDA didn&#8217;t tell me who it was from. Zetics does not have its own web site.</li>
<li>Marilyn tells me there are 20-25 contributors to SciDB, led by Paul Brown and Mike Stonebraker. Brown is full-time. Persistent Systems has been donating the efforts of a few of its employees. Some <a href="http://www.lsst.org/lsst">LSST</a> folks have been doing SciDB work backed by grant money. Most or all of the rest seem to be purer volunteers. Some Russians have been particularly active.</li>
<li>Release 0.5 of SciDB is expected in June. Release 1.0 is expected in September. This is a rewrite; prior demo code has been scrapped. Perhaps not coincidentally, it&#8217;s also a small slip from prior project plans.</li>
<li>The array data model is an example of what&#8217;s being implemented first. (Duh &#8212; you can&#8217;t have a DBMS without a data model.) Support for uncertainty is an example of what&#8217;s been deferred until later.</li>
<li>As has been clear since XLDB3 last August, one major target market for SciDB is genomic research.</li>
<li>It&#8217;s obvious that the oil and gas industry, with all its geospatial data, should be interested in SciDB. But there&#8217;s not much activity in that regard; outreach is evidently needed. If you can think of somebody in that sector (or anywhere else) who should be alerted to SciDB, please ping them.</li>
<li>Interest from web analytics users in SciDB seems to have receded a bit from the days when eBay almost funded the project.</li>
</ul>
<p>In other scientific data management news,</p>
<ul>
<li>Microsoft put out a book called <a href="http://research.microsoft.com/en-us/collaboration/fourthparadigm/">The Fourth Paradigm</a> on scientific database management. The whole thing can be downloaded, very officially, as a giant PDF. I think it&#8217;s worth skimming. I don&#8217;t think it&#8217;s worth actually reading. (I did read it.)</li>
<li><a href="http://www-conf.slac.stanford.edu/xldb/">XLDB4</a> will be at Stanford October 5-7. Unlike prior XLDBs, it will have an open (i.e., no invitation required) part.</li>
</ul>
<p>Finally, you surely are aware of the whole &#8220;Climategate&#8221; mess, in which major climate researchers&#8217; email was hacked and many unkind conclusions were drawn. Well, one of the most technical parts of the disclosure was in a long series of Read Me files, in which an unfortunate programmer lamented <a href="http://di2.nu/foia/HARRY_READ_ME-20.html">the difficulty of reconstructing published results from files at hand</a>. These turned out to illustrate a classic problem that SciDB or alternatives are meant to solve:</p>
<ul>
<li>Raw data was impossible to use without various adjustments to regularize it (the word &#8220;regridding&#8221; comes up a lot, for example). Massaging was needed before analytics could be done on it.</li>
<li>The raw data was thrown out or lost, and could not be reconstructed (why they couldn&#8217;t have asked the suppliers of the data to give it to them again was unclear in this case, since it wasn&#8217;t original experimental data).</li>
<li>It was thus impossible to massage the data in any new or improved way.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/22/scidb-and-scientific-database-management/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Intelligent Enterprise’s Editors’/Editor’s Choice list for 2010</title>
		<link>http://www.dbms2.com/2010/02/11/intelligent-enterprise-editors-choice-201/</link>
		<comments>http://www.dbms2.com/2010/02/11/intelligent-enterprise-editors-choice-201/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 23:13:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Intersystems and Cache']]></category>
		<category><![CDATA[Jaspersoft]]></category>
		<category><![CDATA[Kalido]]></category>
		<category><![CDATA[MarkLogic]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Pentaho]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Tableau Software]]></category>
		<category><![CDATA[Talend]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1578</guid>
		<description><![CDATA[As he has before, Intelligent Enterprise Editor Doug Henschen Personally selected annual lists of 12 &#8220;Most influential&#8221; companies and 36 &#8220;Companies to watch&#8221; in analytics- and database-related sectors. Made it clear that these are his personal selections. Nonetheless has called it an Editors&#8217; Choice list, rather than Editor&#8217;s Choice. (Actually, he&#8217;s really called it an [...]]]></description>
			<content:encoded><![CDATA[<p>As he has <a href="http://www.dbms2.com/2009/01/12/intelligent-enterprises-editorseditors-choice-list/">before</a>, <em>Intelligent Enterprise</em> Editor Doug Henschen</p>
<ul>
<li>Personally selected <a href="http://intelligent-enterprise.informationweek.com/showArticle.jhtml;jsessionid=IANLOXCT2244BQE1GHPCKH4ATMY32JVN?articleID=222900034&amp;pgno=1">annual lists</a> of 12 &#8220;Most influential&#8221; companies and 36 &#8220;Companies to watch&#8221; in analytics- and database-related sectors.</li>
<li>Made it clear that these are his personal selections.</li>
<li>Nonetheless has called it an Editors&#8217; Choice list, rather than Editor&#8217;s Choice. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ul>
<p>(Actually, he&#8217;s really called it an &#8220;award.&#8221;)</p>
<p><span id="more-1578"></span>People advising Doug &#8212; who come to think of it actually are Contributing Editors to <em>Intelligent Enterprise</em> or something like that &#8212; included Cindi Howson, Seth Grimes, three others, and me.</p>
<p>And if past is prologue, I will now get a flood of PR emails calling my attention to this award that I already have both participated in and blogged about. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>As usual, the sense:nonsense ratio on these lists was pleasingly high. Analytic DBMS vendors cited included IBM, Microsoft, Netezza, Oracle, Sybase, and Teradata in the &#8220;Most influential&#8221; group, with Aster, Greenplum, HP, Infobright, and Vertica among the &#8220;To watch&#8221; crowd. It&#8217;s tough to argue with those selections, whose most questionable element is probably the not-ridiculous supposition that HP could do something interesting over the coming year. Cloudera and Intersystems also made the list, deservedly.</p>
<p>All three of QlikTech, Tableau, and TIBCO made the list, which is appropriate given the potential for and interest in interactive data exploration technology. The BI majors, independent or otherwise, were all on as well. In text mining, Doug included Attensity and Clarabridge, which I think is exactly right. (Plus OpenCalais.) Upon reflection, I probably should have nominated Mark Logic, even though most of its business is non-enterprise; but hey, nobody&#8217;s perfect, and the same goes for lists. Open source was well represented, with Apache, Actuate, Jaspersoft, Eclipse, Infobright, Nuxeo and R all being cited (but not Ingres or Pentaho). Kalido made the list, with my endorsement, their silly I-CASE-like marketing messaging notwithstanding.</p>
<p>Speaking of imperfections &#8212; there are only a few category names, and so category assignments can be pretty bizarre. (In an ideal world, middleware wouldn&#8217;t be included under &#8220;enterprise applications&#8221;.) Greenplum hasn&#8217;t really &#8220;extended&#8221; its DBMS with a &#8220;cloud&#8221; option. As much as I&#8217;d like Netezza to be more influential than SAP, that&#8217;s probably not the best way to rank them. And there are a number of &#8220;This company is on a roll!&#8221; kinds of comments that I wouldn&#8217;t necessarily endorse.</p>
<p>But those are all nitpicks. On the whole, it&#8217;s another nice job.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/11/intelligent-enterprise-editors-choice-201/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Facts and rumors</title>
		<link>http://www.dbms2.com/2009/09/30/facts-and-rumors/</link>
		<comments>http://www.dbms2.com/2009/09/30/facts-and-rumors/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 06:21:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[DATAllegro]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Telecommunications]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=963</guid>
		<description><![CDATA[Vertica is putting out a press release today touting its 100th customer, and talking of triple digit growth last year. Multiple sources have told me that the DATAllegro system is being thrown out of Dell, so evidently Dell is telling this to one and all. If that goes through, this would presumably leave TEOCO as [...]]]></description>
			<content:encoded><![CDATA[<ul>
<li>Vertica is putting out a press 	release today touting its 100th customer, and talking of triple 	digit growth last year.</li>
<li>Multiple sources have told me that 	the DATAllegro system is being thrown out of <a href="http://www.dbms2.com/2009/03/02/closing-the-book-on-the-datallegro-customer-base/">Dell</a>, so evidently Dell is telling this to one and all. If that goes 	through, this would presumably leave <a href="http://www.dbms2.com/2008/05/23/data-warehouse-appliance-power-user-teoco/">TEOCO</a> as DATAllegro&#8217;s single happy 	customer. (I haven&#8217;t checked with Microsoft for its view.)</li>
<li>A rumor has it that InfiniBand technology vendor Voltaire, Ltd. privately claims triple-digit sales of switches for Exadata 1 (I think that would mean one switch per Exadata installation, not per rack). Based just on a quick glance, this is far from confirmed by Voltaire&#8217;s earnings <a href="http://seekingalpha.com/article/135775-voltaire-ltd-q1-2009-earnings-call-transcript">conference call</a> <a href="http://seekingalpha.com/article/152278-voltaire-q2-2009-earnings-transcript">transcripts</a> or <a href="http://sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0001401678&amp;owner=exclude&amp;count=40">SEC filings</a>. However, the most recent transcript does seem to indicate Voltaire got multiple Exadata deals in the telecommunications sector, and suggests some Exadata penetration in other sectors as well.</li>
<li>I was told of a 	classified-agency user that has &gt;1 petabyte of data on Exadata 1 	and 600 terabytes or so on Netezza. My not-obviously-biased source says 	the agency is distinctly happier with Netezza than Exadata.</li>
<li>Like <a href="http://paraccel.com/data_warehouse_blog/?p=104">ParAccel</a>, 	<a href="http://www.theregister.co.uk/2009/09/29/tpc_slaps_oracle/">Oracle 	just got dinged for TPC-related misbehavior</a>.</li>
<li>Rumor has it that Sun has no 	intention of helping ParAccel rerun its withdrawn TPC-H benchmark.</li>
<li>ParAccel has withdrawn the claim 	from its home page to be the &#8220;CERTIFIED&#8221; price-performance 	leader. This seems to confirm that the claim was a reference to the 	TPC-H. In my opinion, that was a gross misrepresentation of what the 	TPC-H shows.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/30/facts-and-rumors/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Xkoto Gridscale highlights</title>
		<link>http://www.dbms2.com/2009/09/11/xkoto-gridscale-highlights/</link>
		<comments>http://www.dbms2.com/2009/09/11/xkoto-gridscale-highlights/#comments</comments>
		<pubDate>Fri, 11 Sep 2009 18:36:03 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Xkoto]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=881</guid>
		<description><![CDATA[I talked yesterday with cofounders Albert Lee and Ariff Kassam of Xkoto. Highlights included: Xkoto sells Gridscale, a clustering server for DB2 and, more recently, MS SQL Server. Xkoto Gridscale runs on a separate box, between the application and the database servers. This box is typically smaller and cheaper than the database server boxes. Xkoto [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I talked yesterday with cofounders Albert Lee and Ariff Kassam of Xkoto. Highlights included:<span id="more-881"></span></p>
<ul>
<li>Xkoto sells Gridscale, a 	clustering server for DB2 and, more recently, MS SQL Server.</li>
<li>Xkoto Gridscale runs on a separate 	box, between the application and the database servers. This box is 	typically smaller and cheaper than the database server boxes.</li>
<li>Xkoto most typically sells 	Gridscale into environments where there already are three database 	servers &#8212; one to do work, one for hot standby, and one for remote 	disaster recovery.</li>
<li>In such environments, Gridscale&#8217;s 	big benefit is that you can distribute the query workload among all 	three servers. Xkoto believes this big performance increase is the 	reason customers don&#8217;t get much past 3 database servers under Xkoto 	(they didn&#8217;t seem quite sure as to whether the all-time record was 4 	or 5).  Note that even if a remote server is a little too far away 	for OLTP query response, it can work fine for reporting.</li>
<li>Of course, if you don&#8217;t already have high/&#8220;continuous&#8221; availability and/or disaster recovery, then Xkoto would say those are core benefits of Gridscale as well.</li>
<li>Gridscale sends transactions (or just SQL statements?) to all servers in the cluster. Once any of them responds affirmatively, that update is reflected in queries. Gridscale maintains a small query log to make sure it gets the other database copies in sync. It also tries to make sure that queries always go to the most current copy of the database. (I didn&#8217;t ask what happens if Server A executes Transaction T but not U, while Server B executes Transaction U and not T &#8212; but that does seem like something of an edge case.)</li>
<li>Xkoto spun out of <a href="http://www.halcyoninc.com/">Halcyon 	Monitoring</a> in 2006, starting with DB2 support. Microsoft SQL 	Server support was introduced in 2008.</li>
<li>Xkoto likes its partnerships with 	IBM and Microsoft. For example, IBM provides Level 1 and 2 support 	for Gridscale itself. Due in large part to this partnership 	strategy, Xkoto says it has no plans to support DBMS beyond DB2 and 	SQL Server.</li>
<li>Instead, Xkoto is pursuing partnerships with large application vendors and so on. (The figure &#8220;about 10&#8221; was mentioned.) I gather the idea is to make sure that neither the application support folks nor the app itself freaks out from the fact that the app isn&#8217;t exactly talking to the DBMS any more.</li>
<li>Xkoto has done lab tests 	suggesting Gridscale offers near-linear scalability (in terms of SQL 	Server database throughput) on a query-only workload up to 10 	servers.</li>
<li>I gather that Xkoto and IBM have 	demos suggesting it&#8217;s a fine idea to have your disaster recovery 	server be in the Amazon cloud, but they haven&#8217;t yet made any sales 	based on that &#8212; er, based on that <em>premise.</em> <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </li>
<li>Gridscale pricing is measured in the same metrics as DB2 or SQL Server pricing, and in each case is around 1/3 what database pricing would be on the same box (I&#8217;m guessing that&#8217;s for enterprise editions without add-ons, but I didn&#8217;t probe). Specifically, Gridscale charges $12K per 100 PVUs for the DB2 edition, and $12K per socket for running with Microsoft SQL Server.</li>
<li>Gridscale typically runs on 	smaller boxes than the databases it talks to.</li>
<li>Xkoto has about 35 	revenue-recognized customers. Most are on DB2, the first environment 	Gridscale supported.</li>
<li>Average Gridscale selling prices 	are $180K on DB2, and $40-50K in the early going for SQL Server.</li>
<li>Xkoto has about 40 full-time 	employees, with engineering in Toronto and business operations in 	Waltham.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/11/xkoto-gridscale-highlights/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
	</channel>
</rss>

