<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Benchmarks and POCs</title>
	<atom:link href="http://www.dbms2.com/category/buying-processes/benchmarks-and-pocs/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:21:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Exasol update</title>
		<link>http://www.dbms2.com/2011/11/12/exasol-update/</link>
		<comments>http://www.dbms2.com/2011/11/12/exasol-update/#comments</comments>
		<pubDate>Sun, 13 Nov 2011 02:37:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exasol]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5661</guid>
		<description><![CDATA[I last wrote about Exasol in 2008. After talking with the team Friday, I&#8217;m fixing that now. The general theme was as you&#8217;d expect: Since last we talked, Exasol has added some new management, put some effort into sales and marketing, got some customers, kept enhancing the product and so on. Top-level points included: Exasol&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p><a href="../../../../../2008/08/16/exasol-technical-briefing/">I last wrote about Exasol in 2008</a>. After talking with the team Friday, I&#8217;m fixing that now. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  The general theme was as you&#8217;d expect: Since last we talked, Exasol has added some new management, put some effort into sales and marketing, got some customers, kept enhancing the product and so on.</p>
<p>Top-level points included:</p>
<ul>
<li>Exasol&#8217;s technical philosophy is substantially the same as before, albeit not with as extreme a focus on fitting everything in RAM.</li>
<li>Exasol believes its flagship DBMS EXASolution has great performance on a load-and-go basis.</li>
<li>Exasol has 25 EXASolution customers, all in Germany.*</li>
<li>5 of those are &#8220;cloud&#8221; customers, at hosting providers engaged by Exasol.</li>
<li>EXASolution database sizes now range from the low 100s of gigabytes up to 30 terabytes.</li>
<li>Pretty much the whole company is in Nuremberg.</li>
</ul>
<p><span id="more-5661"></span><em>*That excludes some money from Hitachi. Exasol&#8217;s Hitachi partnership is still in limbo, an apparent casualty of the world economic crisis.</em></p>
<p>On the technical side:</p>
<ul>
<li>As noted in my 2008 post, EXASolution is a columnar, no-head-node MPP (Massively Parallel Processing) DBMS.</li>
<li>The main way EXASolution compresses data is via dictionary/tokenization. 5:1 is a typical compression ratio before mirroring and so on, out of a 2-10:1 range.</li>
<li>EXASolution writes data to blocks in memory that are smaller than what is otherwise its preferred size (1/2 to 5 megabytes). These are sent to disk, where they are eventually merged. Exasol insists that write performance has been fully satisfactory to every customer to date.</li>
<li>EXASolution doesn&#8217;t have much in the way of performance tuning knobs. Exasol says they aren&#8217;t needed, and says that one really can start an EXASolution POC (Proof of Concept) in a day or so.</li>
<li>EXASolution doesn&#8217;t have much in the way of workload management capabilities, except what&#8217;s automagic (e.g., short query bias). However, it does collect statistics you can query via your favorite BI tool.</li>
<li>EXASolution doesn&#8217;t have much in the way of <a href="../../../../../2011/02/24/analytic-platforms/">analytic platform</a> capabilities, although there is some Lua-based scripting. However, there&#8217;s something NDA in the analytic platform area Coming Soon.*</li>
</ul>
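<p>To make the dictionary/tokenization point concrete, here is a minimal Python sketch of dictionary-encoding a single column. It is an illustration only, not Exasol&#8217;s actual implementation; the helper names (<code>dictionary_encode</code>, <code>dictionary_decode</code>) and the sample column are invented for the example.</p>

```python
from typing import Dict, List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with a small integer token (hypothetical helper)."""
    dictionary: List[str] = []   # token -> original value
    index: Dict[str, int] = {}   # original value -> token
    tokens: List[int] = []
    for value in column:
        if value not in index:
            # First time this value is seen: assign the next free token.
            index[value] = len(dictionary)
            dictionary.append(value)
        tokens.append(index[value])
    return dictionary, tokens


def dictionary_decode(dictionary: List[str], tokens: List[int]) -> List[str]:
    """Rebuild the original column from the dictionary and the token list."""
    return [dictionary[t] for t in tokens]


# Made-up sample column with only two distinct values.
column = ["Germany", "Germany", "Austria", "Germany", "Austria"]
dictionary, tokens = dictionary_encode(column)
```

<p>Each distinct string is stored once and every row holds only a small token, so the savings depend on how often values repeat. That is consistent with a compression range as wide as the 2-10:1 quoted above.</p>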
<p>In general, the whole thing sounds somewhat like ParAccel, at least at a high level.</p>
<p><em>*Exasol is not and never has been our client, but we can keep secrets for them even so.</em></p>
<p>Naturally, Exasol believes EXASolution has fine concurrency, with at least one customer routinely running 2000 concurrent users, 200 concurrent sessions (via connection pooling), and 5-10 concurrent queries. Another customer has 3500 Cognos users. The record peak load appears to be 100-200 concurrent queries. Anyhow, Exasol says that plans to offer real workload management could be accelerated if a need were discovered.</p>
<p>Exasol says it almost never loses POCs, but admits that it competes fairly rarely against Vertica and ParAccel, no doubt for reasons of geography. Exasol boasts one visible Sybase IQ replacement (Sony Music).</p>
<p>While Exasol&#8217;s sales to date have been in Germany, there are plans to change that soon. At least one sales cycle is well underway in Eastern Europe. Offices in other Germanic countries are planned. Existing customers are planning to deploy additional copies outside Germany. Discussions are underway regarding other geographies, e.g. English-speaking ones.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/12/exasol-update/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Eight kinds of analytic database (Part 1)</title>
		<link>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/</link>
		<comments>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 08:17:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Buying processes]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4868</guid>
		<description><![CDATA[Analytic data management technology has blossomed, leading to many questions along the lines of &#8220;So which products should I use for which category of problem?&#8221; The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for &#8220;big data&#8221; is little help. Let&#8217;s try eight categories instead. While no categorization [...]]]></description>
			<content:encoded><![CDATA[<p>Analytic data management technology has blossomed, leading to many questions along the lines of &#8220;So which products should I use for which category of problem?&#8221; The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for &#8220;big data&#8221; is little help.</p>
<p>Let&#8217;s try eight categories instead. While <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">no categorization is ever perfect</a>, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need &#8212; and in most cases you&#8217;ll need several &#8212; is a great early step in your analytic technology planning.  <span id="more-4868"></span></p>
<p><strong><em>Enterprise data warehouse</em></strong> (Full or partial)</p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, but especially operational</li>
<li><em>Likely use styles:</em> All</li>
<li><em>Canonical example:</em> Central EDW for a big enterprise</li>
<li><em>Stresses:</em> Concurrency, reliability, workload management</li>
</ul>
<p>The enterprise data warehouse (EDW) ideal says that you copy all your data into one place, and drive all decision-making from there. <a href="../../../../../2011/06/21/its-official-the-grand-central-edw-will-never-happen/">Full EDWs are pipe dreams</a>. Still, a partial EDW makes sense for most large enterprises, and many indeed already have one. The first product lines to consider for classical EDWs are Teradata, DB2, Exadata, and maybe Microsoft SQL Server, especially if you&#8217;re going to stress concurrency and/or operational use cases.</p>
<p><strong><em>Traditional data mart</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All</li>
<li><em>Likely use styles:</em> Business intelligence, budgeting/consolidation, investigative</li>
<li><em>Examples:</em> Reporting servers, planning/consolidation servers, anything MOLAP, etc.</li>
<li><em>Stresses:</em> Performance, concurrency, TCO</li>
</ul>
<p>Whether or not you have something like an enterprise data warehouse, it&#8217;s common to have lighter-weight data marts as well. A traditional data mart might drive reports and dashboards. Or it might be specialized for budgeting, planning, and/or consolidation.  Some <a href="../../../../../2011/03/03/investigative-analytics/">investigative analytics</a> may be in the mix as well.</p>
<p>Any DBMS that can support an EDW can also support a data mart, but it may not be the most cost-effective way to do so. Columnar DBMS might have more attractive performance and TCO (Total Cost of Ownership); the same goes for Netezza. Some of them &#8212; e.g. Sybase IQ and <a href="../../../../../2011/06/20/vertica-release-5/">Vertica</a> &#8212; have excellent track records in concurrent usage as well. <a href="../../../../../2011/05/29/when-to-use-relational-database-management-system/">Ted Codd</a> pushed what amounts to MOLAP (Multidimensional OnLine Analytic Processing) systems for these use cases. But relational DBMS commonly do a better job, which is one reason most major MOLAP products have wound up at RDBMS companies.</p>
<p><strong><em>Investigative data mart &#8212; agile</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, especially customer-centric</li>
<li><em>Likely use styles</em>: Investigative</li>
<li><em>Canonical example:</em> A few analysts getting a few TB to examine</li>
<li><em>Stresses:</em> Ease of setup/load, ease of admin, price/performance</li>
</ul>
<p>Besides the traditional data mart, there are at least two other kinds. Both are focused on investigative analytics, but they&#8217;re differentiated by database size.</p>
<p>If you have just a few analysts,* looking at no more than a few terabytes of data (perhaps even just some gigabytes) &#8212; and if that data is &#8220;single-subject&#8221; and fairly homogeneous &#8212; your watchwords should be &#8220;cheap&#8221;, &#8220;easy&#8221;, and &#8220;fast&#8221;. You don&#8217;t need to invest in much hardware, expensive software, or much administrative effort (the analysts can be their own DBAs), nor should you endure much set-up time. Just grab a product, grab some data, and start running queries (or extracts into the statistical tool of your choice).</p>
<p><em>*If you have dozens or even hundreds of analysts hitting the same database, you&#8217;re probably back to the more concurrency-oriented scenarios outlined above.</em></p>
<p>Infobright is often cost-effective among columnar analytic DBMS. Other vendors might cut you a price break as well. If you have multiple terabytes of data, don&#8217;t rule out Netezza&#8217;s lowest-end products (even if they&#8217;d really rather sell you something bigger). Or, if you&#8217;re in the sub-terabyte range, maybe you can get by with an in-memory BI tool such as QlikView, and not do anything special on the DBMS side at all.</p>
<p><strong><em>Investigative data mart &#8212; big</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, especially customer-centric, logs, financial trade, scientific</li>
<li><em>Likely use styles</em>: Investigative</li>
<li><em>Canonical example:</em> Single-subject 20 TB &#8211; 20 PB relational database</li>
<li><em>Stresses:</em> Performance, scale-out, analytic functionality</li>
</ul>
<p>But if you&#8217;re looking at tens of terabytes of relational data, or even more, you really do have a &#8220;big data&#8221; problem. Performance and scalability are major challenges, usually best addressed by MPP (Massively Parallel Processing) systems, such as Netezza, Vertica, Aster Data, ParAccel, Teradata, or Greenplum. Performance POCs (Proofs Of Concept) are a big part of the buying process. Vendor price negotiations are crucial too.</p>
<p><em>Actually, in the low tens of terabytes you might be able to get away with a shared-disk system that has excellent compression &#8212; e.g., columnar products like Sybase IQ, Infobright, or SAND, rather than just Vertica and ParAccel.</em></p>
<p>Assuming you have affordable, scalable query performance, the competitive differentiator can switch to additional analytic functionality. Aster, Netezza, ParAccel, Vertica, and Greenplum either offer full <a href="../../../../../2011/02/24/analytic-platforms/">analytic platforms</a>, or seem to be on the path to doing so. Teradata, which now owns Aster Data, offers substantial built-in analytic capability in its traditional products as well, and the same goes for Sybase IQ.</p>
<p><em>Continued in <a href="http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/">Part 2</a>, where we cover some of the more difficult use cases.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Vertica story (with soundbites!)</title>
		<link>http://www.dbms2.com/2011/06/20/vertica-release-5/</link>
		<comments>http://www.dbms2.com/2011/06/20/vertica-release-5/#comments</comments>
		<pubDate>Mon, 20 Jun 2011 06:14:56 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4777</guid>
		<description><![CDATA[I&#8217;ve blogged separately that: Vertica has a bunch of customers, including seven with 1 or more petabytes of data each. Vertica has progressed down the analytic platform path, with Monday&#8217;s release of Vertica 5.0. And of course you know: Vertica (the product) is columnar, MPP, and fast.* Vertica (the company) was recently acquired by HP.** [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve blogged separately that:</p>
<ul>
<li><a href="../../../../../2011/06/20/columnar-dbms-vendor-customer-metrics/">Vertica has a bunch of customers</a>, including <strong>seven with 1 or more petabytes of data each.</strong></li>
<li><a href="http://www.dbms2.com/2011/06/20/vertica-as-an-analytic-platform/">Vertica has progressed down the analytic platform path</a>, with Monday&#8217;s release of Vertica 5.0.</li>
</ul>
<p>And of course you know:</p>
<ul>
<li>Vertica (the product) is columnar, MPP, and fast.*</li>
<li>Vertica (the company) was recently acquired by HP.**</li>
</ul>
<p><span id="more-4777"></span><em>*Similar things seem true of ParAccel, but most of the other serious columnar analytic DBMS aren&#8217;t actually MPP (Massively Parallel Processing) yet. More precisely, they have shared-everything architectures, especially on the storage level.</em></p>
<p><em>** Vertica says it has a &#8220;staggering&#8221; pipeline now that it&#8217;s been with HP for a few months. I also gather that the post-merger HP/Vertica appliance product line formally rolled out last week.</em></p>
<p>As for product maturity:</p>
<ul>
<li><a href="../../../../../2010/02/22/vertica-4/">Vertica 4.0</a> cleaned up a lot of stuff.</li>
<li>Vertica 5.0 goes further in a variety of areas, notably clustering administration and database tuning/design.</li>
</ul>
<p>But here&#8217;s something I hadn&#8217;t fully realized &#8212; <strong>Vertica claims concurrent usage as a competitive strength</strong>. By this I mean:</p>
<ul>
<li>Vertica says that it has some customers with 1000s of users, in BI/dashboarding kinds of applications.</li>
<li>Vertica asserts it can support 1000 users on a single appliance rack.</li>
<li>Vertica tries to drive POCs (Proofs Of Concept) towards testing concurrency.</li>
</ul>
<p>This is all consistent with <a href="../../../../../2010/04/16/story-of-an-analytic-dbms-evaluation/">a user example I blogged about last year</a>.</p>
<p>That said, while Vertica introduced respectable workload management features in Vertica 4.0, its main claim to concurrency is simply speed &#8212; if each query ends quickly, you never have to execute all that many of them at once.</p>
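<p>One way to put numbers on that is Little&#8217;s Law: the average number of queries in flight equals the arrival rate times the average query duration. The Python sketch below uses made-up figures (1000 users, one query per user per minute); none of them come from Vertica.</p>

```python
def queries_in_flight(arrival_rate_per_sec: float, avg_duration_sec: float) -> float:
    """Little's Law: L = lambda * W, the average concurrency in steady state."""
    return arrival_rate_per_sec * avg_duration_sec


# Hypothetical workload: 1000 users, each issuing one query per minute.
users = 1000
arrival_rate = users / 60.0  # queries per second

fast = queries_in_flight(arrival_rate, 0.5)   # half-second queries
slow = queries_in_flight(arrival_rate, 30.0)  # thirty-second queries
```

<p>Under these assumptions, half-second queries mean only about 8 are running at any moment, while thirty-second queries mean about 500. The user population is the same in both cases, but the concurrency problem is very different.</p>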
<p>Anyhow, there will (or at least should) be articles written about Vertica 5.0, and I may not be that easy to find for comment, what with <a href="../../../../../2011/06/19/investigative-analytics-derived-data/">Enzee Universe</a> and all. So here are a few <strong>Vertica soundbites:</strong></p>
<ul>
<li>Having seven petabyte-level commercial users is an impressive testament to Vertica&#8217;s scalability. I think only Teradata could best that number among analytic DBMS, unless you want to count Hadoop/Hive.</li>
<li>Vertica&#8217;s analytic platform capabilities are new, and initially not as rich as <a href="../../../../../2010/02/22/aster-data-ncluster-4-5/">Aster Data&#8217;s</a> or <a href="../../../../../2011/04/17/netezza-twinfin-i-class-overview/">Netezza&#8217;s</a>, especially in the area of language support. But they&#8217;re a good first step.</li>
<li>Judging by the examples of EMC/Greenplum and IBM/Netezza, Vertica&#8217;s honeymoon period at HP is likely to last for a while. <em>(Edit: That said, not all is peachy at <a href="http://www.dbms2.com/2011/04/16/unpacking-the-emc-greenplum-q1-sales-disaster-rumors/">EMC/Greenplum</a>.)</em></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/06/20/vertica-release-5/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Comments on the Gartner 2010/2011 Data Warehouse Database Management Systems Magic Quadrant</title>
		<link>http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/</link>
		<comments>http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/#comments</comments>
		<pubDate>Sat, 05 Feb 2011 15:49:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[1010data]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Workload management]]></category>
		<category><![CDATA[illuminate Solutions]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3744</guid>
		<description><![CDATA[Edit: Comments on the February, 2012 Gartner Magic Quadrant for Data Warehouse Database Management Systems &#8212; and on the companies reviewed in it &#8212; are now up. The Gartner 2010 Data Warehouse Database Management Systems Magic Quadrant is out. I shall now comment, just as I did to varying degrees on the 2009, 2008, 2007, [...]]]></description>
			<content:encoded><![CDATA[<p><em>Edit: Comments on the February, 2012 <a href="http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/">Gartner Magic Quadrant for Data Warehouse Database Management Systems</a> &#8212; and on the companies reviewed in it &#8212; are now up.</em></p>
<p>The <a href="http://www.gartner.com/technology/media-products/reprints/teradata/vol3/article1/article1.html">Gartner 2010 Data Warehouse Database Management Systems Magic Quadrant</a> is out. I shall now comment, just as I did to varying degrees on the <a href="../../../../../2010/02/10/gartner-magic-quadrant-data-warehouse-2009-2010/">2009</a>, <a href="../../../../../2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">2008</a>, <a href="../../../../../2007/10/19/gartner-2007-magic-quadrant-for-data-warehouse-database-management-systems/">2007</a>, and <a href="../../../../../2006/10/03/vendor-segmentation-for-data-warehouse-dbms/">2006</a> Gartner Data Warehouse Database Management System Magic Quadrants.</p>
<p><em>Note: Links to Gartner Magic Quadrants tend to be unstable. Please alert me if any problems arise; I&#8217;ll edit accordingly.</em></p>
<p>In <a href="../../../../../2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">my comments on the 2008 Gartner Data Warehouse Database Management Systems Magic Quadrant</a>, I observed that <strong>Gartner&#8217;s &#8220;completeness of vision&#8221; scores were generally pretty reasonable,</strong> but their <strong>&#8220;ability to execute&#8221; rankings were somewhat bizarre;</strong> the same remains true this year. For example, Gartner ranks Ingres higher by that metric than Vertica, Aster Data, ParAccel, or Infobright. Yet each of those companies is growing nicely and delivering products that meet serious cutting-edge analytic DBMS needs, neither of which has been true of Ingres since about 1987. <span id="more-3744"></span></p>
<p>The general list of &#8220;market forces, end-user expectations and vendors&#8217; resulting solution approaches&#8221; at the top of the 2010 Gartner Data Warehouse Database Management System Magic Quadrant article is a mixed bag. Following Gartner&#8217;s order, I&#8217;ll address those first, and particular companies cited afterwards. Specific items and comments include:</p>
<ul>
<li><strong>&#8220;Increased demand for optimization techniques and performance enhancement.&#8221;</strong> Gartner seems to be saying that data warehouse DBMS buyers want lists of specific, esoteric performance features. Well, buyers always want their DBMS to run fast, and they&#8217;d like the products to be mature enough to have been through a few rounds of <a href="../../../../../2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a>, but otherwise I&#8217;m not sure I&#8217;d put that at the top of my list.</li>
<li><strong>&#8220;The argument made by purchasing departments that buying power increases when dealing with a single, incumbent vendor.&#8221;</strong> I agree that <a href="../../../../../2011/02/02/exadata-notes/">vendor consolidation and account control</a> are a huge part of the Oracle, Microsoft, IBM and even Teradata stories. (Vertica can prove it&#8217;s 10X more price-performant than Oracle and still not get the business.) But it&#8217;s not just about price negotiations; once annual maintenance is included, one has to squint pretty hard to see Oracle as a low-cost alternative. Also important is reducing the number of total product-specific skill-sets needed on the IT staff.</li>
<li><strong>&#8220;Prepackaged, prebalanced warehouse environments delivered using data warehouse appliances.&#8221;</strong> Yep. To varying extents, Oracle, Microsoft, Teradata, and IBM are all committed to designed-hardware strategies.</li>
<li><strong>&#8220;Expectations for the delivery of on-site POCs.&#8221;</strong> Honestly, not as many buyers insist on on-site Proofs of Concept as should. Still, Oracle is shameful in its reluctance to do them. (Teradata tries to avoid them too, for obvious reasons of expense, but is much more gracious about capitulating when the buyer insists.)</li>
<li><strong>&#8220;Cost controls and data warehouse performance management.&#8221;</strong> See next comment.</li>
<li><strong>&#8220;Demands for delivering a fully mixed workload.&#8221;</strong> I&#8217;d have phrased the workload management and administrative tools points rather differently than this, but so be it.</li>
<li><strong>&#8220;Demands for departmental analytics delivered quickly via data marts.&#8221;</strong> Agreed. Data-mart-only installations are a huge part of the analytic DBMS market. <a href="../../../../../2009/06/08/the-future-of-data-marts/">Data mart spin-out</a> is also important.</li>
<li><strong>&#8220;Wider indexing and fast performance within clusters of data, delivered via column-based solutions.&#8221;</strong> This bizarrely seems to conflate column stores and parallel processing (both of which are of course highly important).</li>
<li><strong>&#8220;A wave of new data warehouse implementers seeking fast-track, low-risk delivery.&#8221;</strong> Well, yes. Netezza noticed that quite some years ago. And by now the <a href="../../../../../2010/04/12/enterprise-data-warehouse-edw-myt/">long-gestation EDW (Enterprise Data Warehouse)</a> is widely disliked.</li>
<li><strong>&#8220;Global organizations seeking distributed solutions as potential architecture.&#8221;</strong> If this is the MPP point, it&#8217;s oddly phrased. If this is a suggestion that data warehouses should be partitioned across wide-area networks, it&#8217;s just plain odd. If it&#8217;s a reiteration that departments like to control their own data marts, I agree. And if it&#8217;s a comment on keep-data-in-the-country privacy laws, it could be the most prescient thing Donald Feinberg has said in many years.</li>
</ul>
<p>Long though it is, that list of general items and issues for the 2010 Gartner Data Warehouse Database Management System Magic Quadrant has some gaps. Most glaringly, I don&#8217;t see any references to <a href="../../../../../2011/01/24/analytic-computing-system/">advanced analytics</a> in general, or even to the specific case of <a href="../../../../../2010/05/15/further-clarifying-in-database-mpp-sas/">integrated predictive analytics</a>. There&#8217;s also nothing about solid-state memory or other storage-technology considerations, although in fairness it&#8217;s still early days for much of what vendors conceive of as competitive differentiation in those respects.</p>
<p>Here are some vendor-specific comments on the 2010 Gartner Data Warehouse Database Management System Magic Quadrant:</p>
<ul>
<li>It&#8217;s pretty bizarre to compare <strong>1010data</strong> to database.com or Microsoft Azure. Kognitio would be a better choice. So would cloud-hosted instances of Vertica, Aster Data nCluster, or others.</li>
<li>Gartner&#8217;s comments on <strong>Aster Data</strong> and nCluster are actually pretty reasonable.</li>
<li>Gartner&#8217;s comments on <strong>EMC/Greenplum</strong> are a bit Kool-Aid-drinky, and don&#8217;t account for the inevitable flailing that occurs right after an acquisition. But otherwise they&#8217;re pretty reasonable.</li>
<li>I don&#8217;t take <strong>IBM&#8217;s</strong> super-comprehensive-all-inclusive architectural stories as seriously as Gartner does.</li>
<li>I don&#8217;t take <strong>Netezza&#8217;s</strong> small stable of OEM partners as seriously as Gartner does. I also don&#8217;t share Gartner&#8217;s optimism for the continuation of Netezza&#8217;s NEC partnership in the face of IBM&#8217;s Netezza ownership.</li>
<li>I&#8217;m even more skeptical about <a href="../../../../../2008/03/27/the-illuminate-guys-have-a-cto-blog/">illuminate</a> than Gartner is.</li>
<li>I&#8217;m delighted that Gartner has adopted my phrase <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a> (<strong>Infobright</strong> is one of several firms pushing that one).</li>
<li>&#8220;Only open-source column-store DBMS&#8221; is a bit exaggerated, but Infobright is indeed the only one with serious traction, or offered by a serious analytic DBMS vendor.</li>
<li>What Gartner said in connection with <strong>Ingres</strong> is too inaccurate to deserve detailed attention.</li>
<li>While Gartner&#8217;s write-up of <strong>Kognitio</strong> is a bit confused, that&#8217;s excusable. Kognitio&#8217;s strategy changes often.</li>
<li>I&#8217;m not persuaded by the claim of low <strong>Microsoft</strong> TCO. The days when Microsoft&#8217;s tools were vastly better than the competition&#8217;s are long gone. And using an OLTP DBMS for data warehousing generally takes more people effort than using something more purpose-built.</li>
<li>Gartner is right to ding <strong>Oracle</strong> for high prices, high people costs, and unwillingness to do onsite POCs.</li>
<li>Gartner is right that <strong>Exadata</strong> is a huge improvement over non-Exadata Oracle data warehousing.</li>
<li>Gartner is right to suggest that Exadata can easily handle data warehouses over 20 terabytes in size, but wrong to suggest that software-only Oracle also can. Just because the pain is less than it was with earlier releases of Oracle doesn&#8217;t mean it isn&#8217;t still bad.</li>
<li>Gartner&#8217;s comments on <strong>ParAccel</strong> are pretty reasonable.</li>
<li>Gartner&#8217;s comments on compression in connection with <strong>SAND</strong> make no technical sense (tokenization is a key form of columnar compression, not an alternative to it). Also, SAP&#8217;s acquisition of Sybase is a business challenge for SAND, not a technical one.</li>
<li>Unless I&#8217;m forgetting something, <strong>Sybase IQ</strong> has no more in-database data mining than any other Fuzzy Logix partner does.</li>
<li>Gartner failed to note that, like other DBMS dating back to the 1990s and before, Sybase IQ is more complex to administer than some newer products are.</li>
<li>Gartner&#8217;s take on <strong>Teradata </strong>is pretty reasonable.</li>
<li>Gartner&#8217;s take on <strong>Vertica, </strong>while sloppy, is basically sensible. However, Gartner failed to note that Vertica is a laggard in non-query analytics. (I am sure those deficiencies are being addressed, but Vertica&#8217;s competitors are moving ahead as well.)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>Exadata notes</title>
		<link>http://www.dbms2.com/2011/02/02/exadata-notes/</link>
		<comments>http://www.dbms2.com/2011/02/02/exadata-notes/#comments</comments>
		<pubDate>Wed, 02 Feb 2011 07:05:53 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3715</guid>
		<description><![CDATA[It&#8217;s been a while since I penetrated Oracle&#8217;s tight message control and actually talked with them about Exadata. But Doug Henschen wrote a good article about Exadata based on an Andy Mendelsohn webcast. I agree with almost all of it. At first I was a little surprised that Exadata&#8217;s emphasis shift from data warehousing to [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been a while since I penetrated Oracle&#8217;s tight message control and actually talked with them about Exadata. But Doug Henschen wrote <a href="http://www.informationweek.com/news/software/bi/showArticle.jhtml?articleID=229100353">a good article about Exadata based on an Andy Mendelsohn webcast</a>. I agree with almost all of it. At first I was a little surprised that <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/">Exadata&#8217;s emphasis shift from data warehousing to OLTP/generic consolidation</a> hasn&#8217;t gone more quickly, but on the other hand:</p>
<ul>
<li>On the data warehouse side Exadata can alleviate screaming pain points.</li>
<li>In OLTP consolidation, Exadata mainly can save money. (Yes, I just said a product from Oracle can save customers money, and I meant it. You may stop laughing at any time.)</li>
</ul>
<p>Doug did overstate when he said that columnar architectures give 100X or more compression. That doesn&#8217;t happen. Yes, columnar compression can be &gt;10X in a variety of use cases, while pre-Exadata Oracle index bloat can approach 10X at times; but even if you&#8217;re counting that way I doubt there are many instances in which it actually multiplies out to &gt;100.</p>
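<p>To make that multiplication concrete, here is a back-of-the-envelope sketch. The scenario names and the specific factors are made-up illustrations, not measurements from any vendor; the only thing taken from the text above is that both factors are measured against the same raw data size, so they simply multiply.</p>

```python
def effective_ratio(index_bloat: float, columnar_compression: float) -> float:
    """Ratio of a bloated row-store footprint to a compressed columnar one.

    Both inputs are relative to the same raw data size, so the raw size
    cancels out and the two factors multiply.
    """
    return index_bloat * columnar_compression


# Hypothetical scenarios: (index bloat factor, columnar compression factor).
scenarios = {
    "more typical": (3.0, 10.0),
    "both extremes at once": (10.0, 10.0),
}

for name, (bloat, compression) in scenarios.items():
    print(f"{name}: {effective_ratio(bloat, compression):.0f}X")
```

<p>Only when both factors sit at their extremes on the same data does the product reach 100X, which is why a headline figure of 100X or more should be rare.</p>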
<p>In other Exadata news, the long-standing observation that <a href="http://www.dbms2.com/2009/02/01/oracle-says-they-do-onsite-exadata-pocs-after-all/">Oracle doesn&#8217;t like to do on-site Exadata POCs</a> still holds true. A couple of existing Oracle users &#8212; one rather well-known &#8212; recently told me that Oracle won&#8217;t let them test Exadata except on Oracle premises. In one case, this is a deal-breaker keeping Exadata from being considered for a purchase, and Oracle still won&#8217;t budge.</p>
<p>Finally, I&#8217;m pretty sure that this &#8220;new&#8221; Softbank Teradata replacement Oracle has been touting since September as competitive evidence &#8212; which Doug&#8217;s article also references &#8212; isn&#8217;t quite what it sounds like. I believe Teradata&#8217;s version of the story, which, somewhat edited, goes like this: <span id="more-3715"></span></p>
<blockquote>
<ul>
<li>The  Oracle Exadata decision at Softbank Mobile was  driven by business management in spite of <strong> Teradata being recommended by the technical team. </strong></li>
<li>To reiterate, the  data  warehouse project team recommended Teradata over Oracle.  The Teradata  proposal was well received in terms of TCO, performance, ease of use and  safety  of transition, etc. against Oracle Exadata.  However,<strong> the technical  team&#8217;s recommendation was overruled due to the business mandate to  standardize on Oracle throughout the company. </strong></li>
<li>SoftBank  Mobile has over 800  Oracle specialists in IT departments and Software subsidiaries.</li>
<li>The  Exadata performance is being compared to the existing production  system.   Teradata was NOT invited to benchmark a current generation  system.</li>
<li>Also, <strong>Softbank Mobile is a  reseller of Oracle.</strong></li>
</ul>
</blockquote>
<p>Teradata went on to clarify:</p>
<blockquote><p>Here are some  key points  regarding the Teradata systems at SoftBank:</p>
<ul>
<li>Two  Teradata systems:  Production #1 - 32 nodes.  Production #2  - 12 nodes.</li>
<li>Production #1 had nodes ranging from <strong>~3-7  years old.</strong></li>
<li>Production #2 had nodes that were <strong>~8 years  old.</strong></li>
<li>Teradata  V2R5 was <strong>end of life</strong> at the time of replacement.</li>
<li>We <strong>did  not get a chance to compete for this  business.</strong></li>
</ul>
</blockquote>
<p><strong>Bottom line: Oracle&#8217;s big competitive replacement of Teradata systems was against 3-8 year old boxes that the customer&#8217;s technical staff recommended be replaced by more Teradata gear.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/02/02/exadata-notes/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>Architectural options for analytic database management systems</title>
		<link>http://www.dbms2.com/2011/01/18/architectural-options-for-analytic-database-management-systems/</link>
		<comments>http://www.dbms2.com/2011/01/18/architectural-options-for-analytic-database-management-systems/#comments</comments>
		<pubDate>Tue, 18 Jan 2011 14:22:09 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data pipelining]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3588</guid>
		<description><![CDATA[Mike Stonebraker recently kicked off some discussion about desirable architectural features of a columnar analytic DBMS. Let&#8217;s expand the conversation to cover desirable architectural characteristics of analytic DBMS in general.  But first, a few housekeeping notes: This is a very long post. Even so, to keep it somewhat manageable, I&#8217;ve cut corners on completeness. Most [...]]]></description>
			<content:encoded><![CDATA[<p>Mike Stonebraker recently kicked off some discussion about <a href="../../../../../2011/01/12/mike-stonebraker-on-real-column-stores/">desirable architectural features of a columnar analytic</a> DBMS. Let&#8217;s expand the conversation to cover desirable architectural characteristics of analytic DBMS in general.  <span id="more-3588"></span>But first, a few housekeeping notes:</p>
<ul>
<li>This is a very long post.</li>
<li>Even so, to keep it somewhat manageable, I&#8217;ve cut corners on completeness. Most notably, two important areas are entirely deferred to future posts &#8212; advanced-analytics-specific architecture, and in-memory processing (including CEP).</li>
<li>The subjects here are not strictly parallel. The distinction between major add-on modules and &#8220;turtles all the way down&#8221; core architectural choices is rarely crystal-clear &#8212; Mike Stonebraker&#8217;s recent post notwithstanding <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  &#8212; and I&#8217;ve mixed subjects of varying degrees of &#8220;fundamentalness&#8221; pretty freely.</li>
<li>There&#8217;s a long list of links at the end, pointing at posts that help explain or give examples of specific features named in the body of the text, somewhat like unnumbered footnotes.</li>
</ul>
<p>OK. In my opinion, the four drop-dead requirements for an analytic DBMS are:</p>
<ul>
<li><strong>Relational/SQL support.</strong> That&#8217;s how you get great flexibility in more or less easily constructing queries, as well as connectivity to a vast number of tools. In a few cases, I guess <strong>MDX</strong> might suffice as an alternative.</li>
<li>Sufficiently <strong>great query performance,</strong> on the queries you&#8217;re actually going to run, for however many concurrent users you actually will have.</li>
<li>Sufficiently high <strong>data loading throughput</strong> and sufficiently low <strong>data loading latency.</strong></li>
<li>Sufficiently favorable <strong>TCO </strong>(Total Cost of Ownership), all things considered, where &#8220;all things&#8221; at a minimum includes software license, software maintenance, hardware, power, people costs for administration, and people costs for development.</li>
</ul>
<p>Depending on your use case, you might have additional make-or-break requirements. Possible areas include:</p>
<ul>
<li>Additional <strong>query functionality,</strong> of course with good performance. Specific examples include:
<ul>
<li>ANSI-standard SQL features that are not universally supported (e.g. windowing).</li>
<li>Geospatial datatype support.</li>
</ul>
</li>
<li>Further high-performing <strong>integrated analytics</strong>, such as:
<ul>
<li>Data mining/machine learning modeling and scoring.</li>
<li>Other mathematical functions, such as linear algebra, optimization, or Monte Carlo simulation.</li>
<li>Extensibility via MapReduce and/or sufficiently robust user-defined function (UDF) capabilities.</li>
</ul>
</li>
<li>Platform support that matches your needs.</li>
<li>Security, auditability, and/or high-performance encryption.</li>
</ul>
<p>Other possibly important features &#8212; but ones that would usually go on &#8220;nice to have&#8221; rather than &#8220;must have&#8221; lists &#8212; include:</p>
<ul>
<li>Yet more <strong>query functionality,</strong> in areas such as:
<ul>
<li>Non-standard SQL extensions (e.g. temporal ones)</li>
<li>Specific prepackaged UDFs.</li>
<li>Cross-column text search.</li>
</ul>
</li>
<li>Nice <strong>administrative tools,</strong> in areas such as:
<ul>
<li>Single-query performance/optimization.</li>
<li>Authorization/permission.</li>
<li>Workload management.</li>
<li>Data mart spin-out.</li>
</ul>
</li>
</ul>
<p>So what kinds of architectural choices (or major features) should one look to in order to support such features? On the performance side there are many candidates, including:</p>
<ul>
<li><strong>Specialized indexes</strong>, more commonly found in older DBMS. Leading examples include star and especially bitmap indices, both of which I was already writing about back in the 1990s. Ditto <strong>materialized views</strong>, which aren&#8217;t exactly indices, but are closely related.</li>
<li><strong>Partition elimination.</strong> Single- or multi-level range partitioning can cause whole regions of the database never to be checked in a particular query&#8217;s evaluation. (That&#8217;s a good thing.) The functionality popularized by Netezza as <strong>zone maps </strong>does something similar, without requiring the partitions to be chosen in advance.</li>
<li><strong>Scan-friendliness.</strong> If a query runs a long time, it may include a lot of (full or partial) table scanning. Assuming you rely on spinning disk &#8212; as opposed to solid-state memory &#8212; one way to improve your sequential-scan throughput far above your random-read throughput is to support <a href="../../../../../2006/09/20/teradata-netezza-datallegro-appliance/">large block sizes. </a></li>
<li><strong>Parallelism</strong>. It&#8217;s possible to screw up even multi-core parallelism, but the big issue is multi-server. In particular:
<ul>
<li>An analytic DBMS must <strong>avoid a &#8220;fat head&#8221; bottleneck,</strong> either because there is no head node at all directing things, or else because data redistribution algorithms are sufficiently mature as to not overload it. (In naive parallel DBMS implementations, intermediate query results get sent back to the head node to be, for example, JOINed together. This is not a good thing.)</li>
<li>Multiple analytic DBMS vendors have chosen to develop <strong>custom data transfer protocols,</strong> for more reliable performance than they can get from TCP/IP. Examples include Teradata, Netezza, and ParAccel.</li>
</ul>
</li>
<li><strong>Predicate pushdown. </strong>Predicate pushdown takes several forms, in all cases having the goal of executing certain simpler database operations &#8212; predicate evaluations &#8212; close to the data, thus minimizing I/O or upstream processing.<strong></strong>
<ul>
<li>Netezza famously offloads the first part of predicate evaluation to FPGA (Field-Programmable Gate Array) chips.</li>
<li>At least in theory, I like <a href="../../../../../2008/09/28/exadata-oracle-database-machine-parallelization/">the Exadata form of node specialization</a>, in which a tier of server nodes does the first part of the processing, with the results being sent to a second upstream database tier. But it&#8217;s not obvious that any RDBMS vendor has done a great job with it. Oracle is famously secretive about Exadata&#8217;s track record, and as of this writing apparently still resists on-site benchmarks. <a href="../../../../../2008/09/05/mpp-data-warehouse-nodes/">Calpont</a> hasn&#8217;t accomplished much. And <a href="../../../../../2010/11/29/marklogic-and-its-document-dbms/">MarkLogic</a> of course doesn&#8217;t sell an RDBMS.</li>
<li>There&#8217;s reason to think predicate pushdown would help exploit flash memory, although I&#8217;m not sure vendors are moving in a direction that will let us find out.</li>
</ul>
</li>
<li><strong>Columnar</strong> data storage. Columnar storage is pretty much the ultimate in predicate pushdown, and advantageous in many analytic query scenarios. (Main exception: When you&#8217;re bringing back the majority of a row anyway, you might as well fetch the thing pre-assembled.) As Mike Stonebraker points out, <a href="../../../../../2011/01/12/mike-stonebraker-on-real-column-stores/">columnar storage should not incur serious row-ID overhead</a>, and ideally should be available for multiple sort orders on each column.</li>
<li><strong>Compression.</strong> This, rightly, is another of Mike Stonebraker&#8217;s favorite features. Database compression is hugely important, for I/O and in silicon alike. (And it can also save money on storage.) There are a broad variety of compression techniques, suited for different kinds of data, different kinds of queries, or different points on the storage saving/decompression performance tradeoff spectrum.<strong></strong></li>
<li><strong>Flexible storage.</strong> Not all data is best stored the same way, even if it&#8217;s in the same database. Some is destined for columnar-friendly use cases, some for whole-row ones. Some is compressed ideally by one technique, some by another. And so on. Some database managers do a good job of letting different parts of the database (even within the same table) be stored in different ways.</li>
<li><strong>Query pipelining. </strong>There are a lot of steps to query execution, in both the fine-grained sense (a whole lot of rows) and the coarse-grained (all but the simplest execution plans feature a number of operations each). FPGA-based vendors XtremeData and Kickfire used the innate parallelism of an FPGA to pipeline query execution. Kickfire failed, and XtremeData hasn&#8217;t sold many systems, but that doesn&#8217;t mean it isn&#8217;t a good idea. <a href="../../../../../2010/08/12/teradata-future-product-strategy/">Kickfire&#8217;s assets were sold to Teradata</a>. Meanwhile, VectorWise&#8217;s very name speaks to its (Intel-based) vector processing architecture.</li>
<li><strong>Result set reuse.</strong> Instead of mixing together different steps of the same query, how about mixing together the same step in different queries, so that you don&#8217;t have to repeat it? As a simple example, suppose two queries need to do the same table scan. Well then, it would be nice to only do the scan once. In most cases, query workloads are too diverse for result set reuse of that kind to be very important; still, it&#8217;s a cool feature, which Teradata calls <a href="../../../../../2006/09/20/teradata-netezza-datallegro-appliance/">synchronized scan</a>.</li>
<li><strong>Suitably optimized execution engine </strong>&#8211; column, row, whatever. (This is Mike Stonebraker&#8217;s &#8220;inner loop&#8221; point generalized.)<strong></strong></li>
<li>Well-factored<strong> query optimizer. </strong>No matter what, it&#8217;s good for a query optimizer to have been through a few rounds of <a href="../../../../../2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a>. Beyond that, an optimizer with sufficiently convenient hooks can have cool and occasionally valuable features such as:
<ul>
<li><strong>On-the-fly query re-planning. </strong>Do part of the query, rerun column statistics, and re-plan the query if appropriate.<strong></strong></li>
<li><strong>Not-so-black-box optimization. </strong>Work interactively with the DBA to find the best query plan.<strong></strong></li>
<li><strong>Query rewriting.</strong> Any decent optimizer will take a complex query and produce an execution plan that in some cases looks quite unlike the original query. Some optimizers go further in rewriting the query first, essentially to psych themselves into coming up with a better plan.<strong></strong></li>
</ul>
</li>
</ul>
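<p>As one illustration of the list above, here is a minimal sketch of the zone-map idea: keep a min/max summary for each block of a column, and skip any block whose range cannot match the predicate. This is a toy under stated assumptions, not Netezza&#8217;s implementation; the names <code>Block</code>, <code>build_blocks</code>, and <code>scan_between</code> are hypothetical, and real systems keep these summaries per disk extent rather than per Python list.</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A fixed-size chunk of one column, plus its min/max summary."""
    values: List[int]
    lo: int
    hi: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into blocks and record each block's value range."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_between(blocks: List[Block], low: int, high: int):
    """Return matching values and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for block in blocks:
        # Read the block only if its range can overlap [low, high].
        if block.hi >= low and high >= block.lo:
            blocks_read += 1
            matches.extend(v for v in block.values if high >= v >= low)
    return matches, blocks_read


# Data that arrives roughly in order (e.g. timestamps) skips well.
blocks = build_blocks(list(range(1000)), block_size=100)
rows, read = scan_between(blocks, 250, 260)
print(len(rows), "rows found;", read, "of", len(blocks), "blocks read")
```

<p>No partitions had to be declared in advance; the summaries fall out of however the data happened to land. The flip side is that randomly ordered data gives wide min/max ranges and little skipping.</p>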
<p>You can&#8217;t do much with an analytic database unless you get data into it in the first place. Thus, performance in writing and loading data are important, and there are a number of architectural decisions that can be helpful in those regards.</p>
<ul>
<li><strong>Row-based architecture.</strong> Column stores have obvious advantages for query, but in a naive column store implementation you have tremendous overhead, pulling the rows apart and storing them in many different columns. This is particularly the case for small, frequent updates.</li>
<li><strong>Batched writes. </strong>The classic way to deal with column stores&#8217; data writing challenges is to batch data in memory, then bang it to disk only occasionally. Hopefully the data is available seamlessly for query in RAM before the disk-banging occurs. This technique is by no means restricted to analytic and/or columnar use cases, but the single best-known example may be Vertica&#8217;s Read-Optimized Store (disk)/Write-Optimized Store (RAM) pairing.</li>
<li><strong>Lack of indices and materialized views</strong>. Indexes and materialized views can help query speed, albeit at the cost of disk space and administrative effort. But maintaining them multiplies the difficulty of loading data in the first place.</li>
<li><strong>Lockless or optimistic-locking concurrency model.</strong> Locking models suitable for OLTP  can be ridiculous for analytic databases, blocking queries for no good reason. Fortunately, there are alternatives.</li>
<li><strong>Append-only updating. </strong>When I/O volumes are high, append-only updating can give an important performance improvement over update-in-place, assuming you have sufficiently good algorithms for garbage-collection/clean-up. If I/O volumes are so low that you don&#8217;t care about the performance benefits, maybe it would be nice to have the &#8220;time-travel&#8221; feature that&#8217;s a potential byproduct of MVCC (Multi-Version Concurrency Control). Neither part of this observation applies solely to analytic DBMS.</li>
<li><strong>Parallel load (no fat head). </strong>It&#8217;s not just query execution that can get bottlenecked at a &#8220;head node;&#8221; the same can happen with loads, batch or otherwise. That&#8217;s not a good thing. Thus, various parallel analytic DBMS vendors have set up ways to load data directly to the nodes where it&#8217;s going to be stored.<strong></strong></li>
<li><strong>Specialized load nodes</strong>. <a href="../../../../../2008/10/22/aster-data-systems-ncluster/">Aster Data nCluster features specialized data loading nodes</a>, although Aster has introduced a more conventional kind of parallel load as well.</li>
</ul>
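<p>The batched-write idea from the list above can be sketched in a few lines: buffer incoming rows in memory, pull them apart into columns only once per batch, and let queries see both places. This is a toy, not Vertica&#8217;s actual WOS/ROS code; the class <code>BatchedColumnStore</code>, its <code>flush_threshold</code> parameter, and the in-memory lists standing in for disk files are all assumptions made for illustration.</p>

```python
from typing import Dict, List


class BatchedColumnStore:
    """Toy store: rows buffer in RAM, then flush to columnar 'disk' in bulk."""

    def __init__(self, columns: List[str], flush_threshold: int):
        self.columns = columns
        self.flush_threshold = flush_threshold
        self.write_buffer: List[Dict[str, int]] = []   # row-oriented, in RAM
        # One list per column stands in for one on-disk column file.
        self.column_files: Dict[str, List[int]] = {c: [] for c in columns}
        self.flush_count = 0

    def insert(self, row: Dict[str, int]) -> None:
        self.write_buffer.append(row)
        if len(self.write_buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Pull buffered rows apart once, appending to each column in bulk."""
        for c in self.columns:
            self.column_files[c].extend(r[c] for r in self.write_buffer)
        self.write_buffer.clear()
        self.flush_count += 1

    def column_sum(self, column: str) -> int:
        """Queries see flushed and still-buffered data alike."""
        flushed = sum(self.column_files[column])
        buffered = sum(r[column] for r in self.write_buffer)
        return flushed + buffered


store = BatchedColumnStore(["user_id", "amount"], flush_threshold=3)
for i in range(1, 6):
    store.insert({"user_id": i, "amount": 10 * i})
print(store.flush_count, len(store.write_buffer), store.column_sum("amount"))
```

<p>Five small inserts cost one columnar write instead of five, and the two rows still sitting in the buffer are included in the query result. A real system would also need the buffer to survive a crash, which this sketch ignores.</p>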
<p>And of course, all of the above need to be implemented in the context of well-configured combinations of hardware, networking, and software.</p>
<p>Topics I know I&#8217;ve left out include advanced-analytics functionality, and in-memory processing (CEP or otherwise). Also missing are specifics of compression algorithms &#8212; or indeed of anything else. I&#8217;m sure there&#8217;s much else missing besides, so please point out the most glaring omissions in the comment thread below. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><strong><em>Related links:</em></strong></p>
<ul>
<li><a href="../../../../../2010/05/15/further-clarifying-in-database-mpp-sas/">Why even in-database scoring can be important</a> (May, 2010).</li>
<li><a href="../../../../../2009/10/18/three-big-myths-about-mapreduce/">Three big myths about MapReduce</a> (October, 2009).</li>
<li><a href="../../../../../2008/08/26/why-mapreduce-matters-to-sql-data-warehousing/">Why you might ever want to integrate MapReduce into your DBMS</a> (August, 2008).</li>
<li><a href="../../../../../2009/06/08/the-future-of-data-marts/">The future of data marts</a>, specifically data mart spin-out. (June, 2009).</li>
<li>Netezza offers both zone maps and <a href="../../../../../2010/06/21/netezza-database-software-technology-overview/">clustered base tables</a> (June, 2010).</li>
<li>Oracle Exadata <a href="../../../../../2010/01/22/oracle-database-hardware-strategy/">Storage Indexes</a> are like Netezza zone maps (January, 2010).</li>
<li><a href="../../../../../2009/08/08/netezza-fpga/">How Netezza uses the FPGA</a> (August, 2009).</li>
<li><a href="../../../../../2009/02/01/oracle-says-they-do-onsite-exadata-pocs-after-all/">Oracle is reluctant to do on-site Exadata POCs</a> (February, 2009). As of the end of 2010, that doesn&#8217;t seem to have changed.</li>
<li><a href="../../../../../2010/06/21/netezza-ibm-db2-compression/">The Netezza and IBM DB2 approaches to compression</a> (June, 2010, which is before IBM acquired Netezza).</li>
<li><a href="../../../../../2009/05/14/the-secret-sauce-to-clearpaces-compression/">The secret sauce to Rainstor&#8217;s extreme compression</a> (May, 2009, when Rainstor was still called Clearpace).</li>
<li><a href="../../../../../2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/">The row-based/columnar distinction gets blurred</a>, e.g. by Vertica FlexStore (August, 2009).</li>
<li>And by <a href="../../../../../2009/10/14/greenplum-hybrid-columnar/">Greenplum</a> (October, 2009). Also contains the observation that even row-style compression works better when data is stored columnarly.</li>
<li>And by <a href="../../../../../2010/09/15/aster-data-ncluster-version-4-6/">Aster Data</a> (September, 2010).</li>
<li>Teradata is particularly aggressive about <a href="../../../../../2009/08/02/teradata-13-focuses-on-advanced-analytic-performance/">query rewrite</a> (August, 2009).</li>
<li><a href="../../../../../2006/09/27/logless-lockless-netezza-more-carefully-explained/">Netezza&#8217;s logless, lockless architecture</a> (September, 2006).</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/01/18/architectural-options-for-analytic-database-management-systems/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Where ParAccel is at</title>
		<link>http://www.dbms2.com/2010/10/17/paraccel/</link>
		<comments>http://www.dbms2.com/2010/10/17/paraccel/#comments</comments>
		<pubDate>Sun, 17 Oct 2010 08:21:04 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3296</guid>
		<description><![CDATA[Until recently, I was extremely critical of ParAccel&#8217;s marketing. But there was an almost-clean sweep of the relevant ParAccel executives, and the specific worst practices I was calling out have for the most part been eliminated. So I was open to talking and working with ParAccel again, and that&#8217;s now happening. On my recent California [...]]]></description>
			<content:encoded><![CDATA[<p>Until recently, <a href="http://www.dbms2.com/2010/01/15/there-sure-seem-to-be-a-lot-of-inaccuracies-on-paraccels-website/">I was extremely critical of ParAccel&#8217;s marketing</a>. But there was an almost-clean sweep of the relevant ParAccel executives, and the specific worst practices I was calling out have for the most part been eliminated. So I was open to talking and working with ParAccel again, and that&#8217;s now happening. On my recent California trip, I chatted with three ParAccel folks for a few hours. Based on that and other conversation, here&#8217;s the current ParAccel story as I understand it.<br />
<span id="more-3296"></span><br />
I&#8217;ve already noted that <a href="http://www.dbms2.com/2010/08/09/links-and-observations/">PADB 3.0 is coming soon</a> (ParAccel Analytic DataBase), but pending its arrival, ParAccel&#8217;s technical story is primarily about <strong>query performance.</strong> More specifically:</p>
<ul>
<li>ParAccel asserts that PADB is much faster than other analytic DBMS &#8212; even close competitors such as Vertica &#8212; on <strong>especially complex queries. </strong>&#8220;60-way joins&#8221; were mentioned. So was the flattening of correlated subqueries.</li>
<li>ParAccel also claims industry-leading performance on simpler queries, but not by the same (or perhaps even particularly large) margins.</li>
<li>Mercifully, ParAccel no longer <a href="http://www.dbms2.com/2009/07/08/progress-in-figuring-out-what-paraccel-is-doing/">claims to have never, ever lost on performance in a customer evaluation</a>. But it still says that is very close to being true.</li>
<li>Major reasons ParAccel gives for PADB&#8217;s high performance include:
<ul>
<li>Like Vertica, Sybase IQ, and others, PADB uses a <strong>columnar</strong> architecture.</li>
<li>ParAccel thinks PADB&#8217;s newest <strong>query optimizer</strong> &#8212; fondly named <a href="http://paraccel.com/technology/omne-optimizer/">Omne</a> &#8212; is outstanding.</li>
<li>ParAccel&#8217;s PADB <strong>compiles its queries.</strong></li>
<li>In general, ParAccel is just performance-obsessed.</li>
</ul>
</li>
<li>One could also mention:
<ul>
<li>ParAccel&#8217;s PADB runs smoothly in-memory, if that&#8217;s what you want.</li>
<li>ParAccel also offers a Flash option for PADB.</li>
<li>Like many other analytic DBMS vendors, ParAccel has created a custom networking protocol. (ParAccel has talked about that <a href="http://www.dbms2.com/2010/04/16/story-of-an-analytic-dbms-evaluation/">altogether too much</a> in the past.)</li>
<li>Like Vertica, ParAccel&#8217;s PADB generally decompresses data as late as the particular compression scheme used allows. (Well, actually, that&#8217;s not one ParAccel mentions unless asked.)</li>
<li>ParAccel has long encouraged one to put part of one&#8217;s database on direct-attached storage as a kind of persistent cache, plus all of it on a storage-area network, because PADB can optimize its scans to go against both physical stores.</li>
<li>ParAccel&#8217;s PADB does encryption a block at a time, rather than a row at a time, so there&#8217;s very little overhead to using the encryption feature.</li>
</ul>
</li>
<li>ParAccel says that PADB has no indexes, materialized views, etc., notwithstanding that <a href="http://www.dbms2.com/2008/02/18/paraccel-technical-overview/">I heard something different from Barry Zane a few years ago</a>. This is the basis for ParAccel&#8217;s claim that <strong>no tuning</strong> (or at least very little) is required, or indeed even possible &#8230;</li>
<li>&#8230; and similarly, it is the reason ParAccel encourages prospects to do ad-hoc queries in their POCs (Proofs Of Concept), at least when Vertica is the competitor.</li>
<li>However, ParAccel&#8217;s PADB has a rather <strong>complex initial set-up.</strong> This has been the basis for widespread skepticism about ParAccel&#8217;s &#8220;no tuning&#8221; claim. ParAccel is working to automate that away, but admits to being only part-way through the process.</li>
<li>Highlights of ParAccel&#8217;s data writing strategy include:
<ul>
<li>PADB sends data transactionally to disk.</li>
<li>PADB usually sends data to disk a block at a time, because it is coming in fast enough for that to work out (either due to bulk load or streaming).</li>
<li>PADB is <strong>append-only</strong> &#8230;</li>
<li>&#8230; so PADB has a garbage-collection mechanism called Vacuum. Right now Vacuum has to be started manually, but doesn&#8217;t block reads and writes; full background garbage collection is of course a roadmap feature.</li>
<li>As is natural for append-only systems, ParAccel&#8217;s PADB has MVCC (MultiVersion Concurrency Control) and snapshot isolation.</li>
</ul>
</li>
<li>Name a <strong>compression</strong> method, and PADB probably has it &#8212; 13 in all by ParAccel&#8217;s count, including dictionary/token, run-length encoding, Delta, LZ, and so on.</li>
</ul>
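<p>To make the append-only, Vacuum, and snapshot-isolation points above concrete, here is a minimal toy sketch in Python. It is emphatically not PADB code: the class and method names (<code>AppendOnlyStore</code>, <code>write</code>, <code>read</code>, <code>vacuum</code>) are hypothetical, every write is assumed to commit immediately, and everything lives in memory. It only illustrates why an append-only design leads naturally to multiple versions, snapshot reads, and a separate garbage-collection step.</p>

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    key: str
    value: Optional[str]  # None marks a delete
    txn: int              # id of the transaction that wrote this version


@dataclass
class AppendOnlyStore:
    """Hypothetical toy store; versions are appended, never overwritten."""
    log: list = field(default_factory=list)
    next_txn: int = 1

    def write(self, key, value):
        """Append a new version and return its transaction id."""
        txn = self.next_txn
        self.log.append(Version(key, value, txn))
        self.next_txn += 1
        return txn

    def snapshot(self):
        """Return the id of the latest transaction (assumed committed)."""
        return self.next_txn - 1

    def read(self, key, snapshot):
        """Return the newest value of key written at or before the snapshot."""
        for version in reversed(self.log):
            if version.key == key and snapshot >= version.txn:
                return version.value
        return None

    def vacuum(self, oldest_snapshot):
        """Drop versions no reader at or after oldest_snapshot can still see."""
        newest_visible = {}
        for version in self.log:  # the log is in transaction order
            if oldest_snapshot >= version.txn:
                newest_visible[version.key] = version.txn
        kept = [
            v for v in self.log
            if v.txn > oldest_snapshot or newest_visible.get(v.key) == v.txn
        ]
        removed = len(self.log) - len(kept)
        self.log = kept
        return removed
```

<p>A reader holding an old snapshot keeps seeing the old value after a newer write lands, and <code>vacuum</code> only reclaims versions that are older than every live snapshot.</p>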
<p>Tracking ParAccel&#8217;s customer success has long been difficult. The <a href="http://www.dbms2.com/2010/02/10/gartner-magic-quadrant-data-warehouse-2009-2010/">2009 Gartner Magic Quadrant</a> claim of ~20 ParAccel customers seems odd to everybody, including ParAccel. ParAccel&#8217;s own reporting of customer wins around then was <a href="http://www.dbms2.com/2010/01/15/there-sure-seem-to-be-a-lot-of-inaccuracies-on-paraccels-website/">quite confusing</a>. And ParAccel&#8217;s customer count a year before that was <a href="http://www.dbms2.com/2009/01/03/paraccels-market-momentum/">extremely low</a>. But ParAccel&#8217;s Michael Weir just rounded up some figures for me, namely:</p>
<ul>
<li>ParAccel has 30+ revenue-recognized customers, not counting OEMs, OEMs&#8217; customers, or paid POCs.</li>
<li>2 ParAccel customers have &gt; 100 TB of user data.</li>
<li>7 ParAccel customers have &gt; 10 TB of user data.</li>
<li>The largest ParAccel cluster is 28 nodes and growing.</li>
</ul>
<p>Naturally, Michael went on to note that even relatively small databases can have high value.</p>
<p>One last note: ParAccel has approximately 78 employees.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/10/17/paraccel/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Best practices for analytic DBMS POCs</title>
		<link>http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/</link>
		<comments>http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/#comments</comments>
		<pubDate>Mon, 14 Jun 2010 12:53:33 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2297</guid>
		<description><![CDATA[When you are selecting an analytic DBMS or appliance, most of the evaluation boils down to two questions: How quickly and cost-effectively does it execute SQL? What analytic functionality, SQL or otherwise, does it do a good job of executing? And so, in undertaking such a selection, you need to start by addressing three issues: [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">When you are selecting an analytic DBMS or appliance, most of the evaluation boils down to two questions:</p>
<ul>
<li><span style="font-style: normal;">How quickly and cost-effectively does it execute SQL?</span></li>
<li><span style="font-style: normal;">What 	analytic functionality, SQL or otherwise, does it do a good job of 	executing?</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">And so, in undertaking such a selection, you need to start by addressing three issues:</span></p>
<ul>
<li><a href="../2009/09/10/analytic-speed-latency/">What 	does “speed” mean to you</a>?</li>
<li>What does “cost” mean to you?</li>
<li>What analytic functionality do you 	need anyway?</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-2297"></span>Key elements of cost* include:</p>
<ul>
<li>Software license and maintenance</li>
<li>Hardware purchase cost, 	maintenance, electric power, and computer room burden</li>
<li>Database and system administration</li>
<li>(For some use cases) Programming</li>
</ul>
<p style="margin-bottom: 0in;"><em>*Assuming a classical in-house IT shop, where products are typically bought rather than leased/rented. With outsourced and/or monthly-fee structures, the details change but the principles remain the same.</em></p>
<p style="margin-bottom: 0in;">Most of that can be evaluated pretty well via a spreadsheet, although things can get a bit tricky when you get to people costs, which are a large fraction of the whole. In particular, different analytic DBMS product suites have great, high-performance support for different (and often rapidly growing) sets of functionality – basic and advanced SQL, statistics, and more. Figuring out which ones will be best for your programmers, and how significant the differences are – well, that&#8217;s a lot like any other programming language evaluation, and those are rarely neat or clear-cut.</p>
<p style="margin-bottom: 0in; font-style: normal;">But when it comes to evaluating speed, <strong>there&#8217;s no substitute for a well-designed proof of concept (POC).</strong> Many analytic DBMS and appliance vendors are happy to let you do a POC, on your own premises (or remotely if you prefer), under your control, at no cost to you. And that&#8217;s great. <strong>It is crucial that a POC be run either by you, by a consultant* answerable to you,</strong><span style="font-weight: normal;"> or – if you decide the vendor must run it for you – at least </span><strong>with you watching every step of the way</strong><span style="font-weight: normal;"> and knowing exactly what is being done.</span> Appliance vendors do find it cheaper to run POCs on their own premises, so a certain reluctance to ship you a box is understandable. But <strong>make no compromises about the transparency of a POC, or about your control of exactly what it is that gets tested.</strong></p>
<p style="margin-bottom: 0in;"><em>*Since I sell <a href="http://www.monash.com/adviseusers.html">consulting services</a> for users evaluating analytic DBMS, I naturally am biased to think that consultants can be very useful in the process. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  But whether you should use them a little (sanity check), a medium amount (work with you through the process), or heavily (actually drive the process for you and/or execute the POCs) is very dependent upon your specific situation.</em></p>
<p style="margin-bottom: 0in; font-style: normal;">So far as I&#8217;ve been able to tell:</p>
<ul>
<li><span style="font-style: normal;">Netezza 	loves to ship boxes to prospects for POCs, and have them set up the 	boxes and do POCs themselves. That&#8217;s a big reason why <a href="../2009/02/18/the-netezza-guys-propose-a-poc-checklist/">Netezza 	wants to call attention to this subject</a>.</span></li>
<li><span style="font-style: normal;">Oracle 	has generally been pretty <a href="../2009/02/01/oracle-says-they-do-onsite-exadata-pocs-after-all/">reluctant 	to ship Exadata boxes out for POCs</a>. That&#8217;s the other reason 	Netezza wants to call attention to the issue. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </span></li>
<li><span style="font-style: normal;">Open 	source vendors make it easy for you to download and test at least 	their community editions.</span></li>
<li><span style="font-style: normal;">Vertica 	makes it pretty easy for you to test its software too (download or 	cloud).</span></li>
<li><span style="font-style: normal;">ParAccel 	has generally insisted on running POCs itself, although it will do 	so on your premises if you insist.</span></li>
<li><span style="font-style: normal;">Teradata naturally tries to do POCs on its own premises, but doesn&#8217;t insist too hard.<em> (Edit: Randy Lea of Teradata says that Teradata is now doing over half its POCs onsite.)</em></span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Most of the criticisms I&#8217;ve heard of vendors&#8217; POC practices have been directed at Oracle or ParAccel.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">For most POCs, it&#8217;s a good conceptual template to </span><span style="font-style: normal;"><strong>form and then test a hypothesis</strong></span><span style="font-style: normal;"> to the effect of:</span></p>
<ul>
<li><span style="font-style: normal;">For 	a given technology product assemblage (brand of DBMS, number of 	nodes, etc.), and</span></li>
<li><span style="font-style: normal;">For 	a given level of human effort (e.g., administrative effort), you can</span></li>
<li><span style="font-style: normal;">Run a given workload, with</span></li>
<li><span style="font-style: normal;">Satisfactory 	and satisfactorily consistent response times</span></li>
</ul>
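<p style="margin-bottom: 0in;"><span style="font-style: normal;">As a rough illustration of the last part of that hypothesis, here is a minimal Python sketch of one way to check measured response times from a single POC run. The function name, the choice of a 95th-percentile ceiling for &#8220;satisfactory,&#8221; and the use of standard deviation divided by mean for &#8220;satisfactorily consistent&#8221; are all assumptions made for the example; your own SLAs should dictate the real criteria.</span></p>

```python
import math
import statistics


def poc_hypothesis_holds(response_times_s, target_s, max_spread):
    """Check one POC run against a response-time hypothesis.

    response_times_s: measured response times in seconds (must be non-empty).
    target_s: hypothetical ceiling for the 95th-percentile response time.
    max_spread: hypothetical cap on stdev / mean, a crude consistency measure.
    """
    ordered = sorted(response_times_s)
    # Nearest-rank 95th percentile.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    p95 = ordered[rank - 1]
    spread = statistics.pstdev(ordered) / statistics.mean(ordered)
    # Both conditions must hold: fast enough, and consistently so.
    return target_s >= p95 and max_spread >= spread
```

<p style="margin-bottom: 0in;"><span style="font-style: normal;">Calling <code>poc_hypothesis_holds([1.0, 1.2, 0.9, 1.1, 1.0], target_s=2.0, max_spread=0.5)</code> passes, while a run with one 9-second outlier among 1-second queries fails the same thresholds. You would repeat the check for each product assemblage and level of administrative effort you are comparing.</span></p>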
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Sometimes absolute throughput and price/performance are important </span><em>secondary</em><span style="font-style: normal;"> considerations; sometimes they&#8217;re less germane. But either way, it&#8217;s almost always right to focus </span><em>primarily</em><span style="font-style: normal;"> on the questions of </span><span style="font-style: normal;"><strong>“What do I want this system to do?”</strong></span><span style="font-style: normal;"> and </span><span style="font-style: normal;"><strong>“What do I think we&#8217;re going to have to invest in it?”</strong></span><span style="font-style: normal;"> By way of contrast, it&#8217;s often misleading to focus too much on questions like “<a href="../2008/11/19/data-warehouse-proof-of-concept-pocs/">What&#8217;s the one number that best describes the performance of this system?</a>” – even if you customize that calculation for your environment – or, even worse, “How much speed-up can I get on my single worst <a href="../2008/11/15/query-from-hell/">Query from Hell</a>?” </span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">The fundamental rule of POC construction is: </span><span style="font-style: normal;"><strong>Model your entire use case as best you can.</strong></span><span style="font-style: normal;"> That means you need to consider, at a minimum:</span></p>
<ul>
<li><span style="font-style: normal;">Your 	whole concurrent query, other analytic, and low-latency update 	workload (peak).</span></li>
<li><span style="font-style: normal;">Your 	whole query, analytic, load, backup, and maintenance workload 	(ongoing).</span></li>
<li><span style="font-style: normal;"><a href="../2008/12/14/the-%E2%80%9Cbaseball-bat%E2%80%9D-test-for-analytic-dbms-and-data-warehouse-appliances/">Partial-failure 	scenarios</a>.</span></li>
<li><span style="font-style: normal;">Your 	core SLAs (Service-Level Agreements).</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Of course, that&#8217;s not as easy as it sounds. Presumably, the main reason you&#8217;re getting a new analytic DBMS is that you want to do new kinds of analysis. By the very nature of analytics, you won&#8217;t know what analytic operations are most useful until you try them out and see what their results are. On the other hand – if you haven&#8217;t done considerable thinking about how you&#8217;re going to use your new analytic database, how did you ever get funding for the project in the first place? <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Seriously, I could write multiple posts, each as long as this one (but more application-oriented), about how to upgrade your analytic capabilities (and which fool&#8217;s gold to avoid). But this has gotten pretty long already, so for now I&#8217;ll just stop here.</span></p>
<p style="margin-bottom: 0in;"><em>Note: My clients at Netezza asked me to write something short about POCs they could use as a kind of foreword to some collateral, where by &#8220;short&#8221; they meant single-paragraph or something like that. They&#8217;re great clients, so I said yes, under the condition I could also use it as a blog post. Except … this post didn&#8217;t turn out to be nearly as short as they envisioned. Oops. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">My 	February, 2009 <a href="../2009/02/25/even-more-final-version-of-my-tdwi-slide-deck/">slide 	deck on how to select an analytic DBMS</a> is in many parts still 	pretty current</span></p>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Ingres VectorWise technical highlights</title>
		<link>http://www.dbms2.com/2010/06/11/ingres-vectorwise-technical-highlights/</link>
		<comments>http://www.dbms2.com/2010/06/11/ingres-vectorwise-technical-highlights/#comments</comments>
		<pubDate>Fri, 11 Jun 2010 11:28:18 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2261</guid>
		<description><![CDATA[After working through problems w/ travel, cell phones, and so on, Peter Boncz of VectorWise finally caught up with me for a regrettably brief call. Peter gave me the strong impression that what I&#8217;d written in the past about VectorWise had been and remained accurate, so I focused on filling in the gaps. Highlights included:  [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">After working through problems w/ travel, cell phones, and so on, Peter Boncz of VectorWise finally caught up with me for a regrettably brief call. Peter gave me the strong impression that what <a href="http://www.dbms2.com/2009/08/04/vectorwise-ingres-and-monetdb/">I&#8217;d written in the past about VectorWise</a> had been and remained accurate, so I focused on filling in the gaps. Highlights included:  <span id="more-2261"></span></p>
<ul>
<li>VectorWise is indeed a 	shared-everything analytic DBMS.</li>
<li>The VectorWise front-end is 	Ingres. Ingres VectorWise supports almost all SQL that Ingres does (there 	are a few edge-case exceptions).</li>
<li>Conversely, Ingres VectorWise 	doesn&#8217;t support any SQL Ingres doesn&#8217;t, most notably SQL-99 	Analytics. Naturally, SQL-99 Analytics is a roadmap item for 	Ingres/VectorWise.</li>
<li>Ingres VectorWise 1.0 is pretty 	purely columnar. There&#8217;s a bit of <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/">PAX</a>, but it&#8217;s mainly 	automagic/under the covers. The one user-controlled exception I 	understood was that one can ensure that composite keys are stored 	together.</li>
<li>The main Ingres VectorWise 	performance secret sauce ingredients we touched on were:
<ul>
<li>Vectorization of operations (hence VectorWise&#8217;s name).</li>
<li>Compression that is tuned for 	speed rather than to minimize storage utilization.</li>
</ul>
</li>
<li>We unfortunately didn&#8217;t have time 	to revisit the other big part of the Ingres VectorWise performance 	story, namely clever design for modern microprocessor architectures. 	High-level generalities about that do pervade <a href="http://www.dbms2.com/2010/06/10/vectorwise-press-release/">the Ingres 	VectorWise press release</a>,<span style="font-style: normal;"> but – 	well, they&#8217;re very high level.</span></li>
<li>Unlike Vertica but like most other 	columnar DBMS vendors, Ingres VectorWise wants you to store your 	data once. You can index-organize the data. You can also organize 	multiple tables in the same order, to make joins among them fast.</li>
<li>Support for actual join indexes is an Ingres VectorWise roadmap item.</li>
<li>As do ever more analytic DBMS, 	Ingres VectorWise has something akin to <a href="http://www.dbms2.com/2006/09/20/netezza-vs-conventional-data-warehousing-rdbms/">Netezza zone maps</a>.</li>
<li>When I asked Peter what had changed most from the initial VectorWise development plan, other than the above, he basically said that their performance priorities had shifted a bit. Specifically, he said:
<ul>
<li>They had 	originally been “blinded” (his word) by the TPC-H benchmark, but 	figured out that they were overly focused on it. (<a href="http://www.dbms2.com/2009/06/22/the-tpc-h-benchmark-is-a-blight-upon-the-industry/">Well, duh</a>.)</li>
<li>They learned 	about the importance of other things such as data loading speeds.</li>
</ul>
</li>
</ul>
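<p style="margin-bottom: 0in;">For readers unfamiliar with the zone map idea mentioned above, here is a minimal Python sketch of the general technique as it is commonly described: keep the minimum and maximum value of a column for each storage block, and skip any block whose range cannot contain the value being sought. This is an assumption-laden illustration, not a description of how Netezza or Ingres VectorWise implements it; the function names and the list-of-lists block layout are hypothetical.</p>

```python
def build_zone_map(blocks):
    """Record the (min, max) of each non-empty block of column values."""
    return [(min(block), max(block)) for block in blocks]


def scan_equal(blocks, zone_map, target):
    """Return the matching values and the number of blocks actually read."""
    matches = []
    blocks_read = 0
    for block, (low, high) in zip(blocks, zone_map):
        if low > target or target > high:
            continue  # this block cannot contain the target, so skip it
        blocks_read += 1
        matches.extend(value for value in block if value == target)
    return matches, blocks_read
```

<p style="margin-bottom: 0in;">The payoff depends on how the data is ordered. If values are clustered, as with dates in load order, most blocks are skipped; if they are scattered at random, nearly every block still has to be read.</p>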
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/11/ingres-vectorwise-technical-highlights/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Story of an analytic DBMS evaluation</title>
		<link>http://www.dbms2.com/2010/04/16/story-of-an-analytic-dbms-evaluation/</link>
		<comments>http://www.dbms2.com/2010/04/16/story-of-an-analytic-dbms-evaluation/#comments</comments>
		<pubDate>Sat, 17 Apr 2010 02:56:56 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Buying processes]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1900</guid>
		<description><![CDATA[One of our readers was kind enough to walk me through his analytic DBMS evaluation process. The story is: The X Company (XCo) has a &#60;1 TB database. 100s of XCo&#8217;s customers log in at once to run reports. 50-200 concurrent queries is a good target number. XCo had been “suffering” with Oracle and wanted [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">One of our readers was kind enough to walk me through his analytic DBMS evaluation process. The story is:</p>
<ul>
<li>The X Company (XCo) has a &lt;1 TB 	database.</li>
<li>100s of XCo&#8217;s customers log in at 	once to run reports. 50-200 concurrent queries is a good target 	number.</li>
<li>XCo had been “suffering” with 	Oracle and wanted to upgrade.</li>
<li>XCo didn&#8217;t have a lot of money to 	spend. <strong>Netezza</strong> pulled out of the sales cycle early due to 	budget (and this was recently enough that Netezza <a href="http://www.dbms2.com/2010/01/25/netezza-skimmer/">Skimmer</a> could have been bid).</li>
<li><strong>Greenplum</strong> didn&#8217;t offer any 	references that approached the desired number of concurrent users.</li>
<li>Ultimately the evaluation came 	down to <strong>Vertica</strong> and <strong>ParAccel.</strong></li>
<li><strong>Vertica won.</strong></li>
</ul>
<p style="margin-bottom: 0in; font-weight: normal;">Notes on the Vertica vs. ParAccel selection include:<span id="more-1900"></span></p>
<ul>
<li>ParAccel sent an engineer on-site 	to do a proof-of-concept (POC), and generally competed very hard for 	the deal.</li>
<li>Vertica dropped by for a sales 	call once, and let XCo do the Vertica POC itself.</li>
<li>Not surprisingly, XCo got the 	impression that Vertica was easier to set up and administer than 	ParAccel.</li>
<li>Also, when ParAccel emphasized 	architectural features such as custom “backplane” and compiled 	queries, XCo got the impression – right or wrong – that 	ParAccel&#8217;s performance was more brittle or situational than 	Vertica&#8217;s.</li>
<li>ParAccel was modestly faster than 	Vertica in the POC. (I think &#8212; Vertica&#8217;s numbers were described as being &#8220;very competitive.&#8221;)</li>
<li>In multiple ways, Vertica gave the 	impression of greater product and vendor maturity than ParAccel.</li>
</ul>
<p style="margin-bottom: 0in;">My contact continues to be interested in all things Greenplum, and has recommended <a href="http://www.dbms2.com/2009/10/19/greenplum-free-single-node-edition/">Greenplum Single-Node Edition</a> to his analyst colleagues.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/04/16/story-of-an-analytic-dbms-evaluation/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

