<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Netezza</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/netezza/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 02 Sep 2010 09:06:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>DB2 workload management</title>
		<link>http://www.dbms2.com/2010/08/18/ibm-db2-workload-management/</link>
		<comments>http://www.dbms2.com/2010/08/18/ibm-db2-workload-management/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 08:47:09 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2819</guid>
		<description><![CDATA[DB2 has added a lot of workload management features in recent releases. So when we talked Tuesday afternoon, Tim Vincent and I didn&#8217;t bother going through every one. Even so, we covered some interesting subjects in the area of DB2 workload management, including:  

If your goal is to keep a certain 	class of queries from [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><a href="../2009/04/24/some-db2-highlights/">DB2 has added a lot of workload management features in recent releases</a>. So when we talked Tuesday afternoon, Tim Vincent and I didn&#8217;t bother going through every one. Even so, we covered some interesting subjects in the area of DB2 workload management, including:  <span id="more-2819"></span></p>
<ul>
<li>If your goal is to keep a certain 	class of queries from taking too many resources, Tim thinks a great 	way of doing that is to control how many of them are allowed to run 	concurrently.</li>
<li>By way of contrast, Tim is 	cautious about the common approach of just lowering a query&#8217;s 	priority. His concern is that a long-running query could linger even 	longer, creating a long-lasting bottleneck in, for example, <a href="http://www.dbms2.com/2010/08/18/more-on-temp-space-compression-and-random-io/" >temp 	space</a>.</li>
<li>When running over (I believe) 	Linux and AIX, DB2 workload management is integrated with operating 	system workload management. I.e., the same “service class” or 	“workload class” (at a guess, the former is the official term 	and the latter is the term that makes sense) of queries and 	associated processes gets the same treatment in both DB2 and the OS.</li>
<li>DB2&#8217;s workload management extends 	to buffer pools, to inhibit low-priority queries from evicting a 	higher-priority query&#8217;s data from cache.</li>
<li>Sometimes, workload management doesn&#8217;t throttle a query, but just decides to collect stats for future analysis. (This is on the eminently reasonable theory that the best stats to collect are the ones that are live when performance problems are actually occurring.)</li>
</ul>
<p style="margin-bottom: 0in;">Finally, Tim spoke of what I regard as the weirdest workload management requirement, one I also heard about from <a href="http://www.dbms2.com/2009/07/18/netezza-on-concurrency-and-workload-management/" >Netezza</a> <span style="font-style: normal;">(but didn&#8217;t explicitly mention) in</span> June. Sometimes, it seems, you simply don&#8217;t want queries to finish too fast. Why? Because if you give great performance when the machine is lightly loaded, then business users might expect that performance too when the machine is heavily loaded and you can&#8217;t deliver it. Apparently, in some environments it&#8217;s better to never deliver great query performance than it is to do so only inconsistently.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/18/ibm-db2-workload-management/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Netezza&#8217;s version of EnterpriseDB-based Oracle compatibility</title>
		<link>http://www.dbms2.com/2010/06/26/netezza-migrator/</link>
		<comments>http://www.dbms2.com/2010/06/26/netezza-migrator/#comments</comments>
		<pubDate>Sat, 26 Jun 2010 12:17:11 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Emulation, transparency, portability]]></category>
		<category><![CDATA[EnterpriseDB and Postgres Plus]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2397</guid>
		<description><![CDATA[EnterpriseDB has some deplorable business practices (my stories of being screwed by EnterpriseDB have been met by &#8220;Well, you&#8217;re hardly the only one&#8221;). But a couple of more successful DBMS vendors have happily partnered with EnterpriseDB even so, to help pick off Oracle users. IBM&#8217;s approach was in the vein of an EnterpriseDB-infused version of [...]]]></description>
			<content:encoded><![CDATA[<p>EnterpriseDB has some deplorable business practices (my stories of being screwed by EnterpriseDB have been met by &#8220;Well, you&#8217;re hardly the only one&#8221;). But a couple of more successful DBMS vendors have happily partnered with EnterpriseDB even so, to help pick off Oracle users. IBM&#8217;s approach was in the vein of an <a href="http://www.dbms2.com/2009/04/24/ibms-oracle-emulation-strategy-reconsidered/" >EnterpriseDB</a>-<a href="http://www.dbms2.com/2009/04/22/dbms-transparency-layers-never-seem-to-sell-well/" >infused</a> <a href="http://www.dbms2.com/2010/04/07/ibm-anti-oracle-announcements/" >version</a> of SQL handling within DB2.* Netezza just announced an EnterpriseDB-based Netezza Migrator that is rather different.</p>
<p><em>*The comment threads are the most informative parts of those posts.</em></p>
<p>I&#8217;m a little unclear as to the Netezza Migrator details, not least because Netezza folks don&#8217;t seem to care too much about Netezza Migrator themselves. That said, the core ideas of Netezza Migrator are:  <span id="more-2397"></span></p>
<ul>
<li>Netezza Migrator is an enhanced (?) version of EnterpriseDB&#8217;s Postgres Plus Advanced Server DBMS. (Recall that Postgres Plus is PostgreSQL-based and fairly <a href="http://www.dbms2.com/2008/07/07/enterprisedbf-oracle-compatibility/" >Oracle-compatible</a>.)</li>
<li>Netezza Migrator does not run on Netezza appliances, but rather on conventional computers off to the side.</li>
<li>Netezza Migrator generally farms out queries to Netezza appliances, but can also manage data itself. (That latter part could supposedly come in handy for small tables one might want to execute stored procedures against.)</li>
<li>Netezza Migrator does a better job of farming out queries (and also inserts/updates/loads) to Netezza appliances than an Oracle DBMS would. The two biggest examples of that are:
<ul>
<li>Oracle will farm out SELECTs, but not JOINs.</li>
<li>Oracle won&#8217;t invoke Netezza&#8217;s parallel/bulk load capabilities.</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/26/netezza-migrator/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>Flash is coming, well &#8230;</title>
		<link>http://www.dbms2.com/2010/06/25/flash-is-coming-well/</link>
		<comments>http://www.dbms2.com/2010/06/25/flash-is-coming-well/#comments</comments>
		<pubDate>Fri, 25 Jun 2010 16:42:26 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2389</guid>
		<description><![CDATA[I really, really wanted to title this post &#8220;Flash is coming in a flash.&#8221; That seems a little exaggerated &#8212; but only a little.

Netezza now intends to come out with a Flash-based appliance earlier than it originally expected.
Indeed, Netezza has suspended &#8212; by which I mean &#8220;scrapped&#8221; &#8212; prior plans for a RAM-heavy disk-based appliance. [...]]]></description>
			<content:encoded><![CDATA[<p>I really, really wanted to title this post &#8220;Flash is coming in a flash.&#8221; That seems a little exaggerated &#8212; but only a little.</p>
<ul>
<li>Netezza now intends to come out with a Flash-based appliance earlier than it originally expected.</li>
<li>Indeed, Netezza has suspended &#8212; by which I mean &#8220;scrapped&#8221; &#8212; prior plans for a RAM-heavy disk-based appliance. It will use a RAM/Flash combo instead.*</li>
<li>Tim Vincent of IBM told me that customers seem ready to adopt solid-state memory. One interesting comment he made is that Flash isn&#8217;t really all that much more expensive than high-end storage area networks.</li>
</ul>
<p>Uptake of solid-state memory (i.e. Flash) for analytic database processing will probably stay pretty low in 2010, but in 2011 it should be a notable (b)leading-edge technology, and it should get mainstreamed pretty quickly after that.  <span id="more-2389"></span></p>
<p><em>*So far as I can tell, that&#8217;s one of the two significant roadmap changes between the 2009 and 2010 editions of <a href="http://www.dbms2.com/2010/06/23/my-talk-this-morning/" >Enzee Universe</a>. The other one is that </em><em>the robust form of</em><em> appliance-to-appliance replication technology is coming out later than Netezza had originally planned and hoped.</em></p>
<p>There also is increasing reason to think that the issues with Flash memory wearing out are overwrought. (And by the way, the entire history of enterprise solid-state memory use is basically shorter than the time in which these products supposedly will wear out, so it&#8217;s not as if there have been a lot of real-life failures out there.)</p>
<ul>
<li>First, clever things are being done in the area of error correction codes, although for the most part I defer that part of the discussion to Petascan&#8217;s Camuel Gilyadov. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  E.g., this seems to be the idea behind Anobit.</li>
<li>Second, analytic DBMS are pretty much an ideal use case for Flash reliability. Suppose, as is the case for many products and implementations, you only write things in big blocks. Then you are, ipso facto, resetting the Flash bits only in big blocks. Thus, at least in theory, you automatically have pretty perfect wear leveling.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/25/flash-is-coming-well/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>My talk this morning</title>
		<link>http://www.dbms2.com/2010/06/23/my-talk-this-morning/</link>
		<comments>http://www.dbms2.com/2010/06/23/my-talk-this-morning/#comments</comments>
		<pubDate>Wed, 23 Jun 2010 11:33:18 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Presentations]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2378</guid>
		<description><![CDATA[Netezza&#8217;s Enzee Universe conference is now almost over, and I still haven&#8217;t figured out what my gig as &#8220;conference blogger&#8221; entails. More precisely, I&#8217;m operating from our unspoken fallback plan, namely &#8220;If all else fails, do what you&#8217;d do anyway, but do more of it.&#8221; For me to live up to that, all Netezza had [...]]]></description>
			<content:encoded><![CDATA[<p>Netezza&#8217;s Enzee Universe conference is now almost over, and I still haven&#8217;t figured out what my gig as <a href="http://www.dbms2.com/2010/06/21/notes-on-a-spate-of-netezza-related-blog-posts/" >&#8220;conference blogger&#8221;</a> entails. More precisely, I&#8217;m operating from our unspoken fallback plan, namely &#8220;If all else fails, do what you&#8217;d do anyway, but do more of it.&#8221; For me to live up to that, all Netezza had to do was find interesting things to write about &#8212; and as far as I&#8217;m concerned, they already did that last Thursday in spades; the five interesting meetings they set up for me with users and partners on Tuesday were just gravy.</p>
<p>Another part of the deal was that I&#8217;d give a talk this morning at 9:30 am. And when I give talks, I like to put up posts that cover whatever material I haven&#8217;t written up before, while also offering the talk&#8217;s listeners convenient links to material I have already covered at length.</p>
<p><span id="more-2378"></span>So anyway:</p>
<p>As I&#8217;ve been doing all year, I plan to start the talk with the subject of <strong>liberty and privacy.</strong> My most recent <a href="http://www.dbms2.com/2010/04/04/privacy-liberty-continued/" >overview post on privacy and liberty</a> has a bunch of links to what I and other people have said before.</p>
<p>This year&#8217;s Enzee Universe keynote speaker was Stephen Baker, author of <em>Numerati. </em>His talk, the only one all week I&#8217;ve attended in its entirety (I do intend for mine to be the second one, however <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  ) reminded me that some of my ideas had been inspired by his book, specifically the part about <a href="http://www.dbms2.com/2010/04/20/big-brother-watching-our-parents/" >sensors in our elderly relatives&#8217; homes tracking every movement</a>, for legitimate reasons of health care and physical safety.</p>
<p>One thing I&#8217;m reminded of when talking with users is that they tend to be a bit focused on their projects or areas, and almost never have the opportunity to consider the full range of possibilities open to them. So I&#8217;ve put in two slides to raise consciousness on the point.</p>
<ul>
<li>$ per user
<ul>
<li>$1000s or maybe $10,000s (perhaps a small team of analysts looking at Big Data)</li>
<li>$100s or maybe $1000s (perhaps conventional BI)</li>
<li>On the order of $.10 or $1 (<a href="http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/" >stakeholder-facing analytics</a>)</li>
</ul>
</li>
<li>Three benefits of better price/performance
<ul>
<li>Do the same thing, cheaper</li>
<li>Do the same thing, better</li>
<li>Do something different</li>
</ul>
</li>
</ul>
<p>Then I put together a list of &#8220;cool technologies in analytics&#8221; people might want to think about, including:</p>
<ul>
<li>Solid-state memory (there&#8217;s a whole section here about that; see the sidebar)</li>
<li><a href="http://www.dbms2.com/2010/04/12/greenplumchorus/" >Data mart spin-out</a></li>
<li>Exploratory BI (e.g., <a href="http://www.dbms2.com/2010/06/12/the-underlying-technology-of-qlikview/" >QlikView</a>)</li>
<li>Advanced analytics (platforms)
<ul>
<li>DBMS-centric (more on that coming soon, but meanwhile <a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/" >February&#8217;s posts</a> can be a placeholder)</li>
<li>MapReduce-centric (there&#8217;s a whole section here on that too)</li>
</ul>
</li>
<li>Advanced analytics (UIs and algorithms)
<ul>
<li>SQL tasting (I just coined that to talk about the useful idea of getting fast, partial results on long queries)</li>
<li>Stats/predictive (I&#8217;ve really got to build a blog category for that)</li>
<li><a href="http://www.dbms2.com/2010/06/19/objectivity-infinite-graph/" >Graph</a></li>
<li>Matrix/optimization</li>
</ul>
</li>
</ul>
<p>And finally, I listed three &#8220;aggravating analytic challenges&#8221; in areas where I&#8217;m disappointed with the progress of and/or prospects for technology, including:</p>
<ul>
<li><a href="http://www.monashreport.com/2006/10/05/dashboard-business-intelligence-bi-segmentation/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">KPI management</a></li>
<li>Text/tabular integration</li>
<li><a href="http://www.dbms2.com/2010/06/08/profile-of-revealed-preferences/" >Profile of Revealed Preferences aka “Social graph”</a></li>
</ul>
<p><em><strong>Posts on Netezza&#8217;s announcements around the time of Enzee Universe</strong></em></p>
<ul>
<li><a href="../2010/06/21/netezza-database-software-technology-overview/">A long discussion of Netezza’s 	technology, focusing on the database parts</a></li>
<li><a href="../2010/06/21/netezza-ibm-db2-compression/">A discussion of Netezza’s and 	IBM’s compression strategies</a></li>
<li><a href="../2010/06/21/netezza-silicon-balance/">Notes on how Netezza balances 	its silicon and uses its FPGAs</a></li>
<li><a href="../2010/06/21/data-warehouse-load-latency/">A quickie on data warehouse 	loading latency</a></li>
<li><a href="http://www.dbms2.com/2010/06/26/netezza-migrator/" >How Netezza Migrator works</a></li>
<li><a href="http://www.dbms2.com/2010/06/25/flash-is-coming-well/" >Netezza&#8217;s strategy for RAM and Flash</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/23/my-talk-this-morning/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>What kinds of data warehouse load latency are practical?</title>
		<link>http://www.dbms2.com/2010/06/21/data-warehouse-load-latency/</link>
		<comments>http://www.dbms2.com/2010/06/21/data-warehouse-load-latency/#comments</comments>
		<pubDate>Mon, 21 Jun 2010 12:15:17 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2319</guid>
		<description><![CDATA[I took advantage of my recent conversations with Netezza and IBM to discuss what kinds of data warehouse load latency were practical. In both cases I got the impression:

Subsecond load latency is 	substantially impossible. Doing that amounts to OLTP.
5 seconds or so is doable with 	aggressive investment and tuning.
Several minute load latency is 	pretty easy.
10-15 [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I took advantage of my recent conversations with <a href="http://www.dbms2.com/2010/06/21/netezza-database-software-technology-overview/" >Netezza</a> and <a href="http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/" >IBM</a> to discuss what kinds of data warehouse load latency were practical. In both cases I got the impression:</p>
<ul>
<li>Subsecond load latency is 	substantially impossible. Doing that amounts to OLTP.</li>
<li>5 seconds or so is doable with 	aggressive investment and tuning.</li>
<li>Several minute load latency is 	pretty easy.</li>
<li>10-15 minute latency or longer is 	now very routine.</li>
</ul>
<p style="margin-bottom: 0in;">There&#8217;s generally a throughput/latency tradeoff, so if you want very low latency with good throughput, you may have to throw a lot of hardware at the problem.</p>
<p style="margin-bottom: 0in;">I&#8217;d expect to hear similar things from any other vendor with reasonably mature analytic DBMS technology. Low-latency load is a problem for columnar systems, but both <a href="http://www.dbms2.com/2008/08/12/vertica-paraccel-exasol/" >Vertica <span style="font-style: normal;">and</span> ParAccel</a> designed in workarounds from the get-go. Aster Data probably didn&#8217;t meet these criteria until <a href="http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/" >Version 4.0</a>, its old “<a href="http://www.dbms2.com/2008/10/22/aster-data-systems-ncluster/" >frontline</a>” positioning notwithstanding, but I think it does now.</p>
<p style="margin-bottom: 0in;"><em><strong>Related link</strong></em></p>
<ul>
<li>
<p style="margin-bottom: 0in;"><a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >Just what is your need for speed</a> anyway?</p>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/21/data-warehouse-load-latency/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Netezza and IBM DB2 approaches to compression</title>
		<link>http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/</link>
		<comments>http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/#comments</comments>
		<pubDate>Mon, 21 Jun 2010 12:05:47 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2320</guid>
		<description><![CDATA[Thursday, I spent 3 ½ hours talking with 10 of Netezza&#8217;s more senior engineers. Friday, I talked for 1 ½ hours with IBM Fellow and DB2 Chief Architect Tim Vincent, and we agreed we needed at least 2 hours more. In both cases, the compression part of the discussion seems like a good candidate to [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Thursday, <a href="http://www.dbms2.com/2010/06/21/netezza-database-software-technology-overview/" >I spent 3 ½ hours talking with 10 of Netezza&#8217;s more senior engineers</a>. Friday, I talked for 1 ½ hours with IBM Fellow and DB2 Chief Architect Tim Vincent, and we agreed we needed at least 2 hours more. In both cases, the compression part of the discussion seems like a good candidate to split out into a separate post. So here goes.</p>
<p style="margin-bottom: 0in;">When you sell a row-based DBMS, as Netezza and IBM do, there are a couple of approaches you can take to compression. First, you can compress the blocks of rows that your DBMS naturally stores. Second, you can compress the data in a column-aware way. Both Netezza and IBM have chosen completely column-oriented compression, with no block-based techniques entering the picture to my knowledge. But that&#8217;s about as far as the similarity between Netezza and IBM compression goes.  <span id="more-2320"></span></p>
<p style="margin-bottom: 0in;"><strong>IBM&#8217;s basic DB2 compression strategy</strong> is remarkably simple. In every table (not column) – or in each range partition in a range-partitioned table &#8212; <strong>the 4096 most common* values are identified; these are all encoded into 12-bit strings</strong>. And that&#8217;s that. This has been happening since DB2 9.1, released 4 ½ years ago. DB2&#8217;s compression persists through logs, buffer pools (i.e., RAM cache), and so on. In DB2 9.7, the most recent release, IBM extended the use of the compression to a few areas it hadn&#8217;t stretched before, such as log-based replication, native XML, or CLOBs (Character Large OBjects) that happen not to be too big.</p>
<p style="margin-bottom: 0in;"><em>*Actually, I&#8217;d presume it&#8217;s not exactly the “most common”; there surely is some minimum length of a value to be encoded, or some bias toward length. Also, the determination of what to encode is probably a little imprecise. E.g., I forgot to ask whether the choice of values ever changes as data gets updated.</em></p>
<p style="margin-bottom: 0in;">The sophisticated part of DB2&#8217;s simple compression strategy is its breadth of applicability; DB2 compression can apply to:</p>
<ul>
<li>Values in columns (numeric, 	character, whatever)</li>
<li>Substrings of values in columns</li>
<li>Groups of columns (e.g., 	city/state/zip code)</li>
</ul>
<p style="margin-bottom: 0in;">Except for the 4096 values limit, that sounds at least as flexible as the <a href="http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/" >Rainstor/Clearpace compression approach</a>.</p>
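<p style="margin-bottom: 0in;">To make the 4096-value, 12-bit idea concrete, here is a minimal Python sketch of a table-wide dictionary. It is my own illustration, not IBM&#8217;s implementation: the function names, the tagged <code>("code", n)</code> / <code>("literal", value)</code> representation, and the choice to rank purely by frequency are all assumptions, and it ignores substrings and column groups entirely.</p>

```python
from collections import Counter

DICT_SIZE = 4096  # 2**12 entries, so every code fits in 12 bits


def build_dictionary(rows):
    """Pick the most common values across the whole table (not per column)."""
    counts = Counter(value for row in rows for value in row)
    most_common = [value for value, _ in counts.most_common(DICT_SIZE)]
    return {value: code for code, value in enumerate(most_common)}


def encode_row(row, dictionary):
    """Replace dictionary hits with ("code", n); keep other values as literals."""
    return [
        ("code", dictionary[value]) if value in dictionary else ("literal", value)
        for value in row
    ]


def decode_row(encoded, dictionary):
    """Invert encode_row using the reverse mapping (hypothetical helper)."""
    reverse = {code: value for value, code in dictionary.items()}
    return [reverse[payload] if kind == "code" else payload for kind, payload in encoded]
```

<p style="margin-bottom: 0in;">A real engine would presumably pack the codes into 12-bit fields on the page rather than keep Python tuples; the sketch only shows the lookup in each direction.</p>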
<p style="margin-bottom: 0in;"><strong>Netezza,</strong> unlike IBM, takes a grab-bag approach to compression – try out a bunch of techniques, see which work best, and incorporate those in the product. <a href="http://www.enzeecommunity.com/blogs/nzblog/2008/05/15/issue-19-the-compress-engine-the-netezza-philosophy" onclick="javascript:pageTracker._trackPageview('/www.enzeecommunity.com');">Netezza first introduced compression a couple of years ago,</a> for numeric columns only, especially integer.  Techniques  used in Netezza numeric compression include but are not limited to:</p>
<ul>
<li>Delta compression, wherein you 	store the increment between a value and its predecessor rather than 	a whole new value.</li>
<li>Ways of indicating that a value or 	increment was just the same as in the row before.</li>
</ul>
<p style="margin-bottom: 0in;">This was via something called Compress Engine,* now being renamed to Compress Engine 1. Netezza&#8217;s new Compress Engine 2 improves on what Netezza did in Compress Engine 1 for numeric data, most notably by trimming away excess field length. (Netezza says it got 28% better compression on a test data set with almost no character strings, primarily from that enhancement.) Further, Netezza Compress Engine 2 adds new compression techniques, allowing it to handle VARCHAR – i.e. character strings &#8212; as well.</p>
<p style="margin-bottom: 0in;"><em>*Fortunately, the original name or at least description of “Compiled Tables” is retreating ever more from view.</em></p>
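<p style="margin-bottom: 0in;">The two numeric techniques listed above can be sketched in a few lines of Python. This is an illustration under my own assumptions, not Netezza&#8217;s code: the function names are made up, the first value is treated as an increment from zero, and &#8220;same as the row before&#8221; is modeled as a simple run-length pass over the increments.</p>

```python
def delta_encode(values):
    """Store each value as its increment over the previous one (first is vs. 0)."""
    previous = 0
    deltas = []
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    running = 0
    values = []
    for delta in deltas:
        running += delta
        values.append(running)
    return values


def run_length(deltas):
    """Collapse repeated increments into [increment, count] pairs."""
    runs = []
    for delta in deltas:
        if runs and runs[-1][0] == delta:
            runs[-1][1] += 1
        else:
            runs.append([delta, 1])
    return runs
```

<p style="margin-bottom: 0in;">On a sorted or sequential integer column, the increments are small and repetitive, which is why the approach pays off there.</p>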
<p style="margin-bottom: 0in;">Netezza&#8217;s Compress Engine 2 has two ways to compress character fields/text strings – <strong>prefix compression </strong><span style="font-weight: normal;">and </span><strong>Huffman coding.</strong> By way of contrast, Netezza tested suffix compression and decided it wasn&#8217;t beneficial enough to bother messing with.</p>
<ul>
<li>The idea behind prefix compression 	is that if two strings start with the same characters, for the 	second one you only have to record the part that&#8217;s different. Prefix 	compression has a lot of the same merits as delta compression; like 	delta compression, it works best on sorted columns. (An example of 	where prefix compression makes obvious sense is URLs, which tend to 	all start in similar ways.)</li>
<li>In Netezza&#8217;s version of Huffman coding, the alphabet is encoded symbol-by-symbol, with more common characters getting codes of shorter length. These codes are chosen on a column-by-column basis. (I presume the “/” character gets a shorter code in a URL column than it would, for example, in one that stored addresses.)</li>
</ul>
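<p style="margin-bottom: 0in;">For readers who want to see the two character techniques in miniature, here is a hedged Python sketch. It is textbook prefix compression and textbook Huffman code-length assignment, not Netezza&#8217;s FPGA logic; every function name is hypothetical, and the Huffman part only reports how many bits each character would get in a given column.</p>

```python
import heapq
import os
from collections import Counter


def prefix_encode(sorted_strings):
    """Store (shared_prefix_length, remaining_suffix) relative to the previous string."""
    encoded = []
    previous = ""
    for s in sorted_strings:
        shared = len(os.path.commonprefix([previous, s]))
        encoded.append((shared, s[shared:]))
        previous = s
    return encoded


def prefix_decode(encoded):
    """Rebuild each string from the previous one plus its stored suffix."""
    decoded = []
    previous = ""
    for shared, suffix in encoded:
        previous = previous[:shared] + suffix
        decoded.append(previous)
    return decoded


def huffman_code_lengths(column):
    """Return {character: code length in bits} for one column's values."""
    counts = Counter("".join(column))
    if len(counts) in (0, 1):
        return {symbol: 1 for symbol in counts}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {symbol: 0}) for i, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) != 1:
        count_a, _, depths_a = heapq.heappop(heap)
        count_b, _, depths_b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {symbol: depth + 1 for symbol, depth in {**depths_a, **depths_b}.items()}
        heapq.heappush(heap, (count_a + count_b, tie, merged))
        tie += 1
    return heap[0][2]
```

<p style="margin-bottom: 0in;">Running <code>huffman_code_lengths</code> separately on a URL column and an address column is one way to check the presumption about “/”: the character that is more frequent in a column ends up with a code no longer than that of a rarer one.</p>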
<p style="margin-bottom: 0in;">While I didn&#8217;t ask explicitly, it seems pretty obvious that Compress Engine 2&#8217;s functionality is a strict superset of Compress Engine 1&#8217;s. <a href="http://www.dbms2.com/2010/06/21/netezza-silicon-balance/" >Netezza is going to run Compress Engines 1 and 2 side by side</a>, but expects pages to move from Compress Engine 1&#8217;s purview to Compress Engine 2&#8217;s as part of the new “table grooming” process.</p>
<p><em><strong>Related links</strong></em></p>
<ul>
<li>IBM kindly permitted me to post some of <a href="http://www.monash.com/uploads/ibm-db2-compression-june-2010.pdf" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">its slides in the area of compression</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/cc280464.aspx" onclick="javascript:pageTracker._trackPageview('/msdn.microsoft.com');">Microsoft SQL Server seems to rely on prefix and dictionary compression</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Netezza&#8217;s silicon balance</title>
		<link>http://www.dbms2.com/2010/06/21/netezza-silicon-balance/</link>
		<comments>http://www.dbms2.com/2010/06/21/netezza-silicon-balance/#comments</comments>
		<pubDate>Mon, 21 Jun 2010 12:00:12 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2321</guid>
		<description><![CDATA[As I&#8217;ve mentioned in a couple of other posts, Netezza is stressing that the most recent wave of its technology is software-only, with no hardware upgrades made or needed. In other words, Netezza boxes already have all the silicon they need. But of course, there are really at least three major aspects to the Netezza [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">As I&#8217;ve mentioned in a couple of other posts, Netezza is stressing that the most recent <a href="http://www.dbms2.com/2010/06/21/netezza-database-software-technology-overview/" >wave</a> of its technology is software-only, with no hardware upgrades made or needed. In other words, Netezza boxes already have all the silicon they need. But of course, there are really at least three major aspects to the Netezza silicon story – FPGA (Field-Programmable Gate Array), CPU, and RAM.</p>
<ul>
<li>Netezza planned to be “generous” 	in its original TwinFin FPGA capacity, anticipating software 	upgrades like the ones it&#8217;s introducing now. It is satisfied that 	this strategy worked. More on this below.</li>
<li>The same surely applies to CPU.</li>
<li>What&#8217;s more, I get the sense that 	the CPU turned out in practice to be even more over-provisioned than 	they anticipated …</li>
<li>… at least when one just 	considers Netezza&#8217;s base NPS software.</li>
<li>However, I suspect that if the 	advanced analytics capability takes off, Netezza will determine that 	more CPU is always better.</li>
<li>And by the way, NEC is making 	versions of Netezza appliances with more advanced chips than Netezza 	is. So if anybody should really, really need more CPU in their 	Netezza boxes, there&#8217;s a very straightforward way to make that 	happen. (And if there were nontrivial demand for that, appropriate 	support plans could surely be structured.)</li>
<li>Everybody needs to be careful 	about RAM. Netezza is surely no exception.</li>
</ul>
<p style="margin-bottom: 0in;">The major parts of Netezza&#8217;s FPGA software are:</p>
<ul>
<li><strong>Compress Engine 2.</strong> This is 	Netezza&#8217;s new way of doing compression.</li>
<li><strong>Compress Engine 1.</strong> This is 	Netezza&#8217;s old way of doing compression. It is being kept around so 	that existing Netezza tables don&#8217;t suddenly have to be changed or 	reloaded.</li>
<li><strong>Project Engine.</strong> Guess what 	this does.</li>
<li><strong>Restrict Engine.</strong> Ditto.</li>
<li><strong>Visibility Engine.</strong> This <a href="http://www.dbms2.com/2006/09/27/logless-lockless-netezza-more-carefully-explained/" >enforces ACID</a> and handles row-level security. It is “sort of a corner of” the Restrict Engine. (Actually, Netezza seems to waver as to whether to describe “Restrict” and “Visibility” as being two engines or one.)</li>
<li>Miscellaneous plumbing.</li>
</ul>
<p style="margin-bottom: 0in;">
<p style="margin-bottom: 0in;">If I understood correctly, each Netezza FPGA has two of each engine, running in parallel.</p>
<p style="margin-bottom: 0in;"><em><strong>Related link</strong></em></p>
<ul>
<li>An August, 2009 post on <a href="http://www.dbms2.com/2009/08/08/netezza-fpga/" >what Netezza does in its FPGA</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/21/netezza-silicon-balance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A partial overview of Netezza database software technology</title>
		<link>http://www.dbms2.com/2010/06/21/netezza-database-software-technology-overview/</link>
		<comments>http://www.dbms2.com/2010/06/21/netezza-database-software-technology-overview/#comments</comments>
		<pubDate>Mon, 21 Jun 2010 11:57:35 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2322</guid>
		<description><![CDATA[Netezza is having its user conference Enzee Universe in Boston  Monday–Wednesday, June 21-23, and naturally will be announcing new products there, and otherwise providing hooks and inducements to get itself written about. (The preliminary count is seven press releases in all.) To get a head start, I stopped by Netezza Thursday for meetings that [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Netezza is having its user conference Enzee Universe in Boston  Monday–Wednesday, June 21-23, and naturally will be announcing new products there, and otherwise providing <a href="http://www.dbms2.com/2010/06/21/notes-on-a-spate-of-netezza-related-blog-posts/" >hooks and inducements to get itself written about</a>. (The preliminary count is seven press releases in all.) To get a head start, I stopped by Netezza Thursday for meetings that included a 3 ½ hour session with 10 or so senior engineers, and have exchanged some clarifying emails since.  <span id="more-2322"></span></p>
<p style="margin-bottom: 0in;">It might be best to start with some Netezza product introduction and naming housekeeping:</p>
<ul>
<li>Netezza isn&#8217;t changing the 	hardware on any of its existing systems at this time. Rather, 	Netezza&#8217;s product upgrades are contained in <strong>a software-only </strong><span style="font-weight: normal;">release &#8230;</span></li>
<li>… except 	that it isn&#8217;t a “release,” but rather a <strong>“wave.”</strong> There 	are three points to that terminological distinction:
<ul>
<li>The advanced 	analytics part doesn&#8217;t depend on the new database platform software.</li>
<li>Individual 	functions in the advanced analytics part don&#8217;t necessarily depend on 	advances in the analytics platform.</li>
<li>It plays on 	the surfboard-centric naming of Netezza&#8217;s appliances. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ul>
</li>
<li>Netezza has wisely scrapped <a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/" >its prior plan to make its advanced-analytics capabilities be a chargeable add-on to its core appliance products</a>. Rather, Netezza is going to offer <strong>advanced analytics as part of its core product.</strong> Part of the reason is that the interest in these capabilities is broader than Netezza first anticipated. The name for this is something like <strong>i-Class advanced analytics capabilities.</strong></li>
<li>There is a “relea<span style="font-weight: normal;">se” 	in all this too, namely </span><strong>NPS 6.0</strong><span style="font-weight: normal;"> (Netezza Performance Software). That&#8217;s the core DBMS technology. </span></li>
<li><span style="font-weight: normal;">It&#8217;s 	all to be shipped in Q3.</span></li>
</ul>
<p style="margin-bottom: 0in; font-weight: normal;">Highlights of our NPS 6.0 conversation include:</p>
<ul>
<li>As promised, Netezza has improved 	its <strong>compression</strong> significantly. Because this was anticipated, 	this upgrade was planned for in the design of <a href="http://www.dbms2.com/2009/07/30/netezza-new-product-family/" >the systems Netezza 	started introducing last summer</a>. Consequently, the reduction in 	I/O produced by compression translates almost directly into better 	performance – the silicon is now more fully loaded than it was 	before, but few if any actual silicon bottlenecks have been 	introduced by the I/O improvement.</li>
<li>Netezza&#8217;s other big performance 	enhancement is the introduction of <strong>clustered base tables, </strong><span style="font-weight: normal;">which 	it says can reduce I/O by an order of magnitude or better.</span></li>
<li>Netezza says that there are 	individual queries in which the enhancements take query performance 	up 30-40X. (Presumably, those would be ones for which clustered base 	tables are a big win.)</li>
<li>More interestingly, Netezza says 	that <strong>overall performance is improved by &gt;2X.</strong> That&#8217;s 	queries, load, backup, and everything else all blended together.</li>
<li>Underpinning 	all this, Netezza went from 125 MHz to a blend of 125 and 250 MHz in 	its FPGA clock speeds. Also, the width of the FPGA onboard data path 	went from 16 to 32 bits. Netezza suggests that the naive calculation 	which says this could increase FPGA throughput 4X isn&#8217;t entirely 	misleading.</li>
<li>Netezza is 	pretty content with its <strong>workload management</strong> capabilities for 	queries, but nonetheless keeps adding features. <a href="http://www.dbms2.com/2009/07/18/netezza-on-concurrency-and-workload-management/" >Workload management</a> has not yet been extended to cover all the non-query parts of the 	analytic functionality.</li>
<li>Netezza 	continues to enhance its <strong>cost-based optimizer and query planner.</strong></li>
<li><span style="font-weight: normal;">Netezza 	has long used an </span><strong>internal networking</strong><span style="font-weight: normal;"> approach that&#8217;s rather different from TCP/IP. Netezza views TCP/IP&#8217;s 	strength as recovering gracefully if there&#8217;s congestion. However, 	Netezza would rather do whatever it takes to preclude congestion in 	the first place, except perhaps in rare edge cases. I&#8217;m not aware of 	what enhancements, if any, have been made to Netezza&#8217;s internal 	networking specifically in NPS 6.0.</span></li>
</ul>
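<p style="margin-bottom: 0in;">The “naive calculation” about FPGA throughput is just clock rate times data-path width. Here is a minimal sketch of that arithmetic, using the figures quoted above. The one-word-per-cycle model and the function name are my assumptions, not anything Netezza described, and since the new clock speeds are a blend of 125 and 250 MHz, the result is best read as an upper bound.</p>

```python
# Naive throughput model: one data-path word moves per clock cycle.
# This model is an illustrative assumption, not Netezza's own description.

def naive_throughput_bits_per_sec(clock_mhz, width_bits):
    """Bits per second if one word crosses the data path every cycle."""
    return clock_mhz * 1_000_000 * width_bits

old = naive_throughput_bits_per_sec(125, 16)   # previous clock and width
new = naive_throughput_bits_per_sec(250, 32)   # best case: everything at 250 MHz
speedup = new / old                            # 2x clock times 2x width

print(speedup)
```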
<p style="margin-bottom: 0in;"><span style="font-style: normal;">The basic idea of clustered base tables (“base tables” are ones that are not, for example, materialized views) is to </span><span style="font-style: normal;"><strong>range partition in multiple dimensions at once.</strong></span><span style="font-style: normal;"> Then you rule out (as in don&#8217;t retrieve) all those blocks that fail a match in any one of the cluster dimensions.</span><span style="font-style: normal;"><span style="font-weight: normal;"> Netezza says its customers were doing a lot of work to simulate this benefit by multiple sorts; Netezza&#8217;s implementation will now handle that much more automatically. Netezza says that talking to customers revealed that 4-5 cluster dimensions was almost always the most somebody would need; they will ship support for 4. That makes sense. In most cases, you&#8217;d want to cluster on the answers to “W” questions – Who, What, Where, When (but probably not Why), in one dimension each. However, Netezza does call out as an ideal use case geospatial, precisely because 2 (or more rarely 3) dimensions each have “equal weight.”</span></span></p>
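<p style="margin-bottom: 0in;">To make the “rule out blocks” idea concrete, here is a minimal sketch of multi-dimensional block pruning. It assumes each block keeps a (min, max) range per cluster dimension. The <code>Block</code> class, the dimension names, and the sample ranges are all hypothetical, and this is not Netezza's actual data layout.</p>

```python
from dataclasses import dataclass

@dataclass
class Block:
    block_id: int
    ranges: dict  # dimension name -> (min, max) of the values stored in the block

def may_match(block, predicate):
    """predicate maps dimension -> (lo, hi), inclusive.

    A block is skipped as soon as it fails to overlap the predicate
    in any one constrained dimension.
    """
    for dim, (lo, hi) in predicate.items():
        bmin, bmax = block.ranges[dim]
        if bmax < lo or bmin > hi:
            return False
    return True

# Four hypothetical blocks, clustered on two dimensions.
blocks = [
    Block(0, {"customer": (1, 100), "day": (1, 10)}),
    Block(1, {"customer": (1, 100), "day": (11, 20)}),
    Block(2, {"customer": (101, 200), "day": (1, 10)}),
    Block(3, {"customer": (101, 200), "day": (11, 20)}),
]

query = {"customer": (150, 160), "day": (12, 14)}
to_read = [b.block_id for b in blocks if may_match(b, query)]
print(to_read)
```

<p style="margin-bottom: 0in;">With data sorted on only one of the two dimensions, the same query would typically have to read two of the four blocks rather than one. That is the I/O saving in miniature.</p>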
<p style="margin-bottom: 0in;"><span style="font-style: normal;"><span style="font-weight: normal;">I don&#8217;t know how other vendors implement clustered base tables, but in Netezza&#8217;s case it&#8217;s via a </span></span><span style="font-style: normal;"><strong>space-filling curve.</strong></span><span style="font-style: normal;"><span style="font-weight: normal;"> (Actually, they called it a “Hilbert space-filling curve,” but I oppose that phrasing, as it&#8217;s apt to lead to extremely incorrect use of the term “Hilbert space.”) I.e., data is mapped to 4-tuples (say) in line with the dimensions, which are then sorted in a linear order in line with a space-filling curve. Happily, Netezza hasn&#8217;t experienced problems clustering columns that have particularly challenging cardinality or skew.</span></span></p>
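<p style="margin-bottom: 0in;">As an illustration of sorting tuples along a space-filling curve, here is a short sketch. Note the swap: it uses a Morton (Z-order) curve, which is a simpler relative of the Hilbert curve Netezza named, and the function name, bit width, and sample rows are my own assumptions. The principle is the same either way: rows that are close in all dimensions tend to end up close in the linear order.</p>

```python
def morton_key(coords, bits=8):
    """Interleave the bits of each coordinate, most significant bit first.

    This is a Z-order (Morton) key, used here as a stand-in for a Hilbert
    curve. Coordinates must be non-negative integers below 2**bits.
    """
    key = 0
    for bit in range(bits - 1, -1, -1):
        for c in coords:
            key = (key << 1) | ((c >> bit) & 1)
    return key

# Hypothetical rows already mapped to 4-tuples (who, what, where, when).
rows = [(3, 1, 2, 0), (0, 0, 0, 1), (3, 1, 2, 1), (0, 0, 1, 0)]
clustered = sorted(rows, key=morton_key)

for row in clustered:
    print(row, morton_key(row))
```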
<p style="margin-bottom: 0in;">If I understood correctly, you can only zone map (and presumably cluster) on integers and dates right now, but that will change soon. <em>(Edit: In blog comments and email, Tim Greenwood of Netezza explained to me that the NPS 6.0 workarounds to that were much more robust than I realized.)</em></p>
<p style="margin-bottom: 0in;">Netezza put a lot of work for NPS 6 into something it calls <strong>“table grooming,”</strong> which amounts to recopying tables in more beneficial form. Uses for table grooming – which is a manually initiated process – include but probably aren&#8217;t limited to:</p>
<ul>
<li>Clustering tables and, as needed, 	reclustering them.</li>
<li>Getting rid of data that was 	deleted. (Netezza has Postgres-style multiversion concurrency 	control – MVCC – but no time-travel, so keeping around deleted 	data is a waste of space.)</li>
<li>Recompressing data from 	Compress Engine 1 to Compress Engine 2.</li>
<li>Alter Table</li>
</ul>
<p style="margin-bottom: 0in;">The core ideas of table grooming include:</p>
<ul>
<li>The Netezza NPS software copies 	rows from one place to another.</li>
<li>Netezza NPS then updates the 	appropriate metadata.</li>
<li>Metadata updates are 	transactional, even though the actual data movement is not.</li>
</ul>
<p style="margin-bottom: 0in;">This can be done part of a table at a time. Reads and loads are unaffected by the process, or at least not blocked. Delete commits are indeed blocked during a reorg, but Netezza guesses that the blocks hold for a few minutes during the grooming of a clustered base table, 10-15 seconds if space is being reclaimed, and something similar for an Alter Table.</p>
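<p style="margin-bottom: 0in;">The copy-then-update-metadata pattern can be sketched roughly as follows. This is a toy in-memory model under my own assumptions: the <code>Table</code> class, the notion of numbered “extents,” and the lock standing in for a metadata transaction are all hypothetical, and it grooms a whole table at once rather than part of a table at a time.</p>

```python
import threading

class Table:
    def __init__(self, rows):
        # Each "extent" stands in for a stored copy of the rows;
        # live_extent is the metadata saying which copy readers should use.
        self.extents = {0: list(rows)}
        self.live_extent = 0
        self._meta_lock = threading.Lock()

    def scan(self):
        """Readers follow the metadata, so they never see a half-built copy."""
        return list(self.extents[self.live_extent])

    def groom(self, keep, sort_key):
        # Step 1: copy surviving rows into a new extent, in the desired order.
        # This data movement is not transactional; readers still use the old extent.
        new_id = max(self.extents) + 1
        self.extents[new_id] = sorted(
            (r for r in self.scan() if keep(r)), key=sort_key
        )
        # Step 2: switch the metadata in one atomic step.
        with self._meta_lock:
            old_id, self.live_extent = self.live_extent, new_id
        # Step 3: reclaim the space held by the old copy.
        del self.extents[old_id]

# Rows are (value, deleted_flag); grooming drops deleted rows and re-sorts.
table = Table([(3, False), (1, True), (2, False)])
table.groom(keep=lambda r: not r[1], sort_key=lambda r: r[0])
groomed = table.scan()
print(groomed)
```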
<p style="margin-bottom: 0in;">And finally, here are some notes on Netezza&#8217;s query optimization and planning.</p>
<ul>
<li>Netezza has a traditional 	<strong>cost-based optimizer,</strong> in which all operations have estimated 	costs, measured in microseconds, irrespective of which parts of the 	system (CPU, I/O, network, whatever) they most stress. (I have 	trouble imagining how a cost-based optimizer could work differently 	from that without incurring huge computational costs.)</li>
<li><a href="http://www.dbms2.com/2010/06/21/netezza-silicon-balance/" >Netezza&#8217;s bottleneck is almost 	always disk I/O</a>.</li>
<li>Netezza&#8217;s optimizer is not/no 	longer based on the PostgreSQL optimizer.</li>
<li>Netezza does a lot of <strong>query 	transformation.</strong> Key points include:
<ul>
<li>Netezza joins are usually very 	cheap.</li>
<li>Filtered scans are cheap too.</li>
<li>More expensive in Netezza are data 	redistribution (duh), sorts, and unfiltered scans.</li>
<li>Most expensive of all are 	intermediate result sets that don&#8217;t fit into memory.</li>
</ul>
</li>
<li>Specific examples of Netezza query 	transformation include:
<ul>
<li>Pushing predicates out to nodes.</li>
<li>Flattening query trees and 	eliminating subqueries.</li>
<li>Rewriting windowed aggregates to 	be joins + grouped aggregates.</li>
<li>(New in 6.0) Transforming outer 	joins into other kinds.</li>
</ul>
</li>
<li>Netezza does real-time sampling to 	help with query planning. (But this is only worth doing for queries 	that are estimated to be expensive.) Zone maps (and clustering too?) 	are invoked as part of deciding where to sample. Sampling was for 	scans only prior to NPS 6.0, and will now be done for joins as well.</li>
</ul>
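<p style="margin-bottom: 0in;">For readers who haven't looked inside a cost-based optimizer, here is a minimal sketch of the single-unit idea: every operation gets an estimated cost in microseconds, whichever resource it stresses, and the plan with the lowest total wins. The plan names and cost figures are invented for illustration and are not Netezza's numbers.</p>

```python
# Hypothetical candidate plans. Each step is (operation, estimated cost in
# microseconds), regardless of whether it mostly stresses CPU, I/O, or network.
plans = {
    "redistribute_then_join": [
        ("filtered_scan", 2_000),
        ("redistribute", 50_000),
        ("join", 1_000),
    ],
    "colocated_join": [
        ("filtered_scan", 2_000),
        ("filtered_scan", 3_000),
        ("join", 1_000),
    ],
}

def plan_cost(plan):
    """Total estimated cost of a plan: the sum of its per-operation estimates."""
    return sum(cost_us for _op, cost_us in plan)

best = min(plans, key=lambda name: plan_cost(plans[name]))
print(best, plan_cost(plans[best]))
```

<p style="margin-bottom: 0in;">Because everything is expressed in one unit, comparing plans is a simple sum and minimum. The invented figures simply echo the ordering above, with data redistribution costing far more than joins or filtered scans.</p>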
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2010/06/21/notes-on-a-spate-of-netezza-related-blog-posts/" >Notes on this week&#8217;s spate of Netezza-related blog posts</a></li>
<li><a href="http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/" >How Netezza (and IBM) do database compression</a></li>
<li><a href="http://www.dbms2.com/2010/06/21/netezza-silicon-balance/" >Netezza&#8217;s silicon balance</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/21/netezza-database-software-technology-overview/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Notes on a spate of Netezza-related blog posts</title>
		<link>http://www.dbms2.com/2010/06/21/notes-on-a-spate-of-netezza-related-blog-posts/</link>
		<comments>http://www.dbms2.com/2010/06/21/notes-on-a-spate-of-netezza-related-blog-posts/#comments</comments>
		<pubDate>Mon, 21 Jun 2010 11:55:33 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Presentations]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2318</guid>
		<description><![CDATA[Fearing that last year&#8217;s tight travel budgets would hamper attendance, Netezza – like a number of other vendors – decided to forgo a traditional user conference. Instead, it took its Enzee Universe show on the road, essentially spreading the conference across eight cities. I was asked to keynote six of the installments.
After the first one, [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Fearing that last year&#8217;s tight travel budgets would hamper attendance, Netezza – like a number of other vendors – decided to forgo a traditional user conference. Instead, it took its Enzee Universe show on the road, essentially spreading the conference across eight cities. I was asked to <a href="http://www.dbms2.com/2009/07/30/netezza-enzee-universe/" >keynote</a> six of the installments.</p>
<p style="margin-bottom: 0in;">After the first one, Netezza Marketing VP Tim Young took me aside for two pieces of constructive criticism. The surprising one* was that he felt I had been INSUFFICIENTLY critical of Netezza. Since then, every other conversation we&#8217;ve had about content creation has also featured ringing reassurances that Tim truly wants independent, non-pandering work.</p>
<p style="margin-bottom: 0in;"><em>*The unsurprising one was that I&#8217;d rushed. Well, duh. After months of telling me I had a 1 hour slot, Netezza cut me to ½ hour a few days beforehand. And my talk had been designed to be high-speed even in the longer time slot … </em></p>
<p style="margin-bottom: 0in;">As a result, I accepted a subsequent gig from Netezza that I would barely consider from most other vendors. Namely, for this year&#8217;s Enzee Universe – <a href="http://www.netezza.com/userconference/agenda.html" onclick="javascript:pageTracker._trackPageview('/www.netezza.com');">June 21-23, aka Monday-Wednesday of this week, at the Westin Waterfront Hotel in Boston</a> – I would do some contemporaneous blogging. The parameters we agreed on included:  <span id="more-2318"></span></p>
<ul>
<li>I would just blog here on <a href="http://www.dbms2.com" >DBMS2</a>, with 	Netezza allowed to reuse posts in their entirety on its site(s).</li>
<li>I also would give a talk on the 	conference&#8217;s last day.</li>
<li>I wouldn&#8217;t say much about 	conference sessions, because:
<ul>
<li>I&#8217;m not a session-attending kind 	of guy. (I wasn&#8217;t particularly good at sitting still in class in 8<sup>th</sup> grade. I haven&#8217;t gotten much better since. And <a href="http://www.strategicmessaging.com/powerpoints/2008/02/02/" onclick="javascript:pageTracker._trackPageview('/www.strategicmessaging.com');">I have a huge 	aversion to other people&#8217;s uninterruptible PowerPoints</a>.)</li>
<li>I think Netezza&#8217;s sessions are 	just as hype-filled as anybody else&#8217;s. (Much as I enjoyed traveling 	around the world with Netezza last year, it was painful hearing Jim 	Baum claim in city after city that Netezza boasts a 50X performance 	advantage vs. the competition.)</li>
</ul>
</li>
<li>Rather, I&#8217;d base things much more 	on individual conversations and meetings.</li>
<li>Because I didn&#8217;t see how 	turnaround time could work otherwise, we&#8217;d have some of those 	meetings beforehand, and others early in the conference.</li>
</ul>
<p style="margin-bottom: 0in;">That last bit didn&#8217;t exactly wholly work out; for the second consecutive year Netezza pulled a surprise schedule switch a few days beforehand. But:</p>
<ul>
<li>I did have extensive, fascinating 	meetings at Netezza&#8217;s offices on Thursday, which were the fodder for 	multiple posts going up today.</li>
<li>I have a nice meeting schedule set 	up for Tuesday.</li>
<li>There should be plenty of 	opportunity for hallway and exhibit-floor conversation as the 	conference progresses.</li>
<li>I even have my own private 	conference room, with a lovely name (the “Paine Room”).</li>
</ul>
<p style="margin-bottom: 0in;">So far as I know, the rest of the plan is still operative.</p>
<p style="margin-bottom: 0in;">Posts already written as I draft this one include:</p>
<ul>
<li><a href="http://www.dbms2.com/2010/06/21/netezza-database-software-technology-overview/" >A long discussion of Netezza&#8217;s 	technology, focusing on the database parts</a></li>
<li><a href="http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/" >A discussion of Netezza&#8217;s and 	IBM&#8217;s compression strategies</a></li>
<li><a href="http://www.dbms2.com/2010/06/21/netezza-silicon-balance/" >Notes on how Netezza balances 	its silicon and uses its FPGAs</a></li>
<li><a href="http://www.dbms2.com/2010/06/21/data-warehouse-load-latency/" >A quickie on data warehouse 	loading latency</a></li>
</ul>
<p style="margin-bottom: 0in;">I still need to write one focusing on Netezza&#8217;s advanced analytics strategy, and plan to edit in a link to it when it&#8217;s up.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/21/notes-on-a-spate-of-netezza-related-blog-posts/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Best practices for analytic DBMS POCs</title>
		<link>http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/</link>
		<comments>http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/#comments</comments>
		<pubDate>Mon, 14 Jun 2010 12:53:33 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2297</guid>
		<description><![CDATA[When you are selecting an analytic DBMS or appliance, most of the evaluation boils down to two questions:

How quickly 	and cost-effectively does it execute SQL?
What 	analytic functionality, SQL or otherwise, does it do a good job of 	executing?

And so, in undertaking such a selection, you need to start by addressing three issues:

What 	does “speed” mean [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">When you are selecting an analytic DBMS or appliance, most of the evaluation boils down to two questions:</p>
<ul>
<li>How q<span style="font-style: normal;">uickly 	and cost-effectively does it execute SQL?</span></li>
<li><span style="font-style: normal;">What 	analytic functionality, SQL or otherwise, does it do a good job of 	executing?</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">And so, in undertaking such a selection, you need to start by addressing three issues:</span></p>
<ul>
<li><a href="../2009/09/10/analytic-speed-latency/">What 	does “speed” mean to you</a>?</li>
<li>What does “cost” mean to you?</li>
<li>What analytic functionality do you 	need anyway?</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-2297"></span>Key elements of cost* include:</p>
<ul>
<li>Software license and maintenance</li>
<li>Hardware purchase cost, 	maintenance, electric power, and computer room burden</li>
<li>Database and system administration</li>
<li>(For some use cases) Programming</li>
</ul>
<p style="margin-bottom: 0in;"><em>*Assuming a classical in-house IT shop, where products are typically bought rather than leased/rented. With outsourced and/or monthly-fee structures, the details change but the principles remain the same.</em></p>
<p style="margin-bottom: 0in;">Most of that can be evaluated pretty well via a spreadsheet, although things can get a bit tricky when you get to people costs, which are a large fraction of the whole. In particular, different analytic DBMS product suites have great, high-performance support for different (and often rapidly growing) sets of functionality – basic and advanced SQL, statistics, and more. Figuring out which ones will be best for your programmers, and how significant the differences are &#8212; well, that&#8217;s a lot like any other programming language evaluation, and those are rarely neat or clean-cut.</p>
<p style="margin-bottom: 0in; font-style: normal;">But when it comes to evaluating speed, <strong>there&#8217;s no substitute for a well-designed proof of concept (POC).</strong> Many analytic DBMS and appliance vendors are happy to let you do a POC, on your own premises (or remotely if you prefer), under your control, at no cost to you. And that&#8217;s great. <strong>It is crucial that a POC be run either by you, by a consultant* answerable to you,</strong><span style="font-weight: normal;"> or – if you decide the vendor must run it for you – at least </span><strong>with you watching every step of the way</strong><span style="font-weight: normal;"> and knowing exactly what is being done. Applianc</span>e vendors do find it cheaper to run POCs on their own premises, so a certain reluctance to ship you a box is understandable. But <strong>make no compromises about the transparency of a POC, or about your control of exactly what it is that gets tested.</strong></p>
<p style="margin-bottom: 0in;"><em>*Since I sell <a href="http://www.monash.com/adviseusers.html" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">consulting services</a> for users evaluating analytic DBMS, I naturally am biased to think that consultants can be very useful in the process. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  But whether you should use them a little (sanity check), a medium amount (work with you through the process), or heavily (actually drive the process for you and/or execute the POCs) is very dependent upon your specific situation.</em></p>
<p style="margin-bottom: 0in; font-style: normal;">So far as I&#8217;ve been able to tell:</p>
<ul>
<li><span style="font-style: normal;">Netezza 	loves to ship boxes to prospects for POCs, and have them set up the 	boxes and do POCs themselves. That&#8217;s a big reason why <a href="../2009/02/18/the-netezza-guys-propose-a-poc-checklist/">Netezza 	wants to call attention to this subject</a>.</span></li>
<li><span style="font-style: normal;">Oracle 	has generally been pretty <a href="../2009/02/01/oracle-says-they-do-onsite-exadata-pocs-after-all/">reluctant 	to ship Exadata boxes out for POCs</a>. That&#8217;s the other reason 	Netezza wants to call attention to the issue. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </span></li>
<li><span style="font-style: normal;">Open 	source vendors make it easy for you to download and test at least 	their community editions.</span></li>
<li><span style="font-style: normal;">Vertica 	makes it pretty easy for you to test its software too (download or 	cloud).</span></li>
<li><span style="font-style: normal;">ParAccel 	has generally insisted on running POCs itself, although it will do 	so on your premises if you insist.</span></li>
<li><span style="font-style: normal;">Teradata 	naturally tries to do POCs on its own premises, but doesn&#8217;t insist 	too hard.<em> (Edit: Randy Lea of Teradata says that Teradata is now doing over half its POCs onsite.)</em><br />
</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Most of the criticisms I&#8217;ve heard of vendors&#8217; POC practices have been directed at Oracle or ParAccel.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">For most POCs, it&#8217;s a good conceptual template to </span><span style="font-style: normal;"><strong>form and then test a hypothesis</strong></span><span style="font-style: normal;"> to the effect of:</span></p>
<ul>
<li><span style="font-style: normal;">For 	a given technology product assemblage (brand of DBMS, number of 	nodes, etc.), and</span></li>
<li><span style="font-style: normal;">For 	a given level of human effort (e.g., administrative effort), you can</span></li>
<li><span style="font-style: normal;">Run a given workload, with</span></li>
<li><span style="font-style: normal;">Satisfactory 	and satisfactorily consistent response times</span></li>
</ul>
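<p style="margin-bottom: 0in;">One way to turn the last line of that hypothesis into a pass/fail check is sketched below. The definitions are my assumptions, not an industry standard: “satisfactory” is taken to mean the 95th-percentile response time meets a target, and “satisfactorily consistent” to mean the 95th percentile is within a chosen multiple of the median. The function name, thresholds, and sample timings are all hypothetical.</p>

```python
import statistics

def poc_passes(response_times_s, target_s, max_spread_ratio):
    """Check measured response times against the POC hypothesis.

    Satisfactory: the 95th percentile is at or under target_s.
    Consistent:   the 95th percentile is at most max_spread_ratio times the median.
    """
    median = statistics.median(response_times_s)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(response_times_s, n=20)[-1]
    return p95 <= target_s and p95 / median <= max_spread_ratio

# Hypothetical timings, in seconds, from two runs of the same workload.
steady_run = [1.0] * 20
spiky_run = [1.0] * 19 + [30.0]

print(poc_passes(steady_run, target_s=2.0, max_spread_ratio=3.0))
print(poc_passes(spiky_run, target_s=2.0, max_spread_ratio=3.0))
```

<p style="margin-bottom: 0in;">The spiky run has the same median as the steady one, which is why the check looks at the tail of the distribution and not just at an average.</p>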
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Sometimes absolute throughput and price/performance are important </span><em>secondary</em><span style="font-style: normal;"> considerations; sometimes they&#8217;re less germane. But either way, it&#8217;s almost always right to focus </span><em>primarily</em><span style="font-style: normal;"> on the questions of </span><span style="font-style: normal;"><strong>“What do I want this system to do?”</strong></span><span style="font-style: normal;"> and </span><span style="font-style: normal;"><strong>“What do I think we&#8217;re going to have to invest in it?</strong></span><span style="font-style: normal;">” By way of contrast, it&#8217;s often misleading to focus too much on questions like “<a href="../2008/11/19/data-warehouse-proof-of-concept-pocs/">What&#8217;s the one number that best describes the performance of this system?</a>” &#8212; even if you customize that calculation for your environment – or, even worse, “How much speed-up can I get on my single worst <a href="../2008/11/15/query-from-hell/">Query from Hell</a>?” </span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">The fundamental rule of POC construction is: </span><span style="font-style: normal;"><strong>Model your entire use case as best you can.</strong></span><span style="font-style: normal;"> That means you need to consider, at a minimum:</span></p>
<ul>
<li><span style="font-style: normal;">Your 	whole concurrent query, other analytic, and low-latency update 	workload (peak).</span></li>
<li><span style="font-style: normal;">Your 	whole query, analytic, load, backup, and maintenance workload 	(ongoing).</span></li>
<li><span style="font-style: normal;"><a href="../2008/12/14/the-%E2%80%9Cbaseball-bat%E2%80%9D-test-for-analytic-dbms-and-data-warehouse-appliances/">Partial-failure 	scenarios</a>.</span></li>
<li><span style="font-style: normal;">Your 	core SLAs (Service-Level Agreements).</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Of course, that&#8217;s not as easy as it sounds. Presumably, the main reason you&#8217;re getting a new analytic DBMS is that you want to do new kinds of analysis. By the very nature of analytics, you won&#8217;t know what analytic operations are most useful until you try them out and see what their results are. On the other hand – if you haven&#8217;t done considerable thinking about how you&#8217;re going to use your new analytic database, how did you ever get funding for the project in the first place? <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Seriously, I could write multiple posts, each as long as this one (but more application-oriented), about how to upgrade your analytic capabilities (and which fool&#8217;s gold to avoid). But this has gotten pretty long already, so for now I&#8217;ll just stop here.</span></p>
<p style="margin-bottom: 0in;"><em>Note: My clients at Netezza asked me to write something short about POCs they could use as a kind of foreword to some collateral, where by &#8220;short&#8221; they meant single-paragraph or something like that. They&#8217;re great clients, so I said yes, under the condition I could also use it as a blog post. Except … this post didn&#8217;t turn out to be nearly as short as they envisioned. Oops. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">My 	February, 2009 <a href="../2009/02/25/even-more-final-version-of-my-tdwi-slide-deck/">slide 	deck on how to select an analytic DBMS</a> is in many parts still 	pretty current</span></p>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
