<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Teradata</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/teradata/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 02 Sep 2010 09:06:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Teradata&#8217;s future product strategy</title>
		<link>http://www.dbms2.com/2010/08/12/teradata-future-product-strategy/</link>
		<comments>http://www.dbms2.com/2010/08/12/teradata-future-product-strategy/#comments</comments>
		<pubDate>Thu, 12 Aug 2010 10:37:14 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[Microstrategy]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2769</guid>
		<description><![CDATA[I think Teradata&#8217;s future product strategy is coming into focus. I&#8217;ll start by outlining some particular aspects, and then show how I think it all ties together.

The immediate hook here is that I had a short conversation with Scott Gnau of Teradata yesterday, triggered by Teradata&#8217;s acquisition of Kickfire&#8217;s assets. Takeaways from that part included:

The [...]]]></description>
			<content:encoded><![CDATA[<p>I think Teradata&#8217;s future product strategy is coming into focus. I&#8217;ll start by outlining some particular aspects, and then show how I think it all ties together.<br />
<span id="more-2769"></span></p>
<p style="margin-bottom: 0in;">The immediate hook here is that I had a short conversation with Scott Gnau of Teradata yesterday, triggered by <a href="../2010/07/27/kickfire-unlikely-to-survive/">Teradata&#8217;s acquisition of Kickfire&#8217;s assets</a>. Takeaways from that part included:</p>
<ul>
<li>The acquisition is all about 	Kickfire&#8217;s <a href="../2009/08/21/kickfires-fpga-based-technical-strategy/">data 	pipelining</a> technology.</li>
<li>Scott (in my opinion rightly) thinks that technology isn&#8217;t particularly tied to Kickfire&#8217;s choice of DBMS architecture (fairly vanilla columnar).</li>
<li>No decision has been made about 	whether the right vehicle for this technology is an FPGA (Field 	Programmable Gate Array), conventional Intel CPU, RAM, etc.</li>
</ul>
<p style="margin-bottom: 0in;"><em>If you want to handicap Teradata&#8217;s future data pipelining strategy, you might note that:</em></p>
<ul>
<li><em>Kickfire&#8217;s own choice – and 	hence its existing implementation – is an FPGA.</em></li>
<li><em><a href="../2009/08/04/vectorwise-ingres-and-monetdb/">VectorWise&#8217;s 	approach to pipelining is Intel-based,</a> apparently at the cost of 	being closely tied to specific generations of Intel CPUs.</em></li>
<li><em><a href="../2009/07/27/xtremedata-announces-its-dbx-data-warehouse-appliance/">XtremeData&#8217;s 	approach to pipelining</a> is FPGA-based.</em></li>
<li><em>Teradata has a lot more 	development resources than any of those other companies, as well as 	important existing products, and hence has both means and motive to 	shoehorn new technology into older system designs.</em></li>
</ul>
<p style="margin-bottom: 0in;">While I had Scott on the phone, I brought up a few other subjects too. Highlights included:</p>
<ul>
<li>Teradata&#8217;s Flash-based appliance 	is doing just fine in beta test and customer POCs (Proofs of 	Concept).</li>
<li>Other kinds of Teradata appliance 	are not inconceivable.</li>
<li>Scott thinks <a href="http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/" >Michael McIntire&#8217;s 	condemnation of Active-Active architectures</a> is overstated. That 	said,
<ul>
<li>Scott does acknowledge a need for 	greater Active-Active scalability, and suggests that the reason 	Xkoto&#8217;s current products are being discontinued is their lack of 	scaling.</li>
<li>Scott seems quietly confident the 	scaling will get done.</li>
</ul>
</li>
<li>Scott is emphatic that Teradata is 	not going to go to <a href="../2009/04/20/calpont-update-you-read-it-here-first/">a 	two-tier architecture</a>. In particular, the point of splitting 	storage/lightweight database processing and heavyweight database 	processing on separate tiers is generally to save bandwidth, and 	Teradata&#8217;s BYNET is typically less than 10% loaded.</li>
<li>Scott didn&#8217;t dispute my claim that 	this all suggests <a href="../2008/10/14/teradata-virtual-storage/">Teradata 	Virtual Storage</a> is the future, at the expense of a rigid 	delineation among <a href="../2008/10/23/teradata-appliance-product-lines/">specific 	use-case-focused product lines</a>.</li>
<li>Unlike <a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/" >Netezza</a> or <a href="http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/" >Aster</a>, Teradata doesn&#8217;t seem to plan analytic capability that works outside 	the UDF (User Defined Function) framework. However, Scott noted that 	Teradata has long had the capability that Aster and Netezza now also 	have of letting you run analytic code either in “protected mode” 	(if the process fails the whole database doesn&#8217;t crash) or in the 	database kernel (best performance, if you&#8217;re sufficiently confident 	in the code&#8217;s stability to take the risk). Scott also spoke of the 	release later this quarter of Teradata FastPath, which will offer 	yet better performance (however, there&#8217;s a gotcha to Teradata 	FastPath that&#8217;s still NDA).</li>
</ul>
<p style="margin-bottom: 0in;">Putting all that together with the rest of what we know about Teradata, I&#8217;m going to call out <strong>three pillars of Teradata&#8217;s long-term product strategy:</strong></p>
<ul>
<li><strong>Same fundamentals as always.</strong> Teradata&#8217;s core product strategy is:
<ul>
<li>Single DBMS, capable of meeting 	all analytic needs while running in a single instance, usually 	running on &#8230;</li>
<li>… proprietary hardware …</li>
<li>… built from 	conservatively-chosen parts.</li>
</ul>
</li>
<li><strong>Selective vertical application 	stack.</strong> No matter how horizontally-oriented they are, many 	companies that have been in the analytic technology business for a 	while wind up with some vertical applications. It sort of just 	happens. Teradata is no exception. Teradata also likes to sell 	services to its product customers, and some of those are quite 	vertical-aware.</li>
<li><strong>Mutable, modular platform.</strong> This is what I highlighted above. Note that it&#8217;s philosophically 	attuned with the one-system-does-everything approach Teradata 	prefers. More subtly, please also note that it goes well with 	customer-by-customer price customization, which is almost a must for 	Teradata given the Innovator&#8217;s Dilemma kind of pricing box it finds 	itself in.</li>
</ul>
<p style="margin-bottom: 0in;">So far, that&#8217;s not too exciting, except in the details of how Teradata&#8217;s engineers make that all work. But there&#8217;s a <strong>fourth pillar to Teradata&#8217;s technical strategy</strong> as well, and it&#8217;s a wild card: <strong>tight partnerships.</strong> Every time I talk with Teradata hardware chief Carson Schmidt, he seems excited about some particular version of a part or other – sometimes from a reasonably established vendor (once it was LSI Logic), sometimes from a tiny one (notably <a href="../2009/10/25/teradata-hardware-strategy-and-tactics/">the “stealth” start-up on which Teradata bet its first solid-state product</a>). In the future, I expect tight business intelligence partnerships as well. Cognos BI will be increasingly integrated with IBM&#8217;s DBMS and hardware; Business Objects&#8217; BI will increasingly be integrated with SAP&#8217;s applications; and Oracle&#8217;s BI will eventually be integrated with everything. How do you compete with that if you&#8217;re Microstrategy? Well, you try to have a superior product, of course – but you also partner as closely with DBMS vendors as you can, an approach Microstrategy has already started to take. Predictive analytics stalwart <a href="http://www.dbms2.com/2010/05/15/further-clarifying-in-database-mpp-sas/" >SAS</a>, of course, is on a partnership binge as well.</p>
<p style="margin-bottom: 0in;">Teradata has a larger installed base than almost all its competitors, and enjoys richer third-party software and service support as a result. But I suspect that going forward,  for Teradata to remain a leading competitor at price points it is willing to accept, Teradata&#8217;s “ecosystem” advantages will need to ratchet up one or several notches.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/12/teradata-future-product-strategy/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Teradata, Xkoto Gridscale (RIP), and active-active clustering</title>
		<link>http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/</link>
		<comments>http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/#comments</comments>
		<pubDate>Sat, 31 Jul 2010 08:23:57 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Continuent]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Xkoto]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2708</guid>
		<description><![CDATA[Having gotten a number of questions about Teradata&#8217;s acquisition of Xkoto, I leaned on Teradata for an update, and eventually connected with Scott Gnau. Takeaways included:

Teradata is discontinuing  Xkoto&#8217;s existing product Gridscale, which 	Scott characterized as being too OLTP-focused to be a good fit for 	Teradata. Teradata hopes and expects that existing Xkoto Gridscale [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Having gotten a number of questions about Teradata&#8217;s acquisition of Xkoto, I leaned on Teradata for an update, and eventually connected with Scott Gnau. Takeaways included:</p>
<ul>
<li>Teradata is discontinuing <a href="http://www.dbms2.com/2009/09/11/xkoto-gridscale-highlights/" >Xkoto&#8217;s existing product Gridscale</a>, <span style="font-style: normal;">which Scott characterized as being too OLTP-focused to be a good fit for Teradata. Teradata hopes and expects that existing Xkoto Gridscale customers won&#8217;t renew maintenance. (I&#8217;m not sure</span> that they&#8217;ll even get the option to do so.)</li>
<li>The point of Teradata&#8217;s technology 	+ engineers acquisition of Xkoto is to enhance Teradata&#8217;s 	active-active or multi-active data warehousing capabilities, which 	it has had in some form for several years.</li>
<li>In particular, Teradata wants to 	tie together different products in the Teradata product line. (Note: 	Those typically all run pretty much the same Teradata database 	management software, except insofar as they might be on different 	releases.)</li>
<li>Scott rattled off all the 	plausible areas of enhancement, with multiple phrasings – 	performance, manageability, ease of use, tools, features, etc.</li>
<li>Teradata plans to have one or two 	releases based on Xkoto technology in 2011.</li>
</ul>
<p style="margin-bottom: 0in;">Frankly, I&#8217;m disappointed at the struggles of clustering efforts such as Xkoto Gridscale or <a href="http://www.dbms2.com/2009/09/03/continuent-on-clustering/" >Continuent&#8217;s pre-Tungsten products</a>, but if the DBMS vendors meet the same needs themselves, that&#8217;s OK too.</p>
<p style="margin-bottom: 0in;">The logic behind active-active database implementations actually seems pretty compelling:  <span id="more-2708"></span></p>
<ul>
<li>You may well be keeping a second 	copy of your database for high availability/hot standby.</li>
<li>You might even be keeping a third 	copy for off-site disaster recovery.</li>
<li>In some cases, you might have 	reasons beyond disaster recovery to distribute a database around the 	world.</li>
<li>So why not allow queries to be run 	against all the copies?</li>
<li>And by the way, splitting the 	workload up a bit by kinds (e.g., long-running vs. short query) 	might let you optimize the implementation of each copy of the 	database. (This last point becomes even more important with the rise 	of solid-state memory.)</li>
</ul>
<p style="margin-bottom: 0in;">Analytic DBMS vendors pretty much all need to offer this. (Possible exception: If they have a data-mart-only positioning so extreme that customers will never care about any form of failover.) That said, I must confess to not having done a good job of tracking who does or doesn&#8217;t have which features in this area to date; informative comments to this post in that regard would be much appreciated!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Kickfire unlikely to survive</title>
		<link>http://www.dbms2.com/2010/07/27/kickfire-unlikely-to-survive/</link>
		<comments>http://www.dbms2.com/2010/07/27/kickfire-unlikely-to-survive/#comments</comments>
		<pubDate>Tue, 27 Jul 2010 18:56:48 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2658</guid>
		<description><![CDATA[Following up on a previous report of Kickfire&#8217;s troubles &#8212; a Kickfire customer tipped me off that Kickfire told him they&#8217;re selling their IP and engineers, and the Kickfire products will be discontinued.
At this time, I have no idea who the lucky buyer is.
Edit: We now know it&#8217;s Teradata.
]]></description>
			<content:encoded><![CDATA[<p>Following up on a previous report of <a href="http://www.dbms2.com/2010/06/11/kickfire-update-2/" >Kickfire&#8217;s troubles</a> &#8212; a Kickfire customer tipped me off that Kickfire told him they&#8217;re selling their IP and engineers, and the Kickfire products will be discontinued.</p>
<p>At this time, I have no idea who the lucky buyer is.</p>
<p><em>Edit: We now know it&#8217;s Teradata.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/27/kickfire-unlikely-to-survive/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Best practices for analytic DBMS POCs</title>
		<link>http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/</link>
		<comments>http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/#comments</comments>
		<pubDate>Mon, 14 Jun 2010 12:53:33 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2297</guid>
		<description><![CDATA[When you are selecting an analytic DBMS or appliance, most of the evaluation boils down to two questions:

How quickly 	and cost-effectively does it execute SQL?
What 	analytic functionality, SQL or otherwise, does it do a good job of 	executing?

And so, in undertaking such a selection, you need to start by addressing three issues:

What 	does “speed” mean [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">When you are selecting an analytic DBMS or appliance, most of the evaluation boils down to two questions:</p>
<ul>
<li><span style="font-style: normal;">How quickly and cost-effectively does it execute SQL?</span></li>
<li><span style="font-style: normal;">What 	analytic functionality, SQL or otherwise, does it do a good job of 	executing?</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">And so, in undertaking such a selection, you need to start by addressing three issues:</span></p>
<ul>
<li><a href="../2009/09/10/analytic-speed-latency/">What 	does “speed” mean to you</a>?</li>
<li>What does “cost” mean to you?</li>
<li>What analytic functionality do you 	need anyway?</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-2297"></span>Key elements of cost* include:</p>
<ul>
<li>Software license and maintenance</li>
<li>Hardware purchase cost, 	maintenance, electric power, and computer room burden</li>
<li>Database and system administration</li>
<li>(For some use cases) Programming</li>
</ul>
<p style="margin-bottom: 0in;"><em>*Assuming a classical in-house IT shop, where products are typically bought rather than leased/rented. With outsourced and/or monthly-fee structures, the details change but the principles remain the same.</em></p>
<p style="margin-bottom: 0in;"><em></em>Most of that can be evaluated pretty well via a spreadsheet, although things can get a bit tricky when you get to people costs, which are a large fraction of the whole. In particular, different analytic DBMS product suites have great, high-performance support for different (and often rapidly growing) sets of functionality – basic and advanced SQL, statistics, and more. Figuring out which ones will be best for your programmers, and how significant the differences are &#8212; well, that&#8217;s a lot like any other programming language evaluation, and those are rarely neat or clean-cut.</p>
<p style="margin-bottom: 0in; font-style: normal;">But when it comes to evaluating speed, <strong>there&#8217;s no substitute for a well-designed proof of concept (POC).</strong> Many analytic DBMS and appliance vendors are happy to let you do a POC, on your own premises (or remotely if you prefer), under your control, at no cost to you. And that&#8217;s great. <strong>It is crucial that a POC be run either by you, by a consultant* answerable to you,</strong><span style="font-weight: normal;"> or – if you decide the vendor must run it for you – at least </span><strong>with you watching every step of the way</strong><span style="font-weight: normal;"> and knowing exactly what is being done. </span>Appliance vendors do find it cheaper to run POCs on their own premises, so a certain reluctance to ship you a box is understandable. But <strong>make no compromises about the transparency of a POC, or about your control of exactly what it is that gets tested.</strong></p>
<p style="margin-bottom: 0in;"><em>*Since I sell <a href="http://www.monash.com/adviseusers.html" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">consulting services</a> for users evaluating analytic DBMS, I naturally am biased to think that consultants can be very useful in the process. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  But whether you should use them a little (sanity check), a medium amount (work with you through the process), or heavily (actually drive the process for you and/or execute the POCs) is very dependent upon your specific situation.</em></p>
<p style="margin-bottom: 0in; font-style: normal;">So far as I&#8217;ve been able to tell:</p>
<ul>
<li><span style="font-style: normal;">Netezza 	loves to ship boxes to prospects for POCs, and have them set up the 	boxes and do POCs themselves. That&#8217;s a big reason why <a href="../2009/02/18/the-netezza-guys-propose-a-poc-checklist/">Netezza 	wants to call attention to this subject</a>.</span></li>
<li><span style="font-style: normal;">Oracle 	has generally been pretty <a href="../2009/02/01/oracle-says-they-do-onsite-exadata-pocs-after-all/">reluctant 	to ship Exadata boxes out for POCs</a>. That&#8217;s the other reason 	Netezza wants to call attention to the issue. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </span></li>
<li><span style="font-style: normal;">Open 	source vendors make it easy for you to download and test at least 	their community editions.</span></li>
<li><span style="font-style: normal;">Vertica 	makes it pretty easy for you to test its software too (download or 	cloud).</span></li>
<li><span style="font-style: normal;">ParAccel 	has generally insisted on running POCs itself, although it will do 	so on your premises if you insist.</span></li>
<li><span style="font-style: normal;">Teradata 	naturally tries to do POCs on its own premises, but doesn&#8217;t insist 	too hard.<em> (Edit: Randy Lea of Teradata says that Teradata is now doing over half its POCs onsite.)</em><br />
</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Most of the criticisms I&#8217;ve heard of vendors&#8217; POC practices have been directed at Oracle or ParAccel.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">For most POCs, it&#8217;s a good conceptual template to </span><span style="font-style: normal;"><strong>form and then test a hypothesis</strong></span><span style="font-style: normal;"> to the effect of:</span></p>
<ul>
<li><span style="font-style: normal;">For 	a given technology product assemblage (brand of DBMS, number of 	nodes, etc.), and</span></li>
<li><span style="font-style: normal;">For 	a given level of human effort (e.g., administrative effort), you can</span></li>
<li><span style="font-style: normal;">Run a given workload, with</span></li>
<li><span style="font-style: normal;">Satisfactory 	and satisfactorily consistent response times</span></li>
</ul>
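<p style="margin-bottom: 0in;"><span style="font-style: normal;">As a rough sketch of that template, here is one way the last bullet might be checked once a POC run has produced a list of response times. Everything specific in it is my assumption for illustration: the function names, the choice of median and 95th percentile as stand-ins for &#8220;satisfactory&#8221; and &#8220;satisfactorily consistent&#8221;, and the target values. Pick whatever measures match your own SLAs.</span></p>

```python
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (pct is an integer 1..100)."""
    ordered = sorted(samples)
    rank = (len(ordered) * pct + 99) // 100  # integer ceiling of n * pct / 100
    return ordered[rank - 1]


def evaluate_poc(response_times, median_target, p95_target):
    """Check a POC run against hypothetical response-time targets.

    "satisfactory" compares the median to its target; "consistent" compares
    the 95th percentile to its target. Both measures are illustrative choices.
    """
    median = statistics.median(response_times)
    p95 = percentile(response_times, 95)
    satisfactory = median_target >= median
    consistent = p95_target >= p95
    return {
        "median": median,
        "p95": p95,
        "satisfactory": satisfactory,
        "consistent": consistent,
        "hypothesis_holds": satisfactory and consistent,
    }


if __name__ == "__main__":
    # Made-up response times in seconds: mostly fast, with one very slow query.
    times = [1.2, 1.4, 1.1, 1.3, 9.8, 1.2, 1.5, 1.3, 1.4, 1.2]
    print(evaluate_poc(times, median_target=2.0, p95_target=5.0))
```

<p style="margin-bottom: 0in;"><span style="font-style: normal;">In the made-up sample, the typical query is comfortably fast but a single slow one pushes the tail past its target, so the hypothesis fails on consistency rather than on typical speed.</span></p>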
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Sometimes absolute throughput and price/performance are important </span><em>secondary</em><span style="font-style: normal;"> considerations; sometimes they&#8217;re less germane. But either way, it&#8217;s almost always right to focus </span><em>primarily</em><span style="font-style: normal;"> on the questions of </span><span style="font-style: normal;"><strong>“What do I want this system to do?”</strong></span><span style="font-style: normal;"> and </span><span style="font-style: normal;"><strong>“What do I think we&#8217;re going to have to invest in it?”</strong></span><span style="font-style: normal;"> By way of contrast, it&#8217;s often misleading to focus too much on questions like “<a href="../2008/11/19/data-warehouse-proof-of-concept-pocs/">What&#8217;s the one number that best describes the performance of this system?</a>” – even if you customize that calculation for your environment – or, even worse, “How much speed-up can I get on my single worst <a href="../2008/11/15/query-from-hell/">Query from Hell</a>?” </span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">The fundamental rule of POC construction is: </span><span style="font-style: normal;"><strong>Model your entire use case as best you can.</strong></span><span style="font-style: normal;"> That means you need to consider, at a minimum:</span></p>
<ul>
<li><span style="font-style: normal;">Your 	whole concurrent query, other analytic, and low-latency update 	workload (peak).</span></li>
<li><span style="font-style: normal;">Your 	whole query, analytic, load, backup, and maintenance workload 	(ongoing).</span></li>
<li><span style="font-style: normal;"><a href="../2008/12/14/the-%E2%80%9Cbaseball-bat%E2%80%9D-test-for-analytic-dbms-and-data-warehouse-appliances/">Partial-failure 	scenarios</a>.</span></li>
<li><span style="font-style: normal;">Your 	core SLAs (Service-Level Agreements).</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Of course, that&#8217;s not as easy as it sounds. Presumably, the main reason you&#8217;re getting a new analytic DBMS is that you want to do new kinds of analysis. By the very nature of analytics, you won&#8217;t know what analytic operations are most useful until you try them out and see what their results are. On the other hand – if you haven&#8217;t done considerable thinking about how you&#8217;re going to use your new analytic database, how did you ever get funding for the project in the first place? <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Seriously, I could write multiple posts, each as long as this one (but more application-oriented), about how to upgrade your analytic capabilities (and which fool&#8217;s gold to avoid). But this has gotten pretty long already, so for now I&#8217;ll just stop here.</span></p>
<p style="margin-bottom: 0in;"><em>Note: My clients at Netezza asked me to write something short about POCs they could use as a kind of foreword to some collateral, where by &#8220;short&#8221; they meant single-paragraph or something like that. They&#8217;re great clients, so I said yes, under the condition I could also use it as a blog post. Except … this post didn&#8217;t turn out to be nearly as short as they envisioned. Oops. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">My February 2009 <a href="../2009/02/25/even-more-final-version-of-my-tdwi-slide-deck/">slide deck on how to select an analytic DBMS</a> is still pretty current in many parts.</span></p>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Clarifying the state of MPP in-database SAS</title>
		<link>http://www.dbms2.com/2010/05/07/in-database-sas-teradata-netezza-aster/</link>
		<comments>http://www.dbms2.com/2010/05/07/in-database-sas-teradata-netezza-aster/#comments</comments>
		<pubDate>Fri, 07 May 2010 06:23:49 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[SAS Institute]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2061</guid>
		<description><![CDATA[I routinely am briefed way in advance of products&#8217; introductions. For that reason and others, it can be hard for me to keep straight what&#8217;s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than the [...]]]></description>
			<content:encoded><![CDATA[<p>I routinely am briefed way in advance of products&#8217; introductions. For that reason and others, it can be hard for me to keep straight what&#8217;s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than the SAS Institute&#8217;s multi-year effort to get SAS integrated into various MPP DBMS, specifically <a href="http://www.dbms2.com/2009/08/02/teradata-13-focuses-on-advanced-analytic-performance/" >Teradata</a>, <a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/" >Netezza Twinfin(i)</a>, and <a href="http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/" >Aster Data nCluster</a>.</p>
<p>However, I chatted briefly Thursday with Michelle Wilkie, who is the SAS product manager overseeing all this (and also some other stuff, like SAS running on grids without being integrated into a DBMS). As best I understood, the story is:<span id="more-2061"></span></p>
<ul>
<li>On <strong>Teradata,</strong> SAS is shipping in-database scoring today. SAS also is shipping a limited amount of in-database modeling on Teradata, the count recently having gone up from 4 &#8220;procs&#8221; to 10.</li>
<li>On <strong>Netezza Twinfin(i),</strong> SAS is shipping in-database scoring, and this was recently announced. I can&#8217;t actually find much evidence of this announcement by searching the Web or the SAS website, but Michelle was pretty clear on the point even so.  Further confusing matters, <a href="http://www.sas.com/technologies/analytics/datamining/scoring_acceleration/" onclick="javascript:pageTracker._trackPageview('/www.sas.com');">SAS&#8217; website</a> seems to say in-database scoring is supported on Netezza&#8217;s old generation of products but not its latest one, even though SAS CTO Keith Collins told me <a href="http://www.dbms2.com/2009/09/03/sas-on-netezza-and-other-netezza-extensibility/" >exactly the opposite</a> would be true.</li>
<li>On <strong>Aster Data nCluster,</strong> SAS will ship in-database scoring by the end of 2010. If I understood correctly, this will be for &#8220;limited&#8221; rather than &#8220;general&#8221; availability, but Michelle framed that as a distinction without a difference. I.e., if you want to buy in-database SAS scoring on Aster nCluster, you&#8217;ll be able to.</li>
<li>(More) in-database SAS modeling is expected on all of Teradata, Netezza Twinfin(i), and Aster Data nCluster in the vague future. (The concept of 2011/2012 came into play.)</li>
<li>SAS/Teradata integration, developed first, involved more hand-coding. SAS has subsequently developed some kind of a more general parallelism/in-database capability, akin to what it has in the DBMS-less grid, that either is or isn&#8217;t a good match for DBMS vendors&#8217; native way of supporting parallel processing. (Obviously, I&#8217;m still pretty unclear on this part.)</li>
<li>SAS technology is a good fit for Aster Data&#8217;s MapReduce-centric way of doing parallelism.</li>
</ul>
<p>I also took the opportunity to ask Michelle a question I&#8217;ve had a heck of a time getting answered: <strong>What&#8217;s the big deal about in-database data mining scoring anyway?</strong> After all, the most common form of in-database data mining scoring is just to take a weighted sum of specific fields in a row, where the weights are the regression coefficients. You can do that in generic SQL, with performance that superficially should be at least as good as that for any alternative strategy. Michelle&#8217;s answers seemed to be twofold:</p>
<ul>
<li><strong>There are other kinds of scoring too</strong> &#8212; neural networks, etc.</li>
<li><strong>Coding the scoring in SQL isn&#8217;t that easy. </strong>Michelle gave the example of a specific user (default Netezza reference account, with initials resembling mine) that spent 400 hours writing and testing something you now get for free with SAS/Netezza integration.</li>
</ul>
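<p>To make the weighted-sum point concrete, here is a minimal Python sketch. The column names, coefficients, and table name are all invented for illustration; nothing here comes from SAS, Teradata, Netezza, or Aster. It scores one row directly and also builds the equivalent generic SQL expression.</p>
<pre>
```python
# Hypothetical regression coefficients; the column names are made up.
COEFFICIENTS = {"recency": -0.5, "frequency": 2.0, "monetary": 0.25}
INTERCEPT = 1.0


def score_row(row):
    """Score one row as intercept + sum(weight * field value)."""
    return INTERCEPT + sum(w * row[col] for col, w in COEFFICIENTS.items())


def scoring_sql(table):
    """Build a generic SQL query that computes the same weighted sum."""
    terms = " + ".join(f"{w} * {col}" for col, w in COEFFICIENTS.items())
    return f"SELECT {INTERCEPT} + {terms} AS score FROM {table}"


if __name__ == "__main__":
    print(score_row({"recency": 4, "frequency": 3, "monetary": 8}))
    print(scoring_sql("customers"))
```
</pre>
<p>Even in this toy form, the SQL has to be regenerated and retested whenever the model changes, which is presumably part of the hand-coding burden Michelle was describing.</p>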
<p><em>Edit: In response to this post, SAS wrote in with <a href="http://www.dbms2.com/2010/05/15/further-clarifying-in-database-mpp-sas/" >further clarification about in-database and/or MPP SAS</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/07/in-database-sas-teradata-netezza-aster/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Is the enterprise data warehouse a myth?</title>
		<link>http://www.dbms2.com/2010/04/12/enterprise-data-warehouse-edw-myt/</link>
		<comments>http://www.dbms2.com/2010/04/12/enterprise-data-warehouse-edw-myt/#comments</comments>
		<pubDate>Mon, 12 Apr 2010 11:52:02 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1883</guid>
		<description><![CDATA[An enterprise data warehouse should:

Manage data 	to high standards of accuracy, consistency, cleanliness, 	clarity, and security.
Manage all the data in your 	organization.

Pick ONE.
There&#8217;s little to dislike in the enterprise data warehouse dream, as represented (for example) in this 2004 Teradata Magazine article. But in a world where ever more data comes in from ever more [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">An <strong>enterprise data warehouse</strong> should:</p>
<ul>
<li>Manage data to high standards of <strong>accuracy, consistency, cleanliness, clarity, and security.</strong></li>
<li>Manage <strong>all the data in your organization.</strong></li>
</ul>
<p style="margin-bottom: 0in;"><strong>Pick ONE.<span id="more-1883"></span></strong></p>
<p style="margin-bottom: 0in;">There&#8217;s little to dislike in the enterprise data warehouse dream, as represented (for example) in this <a href="http://www.teradata.com/library/pdf/TD_Mag_1Q_2004_Insert.pdf" onclick="javascript:pageTracker._trackPageview('/www.teradata.com');">2004 <em>Teradata Magazine</em> article</a>. But in a world where ever more data comes in from ever more sources – and is needed <em>ever faster</em> – it simply isn&#8217;t realistic to expect that all an enterprise&#8217;s data will be vetted, organized, and managed to the highest of standards.</p>
<p style="margin-bottom: 0in;">This is a core premise of <a href="http://www.dbms2.com/2010/04/12/greenplumchorus/" >Greenplum&#8217;s Enterprise Data Cloud (EDC)/Chorus</a> marketing initiative, and in that respect Greenplum is correct.</p>
<p style="margin-bottom: 0in;">If the EDW is a great idea that can never be 100% implemented, what should you do? At conventional enterprises, the answer is pretty obvious: <strong>Manage some of your data to enterprise data warehouse standards, but not all of it. </strong>Specifically, <strong>your highest-value data should be in something that looks like a classic enterprise data warehouse, and your lower-value data shouldn&#8217;t.</strong></p>
<p style="margin-bottom: 0in;">Of course, if you&#8217;re a data mart outsourcer or other analytic service provider, whose data is about your customers&#8217; businesses rather than your own, and whose business is managing your customers&#8217; data, this may not apply to you. But otherwise it&#8217;s a position with many supporting arguments, including:</p>
<ul>
<li><strong>Financial reporting, compliance, and other legitimate concerns introduce rigidity into data models</strong>. This increases the cost and reduces the speed of getting data into enterprise data warehouses.</li>
<li><strong>Data governance procedures </strong>imposed for any other business purpose have the same effect. What&#8217;s deemed necessary for enterprise data warehouses can be fatal to timely analytics.</li>
<li>The <strong>highest-value data</strong> typically <strong>comes from transactional systems, </strong>such as order entry or sales contact management. So it starts out with a degree of governance that, say, web log files may never enjoy.</li>
<li>In some enterprises, it is affordable or even cost-effective to manage your highest-value data in your favorite big-brand DBMS, but necessary to manage most of your data in something with lower TCO (Total Cost of Ownership). <strong>Big-brand OLTP DBMS are often better </strong>(or at least less bad) <strong>at </strong>managing <strong>enterprise data warehouses than </strong>they are <strong>at </strong>running<strong> data mart workloads.</strong></li>
<li>At certain enterprise and database sizes, it may indeed make sense to run what amounts to an <strong>enterprise data warehouse out of the same database instance that does OLTP,</strong> while putting larger data sets into more cost-effective data marts. A trend to “operational BI” may actually make that option more appealing going forward than it has been in the past.</li>
<li>And finally, there&#8217;s the empirical fact that <strong>not one really large enterprise on the whole planet has a true, perfectly comprehensive enterprise data warehouse. </strong>At least, I&#8217;ve never heard of one.</li>
</ul>
<p><em><strong>Related links</strong></em></p>
<ul>
<li>Even <a href="http://www.dbms2.com/2008/10/23/teradata-appliance-product-lines/" >Teradata doesn&#8217;t push an EDW-only strategy</a> any more</li>
<li>I agreed when Greenplum first started pushing the EDC idea that something like it would be <a href="http://www.dbms2.com/2009/06/08/the-future-of-data-marts/" >the future of data marts</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/04/12/enterprise-data-warehouse-edw-myt/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Some business trends in the data warehouse market</title>
		<link>http://www.dbms2.com/2010/03/19/some-business-trends-in-the-data-warehouse-market/</link>
		<comments>http://www.dbms2.com/2010/03/19/some-business-trends-in-the-data-warehouse-market/#comments</comments>
		<pubDate>Fri, 19 Mar 2010 13:48:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1741</guid>
		<description><![CDATA[In recent conversations with various analytic DBMS vendors, a fairly consistent picture has emerged.

Business is strong. Multiple vendors claim to be going gangbusters, with the happy sounds coming out of Vertica and Infobright being echoed by several competitors. Hearsay suggests 	some other companies in related businesses are doing well too. 	Depending on who you talk [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">In recent conversations with various analytic DBMS vendors, a fairly consistent picture has emerged.</p>
<ul>
<li><strong>Business is strong.</strong> Multiple vendors claim to be going gangbusters, with the happy sounds coming out of <a href="../2010/03/19/vertica-update-4/">Vertica</a> and <a href="../2010/03/19/infobright-blog-update/">Infobright</a> being echoed by several competitors. Hearsay suggests some other companies in related businesses are doing well too. Depending on who you talk to, the business pickup dates back to Q4, give or take a quarter.</li>
<li><strong>Oracle Exadata has become a formidable competitor,</strong> on the strength of Exadata 2. Exadata 2&#8217;s positioning and perception among Oracle users seem to be pretty much in line with <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >what Oracle portrayed to me</a>.</li>
<li><strong>Teradata is portrayed as a weak competitor.</strong> Competitors don&#8217;t worry about Teradata nearly as much as they do about Oracle. That said, I suspect a bit of wishful thinking; Teradata is clearly still getting a lot of business the other vendors would dearly love to have.</li>
<li><strong>HP Neoview is reeling.</strong> (Almost) nobody sees Neoview competitively. The Walmart Neoview installation is said to have stayed small at best. JP Morgan Chase is said to have completely thrown Neoview out (and a bunch of HP engineers with it).</li>
<li><strong>(Almost) nobody mentions competing against DB2</strong> either. This continues to baffle me.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/19/some-business-trends-in-the-data-warehouse-market/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>February 2010 data warehouse DBMS news roundup</title>
		<link>http://www.dbms2.com/2010/02/22/data-warehouse-dbms-news-roundup/</link>
		<comments>http://www.dbms2.com/2010/02/22/data-warehouse-dbms-news-roundup/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 08:30:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1628</guid>
		<description><![CDATA[February is usually a busy month for data warehouse DBMS product releases, product announcements, and other real or contrived data warehouse DBMS news, and it can get pretty confusing trying to keep those categories of “news” apart.*  This year is no exception, although several vendors – including Teradata and Netezza – are taking “rolling thunder” [...]]]></description>
			<content:encoded><![CDATA[<p>February is usually a busy month for data warehouse DBMS product releases, product announcements, and other real or contrived data warehouse DBMS news, and it can get pretty confusing trying to keep those categories of “news” apart.*  This year is no exception, although several vendors – including Teradata and Netezza – are taking “rolling thunder” approaches, doing some of their announcements this month while holding others back for March or April.</p>
<p><em>*I probably have it worse than most people in that regard, because my clients run tentative feature lists and announcement schedules by me well in advance, which may get changed multiple times before the final dates roll around. I also occasionally miss some detail, if it wasn&#8217;t in a pre-briefing but gets added at the end.</em></p>
<p>Anyhow, the three big themes of this month&#8217;s announcements are probably:</p>
<ul>
<li><strong>Integrating different kinds of analytic processing into databases and DBMS. </strong></li>
<li><strong>Taking advantage of hardware advances.</strong></li>
<li><strong>Playing catchup</strong> in areas where small vendors&#8217; products weren&#8217;t mature yet.</li>
</ul>
<p><span id="more-1628"></span>For example, the three biggest data warehouse DBMS product announcements this month are probably:</p>
<ul>
<li><strong>Aster Data nCluster 4.5.</strong> Much like its predecessor, <a href="../../../../../2009/10/30/aster-data-application-server-ncluster/">Aster Data nCluster 4.0</a>, the new <a href="http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/" >Aster Data nCluster 4.5</a> has a major focus on integrating analytics and database processing. This time, the emphasis is on application development tools and pre-built analytic packages. In addition, Aster&#8217;s management tool GUIs have been upgraded, building on catch-up functionality in Aster Data nCluster 4.0.</li>
<li><strong>Netezza&#8217;s “i” add-on to its existing TwinFin products.</strong> With <a href="../../../../../2010/02/22/netezza-twinfin/">Netezza TwinFin(i)</a>, Netezza becomes the second MPP RDBMS vendor with a comprehensive “Big Data Analytic Platform” kind of strategy. (Netezza would surely argue that it was the first, but that depends on how seriously one took <a href="../../../../../2007/09/27/the-netezza-developer-network/">Netezza&#8217;s prior attempt</a>.) Many of the details are different from Aster&#8217;s, of course, but the general philosophy is similar. So far, Netezza has announced one interesting proprietary library of analytic packages (for linear/matrix algebra), plus the port of 4,000 or so functions in open source libraries.</li>
<li><strong>Vertica 4.0.</strong> Vertica has had a highly innovative columnar DBMS architecture from the getgo, but at the cost of some restrictions or awkwardness in the relationship between data layout and SQL processing. Vertica says that <a href="../../../../../2010/02/22/vertica-4/">Vertica 4.0</a> fixes all that. In addition, it has some analytic processing enhancements, especially in the time series area, where Vertica doesn&#8217;t vigorously dispute that Sybase IQ previously had an advantage.</li>
</ul>
<p>In addition,</p>
<ul>
<li><strong>Teradata is announcing its Data Warehouse Appliance 2580, the successor to the Teradata 2550.</strong> This is purely a hardware refresh; Teradata&#8217;s hardware and software upgrades are not generally synced. The Teradata 2580 upgrades CPUs from Harpertown to Nehalem, includes 3X the RAM of its predecessor, and offers an option for 1 TB disks (thus lowering the bottom price/TB a lot, to $31K list).</li>
<li>Aster, Vertica, and ParAccel have all called attention to the fact that, if solid-state drives have interfaces like those of disk drives, and if a DBMS supports disk drives, then that DBMS supports solid-state drives as well. Aster and ParAccel, at least, have signaled that they each have at least one customer or prospect interested in Fusion I/O&#8217;s solid-state technology, especially in the retail sector. This is basically a hardware matter as well, and a big deal only for those who were somehow unaware of <a href="../../../../../2010/01/31/flash-pcmsolid-state-memory-disk/">the impending dominance of solid-state memory technology</a>.</li>
<li>Sybase announced its <a href="../../../../../2010/02/05/sybase-aleri-rap/">Aleri</a> acquisition earlier this month.</li>
<li>Various vendors have bragged about various rankings, awards, or benchmarks, or (sometimes less tediously) about last year&#8217;s sales results.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/22/data-warehouse-dbms-news-roundup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TwinFin(i) – Netezza&#8217;s version of a parallel analytic platform</title>
		<link>http://www.dbms2.com/2010/02/22/netezza-twinfin/</link>
		<comments>http://www.dbms2.com/2010/02/22/netezza-twinfin/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 08:21:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[SAS Institute]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1613</guid>
		<description><![CDATA[Much like Aster Data did in Aster 4.0 and now Aster 4.5, Netezza is announcing a general parallel big data analytic platform strategy. It is called Netezza TwinFin(i), it is a chargeable option for the Netezza TwinFin appliance, and many announced details are on the vague side, with Netezza promising more clarity at or before [...]]]></description>
			<content:encoded><![CDATA[<p>Much like Aster Data did in <a href="http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/" >Aster 4.0</a> and now <a href="http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/" >Aster 4.5</a>, Netezza is announcing a general parallel big data analytic platform strategy. It is called Netezza TwinFin(i), it is a chargeable option for the <a href="http://www.dbms2.com/2009/07/30/netezza-new-product-family/" >Netezza TwinFin</a> appliance, and many announced details are on the vague side, with Netezza promising more clarity at or before its Enzee Universe conference in June. At a high level, the Aster and Netezza approaches compare/contrast as follows:<span id="more-1613"></span></p>
<ul>
<li>Netezza&#8217;s software runs on well-designed proprietary hardware. Aster runs on hardware that&#8217;s more off-the-shelf.</li>
<li>Aster was first to ship, and will also be first to ship an IDE (Integrated Development Environment).</li>
<li>MapReduce is central to Aster&#8217;s approach. Netezza TwinFin(i) supports MapReduce too, specifically a Hadoop implementation, but I don&#8217;t get the sense that everything Netezza does is built on MapReduce underpinnings.</li>
<li>Both Aster and Netezza try to provide rich functionality for creating in-memory data structures parallel analytic programs can use. Both seem to let you escape from the pure relational-table paradigm more easily than, say, Teradata&#8217;s new persistent memory capabilities do.</li>
<li>Aster and Netezza have made different choices about what kinds of prebuilt analytic packages to offer. Netezza could actually leapfrog Aster in this regard, but let&#8217;s see where each vendor is by, say, mid-year. If you care about the details of built-in analytic functions, you really should consider executing non-disclosure agreements with both those companies.</li>
<li>Both Aster and Netezza stress that you can run analytic functions out-of-process, greatly reducing the chance that they crash the database. Netezza and, I&#8217;m pretty sure, Aster also retain the option of running in-process, which provides maximum performance. (In Netezza&#8217;s case C++ is the only in-process language supported, and I think Aster has a similar limitation.)</li>
<li>Like Aster, Netezza is integrating SQL queries and other analytic processing under the same workload management rubric.</li>
<li>Much like Aster, Netezza is tap-dancing by implying much richer forthcoming SAS support than anything currently announced. (The crunch-per-paragraph ratio in either vendor&#8217;s SAS-related press releases to date is distressingly low.)</li>
</ul>
<p>More specifically, here are some highlights of what I know, am guessing, and/or am allowed to say about Netezza TwinFin(i) at this time.</p>
<ul>
<li>The foundation for the analytic add-ons in Netezza TwinFin(i) is some sort of low-level “analytic executables.” Not understanding exactly what these are is my biggest area of confusion in the whole TwinFin(i) stack. Are they all C++, with everything translated into same? Is there Java all the way down as an alternative? (E.g., Hadoop is written in Java.) Anyhow, whatever it is, it&#8217;s surely a big improvement on <a href="../../../../../2007/09/27/the-netezza-developer-network/">Netezza&#8217;s prior Verilog-based generation of analytic extensibility technology</a>.</li>
<li>The announced list of languages supported in Netezza TwinFin(i) is Java, Python, Fortran, R, and C/C++. More are coming.</li>
<li>Netezza has named a lot of analytic functions it is adding, and is hinting at more to come. It has named <a href="http://cran.r-project.org/" onclick="javascript:pageTracker._trackPageview('/cran.r-project.org');">CRAN/R</a> and GNU libraries, saying those have 1900 or more functions each. Netezza has also built its own linear algebra library for TwinFin(i), called nzMatrix. And as previously noted, TwinFin(i) also boasts a Hadoop implementation.</li>
<li>I haven&#8217;t heard about much in the way of TwinFin(i)-specific IDE support.</li>
<li>I don&#8217;t really have details as to what kinds of in-memory data structures Netezza TwinFin(i) does or doesn&#8217;t support.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/22/netezza-twinfin/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Aster Data nCluster 4.5</title>
		<link>http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/</link>
		<comments>http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 08:20:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[SAS Institute]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1617</guid>
		<description><![CDATA[Like Vertica, Netezza, and Teradata, Aster is using this week to pre-announce a forthcoming product release, Aster Data nCluster 4.5. Aster is really hanging its identity on “Big Data Analytics” or some variant of that concept, and so the two major named parts of Aster nCluster 4.5 are:

Aster Data Analytic Foundation, a set of analytic [...]]]></description>
			<content:encoded><![CDATA[<p>Like <a href="http://www.dbms2.com/2010/02/22/vertica-4/" >Vertica</a>, <a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/" >Netezza</a>, and Teradata, Aster is using this week to pre-announce a forthcoming product release, Aster Data nCluster 4.5. Aster is really hanging its identity on “Big Data Analytics” or some variant of that concept, and so the two major named parts of Aster nCluster 4.5 are:</p>
<ul>
<li><strong>Aster Data Analytic Foundation,</strong> a set of analytic packages prebuilt in <a href="../2009/06/09/aster-data-nclustersql-mapreduce/">Aster&#8217;s SQL-MapReduce</a><strong></strong></li>
<li><strong>Aster Data Developer Express,</strong> an Eclipse-based IDE (Integrated Development Environment) for developing and testing applications built on Aster nCluster, Aster SQL-MapReduce, and Aster Data Analytic Foundation</li>
</ul>
<p>And in other Aster news:</p>
<ul>
<li>Along with the development GUI in Aster nCluster 4.5, there is also a new administrative GUI.</li>
<li>Aster has certified that nCluster works with Fusion I/O boards, because at least one retail industry prospect cares. However, that in no way means that arm&#8217;s-length Fusion I/O certification is Aster&#8217;s ultimate <a href="../2010/01/31/flash-pcmsolid-state-memory-disk/">solid-state memory</a> strategy.</li>
<li>I had the wrong impression about how far Aster/SAS integration has gotten. So far, it&#8217;s just at the connector level.</li>
</ul>
<p>Aster Data Developer Express evidently does some cool stuff, like providing some sort of parallelism testing right on your desktop. It also generates lots of stub code, saving humans from the tedium of doing that. Useful, obviously.</p>
<p>But mainly, I want to write about the analytic packages.<span id="more-1617"></span> I&#8217;m not convinced that they&#8217;re a big deal in themselves yet, or that a whole lot of person-months have gone into their combined development. Still, I think they provide a great indication of one direction in which analytic functionality is going. And by the way, Aster promises to release a lot more of that kind of thing over the next 12 months.</p>
<p>Aster&#8217;s flagship analytic package is <a href="../2009/02/10/aster-data-npath/">nPath</a>, which is like a <strong>regular expression matcher,</strong> but <strong>for (time) series of data</strong> rather than for character strings. The main use for nPath is in pulling specific kinds of event sequences out of web or network event logs. However, one could imagine uses in other sectors that focus on temporal or sequential data (e.g., trading, intelligence, other sensor analysis), should existing SQL- and/or CEP-based technologies not prove sufficiently flexible. Aster 4.5 adds some new aggregation capabilities around nPath.</p>
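<p>The regular-expression analogy can be sketched in a few lines of Python. This is neither nPath syntax nor Aster code; it only illustrates the idea by mapping each event type to a single letter (an invented encoding) and running an ordinary regex over the time-ordered sequence.</p>
<pre>
```python
import re

# Hypothetical one-letter codes for event types in a web log.
CODES = {"home": "H", "search": "S", "product": "P", "cart": "C", "checkout": "K"}

# Pattern: a search, then one or more product views, then add-to-cart.
PATTERN = re.compile(r"SP+C")


def find_sequences(events):
    """Return (start, end) index pairs of event runs matching PATTERN.

    `events` must already be in time order; `end` is exclusive.
    """
    encoded = "".join(CODES[e] for e in events)
    return [m.span() for m in PATTERN.finditer(encoded)]


if __name__ == "__main__":
    log = ["home", "search", "product", "product", "cart", "checkout"]
    print(find_sequences(log))
```
</pre>
<p>A real sequence matcher would also need to partition events by user or session and to match on column values rather than single letters, which this sketch leaves out.</p>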
<p>Other not-wholly-new packages in the Aster Data Analytic Foundation announcement are for <strong>sessionization</strong> (of clickstream data and the like) and <strong>tokenization </strong>(of text/character string data). While sessionization can be done in SQL, Aster thinks its MapReduce-based version is faster, since it doesn&#8217;t require self-joins. Makes sense. Aster&#8217;s tokenization sounds lame, however – text analytics in MapReduce tends to reinvent simplistic wheels for no clear reason, and Aster doesn&#8217;t seem to be an exception. (Aster would argue, however, that anything it does in SQL-MapReduce is more flexible than pure SQL or pure MapReduce alternatives.)</p>
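<p>For readers unfamiliar with sessionization, here is a minimal single-pass sketch in Python. The 30-minute timeout and the (user, timestamp) tuple layout are assumptions made for illustration, not details of Aster&#8217;s package. The point is that each click only needs to be compared with the previous click from the same user.</p>
<pre>
```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed)


def sessionize(clicks, timeout=SESSION_TIMEOUT):
    """Assign a per-user session number to each (user, timestamp) click.

    Each user's clicks are sorted once and walked in order, so no click
    table has to be joined to itself to find the preceding click.
    """
    by_user = defaultdict(list)
    for user, ts in clicks:
        by_user[user].append(ts)

    sessions = []
    for user, stamps in by_user.items():
        session_id = 0
        previous = None
        for ts in sorted(stamps):
            # A gap longer than the timeout starts a new session.
            if previous is not None and ts - previous > timeout:
                session_id += 1
            sessions.append((user, ts, session_id))
            previous = ts
    return sessions
```
</pre>
<p>Grouping by user first is also what makes the job easy to parallelize: each user&#8217;s clicks can be processed independently of everyone else&#8217;s.</p>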
<p>Another example of better-living-without-self-joins is Aster&#8217;s new <strong>market basket</strong> package. This lets you look at a set of point-of-sale data, pick a small integer N, and pull out all the sets of N things that were bought by the same person at the same time. I haven&#8217;t probed the claim in detail, but Aster implies there&#8217;s less combinatorial explosion in its approach than there is in the self-join alternative.</p>
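<p>As a rough illustration of the market basket idea (not Aster&#8217;s implementation, whose internals I have not seen), the Python sketch below counts every set of N distinct items that shows up in the same basket. The list-of-lists basket format is an assumption.</p>
<pre>
```python
from collections import Counter
from itertools import combinations


def itemsets(baskets, n):
    """Count every set of n distinct items that appears together in a basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives each itemset
        # a single canonical ordering.
        for combo in combinations(sorted(set(basket)), n):
            counts[combo] += 1
    return counts


if __name__ == "__main__":
    baskets = [["milk", "bread", "eggs"], ["bread", "milk"], ["eggs", "bread"]]
    for combo, count in itemsets(baskets, 2).most_common():
        print(combo, count)
```
</pre>
<p>Each basket is expanded on its own here. A SQL version would typically join the line-item table to itself N-1 times and then filter out duplicate orderings, which is where the extra intermediate rows come from.</p>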
<p><em>Note: Gartner highlighted self joins as a performance challenge in its recent </em><a href="../2010/02/10/gartner-magic-quadrant-data-warehouse-2009-2010/">Data Warehouse Magic Quadrant</a><em>.</em></p>
<p>Aster is also releasing a few <strong>statistical and general analytic functions</strong> &#8212; specifically (and I quote a slide):</p>
<ul>
<li>exponential moving average</li>
<li>weighted moving average</li>
<li>simple moving average</li>
<li>volume-weighted average price</li>
<li>correlation</li>
<li>linear regression</li>
<li>logistic regression</li>
<li>approximate_percentile</li>
<li>approximate_count_distinct</li>
</ul>
<p>The point of the last two items on the list is that if you set a non-zero tolerance for error, you can count things or order them into bins very efficiently (especially in terms of RAM) while being guaranteed not to exceed your error tolerance.</p>
<p><em>Note: One obvious inference from this list &#8212; which Aster gladly confirms &#8212; is that Aster has high hopes of selling to the financial services industry. </em></p>
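<p>For reference, the first four functions on that list have simple textbook definitions, sketched below in Python. These are generic formulas, not Aster&#8217;s code; details such as how the exponential average is seeded or how the weights are chosen are my assumptions and may differ in Aster&#8217;s versions.</p>
<pre>
```python
def simple_moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def weighted_moving_average(values, window):
    """Linearly weighted mean; the newest value in each window weighs most."""
    weights = range(1, window + 1)
    total = sum(weights)
    return [sum(w * v for w, v in zip(weights, values[i:i + window])) / total
            for i in range(len(values) - window + 1)]


def exponential_moving_average(values, alpha):
    """EMA with smoothing factor alpha, seeded with the first value.

    Assumes `values` is non-empty.
    """
    ema = [values[0]]
    for v in values[1:]:
        ema.append(alpha * v + (1 - alpha) * ema[-1])
    return ema


def vwap(trades):
    """Volume-weighted average price over (price, volume) pairs."""
    volume = sum(q for _, q in trades)
    return sum(p * q for p, q in trades) / volume
```
</pre>
<p>All four are single ordered passes over a series, so none of them is especially hard to compute; what a prebuilt package saves is the work of expressing them over partitioned, ordered data in a parallel system.</p>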
<p>Finally, Aster is releasing its first pure <strong>graph-analytic</strong> function, for finding the shortest path between a given pair of nodes.</p>
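<p>For an unweighted graph, the classic way to find a shortest path between two nodes is breadth-first search. The sketch below is a plain single-machine Python version with an assumed adjacency-list input; I don&#8217;t know which algorithm Aster&#8217;s function uses or whether it supports edge weights.</p>
<pre>
```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search over an adjacency-list dict.

    Returns the list of nodes on a shortest path, or None if the goal
    cannot be reached from the start.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
    print(shortest_path(graph, "a", "e"))
```
</pre>
<p>The hard part at scale is that the frontier at each step can span many nodes of a cluster, which is presumably why a packaged parallel version is worth announcing.</p>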
<p>While I had the Aster folks on the phone anyway, I also took the opportunity to ask about the Aster nCluster 4.0 capability to create fairly persistent non-relational in-memory data structures. Specifically, I asked whether different users could access the same in-memory structure, and was told that this is a little klugey but not too horrendous. That suggests Aster&#8217;s capability may be a strict superset of UDF-based (User-Defined Function) approaches to meeting the same need, at least from a functionality standpoint. However, ease of creating those in-memory structures may still be better in the more SQL/UDF-centric approach favored by Teradata.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
