<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: History, focus, and technology of HP Neoview</title>
	<atom:link href="http://www.dbms2.com/2008/10/02/hp-neoview-technology-history/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2008/10/02/hp-neoview-technology-history/</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:22:14 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
	<item>
		<title>By: Database Virtualization = Location Transparency. Old Wine in a New Bottle? &#171; Share Virtual Machines</title>
		<link>http://www.dbms2.com/2008/10/02/hp-neoview-technology-history/#comment-109312</link>
		<dc:creator>Database Virtualization = Location Transparency. Old Wine in a New Bottle? &#171; Share Virtual Machines</dc:creator>
		<pubDate>Thu, 05 Feb 2009 06:59:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=590#comment-109312</guid>
		<description>[...] 2006. Oracle&#8217;s acquisition of TangoSol, Microsoft&#8217;s Project Velocity are following HP NeoView&#8217;s usage of distributed caches for solving large BI queries. Strictly speaking these are not [...]</description>
		<content:encoded><![CDATA[<p>[...] 2006. Oracle&#8217;s acquisition of TangoSol, Microsoft&#8217;s Project Velocity are following HP NeoView&#8217;s usage of distributed caches for solving large BI queries. Strictly speaking these are not [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Goetz Graefe</title>
		<link>http://www.dbms2.com/2008/10/02/hp-neoview-technology-history/#comment-108322</link>
		<dc:creator>Goetz Graefe</dc:creator>
		<pubDate>Tue, 27 Jan 2009 23:44:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=590#comment-108322</guid>
		<description>For what it&#039;s worth, the Cascades project was never associated with the University of Wisconsin - Madison. The only possible connection is that I got my degree there. I wrote the query optimizer code 1993-94 while on the faculty of Portland State University (in Oregon) and consulting for Tandem. In addition to the Tandem project (and now HP Neoview), the code also formed the foundation for query optimization in Microsoft SQL Server 7.0 and onwards.</description>
		<content:encoded><![CDATA[<p>For what it&#8217;s worth, the Cascades project was never associated with the University of Wisconsin &#8211; Madison. The only possible connection is that I got my degree there. I wrote the query optimizer code 1993-94 while on the faculty of Portland State University (in Oregon) and consulting for Tandem. In addition to the Tandem project (and now HP Neoview), the code also formed the foundation for query optimization in Microsoft SQL Server 7.0 and onwards.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tom Williams</title>
		<link>http://www.dbms2.com/2008/10/02/hp-neoview-technology-history/#comment-104377</link>
		<dc:creator>Tom Williams</dc:creator>
		<pubDate>Mon, 15 Dec 2008 04:42:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=590#comment-104377</guid>
		<description>It is a bit difficult to go through it in detail (and I did like your joke).

In short, most joins involve sorts and merges, which are expensive and can get very expensive when large data sets are involved.  There are only two join plans that provide linear scalability, the hash join and the hash merge join.  The hash merge join requires a hash-based file system (different from hash distribution). The hash join employs a similar technique but in memory.  The problem is that memory runs out quickly and is often used for other operations like buffering.  Teradata is the only RDBMS that provides the hash merge join.

So if I have to sort and merge large data sets, I really want the hash merge join available to the optimizer.</description>
		<content:encoded><![CDATA[<p>It is a bit difficult to go through it in detail (and I did like your joke).</p>
<p>In short, most joins involve sorts and merges, which are expensive and can get very expensive when large data sets are involved.  There are only two join plans that provide linear scalability, the hash join and the hash merge join.  The hash merge join requires a hash-based file system (different from hash distribution). The hash join employs a similar technique but in memory.  The problem is that memory runs out quickly and is often used for other operations like buffering.  Teradata is the only RDBMS that provides the hash merge join.</p>
<p>So if I have to sort and merge large data sets, I really want the hash merge join available to the optimizer.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2008/10/02/hp-neoview-technology-history/#comment-104270</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Sun, 14 Dec 2008 13:51:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=590#comment-104270</guid>
		<description>Tom,

I have a design that will ensure SUB-linear scalability, up to over a petabyte. On one terabyte of data, I&#039;ll throttle performance by a factor of 10.  On four terabytes, I&#039;ll throttle it only by a factor of 8 ... OK, I&#039;m kidding.  But to compare constant_1 times n vs. constant_2 times nlogn, it&#039;s interesting to know what constant_1 and constant_2 are.

More generally, I&#039;m confused by what you&#039;re saying. You seem to be assigning a single scalability function to all join plans on a particular product, no matter what strategy the particular query&#039;s execution plan uses.  Taken literally, that&#039;s totally absurd, and I&#039;m not guessing successfully at your actual and surely more sensible meaning.</description>
		<content:encoded><![CDATA[<p>Tom,</p>
<p>I have a design that will ensure SUB-linear scalability, up to over a petabyte. On one terabyte of data, I&#8217;ll throttle performance by a factor of 10.  On four terabytes, I&#8217;ll throttle it only by a factor of 8 &#8230; OK, I&#8217;m kidding.  But to compare constant_1 times n vs. constant_2 times nlogn, it&#8217;s interesting to know what constant_1 and constant_2 are.</p>
<p>More generally, I&#8217;m confused by what you&#8217;re saying. You seem to be assigning a single scalability function to all join plans on a particular product, no matter what strategy the particular query&#8217;s execution plan uses.  Taken literally, that&#8217;s totally absurd, and I&#8217;m not guessing successfully at your actual and surely more sensible meaning.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tom Williams</title>
		<link>http://www.dbms2.com/2008/10/02/hp-neoview-technology-history/#comment-104230</link>
		<dc:creator>Tom Williams</dc:creator>
		<pubDate>Sat, 13 Dec 2008 23:00:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=590#comment-104230</guid>
		<description>Which implementation besides Teradata provides linear scalability regardless of the size of the tables and the concurrent user level? From what I understand, the Oracle, IBM, and Neoview hash join plans are linear but depend on the availability of sufficient memory.  After that, their join plans are nlogn.

Linear scalability is very rare in computing and I&#039;d be interested in knowing if anyone besides Teradata provides it in their RDBMS.</description>
		<content:encoded><![CDATA[<p>Which implementation besides Teradata provides linear scalability regardless of the size of the tables and the concurrent user level? From what I understand, the Oracle, IBM, and Neoview hash join plans are linear but depend on the availability of sufficient memory.  After that, their join plans are nlogn.</p>
<p>Linear scalability is very rare in computing and I&#8217;d be interested in knowing if anyone besides Teradata provides it in their RDBMS.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2008/10/02/hp-neoview-technology-history/#comment-104054</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Fri, 12 Dec 2008 05:37:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=590#comment-104054</guid>
		<description>Tom,

Most of the row-based competitors can, as one implementation option, do a hash partition, forgo indexes, and expect the queries to be satisfied by table scans.

So I&#039;m not clear as to exactly what architectural point you are making that puts Teradata ahead of the newer guys, or for that matter that makes it impossible to use Oracle in the way that you described.

If all you&#039;re saying is that b-trees aren&#039;t the way to do decision support, and that the architectures of specialty products reflect this fact better than Oracle&#039;s does, I agree completely. But it looked as if you were going to an extreme that I don&#039;t see the foundation for.

CAM</description>
		<content:encoded><![CDATA[<p>Tom,</p>
<p>Most of the row-based competitors can, as one implementation option, do a hash partition, forgo indexes, and expect the queries to be satisfied by table scans.</p>
<p>So I&#8217;m not clear as to exactly what architectural point you are making that puts Teradata ahead of the newer guys, or for that matter that makes it impossible to use Oracle in the way that you described.</p>
<p>If all you&#8217;re saying is that b-trees aren&#8217;t the way to do decision support, and that the architectures of specialty products reflect this fact better than Oracle&#8217;s does, I agree completely. But it looked as if you were going to an extreme that I don&#8217;t see the foundation for.</p>
<p>CAM</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tom Williams</title>
		<link>http://www.dbms2.com/2008/10/02/hp-neoview-technology-history/#comment-104009</link>
		<dc:creator>Tom Williams</dc:creator>
		<pubDate>Thu, 11 Dec 2008 13:43:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=590#comment-104009</guid>
		<description>You also have to consider the join algorithms when evaluating a decision support RDBMS.  Teradata is the only vendor that can guarantee linear scalability, and it is because of the hash-based file system, which was built to solve decision support problems.  Oracle, IBM and Neoview are all deployed on a b-tree file system, which was designed for OLTP.  This forces them to use nlogn join algorithms when the queries involve very large tables or the concurrency level is high.</description>
		<content:encoded><![CDATA[<p>You also have to consider the join algorithms when evaluating a decision support RDBMS.  Teradata is the only vendor that can guarantee linear scalability, and it is because of the hash-based file system, which was built to solve decision support problems.  Oracle, IBM and Neoview are all deployed on a b-tree file system, which was designed for OLTP.  This forces them to use nlogn join algorithms when the queries involve very large tables or the concurrency level is high.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2008/10/02/hp-neoview-technology-history/#comment-98486</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Sat, 04 Oct 2008 02:12:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=590#comment-98486</guid>
		<description>Joe,

Erin McCabe recently joined HP&#039;s BI unit.  Expect better PR from them in the future! :)

Best,

CAM</description>
		<content:encoded><![CDATA[<p>Joe,</p>
<p>Erin McCabe recently joined HP&#8217;s BI unit.  Expect better PR from them in the future! <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Best,</p>
<p>CAM</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2008/10/02/hp-neoview-technology-history/#comment-98365</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Thu, 02 Oct 2008 18:35:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=590#comment-98365</guid>
		<description>Thanks, Glenn -- good points all!

CAM</description>
		<content:encoded><![CDATA[<p>Thanks, Glenn &#8212; good points all!</p>
<p>CAM</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Glenn Paulley</title>
		<link>http://www.dbms2.com/2008/10/02/hp-neoview-technology-history/#comment-98357</link>
		<dc:creator>Glenn Paulley</dc:creator>
		<pubDate>Thu, 02 Oct 2008 17:34:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=590#comment-98357</guid>
		<description>Some comments on a few of these technology points:

&quot;Expressions – I assume this means projects and selects – are done via a kind of byte code, on the CPU. Greg suggested Teradata uses a similar approach.&quot; Actually this pertains to the computation of any expression value in the engine, including aggregate functions, arithmetic functions, string functions, and so on. The idea behind using a byte-code machine is that the machine can, in principle, be &quot;compiled&quot; (optimized) at query build time to eliminate code that is unnecessary for this computation in this particular context. Other systems, including Sybase SQL Anywhere, use this approach.

&quot;Neoview’s Cascades-based optimizer seems to be smart enough to, for example, do aggregations before joins when it makes sense. (Much the same is true of Aster Data’s optimizer.)&quot; - Pushing/pulling aggregation above/below a join was studied by a fellow graduate student, Paul Yan, at the University of Waterloo in the mid-1990s as part of his PhD thesis (under the direction of Paul Larson, now at Microsoft Research). As far as I know DB2 was the first product to incorporate these optimizations.

&quot;By the way, the basic idea behind Cascades – or at least Neoview’s version of it — is that it uses more heuristics than conventional cost-based optimizers do. That is, Neoview starts out with a candidate plan – perhaps derived in the usual way – and then considers variants on it.&quot; - It is difficult to know how much Neoview&#039;s implementation differs from other transformation-based optimizer implementations (such as Microsoft SQL Server) based on the Cascades framework (originally developed by Goetz Graefe, now at HP Labs). Every optimizer uses heuristics to reduce the size of the search space; whether or not one uses &quot;more&quot; heuristics over another is difficult to assess, because those assumptions are rarely documented, if even made public.</description>
		<content:encoded><![CDATA[<p>Some comments on a few of these technology points:</p>
<p>&#8220;Expressions – I assume this means projects and selects – are done via a kind of byte code, on the CPU. Greg suggested Teradata uses a similar approach.&#8221; Actually this pertains to the computation of any expression value in the engine, including aggregate functions, arithmetic functions, string functions, and so on. The idea behind using a byte-code machine is that the machine can, in principle, be &#8220;compiled&#8221; (optimized) at query build time to eliminate code that is unnecessary for this computation in this particular context. Other systems, including Sybase SQL Anywhere, use this approach.</p>
<p>&#8220;Neoview’s Cascades-based optimizer seems to be smart enough to, for example, do aggregations before joins when it makes sense. (Much the same is true of Aster Data’s optimizer.)&#8221; &#8211; Pushing/pulling aggregation above/below a join was studied by a fellow graduate student, Paul Yan, at the University of Waterloo in the mid-1990s as part of his PhD thesis (under the direction of Paul Larson, now at Microsoft Research). As far as I know DB2 was the first product to incorporate these optimizations.</p>
<p>&#8220;By the way, the basic idea behind Cascades – or at least Neoview’s version of it — is that it uses more heuristics than conventional cost-based optimizers do. That is, Neoview starts out with a candidate plan – perhaps derived in the usual way – and then considers variants on it.&#8221; &#8211; It is difficult to know how much Neoview&#8217;s implementation differs from other transformation-based optimizer implementations (such as Microsoft SQL Server) based on the Cascades framework (originally developed by Goetz Graefe, now at HP Labs). Every optimizer uses heuristics to reduce the size of the search space; whether or not one uses &#8220;more&#8221; heuristics over another is difficult to assess, because those assumptions are rarely documented, if even made public.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

