<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Introduction to Greenplum and some compare/contrast</title>
	<atom:link href="http://www.dbms2.com/2006/08/12/greenplum-datallegro-netezza-comparison/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2006/08/12/greenplum-datallegro-netezza-comparison/</link>
	<description>Choices in data management and analysis</description>
	<pubDate>Sat, 17 May 2008 05:55:14 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
		<item>
		<title>By: Stuart Frost</title>
		<link>http://www.dbms2.com/2006/08/12/greenplum-datallegro-netezza-comparison/#comment-6891</link>
		<dc:creator>Stuart Frost</dc:creator>
		<pubDate>Thu, 21 Sep 2006 16:10:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2006/08/12/greenplum-datallegro-netezza-comparison/#comment-6891</guid>
		<description>Since we haven't yet seen Greenplum in any competitive situations, it's hard for me to comment in any detail. However, there are a few things I don't understand:

1. Shipping rows around between execution plan fragments sounds OK for small amounts of data, but with large volumes, it's far more efficient to move data around in large blocks to avoid the overhead of many small movements (especially on GigE). We've been able to handle all queries put before us, so I don't see any inherent advantages in terms of functionality.

2. The last time I looked, Postgres was multi-process and not multi-threaded and I don't think that's changed. My guess is that there's a lot of time spent waiting for rows with this kind of approach.

3. I don't understand why the approach described above leads to the conclusion that 'native indexing and highly optimized aggregations' can't be easily done with different architectures. We certainly manage it.

4. In any real-world DW, what use are bit-mapped indexes with cardinality of up to 10,000? We generally deal with tables of billions of rows and cardinality in the tens or hundreds of millions.

5. The network throughput numbers quoted by Sun don't make any sense. How do I get 1GBps through four GigE links? Each link will max out at 80MBps, so that gives only 320MBps. Also, even with a TOE, you'd see a lot of CPU load with that kind of data movement. With such light CPU power relative to the number of disks, how does the system scale under concurrency?

6. There's also a claim of one TB per minute table scans from ten servers floating around. I presume that's calculated by just multiplying Sun's claim of 2GBps disk throughput x 10 x 60. Seems unlikely that Postgres could get anywhere near that in practice with just two dual core CPUs. Even if a simple table scan could run at that speed (which I doubt), our experience with Postgres is that it's MUCH slower than Ingres when running actual queries. 
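
For anyone who wants to check the arithmetic in points 5 and 6, here is a rough sketch. It assumes decimal units (1 GB = 1000 MB, 1 TB = 1000 GB) and takes the 80MBps per-link ceiling and the 2GBps per-server disk figure at face value; the variable and function names are illustrative only and say nothing about how any real system behaves.

```python
# Back-of-the-envelope check of the throughput figures discussed above.
# Assumption: decimal units (1 GB = 1000 MB, 1 TB = 1000 GB).
# All names are illustrative; the inputs are the figures quoted in the comment.

GIGE_LINKS = 4
MB_PER_SEC_PER_LINK = 80            # practical per-link ceiling cited in point 5
CLAIMED_NETWORK_MB_PER_SEC = 1000   # the 1GBps network figure attributed to Sun

DISK_GB_PER_SEC_PER_SERVER = 2      # claimed disk throughput per server (point 6)
SERVERS = 10
SECONDS_PER_MINUTE = 60


def aggregate_network_mb_per_sec(links, mb_per_sec_per_link):
    """Best-case combined throughput if every link runs at its ceiling."""
    return links * mb_per_sec_per_link


def scan_tb_per_minute(gb_per_sec_per_server, servers):
    """Upper bound on scan rate if every server sustains its disk throughput."""
    return gb_per_sec_per_server * servers * SECONDS_PER_MINUTE / 1000


network = aggregate_network_mb_per_sec(GIGE_LINKS, MB_PER_SEC_PER_LINK)
shortfall = CLAIMED_NETWORK_MB_PER_SEC - network
scan = scan_tb_per_minute(DISK_GB_PER_SEC_PER_SERVER, SERVERS)

print(network)    # 320 MBps across four links
print(shortfall)  # 680 MBps short of the claimed 1GBps
print(scan)       # 1.2 TB per minute, a ceiling that ignores CPU cost
```

The second figure is only the raw disk ceiling: it matches the "one TB per minute" claim on paper, which is why the real question is whether the CPUs can keep up.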

Stuart
DATAllegro</description>
		<content:encoded><![CDATA[<p>Since we haven&#8217;t yet seen Greenplum in any competitive situations, it&#8217;s hard for me to comment in any detail. However, there are a few things I don&#8217;t understand:</p>
<p>1. Shipping rows around between execution plan fragments sounds OK for small amounts of data, but with large volumes, it&#8217;s far more efficient to move data around in large blocks to avoid the overhead of many small movements (especially on GigE). We&#8217;ve been able to handle all queries put before us, so I don&#8217;t see any inherent advantages in terms of functionality.</p>
<p>2. The last time I looked, Postgres was multi-process and not multi-threaded and I don&#8217;t think that&#8217;s changed. My guess is that there&#8217;s a lot of time spent waiting for rows with this kind of approach.</p>
<p>3. I don&#8217;t understand why the approach described above leads to the conclusion that &#8216;native indexing and highly optimized aggregations&#8217; can&#8217;t be easily done with different architectures. We certainly manage it.</p>
<p>4. In any real-world DW, what use are bit-mapped indexes with cardinality of up to 10,000? We generally deal with tables of billions of rows and cardinality in the tens or hundreds of millions.</p>
<p>5. The network throughput numbers quoted by Sun don&#8217;t make any sense. How do I get 1GBps through four GigE links? Each link will max out at 80MBps, so that gives only 320MBps. Also, even with a TOE, you&#8217;d see a lot of CPU load with that kind of data movement. With such light CPU power relative to the number of disks, how does the system scale under concurrency?</p>
<p>6. There&#8217;s also a claim of one TB per minute table scans from ten servers floating around. I presume that&#8217;s calculated by just multiplying Sun&#8217;s claim of 2GBps disk throughput x 10 x 60. Seems unlikely that Postgres could get anywhere near that in practice with just two dual core CPUs. Even if a simple table scan could run at that speed (which I doubt), our experience with Postgres is that it&#8217;s MUCH slower than Ingres when running actual queries. </p>
<p>Stuart<br />
DATAllegro</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Luke Lonergan</title>
		<link>http://www.dbms2.com/2006/08/12/greenplum-datallegro-netezza-comparison/#comment-3492</link>
		<dc:creator>Luke Lonergan</dc:creator>
		<pubDate>Sat, 12 Aug 2006 17:55:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2006/08/12/greenplum-datallegro-netezza-comparison/#comment-3492</guid>
		<description>Thanks Curt, this is a good introduction.  It's nice to have someone dig into the technology and find the differences.

WRT “query shipping”, I was actually referring to a simpler approach used by others, not ours.  My admittedly subtle, but I think important, point was that to support arbitrary DBMS work you have to get inside the database engine and implement optimization at the “execution plan” level, not the “query plan” level.  Rather than “repartitioning on the fly”, we pipeline rows through the interconnect among execution plan fragments in real time.  We do this because of the performance, generality and ease of adding future capabilities associated with a DBMS internal architecture.  I think this is critical and you should expect us to continue to have advantages like supporting a rich assortment of native indexing and highly optimized aggregations that can’t be done easily without our architecture.

Let's see if this sparks some comments!</description>
		<content:encoded><![CDATA[<p>Thanks Curt, this is a good introduction.  It&#8217;s nice to have someone dig into the technology and find the differences.</p>
<p>WRT “query shipping”, I was actually referring to a simpler approach used by others, not ours.  My admittedly subtle, but I think important, point was that to support arbitrary DBMS work you have to get inside the database engine and implement optimization at the “execution plan” level, not the “query plan” level.  Rather than “repartitioning on the fly”, we pipeline rows through the interconnect among execution plan fragments in real time.  We do this because of the performance, generality and ease of adding future capabilities associated with a DBMS internal architecture.  I think this is critical and you should expect us to continue to have advantages like supporting a rich assortment of native indexing and highly optimized aggregations that can’t be done easily without our architecture.</p>
<p>Let&#8217;s see if this sparks some comments!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Text Technologies&#187;Blog Archive &#187; Text mining into big data warehouses</title>
		<link>http://www.dbms2.com/2006/08/12/greenplum-datallegro-netezza-comparison/#comment-3488</link>
		<dc:creator>Text Technologies&#187;Blog Archive &#187; Text mining into big data warehouses</dc:creator>
		<pubDate>Sat, 12 Aug 2006 10:54:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2006/08/12/greenplum-datallegro-netezza-comparison/#comment-3488</guid>
		<description>[...] I previously noted that Attensity seemed to be putting a lot of emphasis on a partnership with Business Objects and Teradata, although due to vacations I&#8217;ve still failed to get anybody from Business Objects to give me their view of the relationship&#8217;s importance. Now Greenplum tells me that O&#8217;Reilly is using their system to support text mining (apparently via homegrown technology), although I wasn&#8217;t too clear on the details. I also got the sense Greenplum is doing more in text mining, but the details of that completely escaped me. [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] I previously noted that Attensity seemed to be putting a lot of emphasis on a partnership with Business Objects and Teradata, although due to vacations I&#8217;ve still failed to get anybody from Business Objects to give me their view of the relationship&#8217;s importance. Now Greenplum tells me that O&#8217;Reilly is using their system to support text mining (apparently via homegrown technology), although I wasn&#8217;t too clear on the details. I also got the sense Greenplum is doing more in text mining, but the details of that completely escaped me. [&#8230;]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
