<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Greenplum update &#8212; Release 3.3 and so on</title>
	<atom:link href="http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 17 Mar 2010 02:23:23 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Analytics Team &#187; Blog Archive &#187; Greenplum offers single node edition for free</title>
		<link>http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/comment-page-1/#comment-145739</link>
		<dc:creator>Analytics Team &#187; Blog Archive &#187; Greenplum offers single node edition for free</dc:creator>
		<pubDate>Sun, 25 Oct 2009 21:10:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=799#comment-145739</guid>
		<description>[...] finally, here&#8217;s some information about Greenplum&#8217;s pricing: either subscription or [...]</description>
		<content:encoded><![CDATA[<p>[...] finally, here&#8217;s some information about Greenplum&#8217;s pricing: either subscription or [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Johnson</title>
		<link>http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/comment-page-1/#comment-125542</link>
		<dc:creator>Paul Johnson</dc:creator>
		<pubDate>Tue, 16 Jun 2009 10:05:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=799#comment-125542</guid>
		<description>FYI, the later release of Dataupia also had no master node.</description>
		<content:encoded><![CDATA[<p>FYI, the later release of Dataupia also had no master node.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Glenn Davis</title>
		<link>http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/comment-page-1/#comment-124985</link>
		<dc:creator>Glenn Davis</dc:creator>
		<pubDate>Thu, 11 Jun 2009 08:38:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=799#comment-124985</guid>
		<description>Two points.

Ben Werther&#039;s comment shows a widespread misunderstanding about data and entropy. Bodies of data do not have entropy. Only models of data have entropy. Bear with me. Suppose you compress English text by Huffman-coding individual characters. You would get, say, 4 bits per character. Then compress the same data using Huffman-coded digrams; you&#039;ll get something lower, like 3 bits. So which is the entropy of the data? Neither! You have two figures but the data didn&#039;t change. What changed was the model of the data. That is a very important and not-always-recognized distinction because information theory does not deal with modeling -- only with encoding modeled data.

Now Kolmogorov complexity. That concept has little practical value in database compression because you can&#039;t use it to quantify anything. It&#039;s really philosophical more than anything else.</description>
		<content:encoded><![CDATA[<p>Two points.</p>
<p>Ben Werther&#8217;s comment shows a widespread misunderstanding about data and entropy. Bodies of data do not have entropy. Only models of data have entropy. Bear with me. Suppose you compress English text by Huffman-coding individual characters. You would get, say, 4 bits per character. Then compress the same data using Huffman-coded digrams; you&#8217;ll get something lower, like 3 bits. So which is the entropy of the data? Neither! You have two figures but the data didn&#8217;t change. What changed was the model of the data. That is a very important and not-always-recognized distinction because information theory does not deal with modeling &#8212; only with encoding modeled data.</p>
<p>Now Kolmogorov complexity. That concept has little practical value in database compression because you can&#8217;t use it to quantify anything. It&#8217;s really philosophical more than anything else.</p>
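<p>A minimal Python sketch of the same point, using empirical entropy estimates in place of actual Huffman codes (a swapped-in simplification; Huffman code lengths would come out somewhat higher, by less than one bit per symbol). The text stays fixed and only the model changes. The function names and the sample string are illustrative assumptions.</p>

```python
import math
from collections import Counter


def order0_entropy(text: str) -> float:
    """Bits per character if each character is modeled as independent."""
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())


def order1_entropy(text: str) -> float:
    """Bits per character if each character is modeled given the previous one.

    Pairs wrap around at the end so every character appears exactly once as a
    'previous' symbol; this keeps both estimates on the same distribution.
    """
    n = len(text)
    pairs = Counter((text[i], text[(i + 1) % n]) for i in range(n))
    prev_counts = Counter(text)
    # H(next | prev) = -sum over (a, b) of p(a, b) * log2(p(a, b) / p(a))
    return -sum(c / n * math.log2(c / prev_counts[a]) for (a, _b), c in pairs.items())


if __name__ == "__main__":
    sample = "the cat sat on the mat and the rat sat on the hat"
    print(f"order-0 model: {order0_entropy(sample):.3f} bits/char")
    print(f"order-1 model: {order1_entropy(sample):.3f} bits/char")
```

<p>Both numbers describe the same string. Mathematically the order-1 figure cannot exceed the order-0 one, because conditioning on the previous character can only reduce (or keep) the estimated uncertainty.</p>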
]]></content:encoded>
	</item>
	<item>
		<title>By: Per-terabyte pricing &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/comment-page-1/#comment-124728</link>
		<dc:creator>Per-terabyte pricing &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Tue, 09 Jun 2009 08:29:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=799#comment-124728</guid>
		<description>[...] DBMS vendors sometimes price per terabyte of user data.  Vertica&#8217;s list price is $100K/TB. Greenplum&#8217;s list price is $70K/TB. In practice, both offer substantial discounts, especially at higher volumes.  In both cases, this [...]</description>
		<content:encoded><![CDATA[<p>[...] DBMS vendors sometimes price per terabyte of user data.  Vertica&#8217;s list price is $100K/TB. Greenplum&#8217;s list price is $70K/TB. In practice, both offer substantial discounts, especially at higher volumes.  In both cases, this [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Sichi</title>
		<link>http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/comment-page-1/#comment-124558</link>
		<dc:creator>John Sichi</dc:creator>
		<pubDate>Mon, 08 Jun 2009 05:07:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=799#comment-124558</guid>
		<description>I have written up some commentary on compressing rows vs columns here:

http://thinkwaitfast.blogspot.com/2009/06/compressing-rows-vs-columns.html</description>
		<content:encoded><![CDATA[<p>I have written up some commentary on compressing rows vs columns here:</p>
<p><a href="http://thinkwaitfast.blogspot.com/2009/06/compressing-rows-vs-columns.html" onclick="javascript:pageTracker._trackPageview('/thinkwaitfast.blogspot.com');" rel="nofollow">http://thinkwaitfast.blogspot.com/2009/06/compressing-rows-vs-columns.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael McIntire</title>
		<link>http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/comment-page-1/#comment-124331</link>
		<dc:creator>Michael McIntire</dc:creator>
		<pubDate>Sat, 06 Jun 2009 16:59:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=799#comment-124331</guid>
		<description>A few points: we&#039;d like to see the master node concept go away altogether. The problem is that with a bigger system, the bottleneck becomes the number of threads/connections that can reasonably be maintained on the head node (inbound and internally), plus the cost of the resulting failover. Most customers with this problem will be using a PG session pooler, but that comes with its own problems. This problem is not unique to GP, and it is a very tough architectural and implementation problem; I think that, of the major vendors, only Teradata has solved it.

On the compression front, two factors ultimately influence how well compression works. The more structured the data, the more effective an auto-codification scheme like Vertica&#039;s. The more random and unknown the data, the more likely the standard block/dictionary schemes will work.</description>
		<content:encoded><![CDATA[<p>A few points: we&#8217;d like to see the master node concept go away altogether. The problem is that with a bigger system, the bottleneck becomes the number of threads/connections that can reasonably be maintained on the head node (inbound and internally), plus the cost of the resulting failover. Most customers with this problem will be using a PG session pooler, but that comes with its own problems. This problem is not unique to GP, and it is a very tough architectural and implementation problem; I think that, of the major vendors, only Teradata has solved it.</p>
<p>On the compression front, two factors ultimately influence how well compression works. The more structured the data, the more effective an auto-codification scheme like Vertica&#8217;s. The more random and unknown the data, the more likely the standard block/dictionary schemes will work.</p>
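<p>A minimal Python sketch of dictionary encoding, the simplest form of the codification idea, on a hypothetical low-cardinality column. It is not Vertica&#8217;s actual scheme; it only illustrates why a column with few distinct values needs few bits per value.</p>

```python
import math


def dictionary_encode(values):
    """Replace each value with a small integer code, assigned in first-seen order."""
    codes = {}
    encoded = []
    for v in values:
        # setdefault assigns the next unused code only the first time v is seen
        encoded.append(codes.setdefault(v, len(codes)))
    return codes, encoded


def bits_per_value(cardinality: int) -> int:
    """Fixed-width bits needed per code for a column with this many distinct values."""
    return max(1, math.ceil(math.log2(cardinality)))


if __name__ == "__main__":
    country = ["US", "DE", "US", "FR", "US", "DE"]  # hypothetical column
    codes, encoded = dictionary_encode(country)
    print(codes, encoded, bits_per_value(len(codes)), "bits per value")
```

<p>Highly structured data (few distinct values, repeated often) shrinks to a couple of bits per value this way. For random or high-cardinality data the dictionary itself grows toward the size of the data, which is where general-purpose block compression tends to do better.</p>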
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/comment-page-1/#comment-124253</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Sat, 06 Jun 2009 02:51:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=799#comment-124253</guid>
		<description>When you use &quot;entropy&quot; in the context of &quot;compression&quot;, do you basically mean &quot;Kolmogorov complexity&quot;?  Anyhow, how is it calculated in PRACTICE? I.e., how do you know what the theoretical maximum is for a given dataset?

Thanks,

CAM</description>
		<content:encoded><![CDATA[<p>When you use &#8220;entropy&#8221; in the context of &#8220;compression&#8221;, do you basically mean &#8220;Kolmogorov complexity&#8221;?  Anyhow, how is it calculated in PRACTICE? I.e., how do you know what the theoretical maximum is for a given dataset?</p>
<p>Thanks,</p>
<p>CAM</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ben Werther</title>
		<link>http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/comment-page-1/#comment-124224</link>
		<dc:creator>Ben Werther</dc:creator>
		<pubDate>Fri, 05 Jun 2009 20:30:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=799#comment-124224</guid>
		<description>There&#039;s an information-theoretical bound to how much any data can be compressed -- i.e. the entropy of the data. http://en.wikipedia.org/wiki/Entropy_(Information_theory)

In our lab testing we&#039;ve seen fast block-compression schemes achieve up to approx 2/3rds of the theoretical maximum compression rate for typical datasets. (i.e. if the theoretical max is 6x compression, the best fast compression schemes will achieve approx 4x compression). We see roughly the same compression (give or take 10-20%) if the data is laid out in rows vs an idealized columnar representation.

In other words, the storage layout of the data makes far less difference than people appreciate, and columnar storage doesn&#039;t provide any magic loophole to defeat entropy.</description>
		<content:encoded><![CDATA[<p>There&#8217;s an information-theoretical bound to how much any data can be compressed &#8212; i.e. the entropy of the data. <a href="http://en.wikipedia.org/wiki/Entropy_(Information_theory)" onclick="javascript:pageTracker._trackPageview('/en.wikipedia.org');" rel="nofollow">http://en.wikipedia.org/wiki/Entropy_(Information_theory)</a></p>
<p>In our lab testing we&#8217;ve seen fast block-compression schemes achieve up to approx 2/3rds of the theoretical maximum compression rate for typical datasets. (i.e. if the theoretical max is 6x compression, the best fast compression schemes will achieve approx 4x compression). We see roughly the same compression (give or take 10-20%) if the data is laid out in rows vs an idealized columnar representation.</p>
<p>In other words, the storage layout of the data makes far less difference than people appreciate, and columnar storage doesn&#8217;t provide any magic loophole to defeat entropy.</p>
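<p>On the arithmetic: two thirds of a 6x theoretical maximum is 6 &#215; 2/3 = 4x. Below is a rough Python sketch of the row-vs-column comparison, using zlib as a stand-in fast block compressor. The table, its columns, and the text serialization are hypothetical; real engines use binary formats and per-column encodings, so the ratios it prints say nothing about any particular product.</p>

```python
import random
import zlib


def make_rows(n: int, seed: int = 0):
    """Hypothetical event table: (user_id, country, http_status)."""
    rng = random.Random(seed)
    countries = ["US", "DE", "FR", "JP", "BR"]
    statuses = [200, 200, 200, 304, 404, 500]
    return [
        (rng.randrange(10_000), rng.choice(countries), rng.choice(statuses))
        for _ in range(n)
    ]


def row_layout(rows) -> bytes:
    """One line per row, fields separated by commas."""
    return "\n".join(f"{u},{c},{s}" for u, c, s in rows).encode()


def column_layout(rows) -> bytes:
    """One line per column, values separated by commas."""
    return "\n".join(",".join(map(str, col)) for col in zip(*rows)).encode()


def compressed_size(data: bytes, level: int = 6) -> int:
    """Size in bytes after zlib compression at the given level."""
    return len(zlib.compress(data, level))


if __name__ == "__main__":
    rows = make_rows(50_000)
    for name, blob in [("rows", row_layout(rows)), ("columns", column_layout(rows))]:
        size = compressed_size(blob)
        print(f"{name:8s} raw={len(blob):8d} compressed={size:8d} ratio={len(blob) / size:.2f}x")
```

<p>With this serialization both layouts hold the same values in the same number of bytes, so any difference in compressed size comes from the ordering alone.</p>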
]]></content:encoded>
	</item>
	<item>
		<title>By: Andy E</title>
		<link>http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/comment-page-1/#comment-124180</link>
		<dc:creator>Andy E</dc:creator>
		<pubDate>Fri, 05 Jun 2009 15:18:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=799#comment-124180</guid>
		<description>Greenplum&#039;s and Aster&#039;s compression is &quot;fairly close&quot; to columnar DBs? Someone should probably attempt to quantify &quot;fairly close&quot; or specify which columnar DB they&#039;re describing (not all columnar DBs compress equivalently).

One point of comparison: a couple of months ago, a now-Vertica customer benchmarked Vertica and one of the aforementioned DBs, and a deciding factor was the relative amount of storage hardware required. 1TB of web app event data compressed to 200GB in Vertica (80% reduction). The same data &quot;compressed&quot; to more than 1TB in the other. I think in the end, the competitor&#039;s DB was 8x larger than Vertica&#039;s physical DB size. 8x less storage = faster performance (less IO) and, more obviously, a lot less hardware when you&#039;re managing dozens of TBs (uncompressed) of data.

That&#039;s just one (pretty typical) data point. Compression results will vary based on the DBMSs compared and the type of data, as Curt has mentioned (see: http://www.dbms2.com/2008/09/24/vertica-finally-spells-out-its-compression-claims/ )</description>
		<content:encoded><![CDATA[<p>Greenplum&#8217;s and Aster&#8217;s compression is &#8220;fairly close&#8221; to columnar DBs? Someone should probably attempt to quantify &#8220;fairly close&#8221; or specify which columnar DB they&#8217;re describing (not all columnar DBs compress equivalently).</p>
<p>One point of comparison: a couple of months ago, a now-Vertica customer benchmarked Vertica and one of the aforementioned DBs, and a deciding factor was the relative amount of storage hardware required. 1TB of web app event data compressed to 200GB in Vertica (80% reduction). The same data &#8220;compressed&#8221; to more than 1TB in the other. I think in the end, the competitor&#8217;s DB was 8x larger than Vertica&#8217;s physical DB size. 8x less storage = faster performance (less IO) and, more obviously, a lot less hardware when you&#8217;re managing dozens of TBs (uncompressed) of data.</p>
<p>That&#8217;s just one (pretty typical) data point. Compression results will vary based on the DBMSs compared and the type of data, as Curt has mentioned (see: <a href="http://www.dbms2.com/2008/09/24/vertica-finally-spells-out-its-compression-claims/"  rel="nofollow">http://www.dbms2.com/2008/09/24/vertica-finally-spells-out-its-compression-claims/</a> )</p>
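<p>The figures in this comment reduce to simple arithmetic. A small Python sketch, assuming decimal units (1TB = 1000GB) and reading &#8220;8x larger&#8221; as 8 &#215; 200GB, which is an inference from the numbers above rather than a reported measurement:</p>

```python
def compression_stats(raw_gb: float, stored_gb: float) -> tuple[float, float]:
    """Return (compression ratio, percent size reduction) for raw vs stored size."""
    ratio = raw_gb / stored_gb
    reduction_pct = 100.0 * (raw_gb - stored_gb) / raw_gb
    return ratio, reduction_pct


if __name__ == "__main__":
    raw = 1000.0              # 1TB of web app event data
    vertica = 200.0           # reported stored size
    competitor = 8 * vertica  # implied by "8x larger"
    print(compression_stats(raw, vertica))
    print(compression_stats(raw, competitor))
```

<p>A 5x ratio is the same thing as an 80% reduction. A ratio below 1 (here 1600GB stored for 1000GB raw) means the stored form is larger than the input, consistent with &#8220;more than 1TB&#8221;.</p>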
]]></content:encoded>
	</item>
	<item>
		<title>By: Greenplum will be announcing some stuff &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/comment-page-1/#comment-124174</link>
		<dc:creator>Greenplum will be announcing some stuff &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Fri, 05 Jun 2009 13:18:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=799#comment-124174</guid>
		<description>[...] excepted &#8212; and so I&#8217;ll do a general, if slightly incomplete, Greenplum update in a separate post. [...]</description>
		<content:encoded><![CDATA[<p>[...] excepted &#8212; and so I&#8217;ll do a general, if slightly incomplete, Greenplum update in a separate post. [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

