<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: The secret sauce to Clearpace&#8217;s compression</title>
	<atom:link href="http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 16:57:09 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
	<item>
		<title>By: The Netezza and IBM DB2 approaches to compression &#124; DBMS 2 : DataBase Management System Services</title>
		<link>http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/#comment-203001</link>
		<dc:creator>The Netezza and IBM DB2 approaches to compression &#124; DBMS 2 : DataBase Management System Services</dc:creator>
		<pubDate>Tue, 11 Jan 2011 18:20:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=782#comment-203001</guid>
		<description>[...] Except for the 4096 values limit, that sounds at least as flexible as the Rainstor/Clearpace compression approach. [...]</description>
		<content:encoded><![CDATA[<p>[...] Except for the 4096 values limit, that sounds at least as flexible as the Rainstor/Clearpace compression approach. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sneakernet to the cloud &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/#comment-123310</link>
		<dc:creator>Sneakernet to the cloud &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Sat, 30 May 2009 03:06:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=782#comment-123310</guid>
		<description>[...] sending data to the cloud, you probably want to compress it to the max before sending. Clearpace&#8217;s new RainStor structured-data archiving service emphasizes that idea. RainStor marketing says cloud, [...]</description>
		<content:encoded><![CDATA[<p>[...] sending data to the cloud, you probably want to compress it to the max before sending. Clearpace&#8217;s new RainStor structured-data archiving service emphasizes that idea. RainStor marketing says cloud, [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andy Ben-Dyke</title>
		<link>http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/#comment-122210</link>
		<dc:creator>Andy Ben-Dyke</dc:creator>
		<pubDate>Wed, 20 May 2009 19:31:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=782#comment-122210</guid>
		<description>To clarify, Clearpace’s underlying technology uses a tree-based rather than columnar structure, with field- and pattern-level deduplication. When source data is loaded into NParchive, each record is stored as a series of pointers to the location of a single instance of a data value or pattern of data values. The NParchive data store comprises a tree-based structure that links the various instances of the patterns together to establish the data records. Each record is essentially an independent tree, but records can share leaves and branches. This approach typically delivers 40:1 compression when combined with the additional algorithmic and byte-level compression techniques employed by NParchive, and the original data records can still be reconstituted at any time.
 
NParchive’s tree-based approach provides the advantages of the columnar structure (column-level access and compression) but also allows additional compression to be applied (based upon “patterns” between columns).  Furthermore, the tree structure is used for in-memory querying, so the memory footprint is also significantly reduced. 
 
Take a look at this post http://tinyurl.com/qraffr if you want more information on NParchive’s compression techniques.</description>
		<content:encoded><![CDATA[<p>To clarify, Clearpace’s underlying technology uses a tree-based rather than columnar structure, with field- and pattern-level deduplication. When source data is loaded into NParchive, each record is stored as a series of pointers to the location of a single instance of a data value or pattern of data values. The NParchive data store comprises a tree-based structure that links the various instances of the patterns together to establish the data records. Each record is essentially an independent tree, but records can share leaves and branches. This approach typically delivers 40:1 compression when combined with the additional algorithmic and byte-level compression techniques employed by NParchive, and the original data records can still be reconstituted at any time.</p>
<p>NParchive’s tree-based approach provides the advantages of the columnar structure (column-level access and compression) but also allows additional compression to be applied (based upon “patterns” between columns).  Furthermore, the tree structure is used for in-memory querying, so the memory footprint is also significantly reduced. </p>
<p>Take a look at this post <a href="http://tinyurl.com/qraffr" rel="nofollow">http://tinyurl.com/qraffr</a> if you want more information on NParchive’s compression techniques.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: NParchive data compression - the secret sauce &#124; Clearpace Blog</title>
		<link>http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/#comment-122204</link>
		<dc:creator>NParchive data compression - the secret sauce &#124; Clearpace Blog</dc:creator>
		<pubDate>Wed, 20 May 2009 18:54:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=782#comment-122204</guid>
		<description>[...] patents and hundreds of man years of development.  However, following some commentary in a post by Curt Monash this week, I thought I’d shed some light on Clearpace’s “secret sauce”.  I’ve tried to [...]</description>
		<content:encoded><![CDATA[<p>[...] patents and hundreds of man years of development.  However, following some commentary in a post by Curt Monash this week, I thought I’d shed some light on Clearpace’s “secret sauce”.  I’ve tried to [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joydeep Sen Sarma</title>
		<link>http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/#comment-121878</link>
		<dc:creator>Joydeep Sen Sarma</dc:creator>
		<pubDate>Sun, 17 May 2009 08:08:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=782#comment-121878</guid>
		<description>This looks pretty interesting - log files are often heavily denormalized (since joins in warehouses are so expensive) - and the multi-column idea could work very well.

I have been playing around with S3 and EC2 lately - one of the things that struck me was that the cost of uploading data can also be non-trivial. Besides - if data is not uploaded in an optimally compressed manner - then the user needs CPU cycles to compress it by renting CPU in the cloud.

I think it would be very interesting if highly efficient compression could be applied right from the moment data originates - all the way to its final long-term store.</description>
		<content:encoded><![CDATA[<p>This looks pretty interesting &#8211; log files are often heavily denormalized (since joins in warehouses are so expensive) &#8211; and the multi-column idea could work very well.</p>
<p>I have been playing around with S3 and EC2 lately &#8211; one of the things that struck me was that the cost of uploading data can also be non-trivial. Besides &#8211; if data is not uploaded in an optimally compressed manner &#8211; then the user needs CPU cycles to compress it by renting CPU in the cloud.</p>
<p>I think it would be very interesting if highly efficient compression could be applied right from the moment data originates &#8211; all the way to its final long-term store.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/#comment-121468</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Thu, 14 May 2009 08:04:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=782#comment-121468</guid>
		<description>That looks like good stuff, Joe.  Thanks!</description>
		<content:encoded><![CDATA[<p>That looks like good stuff, Joe.  Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe Hellerstein</title>
		<link>http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/#comment-121452</link>
		<dc:creator>Joe Hellerstein</dc:creator>
		<pubDate>Thu, 14 May 2009 07:03:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=782#comment-121452</guid>
		<description>For a good technical discussion of how to trade row- and column-wise schemes for maximum compression, have a look at the IBM work on Blink.  In many cases they show you can approach the optimal compression rate (&lt;a href=&quot;http://en.wikipedia.org/wiki/Entropy_(Information_theory)&quot; rel=&quot;nofollow&quot;&gt;entropy&lt;/a&gt;) this way.  I like how they cut through marketing fog on columns vs. rows and focus on the technical meat of compression, and the costs of coding/decoding vs. I/O. 

See &lt;a href=&quot;http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4497414&quot; rel=&quot;nofollow&quot;&gt;this paper on Blink&lt;/a&gt; and &lt;a href=&quot;http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.102.9090&amp;rep=rep1&amp;type=pdf&quot; rel=&quot;nofollow&quot;&gt;their study/survey of various compression methods&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>For a good technical discussion of how to trade row- and column-wise schemes for maximum compression, have a look at the IBM work on Blink.  In many cases they show you can approach the optimal compression rate (<a href="http://en.wikipedia.org/wiki/Entropy_(Information_theory)" rel="nofollow">entropy</a>) this way.  I like how they cut through marketing fog on columns vs. rows and focus on the technical meat of compression, and the costs of coding/decoding vs. I/O. </p>
<p>See <a href="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4497414" rel="nofollow">this paper on Blink</a> and <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.102.9090&amp;rep=rep1&amp;type=pdf" rel="nofollow">their study/survey of various compression methods</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Introduction to Clearpace &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/#comment-121440</link>
		<dc:creator>Introduction to Clearpace &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Thu, 14 May 2009 05:52:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=782#comment-121440</guid>
		<description>[...] and deduping them. I&#8217;m still fuzzy on how that all works.  (Edit: I subsequently posted an explanation of that [...]</description>
		<content:encoded><![CDATA[<p>[...] and deduping them. I&#8217;m still fuzzy on how that all works.  (Edit: I subsequently posted an explanation of that [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

