<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: DATAllegro vs. Vertica and other columnar systems</title>
	<atom:link href="http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 16:57:09 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
	<item>
		<title>By: Kirk</title>
		<link>http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-110249</link>
		<dc:creator>Kirk</dc:creator>
		<pubDate>Fri, 13 Feb 2009 05:37:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-110249</guid>
		<description>Vectornova is completely COLUMNS, it&#039;s amazing!</description>
		<content:encoded><![CDATA[<p>Vectornova is completely COLUMNS, it&#8217;s amazing!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-107592</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Tue, 20 Jan 2009 10:31:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-107592</guid>
		<description>Never heard of VectorStar.

Looking at VectorNova&#039;s site, however, it sounds like VectorStar is based on arrays rather than columns, perhaps like http://www.dbms2.com/2006/10/04/sas-intelligence-storage/</description>
		<content:encoded><![CDATA[<p>Never heard of VectorStar.</p>
<p>Looking at VectorNova&#8217;s site, however, it sounds like VectorStar is based on arrays rather than columns, perhaps like <a href="http://www.dbms2.com/2006/10/04/sas-intelligence-storage/" rel="nofollow">http://www.dbms2.com/2006/10/04/sas-intelligence-storage/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hernan</title>
		<link>http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-107517</link>
		<dc:creator>Hernan</dc:creator>
		<pubDate>Mon, 19 Jan 2009 17:04:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-107517</guid>
		<description>Have you had a chance to examine VectorStar? I believe these Mexican developers have made great progress with their columnar data engine, which is almost unknown in the United States. www.vectornova.com</description>
		<content:encoded><![CDATA[<p>Have you had a chance to examine VectorStar? I believe these Mexican developers have made great progress with their columnar data engine, which is almost unknown in the United States. <a href="http://www.vectornova.com" rel="nofollow">http://www.vectornova.com</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Column stores vs. vertically-partitioned row stores &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-97311</link>
		<dc:creator>Column stores vs. vertically-partitioned row stores &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Fri, 12 Sep 2008 04:36:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-97311</guid>
		<description>[...] DBMS for efficient data warehousing, it isn&#8217;t necessarily dispositive for a comparison of columnar systems to data-warehouse-specialist row-based systems. The three reasons suggested for the poor performance of vertically-partitioned row stores [...]</description>
		<content:encoded><![CDATA[<p>[...] DBMS for efficient data warehousing, it isn&#8217;t necessarily dispositive for a comparison of columnar systems to data-warehouse-specialist row-based systems. The three reasons suggested for the poor performance of vertically-partitioned row stores [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: DBMS2 &#8212; DataBase Management System Services &#187; Blog Archive &#187; Three bold assertions by Mike Stonebraker</title>
		<link>http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-82933</link>
		<dc:creator>DBMS2 &#8212; DataBase Management System Services &#187; Blog Archive &#187; Three bold assertions by Mike Stonebraker</dc:creator>
		<pubDate>Fri, 25 Apr 2008 04:07:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-82933</guid>
		<description>[...] *For example, were Vertica&#8217;s competitors set up with vertical partitioning? [...]</description>
		<content:encoded><![CDATA[<p>[...] *For example, were Vertica&#8217;s competitors set up with vertical partitioning? [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Kanter</title>
		<link>http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-23177</link>
		<dc:creator>David Kanter</dc:creator>
		<pubDate>Tue, 27 Mar 2007 02:30:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-23177</guid>
		<description>Chuck,

Thanks for the clarification - that makes much more sense now.

I believe that in the scenario you describe (loading), the graph wouldn&#039;t have much value, whereas when you update (and hence must enforce consistency and coherency), it would be interesting.

DK</description>
		<content:encoded><![CDATA[<p>Chuck,</p>
<p>Thanks for the clarification &#8211; that makes much more sense now.</p>
<p>I believe that in the scenario you describe (loading), the graph wouldn&#8217;t have much value, whereas when you update (and hence must enforce consistency and coherency), it would be interesting.</p>
<p>DK</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chuck</title>
		<link>http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-22767</link>
		<dc:creator>Chuck</dc:creator>
		<pubDate>Fri, 23 Mar 2007 06:30:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-22767</guid>
		<description>David,

I don&#039;t think I follow the analogy.  There&#039;s a big distinction between loads and updates.  In inserts/loads, there&#039;s no &quot;coherency&quot; needed beyond knowing which data was part of what transaction (timestamping, as you say).  From the perspective of a node processing a load stream, each row loaded &quot;belongs&quot; on one of the nodes in the cluster, so it can be sent to that node directly and nobody else needs to know about it.  (If a query requests that row, the node processing the query knows where to look.)

So from the perspective of a node doing a load in a cluster of N nodes, it processes a stream of data and sends it to N-1 places (keeping 1/Nth for itself).  The node will also receive data from N-1 other nodes if they are processing loads.  So if all nodes are loading data at the same time, each node would expect to receive as much data as it sends.  So load speed scales linearly with the number of nodes as long as:
1. A node can handle N incoming load streams.  (This translates to a memory requirement, mostly.)
2. The interconnect can distribute all the data, full duplex, with all nodes talking to all others at once on the backplane.  (Currently this is true of relatively cheap GigE switches with 32 or 64 ports.)

I&#039;m sorry, but I don&#039;t understand the concept behind the graph.  It seems to me like for a given load size and frequency, either the system will keep up or it won&#039;t.  Wouldn&#039;t it be more interesting to know what the maximum load rate is (GB/min) for combinations of parameters like # of nodes in the cluster and # of GB in each load?</description>
		<content:encoded><![CDATA[<p>David,</p>
<p>I don&#8217;t think I follow the analogy.  There&#8217;s a big distinction between loads and updates.  In inserts/loads, there&#8217;s no &#8220;coherency&#8221; needed beyond knowing which data was part of what transaction (timestamping, as you say).  From the perspective of a node processing a load stream, each row loaded &#8220;belongs&#8221; on one of the nodes in the cluster, so it can be sent to that node directly and nobody else needs to know about it.  (If a query requests that row, the node processing the query knows where to look.)</p>
<p>So from the perspective of a node doing a load in a cluster of N nodes, it processes a stream of data and sends it to N-1 places (keeping 1/Nth for itself).  The node will also receive data from N-1 other nodes if they are processing loads.  So if all nodes are loading data at the same time, each node would expect to receive as much data as it sends.  So load speed scales linearly with the number of nodes as long as:<br />
1. A node can handle N incoming load streams.  (This translates to a memory requirement, mostly.)<br />
2. The interconnect can distribute all the data, full duplex, with all nodes talking to all others at once on the backplane.  (Currently this is true of relatively cheap GigE switches with 32 or 64 ports.)</p>
<p>I&#8217;m sorry, but I don&#8217;t understand the concept behind the graph.  It seems to me like for a given load size and frequency, either the system will keep up or it won&#8217;t.  Wouldn&#8217;t it be more interesting to know what the maximum load rate is (GB/min) for combinations of parameters like # of nodes in the cluster and # of GB in each load?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Kanter</title>
		<link>http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-22585</link>
		<dc:creator>David Kanter</dc:creator>
		<pubDate>Wed, 21 Mar 2007 19:02:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-22585</guid>
		<description>Chuck,

Thanks for the elaboration. Here&#039;s why I ask...

When I think about the problem from an architectural standpoint (computer architecture that is), it&#039;s exactly analogous to an update cache coherency policy.

The E/T steps are local, and those probably scale reasonably well.  What isn&#039;t going to scale is the 1:N communication of distributing the data out to every node, and the resulting acknowledgements.  You have to update all N nodes, and you can&#039;t really get around that.

IIRC, you guys use timestamping, so you don&#039;t have to redo any in-flight transactions because of an update.

As I said, what would be interesting would be a 3D graph of:

Data load size (x GB), data load frequency (Y loads/hour) and performance (Z seconds)

DK</description>
		<content:encoded><![CDATA[<p>Chuck,</p>
<p>Thanks for the elaboration. Here&#8217;s why I ask&#8230;</p>
<p>When I think about the problem from an architectural standpoint (computer architecture that is), it&#8217;s exactly analogous to an update cache coherency policy.</p>
<p>The E/T steps are local, and those probably scale reasonably well.  What isn&#8217;t going to scale is the 1:N communication of distributing the data out to every node, and the resulting acknowledgements.  You have to update all N nodes, and you can&#8217;t really get around that.</p>
<p>IIRC, you guys use timestamping, so you don&#8217;t have to redo any in-flight transactions because of an update.</p>
<p>As I said, what would be interesting would be a 3D graph of:</p>
<p>Data load size (x GB), data load frequency (Y loads/hour) and performance (Z seconds)</p>
<p>DK</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chuck</title>
		<link>http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-22551</link>
		<dc:creator>Chuck</dc:creator>
		<pubDate>Wed, 21 Mar 2007 13:42:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-22551</guid>
		<description>Agreed that 4 nodes is pretty small, but considering that Vertica is shared nothing and all load steps are local to the machine except data segmentation, load speed scales with number of nodes assisting in the load as long as each node has a data stream and you have a good switch.

We did see around 4x load speed compared to 1 node in this test, which is far more than we can say for a competing row store that uses a shared disk architecture and saw 2.5x.  Likewise, a competing shared-nothing row store without a parallel load feature didn&#039;t get anywhere near 4x on 4 nodes, as load saw 1x while index and MV build saw 4x.

Of course other products out there do it the same way we do and don&#039;t suffer these limitations, but I repeat my claim that there&#039;s no theoretical reason why a column store would be beaten on load performance.  Nor do we ever get beaten (on apples-to-apples hardware of course).

Stay tuned for more complete presentations on our numbers in an upcoming paper, as well as bigger cluster and data sizes.</description>
		<content:encoded><![CDATA[<p>Agreed that 4 nodes is pretty small, but considering that Vertica is shared nothing and all load steps are local to the machine except data segmentation, load speed scales with number of nodes assisting in the load as long as each node has a data stream and you have a good switch.</p>
<p>We did see around 4x load speed compared to 1 node in this test, which is far more than we can say for a competing row store that uses a shared disk architecture and saw 2.5x.  Likewise, a competing shared-nothing row store without a parallel load feature didn&#8217;t get anywhere near 4x on 4 nodes, as load saw 1x while index and MV build saw 4x.</p>
<p>Of course other products out there do it the same way we do and don&#8217;t suffer these limitations, but I repeat my claim that there&#8217;s no theoretical reason why a column store would be beaten on load performance.  Nor do we ever get beaten (on apples-to-apples hardware of course).</p>
<p>Stay tuned for more complete presentations on our numbers in an upcoming paper, as well as bigger cluster and data sizes.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: DBMS2 &#8212; DataBase Management System Services&#187;Blog Archive &#187; Compression in columnar data stores</title>
		<link>http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-22519</link>
		<dc:creator>DBMS2 &#8212; DataBase Management System Services&#187;Blog Archive &#187; Compression in columnar data stores</dc:creator>
		<pubDate>Wed, 21 Mar 2007 08:13:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/#comment-22519</guid>
		<description>[...] We have lively discussions going on columnar data stores vs. vertically partitioned row stores. Part is visible in the comment thread to a recent post. Other parts come in private comments from Stuart Frost of DATAllegro and Mike Stonebraker of Vertica et al. [...]</description>
		<content:encoded><![CDATA[<p>[...] We have lively discussions going on columnar data stores vs. vertically partitioned row stores. Part is visible in the comment thread to a recent post. Other parts come in private comments from Stuart Frost of DATAllegro and Mike Stonebraker of Vertica et al. [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

