<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Data warehouse load speeds in the spotlight</title>
	<atom:link href="http://www.dbms2.com/2008/12/02/data-warehouse-load-speeds-in-the-spotlight/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2008/12/02/data-warehouse-load-speeds-in-the-spotlight/</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 16:57:09 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
	<item>
		<title>By: While I&#8217;m venting about benchmarks &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2008/12/02/data-warehouse-load-speeds-in-the-spotlight/#comment-129665</link>
		<dc:creator>While I&#8217;m venting about benchmarks &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Wed, 08 Jul 2009 23:58:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=634#comment-129665</guid>
		<description>[...] last year, Vertica made hoo-hah about what it called a world-record data warehouse load speed benchmark.  I wrote at the time that this showed Vertica wasn&#8217;t painfully slow at loading, always a [...]</description>
		<content:encoded><![CDATA[<p>[...] last year, Vertica made hoo-hah about what it called a world-record data warehouse load speed benchmark.  I wrote at the time that this showed Vertica wasn&#8217;t painfully slow at loading, always a [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greenplum claims very fast load speeds, and Fox still throws away most of its MySpace data &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2008/12/02/data-warehouse-load-speeds-in-the-spotlight/#comment-113839</link>
		<dc:creator>Greenplum claims very fast load speeds, and Fox still throws away most of its MySpace data &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Fri, 20 Mar 2009 09:10:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=634#comment-113839</guid>
		<description>[...] warehouse load speeds are a contentious issue.  Vertica contrived a benchmark with a 5 1/2 terabyte/hour load rate.  Oracle has gotten dinged for very low load speeds, which then are hotly debated.  I was told [...]</description>
		<content:encoded><![CDATA[<p>[...] warehouse load speeds are a contentious issue.  Vertica contrived a benchmark with a 5 1/2 terabyte/hour load rate.  Oracle has gotten dinged for very low load speeds, which then are hotly debated.  I was told [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Expressor pre-announces a data loading benchmark leapfrog &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2008/12/02/data-warehouse-load-speeds-in-the-spotlight/#comment-106250</link>
		<dc:creator>Expressor pre-announces a data loading benchmark leapfrog &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Sun, 04 Jan 2009 18:22:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=634#comment-106250</guid>
		<description>[...] Software plans to blow the Vertica/Syncsort &#8220;benchmark&#8221; out of the water, to wit What I know already is that our numbers will between 7 and 8 min to load [...]</description>
		<content:encoded><![CDATA[<p>[...] Software plans to blow the Vertica/Syncsort &#8220;benchmark&#8221; out of the water, to wit What I know already is that our numbers will between 7 and 8 min to load [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cooper to logan &#187; Blog Archive &#187; the white zone if for immediate loading only</title>
		<link>http://www.dbms2.com/2008/12/02/data-warehouse-load-speeds-in-the-spotlight/#comment-106240</link>
		<dc:creator>cooper to logan &#187; Blog Archive &#187; the white zone if for immediate loading only</dc:creator>
		<pubDate>Sun, 04 Jan 2009 14:58:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=634#comment-106240</guid>
		<description>[...] There has been a renewed buzz in the data integration vendor world around the coveted tpc-h benchmark. A discussion about the latest can be found at dbms2. [...]</description>
		<content:encoded><![CDATA[<p>[...] There has been a renewed buzz in the data integration vendor world around the coveted tpc-h benchmark. A discussion about the latest can be found at dbms2. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: More from Vertica on data warehouse load speeds &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2008/12/02/data-warehouse-load-speeds-in-the-spotlight/#comment-106166</link>
		<dc:creator>More from Vertica on data warehouse load speeds &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Sat, 03 Jan 2009 05:37:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=634#comment-106166</guid>
		<description>[...] month, when Vertica releases its &#8220;benchmark&#8221; of data warehouse load speeds, I didn&#8217;t realize it had previously released some actual customer-experience load rates as [...]</description>
		<content:encoded><![CDATA[<p>[...] month, when Vertica releases its &#8220;benchmark&#8221; of data warehouse load speeds, I didn&#8217;t realize it had previously released some actual customer-experience load rates as [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ParAccel actually uses relatively little PostgreSQL code &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2008/12/02/data-warehouse-load-speeds-in-the-spotlight/#comment-105843</link>
		<dc:creator>ParAccel actually uses relatively little PostgreSQL code &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Tue, 30 Dec 2008 00:07:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=634#comment-105843</guid>
		<description>[...] I did get careless when I neglected to doublecheck something I already knew.  The conclusion of this post isn&#8217;t really consistent with what ParAccel told me way back in February, 2007 about how much [...]</description>
		<content:encoded><![CDATA[<p>[...] I did get careless when I neglected to doublecheck something I already knew.  The conclusion of this post isn&#8217;t really consistent with what ParAccel told me way back in February, 2007 about how much [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Omer Trajman</title>
		<link>http://www.dbms2.com/2008/12/02/data-warehouse-load-speeds-in-the-spotlight/#comment-103951</link>
		<dc:creator>Omer Trajman</dc:creator>
		<pubDate>Wed, 10 Dec 2008 20:05:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=634#comment-103951</guid>
		<description>These are great questions, and many are answered in the full benchmark report (http://www.vertica.com/php/pdfgateway?file=ETL-World-Record-Audit-Report.pdf).  I can clarify trickle vs. bulk loads in Vertica and in this benchmark, as well as what Vertica handles regarding real time and near real time.

The Vertica engine does not internally distinguish between trickle and bulk loads.  There are no indexes to turn on and off, no transaction log and no special bulk load modes.  Data is available for fast querying right after it commits.

We do differentiate between trickle and bulk loads using common industry terminology.  A trickle feed sends data to the database at a rate that accommodates concurrently running queries, and is typically continuous (24x7).  A bulk load assumes no queries for some period of time, so the data load consumes full resources.

This benchmark was designed to test bulk loads – sending data to the server as fast as possible.  At the end of the load, data was immediately available for fast querying.  We have numerous benchmarks (http://www.vertica.com/benchmarks) and customers (http://www.vertica.com/customers) that demonstrate trickle load capabilities.  In telecommunications and financial services, trickle loads are a critical feature that prompted customers to deploy Vertica.

Note that Vertica by itself is designed for near-real-time analysis (delay in seconds), not real-time or CEP analysis.  We recently announced the deployment of StreamBase and Vertica at BlueCrest Capital to handle both streaming data and historical data (http://www.vertica.com/company/news_and_events/StreamBase-and-Vertica-Announce-Customer-Deployment-at-BlueCrest).</description>
		<content:encoded><![CDATA[<p>These are great questions, and many are answered in the full benchmark report (<a href="http://www.vertica.com/php/pdfgateway?file=ETL-World-Record-Audit-Report.pdf" rel="nofollow">http://www.vertica.com/php/pdfgateway?file=ETL-World-Record-Audit-Report.pdf</a>).  I can clarify trickle vs. bulk loads in Vertica and in this benchmark, as well as what Vertica handles regarding real time and near real time.</p>
<p>The Vertica engine does not internally distinguish between trickle and bulk loads.  There are no indexes to turn on and off, no transaction log and no special bulk load modes.  Data is available for fast querying right after it commits.</p>
<p>We do differentiate between trickle and bulk loads using common industry terminology.  A trickle feed sends data to the database at a rate that accommodates concurrently running queries, and is typically continuous (24&#215;7).  A bulk load assumes no queries for some period of time, so the data load consumes full resources.</p>
<p>This benchmark was designed to test bulk loads – sending data to the server as fast as possible.  At the end of the load, data was immediately available for fast querying.  We have numerous benchmarks (<a href="http://www.vertica.com/benchmarks" rel="nofollow">http://www.vertica.com/benchmarks</a>) and customers (<a href="http://www.vertica.com/customers" rel="nofollow">http://www.vertica.com/customers</a>) that demonstrate trickle load capabilities.  In telecommunications and financial services, trickle loads are a critical feature that prompted customers to deploy Vertica.</p>
<p>Note that Vertica by itself is designed for near-real-time analysis (delay in seconds), not real-time or CEP analysis.  We recently announced the deployment of StreamBase and Vertica at BlueCrest Capital to handle both streaming data and historical data (<a href="http://www.vertica.com/company/news_and_events/StreamBase-and-Vertica-Announce-Customer-Deployment-at-BlueCrest" rel="nofollow">http://www.vertica.com/company/news_and_events/StreamBase-and-Vertica-Announce-Customer-Deployment-at-BlueCrest</a>).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeremy Wong</title>
		<link>http://www.dbms2.com/2008/12/02/data-warehouse-load-speeds-in-the-spotlight/#comment-103390</link>
		<dc:creator>Jeremy Wong</dc:creator>
		<pubDate>Thu, 04 Dec 2008 06:14:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=634#comment-103390</guid>
		<description>ETL is the legacy world: extract the whole thing and then load it. Not really scalable.
The active data warehouse is the future, where only incremental transactional changes are loaded into the data warehouse. See DataMirror (now IBM), Shareplex, or Wisdomforce, which have successfully done that for years.</description>
		<content:encoded><![CDATA[<p>ETL is the legacy world: extract the whole thing and then load it. Not really scalable.<br />
The active data warehouse is the future, where only incremental transactional changes are loaded into the data warehouse. See DataMirror (now IBM), Shareplex, or Wisdomforce, which have successfully done that for years.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2008/12/02/data-warehouse-load-speeds-in-the-spotlight/#comment-103356</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Wed, 03 Dec 2008 19:02:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=634#comment-103356</guid>
		<description>Yep.  Definitely ETL.  Even mentioned that a small fraction of the records were deliberately erroneous.</description>
		<content:encoded><![CDATA[<p>Yep.  Definitely ETL.  Even mentioned that a small fraction of the records were deliberately erroneous.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Neil Raden</title>
		<link>http://www.dbms2.com/2008/12/02/data-warehouse-load-speeds-in-the-spotlight/#comment-103353</link>
		<dc:creator>Neil Raden</dc:creator>
		<pubDate>Wed, 03 Dec 2008 17:58:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=634#comment-103353</guid>
		<description>Seth, I haven&#039;t read the Vertica announcement yet, but my sense is that they were including ETL functions from Syncsort, not just streaming a neat flat file. I didn&#039;t see similar ETL comments in the other announcements.

We also used to use sort utilities, including Syncsort, to stage the files to improve bulk loading figures. So you have to look very carefully at these benchmarks.

-NR
twitter: nraden</description>
		<content:encoded><![CDATA[<p>Seth, I haven&#8217;t read the Vertica announcement yet, but my sense is that they were including ETL functions from Syncsort, not just streaming a neat flat file. I didn&#8217;t see similar ETL comments in the other announcements.</p>
<p>We also used to use sort utilities, including Syncsort, to stage the files to improve bulk loading figures. So you have to look very carefully at these benchmarks.</p>
<p>-NR<br />
twitter: nraden</p>
]]></content:encoded>
	</item>
</channel>
</rss>

