<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: eBay&#8217;s two enormous data warehouses</title>
	<atom:link href="http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Mon, 25 Jan 2010 14:39:21 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: SQL is Dead. Long Live SQL. : Dataspora Blog</title>
		<link>http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/comment-page-1/#comment-150618</link>
		<dc:creator>SQL is Dead. Long Live SQL. : Dataspora Blog</dc:creator>
		<pubDate>Wed, 25 Nov 2009 10:58:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=770#comment-150618</guid>
		<description>[...] not that relational databases can&#8217;t scale – in fact, they can and do scale to petabytes, as  those who know Fortune 500 enterprise computing can attest . The problem is that relational databases don&#8217;t scale  easily  – and require a lot of ETL [...]</description>
		<content:encoded><![CDATA[<p>[...] not that relational databases can&#8217;t scale – in fact, they can and do scale to petabytes, as  those who know Fortune 500 enterprise computing can attest . The problem is that relational databases don&#8217;t scale  easily  – and require a lot of ETL [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: BI-Quotient &#187; Blog Archive &#187; Greenplum, MapReduce, and Hadoop</title>
		<link>http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/comment-page-1/#comment-125870</link>
		<dc:creator>BI-Quotient &#187; Blog Archive &#187; Greenplum, MapReduce, and Hadoop</dc:creator>
		<pubDate>Thu, 18 Jun 2009 20:10:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=770#comment-125870</guid>
		<description>[...] 6.5 Petabytes of data eBay runs the world&#8217;s largest data warehouse on Greenplum. Facebook runs a 2 PB warehouse on [...]</description>
		<content:encoded><![CDATA[<p>[...] 6.5 Petabytes of data eBay runs the world&#8217;s largest data warehouse on Greenplum. Facebook runs a 2 PB warehouse on [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: More on Fox Interactive Media&#8217;s use of Greenplum &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/comment-page-1/#comment-124556</link>
		<dc:creator>More on Fox Interactive Media&#8217;s use of Greenplum &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Mon, 08 Jun 2009 04:48:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=770#comment-124556</guid>
		<description>[...] most important reference is probably its energetic advocate Fox Interactive Media, even ahead of much larger Greenplum user eBay, and notwithstanding Aster Data&#8217;s large presence in Fox subsidiary MySpace. I just ran across [...]</description>
		<content:encoded><![CDATA[<p>[...] most important reference is probably its energetic advocate Fox Interactive Media, even ahead of much larger Greenplum user eBay, and notwithstanding Aster Data&#8217;s large presence in Fox subsidiary MySpace. I just ran across [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew</title>
		<link>http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/comment-page-1/#comment-122616</link>
		<dc:creator>Andrew</dc:creator>
		<pubDate>Sun, 24 May 2009 01:34:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=770#comment-122616</guid>
		<description>Yahoo&#039;s main data warehouse was up to 3 petabytes compressed at the end of 2007.</description>
		<content:encoded><![CDATA[<p>Yahoo&#8217;s main data warehouse was up to 3 petabytes compressed at the end of 2007.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: blog.rbach.net - Server Sprawl Continues</title>
		<link>http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/comment-page-1/#comment-121856</link>
		<dc:creator>blog.rbach.net - Server Sprawl Continues</dc:creator>
		<pubDate>Sat, 16 May 2009 22:40:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=770#comment-121856</guid>
		<description>[...] million users on Skype, eBay has a massive data center infrastructure. The company houses more than 8.5 petabytes of data in huge data warehouses. We&#8217;re not certain what kind of server count this requires, but it&#8217;s certainly in the [...]</description>
		<content:encoded><![CDATA[<p>[...] million users on Skype, eBay has a massive data center infrastructure. The company houses more than 8.5 petabytes of data in huge data warehouses. We&#8217;re not certain what kind of server count this requires, but it&#8217;s certainly in the [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Diverging views on density, and some gimmes &#171; Data Beta</title>
		<link>http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/comment-page-1/#comment-121478</link>
		<dc:creator>Diverging views on density, and some gimmes &#171; Data Beta</dc:creator>
		<pubDate>Thu, 14 May 2009 08:33:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=770#comment-121478</guid>
		<description>[...] Monash posted that eBay hosts a 6.5 petabyte Greenplum database on 96 [...]</description>
		<content:encoded><![CDATA[<p>[...] Monash posted that eBay hosts a 6.5 petabyte Greenplum database on 96 [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Facebook, Hadoop, and Hive &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/comment-page-1/#comment-120954</link>
		<dc:creator>Facebook, Hadoop, and Hive &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Mon, 11 May 2009 08:34:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=770#comment-120954</guid>
		<description>[...] eBay has a 6 1/2 petabyte database running on Greenplum and a 2 1/2 petabyte enterprise data warehouse running on Teradata. [...]</description>
		<content:encoded><![CDATA[<p>[...] eBay has a 6 1/2 petabyte database running on Greenplum and a 2 1/2 petabyte enterprise data warehouse running on Teradata. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Rahn</title>
		<link>http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/comment-page-1/#comment-120489</link>
		<dc:creator>Greg Rahn</dc:creator>
		<pubDate>Fri, 08 May 2009 06:27:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=770#comment-120489</guid>
		<description>&lt;a href=&quot;#comment-120236&quot; rel=&quot;nofollow&quot;&gt;@anonymous&lt;/a&gt; 

I would agree with your interpretation of the dual/quad fibre channel [card]. To be honest, I would have never guessed anyone made a 4 port HBA but I guess LSI does: &lt;a href=&quot;http://www.lsi.com/storage_home/products_home/host_bus_adapters/fibre_channel_hbas/lsi7404xplc/&quot; rel=&quot;nofollow&quot;&gt;LSI7404XP-LC&lt;/a&gt; and given Teradata resells &lt;a href=&quot;http://www.lsi.com/storage_home/products_home/external_raid/index.html&quot; rel=&quot;nofollow&quot;&gt;LSI Engenio storage&lt;/a&gt; it is likely they use their HBAs also.  Given a 2 port 4GFC PCI-X HBA can deliver 80% of the slot bandwidth, it seems like a bit of a waste to go to 4 ports, at least for performance.  For connectivity, perhaps, which is why I believe Teradata may use it - for their cliques.

The other reason for my comment of a max of 1600MB/s per node is that the LSI Engenio 6998 array only does a max of 1600MB/s (&lt;a href=&quot;http://www.lsi.com/DistributionSystem/AssetDocument/Engenio7900%20overview.ppt&quot; rel=&quot;nofollow&quot;&gt;per LSI&#039;s presentation&lt;/a&gt;) and the 7900 array is quite new so it would seem doubtful that eBay uses that one.  They may have opted for the EMC DMX storage, but I would think that would be an extremely costly solution at 72 nodes.

I may soften up a bit on the I/O rate also.  Curt&#039;s comment &quot;&lt;em&gt;maybe that’s a peak when the workload is scan-heavy&lt;/em&gt;&quot; is probably correct.  Peak rate vs. sustained, I &lt;em&gt;could&lt;/em&gt; see that.  Maybe they do some light scans of some large de-normalized table making it peak out at 2GB/s per node.  But that number still seems quite high at 250MB/s per CPU core.  Doing group bys and aggregation I&#039;m sure that number drops fast.   

I think the interesting and unmentioned data point is how many hard drives are in this 72-node config to deliver this I/O number.  eBay&#039;s own &lt;a href=&quot;http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/#comment-119284&quot; rel=&quot;nofollow&quot;&gt;Michael McIntire reports&lt;/a&gt; that Teradata I/O is all random, so the throughput per drive is probably somewhere around 30MB/s (give or take).  My guess is that there are somewhere between 4500 and 5000 HDDs (around 64 HDDs per node).</description>
		<content:encoded><![CDATA[<p><a href="#comment-120236" rel="nofollow">@anonymous</a> </p>
<p>I would agree with your interpretation of the dual/quad fibre channel [card]. To be honest, I would have never guessed anyone made a 4 port HBA but I guess LSI does: <a href="http://www.lsi.com/storage_home/products_home/host_bus_adapters/fibre_channel_hbas/lsi7404xplc/" onclick="javascript:pageTracker._trackPageview('/www.lsi.com');" rel="nofollow">LSI7404XP-LC</a> and given Teradata resells <a href="http://www.lsi.com/storage_home/products_home/external_raid/index.html" onclick="javascript:pageTracker._trackPageview('/www.lsi.com');" rel="nofollow">LSI Engenio storage</a> it is likely they use their HBAs also.  Given a 2 port 4GFC PCI-X HBA can deliver 80% of the slot bandwidth, it seems like a bit of a waste to go to 4 ports, at least for performance.  For connectivity, perhaps, which is why I believe Teradata may use it &#8211; for their cliques.</p>
<p>The other reason for my comment of a max of 1600MB/s per node is that the LSI Engenio 6998 array only does a max of 1600MB/s (<a href="http://www.lsi.com/DistributionSystem/AssetDocument/Engenio7900%20overview.ppt" onclick="javascript:pageTracker._trackPageview('/www.lsi.com');" rel="nofollow">per LSI&#8217;s presentation</a>) and the 7900 array is quite new so it would seem doubtful that eBay uses that one.  They may have opted for the EMC DMX storage, but I would think that would be an extremely costly solution at 72 nodes.</p>
<p>I may soften up a bit on the I/O rate also.  Curt&#8217;s comment &#8220;<em>maybe that’s a peak when the workload is scan-heavy</em>&#8221; is probably correct.  Peak rate vs. sustained, I <em>could</em> see that.  Maybe they do some light scans of some large de-normalized table making it peak out at 2GB/s per node.  But that number still seems quite high at 250MB/s per CPU core.  Doing group bys and aggregation I&#8217;m sure that number drops fast.   </p>
<p>I think the interesting and unmentioned data point is how many hard drives are in this 72-node config to deliver this I/O number.  eBay&#8217;s own <a href="http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/#comment-119284"  rel="nofollow">Michael McIntire reports</a> that Teradata I/O is all random, so the throughput per drive is probably somewhere around 30MB/s (give or take).  My guess is that there are somewhere between 4500 and 5000 HDDs (around 64 HDDs per node).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Yan</title>
		<link>http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/comment-page-1/#comment-120267</link>
		<dc:creator>Yan</dc:creator>
		<pubDate>Wed, 06 May 2009 21:34:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=770#comment-120267</guid>
		<description>Great to open the discussion of VLDB.

Without regard to end-user query performance, the size of a DB is meaningless; at most it is data storage. Any system puts limits on a VLDB once both the size and the performance of the DB are considered. It would be nice to have some numbers for this.

Yan</description>
		<content:encoded><![CDATA[<p>Great to open the discussion of VLDB.</p>
<p>Without regard to end-user query performance, the size of a DB is meaningless; at most it is data storage. Any system puts limits on a VLDB once both the size and the performance of the DB are considered. It would be nice to have some numbers for this.</p>
<p>Yan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: anonymous</title>
		<link>http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/comment-page-1/#comment-120236</link>
		<dc:creator>anonymous</dc:creator>
		<pubDate>Wed, 06 May 2009 15:04:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=770#comment-120236</guid>
		<description>Greg Rahn notes &quot;Four 4GBFC would be 4 x 400MB/s = 1600MB/s max&quot;. 

I went to the link and looked up the specs on the Teradata 5550 node. The data sheet says there are 3 PCI-X slots. It also says I/O can be 4 Gb dual or quad fibre channel. My interpretation is dual- or quad-port PCI-X adapter cards. With 3 PCI-X slots, that means Teradata can have up to 12 Fibre Channel links per node (3 quad-port cards at 400 MB/s per link) for a theoretical bandwidth of 4.8 GB/s. The limiting factor is probably the three 133 MHz PCI-X buses, which are 1066 MB/s apiece, giving about 3.2 GB/s per node.

Mr Rahn also says &quot;I would also comment that 2 Quad Core Xeon 5400 series processors would not be able to do anything but a SELECT COUNT(*) and ingest data at 2GB/s (or even 1600MB/s).&quot; Yet eBay says they are doing it - and a lot more.</description>
		<content:encoded><![CDATA[<p>Greg Rahn notes &#8220;Four 4GBFC would be 4 x 400MB/s = 1600MB/s max&#8221;. </p>
<p>I went to the link and looked up the specs on the Teradata 5550 node. The data sheet says there are 3 PCI-X slots. It also says I/O can be 4 Gb dual or quad fibre channel. My interpretation is dual- or quad-port PCI-X adapter cards. With 3 PCI-X slots, that means Teradata can have up to 12 Fibre Channel links per node (3 quad-port cards at 400 MB/s per link) for a theoretical bandwidth of 4.8 GB/s. The limiting factor is probably the three 133 MHz PCI-X buses, which are 1066 MB/s apiece, giving about 3.2 GB/s per node.</p>
<p>Mr Rahn also says &#8220;I would also comment that 2 Quad Core Xeon 5400 series processors would not be able to do anything but a SELECT COUNT(*) and ingest data at 2GB/s (or even 1600MB/s).&#8221; Yet eBay says they are doing it &#8211; and a lot more.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

