<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Data warehouse storage options &#8212; cheap, expensive, or solid-state disk drives</title>
	<atom:link href="http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 04 Mar 2010 04:56:40 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/comment-page-1/#comment-121147</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Tue, 12 May 2009 14:22:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=768#comment-121147</guid>
		<description>Robert,

I understand the appeal of saying something like &quot;The reason we need to be aware of physical design is largely complex query performance. Complex query performance is an issue mainly because of I/O. If we have better storage technology, that problem goes away, and we can start ignoring physical design the way the theorists have always wanted us to.&quot;

But I think we&#039;re a long way from reaching that ideal, at best.  Data warehouses are BIG, and getting bigger.  They&#039;ll push the limits of hardware technology for a long time to come.</description>
		<content:encoded><![CDATA[<p>Robert,</p>
<p>I understand the appeal of saying something like &#8220;The reason we need to be aware of physical design is largely complex query performance. Complex query performance is an issue mainly because of I/O. If we have better storage technology, that problem goes away, and we can start ignoring physical design the way the theorists have always wanted us to.&#8221;</p>
<p>But I think we&#8217;re a long way from reaching that ideal, at best.  Data warehouses are BIG, and getting bigger.  They&#8217;ll push the limits of hardware technology for a long time to come.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robert Young</title>
		<link>http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/comment-page-1/#comment-121137</link>
		<dc:creator>Robert Young</dc:creator>
		<pubDate>Tue, 12 May 2009 12:48:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=768#comment-121137</guid>
		<description>Check AnandTech for reviews of SSDs.  The latest is from 20 March 2009 and deals explicitly with some of the issues here.  An earlier review dealt with the &quot;block&quot; write-versus-read issue.

The value of SSD is not going to be in highly redundant, flat-file style datastores (call them whatever you want); the price will be too high.  The value will be in highly normalized relational databases.  Now, in my opinion (which you can read, and I am not alone), SSD will be the motivator that merges OLTP back together with its various replicants.  SSD (of which the flash versions, both MLC and SLC, are only the latest low-end implementations; check Texas Memory Systems for one example of industrial-strength SSD) removes the join penalty from 3NF/4NF/5NF databases.  

The bottleneck will be finding folks with enough smarts to embrace (again) Dr. Codd&#039;s vision.  The XML folk are not that kind of folk.  My candidate is Larry Ellison, because the Oracle architecture, MVCC, is superior for OLTP (IBM finally just capitulated with EnterpriseDB).  With SSD, he can use the Oracle database, appropriately normalized, to support both without star or snowflake schemas.  A true one-stop solution.</description>
		<content:encoded><![CDATA[<p>Check AnandTech for reviews of SSDs.  The latest is from 20 March 2009 and deals explicitly with some of the issues here.  An earlier review dealt with the &#8220;block&#8221; write-versus-read issue.</p>
<p>The value of SSD is not going to be in highly redundant, flat-file style datastores (call them whatever you want); the price will be too high.  The value will be in highly normalized relational databases.  Now, in my opinion (which you can read, and I am not alone), SSD will be the motivator that merges OLTP back together with its various replicants.  SSD (of which the flash versions, both MLC and SLC, are only the latest low-end implementations; check Texas Memory Systems for one example of industrial-strength SSD) removes the join penalty from 3NF/4NF/5NF databases.  </p>
<p>The bottleneck will be finding folks with enough smarts to embrace (again) Dr. Codd&#8217;s vision.  The XML folk are not that kind of folk.  My candidate is Larry Ellison, because the Oracle architecture, MVCC, is superior for OLTP (IBM finally just capitulated with EnterpriseDB).  With SSD, he can use the Oracle database, appropriately normalized, to support both without star or snowflake schemas.  A true one-stop solution.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Callaghan</title>
		<link>http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/comment-page-1/#comment-120317</link>
		<dc:creator>Mark Callaghan</dc:creator>
		<pubDate>Thu, 07 May 2009 05:09:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=768#comment-120317</guid>
		<description>Curt,

I agree with you that workload has something to do with performance; ignore my poor wording. What I mean is that you won&#039;t get 10X more MB/s or IOPs from a 15k SAS disk than from a 7200 RPM SATA disk. Teradata has done clever things with track-aligned reads to optimize disk performance, and I would much rather read about that.</description>
		<content:encoded><![CDATA[<p>Curt,</p>
<p>I agree with you that workload has something to do with performance; ignore my poor wording. What I mean is that you won&#8217;t get 10X more MB/s or IOPs from a 15k SAS disk than from a 7200 RPM SATA disk. Teradata has done clever things with track-aligned reads to optimize disk performance, and I would much rather read about that.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/comment-page-1/#comment-119632</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Sat, 02 May 2009 22:42:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=768#comment-119632</guid>
		<description>Mark,

Maybe the eBay guys diagnosed their situation correctly and maybe they didn&#039;t, but I can&#039;t begin to fathom your basis for saying that workload has nothing to do with it.

CAM</description>
		<content:encoded><![CDATA[<p>Mark,</p>
<p>Maybe the eBay guys diagnosed their situation correctly and maybe they didn&#8217;t, but I can&#8217;t begin to fathom your basis for saying that workload has nothing to do with it.</p>
<p>CAM</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Callaghan</title>
		<link>http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/comment-page-1/#comment-119585</link>
		<dc:creator>Mark Callaghan</dc:creator>
		<pubDate>Sat, 02 May 2009 13:46:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=768#comment-119585</guid>
		<description>@Michael - you are the first person to ever claim that a 15k enterprise-grade SAS disk can do 10x more IOPs than a 7200 RPM consumer-grade SATA disk. Congratulations.

@Curt - workload has nothing to do with it. Oliver has made a controversial claim with no substantiation. That is marketing and nothing else.</description>
		<content:encoded><![CDATA[<p>@Michael &#8211; you are the first person to ever claim that a 15k enterprise-grade SAS disk can do 10x more IOPs than a 7200 RPM consumer-grade SATA disk. Congratulations.</p>
<p>@Curt &#8211; workload has nothing to do with it. Oliver has made a controversial claim with no substantiation. That is marketing and nothing else.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael McIntire</title>
		<link>http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/comment-page-1/#comment-119284</link>
		<dc:creator>Michael McIntire</dc:creator>
		<pubDate>Thu, 30 Apr 2009 17:44:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=768#comment-119284</guid>
		<description>All: the 10x numbers have to be read in light of the application behavior. As an application, Teradata uses a hash-distributed architecture, meaning that all IO (ALL) is random. Teradata also happens to exploit massive amounts of IO - essentially highly optimized software specifically designed to exploit best-in-class, brute-force hardware. 

When high session concurrency is factored into how the database operates, the result is a very large number of different IO paths in addition to the random placement of the data.  There are certainly efforts throughout the tech stack to physically colocate like data, but it is this combination of random IO and concurrency that causes much, much greater head movement. 

When you compute seek time and rotational time together in a 100% random block-read environment, 10x is simple to see. At 7200 RPM, a SATA drive takes about 2x longer to read a track of data (15000/7200 ≈ 2.1), and it is also likely to have several times more data per track, most of which is not needed for random IO. 

With SATA seek latency of ~4x that of the FC drives, the two combine for something larger than 10x... and this does not even count the compute and algorithmic issues, or where in the tech stack the computation occurs. This is a highly simplistic and not entirely realistic example, but it should illustrate the point.  

SATA disk systems compete very well in low-concurrency, large-sequential-block environments, which are the exact opposite of the Teradata environment.  So the 10x number being quoted here is not a surprise.</description>
		<content:encoded><![CDATA[<p>All: the 10x numbers have to be read in light of the application behavior. As an application, Teradata uses a hash-distributed architecture, meaning that all IO (ALL) is random. Teradata also happens to exploit massive amounts of IO &#8211; essentially highly optimized software specifically designed to exploit best-in-class, brute-force hardware. </p>
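<p>A minimal sketch of why hash distribution scatters I/O, assuming a hypothetical layout in which each row goes to the storage unit given by a hash of its key modulo the number of units. Python&#8217;s hashlib.md5 stands in for the row hash here; it is not Teradata&#8217;s actual hash function, and the unit count is arbitrary.</p>

```python
import hashlib


def unit_for_key(key: int, num_units: int) -> int:
    """Map a row key to a storage unit via a hash (hypothetical scheme, not Teradata's)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_units


def placement(keys, num_units: int) -> list:
    """Return the unit chosen for each key, in key order."""
    return [unit_for_key(k, num_units) for k in keys]


if __name__ == "__main__":
    # Ten consecutive keys: hash placement spreads them across units instead of
    # keeping them together, so a key range becomes scattered requests.
    print(placement(range(1000, 1010), num_units=8))
```

<p>The same property that balances data evenly across units is what keeps a contiguous key range from being read as one sequential run.</p>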
<p>When high session concurrency is factored into how the database operates, the result is a very large number of different IO paths in addition to the random placement of the data.  There are certainly efforts throughout the tech stack to physically colocate like data, but it is this combination of random IO and concurrency that causes much, much greater head movement. </p>
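<p>A toy model of the head-movement point, assuming a hypothetical disk addressed as a line of blocks, sessions that each read their own region sequentially, and requests serviced strictly round-robin in arrival order. Real drives and controllers reorder requests (elevator scheduling, command queuing), so this overstates the effect; it only shows why interleaving sequential streams destroys locality.</p>

```python
def interleaved_requests(num_sessions: int, blocks_per_session: int, region_size: int) -> list:
    """Round-robin interleave sequential reads from several sessions.
    Session s reads blocks starting at s * region_size (hypothetical layout)."""
    requests = []
    for i in range(blocks_per_session):
        for s in range(num_sessions):
            requests.append(s * region_size + i)
    return requests


def head_travel(requests: list, start: int = 0) -> int:
    """Total distance (in blocks) the head moves servicing requests in the given order."""
    travel = 0
    pos = start
    for addr in requests:
        travel += abs(addr - pos)
        pos = addr
    return travel


if __name__ == "__main__":
    for sessions in (1, 4):
        reqs = interleaved_requests(sessions, blocks_per_session=100, region_size=10_000)
        print(sessions, "session(s):", head_travel(reqs) / len(reqs), "blocks of travel per request")
```

<p>With one session the head moves about one block per request; with four interleaved sessions it moves thousands of blocks per request, even though each session on its own is reading sequentially.</p>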
<p>When you compute seek time and rotational time together in a 100% random block-read environment, 10x is simple to see. At 7200 RPM, a SATA drive takes about 2x longer to read a track of data (15000/7200 ≈ 2.1), and it is also likely to have several times more data per track, most of which is not needed for random IO. </p>
<p>With SATA seek latency of ~4x that of the FC drives, the two combine for something larger than 10x&#8230; and this does not even count the compute and algorithmic issues, or where in the tech stack the computation occurs. This is a highly simplistic and not entirely realistic example, but it should illustrate the point.  </p>
<p>SATA disk systems compete very well in low-concurrency, large-sequential-block environments, which are the exact opposite of the Teradata environment.  So the 10x number being quoted here is not a surprise.</p>
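<p>A sketch of the per-request seek-plus-rotation arithmetic, under the simplest service-time model: average seek plus half a revolution, with transfer time, queuing, and controller overhead ignored. The 3.5 ms seek for the 15k FC drive is an assumed figure, not one from the comment; the SATA seek is set to 4x that, following the comment&#8217;s ~4x ratio. This model covers one isolated random read only, not the concurrency and per-track effects the comment also relies on.</p>

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def random_read_ms(avg_seek_ms: float, rpm: float) -> float:
    """Service time for one small random read: average seek + average rotation.
    Transfer time, queuing and controller overhead are ignored."""
    return avg_seek_ms + rotational_latency_ms(rpm)


def random_iops(avg_seek_ms: float, rpm: float) -> float:
    """Random reads per second implied by that service time (one request at a time)."""
    return 1000.0 / random_read_ms(avg_seek_ms, rpm)


if __name__ == "__main__":
    # Assumed inputs, not measurements: 3.5 ms seek for 15k FC, 4x that for SATA.
    fc_seek = 3.5
    sata_seek = 4 * fc_seek
    fc = random_read_ms(fc_seek, 15_000)
    sata = random_read_ms(sata_seek, 7_200)
    print(f"FC 15k:    {fc:.2f} ms/read, {random_iops(fc_seek, 15_000):.0f} IOPS")
    print(f"SATA 7200: {sata:.2f} ms/read, {random_iops(sata_seek, 7_200):.0f} IOPS")
    print(f"per-read ratio: {sata / fc:.1f}x")
```

<p>Note that in this model seek and rotation add per request, so the single-read ratio depends heavily on the assumed seek times.</p>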
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/comment-page-1/#comment-119270</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Thu, 30 Apr 2009 14:53:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=768#comment-119270</guid>
		<description>Mark,

I don&#039;t understand why you&#039;re extrapolating from your home system to eBay&#039;s data warehouse.  Are the workloads similar?

In particular, Oliver tells me the problem usually doesn&#039;t arise when there&#039;s only one query running, especially if the query can be satisfied by quasi-sequential scans.  How many simultaneous queries did you run your test with?

Thanks,

CAM</description>
		<content:encoded><![CDATA[<p>Mark,</p>
<p>I don&#8217;t understand why you&#8217;re extrapolating from your home system to eBay&#8217;s data warehouse.  Are the workloads similar?</p>
<p>In particular, Oliver tells me the problem usually doesn&#8217;t arise when there&#8217;s only one query running, especially if the query can be satisfied by quasi-sequential scans.  How many simultaneous queries did you run your test with?</p>
<p>Thanks,</p>
<p>CAM</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cruppstahl&#8217;s blog &#187; Why cheap hard disks are slower than expensive disks</title>
		<link>http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/comment-page-1/#comment-119262</link>
		<dc:creator>cruppstahl&#8217;s blog &#187; Why cheap hard disks are slower than expensive disks</dc:creator>
		<pubDate>Thu, 30 Apr 2009 13:58:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=768#comment-119262</guid>
		<description>[...] This post is about hard disks and why cheap (SATA) hard disks are much slower than expensive ones (Fibre Channel/SAS). [...]</description>
		<content:encoded><![CDATA[<p>[...] This post is about hard disks and why cheap (SATA) hard disks are much slower than expensive ones (Fibre Channel/SAS). [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Callaghan</title>
		<link>http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/comment-page-1/#comment-119068</link>
		<dc:creator>Mark Callaghan</dc:creator>
		<pubDate>Wed, 29 Apr 2009 13:33:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=768#comment-119068</guid>
		<description>I am sure they are right as long as their claim includes &#039;could have&#039;. I get plenty of mail and email telling me I could be a lottery winner.

I get 100+ IOPs and 50 MB/second from consumer-grade 7200 RPM SATA disks at home, so a 15k disk needs to do 1000 IOPs and 500 MB/second to be 10X better, or else my cheap disk needs to do retries on almost every request. I don&#039;t think that is typical. Maybe they had a bad batch of disks or very old disks.

SMART monitoring on disks counts retries and other stats, and there have been a few large-scale studies based on this data. So the data is there, but it isn&#039;t easy to accumulate at large scale.

This paper is a good start and has references to other good papers.

http://labs.google.com/papers/disk_failures.pdf</description>
		<content:encoded><![CDATA[<p>I am sure they are right as long as their claim includes &#8216;could have&#8217;. I get plenty of mail and email telling me I could be a lottery winner.</p>
<p>I get 100+ IOPs and 50 MB/second from consumer-grade 7200 RPM SATA disks at home, so a 15k disk needs to do 1000 IOPs and 500 MB/second to be 10X better, or else my cheap disk needs to do retries on almost every request. I don&#8217;t think that is typical. Maybe they had a bad batch of disks or very old disks.</p>
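<p>The same arithmetic, spelled out as a sketch. The 100 IOPS and 50 MB/second baseline comes from the comment; treating every retry as costing one full I/O is an assumption made here for illustration.</p>

```python
def required_for_speedup(base_iops: float, base_mb_per_s: float, factor: float):
    """IOPS and MB/s another disk must deliver to be `factor` times the baseline."""
    return base_iops * factor, base_mb_per_s * factor


def effective_iops(raw_iops: float, attempts_per_request: float) -> float:
    """Useful IOPS when each request takes several attempts.
    Assumes (for illustration only) that every attempt costs one full I/O."""
    return raw_iops / attempts_per_request


if __name__ == "__main__":
    # Baseline from the comment: 100+ IOPS and 50 MB/s on a 7200 RPM SATA disk.
    iops, mbps = required_for_speedup(100, 50, 10)
    print(f"15k disk would need {iops} IOPS and {mbps} MB/s to be 10X better")
    # Equivalently, the SATA disk would have to average 10 attempts per request
    # (9 retries) to fall to a tenth of its raw rate.
    print(f"SATA useful IOPS at 10 attempts/request: {effective_iops(100, 10):.0f}")
```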
<p>SMART monitoring on disks counts retries and other stats, and there have been a few large-scale studies based on this data. So the data is there, but it isn&#8217;t easy to accumulate at large scale.</p>
<p>This paper is a good start and has references to other good papers.</p>
<p><a href="http://labs.google.com/papers/disk_failures.pdf" rel="nofollow">http://labs.google.com/papers/disk_failures.pdf</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/comment-page-1/#comment-118934</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Tue, 28 Apr 2009 19:27:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=768#comment-118934</guid>
		<description>Mark,

I don&#039;t have serious data beyond what I posted.  I&#039;m hoping other folks with information will jump into the discussion.</description>
		<content:encoded><![CDATA[<p>Mark,</p>
<p>I don&#8217;t have serious data beyond what I posted.  I&#8217;m hoping other folks with information will jump into the discussion.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

