<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: The TPC-H schema</title>
	<atom:link href="http://www.dbms2.com/2009/07/02/the-tpc-h-schema/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2009/07/02/the-tpc-h-schema/</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 04 Mar 2010 04:56:40 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/07/02/the-tpc-h-schema/comment-page-1/#comment-129258</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Mon, 06 Jul 2009 20:57:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=830#comment-129258</guid>
		<description>Hi Greg,

I&#039;m honestly not sure one way or the other about the mix of data warehouses in the real world.  There are examples of just about anything.  There&#039;s clearly a lot of potential value to the proposition &quot;Run with your old schema, but a lot faster, in addition to adding new tables and queries.&quot; But I can&#039;t think of a single case that has both the properties:

A.  Needs absolutely the most screaming tippy-top raw performance.
B.  Has a schema with no performance optimizations.

I just can&#039;t think of an actual real-world case that comes anywhere close to the tradeoffs and requirements of the TPC-H.</description>
		<content:encoded><![CDATA[<p>Hi Greg,</p>
<p>I&#8217;m honestly not sure one way or the other about the mix of data warehouses in the real world.  There are examples of just about anything.  There&#8217;s clearly a lot of potential value to the proposition &#8220;Run with your old schema, but a lot faster, in addition to adding new tables and queries.&#8221; But I can&#8217;t think of a single case that has both the properties:</p>
<p>A.  Needs absolutely the most screaming tippy-top raw performance.<br />
B.  Has a schema with no performance optimizations.</p>
<p>I just can&#8217;t think of an actual real-world case that comes anywhere close to the tradeoffs and requirements of the TPC-H.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Rahn</title>
		<link>http://www.dbms2.com/2009/07/02/the-tpc-h-schema/comment-page-1/#comment-129242</link>
		<dc:creator>Greg Rahn</dc:creator>
		<pubDate>Mon, 06 Jul 2009 15:57:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=830#comment-129242</guid>
		<description>@Curt

I believe you missed the question from my previous comment: Do you feel there is something fundamentally wrong with the TPC-H schema design?

To answer your question: &quot;Did I [Curt] understand correctly?&quot;. Based on your response, it would appear not.

I&#039;m struggling to understand how you go from my comment:&lt;blockquote&gt;...isn’t the reality that most data warehouses probably “suffer” from a less than academically perfect data model...&lt;/blockquote&gt; 
to your comment 
&lt;blockquote&gt;I think you’re hypothesizing sites that are foolishly wedded to theory...&lt;/blockquote&gt;  

Those seem like completely orthogonal thoughts to me. No?

In making my comment I was suggesting that if you indeed feel there is a better design for the TPC-H schema, that very well may be.  However, I believe that many existing data warehouse data models could be improved (in an academic sense), but the reality of the situation is that they exist as they are. Thus TPC-H, as is, is probably more representative of real-world data warehouses than an academically designed schema.

Hopefully that clears up your misunderstanding.</description>
		<content:encoded><![CDATA[<p>@Curt</p>
<p>I believe you missed the question from my previous comment: Do you feel there is something fundamentally wrong with the TPC-H schema design?</p>
<p>To answer your question: &#8220;Did I [Curt] understand correctly?&#8221;. Based on your response, it would appear not.</p>
<p>I&#8217;m struggling to understand how you go from my comment:</p>
<blockquote><p>&#8230;isn’t the reality that most data warehouses probably “suffer” from a less than academically perfect data model&#8230;</p></blockquote>
<p>to your comment </p>
<blockquote><p>I think you’re hypothesizing sites that are foolishly wedded to theory&#8230;</p></blockquote>
<p>Those seem like completely orthogonal thoughts to me. No?</p>
<p>In making my comment I was suggesting that if you indeed feel there is a better design for the TPC-H schema, that very well may be.  However, I believe that many existing data warehouse data models could be improved (in an academic sense), but the reality of the situation is that they exist as they are. Thus TPC-H, as is, is probably more representative of real-world data warehouses than an academically designed schema.</p>
<p>Hopefully that clears up your misunderstanding.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/07/02/the-tpc-h-schema/comment-page-1/#comment-129173</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Mon, 06 Jul 2009 05:53:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=830#comment-129173</guid>
		<description>Greg,

I talked about that snowflake-only claim with Daniel Abadi just this week at SIGMOD. He says that the CIO conversations really happened that way, and now calls it a bad sample.

As for the rest -- could you please fill in a few more details of your straw man? I think you&#039;re hypothesizing sites that are foolishly wedded to theory, have few challenges in update latency, have hugely demanding requirements in performance, and can&#039;t be bothered to look at more than a single benchmark number in evaluating a multimillion dollar purchase.  Did I understand correctly?</description>
		<content:encoded><![CDATA[<p>Greg,</p>
<p>I talked about that snowflake-only claim with Daniel Abadi just this week at SIGMOD. He says that the CIO conversations really happened that way, and now calls it a bad sample.</p>
<p>As for the rest &#8212; could you please fill in a few more details of your straw man? I think you&#8217;re hypothesizing sites that are foolishly wedded to theory, have few challenges in update latency, have hugely demanding requirements in performance, and can&#8217;t be bothered to look at more than a single benchmark number in evaluating a multimillion dollar purchase.  Did I understand correctly?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Rahn</title>
		<link>http://www.dbms2.com/2009/07/02/the-tpc-h-schema/comment-page-1/#comment-129167</link>
		<dc:creator>Greg Rahn</dc:creator>
		<pubDate>Mon, 06 Jul 2009 04:03:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=830#comment-129167</guid>
		<description>@Curt

In asking such a question are you suggesting there is something fundamentally wrong with the TPC-H schema?

While one could argue there are better ways to model the TPC-H schema, isn&#039;t the reality that most data warehouses probably &quot;suffer&quot; from a less than academically perfect data model?  I guess this is quite contrary to &lt;a href=&quot;http://www.cs.brown.edu/~ugur/osfa.pdf&quot; rel=&quot;nofollow&quot;&gt;the findings&lt;/a&gt; of Michael Stonebraker et al.
&lt;blockquote&gt;In interviewing about two dozen CIOs, the authors have never seen a warehouse that did not use a snowflake schema.&lt;/blockquote&gt;</description>
		<content:encoded><![CDATA[<p>@Curt</p>
<p>In asking such a question are you suggesting there is something fundamentally wrong with the TPC-H schema?</p>
<p>While one could argue there are better ways to model the TPC-H schema, isn&#8217;t the reality that most data warehouses probably &#8220;suffer&#8221; from a less than academically perfect data model?  I guess this is quite contrary to <a href="http://www.cs.brown.edu/~ugur/osfa.pdf" onclick="javascript:pageTracker._trackPageview('/www.cs.brown.edu');" rel="nofollow">the findings</a> of Michael Stonebraker et al.</p>
<blockquote><p>In interviewing about two dozen CIOs, the authors have never seen a warehouse that did not use a snowflake schema.</p></blockquote>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/07/02/the-tpc-h-schema/comment-page-1/#comment-128662</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Fri, 03 Jul 2009 08:15:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=830#comment-128662</guid>
		<description>Justin,

As per my various posts on database emulation/portability, vendors who boast such features tend to think they are much more important than customers do. :)</description>
		<content:encoded><![CDATA[<p>Justin,</p>
<p>As per my various posts on database emulation/portability, vendors who boast such features tend to think they are much more important than customers do. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin Swanhart</title>
		<link>http://www.dbms2.com/2009/07/02/the-tpc-h-schema/comment-page-1/#comment-128578</link>
		<dc:creator>Justin Swanhart</dc:creator>
		<pubDate>Fri, 03 Jul 2009 01:15:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=830#comment-128578</guid>
		<description>“Actually, I tend to frown on materialized views, on the level that if you need more than a very few of them, you’d probably be better off w/ a faster DBMS that doesn’t need as many and hence has much less of an administrative burden.”

I agree, but this is problematic if you want to keep your existing tools, scripts, etc.  You are kind of stuck because it is hard to change databases.  This is why Kickfire is great, because if you are already running MySQL, just about everything you are used to doing is going to work similarly or exactly the same on Kickfire.</description>
		<content:encoded><![CDATA[<p>“Actually, I tend to frown on materialized views, on the level that if you need more than a very few of them, you’d probably be better off w/ a faster DBMS that doesn’t need as many and hence has much less of an administrative burden.”</p>
<p>I agree, but this is problematic if you want to keep your existing tools, scripts, etc.  You are kind of stuck because it is hard to change databases.  This is why Kickfire is great, because if you are already running MySQL, just about everything you are used to doing is going to work similarly or exactly the same on Kickfire.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin Swanhart</title>
		<link>http://www.dbms2.com/2009/07/02/the-tpc-h-schema/comment-page-1/#comment-128577</link>
		<dc:creator>Justin Swanhart</dc:creator>
		<pubDate>Fri, 03 Jul 2009 01:11:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=830#comment-128577</guid>
		<description>It all depends on the system.  If you are using materialized views, it usually isn&#039;t for convenience, it is for performance.  Building a materialized view might take a long time, but in the long run, if you can amortize the cost of maintaining the view over time using incremental materialization, then it is time well spent.  I&#039;d rather run a query which takes 24 hours once, then spend 15 minutes per day maintaining it, than run it every day.

Very few databases and tools support incremental materialization though.  It is not a trivial problem. 

Another problem is actually using the mviews. If you have a tool like Mondrian which understands how to write queries to access the materialized data, then you are set.  Oracle supports materialized view rewrite, which does it automatically as long as you define hierarchies.  Otherwise you have to rewrite your queries to access the materializations, which is inconvenient at best.</description>
		<content:encoded><![CDATA[<p>It all depends on the system.  If you are using materialized views, it usually isn&#8217;t for convenience, it is for performance.  Building a materialized view might take a long time, but in the long run, if you can amortize the cost of maintaining the view over time using incremental materialization, then it is time well spent.  I&#8217;d rather run a query which takes 24 hours once, then spend 15 minutes per day maintaining it, than run it every day.</p>
<p>Very few databases and tools support incremental materialization though.  It is not a trivial problem. </p>
<p>Another problem is actually using the mviews. If you have a tool like Mondrian which understands how to write queries to access the materialized data, then you are set.  Oracle supports materialized view rewrite, which does it automatically as long as you define hierarchies.  Otherwise you have to rewrite your queries to access the materializations, which is inconvenient at best.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jerome Pineau</title>
		<link>http://www.dbms2.com/2009/07/02/the-tpc-h-schema/comment-page-1/#comment-128551</link>
		<dc:creator>Jerome Pineau</dc:creator>
		<pubDate>Thu, 02 Jul 2009 21:49:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=830#comment-128551</guid>
		<description>&quot;Actually, I tend to frown on materialized views, on the level that if you need more than a very few of them, you’d probably be better off w/ a faster DBMS that doesn’t need as many and hence has much less of an administrative burden.&quot;

Ohh I am going to quote this one :) In light of my just posted quip on http://jeromepineau.blogspot.com</description>
		<content:encoded><![CDATA[<p>&#8220;Actually, I tend to frown on materialized views, on the level that if you need more than a very few of them, you’d probably be better off w/ a faster DBMS that doesn’t need as many and hence has much less of an administrative burden.&#8221;</p>
<p>Ohh I am going to quote this one <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  In light of my just posted quip on <a href="http://jeromepineau.blogspot.com" onclick="javascript:pageTracker._trackPageview('/jeromepineau.blogspot.com');" rel="nofollow">http://jeromepineau.blogspot.com</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2009/07/02/the-tpc-h-schema/comment-page-1/#comment-128546</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Thu, 02 Jul 2009 20:37:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=830#comment-128546</guid>
		<description>Actually, I tend to frown on materialized views, on the level that if you need more than a very few of them, you&#039;d probably be better off w/ a faster DBMS that doesn&#039;t need as many and hence has much less of an administrative burden.

Similarly, in cases where Justin&#039;s critique is applicable, that would seem to imply establishing the MVs is VERY expensive. But an MV is really just a big query. So if running a big query is stupefyingly slow ... again, maybe you&#039;re on the wrong platform.

Infobright-like systems that automagically create quasi-MVs on the fly may be excused from part or all of this criticism ...</description>
		<content:encoded><![CDATA[<p>Actually, I tend to frown on materialized views, on the level that if you need more than a very few of them, you&#8217;d probably be better off w/ a faster DBMS that doesn&#8217;t need as many and hence has much less of an administrative burden.</p>
<p>Similarly, in cases where Justin&#8217;s critique is applicable, that would seem to imply establishing the MVs is VERY expensive. But an MV is really just a big query. So if running a big query is stupefyingly slow &#8230; again, maybe you&#8217;re on the wrong platform.</p>
<p>Infobright-like systems that automagically create quasi-MVs on the fly may be excused from part or all of this criticism &#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin Swanhart</title>
		<link>http://www.dbms2.com/2009/07/02/the-tpc-h-schema/comment-page-1/#comment-128544</link>
		<dc:creator>Justin Swanhart</dc:creator>
		<pubDate>Thu, 02 Jul 2009 20:26:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=830#comment-128544</guid>
		<description>Also, as the SSB benchmark designers note, the TPC-H schema is whack anyway.  There is a granularity mismatch between some of the tables.

I&#039;ve seen similar schemas to TPC-H(tm) used in real life though.  It sits somewhere between a pure OLTP schema and a pure DW schema, and I&#039;m pretty sure there are lots of reporting databases that fit that description.</description>
		<content:encoded><![CDATA[<p>Also, as the SSB benchmark designers note, the TPC-H schema is whack anyway.  There is a granularity mismatch between some of the tables.</p>
<p>I&#8217;ve seen similar schemas to TPC-H(tm) used in real life though.  It sits somewhere between a pure OLTP schema and a pure DW schema, and I&#8217;m pretty sure there are lots of reporting databases that fit that description.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

