<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/rss2full.xsl" type="text/xsl" media="screen"?><?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/itemcontent.css" type="text/css" media="screen"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>DBMS2 -- DataBase Management System Services</title>
	
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<pubDate>Thu, 20 Nov 2008 01:08:35 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/dbms2/feed" type="application/rss+xml" /><item>
		<title>Interpreting the results of data warehouse proofs-of-concept (POCs)</title>
		<link>http://feeds.feedburner.com/~r/dbms2/feed/~3/459017106/</link>
		<comments>http://www.dbms2.com/2008/11/19/data-warehouse-proof-of-concept-pocs/#comments</comments>
		<pubDate>Thu, 20 Nov 2008 01:08:35 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
		
		<category><![CDATA[Data warehousing]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=629</guid>
		<description><![CDATA[When enterprises buy new brands of analytic DBMS, they almost always run proofs-of-concept (POCs) in the form of private benchmarks.  The results are generally confidential, but that doesn&#8217;t keep a few stats from occasionally leaking out.  As I noted recently, those leaks are problematic on multiple levels. For one thing, even if the [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">When enterprises buy new brands of analytic DBMS, they almost always run proofs-of-concept (POCs) in the form of private benchmarks.  The results are generally confidential, but that doesn&#8217;t keep a few stats from occasionally leaking out.  As I noted recently, those leaks are <a href="../2008/11/15/query-from-hell/">problematic on multiple levels</a>. For one thing, even if the results are to be taken as accurate and basically not-misleading, the way vendors describe them leaves a lot to be desired.</p>
<p style="margin-bottom: 0in;">Here&#8217;s a concrete example to illustrate the point. One of my vendor clients sent over the <a href="http://www.monash.com/uploads/DBMS-POC-example.xls" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.monash.com');">stats</a> from a recent POC, in which its data warehousing product was compared against a name-brand incumbent. 16 reports were run.  The new product beat the old 16 out of 16 times. The lowest margin was a 1.8X speed-up, while the best was a whopping 335.5X.</p>
<p style="margin-bottom: 0in;">My client helpfully took the “simple average” (i.e., the mean) of the 16 speed-up factors, and described this as an average 62X drubbing. But is that really fair? <span id="more-629"></span>The median speed-up was only 17X.  And in a figure I find more meaningful than either of those, the total reduction in execution time (assuming each of the reports was run the same number of times) was “just” 12X.</p>
<p style="margin-bottom: 0in;">Now, 12X is a whopping speed-up, and this was a very successful POC for the challenger.  But calling it 62X is just silly, and that was the point of my earlier post.</p>
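<p style="margin-bottom: 0in;">To make the arithmetic concrete, here is a minimal Python sketch of the three summaries. The timings are made up for illustration (they are not the figures from the client&#8217;s spreadsheet), and the function name <code>speedup_summary</code> is likewise hypothetical.</p>
<pre>
```python
from statistics import mean, median

# Hypothetical (old_seconds, new_seconds) pairs, one per report.
# These are invented numbers, NOT the actual POC data.
timings = [
    (1800.0, 1000.0),  # 1.8X
    (600.0, 60.0),     # 10X
    (340.0, 20.0),     # 17X
    (671.0, 2.0),      # 335.5X
]


def speedup_summary(pairs):
    """Summarize speed-ups three ways, assuming each report runs equally often."""
    ratios = [old / new for old, new in pairs]
    total_old = sum(old for old, _ in pairs)
    total_new = sum(new for _, new in pairs)
    return {
        "mean_of_ratios": mean(ratios),
        "median_of_ratios": median(ratios),
        "total_time_ratio": total_old / total_new,
    }


summary = speedup_summary(timings)
for name, value in summary.items():
    print(f"{name}: {value:.1f}X")
```
</pre>
<p style="margin-bottom: 0in;">The mean of ratios gives every report equal weight no matter how long it runs, so a single outlier ratio can pull it far above the total-time figure.</p>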
<p style="margin-bottom: 0in;">So how <em>should</em> POC numbers be weighted?  Ideally, one could calculate a big weighted sum: “Our daily workload will be a lot like 2,000 copies each of Queries 1, 2, 3, 4, and 5; 300 copies each of BigQueries 6 and 7; 25 copies of MegaQuery 7; and a copy of DestructoQuery 8; all multiplied by a factor of 17.”</p>
<p style="margin-bottom: 0in;">But to come up with reasonable projections, it is <em>not</em> enough to look at past usage. After all, if the price of Query 3 goes down by 5X, while the cost of Query 8 goes down by a factor of 50, the relative consumption of Queries 3 and 8 is apt to change significantly.  That&#8217;s just the economics of supply and demand.</p>
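<p style="margin-bottom: 0in;">The weighted sum, and its sensitivity to the assumed query mix, can be sketched as follows. This is only an illustration: the query names echo the example above, but every run count and timing is invented, and <code>weighted_speedup</code> is a hypothetical helper.</p>
<pre>
```python
# Hypothetical workload: daily run counts and per-run seconds on the old and
# new systems. All numbers are invented for illustration.
workload = {
    "Query 1": {"runs": 2000, "old_s": 3.0, "new_s": 1.0},
    "BigQuery 6": {"runs": 300, "old_s": 120.0, "new_s": 10.0},
    "DestructoQuery 8": {"runs": 1, "old_s": 36000.0, "new_s": 720.0},
}


def weighted_speedup(queries):
    """Ratio of total old-system time to total new-system time."""
    old_total = sum(q["runs"] * q["old_s"] for q in queries.values())
    new_total = sum(q["runs"] * q["new_s"] for q in queries.values())
    return old_total / new_total


# Same queries, but assume the biggest one gets run 20 times a day once it is cheap.
shifted = {name: dict(q) for name, q in workload.items()}
shifted["DestructoQuery 8"]["runs"] = 20

print(f"past mix:      {weighted_speedup(workload):.1f}X")
print(f"projected mix: {weighted_speedup(shifted):.1f}X")
```
</pre>
<p style="margin-bottom: 0in;">The same per-query timings yield quite different headline numbers under the two mixes, which is why the weights matter as much as the measurements.</p>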
<p style="margin-bottom: 0in;"><strong>Bottom line:  The more accurately you can predict future data warehouse use, the more confidently you can choose the analytic database technology that&#8217;s best for you.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/11/19/data-warehouse-proof-of-concept-pocs/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.dbms2.com/2008/11/19/data-warehouse-proof-of-concept-pocs/</feedburner:origLink></item>
		<item>
		<title>MySQL Query Analyzer</title>
		<link>http://feeds.feedburner.com/~r/dbms2/feed/~3/458936124/</link>
		<comments>http://www.dbms2.com/2008/11/19/mysql-query-analyzer/#comments</comments>
		<pubDate>Wed, 19 Nov 2008 23:21:28 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
		
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=628</guid>
		<description><![CDATA[Given how the product&#8217;s rollout has been handled, it seems necessary to comment on MySQL&#8217;s recently released MySQL Query Analyzer without actually having much information on the subject.  Mark Callaghan offers a good take &#8212; he&#8217;s generally very favorable, but notes that MySQL has some limitations that Query Analyzer has trouble getting around.
]]></description>
			<content:encoded><![CDATA[<p>Given how the product&#8217;s rollout has been handled, it seems necessary to comment on MySQL&#8217;s recently released MySQL Query Analyzer without actually having much information on the subject.  <a href="http://mysqlha.blogspot.com/2008/11/query-analyzer-rocks.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/mysqlha.blogspot.com');">Mark Callaghan</a> offers a good take &#8212; he&#8217;s generally very favorable, but notes that MySQL has some limitations that Query Analyzer has trouble getting around.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/11/19/mysql-query-analyzer/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.dbms2.com/2008/11/19/mysql-query-analyzer/</feedburner:origLink></item>
		<item>
		<title>Silly website tricks</title>
		<link>http://feeds.feedburner.com/~r/dbms2/feed/~3/457451577/</link>
		<comments>http://www.dbms2.com/2008/11/18/silly-website-tricks/#comments</comments>
		<pubDate>Tue, 18 Nov 2008 17:48:57 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
		
		<category><![CDATA[Humor]]></category>

		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=627</guid>
		<description><![CDATA[Vertica&#8217;s marketing is usually good-to-outstanding, but they made a funny misstep this time.   If you go to the Vertica home page, you&#8217;ll see seasonal art suggesting that their product is a turkey and/or that it&#8217;s terrified it&#8217;s about to get the ax.
Live by the pun, die by the pun.
]]></description>
			<content:encoded><![CDATA[<p>Vertica&#8217;s marketing is usually good-to-outstanding, but they made a funny misstep this time.   If you go to <a href="http://www.vertica.com" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.vertica.com');">the Vertica home page,</a> you&#8217;ll see seasonal art suggesting that their product is a turkey and/or that it&#8217;s terrified it&#8217;s about to get the ax.</p>
<p><a href="http://www.dbms2.com/2007/09/06/the-vertica-guys-have-their-own-blog-now/" >Live by the pun,</a> die by the pun.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/11/18/silly-website-tricks/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.dbms2.com/2008/11/18/silly-website-tricks/</feedburner:origLink></item>
		<item>
		<title>Graphjam: I can haz BI</title>
		<link>http://feeds.feedburner.com/~r/dbms2/feed/~3/455256746/</link>
		<comments>http://www.dbms2.com/2008/11/16/graphjam-i-can-haz-bi/#comments</comments>
		<pubDate>Sun, 16 Nov 2008 21:06:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
		
		<category><![CDATA[Business intelligence]]></category>

		<category><![CDATA[Fun stuff]]></category>

		<category><![CDATA[Humor]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=625</guid>
		<description><![CDATA[Charts and graphs, from the folks who brought you a whole lot of cute kitten photos:

Social media explained
Political news coverage explained
Stupid Excel tricks

]]></description>
			<content:encoded><![CDATA[<p>Charts and graphs, from the folks who brought you a whole lot of <a href="http://icanhascheezburger.com/2008/11/14/funny-pictures-room-for-you-in-our-spa/" onclick="javascript:pageTracker._trackPageview('/outbound/article/icanhascheezburger.com');">cute kitten photos</a>:</p>
<ul>
<li><a href="http://graphjam.com/2008/10/27/song-chart-memes-find-you-on-facebook/" onclick="javascript:pageTracker._trackPageview('/outbound/article/graphjam.com');">Social media explained</a></li>
<li><a href="http://graphjam.com/2008/11/12/song-chart-memes-us-political-belief/" onclick="javascript:pageTracker._trackPageview('/outbound/article/graphjam.com');">Political news coverage explained</a></li>
<li><a href="http://graphjam.com/2008/11/13/song-chart-memes-perception-of-3d-pie-charts/" onclick="javascript:pageTracker._trackPageview('/outbound/article/graphjam.com');">Stupid Excel tricks</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/11/16/graphjam-i-can-haz-bi/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.dbms2.com/2008/11/16/graphjam-i-can-haz-bi/</feedburner:origLink></item>
		<item>
		<title>When people don’t want accurate predictions made about them</title>
		<link>http://feeds.feedburner.com/~r/dbms2/feed/~3/455094496/</link>
		<comments>http://www.dbms2.com/2008/11/16/when-people-dont-want-accurate-predictions-made-about-them/#comments</comments>
		<pubDate>Sun, 16 Nov 2008 17:50:53 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
		
		<category><![CDATA[Analytic technologies]]></category>

		<category><![CDATA[Data warehousing]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=624</guid>
		<description><![CDATA[In a recent article on governmental anti-terrorism data mining efforts &#8212; and the privacy risks associated with same &#8212; The Economist wrote (emphasis mine):
Abdul Bakier, a former official in Jordan’s General Intelligence Department, says that tips to foil data-mining systems are discussed at length on some extremist online forums. Tricks such as calling phone-sex hotlines [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.economist.com/displaystory.cfm?story_id=12295455" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.economist.com');">a recent article on governmental anti-terrorism data mining efforts</a> &#8212; and the privacy risks associated with same &#8212; <em>The Economist</em> wrote (emphasis mine):</p>
<blockquote><p>Abdul Bakier, a former official in Jordan’s General Intelligence Department, says that <strong>tips to foil data-mining systems are discussed at length</strong> on some extremist online forums. Tricks such as calling phone-sex hotlines can help make a profile less suspicious. “The new generation of al-Qaeda is practising all that,” he says.</p></blockquote>
<p>Well, duh.  Terrorists and fraudsters don&#8217;t want to be detected.  Algorithms that rely on positive evidence of bad intent may work anyway.  But if you rely on evidence that shows people are <em>not</em> bad actors, that&#8217;s likely to work about as well as Bayesian spam detectors.*<span id="more-624"></span></p>
<p><em>*I.e., pretty much not at all.  The idea behind Bayesian spam detectors is that rare words are the best indicator of subject matter.  So spammers salt spam with random non-spammy rare words as companions to their spammy ones, and Bayesian filters wave it through.  That&#8217;s been going on since shortly after <a href="http://www.monash.com/anti-spam-other.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.monash.com');">I predicted it</a> in 2003 or 2004.</em></p>
<p>Now let&#8217;s take that idea a little further. A lot of data mining and predictive analytics is devoted to figuring out which customers and prospects should get the most attractive offers, or other preferential treatment.  The biggest example of this may be telecom companies who invest in reducing churn rates, but other examples abound. For example, gaming companies send pit bosses out to give comp tickets to gamblers who may have reached their pain level on losses. Personalized websites might offer individualized deals as well.</p>
<p>As public awareness of these techniques grows, there&#8217;s an obvious risk &#8212; consumers could try to game the system so as to get special treatment. It&#8217;s not as if this is unknown behavior even without data mining; people complain loudly all the time in hope that they&#8217;ll get some sort of mollifying payoff.  And so we&#8217;ll have just one more reason why data mining models need to be constantly moving targets.</p>
<p>And we have two of the many reasons why data mining is an ethical minefield:</p>
<ul>
<li>It&#8217;s tempting for companies to take advantage of their most docile, agreeable, least demanding customers.</li>
<li>It&#8217;s tempting for consumers to pretend to hold attitudes different from how they really feel.</li>
</ul>
<p><em><strong>Related links</strong></em></p>
<ul>
<li><a href="http://www.monashreport.com/2006/06/06/freedom-even-without-data-privacy/">Freedom even without data privacy (a public policy wish list)</a></li>
<li><a href="http://www.monashreport.com/2006/06/09/qui-custodet-ipso-custodes/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.monashreport.com');">Who will watch the watchmen?</a></li>
<li><a href="http://www.monashreport.com/2006/06/09/terrorism-prevention-in-practice/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.monashreport.com');">The no-fly list in practice</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/11/16/when-people-dont-want-accurate-predictions-made-about-them/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.dbms2.com/2008/11/16/when-people-dont-want-accurate-predictions-made-about-them/</feedburner:origLink></item>
		<item>
		<title>High-performance analytics</title>
		<link>http://feeds.feedburner.com/~r/dbms2/feed/~3/454232173/</link>
		<comments>http://www.dbms2.com/2008/11/15/high-performance-analytics/#comments</comments>
		<pubDate>Sat, 15 Nov 2008 19:39:05 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
		
		<category><![CDATA[Analytic technologies]]></category>

		<category><![CDATA[Aster Data]]></category>

		<category><![CDATA[Data warehousing]]></category>

		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>

		<category><![CDATA[Greenplum]]></category>

		<category><![CDATA[MapReduce]]></category>

		<category><![CDATA[Netezza]]></category>

		<category><![CDATA[Oracle]]></category>

		<category><![CDATA[Parallelization]]></category>

		<category><![CDATA[SAS Institute]]></category>

		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=623</guid>
		<description><![CDATA[For the past few months, I&#8217;ve collected a lot of data points to the effect that high-performance analytics – i.e., beyond straightforward query &#8212; is becoming increasingly important.  And I&#8217;ve written about some of them at length.  For example:

MapReduce – controversial or in some cases even disappointing though it may be – has [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">For the past few months, I&#8217;ve collected a lot of data points to the effect that <em>high-performance analytics</em> – i.e., beyond straightforward query &#8212; is becoming increasingly important.  And I&#8217;ve written about some of them at length.  For example:</p>
<ul>
<li>MapReduce – <a href="../2008/09/04/mike-stonebraker-mapreduce/">controversial</a> or in some cases even <a href="../2008/10/15/ebay-doesnt-love-mapreduce/">disappointing</a> though it may be – has a lot of <a href="../2008/08/26/known-applications-of-mapreduce/">use 	cases.</a></li>
<li>It&#8217;s early days, but Netezza and 	Teradata (and others) are beefing up their <a href="../2008/09/26/netezza-teradata-geospatial/">geospatial 	analytic capabilities</a>.</li>
<li><a href="../2008/10/07/multiple-approaches-to-memory-centric-analytics/">Memory-centric 	analytics</a> is in the spotlight.</li>
</ul>
<p style="margin-bottom: 0in;"><em>Ack.  I can&#8217;t decide whether “analytics” should be a singular or plural noun.  Thoughts?</em></p>
<p style="margin-bottom: 0in; font-style: normal;">Another area that&#8217;s come up which I haven&#8217;t blogged about so much is <strong>data mining in the database.</strong> Data mining accounts for <a href="../2006/10/04/data-mining-data-warehousing/">a large part of data warehouse use</a>.  The traditional way to do data mining is to extract data from the database and dump it into SAS.  But there are problems with this scenario, including:<span id="more-623"></span></p>
<ul>
<li><span style="font-style: normal;"><span>There&#8217;s 	a </span></span><em><span>lot</span></em><span style="font-style: normal;"><span> of data to move.</span></span></li>
<li><span style="font-style: normal;"><span>Therefore 	it&#8217;s tempting to only sample the database rather than analyze the 	whole thing, which could have at least a slight negative effect on 	model accuracy.</span></span></li>
<li><span style="font-style: normal;"><span>The 	result of the process is often some kind of scoring algorithm, and 	you may want to execute that real-time rather than in batch mode.</span></span></li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;"><span style="font-style: normal;"><span>Various interesting fixes have been tried.</span></span></p>
<ul>
<li><span style="font-style: normal;"><span>SAS 	and Teradata are partnering quite closely to <a href="../2007/10/10/sas-goes-mpp-on-teradata-first/">run 	SAS on Teradata boxes</a>.</span></span></li>
<li><span style="font-style: normal;"><span>Database 	management system vendors are building at least the data scoring 	part right into the DBMS.  SAS rival SPSS – which relies more on 	just-in-time SQL and less on batch extracts anyway – reports that 	hooking into Oracle&#8217;s native scoring produces massive performance 	gains. (To put that another way – I finally got independent 	confirmation of what Oracle&#8217;s Charlie Berger has been telling me for 	years.)</span></span></li>
<li><span style="font-style: normal;"><span>Data 	preparation can be handled by the general ELT/ETLT 	(Extract/(Transform)/Load/Transform – i.e., in-database data 	transformation) strategies of the data warehouse DBMS vendors.</span></span></li>
<li><span style="font-style: normal;"><span>Oracle 	(more than most competitors, although SAS/Teradata are headed that 	way too) actually does all stages of data mining right in the 	database.</span></span></li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;"><span style="font-style: normal;"><span>Vendors who are putting considerable marketing emphasis on parallel analytics include:</span></span></p>
<ul>
<li><span style="font-style: normal;"><span>Greenplum 	and Aster Data (especially <a href="../2008/08/26/why-mapreduce-matters-to-sql-data-warehousing/">MapReduce</a>)</span></span></li>
<li><span style="font-style: normal;"><span>Oracle 	(the data mining story and more)</span></span></li>
<li><span style="font-style: normal;"><span>Teradata 	(the SAS deal, the geospatial effort, and more)</span></span></li>
<li><span style="font-style: normal;"><span>Netezza 	(especially in connection with the <a href="../2007/09/27/the-netezza-developer-network/">Netezza 	Developer Network</a>)</span></span></li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;"><span style="font-style: normal;"><span>I&#8217;m sure others would say they belong on the list as well.  It&#8217;s an important area of competitive differentiation.</span></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/11/15/high-performance-analytics/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.dbms2.com/2008/11/15/high-performance-analytics/</feedburner:origLink></item>
		<item>
		<title>Beyond query</title>
		<link>http://feeds.feedburner.com/~r/dbms2/feed/~3/454171708/</link>
		<comments>http://www.dbms2.com/2008/11/15/beyond-query/#comments</comments>
		<pubDate>Sat, 15 Nov 2008 17:57:50 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
		
		<category><![CDATA[Data warehousing]]></category>

		<category><![CDATA[Microsoft and SQL*Server]]></category>

		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=622</guid>
		<description><![CDATA[I sometimes describe database management systems as “big SQL interpreters,” because that&#8217;s the core of what they do.  But it&#8217;s not all they do, which is why I describe them as “electronic file clerks” too.  File clerks don&#8217;t just store and fetch data; they also put a lot of work into neatening, culling, [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I sometimes describe database management systems as “big SQL interpreters,” because that&#8217;s the core of what they do.  But it&#8217;s not <em>all</em> they do, which is why I describe them as “electronic file clerks” too.  File clerks don&#8217;t just store and fetch data; they also put a lot of work into neatening, culling, and generally managing the health of their information hoards.</p>
<p style="margin-bottom: 0in;">As long as 15 years ago, online backup was as big a competitive differentiator in the database wars as any particular SQL execution feature. Security became important in some market segments. Reliability and availability have been important from the get-go. And manageability has been crucial ever since Microsoft lapped Oracle in that regard, back when SQL Server had little else to recommend it except price.*</p>
<p style="margin-bottom: 0in;"><em>*Before Oracle10g, the SQL Server vs. Oracle manageability gap was</em> big.</p>
<p style="margin-bottom: 0in;">Now data warehousing is demanding the same kinds of infrastructure richness.*  <span id="more-622"></span>When you&#8217;re loading data nightly or weekly, and using it to run canned reports, the system can burp for a few hours without anybody getting sacrificially fired.  But we&#8217;re entering an era of “operational BI,” in which analytics gets integrated tightly into (for example) customer-facing systems, <a href="../2008/09/22/web-analytics-clickstream-network-event-data/">websites</a> and call centers alike. All of a sudden, data warehouses need OLTP-like reliability.  Surely not coincidentally, I&#8217;ve recently found myself noting new data warehouse backup strategies from <a href="../2008/10/17/oracle-notes/">Oracle</a> and <a href="../2008/10/22/aster-data-systems-ncluster/">Aster Data</a> alike.</p>
<p style="margin-bottom: 0in;"><em>*Obviously, there have been a few cases where this was needed all along. But my sense is that the numbers are now growing a lot.</em></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">In no way do I want to deny that <a href="../2008/11/15/query-from-hell/">data warehouse performance</a> is crucial.  I&#8217;m just saying that the other stuff now matters a lot too.</span></p>
<p style="margin-bottom: 0in; font-style: normal;">OLTP-like robustness isn&#8217;t the only way in which data warehousing issues go “beyond query”; another important subject is <a href="http://www.dbms2.com/2008/11/15/high-performance-analytics/" >high-performance analytics</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/11/15/beyond-query/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.dbms2.com/2008/11/15/beyond-query/</feedburner:origLink></item>
		<item>
		<title>The query from hell, and other stories</title>
		<link>http://feeds.feedburner.com/~r/dbms2/feed/~3/453913966/</link>
		<comments>http://www.dbms2.com/2008/11/15/query-from-hell/#comments</comments>
		<pubDate>Sat, 15 Nov 2008 11:30:58 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
		
		<category><![CDATA[Data warehousing]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=621</guid>
		<description><![CDATA[I write about a lot of products whose core job boils down to Make queries run fast. Without exception, their vendors tout stories of remarkable performance gains over conventional/incumbent DBMS (reported improvement is usually at least 50-fold, and commonly 100-500+).  They further claim at least 2-3X better performance than their close competitors.  In [...]]]></description>
			<content:encoded><![CDATA[<p>I write about a lot of products whose core job boils down to <em>Make queries run fast.</em> Without exception, their vendors tout stories of remarkable performance gains over conventional/incumbent DBMS (reported improvement is usually at least 50-fold, and commonly 100-500+).  They further claim at least 2-3X better performance than their close competitors.  In making these claims, vendors usually stress that their results come from live customer benchmarks.  In few if any cases, I judge, are they lying outright.  So what&#8217;s going on?<span id="more-621"></span></p>
<p style="margin-bottom: 0in;">Multiple things, I think.</p>
<ul>
<li>Existing data warehouses are often 	badly optimized.  The same technology, configured differently, often 	would do a much better job.</li>
<li>General-purpose DBMS often require 	much more tuning for decent complex-query performance than 	specialized products do.  It might have been possible to get that 	query-from-hell to run fast on the old system, but it wasn&#8217;t easy.</li>
<li>Besides, often nobody tried very hard.  	The value of the query didn&#8217;t seem to justify the tuning effort.</li>
<li>Specialized products often really 	are better for the workloads they&#8217;re specialized for.</li>
<li>Different specialized products are 	best suited for different kinds of analytic workloads.</li>
<li>Different companies send different 	qualities of benchmark experts to different sales cycles at 	different times.  Smaller vendors with few active sales cycles 	sometimes actually send their CTOs.</li>
<li>And by the way, vendors do their best to “cook the books.” If one query runs 600X faster than on the competition, and 19 queries run 2-5X faster, the claim of total speedup is apt to be in the 100X+ range, if that&#8217;s defensible by at least one definition of “average.”</li>
</ul>
<p style="margin-bottom: 0in;"><em><strong>Related link</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2007/09/06/three-bold-assertions-by-mike-stonebraker/" >Three bold assertions by Mike Stonebraker</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/11/15/query-from-hell/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.dbms2.com/2008/11/15/query-from-hell/</feedburner:origLink></item>
		<item>
		<title>MySQL is being used in an IBM Lotus appliance</title>
		<link>http://feeds.feedburner.com/~r/dbms2/feed/~3/450347150/</link>
		<comments>http://www.dbms2.com/2008/11/12/mysql-ibm-lotus-appliance/#comments</comments>
		<pubDate>Wed, 12 Nov 2008 06:01:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
		
		<category><![CDATA[DBMS product categories]]></category>

		<category><![CDATA[IBM and DB2]]></category>

		<category><![CDATA[Mid-range]]></category>

		<category><![CDATA[MySQL]]></category>

		<category><![CDATA[solidDB]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=620</guid>
		<description><![CDATA[Apparently, IBM is rolling out an appliance for small businesses. MySQL is under the covers. The appliance won&#8217;t have a keyboard or monitor, so there won&#8217;t be a lot of database administration going on.
Before Solid and solidDB were acquired by IBM, one of the things Solid was proudest of was some embedded apps in which [...]]]></description>
			<content:encoded><![CDATA[<p>Apparently, IBM is rolling out <a href="http://www.theregister.co.uk/2008/11/11/lotus_server_appliance/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.theregister.co.uk');">an appliance for small businesses</a>. MySQL is under the covers. The appliance won&#8217;t have a keyboard or monitor, so there won&#8217;t be a lot of database administration going on.</p>
<p>Before Solid and solidDB were acquired by IBM, one of the things Solid was proudest of was some embedded apps in which solidDB ran for years in boxes without keyboards or monitors.</p>
<p>I still think it&#8217;s a pity that IBM isn&#8217;t using solidDB as broadly as the technology deserves.  Even so, this is a nice endorsement of MySQL for reliable zero-DBA mid-range use.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/11/12/mysql-ibm-lotus-appliance/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.dbms2.com/2008/11/12/mysql-ibm-lotus-appliance/</feedburner:origLink></item>
		<item>
		<title>Big scientific databases need to be stored somehow</title>
		<link>http://feeds.feedburner.com/~r/dbms2/feed/~3/445756018/</link>
		<comments>http://www.dbms2.com/2008/11/07/big-scientific-databases-need-to-be-stored-somehow/#comments</comments>
		<pubDate>Fri, 07 Nov 2008 18:36:21 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
		
		<category><![CDATA[Aster Data]]></category>

		<category><![CDATA[Data types]]></category>

		<category><![CDATA[Greenplum]]></category>

		<category><![CDATA[IBM and DB2]]></category>

		<category><![CDATA[Kognitio]]></category>

		<category><![CDATA[Microsoft and SQL*Server]]></category>

		<category><![CDATA[Netezza]]></category>

		<category><![CDATA[Oracle]]></category>

		<category><![CDATA[Parallelization]]></category>

		<category><![CDATA[PostgreSQL]]></category>

		<category><![CDATA[Scientific research]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=619</guid>
		<description><![CDATA[A year ago, Mike Stonebraker observed that conventional DBMS don&#8217;t necessarily do a great job on scientific data, and further pointed out that different kinds of science might call for different data access methods.   Even so, some of the largest databases around are scientific ones, and they have to be managed somehow.  [...]]]></description>
			<content:encoded><![CDATA[<p>A year ago, Mike Stonebraker observed that conventional DBMS don&#8217;t necessarily do a great job on scientific data, and further pointed out that <a href="http://www.databasecolumn.com/2007/11/databases-for-big-science.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.databasecolumn.com');">different kinds of science might call for different data access methods</a>.   Even so, some of the largest databases around are scientific ones, and they have to be managed somehow.  For example:</p>
<ul>
<li>Microsoft just put out an <a href="http://www.microsoft.com/presspass/press/2008/nov08/11-06AlzHeavensPR.mspx" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.microsoft.com');">overwrought press release</a>.  The substance seems to be that Pan-STARRS &#8212; a Jim Gray legacy also discussed in <a href="http://www.computerworld.com/action/article.do?command=printArticleBasic&amp;articleId=9112018" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.computerworld.com');">an August, 2008 <em>Computerworld</em> article</a> &#8212; is adding 1.4 terabytes of image data per night, and one not so new database adds 15 terabytes per year of some kind of computer simulation output used to analyze protein folding.  Both run on SQL Server, of course.</li>
<li>Kognitio has an astronomical database too, at <a href="http://kognitio.com/news/pressreleases/index.php?id=45" onclick="javascript:pageTracker._trackPageview('/outbound/article/kognitio.com');">Cambridge University</a>, adding 1/2 a terabyte of data per night.</li>
<li>Oracle is used for a McGill University proteomics database called <a href="http://www.genomequebecplatforms.com/mcgill/services/proteomics/bioinfo.aspx" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.genomequebecplatforms.com');">CellMapBase</a>.  A figure of 50 terabytes of &#8220;mass storage&#8221; is cited, which doesn&#8217;t include tape backup and so on.</li>
<li>The Large Hadron Collider, once it actually starts functioning, is projected to generate <a href="http://lcg.web.cern.ch/LCG/" onclick="javascript:pageTracker._trackPageview('/outbound/article/lcg.web.cern.ch');">15 petabytes of data</a> annually, which will be initially stored on tape and then distributed to various computing centers around the world.</li>
<li>Netezza is proud of its ability to serve images and the like quickly, although off the top of my head I&#8217;m not thinking of a major customer it has in that area.  (But then, if you just sell software, your academic discount can approach 100%; but if like Netezza you have an actual cost of goods sold, that&#8217;s not as appealing an option.)</li>
</ul>
<p>Long-term, I imagine that the most suitable DBMS for these purposes will be MPP systems with strong datatype extensibility &#8212; e.g., DB2, PostgreSQL-based Greenplum, PostgreSQL-based Aster nCluster, or maybe Oracle.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/11/07/big-scientific-databases-need-to-be-stored-somehow/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.dbms2.com/2008/11/07/big-scientific-databases-need-to-be-stored-somehow/</feedburner:origLink></item>
	</channel>
</rss>
