<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS2 -- DataBase Management System Services &#187; Solid-state memory</title>
	<atom:link href="http://www.dbms2.com/category/storage/solid-state-memory-disk-flash/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 18 Mar 2010 05:19:19 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Open issues in database and analytic technology</title>
		<link>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/</link>
		<comments>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 22:04:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1507</guid>
		<description><![CDATA[The last part of my New England Database Summit talk was on open issues in database and analytic technology. This was closely intertwined with the previous section, and also relied on a lot that I&#8217;ve posted here. So I&#8217;ll just put up a few notes on that part, with lots of linkage to prior discussion [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">The last part of my <a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >New England Database Summit</a> talk was on open issues in database and analytic technology. This was closely intertwined with the <a href="http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/" >previous section</a>, and also relied on a lot that I&#8217;ve posted here. So I&#8217;ll just put up a few notes on that part, with lots of linkage to prior discussion of the same points.<span id="more-1507"></span></p>
<ul>
<li>The most important issue in 	database and analytic technology, in my opinion, isn&#8217;t technological 	at all – rather, it&#8217;s the legal and political steps needed to <a href="http://www.dbms2.com/2010/01/31/data-based-snooping-threat-libert/" > preserve liberty</a> in the face of advancing, intrusive 	technology.</li>
<li>Another important issue for 	society – and this one does involve a lot of technology – is 	scientific number crunching. In particular, <a href="http://www.dbms2.com/2009/10/03/issues-in-scientific-data-management/" >database technology for 	scientific computing</a> needs to be developed much further. I&#8217;ll have 	more to say on all this soon.</li>
<li>More generally, technology needs 	to keep advancing for parallel analytics. Fortunately, it is. Watch 	this space over the next few weeks.</li>
<li>Oracle has said, in effect, that <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" > its most important technological challenge of the decade</a> is getting 	<a href="http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/" >solid-state memory</a> right. I agree.</li>
<li>Data volumes will keep going up, 	up, up. Technology needs to keep evolving accordingly. Much of what 	I write is on that subject.</li>
<li>Data needs to be processed and analyzed at <a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >very 	different latencies</a>. And there&#8217;s much further to go in integrating 	disparate latencies.</li>
<li>Analytic database management in 	the cloud hasn&#8217;t been solved yet, especially for Big Data. Among the 	reasons are the difficulty of moving data into the cloud (unless it 	originated there), the slowness of moving it from node to node in 	shared-nothing architectures (which reduces the elasticity benefit), 	and above all the long and unpredictable latencies of interprocessor 	communication while queries are running (a key subject of discussion 	at the <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >Boston Big Data Summit</a>).</li>
<li>Better business intelligence user interfaces are increasingly available. I&#8217;m thinking particularly of approaches with buzzwords like <a href="http://www.dbms2.com/2008/08/04/qliktech-qlikview-update/" >visualization/interactive exploration</a> or <a href="http://www.texttechnologies.com/2007/08/03/the-case-for-inxight-awareness-server/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">faceted</a>. But they aren&#8217;t well-integrated into the overall analytic stack, as big BI vendors are trailing the smaller ones in this regard. (Part of the problem relates to my previous point.)</li>
<li>Application development over text 	search isn&#8217;t in the same league as application development over 	relational DBMS. The choices are mainly XML (e.g., <a href="http://www.texttechnologies.com/2008/04/29/mark-logic-viewed-as-a-different-kind-of-text-search-technology-vendor/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">MarkLogic</a>), SQL 	for text integrated into RDBMS (limited by the weakness of those 	integrations), and something like <a href="http://www.texttechnologies.com/2008/09/20/attivio-update/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">Attivio&#8217;s Java SDK</a>. There&#8217;s a 	major conceptual barrier in building those apps, namely the 	unpredictability of query results. Still, it should be possible to 	do better.</li>
<li>Similarly, text analytics and conventional analytics coexist well. They can even be in the same database and/or dashboard, although in practice that is limited by the strong <a href="http://www.texttechnologies.com/2008/10/24/attensity-update-2/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">SaaS focus of text mining vendors and users</a>. But analytic integration of them is really hard. Linguistic imprecision is, in my opinion, only the #2 reason for this difficulty. The #1 reason is that trends detected by text analytics are much less precise than trends on tabular data – e.g., a 50% increase in a certain kind of complaint may be no more significant than a 5% change in a revenue variable.</li>
<li>I&#8217;m increasingly persuaded that <a href="http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/" > graph analytics</a> can be handled without a graph-centric data model. 	But right now, it isn&#8217;t being handled well at all. Lots more needs 	to be done – although when it is, it will just exacerbate the 	privacy/liberty dangers that so concern me.</li>
</ul>
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Flash, other solid-state memory, and disk" href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, other solid-state memory, and disk</a></li>
<li><a title="Interesting trends in database and analytic technology" href="../2010/01/31/trends-database-aanalytic-technology/">Interesting trends in database and analytic technology</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Interesting trends in database and analytic technology</title>
		<link>http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/</link>
		<comments>http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 02:11:17 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1492</guid>
		<description><![CDATA[My project for the day is blogging based on my “Database and analytic technology: State of the union” talk of a few days ago. (I called it that because of when it was given, because it mixed prescriptive and descriptive elements, and because I wanted to call attention to the fact that I cover the [...]]]></description>
			<content:encoded><![CDATA[<p>My project for the day is blogging based on my “<a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >Database and analytic technology: State of the union</a>” talk of a few days ago. (I called it that because of when it was given, because it mixed prescriptive and descriptive elements, and because I wanted to call attention to the fact that I cover the <em>union</em> of database and analytic technologies – the <em>intersection</em> of those two sectors is an area of particular focus, but is far from the whole of my coverage.)</p>
<p>One section covered recent/ongoing/near-future trends that I thought were particularly interesting, including:<span id="more-1492"></span></p>
<p><strong>Simpler database technology,</strong> by which I mean DBMS that are:</p>
<ul>
<li>Easier 	to administer than market-leading systems &#8230;</li>
<li>… even if at the cost of being special-purpose</li>
<li>E.g.,
<ul>
<li>MySQL and older mid-tier RDBMS such as Progress</li>
<li>Many analytic DBMS and appliances, most notably Netezza&#8217;s</li>
</ul>
</li>
</ul>
<p>For general purpose or OLTP uses, I&#8217;m not a big fan of MySQL (not enough progress in making it industrial-strength), PostgreSQL (no good company behind it – I&#8217;m a non-fan of EnterpriseDB), or Ingres (open source or not, it&#8217;s an antiquated system that hasn&#8217;t been invested in as much as Oracle, DB2 or SQL Server).</p>
<p>But I get the impression there are a lot of contenders among small startups, featuring very new architectures for OLTP or general-purpose database management. VoltDB comes to mind. NimbusDB is finally within range of getting funded. Dan Weinreb told me Friday he knows of a bunch of others as well. And that&#8217;s all before we even get into the <a href="http://www.dbms2.com/2009/12/12/legit-nosql-key-value-store/" >NoSQL</a> kind of alternative.</p>
<p><strong>Flexible storage architectures.</strong> That&#8217;s starting out with an emphasis on hybrid columnar, as in the examples of <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/" >Vertica</a> and <a href="http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/" >Greenplum</a>. Oracle (to whom I&#8217;m under no NDA obligation) and other vendors (to whom I am) are going that way as well.</p>
<p><strong>Multi-tier database architectures,</strong> by which I mean at least two things:</p>
<ul>
<li>The database tier/server tier split of Exadata</li>
<li>Hybrid RAM/disk architectures, examples of which include
<ul>
<li>Vertica&#8217;s RAM-based write-optimized store</li>
<li><a href="http://www.dbms2.com/2009/10/18/introduction-to-sensage/" >Sensage&#8217;s CEP-in-the-DBMS</a></li>
<li>This in-memory analytics stuff we keep hearing about from the BI vendors</li>
<li>Any true in-memory/disk hybrid, such as the regrettably sidelined <a href="http://www.dbms2.com/2007/12/21/ibm-acquires-soliddb/" >solidDB</a></li>
<li>Smart thinking by numerous DBMS vendors about optimizing the use of RAM and/or Level 2 cache</li>
</ul>
</li>
</ul>
<p>Netezza is particularly interesting to watch in this regard because it:</p>
<ul>
<li>Had a pretty strict storage/other processing split in prior product generations and &#8230;</li>
<li>… <a href="http://www.dbms2.com/2009/07/30/netezza-new-product-family/" >ditched that in its latest generation</a> …</li>
<li>… which however is focused on optimizing the use of RAM cache</li>
</ul>
<p>Also noteworthy is Petascan, the stealth-mode – and therefore harder to watch right now <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  – company I keep teasing about, which makes a strong case for carrying the database/storage tier split into the flash/solid-state memory technology generation. <a href="../2009/04/20/calpont-update-you-read-it-here-first/">Calpont</a> also has a server/storage tier split, but that&#8217;s of mainly theoretical interest unless and until Calpont actually ships an MPP version of <a href="../2009/11/07/calponts-infinidb/">InfiniDB</a>.</p>
<p><strong>Cheaper parts,</strong> which have of course been a huge trend for decades. <a href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Solid-state memory</a> will soon conquer the world. Meanwhile, cheaper sensors drive that <a href="../2010/01/17/three-broad-categories-of-data/">machine-generated data</a> I keep talking about.</p>
<p>An ever-better understanding of <strong>scale-out technology,</strong> in several respects, including:</p>
<ul>
<li>Query, notably data movement for MPP DBMS</li>
<li>Update, especially minimalistic DBMS approaches, be they sharded MySQL or more NoSQLish</li>
<li>Number-crunching, especially via MapReduce and/or parallel analytic libraries integrated into DBMS</li>
</ul>
<p>Cool trends I touched on more briefly include:</p>
<ul>
<li>More data being available for analysis. This was a core theme of my <a href="http://www.dbms2.com/2009/07/30/netezza-enzee-universe/" >Enzee Universe keynote speeches</a>; there are also some notes on it in my 	post based on my <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >Boston Big Data Summit</a> talk.</li>
<li>More users being served by analytics. Ditto.</li>
<li>Data exploration/visualization, à la QlikView, Spotfire, or Tableau, and also the faceted stuff.</li>
<li>The democratization of data mining. But I&#8217;m not as sure of that one as of the others&#8230;</li>
</ul>
<p>One area I flat-out forgot to mention is <a href="http://www.dbms2.com/2009/06/08/the-future-of-data-marts/" >easy data mart spin-out</a>.</p>
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Flash, other solid-state memory, and disk" href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, other solid-state memory, and disk</a></li>
<li><a title="Open issues in database and analytic technology" href="../2010/02/01/open-issues-in-database-and-analytic-technology/">Open issues in database and analytic technology</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Flash, other solid-state memory, and disk</title>
		<link>http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/</link>
		<comments>http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 22:12:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1469</guid>
		<description><![CDATA[If there&#8217;s one subject on which the New England Database Summit changed or at least clarified my thinking,* it&#8217;s future storage technologies. Here&#8217;s what I now think:

Solid-state memory will soon be 	the right storage technology for a large fraction of databases, OLTP and analytic alike. I&#8217;m not sure whether the initial cutoff in 	database size [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">If there&#8217;s one subject on which the <a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >New England Database Summit</a> changed or at least clarified my thinking,* it&#8217;s future storage technologies. Here&#8217;s what I now think:</p>
<ul>
<li><strong>Solid-state memory will soon be 	the right storage technology for a large fraction of databases,</strong> OLTP and analytic alike. I&#8217;m not sure whether the initial cutoff in 	database size is best thought of as terabytes or 10s of terabytes, 	but it&#8217;s in that range. And it will increase over time, for the 	usual cheaper-parts reasons.</li>
<li><strong>That doesn&#8217;t necessarily mean 	flash.</strong> <a href="http://en.wikipedia.org/wiki/Phase-change_memory" onclick="javascript:pageTracker._trackPageview('/en.wikipedia.org');">PCM</a> (Phase-Change Memory) is coming down the pike, with perhaps 100X the 	durability of flash, in terms of the total number of writes it can 	tolerate. On the other hand, PCM has issues in the face of heat. 	More futuristically, IBM is also high on <a href="http://www.almaden.ibm.com/spinaps/research/sd/?racetrack" onclick="javascript:pageTracker._trackPageview('/www.almaden.ibm.com');">magnetic racetrack 	memory</a>. IBM likes the term <em>storage-class memory</em> to 	cover all this &#8212; which I find regrettable, since the acronym SCM is 	way overloaded already. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
<li><strong>Putting a disk controller in 	front of solid-state memory is really wasteful.</strong> It wreaks havoc 	on I/O rates.</li>
<li><strong>Generic PCIe interfaces don&#8217;t 	suffice either,</strong> in many analytic use cases. Their I/O is better, 	but still not good enough. (Doing better yet is where Petascan – 	the stealth-mode company I keep teasing about – comes in.)</li>
<li><strong>Disk will long be useful for 	very large databases.</strong> Kryder&#8217;s Law, about disk <strong>capacity,</strong> has at 	least as high an annual improvement as Moore&#8217;s Law shows for chip 	capacity, the <a href="http://www.dbms2.com/2010/01/31/the-disk-rotation-speed-bottleneck/" >disk rotation speed bottleneck</a> notwithstanding. Disk 	will long be much cheaper than silicon for data storage. And cheaper 	silicon in sensors will lead to ever more <a href="http://www.dbms2.com/2010/01/17/three-broad-categories-of-data/" >machine-generated data</a> that fills up a lot of disks.</li>
<li><strong>Disk will long be useful for 	archiving.</strong> Disk is the new tape.</li>
</ul>
<p style="margin-bottom: 0in;"><em>*When the first three people to the question microphone include both Mike Stonebraker and Dave DeWitt, your thinking tends to clarify in a hurry.</em></p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li><span style="font-style: normal;"><span style="font-weight: normal;">A 	<a href="http://drona.csa.iisc.ernet.in/%7Egopi/west10/HPCA-WEST-SCMandSoftware.pdf" onclick="javascript:pageTracker._trackPageview('/drona.csa.iisc.ernet.in');">slide 	deck by C. Mohan of IBM</a> similar to the one he presented at the 	NEDB Summit about storage-class memories.</span></span></li>
<li><span style="font-style: normal;"><span style="font-weight: normal;">A 	much more detailed <a href="http://www.usenix.org/events/fast/tutorials/T3.pdf" onclick="javascript:pageTracker._trackPageview('/www.usenix.org');">IBM 	presentation</a> on storage-class memories.</span></span></li>
<li><span style="font-style: normal;"><span style="font-weight: normal;"><a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >Oracle&#8217;s</a> and <a href="http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/" >Teradata&#8217;s</a> beliefs about the importance of solid-state memory.<br />
</span></span></li>
</ul>
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Interesting trends in database and analytic technology" href="../2010/01/31/trends-database-aanalytic-technology/">Interesting trends in database and analytic technology</a></li>
<li><a title="Open issues in database and analytic technology" href="../2010/02/01/open-issues-in-database-and-analytic-technology/">Open issues in database and analytic technology</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Two cornerstones of Oracle’s database hardware strategy</title>
		<link>http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/</link>
		<comments>http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 08:59:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1429</guid>
		<description><![CDATA[After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:

Oracle      thinks flash memory is the most important hardware technology of the [...]]]></description>
			<content:encoded><![CDATA[<p>After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:</p>
<ul>
<li>Oracle      thinks <strong>flash memory is the most important hardware technology of the      decade,</strong> one that could lead to Oracle being “bumped off” if they don’t      get it right.</li>
<li>Juan      believes <strong>the “bulk” of Oracle’s business will move over to Exadata-like      technology over the next 5-10 years. </strong>Numbers-wise, this seems to be based more      on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database      management tasks.</li>
</ul>
<p>And by the way, Oracle doesn’t make its storage-tier software available to run on anything other than Oracle-designed boxes.  At the moment, that means Exadata Versions 1 and 2. Since Exadata is by far Oracle’s best DBMS offering (at least in theory), that means <strong>Oracle’s best database offering only runs on specific Oracle-sold hardware platforms.<span id="more-1429"></span></strong></p>
<p><em>*E.g., I was sitting upstairs in my parents’ apartment in Columbus, OH having the call while their doctor, who I’ve never met, was visiting downstairs. He offered to make a special trip back Saturday afternoon because he missed me Wednesday, but he’s notorious for not coming when he says he will.</em> <em>Update: He didn&#8217;t come Saturday. On Saturday he said he&#8217;d come Sunday. He didn&#8217;t do that either.</em></p>
<p>Other high- and lowlights of our conversation included:</p>
<ul>
<li>Flash      is the main new hardware element in Exadata Version 2. Otherwise, Exadata      2 is just an annual refresh of Exadata Version 1 to include updated      components (Nehalem chips, bigger disk drives, etc.)</li>
<li>Juan      thinks it’s suboptimal to use flash memory through the bottleneck of disk      controllers, favoring PCIe cards instead. (I emphatically agree.)</li>
<li>Juan      resolutely ducked questions about <a href="../../../../../2009/09/25/the-hunt-for-oracle-exadata-production-references/">actual      Exadata production deployment</a>. Literally the only fact he shared in      that regard is that there are at least 2 Exadata production systems      running that each have 2 or more racks cabled together.</li>
<li>Juan      stressed that Exadata runs apps written over Oracle DBMS unchanged.</li>
<li>When      making mixed-workload claims for Exadata 2, Juan stressed consolidation of      multiple databases, some OLTP and some analytic. He didn’t really argue      with my skepticism about <a href="../../../../../2009/09/29/integration-oltp-data-warehousing-exadata-2/">integrating      OLTP and analytics in the same database</a>, with one exception:</li>
<li>Juan      pointed out that in major OLTP apps such as ERP systems, there often is      actually more processing going on in reporting and other batch stuff than      there is in true OLTP.</li>
<li>Exadata      2’s flash memory is designed as a disk cache, smarter than LRU (Least      Recently Used). The two examples Juan gave of “smarter than LRU” are that      backups and table scans don’t flush the cache.</li>
<li>I      forget whether this is new in Exadata 2 (I think it is), but anyhow –      Exadata has a “Storage Index” that’s a lot like a <a href="../../../../../2006/09/20/netezza-vs-conventional-data-warehousing-rdbms/">Netezza      zone map</a>. I.e., for each megabyte or so of data it stores the min and      max value of every column; if a query predicate rules out those ranges,      that megabyte is never retrieved.</li>
<li>Oracle      has long offered what sounds like flexible workload management capability,      and this has now been extended to specifically include I/O resources on      the storage tier.</li>
<li>This isn’t Exadata-specific, but Oracle has built a file system on top of its DBMS, optimized for speed, which helps with, e.g., ELT (Extract/Load/Transform). Evidently, it’s not at all the same thing as Marc Benioff’s 1990s Microsoft-annoying IFS (Internet File System) project, which seems to have morphed into a content management SDK.</li>
</ul>
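<p>The Storage Index / zone map idea mentioned above is easy to sketch. What follows is only a toy illustration of min/max block pruning, not anything Oracle or Netezza showed me: the names (<code>Zone</code>, <code>build_zone_map</code>, <code>scan_range</code>), the single integer column, and the block size counted in rows rather than megabytes are all assumptions made for the example.</p>

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Zone:
    start: int  # index of the first row in the block
    end: int    # one past the last row in the block
    lo: int     # minimum column value seen in the block
    hi: int     # maximum column value seen in the block


def build_zone_map(column: List[int], block_rows: int) -> List[Zone]:
    """Record the min and max value for each fixed-size block of a column."""
    zones = []
    for start in range(0, len(column), block_rows):
        block = column[start:start + block_rows]
        zones.append(Zone(start, start + len(block), min(block), max(block)))
    return zones


def scan_range(column: List[int], zones: List[Zone],
               low: int, high: int) -> Tuple[List[int], int]:
    """Return (values with low <= v <= high, number of blocks actually read)."""
    matches: List[int] = []
    blocks_read = 0
    for z in zones:
        # If the block's [lo, hi] range cannot overlap the predicate,
        # the block is never retrieved at all.
        if z.hi < low or z.lo > high:
            continue
        blocks_read += 1
        matches.extend(v for v in column[z.start:z.end] if low <= v <= high)
    return matches, blocks_read


if __name__ == "__main__":
    column = list(range(1000))            # hypothetical, nicely ordered data
    zones = build_zone_map(column, 100)   # 10 blocks of 100 rows each
    matches, blocks_read = scan_range(column, zones, 250, 260)
    print(len(matches), "matching rows;", blocks_read, "of", len(zones), "blocks read")
```

<p>Pruning like this pays off when data is stored roughly in order of the filtered column (dates in a log, say). On randomly ordered data, each block&#8217;s min/max range tends to span almost everything, so few blocks get skipped.</p>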
<p>Highlights specifically in the area of parallelization included:</p>
<ul>
<li>Juan      stressed that all databases consolidated onto an Exadata machine      are/should be striped across all storage units.</li>
<li>On the      other hand, Juan said that different databases should be confined to      specific cores or CPUs on the database tier.</li>
<li>But on      the third hand, Juan also stressed – in what could be called a “private      cloud” pitch – that there’s great elasticity as to which databases are      matched to which server CPUs.</li>
<li>Contrary      to what <a href="../../../../../2008/09/28/exadata-oracle-database-machine-parallelization/">I      thought he and/or his colleagues told me a year ago</a>, Juan said RAC      (Real Application Clusters) is a big part of Oracle’s data warehouse      processing.</li>
<li>However,      Juan says that what I regard(ed) as a major objection to Oracle’s      database-tier parallelization &#8212; the need to manually specify “degrees of      parallelism” &#8212; has now been obviated by automation. Juan thinks that few      data warehouse DBAs will now need to manually tune parallelism, with minor      exceptions. One exception he cites is that if a nightly report really is      non-urgent, it can just be forced to run on a single core with no chance      to grab more resources. (However, Juan thinks manual tuning of parallelism      will continue to play a greater role in OLTP.)</li>
</ul>
<p>OK. That’s all I can get done tonight (see above re: inconvenience of timing). Follow-on subjects I’d like to and indeed plan to post about include:</p>
<ul>
<li>What      Juan said about hybrid columnar compression</li>
<li>Oracle’s      delightfully non-confidential slide deck, and a few comments about same</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Research agenda for 2010</title>
		<link>http://www.dbms2.com/2009/12/31/research-agenda-for-2010/</link>
		<comments>http://www.dbms2.com/2009/12/31/research-agenda-for-2010/#comments</comments>
		<pubDate>Thu, 31 Dec 2009 22:02:11 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[About this blog]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Jaspersoft]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Tableau Software]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1384</guid>
		<description><![CDATA[As you may have noticed, I&#8217;ve been posting less research/analysis in November and December than during some other periods. In no particular order, reasons have included:

Over a 20 week period, I had travel in 13 of them.
3 of those were vacation in November.
As travel finally wound down:

It was time to focus a bit on my [...]]]></description>
			<content:encoded><![CDATA[<p>As you may have noticed, I&#8217;ve been posting less research/analysis in November and December than during some other periods. In no particular order, reasons have included:<span id="more-1384"></span></p>
<ul>
<li>Over a 20-week period, I had travel in 13 of them.</li>
<li>3 of those were vacation in November.</li>
<li>As travel finally wound down:
<ul>
<li>It was time to focus a bit on <a href="http://www.monashreport.com/2009/12/14/our-services-for-technology-vendors/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">my own business</a></li>
<li>Elder care got serious; e.g., my parents went to the hospital on consecutive days, Christmas week, the first one on their 52nd wedding anniversary</li>
<li>Linda and I both got really nasty colds</li>
<li>The holidays were happening</li>
<li>I started helping out a really cool startup company (first time I&#8217;ve taken stock in a private company in years; more on that soon)</li>
<li>There was less industry news going on anyway than in some other recent months</li>
</ul>
</li>
</ul>
<p>But of course I plan to speed up the research/analysis/writing soon. Here, FYI, are a few things I have on my plate.</p>
<p>For a couple of years now, the center of what I&#8217;ve written about has been <strong>high-performance analytic data processing. </strong>You can expect me to keep pursuing that in all its aspects. But there are two specific areas I&#8217;ve identified in which I want to redouble my efforts.</p>
<p>First, almost every BI vendor has an effort in<strong> &#8220;in-memory analytics&#8221;</strong> and/or <strong>&#8220;interactive data exploration.&#8221;</strong> I suspect there&#8217;s a lot of difference in underlying technologies, but I&#8217;m having trouble getting details. QlikTech (the worst foot-dragger of the three), MicroStrategy, and Jaspersoft all owe me follow-up conversations with the people who know what&#8217;s going on well enough to explain it. Tableau keeps promising me a briefing and then not delivering. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  And I&#8217;m even further behind with the behemoth companies &#8212; Oracle, Microsoft, IBM/Cognos (arguably) et al.</p>
<p>Second, <strong>solid-state memory</strong> is coming to data warehousing. The obvious reasons are that it&#8217;s already close, and that Moore&#8217;s Law keeps bringing it closer. More specific reasons for believing in solid-state include:</p>
<ul>
<li><a href="http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/" >Teradata</a> has made large strides in making solid-state memory useful.</li>
<li>The stealth start-up I mentioned above is poised to make further strides.</li>
<li>(I&#8217;m not totally sure yet about this part) The in-memory analytics mentioned above might wind up working better in solid-state memory than in DRAM.</li>
</ul>
<p>I&#8217;m spending quite a few cycles thinking about this area.</p>
<p>I&#8217;d also like to look further at <strong>analytic applications </strong>and<strong> advanced analytic functionality.</strong> I foreshadowed some of that in my <a href="http://www.dbms2.com/2009/12/02/mapreduce-for-complex-analytics-webina/" >Aster webinars</a>. There&#8217;s some good stuff at Teradata that I should try to write up soon. I need to have a follow-up conversation with a fascinating anti-fraud guy I met at Netezza&#8217;s London event. But that&#8217;s all just scratching the surface.</p>
<p>Both the MySQL and PostgreSQL communities are in some disarray. Other non-behemoth <strong>OLTP/general-purpose DBMS </strong> seem to be, at best, thriving niche products. (I see little in the way of innovative new use for, say, Progress, Cache&#8217;, Ingres, or anything multivalue.) But it feels as if there&#8217;s more opportunity out there than is being met. And at a minimum, I&#8217;d like to learn more than the almost nothing I know about <strong>OLTP <a href="http://www.dbms2.com/2009/12/12/legit-nosql-key-value-store/" >NoSQL</a> alternatives.</strong></p>
<p>I&#8217;ve already said that I expect to give an <a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >industry-overview talk</a> at MIT on January 28. I also have an overviewy press article and overviewy white paper under discussion. If those come to fruition, I&#8217;ll of course let you know. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Besides the above, I of course have a number of specific posts that I need to get around to researching and writing at some point, often on topics I&#8217;ve already written about before.  Three subjects fairly high on the priority list are scientific data management, machine-generated data, and Oracle Exadata.</p>
<p>And finally, I have some subjects queued up for a couple of my other blogs as well. If you don&#8217;t already take our <a href="http://www.monash.com/blogs.html" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">multi-blog integrated feed</a>, this might be a good time to switch over.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/31/research-agenda-for-2010/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Ray Wang on SAP</title>
		<link>http://www.dbms2.com/2009/12/11/ray-wang-on-sap/</link>
		<comments>http://www.dbms2.com/2009/12/11/ray-wang-on-sap/#comments</comments>
		<pubDate>Fri, 11 Dec 2009 23:16:54 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Solid-state memory]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1286</guid>
		<description><![CDATA[Ray Wang made a terrific post based on SAP&#8217;s annual influencer love-in, an event which I no longer attend. Ray believes SAP has been in a &#8220;crisis&#8221;, and sums up his views as
The Bottom Line  &#8211; SAP’s Turning The Corner

Credit must be given to SAP for charting a new course.  A shift in the management [...]]]></description>
			<content:encoded><![CDATA[<p>Ray Wang made <a href="http://blog.softwareinsider.org/2009/12/11/event-report-2009-sap-influencer-summit-sap-must-put-strategy-to-execution-in-order-to-prove-clarity-of-vision/" onclick="javascript:pageTracker._trackPageview('/blog.softwareinsider.org');">a terrific post based on SAP&#8217;s annual influencer love-in</a>, an event which <a href="http://www.monashreport.com/2007/01/03/sap-nonsense-ethics/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">I no longer attend</a>. Ray believes SAP has been in a &#8220;crisis&#8221;, and sums up his views as</p>
<blockquote><p><strong>The Bottom Line  &#8211; SAP’s Turning The Corner<br />
</strong></p>
<p>Credit must be given to SAP for charting a new course.  A shift in the management philosophy and product direction will take years to realize, however, its not too late for change.  SAP must remember its roots and become more German and less American.  The renewed focus must put customer requests and priorities ahead of SAP’s bureaucracy.  The emphasis must focus on the <a href="http://blog.softwareinsider.org/2009/03/16/mondays-musings-its-the-relationship-stupid-part-1-commoditizing-the-workforce/" onclick="javascript:pageTracker._trackPageview('/blog.softwareinsider.org');">relationship</a>.  When that reemerges in how SAP works with customers, partners, influencers, and its own employees, SAP will be back in good graces. In the meantime, its  time to get to work and deliver.  Oracle’s Fusions Apps are coming soon and competitors such as IBM, Microsoft, Epicor, IFS, and SalesForce.com will not relent.</p></blockquote>
<p>I recall the 1980s, when SAP&#8217;s main differentiator, at least in the English-speaking US, was a total commitment to customer success, and when it could be taken for granted that SAP would do business ethically. Things change, and not always for the better.</p>
<p>Anyhow, the reason I&#8217;m highlighting Ray&#8217;s post is that he makes reference to a number of interesting SAP-centric technology trends or initiatives.<span id="more-1286"></span> In no particular order, Ray suggests:</p>
<ul>
<li>SAP&#8217;s and Oracle&#8217;s (Fusion) <a href="http://www.dbms2.com/2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/" >efforts to meld memory-centric analytics with operational apps</a> will be crucial for large enterprises &#8212; but perhaps only around the middle of the next decade. (I basically agree, although I&#8217;d note that:
<ul>
<li>Wisely, Ray suggested a very long time frame.</li>
<li>BI/operational app integration has been, on the whole, glacial.</li>
<li>The idea that you have to put pre-built aggregates into RAM to get performance is an indictment of market-leading RDBMS &#8212; but it&#8217;s a fair indictment.</li>
<li>I&#8217;m not sure whether memory-centric OLAP will wind up in RAM or Flash. If the data stores are updated at near-transactional speeds, RAM may make more sense. Otherwise, Flash should have major advantages.)</li>
</ul>
</li>
<li>SAP&#8217;s long-standing attempts to support third-party development of SAP add-ons are a technological mess, in line with <a href="http://www.dbms2.com/2007/10/12/sap-is-losing-crucial-managerial-talent/" >my fears a couple of years ago</a>. However, the business-relationship part of the effort is vastly stronger.</li>
<li>As SAP focuses more on the mid-market, it is partnering closely with Microsoft. (If you think about it, that makes all kinds of sense.)</li>
<li>Energy/environmental/safety tracking &#8212; i.e., sustainability &#8212; tools are a big deal. (See also <em><a href="http://www.economist.com/businessfinance/displaystory.cfm?story_id=15022465" onclick="javascript:pageTracker._trackPageview('/www.economist.com');">The Economist</a></em> on that point.)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/11/ray-wang-on-sap/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A framework for thinking about data warehouse growth</title>
		<link>http://www.dbms2.com/2009/12/07/data-warehouse-volume-growth/</link>
		<comments>http://www.dbms2.com/2009/12/07/data-warehouse-volume-growth/#comments</comments>
		<pubDate>Mon, 07 Dec 2009 13:50:47 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Text]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1278</guid>
		<description><![CDATA[There are only three ways that the amount of data stored in data warehouses can grow:

The same kinds of data are 	stored as before, with more being added over time.
The same kinds of data are stored 	as before, but in more detail.
New kinds of data are 	stored.

The first of those three ways doesn&#8217;t lead to [...]]]></description>
			<content:encoded><![CDATA[<p>There are only three ways that the amount of data stored in data warehouses can grow:</p>
<ul>
<li><strong>The same kinds of data are 	stored as before, </strong>with more being added over time.</li>
<li>The same kinds of data are stored 	as before, but in <strong>more detail.</strong></li>
<li><strong>New kinds</strong> of data are 	stored.</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-1278"></span>The first of those three ways doesn&#8217;t lead to dramatic growth. If a data warehouse goes up from 5 years of data to 6, then its overall size will grow a little over 20%.  (How little depends on what the underlying business growth is –  i.e., on how many more business events you have next year than you had 3 years ago.) That&#8217;s almost certainly going to be well handled by whatever technology manages your data warehouse today, given that:</p>
<ul>
<li>Chips are still subject to 	something resembling Moore&#8217;s Law.</li>
<li>Disk capacity is still subject to 	Kryder&#8217;s Law, which is like Moore&#8217;s Law but with yet faster growth 	rates.</li>
<li>DBMS software gets more performant 	over time.</li>
</ul>
<p style="margin-bottom: 0in;">So <strong>the cost of managing your same-as-before data will go down every year,</strong> even as the volume of that data grows.</p>
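<p style="margin-bottom: 0in;">The 5-years-to-6 arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name and the assumption that each year holds a fixed percentage more data than the year before are mine, not figures from any particular warehouse.</p>

```python
def growth_from_adding_year(years_kept, annual_growth):
    """Fractional size increase when one more year of data is appended.

    Assumes each year's data volume is (1 + annual_growth) times the
    previous year's; this growth model is an illustrative assumption.
    """
    # Volume of each year already stored, oldest first, with year 1 = 1.0.
    existing = [(1 + annual_growth) ** i for i in range(years_kept)]
    # The newly added year continues the same growth pattern.
    new_year = (1 + annual_growth) ** years_kept
    return new_year / sum(existing)


# Hypothetical business growth rates; 0.0 is the flat-business baseline.
for rate in (0.0, 0.05, 0.10):
    pct = 100 * growth_from_adding_year(5, rate)
    print(f"business growth {rate:.0%}: warehouse grows {pct:.1f}%")
```

<p style="margin-bottom: 0in;">With flat business the sixth year adds exactly one fifth, i.e. 20%. Any positive business growth pushes the figure somewhat above that, which is the &#8220;a little over&#8221; part.</p>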
<p style="margin-bottom: 0in;">True, <a href="../2005/11/13/breaking-the-disk-speed-barrier/">disk rotation speeds have only increased 12.5 times since the Eisenhower Administration</a>. But <a href="../2009/10/25/teradata-hardware-strategy-and-tactics/">solid-state drives (SSDs) are getting practical for data warehousing</a> fast, so even that bottleneck eventually will get swept away. And since what we&#8217;re discussing is, basically, the first and hence presumably highest-value data to be warehoused, it&#8217;s apt to wind up on SSDs before some other kinds of data warrant that treatment.  So it&#8217;s the two other factors that drive the greatest data warehouse growth.</p>
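<p style="margin-bottom: 0in;">To see why a 12.5X gain in rotation speed is such a weak improvement, here is a rough Python sketch of average rotational latency. The two RPM figures are my assumptions (roughly 1,200 RPM for the earliest 1950s drives, 15,000 RPM for fast enterprise drives), chosen because their ratio is 12.5.</p>

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: the time for half a revolution, in ms."""
    seconds_per_revolution = 60.0 / rpm
    return 1000.0 * seconds_per_revolution / 2


# Assumed figures: about 1,200 RPM for the earliest disk drives of the
# 1950s and 15,000 RPM for fast enterprise drives. Their ratio is 12.5.
early_rpm = 1200
modern_rpm = 15000

print(modern_rpm / early_rpm)                  # speed-up factor
print(avg_rotational_latency_ms(early_rpm))    # latency then, in ms
print(avg_rotational_latency_ms(modern_rpm))   # latency now, in ms
```

<p style="margin-bottom: 0in;">Under those assumptions the average wait for a sector drops from about 25 milliseconds to about 2. That is half a century of progress on a quantity that still gets measured in milliseconds.</p>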
<p style="margin-bottom: 0in;">As costs go down, the wisdom of keeping <strong>detailed data</strong> goes up. I&#8217;d go so far as to say that <strong>every piece of data generated by a human being should be preserved and kept online,</strong> legal and privacy considerations permitting.* Most forms of capital-, labor-, and/or location-based competitive advantage are being commoditized and/or globalized away. But information remains a unique corporate asset.  Don&#8217;t discard it lightly.</p>
<p style="margin-bottom: 0in;"><em>*Unless there&#8217;s an explicit law mandating data destruction, legal considerations </em>should <em>permit. The idea “Let&#8217;s destroy something of irreplaceable value today, against the possibility we might be brought to judgment tomorrow” is both morally and pragmatically weird. Privacy, however, may be a different matter.</em></p>
<p style="margin-bottom: 0in; font-style: normal;">What that means in practice is that “disk is the new tape.” No-apologies performance can be had on data warehouse systems for <a href="http://www.dbms2.com/2009/07/30/the-netezza-price-point/" >$20,000/terabyte</a> or less – perhaps even <a href="http://www.dbms2.com/2009/10/19/greenplum-free-single-node-edition/" >a lot less</a>. Tolerable performance may cost 3-4X less than that. I think a lot of the growth in data warehouse volumes is of exactly this kind.</p>
<p style="margin-bottom: 0in; font-style: normal;">Ultimately, however, the greatest growth in data warehouse volumes will come from <strong>new kinds of data,</strong> especially data that is partly or wholly <strong>machine-generated.</strong><span> Moore&#8217;s Law applied to sensor chips tells us that data creation will grow just as fast as the data storage capacity. And thus </span><strong>we will be throwing away most machine-generated data forever.</strong><span> But what we keep will grow – well, it probably will grow at Moore&#8217;s/Kryder&#8217;s Law speeds.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>That&#8217;s not to say new kinds of data are all high-volume/machine-generated. Back in 2005, I wrote </span><span><a href="http://www.computerworld.com/s/article/103054/More_Data_Makes_Your_Business_Grow?taxonomyId=9&amp;pageNumber=2" onclick="javascript:pageTracker._trackPageview('/www.computerworld.com');">two</a> <a href="http://blogs.computerworld.com/node/512" onclick="javascript:pageTracker._trackPageview('/blogs.computerworld.com');">pieces</a></span><span> for </span><em><span>Computerworld</span></em><span> advocating aggressive pursuit of new data sources, and the examples I mentioned were:</span></p>
<ul>
<li><span>Loyalty cards, especially 	in gaming</span></li>
<li>Location-based analytics</li>
<li>Extra customer feedback (e.g., 	opinion surveys)</li>
<li>Price/offer testing</li>
<li>Text mining 	in general</li>
<li>Medical 	records</li>
</ul>
<p style="margin-bottom: 0in;">Today I&#8217;d add (among others):</p>
<ul>
<li>RFID</li>
<li>The raw 	output from medical test devices</li>
<li>Sensors up and down the energy supply chain</li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;">But some of those older, low-data-volume ideas still head my list of low-hanging analytic fruit.</p>
<p style="margin-bottom: 0in; font-style: normal;"><span>One more complication – these buckets I&#8217;m outlining are less than precise. For example:</span></p>
<ul>
<li><span>Telecom 	CDRs (Call Detail Records) are machine-generated from a seed of 	human activity. They have long been stored, but now are being kept 	in much more detail. This is why telecommunications is one of the 	top markets for data warehouse technology.</span></li>
<li><span>Stock 	trade data used to be based on human decisions. Now most of it is 	just machines buying and selling from each other. Either way, 	increasingly many investment institutions want to keep 	100-terabyte-scale databases of complete historical trade detail. 	And that is why financial services is another huge market for data 	warehouse technology.</span></li>
<li><span>Not long ago, web and network event logs didn&#8217;t even exist, or were tiny where they did. Now they fill the largest known commercial databases, at firms such as </span><span><a href="http://www.dbms2.com/2009/10/01/yahoos-decapetabyte-data-warehousinghadoop/" >Yahoo</a>, <a href="http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/" >eBay</a>, and <a href="http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/" >Facebook</a>.</span><span> Even so, more is thrown away than kept, especially on the network event side, which is a multiple of the size of the pure clickstream data.</span></li>
<li><span>We 	don&#8217;t know exactly what all data intelligence agencies collect from 	telemetry, from monitoring commercial telecommunication traffic, and 	so on. But they&#8217;re surely throwing the vast majority away, even as 	the small part they keep is </span><span><a href="http://www.dbms2.com/2009/09/30/facts-and-rumors/" >petabyte-scale</a>.</span></li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;">But none of that interferes with my main points, which are:</p>
<ul>
<li><strong>Databases 	will continue to grow very quickly.</strong></li>
<li>One big driver 	is <strong>the increasing detail in which data is kept online.</strong></li>
<li>An even bigger 	driver will be <strong>the unending ability of machines to generate ever 	greater streams of at-least-somewhat interesting data.</strong></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/07/data-warehouse-volume-growth/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Boston Big Data Summit keynote outline</title>
		<link>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/</link>
		<comments>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 06:25:50 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Humor]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1227</guid>
		<description><![CDATA[Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I&#8217;m posting them below.

The top two points [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Last month, Bob Zurek asked me to give a talk on <a href="http://www.dbms2.com/2009/10/09/presentations-upcoming/" >“Big Data”, where “big” is anything from a few terabytes on up</a>, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I&#8217;m posting them below.</p>
<p><span id="more-1227"></span></p>
<p style="margin-bottom: 0in;">The top two points from Q&amp;A probably were:</p>
<ul>
<li><strong>Big Data and the cloud actually 	have relatively little to do with each other,</strong> <a href="http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/" >a few exceptions</a> notwithstanding, especially if the data is in a shared-nothing DBMS 	(as opposed to, say, a MapReduce-oriented file cluster). Two 	principal reasons are:
<ul>
<li>Redistributing data from node to 	node is a little slow, undermining some of the elasticity benefits 	of the cloud.</li>
<li><a href="http://www.dbms2.com/2009/05/29/sneakernet-to-the-cloud/" >Getting data into the cloud in the 	first place is a lot slow</a>.</li>
</ul>
</li>
<li><strong>The NoSQL movement is a lot like 	the Ron Paul campaign</strong> &#8212; it consists of people who are dissatisfied 	with the status quo, whose dissatisfaction has a lot to do with 	insufficient liberty and/or excessive expenditure, and who otherwise 	don&#8217;t have a whole lot in common with each other.</li>
</ul>
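<p style="margin-bottom: 0in;">The &#8220;a lot slow&#8221; point about getting data into the cloud is easy to quantify with back-of-the-envelope arithmetic. The Python sketch below is just that. The data set sizes, link speeds, and the 0.8 efficiency factor are all hypothetical numbers I picked for illustration.</p>

```python
def transfer_days(terabytes, megabits_per_second, efficiency=0.8):
    """Days needed to move a data set over a network link.

    efficiency is a hypothetical fudge factor for protocol overhead
    and contention; 0.8 is an assumption, not a measured value.
    """
    bits = terabytes * 1e12 * 8            # decimal terabytes to bits
    usable_bits_per_second = megabits_per_second * 1e6 * efficiency
    seconds = bits / usable_bits_per_second
    return seconds / 86400                 # seconds per day


# Hypothetical scenarios: data set sizes and link speeds are made up.
for tb, mbps in [(10, 100), (10, 1000), (100, 1000)]:
    print(f"{tb} TB over {mbps} Mbit/s: {transfer_days(tb, mbps):.1f} days")
```

<p style="margin-bottom: 0in;">Under these assumptions, even a modest 10 terabytes takes well over a week on a 100-megabit link, which is why shipping physical disks can still look attractive.</p>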
<p style="margin-bottom: 0in;">Anyhow, here are my notes for the talk, edited in just a couple of places for readability or linkage.</p>
<p style="margin-bottom: 0in;"><strong>Quick introduction</strong></p>
<ul>
<li>Big Data vs. cloud</li>
<li>How big is Big Data?</li>
<li>At the low end of that range, 	there&#8217;s little you can&#8217;t do with conventional technology if you 	have:
<ul>
<li>An unlimited budget for hardware</li>
<li>An unlimited budget for software</li>
<li>An unlimited budget for people, 	especially Oracle DBAs</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Big Data in OLTP</strong></p>
<ul>
<li>Hard-core OLTP
<ul>
<li>Focus of DBMS technology for a long time</li>
<li>Big budgets because each 	transaction has significant value</li>
<li>Tough to get users to change 	technologies</li>
</ul>
</li>
<li>Lighter-weight OLTP
<ul>
<li>Classic example = web companies
<ul>
<li>Big ones &#8212;  retail-oriented ones 	(eBay, Amazon) partially excepted &#8212; <a href="http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/" >rolled their own technology 	stacks</a></li>
<li>Reluctant to give money to anybody
<ul>
<li>Open source, etc.</li>
</ul>
</li>
</ul>
</li>
<li>Difficulty finding market
<ul>
<li>Product vs. feature
<ul>
<li>Clustering/HA/DR/whatever</li>
<li>Ditto cloud enablement</li>
</ul>
</li>
<li>True products haven&#8217;t found much 	traction yet</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Analytic Big Data use cases</strong></p>
<ul>
<li>Kinds of data for analytics
<ul>
<li>More of same != big</li>
<li>More detail and/or new kinds
<ul>
<li>Complete data sets</li>
<li>Transactions</li>
<li>Call details</li>
<li>Tick/trade history</li>
<li>Web clickstreams</li>
<li>Network event logs</li>
<li>Other machine-generated data</li>
<li>CAM bottom line
<ul>
<li>Anything human-generated should 	and will be retained in its entirety</li>
<li>Quantities of machine-generated 	data retained should and will grow roughly in line w/ computing cost 	reductions (Moore&#8217;s Law, etc.)</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li>Analytic uses of Big Data
<ul>
<li>Analytics is mainly about three 	things
<ul>
<li>Problem detection</li>
<li>Customer relationship improvement
<ul>
<li>(Those overlap when the customer 	relationship is bad)</li>
</ul>
</li>
<li>Financial statements on steroids</li>
</ul>
</li>
</ul>
<ul>
<li>Main kinds of analytics
<ul>
<li>What BI vendors traditionally sell
<ul>
<li>General reporting and dashboards</li>
<li>Ad-hoc query (now driven from 	those reports and dashboards)</li>
<li>Planning (allegedly integrated 	with BI)</li>
</ul>
</li>
<li>Research
<ul>
<li>Ad hoc relational query (worth 	mentioning twice because it drives so much of the market)</li>
<li>Data mining</li>
<li>Most web search and web mining</li>
</ul>
</li>
<li>Operational/near-real-time</li>
<li>Archiving/compliance</li>
</ul>
</li>
<li>What gets Big?
<ul>
<li>Mainly research and archiving</li>
<li>But when reporting or operational 	get Big, you have really interesting computing problems</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Technology issues and trends</strong></p>
<ul>
<li>Moore&#8217;s Law
<ul>
<li>CPUs &#8212; All about cores, hence 	parallelism is key</li>
<li>RAM</li>
<li>SSDs – hence replace disks</li>
<li>Sensors – hence generate lots 	more data</li>
</ul>
</li>
<li>Kryder&#8217;s Law
<ul>
<li>But <a href="http://www.dbms2.com/2005/11/13/breaking-the-disk-speed-barrier/" >rotational speeds up only 	12.5X since Eisenhower Administration</a></li>
<li>Hence solid-state memory (or RAM) 	will soon take over</li>
</ul>
</li>
<li>In the meantime, I/O bottlenecks have had to be beaten
<ul>
<li>Hence sequential scans</li>
<li>Hence <a href="http://www.dbms2.com/2007/03/26/index-light-mpp-data-warehouse-appliances/" >index-light</a> architectures</li>
<li>Hence columnar</li>
</ul>
</li>
<li>DBMS “overhead”
<ul>
<li>Raw license and maintenance fees – 	software increasing fraction of total</li>
<li>OLTP vestiges – locking and all 	that</li>
<li>DBAs
<ul>
<li>People costs = huge fraction of 	total</li>
<li>Index-lightness addresses</li>
<li>So does appliance</li>
</ul>
</li>
<li>Many people don&#8217;t really know how to 	write SQL</li>
</ul>
</li>
<li>Configuration
<ul>
<li>Appliance/tightly-balanced
<ul>
<li>Netezza</li>
<li>Teradata earlier</li>
<li>Greenplum/Sun</li>
<li>Oracle</li>
<li>IBM</li>
<li>Microsoft/Madison</li>
</ul>
</li>
<li>Commodity/do what you want
<ul>
<li>Vertica</li>
<li>Greenplum now</li>
<li>Infobright, Aster and others</li>
<li>MapReduce-oriented file systems</li>
</ul>
</li>
<li><a href="http://www.dbms2.com/2009/10/25/data-warehouse-balanced-hardware-configuration/" >Extreme rigidity is silly</a>
<ul>
<li><a href="http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/" >Teradata, Oracle have both 	signaled moving to more modularity</a></li>
<li>Big driver of that = heterogeneous 	storage
<ul>
<li>Cheap disk</li>
<li>Expensive disk</li>
<li>Solid-state</li>
<li>RAM</li>
</ul>
</li>
</ul>
<ul>
<li>CPU/storage ratio is even more of a 	driver</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Theoretically defensible ways to segment the market</strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >Latency requirements</a>
<ul>
<li>High availability and low latency 	go together</li>
</ul>
</li>
<li>Query types
<ul>
<li>Simultaneous users for same</li>
</ul>
</li>
<li>Database size</li>
<li>Budget</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Actual segments right now</strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/08/24/teradatas-active-enterprise-data-warehouse-story/" >Utter ADW/EDW</a></li>
<li>Data mart
<ul>
<li>Size</li>
<li>Naturally columnar vs. naturally 	row-based</li>
</ul>
</li>
<li>Operational/frontline</li>
<li>Less dramatic/smaller EDW</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Teradata hardware strategy and tactics</title>
		<link>http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/</link>
		<comments>http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/#comments</comments>
		<pubDate>Sun, 25 Oct 2009 04:12:09 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1171</guid>
		<description><![CDATA[In my opinion, the most important takeaways about Teradata&#8217;s hardware strategy from the Teradata Partners conference last week are:

Teradata&#8217;s future lies in 	solid-state memory. That&#8217;s in 	line with what Carson 	Schmidt told me six months ago.
To Teradata&#8217;s surprise, the 	solid-state future is imminent. Teradata is 6-9 months further along with solid-state drives (SSD) 	than it [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">In my opinion, the most important takeaways about Teradata&#8217;s hardware strategy from <a href="http://www.dbms2.com/2009/10/19/teradata-partners-2009/" >the Teradata Partners conference</a> last week are:</p>
<ul>
<li><strong>Teradata&#8217;s future lies in 	solid-state memory.</strong><span> That&#8217;s in 	line with what <a href="../2009/04/28/data-warehouse-storage-options-cheap-expensive-or-solid-state-disk-drives/">Carson 	Schmidt</a> told me six months ago.</span></li>
<li><strong>To Teradata&#8217;s surprise, the 	solid-state future is imminent.</strong><span> Teradata is 6-9 months further along with solid-state drives (SSD) 	than it thought a year ago it would be at this point.</span></li>
<li><strong>Short-term, Teradata is going 	to increase the number of appliance kinds it sells. </strong><span>I 	didn&#8217;t actually get details on anything but the new SSD-based Blurr, 	but it seems there will be others as well.</span></li>
<li><strong>Teradata&#8217;s eventual future is to mix and match parts (especially different kinds of storage) in a more modular product line.</strong><span style="font-style: normal;"><span> <a href="../2008/10/14/teradata-virtual-storage/">Teradata Virtual Storage</a> is of </span></span><span>pretty limited value otherwise. I probably believe Teradata will go modular more emphatically than Teradata itself does, because I think <a href="http://www.dbms2.com/2009/10/25/data-warehouse-balanced-hardware-configuration/" >doing so will meet users&#8217; needs more effectively</a> than if Teradata relies strictly on fixed appliance configurations.<br />
</span></li>
</ul>
<p style="margin-bottom: 0in;">In addition, some non-SSD componentry tidbits from Carson Schmidt include:</p>
<ul>
<li>Teradata really likes Intel&#8217;s 	Nehalem CPUs, with special reference to multi-threading, QuickPath 	interconnect, and integrated memory controller. Obviously, 	Nehalem-based Teradata boxes should be expected in the not too 	distant future.</li>
<li>Teradata really likes Nehalem&#8217;s 	successor Westmere too, and expects to be pretty fast to market with 	it (faster than with Nehalem) because Nehalem and Westmere are 	plug-compatible in motherboards.</li>
<li>Teradata will go to 10-gigabit 	Ethernet for external connectivity on all its equipment, which 	should improve load performance.</li>
<li>Teradata will also go to 	10-gigabit Ethernet to play the Bynet role on appliances. Tests are 	indicating this improves query performance.</li>
<li>What&#8217;s more, Teradata believes 	there will be no practical scale-out limitations with 10-gigabit 	Ethernet.</li>
<li>Teradata hasn&#8217;t decided yet what to do about 2.5-inch SFF (Small Form Factor) disk drives, but is leaning favorably. Benefits would include lower power consumption and smaller cabinets.</li>
<li>Also on Carson&#8217;s list of 	“exciting” future technologies is SAS 2.0, which at 6 	gigabits/second doubles the I/O bandwidth of SAS 1.0.</li>
<li>Carson is even excited about removing uninterruptible power supplies from the cabinets, increasing space for other components.</li>
<li>Teradata picked Intel&#8217;s Host Bus 	Adapters for 10-gigabit Ethernet. The switch supplier hasn&#8217;t been 	determined yet.</li>
</ul>
<p style="margin-bottom: 0in;">Let&#8217;s get back now to SSDs, because over the next few years they&#8217;re the potential game-changer. <span id="more-1171"></span>The big news on SSDs is that after last year&#8217;s Teradata Partners conference, a stealth supplier* introduced itself and convinced Teradata it offers really great SSD technology. For example, not a single SSD it has provided Teradata has ever failed. (In hardware, that is. There have of course been firmware bugs, suitably squashed.) I think SSD performance is also exceeding Teradata&#8217;s expectations. This supplier is where the 6-9 month time-to-market gain comes from.</p>
<p style="margin-bottom: 0in;"><em>*Based on how often the concepts “stealth” and “name is NDAed” came up, I do not believe this is the SSD company another vendor told me about that is going around claiming it has a Teradata relationship.</em></p>
<p style="margin-bottom: 0in;">Teradata SSD highlights include:</p>
<ul>
<li>I/O speeds on “random medium blocks” are 520 megabytes/second, vs. 15 MB/second on Teradata&#8217;s fastest disks. And that figure is limited by SAS 1.0, load-balanced across two devices, not by the SSD hardware itself. (2 x 300+ MB/sec turns out to be 520 MB/sec in this case.) No wonder Carson is excited about SAS 2.0.</li>
<li>Teradata is using SAS interfaces 	for its SSDs, and believes that&#8217;s unusual, in that other companies 	are using SATA or Fibre Channel.</li>
<li>Never having had a part fail, 	Teradata has no real basis to make MTTF (Mean Time To Failure) 	estimates for its SSDs.</li>
<li>Teradata&#8217;s SSD appliance design 	includes no array controllers. The biggest reason is that right now 	array controllers can&#8217;t keep up with the SSDs&#8217; speed.</li>
<li>In its SSD appliance, Teradata has 	abandoned RAID, doing mirroring instead via a DBMS feature called 	Fallback that&#8217;s been around since Teradata&#8217;s earliest days. 	(However, <a href="../2008/09/28/oracle-database-machine-performance-and-compression/">unlike 	Oracle in Exadata</a>, Teradata continues to use RAID for disks.)</li>
<li>Useful life for Teradata&#8217;s SSDs is 	estimated at 5-7 years.</li>
<li>Teradata&#8217;s SSDs are SLC 	(Single-Level Cell), as opposed to MLC (Multi-Level Cell).</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>This week at the Teradata Partners user conference</title>
		<link>http://www.dbms2.com/2009/10/19/teradata-partners-2009/</link>
		<comments>http://www.dbms2.com/2009/10/19/teradata-partners-2009/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 13:07:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[GIS and geospatial]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1150</guid>
		<description><![CDATA[Teradata tells me that its press embargoes are ending at 9:00 this morning. Here are some highlights of what&#8217;s going on, although names, dates, and details will have to await conversations and press releases this week.

Teradata is productizing 	“private cloud,” under names including “Teradata 	Enterprise Analytics Cloud,” “Teradata Agile Analytics Cloud,” 	and “Teradata Elastic Mart [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Teradata tells me that its press embargoes are ending at 9:00 this morning. Here are some highlights of what&#8217;s going on, although names, dates, and details will have to await conversations and press releases this week.</p>
<ul>
<li><strong>Teradata is productizing 	“private cloud,”</strong> under names including “Teradata 	Enterprise Analytics Cloud,” “Teradata Agile Analytics Cloud,” 	and “Teradata Elastic Mart Builder.” I.e., Teradata hopes to 	leapfrog Greenplum in its “<a href="../2009/06/08/the-future-of-data-marts/">Enterprise 	Data Cloud</a>” strategy. This is only fair, in that Greenplum 	lifted the idea from Teradata and eBay in the first place. It also 	provides major support for what I think is an extremely sensible 	trend. Give or take issues of who announces and ships what a couple 	months before or after a competitor, my early thinking is that the 	main differences between Greenplum and Teradata in this regard will 	be:
<ul>
<li>Virtual as opposed to just 	physical data marts, based on robust workload management software. 	(Advantage: Teradata)</li>
<li>Pricing, deployment options. 	(Advantage: Greenplum)</li>
<li>Features that don&#8217;t directly 	relate to enterprise/private cloud. (Advantage: Either, often 	Teradata.)</li>
</ul>
</li>
<li><strong>Teradata is generally 	strengthening its data movement technology</strong>, e.g. for making 	various appliances work in sync. I&#8217;m not too clear yet on the 	details of that. I think this is what Teradata&#8217;s phrase “ecosystem 	management” refers to.</li>
<li><strong>Teradata is (pre-)announcing, at least as a statement of direction, an appliance based on solid-state drives (SSDs). </strong>I&#8217;ve thought for a while that Teradata was a leader in thinking through <a href="../2008/10/23/teradata-solid-state-drives-ssd/">the issues around solid-state memory in data warehousing</a>, so it makes sense that they&#8217;re among the leaders in actually coming to market as well. I plan to say more after meeting with, e.g., Carson Schmidt.</li>
<li><strong>Teradata has achieved a 300%ish 	speed-up in geospatial processing</strong>. I gather this is largely a 	byproduct of the parallel analytics work Teradata did around 	strengthening its SAS integration. However, there don&#8217;t seem to be a 	lot of Teradata geospatial users yet.</li>
<li><span>Teradata 	Express, </span><strong>Teradata&#8217;s free Windows-based crippleware, is being 	ported to Amazon EC2 and VMware</strong> as well. Presumably to avoid 	cannibalizing Teradata product sales, there are quite a few 	limitations on Teradata Express, including system capacity, database 	size, and “no production use.”</li>
<li><strong>Teradata continues to extend its optimizations to handle queries issued by business intelligence tools. </strong><span>Previously, the focus of what Teradata discussed in this regard was <a href="../2009/08/02/teradata-13-focuses-on-advanced-analytic-performance/">query rewrite</a>. But soon automatic recommendation and creation of Aggregate Join Indexes – i.e., materialized views – will be included as well.</span></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/19/teradata-partners-2009/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
