<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS2 -- DataBase Management System Services &#187; Business intelligence</title>
	<atom:link href="http://www.dbms2.com/category/analytics-technologies/business-intelligence/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Fri, 19 Mar 2010 15:49:58 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Data exploration vs. data visualization</title>
		<link>http://www.dbms2.com/2010/03/01/data-exploration-visualization/</link>
		<comments>http://www.dbms2.com/2010/03/01/data-exploration-visualization/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 09:29:47 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1666</guid>
		<description><![CDATA[I&#8217;ve tended to conflate data exploration and data visualization, and I&#8217;m far from alone in doing so. But a recent Economist article is a useful reminder that they aren&#8217;t exactly the same thing.
The article makes the same conflation, but while reading it I noticed something interesting. The concrete examples cited are of clever consultants who [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve tended to conflate <a href="http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/" >data exploration and data visualization</a>, and I&#8217;m far from alone in doing so. But a recent <a href="http://www.economist.com/specialreports/displaystory.cfm?story_id=15557455" onclick="javascript:pageTracker._trackPageview('/www.economist.com');"><em>Economist</em></a> article is a useful reminder that they aren&#8217;t exactly the same thing.<span id="more-1666"></span></p>
<p>The article makes the same conflation, but while reading it I noticed something interesting. The concrete examples cited are of clever consultants who crafted innovative data visualizations on the fly, to make conclusions patently apparent to even mathematically-challenged decision-makers. That kind of thing is important, and has been going on <a href="http://tokyohanna.blogspot.com/2009/12/nightingale-x-healthcare-x-visualizing.html" onclick="javascript:pageTracker._trackPageview('/tokyohanna.blogspot.com');">for over 140 years</a>.*</p>
<p><em>*Yes, I&#8217;m trotting out the Florence Nightingale example again. I continue to be in awe of her.</em></p>
<p>What worries me is the article&#8217;s suggestion that <strong>the best data visualizations are done by visualization experts, as ways of making information apparent to other people.</strong> For as long as data visualization relies on hotshot visual-design experts doing one-off projects, its impact on enterprises overall will remain extremely limited. In other words, <strong>to the extent it is incorrect to conflate data visualization and data exploration, data visualization will remain a fringe technology</strong>.</p>
<p>To be fair, a primary decision support/business intelligence usage cycle has always been &#8212; where by &#8220;always&#8221; I mean &#8220;for at least the past 35 years&#8221; &#8212;</p>
<ul>
<li><strong>Data exploration</strong>. Power user uses technology to find something interesting.</li>
<li><strong>&#8220;Look what I found!&#8221; </strong>Power user then shows a report, chart, or other summary/representation to colleagues.</li>
</ul>
<p>So to the extent modern interactive data exploration/visualization technology fits that paradigm, great. But to the extent that visualization experts are somehow integral to the technology&#8217;s use, it will remain stuck on the analytic fringe.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/01/data-exploration-visualization/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Intelligent Enterprise’s Editors’/Editor’s Choice list for 2010</title>
		<link>http://www.dbms2.com/2010/02/11/intelligent-enterprise-editors-choice-201/</link>
		<comments>http://www.dbms2.com/2010/02/11/intelligent-enterprise-editors-choice-201/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 23:13:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Intersystems and Cache']]></category>
		<category><![CDATA[Jaspersoft]]></category>
		<category><![CDATA[Kalido]]></category>
		<category><![CDATA[Mark Logic]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Pentaho]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Tableau Software]]></category>
		<category><![CDATA[Talend]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1578</guid>
		<description><![CDATA[As he has before, Intelligent Enterprise Editor Doug Henschen

Personally selected annual lists of 12 &#8220;Most influential&#8221; companies and 36 &#8220;Companies to watch&#8221; in analytics- and database-related sectors.
Made it clear that these are his personal selections.
Nonetheless has called it an Editors&#8217; Choice list, rather than Editor&#8217;s Choice.  

(Actually, he&#8217;s really called it an &#8220;award.&#8221;)
People advising [...]]]></description>
			<content:encoded><![CDATA[<p>As he has <a href="http://www.dbms2.com/2009/01/12/intelligent-enterprises-editorseditors-choice-list/" >before</a>, <em>Intelligent Enterprise</em> Editor Doug Henschen</p>
<ul>
<li>Personally selected <a href="http://intelligent-enterprise.informationweek.com/showArticle.jhtml;jsessionid=IANLOXCT2244BQE1GHPCKH4ATMY32JVN?articleID=222900034&amp;pgno=1" onclick="javascript:pageTracker._trackPageview('/intelligent-enterprise.informationweek.com');">annual lists</a> of 12 &#8220;Most influential&#8221; companies and 36 &#8220;Companies to watch&#8221; in analytics- and database-related sectors.</li>
<li>Made it clear that these are his personal selections.</li>
<li>Nonetheless has called it an Editors&#8217; Choice list, rather than Editor&#8217;s Choice. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ul>
<p>(Actually, he&#8217;s really called it an &#8220;award.&#8221;)</p>
<p><span id="more-1578"></span>People advising Doug &#8212; who come to think of it actually are Contributing Editors to <em>Intelligent Enterprise</em> or something like that &#8212; included Cindi Howson, Seth Grimes, three others, and me.</p>
<p>And if past is prologue, I will now get a flood of PR emails calling my attention to this award that I already have both participated in and blogged about. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>As usual, the sense:nonsense ratio on these lists was pleasingly high. Analytic DBMS vendors cited included IBM, Microsoft, Netezza, Oracle, Sybase, and Teradata in the &#8220;Most influential&#8221; group, with Aster, Greenplum, HP, Infobright, and Vertica among the &#8220;To watch&#8221; crowd. It&#8217;s tough to argue with those selections, whose most questionable element is probably the not-ridiculous supposition that HP could do something interesting over the coming year. Cloudera and Intersystems also made the list, deservedly.</p>
<p>All three of QlikTech, Tableau, and TIBCO made the list, which is appropriate given the potential for and interest in interactive data exploration technology. The BI majors, independent or otherwise, were all on as well. In text mining, Doug included Attensity and Clarabridge, which I think is exactly right. (Plus OpenCalais.) Upon reflection, I probably should have nominated Mark Logic, even though most of its business is non-enterprise; but hey, nobody&#8217;s perfect, and the same goes for lists. Open source was well represented, with Apache, Actuate, Jaspersoft, Eclipse, Infobright, Nuxeo and R all being cited (but not Ingres or Pentaho). Kalido made the list, with my endorsement, their silly I-CASE-like marketing messaging notwithstanding.</p>
<p>Speaking of imperfections &#8212; there are only a few category names, and so category assignments can be pretty bizarre. (In an ideal world, middleware wouldn&#8217;t be included under &#8220;enterprise applications&#8221;.) Greenplum hasn&#8217;t really &#8220;extended&#8221; its DBMS with a &#8220;cloud&#8221; option. As much as I&#8217;d like Netezza to be more influential than SAP, that&#8217;s probably not the best way to rank them. And there are a number of &#8220;This company is on a roll!&#8221; kinds of comments that I wouldn&#8217;t necessarily endorse.</p>
<p>But those are all nitpicks. On the whole, it&#8217;s another nice job.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/11/intelligent-enterprise-editors-choice-201/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Open issues in database and analytic technology</title>
		<link>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/</link>
		<comments>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 22:04:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1507</guid>
		<description><![CDATA[The last part of my New England Database Summit talk was on open issues in database and analytic technology. This was closely intertwined with the previous section, and also relied on a lot that I&#8217;ve posted here. So I&#8217;ll just put up a few notes on that part, with lots of linkage to prior discussion [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">The last part of my <a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >New England Database Summit</a> talk was on open issues in database and analytic technology. This was closely intertwined with the <a href="http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/" >previous section</a>, and also relied on a lot that I&#8217;ve posted here. So I&#8217;ll just put up a few notes on that part, with lots of linkage to prior discussion of the same points.<span id="more-1507"></span></p>
<ul>
<li>The most important issue in 	database and analytic technology, in my opinion, isn&#8217;t technological 	at all – rather, it&#8217;s the legal and political steps needed to <a href="http://www.dbms2.com/2010/01/31/data-based-snooping-threat-libert/" > preserve liberty</a> in the face of advancing, intrusive 	technology.</li>
<li>Another important issue for 	society – and this one does involve a lot of technology – is 	scientific number crunching. In particular, <a href="http://www.dbms2.com/2009/10/03/issues-in-scientific-data-management/" >database technology for 	scientific computing</a> needs to be developed much further. I&#8217;ll have 	more to say on all this soon.</li>
<li>More generally, technology needs 	to keep advancing for parallel analytics. Fortunately, it is. Watch 	this space over the next few weeks.</li>
<li>Oracle has said, in effect, that <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" > its most important technological challenge of the decade</a> is getting 	<a href="http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/" >solid-state memory</a> right. I agree.</li>
<li>Data volumes will keep going up, 	up, up. Technology needs to keep evolving accordingly. Much of what 	I write is on that subject.</li>
<li>Data needs to be processed and analyzed at <a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >very 	different latencies</a>. And there&#8217;s much further to go in integrating 	disparate latencies.</li>
<li>Analytic database management in 	the cloud hasn&#8217;t been solved yet, especially for Big Data. Among the 	reasons are the difficulty of moving data into the cloud (unless it 	originated there), the slowness of moving it from node to node in 	shared-nothing architectures (which reduces the elasticity benefit), 	and above all the long and unpredictable latencies of interprocessor 	communication while queries are running (a key subject of discussion 	at the <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >Boston Big Data Summit</a>).</li>
<li>Better business intelligence user interfaces are increasingly available. I&#8217;m thinking particularly of approaches with buzzwords like <a href="http://www.dbms2.com/2008/08/04/qliktech-qlikview-update/" >visualization/interactive exploration</a> or <a href="http://www.texttechnologies.com/2007/08/03/the-case-for-inxight-awareness-server/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">faceted</a>. But they aren&#8217;t well-integrated into the overall analytic stack, as big BI vendors are trailing the smaller ones in this regard. (Part of the problem relates to my previous point.)</li>
<li>Application development over text 	search isn&#8217;t in the same league as application development over 	relational DBMS. The choices are mainly XML (e.g., <a href="http://www.texttechnologies.com/2008/04/29/mark-logic-viewed-as-a-different-kind-of-text-search-technology-vendor/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">MarkLogic</a>), SQL 	for text integrated into RDBMS (limited by the weakness of those 	integrations), and something like <a href="http://www.texttechnologies.com/2008/09/20/attivio-update/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">Attivio&#8217;s Java SDK</a>. There&#8217;s a 	major conceptual barrier in building those apps, namely the 	unpredictability of query results. Still, it should be possible to 	do better.</li>
<li>Similarly, text analytics and conventional analytics coexist well. They can even be in the same database and/or dashboard, although in practice that is limited by the strong <a href="http://www.texttechnologies.com/2008/10/24/attensity-update-2/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">SaaS focus of text mining vendors and users</a>. But analytic integration of them is really hard. Linguistic imprecision is, in my opinion, only the #2 reason for this difficulty. The #1 reason is that trends detected by text analytics are much less precise than trends on tabular data – e.g., a 50% increase in a certain kind of complaint may be no more significant than a 5% change in a revenue variable.</li>
<li>I&#8217;m increasingly persuaded that <a href="http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/" > graph analytics</a> can be handled without a graph-centric data model. 	But right now, it isn&#8217;t being handled well at all. Lots more needs 	to be done – although when it is, it will just exacerbate the 	privacy/liberty dangers that so concern me.</li>
</ul>
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Flash, other solid-state memory, and disk" href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, other solid-state memory, and disk</a></li>
<li><a title="Interesting trends in database and analytic technology" href="../2010/01/31/trends-database-aanalytic-technology/">Interesting trends in database and analytic technology</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Interesting trends in database and analytic technology</title>
		<link>http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/</link>
		<comments>http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 02:11:17 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1492</guid>
		<description><![CDATA[My project for the day is blogging based on my “Database and analytic technology: State of the union” talk of a few days ago. (I called it that because of when it was given, because it mixed prescriptive and descriptive elements, and because I wanted to call attention to the fact that I cover the [...]]]></description>
			<content:encoded><![CDATA[<p>My project for the day is blogging based on my “<a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >Database and analytic technology: </a><a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >State of the union</a>” talk of a few days ago. (I called it that because of when it was given, because it mixed prescriptive and descriptive elements, and because I wanted to call attention to the fact that I cover the <em>union</em> of database and analytic technologies – the <em>intersection</em> of those two sectors is an area of particular focus, but is far from the whole of my coverage.)</p>
<p>One section covered recent/ongoing/near-future trends that I thought were particularly interesting, including:<span id="more-1492"></span></p>
<p><strong>Simpler database technology,</strong> by which I mean DBMS that are:</p>
<ul>
<li>Easier 	to administer than market-leading systems &#8230;</li>
<li>… even if at the cost of being special-purpose</li>
<li>E.g.,
<ul>
<li>MySQL and older mid-tier RDBMS such as Progress</li>
<li>Many analytic DBMS and appliances, most notably Netezza&#8217;s</li>
</ul>
</li>
</ul>
<p>For general purpose or OLTP uses, I&#8217;m not a big fan of MySQL (not enough progress in making it industrial-strength), PostgreSQL (no good company behind it – I&#8217;m a non-fan of EnterpriseDB), or Ingres (open source or not, it&#8217;s an antiquated system that hasn&#8217;t been invested in as much as Oracle, DB2 or SQL Server).</p>
<p>But I get the impression there are a lot of contenders among small startups, featuring very new architectures for OLTP or general-purpose database management. VoltDB comes to mind. NimbusDB is finally within range of getting funded. Dan Weinreb told me Friday he knows of a bunch of others as well. And that&#8217;s all before we even get into the <a href="http://www.dbms2.com/2009/12/12/legit-nosql-key-value-store/" >NoSQL</a> kind of alternative.</p>
<p><strong>Flexible storage architectures.</strong> That&#8217;s starting out with an emphasis on hybrid columnar, as in the examples of <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/" >Vertica</a> and <a href="http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/" >Greenplum</a>. Oracle (to whom I&#8217;m under no NDA obligation) and other vendors (to whom I am) are going that way as well.</p>
<p><strong>Multi-tier database architectures,</strong> by which I mean at least two things:</p>
<ul>
<li>The database tier/server tier split of Exadata</li>
<li>Hybrid RAM/disk architectures, examples of which include
<ul>
<li>Vertica&#8217;s RAM-based write-optimized store</li>
<li><a href="http://www.dbms2.com/2009/10/18/introduction-to-sensage/" >Sensage&#8217;s CEP-in-the-DBMS</a></li>
<li>This in-memory analytics stuff we keep hearing about from the BI vendors</li>
<li>Any true in-memory/disk hybrid, such as the regrettably sidelined <a href="http://www.dbms2.com/2007/12/21/ibm-acquires-soliddb/" >solidDB</a></li>
<li>Smart thinking by numerous DBMS vendors about optimizing the use of RAM and/or Level 2 cache</li>
</ul>
</li>
</ul>
<p>Netezza is particularly interesting to watch in this regard because it:</p>
<ul>
<li>Had a pretty strict storage/other processing split in prior product generations and &#8230;</li>
<li>… <a href="http://www.dbms2.com/2009/07/30/netezza-new-product-family/" >ditched that in its latest generation</a> …</li>
<li>… which however is focused on optimizing the use of RAM cache</li>
</ul>
<p>Also noteworthy is Petascan, the stealth-mode – and therefore harder to watch right now <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> – company I keep teasing about, which makes a strong case for carrying the database/storage tier split into the flash/solid-state memory technology generation. <a href="../2009/04/20/calpont-update-you-read-it-here-first/">Calpont</a> also has a server/storage tier split, but that&#8217;s of mainly theoretical interest unless and until Calpont actually ships an MPP version of <a href="../2009/11/07/calponts-infinidb/">InfiniDB</a>.</p>
<p><strong>Cheaper parts,</strong> which have of course been a huge trend for decades. <a href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Solid-state memory</a> will soon conquer the world. Meanwhile, cheaper sensors drive that <a href="../2010/01/17/three-broad-categories-of-data/">machine-generated data</a> I keep talking about.</p>
<p>An ever-better understanding of <strong>scale-out technology,</strong> in several respects, including:</p>
<ul>
<li>Query, notably data movement for MPP DBMS</li>
<li>Update, especially minimalistic DBMS approaches, be they sharded MySQL or more NoSQLish</li>
<li>Number-crunching, especially via MapReduce and/or parallel analytic libraries integrated into DBMS</li>
</ul>
<p>Cool trends I touched on more briefly include:</p>
<ul>
<li>More data being available for analysis. This was a core theme of my <a href="http://www.dbms2.com/2009/07/30/netezza-enzee-universe/" >Enzee Universe keynote speeches</a>; there are also some notes on it in my 	post based on my <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >Boston Big Data Summit</a> talk.</li>
<li>More users being served by analytics. Ditto.</li>
<li>Data exploration/visualization, à la QlikView, Spotfire, or Tableau, and also the faceted stuff.</li>
<li>The democratization of data mining. But I&#8217;m not as sure of that one as of the others&#8230;</li>
</ul>
<p>One area I flat-out forgot to mention is <a href="http://www.dbms2.com/2009/06/08/the-future-of-data-marts/" >easy data mart spin-out</a>.</p>
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Flash, other solid-state memory, and disk" href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, other solid-state memory, and disk</a></li>
<li><a title="Open issues in database and analytic technology" href="../2010/02/01/open-issues-in-database-and-analytic-technology/">Open issues in database and analytic technology</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Research agenda for 2010</title>
		<link>http://www.dbms2.com/2009/12/31/research-agenda-for-2010/</link>
		<comments>http://www.dbms2.com/2009/12/31/research-agenda-for-2010/#comments</comments>
		<pubDate>Thu, 31 Dec 2009 22:02:11 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[About this blog]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Jaspersoft]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Tableau Software]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1384</guid>
		<description><![CDATA[As you may have noticed, I&#8217;ve been posting less research/analysis in November and December than during some other periods. In no particular order, reasons have included:

Over a 20-week period, I had travel in 13 of them.
3 of those were vacation in November.
As travel finally wound down:

It was time to focus a bit on my [...]]]></description>
			<content:encoded><![CDATA[<p>As you may have noticed, I&#8217;ve been posting less research/analysis in November and December than during some other periods. In no particular order, reasons have included:<span id="more-1384"></span></p>
<ul>
<li>Over a 20-week period, I had travel in 13 of them.</li>
<li>3 of those were vacation in November.</li>
<li>As travel finally wound down:
<ul>
<li>It was time to focus a bit on <a href="http://www.monashreport.com/2009/12/14/our-services-for-technology-vendors/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">my own business</a></li>
<li>Elder care got serious; e.g., my parents went to the hospital on consecutive days, Christmas week, the first one on their 52nd wedding anniversary</li>
<li>Linda and I both got really nasty colds</li>
<li>The holidays were happening</li>
<li>I started helping out a really cool startup company (first time I&#8217;ve taken stock in a private company in years; more on that soon)</li>
<li>There was less industry news going on anyway than in some other recent months</li>
</ul>
</li>
</ul>
<p>But of course I plan to speed up the research/analysis/writing soon. Here, FYI, are a few things I have on my plate.</p>
<p>For a couple of years now, the center of what I&#8217;ve written about has been <strong>high-performance analytic data processing. </strong>You can expect me to keep pursuing that in all its aspects. But there are two specific areas I&#8217;ve identified in which I want to redouble my efforts.</p>
<p>First, almost every BI vendor has an effort in <strong>&#8220;in-memory analytics&#8221;</strong> and/or <strong>&#8220;interactive data exploration.&#8221;</strong> I suspect there&#8217;s a lot of difference in underlying technologies, but I&#8217;m having trouble getting details. QlikTech (the worst foot-dragger of the three), MicroStrategy, and Jaspersoft all owe me follow-up conversations with the people who know what&#8217;s going on well enough to explain it. Tableau keeps promising me a briefing and then not delivering. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  And I&#8217;m even further behind with the behemoth companies &#8212; Oracle, Microsoft, IBM/Cognos (arguably) et al.</p>
<p>Second, <strong>solid-state memory</strong> is coming to data warehousing. The general reasons are that it&#8217;s clearly close, and Moore&#8217;s Law keeps bringing it closer. More specific reasons for believing in solid-state include:</p>
<ul>
<li><a href="http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/" >Teradata</a> has made large strides in making solid-state memory useful.</li>
<li>The stealth start-up I mentioned above is poised to make further strides.</li>
<li>(I&#8217;m not totally sure yet about this part) The in-memory analytics mentioned above might wind up working better in solid-state memory than in DRAM.</li>
</ul>
<p>I&#8217;m spending quite a few cycles thinking about this area.</p>
<p>I&#8217;d also like to look further at <strong>analytic applications </strong>and<strong> advanced analytic functionality.</strong> I foreshadowed some of that in my <a href="http://www.dbms2.com/2009/12/02/mapreduce-for-complex-analytics-webina/" >Aster webinars</a>. There&#8217;s some good stuff to talk about at Teradata that I should try to write up soon. I need to have a follow-up conversation with a fascinating anti-fraud guy I met at Netezza&#8217;s London event. But that&#8217;s all just scratching the surface.</p>
<p>Both the MySQL and PostgreSQL communities are in some disarray. Other non-behemoth <strong>OLTP/general-purpose DBMS</strong> seem to be, at best, thriving niche products. (I see little in the way of innovative new uses for, say, Progress, Cache&#8217;, Ingres, or anything multivalue.) But it feels as if there&#8217;s more opportunity out there than is being met. And at a minimum, I&#8217;d like to learn more than the almost nothing I know about <strong>OLTP <a href="http://www.dbms2.com/2009/12/12/legit-nosql-key-value-store/" >NoSQL</a> alternatives.</strong></p>
<p>I&#8217;ve already said that I expect to give an <a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >industry-overview talk</a> at MIT on January 28. I also have an overviewy press article and overviewy white paper under discussion. If those come to fruition, I&#8217;ll of course let you know. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Besides the above, I of course have a number of specific posts that I need to get around to researching and writing at some point, often on topics I&#8217;ve already written about before.  Three subjects fairly high on the priority list are scientific data management, machine-generated data, and Oracle Exadata.</p>
<p>And finally, I have some subjects queued up for a couple of my other blogs as well. If you don&#8217;t already take our <a href="http://www.monash.com/blogs.html" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">multi-blog integrated feed</a>, this might be a good time to switch over.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/31/research-agenda-for-2010/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Introduction to Gooddata</title>
		<link>http://www.dbms2.com/2009/12/27/introduction-to-gooddata/</link>
		<comments>http://www.dbms2.com/2009/12/27/introduction-to-gooddata/#comments</comments>
		<pubDate>Mon, 28 Dec 2009 03:16:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Gooddata]]></category>
		<category><![CDATA[Jaspersoft]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1341</guid>
		<description><![CDATA[Around the end of the Cold War, Esther Dyson took it upon herself to go repeatedly to Eastern Europe and do a lot of rah-rah and catalysis, hoping to spark software and other computer entrepreneurs. I don&#8217;t know how many people&#8217;s lives she significantly affected – I&#8217;d guess it&#8217;s actually quite a few – but [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Around the end of the Cold War, Esther Dyson took it upon herself to go repeatedly to Eastern Europe and do a lot of rah-rah and catalysis, hoping to spark software and other computer entrepreneurs. I don&#8217;t know how many people&#8217;s lives she significantly affected – I&#8217;d guess it&#8217;s actually quite a few – but in any case the number is not zero. Roman Stanek, who has built and sold a couple of software businesses, cites her as a key influence setting him on his path.</p>
<p style="margin-bottom: 0in;">Roman&#8217;s latest venture is business intelligence firm Gooddata. Gooddata was founded in 2007 and has been soliciting and getting attention for a while, so I was surprised to learn that Gooddata officially launched just a few weeks ago. Anyhow, some less technical highlights of the Gooddata story include:<span id="more-1341"></span></p>
<ul>
<li>Gooddata believes it makes BI easy 	to adopt, unlike every other BI vendor on the planet &#8212; not 	excluding the many other BI vendors who say the same thing about 	themselves. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
<li>Gooddata is entirely cloud-based, 	specifically in the Amazon cloud.  I.e., Gooddata is selling 	SaaS-based BI.</li>
<li>Gooddata wants to sell to 	enterprises that are large enough to have more than a couple of BI 	users, and small enough not to be well served by the BI market 	leaders.
<ul>
<li>In revenue terms, this is the ever-popular $100 million &#8211; 	$1 billion market.</li>
<li>Specifically, Gooddata believes 	that those enterprises may have decent “back office” BI, but 	don&#8217;t have much in the front office. Gooddata wants to provide them 	with front office BI, which seems to basically mean CRM analytics. 	Gooddata sees this as a market in which QlikTech is the major 	player.  Generally, Gooddata wants to emulate and go after QlikTech.</li>
<li>Even more specifically, Gooddata 	wants to sell to Salesforce.com customers, who it believes are not 	well-served by what passes for built-in analytics at Salesforce. 	Partnering with NetSuite didn&#8217;t work as well, since NetSuite&#8217;s 	customers turn out to be smaller firms than are in Gooddata&#8217;s target 	market.</li>
</ul>
</li>
<li>Something I heard from both 	Jaspersoft and Gooddata is that there&#8217;s a hot market in providing 	cloud-based BI to online gaming companies. I gather these are mainly 	games running on mass communication platforms such as Facebook or 	the iPhone. Surely not coincidentally, it seems likely that:
<ul>
<li>These are small companies whose 	success – and hence data intake – can suddenly explode.</li>
<li>The data originates in cyberspace, 	with no particular need ever to come to the game companies&#8217; own 	premises.</li>
</ul>
</li>
<li>Gooddata has 50 production 	customers.</li>
<li>Gooddata had 2500 “projects” 	at the end of beta in June, and is adding 100 more per month. (Those 	numbers look weird together.) A “project” is a lot like a 	database, with associated reports, security privileges, etc.</li>
<li>Gooddata has close to 40 people, 	mainly in development.</li>
<li>I didn&#8217;t detect much of a sales 	strategy, nor much of a marketing strategy beyond the impressive 	early buzz generation. Perhaps that&#8217;s a partial explanation as to 	why the rate of Gooddata adoption fell even before the company 	officially launched.</li>
<li>I forgot to ask what those 50 	customers were actually paying, but considering Gooddata&#8217;s price 	list, it appears a typical price range for Gooddata&#8217;s stuff would be 	$500-$2,000/month.</li>
</ul>
<p style="margin-bottom: 0in;">Gooddata technical highlights include:</p>
<ul>
<li>Gooddata is building an 	entire BI stack – reporting, dashboards, ETL, in-memory database 	management, everything. I doubt Gooddata would claim that the pieces 	are best-of-breed in many ways other than BI ease of adoption and 	use.</li>
<li>So far I&#8217;ve seen three Gooddata 	ease-of-use features or feature groups that strike me as 	differentiated – <strong>reusability</strong> (of metrics and/or reports), 	<strong>collaboration,</strong> and <strong>tag clouds.</strong> More on those below. 	Gooddata is also building toward an <strong>agility</strong> pitch, but those 	features aren&#8217;t all baked yet.</li>
<li>Gooddata is MySQL-based today, but 	plans to move to a memory-centric compressed column store in 2010. 	Roman doesn&#8217;t reject analogies to SAP&#8217;s <em>BI/BW/whatever 	Accelerator. </em><span style="font-style: normal;">Yes, folks – 	Gooddata is yet another BI vendor doing some form of memory-centric 	OLAP. That&#8217;s a big trend.</span></li>
<li>I&#8217;m guessing that a big reason Gooddata is reinventing so many technical wheels is to ensure that the Gooddata stack is seamlessly multi-tenant from top to bottom. (<a href="../2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/">Comments on a similar idea</a> from SAP&#8217;s Hasso Plattner suggest a similar emphasis.)</li>
<li>Gooddata has 	its own multidimensional query language called MAQL (the A doesn&#8217;t 	seem to stand for anything). Today MAQL generates SQL for MySQL. The 	future columnar memory-centric data store will &#8212; I think – 	understand MAQL natively.</li>
</ul>
<p style="margin-bottom: 0in;">Now we get to the good stuff. When I wrote about <a href="../2009/05/30/reinventing-business-intelligence/">reinventing business intelligence</a> back in May, I focused on some interesting developments I see as actually underway &#8212; at least on an experimental basis and/or from small vendors – namely:</p>
<ul>
<li><strong>Text-search interfaces. </strong>Well, 	while I didn&#8217;t see true text search in the Gooddata demo, I did see 	tag clouds, which have some of the same benefits.</li>
<li><strong>Collaboration tools.</strong> Well, 	Gooddata has a nice-looking approach to BI collaboration, heavily 	reflected in its UI metaphors. (That said, I haven&#8217;t really compared 	Gooddata to Microsoft SharePoint or SAP&#8217;s Portal/Rooms/whatever.)</li>
<li><strong>Memory-centric analytics</strong> (for speed of exploration). As noted above, Gooddata has that coming 	soon.</li>
<li><strong>Data exploration that tries to ignore fixed relational schemas,</strong> à la Attivio or Splunk. Roman says Gooddata is interested in or working on that, but offers no timetable.</li>
</ul>
<p style="margin-bottom: 0in;">Meanwhile, something I&#8217;ve been seeking for years, but haven&#8217;t seen much progress on since enhancement stopped on Cognos Metrics Manager, is more <a href="../2007/11/13/the-key-problem-with-dashboard-functionality/">user-friendly metrics management</a>.  Well, it doesn&#8217;t have a lot of bells and whistles, but at least Gooddata has the basics – a list of already-defined metrics, and a reasonable way of compounding them into other metrics. I think that kind of thing will be a major BI feature going forward, to the point that a few years from now we&#8217;ll be worrying about how to port them from one BI vendor&#8217;s tool to another.</p>
<p style="margin-bottom: 0in;"><strong>Bottom line: If you&#8217;re interested in BI, you should look at a Gooddata demo.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/27/introduction-to-gooddata/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Ray Wang on SAP</title>
		<link>http://www.dbms2.com/2009/12/11/ray-wang-on-sap/</link>
		<comments>http://www.dbms2.com/2009/12/11/ray-wang-on-sap/#comments</comments>
		<pubDate>Fri, 11 Dec 2009 23:16:54 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Solid-state memory]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1286</guid>
		<description><![CDATA[Ray Wang made a terrific post based on SAP&#8217;s annual influencer love-in, an event which I no longer attend. Ray believes SAP has been in a &#8220;crisis&#8221;, and sums up his views as
The Bottom Line  &#8211; SAP’s Turning The Corner

Credit must be given to SAP for charting a new course.  A shift in the management [...]]]></description>
			<content:encoded><![CDATA[<p>Ray Wang made <a href="http://blog.softwareinsider.org/2009/12/11/event-report-2009-sap-influencer-summit-sap-must-put-strategy-to-execution-in-order-to-prove-clarity-of-vision/" onclick="javascript:pageTracker._trackPageview('/blog.softwareinsider.org');">a terrific post based on SAP&#8217;s annual influencer love-in</a>, an event which <a href="http://www.monashreport.com/2007/01/03/sap-nonsense-ethics/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">I no longer attend</a>. Ray believes SAP has been in a &#8220;crisis&#8221;, and sums up his views as</p>
<blockquote><p><strong>The Bottom Line  &#8211; SAP’s Turning The Corner<br />
</strong></p>
<p>Credit must be given to SAP for charting a new course.  A shift in the management philosophy and product direction will take years to realize, however, its not too late for change.  SAP must remember its roots and become more German and less American.  The renewed focus must put customer requests and priorities ahead of SAP’s bureaucracy.  The emphasis must focus on the <a href="http://blog.softwareinsider.org/2009/03/16/mondays-musings-its-the-relationship-stupid-part-1-commoditizing-the-workforce/" onclick="javascript:pageTracker._trackPageview('/blog.softwareinsider.org');">relationship</a>.  When that reemerges in how SAP works with customers, partners, influencers, and its own employees, SAP will be back in good graces. In the meantime, its  time to get to work and deliver.  Oracle’s Fusions Apps are coming soon and competitors such as IBM, Microsoft, Epicor, IFS, and SalesForce.com will not relent.</p></blockquote>
<p>I recall the 1980s, when SAP&#8217;s main differentiator, at least in the English-speaking US, was a total commitment to customer success, and when it could be taken for granted that SAP would do business ethically. Things change, and not always for the better.</p>
<p>Anyhow, the reason I&#8217;m highlighting Ray&#8217;s post is that he makes reference to a number of interesting SAP-centric technology trends or initiatives.<span id="more-1286"></span> In no particular order, Ray suggests:</p>
<ul>
<li>SAP&#8217;s and Oracle&#8217;s (Fusion) <a href="http://www.dbms2.com/2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/" >efforts to meld memory-centric analytics with operational apps</a> will be crucial for large enterprises &#8212; but perhaps only around the middle of the next decade. (I basically agree, although I&#8217;d note that:
<ul>
<li>Wisely, Ray suggested a very long time frame.</li>
<li>BI/operational app integration has been, on the whole, glacial.</li>
<li>The idea that you have to put pre-built aggregates into RAM to get performance is an indictment of market-leading RDBMS &#8212; but it&#8217;s a fair indictment.</li>
<li>I&#8217;m not sure whether memory-centric OLAP will wind up in RAM or Flash. If the data stores are updated at near-transactional speeds, RAM may make more sense. Otherwise, Flash should have major advantages.)</li>
</ul>
</li>
<li>SAP&#8217;s long-standing attempts to support third-party development of SAP add-ons are a technological mess, in line with <a href="http://www.dbms2.com/2007/10/12/sap-is-losing-crucial-managerial-talent/" >my fears a couple of years ago</a>. However, the business-relationship part of the effort is vastly stronger.</li>
<li>As SAP focuses more on the mid-market, it is partnering closely with Microsoft. (If you think about it, that makes all kinds of sense.)</li>
<li>Energy/environmental/safety tracking &#8212; i.e., sustainability &#8212; tools are a big deal. (See also <em><a href="http://www.economist.com/businessfinance/displaystory.cfm?story_id=15022465" onclick="javascript:pageTracker._trackPageview('/www.economist.com');">The Economist</a></em> on that point.)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/11/ray-wang-on-sap/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Boston Big Data Summit keynote outline</title>
		<link>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/</link>
		<comments>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 06:25:50 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Humor]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1227</guid>
		<description><![CDATA[Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I&#8217;m posting them below.

The top two points [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Last month, Bob Zurek asked me to give a talk on <a href="http://www.dbms2.com/2009/10/09/presentations-upcoming/" >“Big Data”, where “big” is anything from a few terabytes on up</a>, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I&#8217;m posting them below.</p>
<p><span id="more-1227"></span></p>
<p style="margin-bottom: 0in;">The top two points from Q&amp;A probably were:</p>
<ul>
<li><strong>Big Data and the cloud actually 	have relatively little to do with each other,</strong> <a href="http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/" >a few exceptions</a> notwithstanding, especially if the data is in a shared-nothing DBMS 	(as opposed to, say, a MapReduce-oriented file cluster). Two 	principal reasons are:
<ul>
<li>Redistributing data from node to 	node is a little slow, undermining some of the elasticity benefits 	of the cloud.</li>
<li><a href="http://www.dbms2.com/2009/05/29/sneakernet-to-the-cloud/" >Getting data into the cloud in the 	first place is a lot slow</a>.</li>
</ul>
</li>
<li><strong>The NoSQL movement is a lot like 	the Ron Paul campaign</strong> &#8212; it consists of people who are dissatisfied 	with the status quo, whose dissatisfaction has a lot to do with 	insufficient liberty and/or excessive expenditure, and who otherwise 	don&#8217;t have a whole lot in common with each other.</li>
</ul>
<p style="margin-bottom: 0in;">Anyhow, here are my notes for the talk, edited in just a couple of places for readability or linkage.</p>
<p style="margin-bottom: 0in;"><strong>Quick introduction</strong></p>
<ul>
<li>Big Data vs. cloud</li>
<li>How big is Big Data?</li>
<li>At the low end of that range, 	there&#8217;s little you can&#8217;t do with conventional technology if you 	have:
<ul>
<li>An unlimited budget for hardware</li>
<li>An unlimited budget for software</li>
<li>An unlimited budget for people, 	especially Oracle DBAs</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Big Data in OLTP</strong></p>
<ul>
<li>Hard-core OLTP
<ul>
<li>Focus of DBMS technology for a long time</li>
<li>Big budgets because each 	transaction has significant value</li>
<li>Tough to get users to change 	technologies</li>
</ul>
</li>
<li>Lighter-weight OLTP
<ul>
<li>Classic example = web companies
<ul>
<li>Big ones &#8212;  retail-oriented ones 	(eBay, Amazon) partially excepted &#8212; <a href="http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/" >rolled their own technology 	stacks</a></li>
<li>Reluctant to give money to anybody
<ul>
<li>Open source, etc.</li>
</ul>
</li>
</ul>
</li>
<li>Difficulty finding market
<ul>
<li>Product vs. feature
<ul>
<li>Clustering/HA/DR/whatever</li>
<li>Ditto cloud enablement</li>
</ul>
</li>
<li>True products haven&#8217;t found much 	traction yet</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Analytic Big Data use cases</strong></p>
<ul>
<li>Kinds of data for analytics
<ul>
<li>More of same != big</li>
<li>More detail and/or new kinds
<ul>
<li>Complete data sets</li>
<li>Transactions</li>
<li>Call details</li>
<li>Tick/trade history</li>
<li>Web clickstreams</li>
<li>Network event logs</li>
<li>Other machine-generated data</li>
<li>CAM bottom line
<ul>
<li>Anything human-generated should 	and will be retained in its entirety</li>
<li>Quantities of machine-generated 	data retained should and will grow roughly in line w/ computing cost 	reductions (Moore&#8217;s Law, etc.)</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li>Analytic uses of Big Data
<ul>
<li>Analytics is mainly about three 	things
<ul>
<li>Problem detection</li>
<li>Customer relationship improvement
<ul>
<li>(Those overlap when the customer 	relationship is bad)</li>
</ul>
</li>
<li>Financial statements on steroids</li>
</ul>
</li>
</ul>
<ul>
<li>Main kinds of analytics
<ul>
<li>What BI vendors traditionally sell
<ul>
<li>General reporting and dashboards</li>
<li>Ad-hoc query (now driven from 	those reports and dashboards)</li>
<li>Planning (allegedly integrated 	with BI)</li>
</ul>
</li>
<li>Research
<ul>
<li>Ad hoc relational query (worth 	mentioning twice because it drives so much of the market)</li>
<li>Data mining</li>
<li>Most web search and web mining</li>
</ul>
</li>
<li>Operational/near-real-time</li>
<li>Archiving/compliance</li>
</ul>
</li>
<li>What gets Big?
<ul>
<li>Mainly research and archiving</li>
<li>But when reporting or operational 	get Big, you have really interesting computing problems</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Technology issues and trends</strong></p>
<ul>
<li>Moore&#8217;s Law
<ul>
<li>CPUs &#8212; All about cores, hence 	parallelism is key</li>
<li>RAM</li>
<li>SSDs – hence replace disks</li>
<li>Sensors – hence generate lots 	more data</li>
</ul>
</li>
<li>Kryder&#8217;s Law
<ul>
<li>But <a href="http://www.dbms2.com/2005/11/13/breaking-the-disk-speed-barrier/" >rotational speeds up only 	12.5X since Eisenhower Administration</a></li>
<li>Hence solid-state memory (or RAM) 	will soon take over</li>
</ul>
</li>
<li>In the meantime, I/O bottlenecks have had to be beaten
<ul>
<li>Hence sequential scans</li>
<li>Hence <a href="http://www.dbms2.com/2007/03/26/index-light-mpp-data-warehouse-appliances/" >index-light</a> architectures</li>
<li>Hence columnar</li>
</ul>
</li>
<li>DBMS “overhead”
<ul>
<li>Raw license and maintenance fees – 	software increasing fraction of total</li>
<li>OLTP vestiges – locking and all 	that</li>
<li>DBAs
<ul>
<li>People costs = huge fraction of 	total</li>
<li>Index-lightness addresses</li>
<li>So does appliance</li>
</ul>
</li>
<li>Many people don&#8217;t really know how to 	write SQL</li>
</ul>
</li>
<li>Configuration
<ul>
<li>Appliance/tightly-balanced
<ul>
<li>Netezza</li>
<li>Teradata earlier</li>
<li>Greenplum/Sun</li>
<li>Oracle</li>
<li>IBM</li>
<li>Microsoft/Madison</li>
</ul>
</li>
<li>Commodity/do what you want
<ul>
<li>Vertica</li>
<li>Greenplum now</li>
<li>Infobright, Aster and others</li>
<li>MapReduce-oriented file systems</li>
</ul>
</li>
<li><a href="http://www.dbms2.com/2009/10/25/data-warehouse-balanced-hardware-configuration/" >Extreme rigidity is silly</a>
<ul>
<li><a href="http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/" >Teradata, Oracle have both 	signaled moving to more modularity</a></li>
<li>Big driver of that = heterogeneous 	storage
<ul>
<li>Cheap disk</li>
<li>Expensive disk</li>
<li>Solid-state</li>
<li>RAM</li>
</ul>
</li>
</ul>
<ul>
<li>CPU/storage ratio is even more of a 	driver</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Theoretically defensible ways to segment the market</strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >Latency requirements</a>
<ul>
<li>High availability and low latency 	go together</li>
</ul>
</li>
<li>Query types
<ul>
<li>Simultaneous users for same</li>
</ul>
</li>
<li>Database size</li>
<li>Budget</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Actual segments right now</strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/08/24/teradatas-active-enterprise-data-warehouse-story/" >Utter ADW/EDW</a></li>
<li>Data mart
<ul>
<li>Size</li>
<li>Naturally columnar vs. naturally 	row-based</li>
</ul>
</li>
<li>Operational/frontline</li>
<li>Less dramatic/smaller EDW</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>This week at the Teradata Partners user conference</title>
		<link>http://www.dbms2.com/2009/10/19/teradata-partners-2009/</link>
		<comments>http://www.dbms2.com/2009/10/19/teradata-partners-2009/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 13:07:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[GIS and geospatial]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1150</guid>
		<description><![CDATA[Teradata tells me that its press embargoes are ending at 9:00 this morning. Here are some highlights of what&#8217;s going on, although names, dates, and details will have to await conversations and press releases this week.

Teradata is productizing 	“private cloud,” under names including “Teradata 	Enterprise Analytics Cloud,” “Teradata Agile Analytics Cloud,” 	and “Teradata Elastic Mart [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Teradata tells me that its press embargoes are ending at 9:00 this morning. Here are some highlights of what&#8217;s going on, although names, dates, and details will have to await conversations and press releases this week.</p>
<ul>
<li><strong>Teradata is productizing 	“private cloud,”</strong> under names including “Teradata 	Enterprise Analytics Cloud,” “Teradata Agile Analytics Cloud,” 	and “Teradata Elastic Mart Builder.” I.e., Teradata hopes to 	leapfrog Greenplum in its “<a href="../2009/06/08/the-future-of-data-marts/">Enterprise 	Data Cloud</a>” strategy. This is only fair, in that Greenplum 	lifted the idea from Teradata and eBay in the first place. It also 	provides major support for what I think is an extremely sensible 	trend. Give or take issues of who announces and ships what a couple 	months before or after a competitor, my early thinking is that the 	main differences between Greenplum and Teradata in this regard will 	be:
<ul>
<li>Virtual as opposed to just 	physical data marts, based on robust workload management software. 	(Advantage: Teradata)</li>
<li>Pricing, deployment options. 	(Advantage: Greenplum)</li>
<li>Features that don&#8217;t directly 	relate to enterprise/private cloud. (Advantage: Either, often 	Teradata.)</li>
</ul>
</li>
<li><strong>Teradata is generally 	strengthening its data movement technology</strong>, e.g. for making 	various appliances work in sync. I&#8217;m not too clear yet on the 	details of that. I think this is what Teradata&#8217;s phrase “ecosystem 	management” refers to.</li>
<li><strong>Teradata is (pre-)announcing – 	at least as a statement of direction &#8212; an appliance based on 	solid-state drives (SSDs). </strong>I&#8217;ve thought for a while that 	Teradata was a leader in thinking through <a href="../2008/10/23/teradata-solid-state-drives-ssd/">the 	issues around solid-state memory in data warehousing</a>, so it 	makes sense that they&#8217;re among the leaders in actually coming to 	market as well. I plan to say more after meeting with, e.g., Carson 	Schmidt.</li>
<li><strong>Teradata has achieved a 300%ish 	speed-up in geospatial processing</strong>. I gather this is largely a 	byproduct of the parallel analytics work Teradata did around 	strengthening its SAS integration. However, there don&#8217;t seem to be a 	lot of Teradata geospatial users yet.</li>
<li><span>Teradata 	Express, </span><strong>Teradata&#8217;s free Windows-based crippleware, is being 	ported to Amazon EC2 and VMware</strong> as well. Presumably to avoid 	cannibalizing Teradata product sales, there are quite a few 	limitations on Teradata Express, including system capacity, database 	size, and “no production use.”</li>
<li><strong>Teradata continues to extend its optimizations to handle queries issued by business intelligence tools. </strong><span>Previously, the focus of what Teradata discussed in this regard was <a href="../2009/08/02/teradata-13-focuses-on-advanced-analytic-performance/">query rewrite</a>. But soon automatic recommendation and creation of Aggregate Join Indexes – i.e., materialized views – will be included as well.</span></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/19/teradata-partners-2009/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Thinking about analytic speed</title>
		<link>http://www.dbms2.com/2009/09/10/analytic-speed-latency/</link>
		<comments>http://www.dbms2.com/2009/09/10/analytic-speed-latency/#comments</comments>
		<pubDate>Thu, 10 Sep 2009 18:01:33 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Presentations]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=880</guid>
		<description><![CDATA[For a variety of reasons, I don&#8217;t plan to post my complete Enzee Universe keynote slide deck soon, if ever.  But perhaps one or more of its subjects are worth spinning out in their own blog posts.
I&#8217;m going to start with analytic speed or, equivalently, analytic latency. There is, obviously, a huge industry emphasis [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">For a variety of reasons, I don&#8217;t plan to post my complete <a href="http://www.enzeeuniverse.com/" onclick="javascript:pageTracker._trackPageview('/www.enzeeuniverse.com');">Enzee Universe keynote</a> slide deck soon, if ever.  But perhaps one or more of its subjects are worth spinning out in their own blog posts.</p>
<p style="margin-bottom: 0in;">I&#8217;m going to start with <em>analytic speed</em> or, equivalently, <em>analytic latency.</em> There is, obviously, a huge industry emphasis on speed.  Indeed, there&#8217;s so much emphasis that confusion often ensues. My goal in this post is not really to resolve the confusion; that would be ambitious to the max. But I&#8217;m at least trying to call attention to it, so that we can all be more careful in our discussions going forward, and perhaps contribute to a framework for those discussions as well.</p>
<p style="margin-bottom: 0in;">Key points include:</p>
<p style="margin-bottom: 0in;"><span>1.  There are </span><strong>two important senses of &#8220;latency&#8221; in analytics.</strong> One is just query response time. The other is the length of the interval between when data is captured and when it is available for analytic purposes. They&#8217;re often conflated &#8212; and indeed I shall do so for the remainder of this post.</p>
<p style="margin-bottom: 0in;">2.  There are <strong>many different kinds of analytic speed,</strong> which to a large extent can be viewed separately. Major areas include:</p>
<ul>
<li><strong>Data exploration.</strong> <a href="../2009/04/01/business-intelligence-notes-and-trends/">In-memory OLAP is a huge trend</a>, and <a href="../2008/08/04/qliktech-qlikview-update/">QlikView</a> is a hot BI product line.</li>
<li><strong>Budgeting/planning.</strong> <a href="../2009/02/07/analytics-role-in-a-frightening-economy/">In an unprecedentedly frightening economy, annual planning/forecasting cycles may well be too slow</a>.</li>
<li><strong>Operational integration.</strong> This is probably the biggest current area of mission-critical IT advancement.  Not coincidentally, it is also the mainstay of <a href="../2009/08/24/teradatas-active-enterprise-data-warehouse-story/">the most expensive and complex data warehousing technologies</a>. It&#8217;s also <a href="../2007/08/12/applications-for-not-so-low-latency-cep/">an ongoing area of application for event/stream processing, aka CEP</a>.</li>
<li><strong>General or deep analytics.</strong> This is what I seem to spend much of my time writing about &#8212; <a href="../2009/07/30/the-netezza-price-point/">data warehousing price/performance</a>, <a href="../2009/09/03/sas-on-netezza-and-other-netezza-extensibility/">parallelized data mining</a>, and much more.</li>
<li><strong>Data administration.</strong> <a href="../2009/06/08/the-future-of-data-marts/">Ease of data mart spin-out and administration</a> is becoming a major concern. And of course analytic appliance and DBMS vendors have been telling ease-of-deployment, low-DBA-involvement kinds of stories at least since Netezza first came to market.</li>
</ul>
<p style="margin-bottom: 0in;">There certainly are relationships among those; e.g., a really great analytic DBMS could help speed up any and all of the last three categories. But when assessing your needs, you can go quite far viewing each of those areas separately.</p>
<p style="margin-bottom: 0in;"><span>3.  It is indeed important to </span><strong>carefully assess your need-for-speed. </strong><span>Acceptable levels of analytic latency vary widely, ranging from sub-millisecond to multi-month</span>. <span id="more-880"></span><span>For example, I&#8217;ve put together a list:</span></p>
<ul>
<li><strong>Algorithmic trading – </strong><em><strong>Sub-millisecond.</strong></em> Increasingly, that&#8217;s what&#8217;s needed, at least for query response.</li>
<li><strong>Web page – </strong><em><strong>Tenths of seconds.</strong></em> If you want to get up a complex web page in 2 seconds or less, you may require sub-second response time for your queries. (E.g., this is a key message from Teradata&#8217;s customer success story at Travelocity.)</li>
<li><strong>Call center – </strong><em><strong>Seconds.</strong></em> If two humans are talking to each other on the phone, a couple-second delay in response is probably acceptable.</li>
<li><strong>Transportation – </strong><em><strong>Tens of minutes.</strong></em> If a commercial flight is delayed, reaction to minimize the consequences often needs to be sub-hour. The same can be true for cargo transportation (truck, rail, or air). In other cases, a couple of hours may be fast enough.</li>
<li><strong>Inventory – </strong><em><strong>Hours.</strong></em> In the 1980s, the retailers that won were the ones who reordered hot seasonal merchandise a couple of days before their competitors. Even then, 7-Eleven Japan was making restocking decisions several times a day. Things have only gotten faster since.</li>
<li><strong>Planning – </strong><em><strong>Weeks or more.</strong></em> Planning is often done on an annual or even multi-year cycle. That may be excessively slow. But weeks or months? In many cases, that&#8217;s both the best achievable and plenty good enough.</li>
</ul>
<p>That&#8217;s a range of <strong>at least 9 orders of magnitude,</strong> which is a lot like the difference between <a href="http://hypertextbook.com/facts/1999/RachelShweky.shtml" onclick="javascript:pageTracker._trackPageview('/hypertextbook.com');">the speed of a turtle</a> and the speed of light.</p>
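<p style="margin-bottom: 0in;">For anyone who wants to check the arithmetic behind that range, here is a minimal sketch in Python. The specific latency values are illustrative assumptions picked to sit inside each category above (one millisecond as the upper bound of &#8220;sub-millisecond,&#8221; about one month for planning); they are not measurements.</p>

```python
import math

# Representative latencies in seconds. These numbers are illustrative
# assumptions chosen to fall inside each category; they are not measurements.
LATENCY_SECONDS = {
    "algorithmic trading": 1e-3,      # upper bound of "sub-millisecond"
    "web page": 0.2,                  # tenths of seconds
    "call center": 2.0,               # seconds
    "transportation": 30 * 60,        # tens of minutes
    "inventory": 4 * 3600,            # hours
    "planning": 30 * 24 * 3600,       # about one month
}


def orders_of_magnitude(fastest, slowest):
    """Return log10 of the ratio between the slowest and fastest latency."""
    return math.log10(slowest / fastest)


span = orders_of_magnitude(
    min(LATENCY_SECONDS.values()),
    max(LATENCY_SECONDS.values()),
)
print(f"{span:.1f} orders of magnitude")  # prints "9.4 orders of magnitude"
```

<p style="margin-bottom: 0in;">Under these assumptions the spread comes out a bit above 9 orders of magnitude. Pushing trading latency below a millisecond, or stretching planning to multiple months, only widens it.</p>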
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/10/analytic-speed-latency/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
