<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS2 -- DataBase Management System Services &#187; Software as a Service (SaaS)</title>
	<atom:link href="http://www.dbms2.com/category/software-as-a-service-database-saas/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 18 Mar 2010 05:19:19 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Open issues in database and analytic technology</title>
		<link>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/</link>
		<comments>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 22:04:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1507</guid>
		<description><![CDATA[The last part of my New England Database Summit talk was on open issues in database and analytic technology. This was closely intertwined with the previous section, and also relied on a lot that I&#8217;ve posted here. So I&#8217;ll just put up a few notes on that part, with lots of linkage to prior discussion [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">The last part of my <a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >New England Database Summit</a> talk was on open issues in database and analytic technology. This was closely intertwined with the <a href="http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/" >previous section</a>, and also relied on a lot that I&#8217;ve posted here. So I&#8217;ll just put up a few notes on that part, with lots of linkage to prior discussion of the same points.<span id="more-1507"></span></p>
<ul>
<li>The most important issue in 	database and analytic technology, in my opinion, isn&#8217;t technological 	at all – rather, it&#8217;s the legal and political steps needed to <a href="http://www.dbms2.com/2010/01/31/data-based-snooping-threat-libert/" > preserve liberty</a> in the face of advancing, intrusive 	technology.</li>
<li>Another important issue for 	society – and this one does involve a lot of technology – is 	scientific number crunching. In particular, <a href="http://www.dbms2.com/2009/10/03/issues-in-scientific-data-management/" >database technology for 	scientific computing</a> needs to be developed much further. I&#8217;ll have 	more to say on all this soon.</li>
<li>More generally, technology needs 	to keep advancing for parallel analytics. Fortunately, it is. Watch 	this space over the next few weeks.</li>
<li>Oracle has said, in effect, that <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" > its most important technological challenge of the decade</a> is getting 	<a href="http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/" >solid-state memory</a> right. I agree.</li>
<li>Data volumes will keep going up, 	up, up. Technology needs to keep evolving accordingly. Much of what 	I write is on that subject.</li>
<li>Data needs to be processed and analyzed at <a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >very 	different latencies</a>. And there&#8217;s much further to go in integrating 	disparate latencies.</li>
<li>Analytic database management in 	the cloud hasn&#8217;t been solved yet, especially for Big Data. Among the 	reasons are the difficulty of moving data into the cloud (unless it 	originated there), the slowness of moving it from node to node in 	shared-nothing architectures (which reduces the elasticity benefit), 	and above all the long and unpredictable latencies of interprocessor 	communication while queries are running (a key subject of discussion 	at the <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >Boston Big Data Summit</a>).</li>
<li>Better business intelligence user interfaces are increasingly available. I&#8217;m thinking particularly of approaches with buzzwords like <a href="http://www.dbms2.com/2008/08/04/qliktech-qlikview-update/" >visualization/interactive exploration</a> or <a href="http://www.texttechnologies.com/2007/08/03/the-case-for-inxight-awareness-server/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">faceted</a>. But they aren&#8217;t well integrated into the overall analytic stack, as big BI vendors are trailing the smaller ones in this regard. (Part of the problem relates to my previous point.)</li>
<li>Application development over text 	search isn&#8217;t in the same league as application development over 	relational DBMS. The choices are mainly XML (e.g., <a href="http://www.texttechnologies.com/2008/04/29/mark-logic-viewed-as-a-different-kind-of-text-search-technology-vendor/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">MarkLogic</a>), SQL 	for text integrated into RDBMS (limited by the weakness of those 	integrations), and something like <a href="http://www.texttechnologies.com/2008/09/20/attivio-update/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">Attivio&#8217;s Java SDK</a>. There&#8217;s a 	major conceptual barrier in building those apps, namely the 	unpredictability of query results. Still, it should be possible to 	do better.</li>
<li>Similarly, text analytics and conventional analytics coexist well. They can even be in the same database and/or dashboard, although in practice that is limited by the strong <a href="http://www.texttechnologies.com/2008/10/24/attensity-update-2/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">SaaS focus of text mining vendors and users</a>. But analytic integration of the two is really hard. Linguistic imprecision is, in my opinion, only the #2 reason for this difficulty. The #1 reason is that trends detected by text analytics are much less precise than trends on tabular data – e.g., a 50% increase in a certain kind of complaint may be no more significant than a 5% change in a revenue variable.</li>
<li>I&#8217;m increasingly persuaded that <a href="http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/" > graph analytics</a> can be handled without a graph-centric data model. 	But right now, it isn&#8217;t being handled well at all. Lots more needs 	to be done – although when it is, it will just exacerbate the 	privacy/liberty dangers that so concern me.</li>
</ul>
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Flash, other solid-state memory, and disk" href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, other solid-state memory, and disk</a></li>
<li><a title="Interesting trends in database and analytic technology" href="../2010/01/31/trends-database-aanalytic-technology/">Interesting trends in database and analytic technology</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>More miscellany</title>
		<link>http://www.dbms2.com/2009/12/30/more-miscellany/</link>
		<comments>http://www.dbms2.com/2009/12/30/more-miscellany/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 11:38:22 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clearpace]]></category>
		<category><![CDATA[Continuent]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1370</guid>
		<description><![CDATA[Adding to yesterday&#8217;s varied quick comments:
Robert Hodges of Continuent offers a great outline of Continuent&#8217;s clustering story, with a lot of &#8220;Now we got right what we previously didn&#8217;t know/admit we got wrong.&#8221; Continuent now claims to have a strong clustering offering, both paid and free/open-source, for both MySQL and PostgreSQL, with Oracle support perhaps [...]]]></description>
			<content:encoded><![CDATA[<p>Adding to <a href="http://www.dbms2.com/2009/12/29/this-and-that/" >yesterday&#8217;s varied quick comments</a>:<span id="more-1370"></span></p>
<p><a href="http://www.dbms2.com/2009/09/03/continuent-on-clustering/" >Robert Hodges</a> of <strong>Continuent</strong> offers <a href="http://scale-out-blog.blogspot.com/2009/12/proving-masterslave-clusters-work-and.html" onclick="javascript:pageTracker._trackPageview('/scale-out-blog.blogspot.com');">a great outline of Continuent&#8217;s clustering story</a>, with a lot of &#8220;Now we got right what we previously didn&#8217;t know/admit we got wrong.&#8221; Continuent now claims to have a strong <strong>clustering</strong> offering, both paid and free/open-source, for both MySQL and PostgreSQL, with Oracle support perhaps coming really soon.</p>
<p>Merv Adrian, who has <a href="http://www.dbms2.com/2009/06/22/the-tpc-h-benchmark-is-a-blight-upon-the-industry/" >overrated the importance of TPC benchmarks</a> in the past, seems to have become more <a href="http://mervadrian.wordpress.com/2009/12/23/additional-caveats-obscure-oracles-tpc-benchmark/" onclick="javascript:pageTracker._trackPageview('/mervadrian.wordpress.com');">skeptical</a>.</p>
<p>Interim CEO <a href="http://www.infobright.com/Blog/ceo_blog" onclick="javascript:pageTracker._trackPageview('/www.infobright.com');">Mark Burton</a> laid out<strong> Infobright&#8217;s focus</strong> pretty clearly when he took over:</p>
<blockquote><p><span style="letter-spacing: 0px;"> &#8230; the focus must be in building products that fit market segments where ease-of-use and easily attainable performance are valued.  This doesn’t sound like the high end of Data Warehousing to me where highly complex MPP architectures and teams of DBAs spend their time.  It sounds like the realm of Departmental IT and SMB where business leaders are in a hurry to gain access to data and answers without the lead time and pain of complex architectures and high costs.</span></p></blockquote>
<p><span style="letter-spacing: 0px;">I&#8217;m hearing about a <strong>SaaS focus</strong> from a lot of companies. The Continuent link above mentions one. So does <a href="http://www.rainstor.com/news-blog/news/users-demand-saas-data-escrow-services" onclick="javascript:pageTracker._trackPageview('/www.rainstor.com');">RainStor&#8217;s latest blog post</a>. <a href="http://www.dbms2.com/2009/12/27/introduction-to-gooddata/" >Gooddata</a>, a SaaS vendor itself, seems focused on analyzing data that was originally created via SaaS. I haven&#8217;t talked with Cast Iron or Pervasive for a while, but when I did, their ETL market targeting was <a href="http://www.dbms2.com/2008/03/21/cast-iron-systems-focuses-on-saas-data-integration/" >all about SaaS</a>. And of course, I hear dumber SaaS-focus ideas as well. I think the biggest substantive reason for this trend is this: if you don&#8217;t have the broadest feature set, and fear large enterprises therefore won&#8217;t want your stuff, going after SMBs makes sense. And SMBs are presumed to be going SaaS. Also in the mix, of course, are a single platform to support, a small number of large SaaS vendors to sell to or partner with, and/or general trendiness.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/30/more-miscellany/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Introduction to Gooddata</title>
		<link>http://www.dbms2.com/2009/12/27/introduction-to-gooddata/</link>
		<comments>http://www.dbms2.com/2009/12/27/introduction-to-gooddata/#comments</comments>
		<pubDate>Mon, 28 Dec 2009 03:16:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Gooddata]]></category>
		<category><![CDATA[Jaspersoft]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1341</guid>
		<description><![CDATA[Around the end of the Cold War, Esther Dyson took it upon herself to go repeatedly to Eastern Europe and do a lot of rah-rah and catalysis, hoping to spark software and other computer entrepreneurs. I don&#8217;t know how many people&#8217;s lives she significantly affected – I&#8217;d guess it&#8217;s actually quite a few – but [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Around the end of the Cold War, Esther Dyson took it upon herself to go repeatedly to Eastern Europe and do a lot of rah-rah and catalysis, hoping to spark software and other computer entrepreneurs. I don&#8217;t know how many people&#8217;s lives she significantly affected – I&#8217;d guess it&#8217;s actually quite a few – but in any case the number is not zero. Roman Stanek, who has built and sold a couple of software businesses, cites her as a key influence setting him on his path.</p>
<p style="margin-bottom: 0in;">Roman&#8217;s latest venture is business intelligence firm Gooddata. Gooddata was founded in 2007 and has been soliciting and getting attention for a while, so I was surprised to learn that Gooddata officially launched just a few weeks ago. Anyhow, some less technical highlights of the Gooddata story include:<span id="more-1341"></span></p>
<ul>
<li>Gooddata believes it makes BI easy 	to adopt, unlike every other BI vendor on the planet &#8212; not 	excluding the many other BI vendors who say the same thing about 	themselves. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
<li>Gooddata is entirely cloud-based, 	specifically in the Amazon cloud.  I.e., Gooddata is selling 	SaaS-based BI.</li>
<li>Gooddata wants to sell to 	enterprises that are large enough to have more than a couple of BI 	users, and small enough not to be well served by the BI market 	leaders.
<ul>
<li>In revenue terms, this is the ever-popular $100 million &#8211; 	$1 billion market.</li>
<li>Specifically, Gooddata believes 	that those enterprises may have decent “back office” BI, but 	don&#8217;t have much in the front office. Gooddata wants to provide them 	with front office BI, which seems to basically mean CRM analytics. 	Gooddata sees this as a market in which QlikTech is the major 	player.  Generally, Gooddata wants to emulate and go after QlikTech.</li>
<li>Even more specifically, Gooddata 	wants to sell to Salesforce.com customers, who it believes are not 	well-served by what passes for built-in analytics at Salesforce. 	Partnering with NetSuite didn&#8217;t work as well, since NetSuite&#8217;s 	customers turn out to be smaller firms than are in Gooddata&#8217;s target 	market.</li>
</ul>
</li>
<li>Something I heard from both 	Jaspersoft and Gooddata is that there&#8217;s a hot market in providing 	cloud-based BI to online gaming companies. I gather these are mainly 	games running on mass communication platforms such as Facebook or 	the iPhone. Surely not coincidentally, it seems likely that:
<ul>
<li>These are small companies whose 	success – and hence data intake – can suddenly explode.</li>
<li>The data originates in cyberspace, 	with no particular need ever to come to the game companies&#8217; own 	premises.</li>
</ul>
</li>
<li>Gooddata has 50 production 	customers.</li>
<li>Gooddata had 2500 “projects” 	at the end of beta in June, and is adding 100 more per month. (Those 	numbers look weird together.) A “project” is a lot like a 	database, with associated reports, security privileges, etc.</li>
<li>Gooddata has close to 40 people, 	mainly in development.</li>
<li>I didn&#8217;t detect much of a sales 	strategy, nor much of a marketing strategy beyond the impressive 	early buzz generation. Perhaps that&#8217;s a partial explanation as to 	why the rate of Gooddata adoption fell even before the company 	officially launched.</li>
<li>I forgot to ask what those 50 	customers were actually paying, but considering Gooddata&#8217;s price 	list, it appears a typical price range for Gooddata&#8217;s stuff would be 	$500-$2,000/month.</li>
</ul>
<p style="margin-bottom: 0in;">Gooddata technical highlights include:</p>
<ul>
<li>Gooddata is building an 	entire BI stack – reporting, dashboards, ETL, in-memory database 	management, everything. I doubt Gooddata would claim that the pieces 	are best-of-breed in many ways other than BI ease of adoption and 	use.</li>
<li>So far I&#8217;ve seen three Gooddata 	ease-of-use features or feature groups that strike me as 	differentiated – <strong>reusability</strong> (of metrics and/or reports), 	<strong>collaboration,</strong> and <strong>tag clouds.</strong> More on those below. 	Gooddata is also building toward an <strong>agility</strong> pitch, but those 	features aren&#8217;t all baked yet.</li>
<li>Gooddata is MySQL-based today, but 	plans to move to a memory-centric compressed column store in 2010. 	Roman doesn&#8217;t reject analogies to SAP&#8217;s <em>BI/BW/whatever 	Accelerator. </em><span style="font-style: normal;">Yes, folks – 	Gooddata is yet another BI vendor doing some form of memory-centric 	OLAP. That&#8217;s a big trend.</span></li>
<li>I&#8217;m guessing 	that a big reason Gooddata is reinventing so many technical wheels 	is to ensure that the Gooddata stack is seamlessly multi-tenant from 	top to bottom. (Hasso Plattner of SAP&#8217;s <a href="../2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/">comments 	on a similar idea</a> suggest a similar emphasis.)</li>
<li>Gooddata has its own multidimensional query language called MAQL (the A doesn&#8217;t seem to stand for anything). Today MAQL generates SQL for MySQL. The future columnar memory-centric data store will, I think, understand MAQL natively.</li>
</ul>
<p style="margin-bottom: 0in;">Now we get to the good stuff. When I wrote about <a href="../2009/05/30/reinventing-business-intelligence/">reinventing business intelligence</a> back in May, I focused on some interesting developments I see as actually underway &#8212; at least on an experimental basis and/or from small vendors – namely:</p>
<ul>
<li><strong>Text-search interfaces. </strong>Well, 	while I didn&#8217;t see true text search in the Gooddata demo, I did see 	tag clouds, which have some of the same benefits.</li>
<li><strong>Collaboration tools.</strong> Well, 	Gooddata has a nice-looking approach to BI collaboration, heavily 	reflected in its UI metaphors. (That said, I haven&#8217;t really compared 	Gooddata to Microsoft SharePoint or SAP&#8217;s Portal/Rooms/whatever.)</li>
<li><strong>Memory-centric analytics</strong> (for speed of exploration). As noted above, Gooddata has that coming 	soon.</li>
<li><strong>Data exploration that tries to ignore fixed relational schemas,</strong> à la Attivio or Splunk.  Roman says Gooddata is interested in or working on that, but offers no timetable.</li>
</ul>
<p style="margin-bottom: 0in;">Meanwhile, something I&#8217;ve been seeking for years, but haven&#8217;t seen much progress on since enhancement stopped on Cognos Metrics Manager, is more <a href="../2007/11/13/the-key-problem-with-dashboard-functionality/">user-friendly metrics management</a>.  Well, it doesn&#8217;t have a lot of bells and whistles, but at least Gooddata has the basics – a list of already-defined metrics, and a reasonable way of compounding them into other metrics. I think that kind of thing will be a major BI feature going forward, to the point that a few years from now we&#8217;ll be worrying about how to port them from one BI vendor&#8217;s tool to another.</p>
<p style="margin-bottom: 0in;"><strong>Bottom line: If you&#8217;re interested in BI, you should look at a Gooddata demo.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/27/introduction-to-gooddata/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Boston Big Data Summit keynote outline</title>
		<link>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/</link>
		<comments>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 06:25:50 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Humor]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1227</guid>
		<description><![CDATA[Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I&#8217;m posting them below.

The top two points [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Last month, Bob Zurek asked me to give a talk on <a href="http://www.dbms2.com/2009/10/09/presentations-upcoming/" >“Big Data”, where “big” is anything from a few terabytes on up</a>, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I&#8217;m posting them below.</p>
<p><span id="more-1227"></span></p>
<p style="margin-bottom: 0in;">The top two points from Q&amp;A probably were:</p>
<ul>
<li><strong>Big Data and the cloud actually 	have relatively little to do with each other,</strong> <a href="http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/" >a few exceptions</a> notwithstanding, especially if the data is in a shared-nothing DBMS 	(as opposed to, say, a MapReduce-oriented file cluster). Two 	principal reasons are:
<ul>
<li>Redistributing data from node to 	node is a little slow, undermining some of the elasticity benefits 	of the cloud.</li>
<li><a href="http://www.dbms2.com/2009/05/29/sneakernet-to-the-cloud/" >Getting data into the cloud in the 	first place is a lot slow</a>.</li>
</ul>
</li>
<li><strong>The NoSQL movement is a lot like 	the Ron Paul campaign</strong> &#8212; it consists of people who are dissatisfied 	with the status quo, whose dissatisfaction has a lot to do with 	insufficient liberty and/or excessive expenditure, and who otherwise 	don&#8217;t have a whole lot in common with each other.</li>
</ul>
<p style="margin-bottom: 0in;">Anyhow, here are my notes for the talk, edited in just a couple of places for readability or linkage.</p>
<p style="margin-bottom: 0in;"><strong>Quick introduction</strong></p>
<ul>
<li>Big Data vs. cloud</li>
<li>How big is Big Data?</li>
<li>At the low end of that range, 	there&#8217;s little you can&#8217;t do with conventional technology if you 	have:
<ul>
<li>An unlimited budget for hardware</li>
<li>An unlimited budget for software</li>
<li>An unlimited budget for people, 	especially Oracle DBAs</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Big Data in OLTP</strong></p>
<ul>
<li>Hard-core OLTP
<ul>
<li>Focus of DBMS technology for a long time</li>
<li>Big budgets because each 	transaction has significant value</li>
<li>Tough to get users to change 	technologies</li>
</ul>
</li>
<li>Lighter-weight OLTP
<ul>
<li>Classic example = web companies
<ul>
<li>Big ones &#8212;  retail-oriented ones 	(eBay, Amazon) partially excepted &#8212; <a href="http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/" >rolled their own technology 	stacks</a></li>
<li>Reluctant to give money to anybody
<ul>
<li>Open source, etc.</li>
</ul>
</li>
</ul>
</li>
<li>Difficulty finding market
<ul>
<li>Product vs. feature
<ul>
<li>Clustering/HA/DR/whatever</li>
<li>Ditto cloud enablement</li>
</ul>
</li>
<li>True products haven&#8217;t found much 	traction yet</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Analytic Big Data use cases</strong></p>
<ul>
<li>Kinds of data for analytics
<ul>
<li>More of same != big</li>
<li>More detail and/or new kinds
<ul>
<li>Complete data sets</li>
<li>Transactions</li>
<li>Call details</li>
<li>Tick/trade history</li>
<li>Web clickstreams</li>
<li>Network event logs</li>
<li>Other machine-generated data</li>
<li>CAM bottom line
<ul>
<li>Anything human-generated should 	and will be retained in its entirety</li>
<li>Quantities of machine-generated 	data retained should and will grow roughly in line w/ computing cost 	reductions (Moore&#8217;s Law, etc.)</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li>Analytic uses of Big Data
<ul>
<li>Analytics is mainly about three 	things
<ul>
<li>Problem detection</li>
<li>Customer relationship improvement
<ul>
<li>(Those overlap when the customer 	relationship is bad)</li>
</ul>
</li>
<li>Financial statements on steroids</li>
</ul>
</li>
</ul>
<ul>
<li>Main kinds of analytics
<ul>
<li>What BI vendors traditionally sell
<ul>
<li>General reporting and dashboards</li>
<li>Ad-hoc query (now driven from 	those reports and dashboards)</li>
<li>Planning (allegedly integrated 	with BI)</li>
</ul>
</li>
<li>Research
<ul>
<li>Ad hoc relational query (worth 	mentioning twice because it drives so much of the market)</li>
<li>Data mining</li>
<li>Most web search and web mining</li>
</ul>
</li>
<li>Operational/near-real-time</li>
<li>Archiving/compliance</li>
</ul>
</li>
<li>What gets Big?
<ul>
<li>Mainly research and archiving</li>
<li>But when reporting or operational 	get Big, you have really interesting computing problems</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Technology issues and trends</strong></p>
<ul>
<li>Moore&#8217;s Law
<ul>
<li>CPUs &#8212; All about cores, hence 	parallelism is key</li>
<li>RAM</li>
<li>SSDs – hence replace disks</li>
<li>Sensors – hence generate lots 	more data</li>
</ul>
</li>
<li>Kryder&#8217;s Law
<ul>
<li>But <a href="http://www.dbms2.com/2005/11/13/breaking-the-disk-speed-barrier/" >rotational speeds up only 	12.5X since Eisenhower Administration</a></li>
<li>Hence solid-state memory (or RAM) 	will soon take over</li>
</ul>
</li>
<li>In the meantime, I/O bottlenecks have had to be beaten
<ul>
<li>Hence sequential scans</li>
<li>Hence <a href="http://www.dbms2.com/2007/03/26/index-light-mpp-data-warehouse-appliances/" >index-light</a> architectures</li>
<li>Hence columnar</li>
</ul>
</li>
<li>DBMS “overhead”
<ul>
<li>Raw license and maintenance fees – 	software increasing fraction of total</li>
<li>OLTP vestiges – locking and all 	that</li>
<li>DBAs
<ul>
<li>People costs = huge fraction of 	total</li>
<li>Index-lightness addresses</li>
<li>So does appliance</li>
</ul>
</li>
<li>Many people don&#8217;t really know how to 	write SQL</li>
</ul>
</li>
<li>Configuration
<ul>
<li>Appliance/tightly-balanced
<ul>
<li>Netezza</li>
<li>Teradata earlier</li>
<li>Greenplum/Sun</li>
<li>Oracle</li>
<li>IBM</li>
<li>Microsoft/Madison</li>
</ul>
</li>
<li>Commodity/do what you want
<ul>
<li>Vertica</li>
<li>Greenplum now</li>
<li>Infobright, Aster and others</li>
<li>MapReduce-oriented file systems</li>
</ul>
</li>
<li><a href="http://www.dbms2.com/2009/10/25/data-warehouse-balanced-hardware-configuration/" >Extreme rigidity is silly</a>
<ul>
<li><a href="http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/" >Teradata, Oracle have both 	signaled moving to more modularity</a></li>
<li>Big driver of that = heterogeneous 	storage
<ul>
<li>Cheap disk</li>
<li>Expensive disk</li>
<li>Solid-state</li>
<li>RAM</li>
</ul>
</li>
</ul>
<ul>
<li>CPU/storage ratio is even more of a 	driver</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Theoretically defensible ways to segment the market</strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >Latency requirements</a>
<ul>
<li>High availability and low latency 	go together</li>
</ul>
</li>
<li>Query types
<ul>
<li>Simultaneous users for same</li>
</ul>
</li>
<li>Database size</li>
<li>Budget</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Actual segments right now</strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/08/24/teradatas-active-enterprise-data-warehouse-story/" >Utter ADW/EDW</a></li>
<li>Data mart
<ul>
<li>Size</li>
<li>Naturally columnar vs. naturally 	row-based</li>
</ul>
</li>
<li>Operational/frontline</li>
<li>Less dramatic/smaller EDW</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Aster Data 4.0 and the evolution of &#8220;advanced analytic(s) servers&#8221;</title>
		<link>http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/</link>
		<comments>http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/#comments</comments>
		<pubDate>Sat, 31 Oct 2009 01:56:55 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1198</guid>
		<description><![CDATA[Since Linda and I are leaving on vacation in a few hours, Aster Data graciously gave me permission to morph its “12:01 am Monday, November 2” embargo into “late Friday night.”
Aster Data is officially announcing the 4.0 release of nCluster. There are two big pieces to this announcement:

Aster is 	offering a slick vision for integrating [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><em>Since Linda and I are leaving on vacation in a few hours, Aster Data graciously gave me permission to morph its “12:01 am Monday, November 2” embargo into “late Friday night.”</em></p>
<p style="margin-bottom: 0in; font-style: normal;">Aster Data is officially announcing the 4.0 release of nCluster. There are two big pieces to this announcement:</p>
<ul>
<li>Aster is 	offering a slick vision for integrating big-database management and 	general analytic processing on the same MPP cluster, under the 	not-so-slick name “Data-Application Server.”</li>
<li>Aster is also 	offering a sophisticated vision for workload management.</li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;">In addition, Aster has matured nCluster in various ways, for example cleaning up a performance problem with single-row updates.</p>
<p style="margin-bottom: 0in; font-style: normal;">Highlights of the Aster “Data-Application Server” story include:<span id="more-1198"></span></p>
<ul>
<li>At its core, 	the Aster “Data-Application Server” is the Aster nCluster MPP 	analytic DBMS, enhanced with basic application server functionality 	(I didn&#8217;t ask for details of that part), running on the same 	nCluster worker nodes that answer SQL queries.</li>
<li>Thus, Aster is 	eliminating a lot of the data movement that plagues three-tier 	architectures and other less-integrated approaches.</li>
<li>The Aster 	“Data-Application Server” further offers integrated workload 	management for applications and queries; more on that below.</li>
<li>The Aster 	“Data-Application Server” requires applications to be 	parallelized and invoked via Aster&#8217;s <a href="../2009/10/15/mapreduce-webinar-slides/">SQL/MapReduce.</a></li>
<li>As befits a 	MapReduce-based system, the Aster “Data-Application Server” lets 	you write your applications in lots of different languages (the 	usual suspects, and it also does .NET).</li>
<li>The Aster 	“Data-Application Server” runs applications in their own process 	spaces, protecting the DBMS server from crashes and other damaging 	behavior.</li>
<li>The Aster 	“Data-Application Server” allows applications to manage memory 	themselves, persistently, and not just via relational constructs. 	Thus, if you want your application to maintain a graph, mini rules 	engine, and/or finite state machine, you can, without doing SQL 	contortions.</li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;">In a compelling proof point for the Aster Data-Application Server&#8217;s slickness, Aster has leapfrogged Teradata and Netezza in the extent to which SAS functionality is integrated into Aster&#8217;s DBMS. (Aster and SAS both say that you can do full SAS modeling in parallel on Aster, but even so I wouldn&#8217;t be surprised to discover there were some parts of SAS&#8217; system that turned out to be exceptions.) Of course, Aster is hardly the only analytic DBMS vendor to have the idea of explicitly enhancing general analytic processing; that&#8217;s why we see lots of MapReduce announcements, and it&#8217;s also why Teradata enhanced its UDFs (User-Defined Functions) to have some kind of persistent memory.* But I don&#8217;t know of anybody else whose approach is quite so elegant and general at this time.</p>
<p style="margin-bottom: 0in;"><em>*Unfortunately, I don&#8217;t yet know much about Teradata&#8217;s UDF enhancements. I neglected to drill down on Global Persistent Memory when it was mentioned a couple of times at Teradata Partners last week, and Teradata was unable to accommodate my request this week for a rapid follow-up briefing on the subject.</em></p>
<p style="margin-bottom: 0in; font-style: normal;">Aster&#8217;s approach to workload management is similarly stylish. The idea is:</p>
<ul>
<li>Lots of 	variables are available to be taken into account (e.g., user role, 	expected query duration, actual duration of a running query, etc.)</li>
<li>SQL statements 	can be written against any of these variables.</li>
<li>The SQL 	statements serve as rules to set query/task priorities.</li>
<li>There seem to 	be a few different ways to measure priority, including explicit 	allocation of CPU or I/O resources, as well as more conventional 	“This group of queries gets higher priority than that one” 	kinds of metrics.</li>
<li>The whole 	thing provides integrated workload management for queries, 	applications, load jobs, data redistribution, and so on.</li>
</ul>
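<p style="margin-bottom: 0in; font-style: normal;">To make the "SQL statements as rules" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not Aster's actual interface, which I haven't seen in detail. The table layout, the column names (user_role, expected_secs, running_secs), the priority labels, and the first-match-wins ordering are all assumptions made up for illustration.</p>

```python
import sqlite3

# Hypothetical rules: (SQL predicate over task variables, priority to assign).
# Rules are applied in order and the first matching rule wins.
RULES = [
    ("running_secs > 600", "low"),        # runaway queries get demoted
    ("user_role = 'executive'", "high"),
    ("expected_secs < 5", "high"),        # short queries jump the queue
    ("1 = 1", "normal"),                  # default for everything else
]

# Hypothetical tasks: (id, user_role, expected_secs, running_secs).
TASKS = [
    (1, "executive", 30, 2),
    (2, "analyst", 3, 1),
    (3, "analyst", 900, 700),
    (4, "etl", 120, 10),
]


def assign_priorities(tasks, rules):
    """Return {task_id: priority} by applying SQL predicates as rules."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, user_role TEXT, "
        "expected_secs REAL, running_secs REAL, priority TEXT)"
    )
    conn.executemany(
        "INSERT INTO tasks (id, user_role, expected_secs, running_secs) "
        "VALUES (?, ?, ?, ?)",
        tasks,
    )
    for predicate, priority in rules:
        # Predicates are trusted administrator input in this sketch.
        # Only tasks that no earlier rule has classified are updated.
        conn.execute(
            f"UPDATE tasks SET priority = ? "
            f"WHERE priority IS NULL AND ({predicate})",
            (priority,),
        )
    result = dict(conn.execute("SELECT id, priority FROM tasks ORDER BY id"))
    conn.close()
    return result


if __name__ == "__main__":
    print(assign_priorities(TASKS, RULES))
```

<p style="margin-bottom: 0in; font-style: normal;">The point of the design is that adding a new policy means adding a row or a predicate, not writing new scheduler code.</p>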
<p style="margin-bottom: 0in; font-style: normal;">Right now the interface is – well, you&#8217;re manipulating a SQL table. A more conventional workload management GUI is slated for the second quarter of 2010.</p>
<p style="margin-bottom: 0in; font-style: normal;">Discussing subjects such as mirroring and ILM (Information Lifecycle Management) with Aster can be tricky, as Aster uses the word “partition” in confusing ways. Anyhow, Aster has a few different levels of compression, and the ability to apply different levels of compression to different partitions, to change compression levels via ALTER TABLE, and to alter (presumably increase) compression on the fly when doing online backup. Aster is also part of a growing trend to eschew RAID, instead doing mirroring in its own software.  (Other examples of this strategy would be <span><a href="http://www.dbms2.com/2009/10/06/oracle-and-vertica-on-compression-and-other-physical-data-layout-features/" >Vertica</a>, <a href="http://www.dbms2.com/2008/09/28/oracle-database-machine-performance-and-compression/" >Oracle Exadata/ASM</a>, and <a href="http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/" >Teradata Fallback</a>.) </span><span>Prior to nCluster 4.0, this caused a problem, in that the block sizes for mirroring were so large as to create a lag in transactional updating. But Aster says this problem is now solved, and indeed claims that nCluster 4.0 is superior to most rivals in transactional efficiency.</span></p>
<p style="margin-bottom: 0in;">And finally, while I was talking w/ Aster Data anyway, I checked up on cloud and MapReduce customer penetration. The answers were:</p>
<ul>
<li>Aster has two serious production 	cloud users, both of which have been disclosed for a while, namely:
<ul>
<li>ShareThis, which runs Aster 		nCluster on Amazon EC2</li>
<li>Didit, which runs Aster nCluster 		on AppNexus</li>
</ul>
</li>
<li>Outside of those two, Aster sees 	some cloud use for test, development, prototyping, etc.</li>
<li>Every single Aster customer uses 	<a href="../2009/10/15/mapreduce-webinar-slides/">SQL/MapReduce</a> &#8212; i.e., they invoke MapReduce via Aster nCluster SQL queries.</li>
<li>Some of those customers use MapReduce for ETL, some use it 	for actual analytics.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Teradata&#8217;s nebulous cloud strategy</title>
		<link>http://www.dbms2.com/2009/10/27/teradatas-nebulous-cloud-strategy/</link>
		<comments>http://www.dbms2.com/2009/10/27/teradatas-nebulous-cloud-strategy/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 19:41:47 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[eBay]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1180</guid>
		<description><![CDATA[As the pun goes, Teradata&#8217;s cloud strategy is – well, it&#8217;s somewhat nebulous. More precisely, for the foreseeable future, Teradata&#8217;s cloud strategy is a collection of rather disjointed parts, including:

What Teradata calls the Teradata 	 Agile Analytics Cloud, which is a combination of previously 	existing technology plus one new portlet called the Teradata 	Elastic Mart(s) [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">As the pun goes, Teradata&#8217;s cloud strategy is – well, it&#8217;s somewhat nebulous. More precisely, for the foreseeable future, Teradata&#8217;s cloud strategy is a collection of rather disjointed parts, including:</p>
<ul>
<li>What Teradata calls the <em>Teradata Agile Analytics Cloud,</em> which is a combination of previously existing technology plus one new portlet called the <em>Teradata Elastic Mart(s) Builder.</em> (Teradata&#8217;s <em>Elastic Mart(s) Builder Viewpoint</em> portlet is available for download from <a href="../2009/05/26/teradata-developer-exchange-devx-begins-to-emerge/">Teradata&#8217;s Developer Exchange</a>.)</li>
<li><em>Teradata Data Mover 2.0,</em> coming “Soon”, which will ease copying (ETL without any 	significant “T”) from one Teradata system to another.</li>
<li><em>Teradata Express</em> DBMS 	crippleware (1 terabyte only, no production use), now available on 	Amazon EC2 and VMware. (I don&#8217;t see where this has much connection to the rest of Teradata&#8217;s cloud strategy, except insofar as it serves to fill out a slide.)</li>
<li>Unannounced (and so far as I can 	tell largely undesigned) future products.</li>
</ul>
<p style="margin-bottom: 0in;">Teradata openly admits that its direction is heavily influenced by Oliver Ratzesberger at <a href="../2009/04/30/ebays-two-enormous-data-warehouses/">eBay</a>. Like Teradata, Oliver and eBay favor virtual data marts over physical ones. That is, Oliver and eBay believe that the ideal scenario is that every piece of data is only stored once, in an integrated Teradata warehouse. But eBay believes and Teradata increasingly agrees that users need a great deal of control over their use of this data, including the ability to import additional data into private sandboxes, and join it to the warehouse data already there.<span id="more-1180"></span></p>
<p style="margin-bottom: 0in;">The <em>Teradata Elastic Mart(s) Builder Viewpoint</em> portlet automates the inclusion of outside data. If you&#8217;re already an authorized Teradata data warehouse user, you can fill in a very short form (three or so fields) and add authorization to import outside data, e.g. from a .CSV file. No fuss, little bother. Trivial as that sounds, when you combine it with Teradata&#8217;s pre-existing robust workload management tools, it creates a pretty good <em>virtual data mart</em> story.</p>
<p style="margin-bottom: 0in;">Spinning out and maintaining consistency with physical data marts is a different matter. Teradata doesn&#8217;t seem too sure it believes in those. And while Teradata is obviously planning to increase its capability in that regard anyway, I didn&#8217;t get a lot of detail beyond the reference to Data Mover 2.0.</p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li>My Greenplum-inspired post on <a href="../2009/06/08/the-future-of-data-marts/">the 	future of data marts</a>, outlining issues in “private cloud” 	data warehousing.</li>
<li>eBay&#8217;s “<a href="http://www.xlmpp.com/articles/16-articles/39-analytics-as-a-service" onclick="javascript:pageTracker._trackPageview('/www.xlmpp.com');">Analytics 	as a Service</a>” pitch (about 1 ½ years old)</li>
<li><a href="http://developer.teradata.com/database/articles/what-is-the-teradata-agile-analytics-cloud" onclick="javascript:pageTracker._trackPageview('/developer.teradata.com');">A post by Teradata&#8217;s Dan Graham</a> explaining the <em>Teradata Agile Analytics Cloud</em> and <em>Elastic Mart(s) Builder Viewpoint</em> portlet</li>
<li>Home page and complete screen shot 	for the <a href="http://developer.teradata.com/download/viewpoint/elastic-marts-builder" onclick="javascript:pageTracker._trackPageview('/developer.teradata.com');"><em>Teradata 	Elastic Mart(s) Builder Viewpoint</em> portlet</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/27/teradatas-nebulous-cloud-strategy/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>I have some presentations coming up (all on October Thursdays)</title>
		<link>http://www.dbms2.com/2009/10/09/presentations-upcoming/</link>
		<comments>http://www.dbms2.com/2009/10/09/presentations-upcoming/#comments</comments>
		<pubDate>Fri, 09 Oct 2009 12:11:10 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Presentations]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1058</guid>
		<description><![CDATA[On Thursday, October 15, and two different times (10:00 am and 1:00 pm Eastern time), I&#8217;ll be giving a webinar for Aster Data on MapReduce. The content is very much work in progress, but it definitely will:

Be overviewy in nature
Emphasize SQL/MapReduce integration

Then, on the evening of Thursday, October 22, there&#8217;s something called the Boston Big [...]]]></description>
			<content:encoded><![CDATA[<p>On Thursday, October 15, and two different times (10:00 am and 1:00 pm Eastern time), I&#8217;ll be giving <a href="http://www.asterdata.com/masteringmapreduce/" onclick="javascript:pageTracker._trackPageview('/www.asterdata.com');">a webinar for Aster Data on MapReduce</a>. The content is very much work in progress, but it definitely will:</p>
<ul>
<li>Be overviewy in nature</li>
<li>Emphasize SQL/MapReduce integration</li>
</ul>
<p>Then, on the evening of Thursday, October 22, there&#8217;s something called the <a href="http://hypecycles.wordpress.com/2009/10/02/boston-big-data-summit/" onclick="javascript:pageTracker._trackPageview('/hypecycles.wordpress.com');">Boston Big Data Summit</a>, in Waltham, where &#8220;Big Data&#8221; evidently is to be construed as anything from a few terabytes on up.  (Things are smaller in the Northeast than in California &#8230;) It&#8217;s being put together by Amrith Kumar (who I don&#8217;t really know) and Bob Zurek (who everybody knows). This is the inaugural meeting. It seems I&#8217;m both giving the keynote and running the subsequent panel, one of whose participants will be Ellen Rubin. <span id="more-1058"></span></p>
<p>Rather than prepare a slide deck, I&#8217;ll just throw together some notes and talk from those, the old-fashioned way. I gather the audience will be mainly vendors.  If somebody tells me in time whether to expect more marketing or more technical folks, I&#8217;ll slant my presentation accordingly.</p>
<p>One ostensible theme for the night is cloud computing, but I don&#8217;t plan to stay focused in that area; it might be hard to spend 45 minutes on cloud computing without sounding like <a href="http://www.dbms2.com/2009/06/30/is-expressor-software-accomplishing-anything/" >the software salesman who keeps repeating how good it&#8217;s going to be</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/09/presentations-upcoming/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Hasso Plattner calls for in-memory OLTP column stores</title>
		<link>http://www.dbms2.com/2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/</link>
		<comments>http://www.dbms2.com/2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/#comments</comments>
		<pubDate>Wed, 08 Jul 2009 03:33:24 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=834</guid>
		<description><![CDATA[Former SAP CEO Hasso Plattner has written a paper called A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database, in association with a SIGMOD keynote address.* The approach Plattner advocates is an MPP in-memory column store, presumably somewhat akin to SAP&#8217;s frequently renamed Business Warehouse Accelerator/Business Intelligence Accelerator/BWA/BIA/Son-of-TREX technology. There also [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Former SAP CEO Hasso Plattner has written a paper called <a href="http://www.sigmod09.org/images/sigmod1ktp-plattner.pdf" onclick="javascript:pageTracker._trackPageview('/www.sigmod09.org');">A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database</a>, in association with a SIGMOD keynote address.* The approach Plattner advocates is an MPP in-memory column store, presumably somewhat akin to SAP&#8217;s frequently renamed <a href="../2006/09/20/saps-bi-accelerator/">Business Warehouse Accelerator/Business Intelligence Accelerator/BWA/BIA/Son-of-TREX</a> technology. There also are strong similarities to the MPP in-memory row store project <a href="http://www.dbms2.com/2008/02/18/mike-stonebraker-calls-for-the-complete-destruction-of-the-old-dbms-order/" >H-Store</a>/<a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/" >VoltDB</a>, although I don&#8217;t know whether Plattner would go so far as to adopt the H-Store view that <em>all</em> transactions should run in stored procedures. Unsurprisingly, SAP applications are used as the OLTP paradigm throughout.</p>
<p style="margin-bottom: 0in;"><em>*Thanks to <a href="http://marklogic.blogspot.com/2007/02/best-of-mark-logic-ceo-blog.html" onclick="javascript:pageTracker._trackPageview('/marklogic.blogspot.com');">Dave Kellogg</a> for tipping me off to Plattner&#8217;s paper.  I only went to two SIGMOD sessions, neither of which was Plattner&#8217;s. Nobody actually mentioned Plattner&#8217;s talk to me when I was down at SIGMOD.</em></p>
<p style="margin-bottom: 0in;">Perhaps the most interesting part is Plattner&#8217;s claim that <strong>what&#8217;s demanding about OLTP</strong> isn&#8217;t database updating <em>per se,</em> but rather <strong>maintaining aggregates</strong> for quick-response analytics. In his main example of that point, Plattner proposes a real-life &#8220;more than 18&#8221; table schema, of which 2 are base tables, and (most of?) the rest are materialized views that his proposed database architecture dispenses with (because analytic performance is sufficiently good without them).  Thus, Plattner&#8217;s core columnar argument seemingly is</p>
<p style="margin-bottom: 0in;"><strong><em>columnar &#8211;&gt; natively fast analytics &#8211;&gt; no need to maintain aggregates &#8211;&gt; much lower update burden.</em></strong></p>
<p style="margin-bottom: 0in;">That said &#8212; if Plattner&#8217;s paper contained a clear statement of how much more expensive it is to insert or update a single row in a columnar vs. row-based system, I overlooked it. Instead, Plattner seems to be arguing that the volume of base-table updates is low enough that &#8212; whatever it may be &#8212; column-store update overhead is an acceptable price to pay.  (At one point he claims that only 5% of the data inserted in a financial application ever gets changed.) That may actually be true in a financial accounting system, but seems more questionable in a sufficiently large application that gets its updates from automatic devices, or from the consumer web.</p>
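<p style="margin-bottom: 0in;">The aggregate-maintenance argument can be illustrated with a toy comparison. This is a sketch under my own assumptions, not anything taken from Plattner&#8217;s paper: the class names, the "amount" measure, and the grouping keys are all hypothetical. It counts only the extra aggregate updates triggered per insert, and deliberately ignores the per-column write cost of the columnar design, which is exactly the number the paper doesn&#8217;t seem to supply.</p>

```python
from collections import defaultdict


class AggregateMaintainingStore:
    """Keeps one materialized total per grouping key, updated on every insert."""

    def __init__(self, group_keys):
        self.rows = []
        self.aggregates = {key: defaultdict(float) for key in group_keys}
        self.aggregate_updates = 0  # extra writes caused by aggregate upkeep

    def insert(self, row):
        self.rows.append(row)
        for key, totals in self.aggregates.items():
            group = row[key]
            totals[group] += row["amount"]
            self.aggregate_updates += 1

    def total_by(self, key):
        # Answered from the precomputed aggregate; no scan needed.
        return dict(self.aggregates[key])


class ColumnScanStore:
    """Stores values column by column and computes totals by scanning."""

    def __init__(self):
        self.columns = defaultdict(list)
        self.aggregate_updates = 0  # always stays 0: nothing is maintained

    def insert(self, row):
        for name, value in row.items():
            self.columns[name].append(value)

    def total_by(self, key):
        # Scan just the two columns involved in the query.
        totals = defaultdict(float)
        for group, amount in zip(self.columns[key], self.columns["amount"]):
            totals[group] += amount
        return dict(totals)


# Hypothetical financial postings.
ROWS = [
    {"account": "A", "region": "east", "amount": 100.0},
    {"account": "B", "region": "east", "amount": 50.0},
    {"account": "A", "region": "west", "amount": 25.0},
]

maintained = AggregateMaintainingStore(["account", "region"])
scanned = ColumnScanStore()
for posting in ROWS:
    maintained.insert(posting)
    scanned.insert(posting)

if __name__ == "__main__":
    print(maintained.total_by("account"), maintained.aggregate_updates)
    print(scanned.total_by("account"), scanned.aggregate_updates)
```

<p style="margin-bottom: 0in;">Both stores return the same totals. The difference is that the first pays one aggregate update per grouping key on every insert, while the second pays at query time instead.</p>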
<p style="margin-bottom: 0in;">Other highlights include:<span id="more-834"></span></p>
<ul>
<li>Like most modern observers, 	Plattner believes <strong>Postgres-style timestamping</strong> beats 	update-in-place.</li>
<li>Plattner also offers a less common 	reason for liking timestamped inserts over updates-in-place &#8212; he 	thinks <strong>timestamps are helpful in planning-oriented applications.</strong> In 	particular, he wants timestamp-aware SQL extensions.</li>
<li>Plattner claims columnar designs 	have a 10:1 <strong>compression</strong> advantage over row stores &#8212; specifically 	20X vs. 2X &#8212; at least using compression schemes that allow for 	updating at reasonable speed.  That seems exaggerated.</li>
<li>Plattner seemed to drop various references to memory-centric structures SAP already uses. (SAP has long done a lot in-memory, in both the OLTP and planning areas.  Years ago SAP told me of a customer that was buying &gt;1 TB of RAM just to run SAP&#8217;s planning software.  SAP also bragged that &gt;99% of transactions never hit disk, in some sense of &#8220;transaction&#8221;.)</li>
<li>There are lots of references to &#8220;tenants&#8221;, SaaS, and/or SAP&#8217;s SaaS product line.  So <strong>SaaS is evidently a design point.</strong> That makes sense. First, SaaS is one of SAP&#8217;s biggest vulnerabilities. Second, the toughest customization a SaaS customer might want is to add a few columns to standard tables, which might be easier to accommodate with a columnar approach.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Sneakernet to the cloud</title>
		<link>http://www.dbms2.com/2009/05/29/sneakernet-to-the-cloud/</link>
		<comments>http://www.dbms2.com/2009/05/29/sneakernet-to-the-cloud/#comments</comments>
		<pubDate>Sat, 30 May 2009 03:06:04 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=793</guid>
		<description><![CDATA[Recently, Amazon CTO Werner Vogels put up a blog post which suggested that, now and in the future, the best way to get large databases into the cloud is via sneakernet.  In some circumstances, he is surely right. Possible implications include:

When sending data to the cloud, you probably want to compress it to the max [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, Amazon CTO Werner Vogels put up a blog post which suggested that, now and in the future, <a href="http://www.allthingsdistributed.com/2009/05/amazon_import_export.html" onclick="javascript:pageTracker._trackPageview('/www.allthingsdistributed.com');">the best way to get large databases into the cloud is via sneakernet</a>.  In some circumstances, he is surely right. Possible implications include:</p>
<ul>
<li>When sending data to the cloud, you probably want to <strong>compress</strong> it to the max before sending. <a href="http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/" >Clearpace&#8217;s</a> new <a href="http://www.rainstor.com/" onclick="javascript:pageTracker._trackPageview('/www.rainstor.com');">RainStor</a> structured-data archiving service emphasizes that idea. RainStor marketing says cloud, cloud, cloud &#8212; but Clearpace thinks you really should have a bit of its software onsite too, to compress the data before sending it across the wire.</li>
<li><strong>Getting data from one cloud to another cloud could be problematic.</strong> I&#8217;m fond of saying that weblog data naturally lives in the cloud at your hosting company&#8217;s location, so you should analyze it there too. But this makes the most sense if you analyze it or at least filter/reduce it in place.  (That said, the really, really big web companies have lots of different data centers, and presumably do move huge amounts of log data from place to place.)</li>
</ul>
<p>But for one-time moves of data sets &#8212; sure, sneakernet/snail mail should work just fine.</p>
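<p>For a rough feel of when sneakernet wins, and of how much compression changes the answer, here is a back-of-the-envelope sketch. All of the figures (10 TB, a 100 Mbps link, 3 days of shipping, 10X compression) are assumptions picked for illustration, not numbers from Vogels&#8217; post or from Clearpace.</p>

```python
SECONDS_PER_DAY = 86400


def network_transfer_days(size_tb, link_mbps, compression_ratio=1.0, utilization=1.0):
    """Days needed to push size_tb terabytes over a link_mbps link.

    Uses decimal units (1 TB = 1e12 bytes, 1 Mbps = 1e6 bits/second).
    compression_ratio=10 means the data shrinks to a tenth before sending.
    utilization is the fraction of the link you actually get to use.
    """
    bits_to_send = size_tb * 1e12 * 8 / compression_ratio
    seconds = bits_to_send / (link_mbps * 1e6 * utilization)
    return seconds / SECONDS_PER_DAY


def sneakernet_wins(size_tb, link_mbps, shipping_days, compression_ratio=1.0):
    """True if shipping physical media beats sending the data over the wire."""
    return network_transfer_days(size_tb, link_mbps, compression_ratio) > shipping_days


if __name__ == "__main__":
    # Hypothetical scenario: 10 TB over a fully available 100 Mbps link.
    print(round(network_transfer_days(10, 100), 2), "days uncompressed")
    print(round(network_transfer_days(10, 100, compression_ratio=10), 2), "days at 10X compression")
    print("ship it instead?", sneakernet_wins(10, 100, shipping_days=3))
```

<p>Under those assumptions the uncompressed transfer takes over nine days, so a three-day shipment wins. Compress 10X first and the wire wins instead.</p>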
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/05/29/sneakernet-to-the-cloud/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Amazon Elastic MapReduce</title>
		<link>http://www.dbms2.com/2009/04/03/amazon-elastic-mapreduce/</link>
		<comments>http://www.dbms2.com/2009/04/03/amazon-elastic-mapreduce/#comments</comments>
		<pubDate>Fri, 03 Apr 2009 08:57:56 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[MapReduce]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=744</guid>
		<description><![CDATA[Amazon is introducing a beta of Amazon Elastic MapReduce.  What it boils down to is cheap, on-demand Hadoop.
This seems like a great way to experiment with MapReduce and see if you like it. But for serious use, I don&#8217;t know why you wouldn&#8217;t prefer MapReduce more closely integrated into a DBMS.
]]></description>
			<content:encoded><![CDATA[<p>Amazon is introducing a beta of <a href="http://aws.amazon.com/elasticmapreduce/" onclick="javascript:pageTracker._trackPageview('/aws.amazon.com');">Amazon Elastic MapReduce</a>.  What it boils down to is cheap, on-demand Hadoop.</p>
<p>This seems like a great way to experiment with MapReduce and see if you like it. But for serious use, I don&#8217;t know why you wouldn&#8217;t prefer MapReduce <a href="http://www.dbms2.com/2008/09/05/three-different-implementations-of-mapreduce/" >more closely integrated into a DBMS</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/04/03/amazon-elastic-mapreduce/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
