<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Analytic technologies</title>
	<atom:link href="http://www.dbms2.com/category/analytics-technologies/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 02 Sep 2010 09:06:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Workday architecture &#8212; a new kind of OLTP software stack</title>
		<link>http://www.dbms2.com/2010/08/22/workday-technology-stack/</link>
		<comments>http://www.dbms2.com/2010/08/22/workday-technology-stack/#comments</comments>
		<pubDate>Sun, 22 Aug 2010 10:20:08 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Workday]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2865</guid>
		<description><![CDATA[One of my coolest company visits in some time was to SaaS (Software as a Service) vendor Workday, Inc., earlier this month. Reasons included:

Workday has forward-thinking ideas about SaaS enterprise applications and the integration of business intelligence into same.
Workday has highly innovative ideas in how it manages data.
Companies founded by Dave Duffield tend [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><span style="font-size: small;">One of my coolest company visits in some time was to </span><span style="font-size: small;"> SaaS  (Software as a Service) vendor</span><span style="font-size: small;"> Workday, Inc., earlier this month. Reasons included:</span></p>
<ul>
<li><span style="font-size: small;">Workday has 	forward-thinking ideas about SaaS enterprise 	applications and the integration of business intelligence into same.</span></li>
<li><span style="font-size: small;">Workday has highly 	innovative ideas in how it manages data.</span></li>
<li><span style="font-size: small;">Companies founded by 	Dave Duffield tend to feature smart, likeable people who talk to one</span><span style="font-size: small;"><span style="font-style: normal;"> pleasantly and forthrightly. Workday is no exception; CTO Stan Swete 	and the other Workday folks present were a delight to talk with.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">I&#8217;d 	invited Merv Adrian to come along with me. He asked great questions, 	and I could gather myself a bit despite how sleep-deprived I was for 	the first part of that trip.</span></span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Workday kindly allowed me to post this </span></span><span style="font-size: small;"><a href="http://www.monash.com/uploads/Workday-August-2010.ppt" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">Workday slide deck</a>.</span><span style="font-size: small;"><span style="font-style: normal;"> Otherwise, I&#8217;ve split out a quick </span></span><a href="http://www.dbms2.com/2010/08/22/workday-inc-company-overview/" ><span style="font-size: small;">Workday, Inc. company overview</span></a><span style="font-size: small;"><span style="font-style: normal;"> into a separate post.</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">The biggie for me was the data and object management part. Specifically:  <span id="more-2865"></span><br />
</span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><strong>Workday&#8217;s 	applications run entirely in-memory,</strong></span></span><span style="font-size: small;"><span style="font-style: normal;"> in a highly object-oriented structure. Persistence is mainly for the 	sake of data safety …</span></span></li>
<li>… <span style="font-size: small;"><span style="font-style: normal;">but not entirely. In earlier releases, Workday kept absolutely everything in RAM. Now, however, certain things are kept only on disk, such as:</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Audit 	files.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Certain 	documents (notably resumes).</span></span></li>
</ul>
</li>
<li><strong><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	whole database</span></span></strong><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;"> – data and metadata alike – is persisted to disk in </span></span></span><strong><span style="font-size: small;"><span style="font-style: normal;">&lt;10 	MySQL/InnoDB tables. </span></span></strong><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">MySQL 	is basically just being used as a </span></span></span><strong><span style="font-size: small;"><span style="font-style: normal;">key-value 	store, </span></span></strong><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">albeit 	one with </span></span></span><strong><span style="font-size: small;"><span style="font-style: normal;">ACID 	transactional support. </span></span></strong>
<ul>
<li><span style="font-size: small;">There <span style="font-weight: normal;">are </span><strong>3 main tables: attributes, relationships, instances.</strong></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">When 	I suggested this might be like an entity-attribute-value model, 	Workday said it would be even better to think in terms of</span><span style="font-style: normal;"><strong> instanceID-attribute-value.</strong></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">As 	you might expect for a database that simple, its schema doesn&#8217;t 	change much.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">By way of comparison, Workday estimates that if its software were written relationally, there would be <a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >1000s of tables</a>, which would take up 10-100X as much disk space.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">All 	write transactions are banged immediately into the MySQL database. 	I.e., RAM and disk are never allowed to get out of sync.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	database is append-only. This is exploited for effective dating 	(pretty heavily, it seems, perhaps because that&#8217;s a useful concept 	in human resources) and snapshotted reporting.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	built-in BI doesn&#8217;t have a lot of choice but to do scans, traversing 	the object model. This turns out to be fast enough.</span></span></li>
</ul>
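<p>To make the instanceID-attribute-value idea concrete, here is a minimal sketch of an append-only attribute table with effective dating. Everything specific in it is an assumption for illustration: the table layout, the column names, the <code>set_attribute</code> and <code>get_attribute</code> helpers, and the use of Python&#8217;s built-in sqlite3 in place of MySQL/InnoDB. It is not Workday&#8217;s actual schema or code.</p>
<pre>
```python
import sqlite3

# Hypothetical schema: one append-only table keyed by (instance_id, attribute).
# sqlite3 stands in for MySQL/InnoDB purely so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE attributes (
        seq            INTEGER PRIMARY KEY AUTOINCREMENT,  -- append order
        instance_id    TEXT NOT NULL,
        attribute      TEXT NOT NULL,
        value          TEXT,
        effective_from TEXT NOT NULL                       -- ISO date
    )
    """
)


def set_attribute(instance_id, attribute, value, effective_from):
    """Append a new fact; existing rows are never updated or deleted."""
    with conn:  # commits the insert as one transaction
        conn.execute(
            "INSERT INTO attributes (instance_id, attribute, value, effective_from) "
            "VALUES (?, ?, ?, ?)",
            (instance_id, attribute, value, effective_from),
        )


def get_attribute(instance_id, attribute, as_of):
    """Return the value in effect on the date as_of, or None if there is none."""
    row = conn.execute(
        "SELECT value FROM attributes "
        "WHERE instance_id = ? AND attribute = ? AND ? >= effective_from "
        "ORDER BY effective_from DESC, seq DESC LIMIT 1",
        (instance_id, attribute, as_of),
    ).fetchone()
    return row[0] if row else None


# A job title changes mid-2010; the old value is kept, not overwritten.
set_attribute("worker-1", "job_title", "Analyst", "2009-01-01")
set_attribute("worker-1", "job_title", "Manager", "2010-07-01")

title_2009 = get_attribute("worker-1", "job_title", "2009-06-30")
title_2010 = get_attribute("worker-1", "job_title", "2010-08-22")
row_count = conn.execute("SELECT COUNT(*) FROM attributes").fetchone()[0]
```
</pre>
<p>Because rows are only ever appended, the 2009 answer is still retrievable after the 2010 change, which is the property that effective dating and snapshotted reporting rely on.</p>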
<p style="margin-bottom: 0in;"><span style="font-size: small;">Other notes on Workday&#8217;s data and object management strategy include:</span></p>
<ul>
<li><span style="font-size: small;">Workday is 	object-oriented through and through – no object-relational mapping 	&#8211; <a href="http://en.wikipedia.org/wiki/Turtles_all_the_way_down" onclick="javascript:pageTracker._trackPageview('/en.wikipedia.org');">turtles 	all the way down</a>. On average, a class has about 2 attributes.</span></li>
<li><span style="font-size: small;">94% of requests are 	reads, traversing the object hierarchy.</span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	databases are pretty small.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">The 	biggest database Workday supports uses 17 gigabytes of RAM. </span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	databases are much smaller on disk than in RAM.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;">Workday&#8217;s “dream” 	is to move from disk to solid-state memory. </span></li>
<li><span style="font-size: small;">Workday uses GPLed 	MySQL/InnoDB. So there&#8217;s no software license reason to ever move 	away (e.g., to a pure key-value store).</span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Disaster recovery is based on local and remote MySQL slaves. </span></span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Obviously, serious apps have been built before in object-oriented and/or key-value ways, with the resulting objects then being banged to disk (or in some cases kept in memory). Examples include:</span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Numerous 	applications are built on <a href="../2010/01/15/intersystems-cache-highlights/">object-oriented 	DBMS</a>. Generally they go against disk, although <a href="../2005/11/14/defining-and-surveying-memory-centric-data-management/">memory-centric 	implementations can save a lot of pointer-chasing</a>. Often they&#8217;re 	queried via SQL.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Basho&#8217;s 	website says that its key-value store Riak was originally conceived 	in connection with a planned salesforce automation product, but I 	don&#8217;t think that the application part of that plan ever got built. </span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">SAP 	has <a href="../2005/12/09/36/">longstanding</a> doubts about relational dogma, although not nearly to Workday&#8217;s 	extreme.</span></span></li>
<li><span style="font-size: small;">Obviously, 	some major internet applications just bang data into key-value 	stores.</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Still, perhaps because it is wholly object-oriented yet doesn&#8217;t even bother with anything like a real object-oriented DBMS, Workday&#8217;s approach seems particularly cool. </span></span></p>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Other highlights of Workday, Inc.&#8217;s technical story include:</span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	has settled into a schedule of three releases per year, and has 	pretty much lived up to that for &gt;2 years.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Every 	user is always on the latest Workday release.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">You 	can delay turning on significant new Workday software functionality 	if you want to.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Pure 	UI changes to the Workday software are handled much as they are on 	various websites today. Sometimes you have no choice but to live 	with them; sometimes the prior version of the UI remains available 	to you for a while.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	navigational approaches look pretty cool.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">The 	core concept is a list of actions you can perform now, rather than 	more standard menus.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Roles/permissions 	are of course central to this.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Reports 	have lots of actionable links in them. (More than just drilldown, 	although specific examples have slipped my memory.)</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Alternatively, you can navigate via a search box, searching on both names of objects (e.g. users, divisions) and names of tasks. This is somewhat reminiscent of <a href="http://www.texttechnologies.com/2007/02/28/sap%E2%80%99s-%E2%80%9Csearch%E2%80%9D-strategy-isn%E2%80%99t-about-search/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">an approach SAP was considering a few years ago</a>.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;">Workday says it has 	four key design premises:</span>
<ul>
<li><span style="font-size: small;"><em>Web-Familiar Experience.</em> I&#8217;d say that&#8217;s true to the extent it makes sense. In many ways, the web needs to catch up to Workday.</span></li>
<li><span style="font-size: small;"><em>Enterprise 	Reporting.</em> The idea is that you get a report, then take actions 	based on it. Hence the report-centric options for navigation.</span></li>
<li><span style="font-size: small;"><em>Integration 	On-Demand.</em> That&#8217;s a fancy way of saying “Plays nicely with 	others.”</span></li>
<li><span style="font-size: small;"><em>Configurable 	Business Processes.</em><span style="font-style: normal;"> Duh. That&#8217;s 	pretty essential if you want to do serious SaaS applications.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday maintains a strong separation between application logic and UI development. Developers do no screen layouts. Instead, UIs are automatically generated for:</span></span>
<ul>
<li><span style="font-size: small;">Flash/FLEX</span></li>
<li><span style="font-size: small;">iPhone</span></li>
<li><span style="font-size: small;">Mobile HTML</span></li>
<li><span style="font-size: small;">PDF export</span></li>
<li><span style="font-size: small;">Excel export</span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	only talks to the outside world via web services.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	is heavily </span></span><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">into 	SOAP (Simple Object Access Protocol). </span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">The 	acquisition of OEM partner CapeClear gave Workday an Integration 	Service (i.e., enterprise service bus) that translates SOAP into 	whatever else might be needed for integration, and also does 	reliable delivery. </span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">All 	that said, Stan Swete sees integration among various SaaS offerings 	as an area needing significant future attention.</span></span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday&#8217;s 	business intelligence ideas are interesting, but I think there&#8217;s a 	long way for that technology still to go.</span></span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday&#8217;s 	BI seems to be focused on report/drilldown kinds of functionality.</span></span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">You 	can slice by up to 2 dimensions at once.</span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Then 	you can keep slicing, however, by more dimensions, as many times as 	you like.</span></span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">While 	you can take actions straight from reports, some of the specific 	BI/app integration ideas we discussed are still futures. (E.g., 	analyzing spend at the time of expense report data entry or 	approval.)</span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Of 	course, Workday&#8217;s web services interface lets you export Workday 	data into 3rd-party tools. Indeed, if you want to integrate data 	from Workday and some other source(s), that&#8217;s your only choice.</span></span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday 	offers a clever metaphor to illustrate that your data may be more 	secure offsite than on – the bank vault. (I have no idea whether 	that&#8217;s a SaaS industry standard, but I hadn&#8217;t heard it before.) Of 	course, that metaphor does beg some issues specific to the remote 	data case, such as:</span></span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">When 	your data is on premises, you know whether the government has 	insisted on looking at it.</span></span></span></li>
<li><span style="font-size: small;">More than cash, data keeps traveling back and forth to 	the remote location, which creates at least a theoretical risk of 	interception.</span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday says the toughest part of globalization is the issue of which personal data is or is not maintained. For example, in the US you&#8217;re not allowed to ask a job applicant&#8217;s religion, but in the UK you&#8217;re not only permitted but indeed required to.</span></span></span></li>
</ul>
<p><em><strong>This post is part of a three-post series</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2010/08/22/workday-inc-company-overview/" >Workday Inc. company overview</a> (brief)</li>
<li><a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" >Workday Inc. technology overview</a> (detailed)</li>
<li>Workday Inc. CTO Stan Swete&#8217;s <a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >comments on database strategy</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/22/workday-technology-stack/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The substance of Pentaho&#8217;s Hadoop strategy</title>
		<link>http://www.dbms2.com/2010/08/21/the-substance-of-pentahos-hadoop-strategy/</link>
		<comments>http://www.dbms2.com/2010/08/21/the-substance-of-pentahos-hadoop-strategy/#comments</comments>
		<pubDate>Sat, 21 Aug 2010 06:40:29 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Pentaho]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2848</guid>
		<description><![CDATA[Pentaho has been talking about a Hadoop-related strategy. Unfortunately, in support of its Hadoop efforts, Pentaho has been &#8212; quite insistently &#8212; saying things that don&#8217;t make a lot of sense to people who know anything about Hadoop.
That said, I think I found four sensible points in Pentaho&#8217;s Hadoop strategy, namely:

If you use an ETL [...]]]></description>
			<content:encoded><![CDATA[<p>Pentaho has been talking about a Hadoop-related strategy. Unfortunately, in support of its Hadoop efforts, Pentaho has been &#8212; quite insistently &#8212; saying things that don&#8217;t make a lot of sense to people who know anything about Hadoop.</p>
<p>That said, I think I found four sensible points in Pentaho&#8217;s Hadoop strategy, namely:</p>
<ol>
<li>If you use an ETL tool like Pentaho&#8217;s to move things in and out of HDFS, you may be able to orchestrate two more steps in the ETL process than if you used Hadoop&#8217;s native orchestration tools.</li>
<li>A lot of what you want to do in MapReduce consists of things that can be graphically specified in an ETL tool like Pentaho&#8217;s. (That would include tokenization or regex.)</li>
<li>If you have some really lightweight BI requirements (ad hoc, reporting, or whatever) against HDFS data, you might be content to do it straight against HDFS, rather than moving the data into a real DBMS. If so, BI tools like Pentaho&#8217;s might be useful.</li>
<li>Somebody might want to use a screwy version of MapReduce, where by &#8220;screwy&#8221; I mean anything that isn&#8217;t <a href="http://www.dbms2.com/2010/06/30/cloudera-enterprise-hadoop-evolution/" >Cloudera Enterprise</a>, <a href="http://www.dbms2.com/2009/12/02/mapreduce-for-complex-analytics-webina/" >Aster Data SQL/MapReduce</a>, or some other implementation/distribution with a lot of supporting tools. In that case, they might need all the tools they can get.</li>
</ol>
<p>The first of those points is, in the grand scheme of things, pretty trivial.</p>
<p>The third one makes sense. While Hadoop&#8217;s Hive client means you could roll your own integration with your own favorite BI tool in any case, having somebody certify it for you could be nice. So if Pentaho ships something that works before other vendors do, good on them. (Target date seems to be October.)</p>
<p>The fourth one is kind of sad.</p>
<p>But if there&#8217;s any shovel-meet-pony aspect to all this &#8212; or indeed a reason for writing this blog post &#8212; it would be the second point. If one understands data management, but is in the &#8220;Oh no! Hadoop wants me to PROGRAM!&#8221; crowd, then being able to specify one&#8217;s MapReduce graphically might be a really nice alternative to having to actually code it.</p>
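<p>For a sense of what &#8220;actually coding it&#8221; means, here is a rough sketch of a regex tokenization step written by hand as a map function and a reduce function. It is plain Python that only imitates the MapReduce pattern in one process; it uses no Hadoop or Pentaho API, and the token pattern and function names are assumptions chosen for illustration.</p>
<pre>
```python
import re
from collections import defaultdict

# Hypothetical tokenization rule: runs of lowercase letters and digits.
TOKEN = re.compile(r"[a-z0-9]+")


def map_tokens(line):
    """Map step: emit a (token, 1) pair for every regex match in a line."""
    return [(token, 1) for token in TOKEN.findall(line.lower())]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each token."""
    totals = defaultdict(int)
    for token, count in pairs:
        totals[token] += count
    return dict(totals)


lines = ["Hadoop wants me to PROGRAM", "program the map, program the reduce"]

# In a real cluster the pairs would be shuffled between nodes by key;
# here they are simply collected into one list.
pairs = [pair for line in lines for pair in map_tokens(line)]
counts = reduce_counts(pairs)
```
</pre>
<p>Even a step this small involves choices (the pattern, case folding, how counts are combined) that a graphical tool could presumably expose as settings instead of code.</p>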
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/21/the-substance-of-pentahos-hadoop-strategy/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>DB2 workload management</title>
		<link>http://www.dbms2.com/2010/08/18/ibm-db2-workload-management/</link>
		<comments>http://www.dbms2.com/2010/08/18/ibm-db2-workload-management/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 08:47:09 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2819</guid>
		<description><![CDATA[DB2 has added a lot of workload management features in recent releases. So when we talked Tuesday afternoon, Tim Vincent and I didn&#8217;t bother going through every one. Even so, we covered some interesting subjects in the area of DB2 workload management, including:  

If your goal is to keep a certain class of queries from [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><a href="../2009/04/24/some-db2-highlights/">DB2 has added a lot of workload management features in recent releases</a>. So when we talked Tuesday afternoon, Tim Vincent and I didn&#8217;t bother going through every one. Even so, we covered some interesting subjects in the area of DB2 workload management, including:  <span id="more-2819"></span></p>
<ul>
<li>If your goal is to keep a certain 	class of queries from taking too many resources, Tim thinks a great 	way of doing that is to control how many of them are allowed to run 	concurrently.</li>
<li>By way of contrast, Tim is 	cautious about the common approach of just lowering a query&#8217;s 	priority. His concern is that a long-running query could linger even 	longer, creating a long-lasting bottleneck in, for example, <a href="http://www.dbms2.com/2010/08/18/more-on-temp-space-compression-and-random-io/" >temp 	space</a>.</li>
<li>When running over (I believe) 	Linux and AIX, DB2 workload management is integrated with operating 	system workload management. I.e., the same “service class” or 	“workload class” (at a guess, the former is the official term 	and the latter is the term that makes sense) of queries and 	associated processes gets the same treatment in both DB2 and the OS.</li>
<li>DB2&#8217;s workload management extends 	to buffer pools, to inhibit low-priority queries from evicting a 	higher-priority query&#8217;s data from cache.</li>
<li>Sometimes, workload management doesn&#8217;t throttle a query, but just decides to collect stats for future analysis. (This is on the eminently reasonable theory that the best stats to collect are the ones that are live when performance problems are actually occurring.)</li>
</ul>
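<p>The first idea in that list, capping how many queries of a class run at once rather than lowering their priority, can be sketched generically with a counting semaphore. This is not DB2&#8217;s interface or implementation; <code>MAX_CONCURRENT</code>, <code>run_query</code>, and the sleep that stands in for query work are all hypothetical.</p>
<pre>
```python
import threading
import time

MAX_CONCURRENT = 2  # hypothetical cap for one class of queries
slots = threading.BoundedSemaphore(MAX_CONCURRENT)
lock = threading.Lock()
running = 0
peak = 0
finished = []


def run_query(name):
    """Wait for a free slot, then 'run' the query; excess queries queue up."""
    global running, peak
    with slots:  # blocks while MAX_CONCURRENT queries are already running
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for real query work
        with lock:
            running -= 1
            finished.append(name)


threads = [threading.Thread(target=run_query, args=(f"q{i}",)) for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
</pre>
<p>Queries that do get a slot run at full speed and release their resources promptly, which is the contrast with a deprioritized query that lingers and holds onto things like temp space.</p>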
<p style="margin-bottom: 0in;">Finally, Tim spoke of what I regard as the weirdest workload management requirement, one I also heard about from <a href="http://www.dbms2.com/2009/07/18/netezza-on-concurrency-and-workload-management/" >Netezza</a> <span style="font-style: normal;">(but didn&#8217;t explicitly mention) in</span> June. Sometimes, it seems, you simply don&#8217;t want queries to finish too fast. Why? Because if you give great performance when the machine is lightly loaded, then business users might expect that performance too when the machine is heavily loaded and you can&#8217;t deliver it. Apparently, in some environments it&#8217;s better to never deliver great query performance than it is to do so only inconsistently.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/18/ibm-db2-workload-management/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>More on temp space, compression, and &#8220;random&#8221; I/O</title>
		<link>http://www.dbms2.com/2010/08/18/more-on-temp-space-compression-and-random-io/</link>
		<comments>http://www.dbms2.com/2010/08/18/more-on-temp-space-compression-and-random-io/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 05:44:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2805</guid>
		<description><![CDATA[My PhD was in a probability-related area of mathematics (game theory), so I tend to squirm when something is described as &#8220;random&#8221; that clearly is not. That said, a comment by Shilpa Lawande on our recent Flash/temp space discussion suggests the following way of framing a key point:

You really, really want to have multiple data [...]]]></description>
			<content:encoded><![CDATA[<p>My PhD was in a probability-related area of mathematics (game theory), so I tend to squirm when something is described as &#8220;random&#8221; that clearly is not. That said, <a href="http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/#comment-181134" >a comment by Shilpa Lawande</a> on our recent <a href="http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/" >Flash/temp space discussion</a> suggests the following way of framing a key point:</p>
<ul>
<li>You really, really want to have multiple data streams coming out of temp space, as close to simultaneously as possible.</li>
<li>The storage performance characteristics of such a workload are more reminiscent of &#8220;random&#8221; than &#8220;sequential&#8221; I/O.</li>
</ul>
<p>If everybody else is cool with it too, I can live with that. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Meanwhile, I talked again with Tim Vincent of IBM this afternoon. Tim endorsed the temp space/Flash fit, but with a different emphasis, which upon review I find I don&#8217;t really understand. The idea is:</p>
<ul>
<li>Analytic DBMS processing generally stresses reads over writes.</li>
<li>Temp space is an exception &#8212; read and write use of temp space is pretty balanced. (You spool data out once, you read it back in once, and that&#8217;s the end of that; next time it will be overwritten.)</li>
</ul>
<p>My problem with that is: Flash typically has lower write than read IOPS (I/O per second), so being (relatively) write-intensive would, to a first approximation, seem if anything to disfavor a workload for Flash.</p>
<p>On the plus side, I was reminded of something I should have noted when I wrote about <a href="http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/" >DB2 compression</a> before:</p>
<p>Much like Vertica, <strong>DB2 operates on compressed data all the way through, including in temp space. </strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/18/more-on-temp-space-compression-and-random-io/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Vertica&#8217;s innovative architecture for Flash, plus more about temp space than you perhaps wanted to know</title>
		<link>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/</link>
		<comments>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/#comments</comments>
		<pubDate>Mon, 16 Aug 2010 08:07:33 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2788</guid>
		<description><![CDATA[Vertica is announcing:

Technology it already has released*, but has not published any reference architectures for
A Barney partnership**

In other words, Vertica has succumbed to the common delusion that it&#8217;s a good idea to put out half-baked press releases the week of TDWI conferences. But if we look past that kind of all-too-common nonsense, Vertica is highlighting [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Vertica is announcing:</p>
<ul>
<li>Technology it already has 	released*, but has not published any reference architectures 	for</li>
<li><span style="font-style: normal;">A 	<a href="http://www.strategicmessaging.com/barney-partnerships/2010/08/12/" onclick="javascript:pageTracker._trackPageview('/www.strategicmessaging.com');">Barney</a> partnership**</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">In other words, Vertica has succumbed to the common delusion that it&#8217;s a good idea to put out half-baked press releases the week of TDWI conferences. </span>But if we look past that kind of all-too-common nonsens<span style="font-weight: normal;">e, Vertica is highlighting an interesting technical story, about </span><strong>how the analytic DBMS industry can exploit solid-state memory technology.</strong></p>
<p style="margin-bottom: 0in;"><em>*Upgrades to <a href="../2009/08/04/flexstore-and-the-rest-of-vertica-35/">Vertica FlexStore</a> to handle Flash memory, actually released as part of <a href="../2010/02/22/vertica-4/">Vertica 4.0</a></em></p>
<p style="margin-bottom: 0in;"><em>** With Fusion I/O</em></p>
<p style="margin-bottom: 0in;">To set the context, let&#8217;s recall a few points I&#8217;ve noted in the past:</p>
<ul>
<li><a href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Solid-state 	memory&#8217;s price/throughput tradeoffs obviously make it the future of 	database storage</a>.</li>
<li><a href="../2010/06/25/flash-is-coming-well/">The 	Flash future is coming soon</a>, in part because Flash&#8217;s propensity 	to wear out is overstated. This is especially true in the case of 	modern analytic DBMS, which tend to write to blocks all at once, and 	most particularly the case for append-only systems such as Vertica.</li>
<li><a href="../2010/08/12/teradata-future-product-strategy/">Being 	able to intelligently split databases among various cost tiers of 	storage – e.g. Flash and disk – makes a whole lot of sense</a>.</li>
</ul>
<p style="margin-bottom: 0in;">Taken together, those points tell us:</p>
<p style="margin-bottom: 0in;"><strong>For optimal price/performance, analytic DBMS should support databases that run part on Flash, part on disk.</strong></p>
<p style="margin-bottom: 0in;">While all this is a future for some other analytic DBMS vendors, Vertica is shipping it today.* What&#8217;s more, three aspects of Vertica&#8217;s architecture make it particularly well-suited for hybrid Flash/disk storage, in each case for a similar reason – you can get most of the performance benefit of all-Flash for a relatively low actual investment in Flash chips:  <span id="more-2788"></span></p>
<ul>
<li><strong>Vertica lets you split tables 	by column, </strong><span style="font-weight: normal;">and Vertica 	FlexStore is versatile enough to let you put only the most-used 	columns in Flash. (Vertica offers a figure that 85% of usage calls 	on only 15% of columns, but I don&#8217;t know how rigorously grounded 	those numbers are.)</span></li>
<li>To the extent that Vertica data is<span style="font-weight: normal;"> <a href="../2008/09/24/vertica-finally-spells-out-its-compression-claims/">more </a></span><a href="../2008/09/24/vertica-finally-spells-out-its-compression-claims/">compressed</a> than many of Vertica&#8217;s competitors&#8217; (which it probably is, debates 	over the magnitude of Vertica&#8217;s advantage notwithstanding), the 	total storage-hardware cost of sticking stuff in Flash is less when 	you use Vertica than with other systems.</li>
<li>Vertica has <span style="font-weight: normal;">relatively 	less need for </span><strong>temp space</strong> than some other systems. 	(Vertica uses figures of &lt;20% of total storage, vs. 30%+ for some 	other systems.) If you want to use Flash for temp space, so as to 	accelerate your toughest queries, that can save you some cash …</li>
<li>… and by the way, <strong>temp space 	is an especially good use of Flash, </strong>because <strong>temp space is 	accessed in a less sequential manner than data storage is.</strong></li>
</ul>
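<p style="margin-bottom: 0in;">To make the column-placement idea concrete, here is a minimal Python sketch of choosing which columns go on Flash. It is not Vertica FlexStore&#8217;s actual mechanism or API; the <code>Column</code> class, the <code>pick_flash_columns</code> helper, the greedy reads-per-GB rule, and every size and read count below are assumptions made up for illustration.</p>

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    size_gb: float  # compressed on-storage size (hypothetical figure)
    reads: int      # number of queries that touched the column (hypothetical figure)


def pick_flash_columns(columns, flash_budget_gb):
    """Greedily fill the Flash tier with the columns read most often per GB."""
    ranked = sorted(columns, key=lambda c: c.reads / c.size_gb, reverse=True)
    on_flash, used_gb = [], 0.0
    for col in ranked:
        # Skip any column that would overflow the Flash budget; it stays on disk.
        if used_gb + col.size_gb <= flash_budget_gb:
            on_flash.append(col.name)
            used_gb += col.size_gb
    return on_flash, used_gb


# Invented table: a few small hot columns and two large cold ones.
columns = [
    Column("order_date", 10.0, 900),
    Column("customer_id", 20.0, 800),
    Column("amount", 15.0, 700),
    Column("ship_notes", 200.0, 40),
    Column("raw_payload", 300.0, 10),
]

flash_columns, flash_gb = pick_flash_columns(columns, flash_budget_gb=50.0)
total_reads = sum(c.reads for c in columns)
flash_reads = sum(c.reads for c in columns if c.name in flash_columns)
print(flash_columns, flash_gb, round(flash_reads / total_reads, 2))
```

<p style="margin-bottom: 0in;">With these invented numbers, three small, heavily used columns fit in a 50 GB Flash budget, which is under a tenth of the 545 GB total, while covering the large majority of column reads.</p>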
<p style="margin-bottom: 0in;">The least obvious of those points are about temp space; I only understood the particulars when Vertica development chief Shilpa Lawande explained them to me Thursday.</p>
<p style="margin-bottom: 0in;">
<p style="margin-bottom: 0in;"><em>* At least in theory; customer adoption may be a different matter.</em></p>
<p style="margin-bottom: 0in;">
<p style="margin-bottom: 0in;">But before drilling down on temp space, let me first note that there&#8217;s one offsetting factor to all those “We need somewhat less Flash than the other guys” Vertica advantages. Like all serious databases, a Vertica installation keeps two or more copies of all data, so that there&#8217;s no single point of failure in storage. In a flexible system like Vertica, you can put one copy on Flash and one on disk. But if you do that in Vertica, you forgo fully exploiting one possible benefit of Vertica&#8217;s architecture – the ability to store different copies of a column in different orders, which are beneficial for accelerating different groups of queries.*</p>
<p style="margin-bottom: 0in;"><em>*More precisely, you don&#8217;t get the full benefits of Flash acceleration for every query touching those columns.</em></p>
<p style="margin-bottom: 0in;">
<p style="margin-bottom: 0in;">OK. Back to temp space. There are four kinds of things you can put in storage if you&#8217;re running a database management system:</p>
<ul>
<li>The <strong>software</strong> itself.</li>
<li><span style="font-weight: normal;">Persistent </span><strong>data. </strong><span style="font-weight: normal;">(I.e., tables, 	if the DBMS you&#8217;re running is relational.)</span></li>
<li><strong>Metadata,</strong> especially the 	kind that lets you find data &#8211;<strong> indexes,</strong> zone maps, catalogs, 	etc.</li>
<li><strong>Temporary data constructs</strong> built as part of, say, a s<span style="font-weight: normal;">ort-merge 	join. These, by definition, are what populate temp space.</span></li>
</ul>
<p style="margin-bottom: 0in;">
<p style="margin-bottom: 0in;">Just to be clear, those constructs are NOT temporary tables of the sort created by, say, Microstrategy; such tables are handled like any other data. Rather, they are ephemeral creat<span style="font-weight: normal;">ions and, so far as I can tell, not tables at all. </span></p>
<p style="margin-bottom: 0in;">
<p style="margin-bottom: 0in;">Vertica offered two theories as to why its DBMS requires less temp space than competitors do:</p>
<ul>
<li>To the extent data is decompressed before being operated on in memory by the DBMS, that decompression would of course apply to temp space as well. Vertica prides itself on <strong>keeping data compressed</strong> all the way through, and seems to get away with smaller temp space allocations as a result.</li>
<li>Since Vertica can store columns in 	expedient sort orders, it does less sorting overall, and sorting is 	a big use of temp space.</li>
</ul>
<p style="margin-bottom: 0in;">Obviously, no matter which DBMS you use, the amount of temp space you need is workload-dependent. Even so, Vertica&#8217;s claim to something of an advantage seems legit.</p>
<p style="margin-bottom: 0in;">
<p style="margin-bottom: 0in;"><em>Truth be told, I&#8217;m not convinced the savings involved are great enough to </em>matter<em> a whole lot – but it&#8217;s a fun subject to think through. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></p>
<p style="margin-bottom: 0in;">
<p style="margin-bottom: 0in;">And finally: One of my biggest surprises since starting to look at analytic-DBMS-on-Flash has been the centrality of temp space. Talking to Vertica Thursday, I finally uncovered a key reason why: <strong>Temp space tends to be accessed via multiple streams of data at once.</strong> I&#8217;m still struggling with WHY that is true, with two reasons suggested being:</p>
<ul>
<li>Temp space can be accessed by 	multiple operations at once. (But isn&#8217;t that also true of the rest 	of storage?)</li>
<li>Merge sorts, a common use of temp 	space, read multiple streams of data. (Couldn&#8217;t you tweak your 	software to make that not be true?)</li>
</ul>
<p style="margin-bottom: 0in;">But if we grant that temp space naturally is accessed in multiple places at once – well, that&#8217;s a lot like random I/O, and <a href="../2005/11/13/breaking-the-disk-speed-barrier/">if you&#8217;re doing a lot of random reads, you&#8217;d love to use something other than spinning disk</a>.</p>
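<p style="margin-bottom: 0in;">The merge-sort point can be illustrated with a small Python sketch. It assumes temp space holds several sorted runs spilled by an earlier sort phase; here the runs are plain in-memory lists, and <code>merge_runs</code> is a hypothetical helper, not any vendor&#8217;s code. A k-way merge pulls the next row from whichever run currently holds the smallest value, so reads hop between runs rather than streaming through one run at a time.</p>

```python
import heapq


def merge_runs(runs):
    """Merge sorted runs and record which run each output row was read from.

    Assumes no row value is None, since None marks an exhausted run.
    """
    iterators = [iter(run) for run in runs]
    heap = []
    for run_id, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            heap.append((first, run_id))
    heapq.heapify(heap)

    merged, read_order = [], []
    while heap:
        # The smallest pending row decides which run is consumed next.
        value, run_id = heapq.heappop(heap)
        merged.append(value)
        read_order.append(run_id)
        following = next(iterators[run_id], None)
        if following is not None:
            heapq.heappush(heap, (following, run_id))
    return merged, read_order


# Three hypothetical sorted runs, standing in for spill files in temp space.
runs = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
merged, read_order = merge_runs(runs)
print(merged)
print(read_order)
```

<p style="margin-bottom: 0in;">In this toy case <code>read_order</code> cycles 0, 1, 2, 0, 1, 2 and so on: each output row comes from a different run than the one before it. Real systems typically read runs in buffered blocks rather than row by row, so the hopping is coarser, but the access pattern still spans several places in temp space at once.</p>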
<p style="margin-bottom: 0in;">
<p style="margin-bottom: 0in;">
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Teradata&#8217;s future product strategy</title>
		<link>http://www.dbms2.com/2010/08/12/teradata-future-product-strategy/</link>
		<comments>http://www.dbms2.com/2010/08/12/teradata-future-product-strategy/#comments</comments>
		<pubDate>Thu, 12 Aug 2010 10:37:14 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[Microstrategy]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2769</guid>
		<description><![CDATA[I think Teradata&#8217;s future product strategy is coming into focus. I&#8217;ll start by outlining some particular aspects, and then show how I think it all ties together.

The immediate hook here is that I had a short conversation with Scott Gnau of Teradata yesterday, triggered by Teradata&#8217;s acquisition of Kickfire&#8217;s assets. Takeaways from that part included:

The [...]]]></description>
			<content:encoded><![CDATA[<p>I think Teradata&#8217;s future product strategy is coming into focus. I&#8217;ll start by outlining some particular aspects, and then show how I think it all ties together.<br />
<span id="more-2769"></span></p>
<p style="margin-bottom: 0in;">The immediate hook here is that I had a short conversation with Scott Gnau of Teradata yesterday, triggered by <a href="../2010/07/27/kickfire-unlikely-to-survive/">Teradata&#8217;s acquisition of Kickfire&#8217;s assets</a>. Takeaways from that part included:</p>
<ul>
<li>The acquisition is all about 	Kickfire&#8217;s <a href="../2009/08/21/kickfires-fpga-based-technical-strategy/">data 	pipelining</a> technology.</li>
<li>Scott (in my opinion rightly) 	thinks that isn&#8217;t particularly tied to Kickfire&#8217;s choice of 	particular DBMS architecture (fairly vanilla columnar).</li>
<li>No decision has been made about 	whether the right vehicle for this technology is an FPGA (Field 	Programmable Gate Array), conventional Intel CPU, RAM, etc.</li>
</ul>
<p style="margin-bottom: 0in;"><em>If you want to handicap Teradata&#8217;s future data pipelining strategy, you might note that:</em></p>
<ul>
<li><em>Kickfire&#8217;s own choice – and 	hence its existing implementation – is an FPGA.</em></li>
<li><em><a href="../2009/08/04/vectorwise-ingres-and-monetdb/">VectorWise&#8217;s 	approach to pipelining is Intel-based,</a> apparently at the cost of 	being closely tied to specific generations of Intel CPUs.</em></li>
<li><em><a href="../2009/07/27/xtremedata-announces-its-dbx-data-warehouse-appliance/">XtremeData&#8217;s 	approach to pipelining</a> is FPGA-based.</em></li>
<li><em>Teradata has a lot more 	development resources than any of those other companies, as well as 	important existing products, and hence has both means and motive to 	shoehorn new technology into older system designs.</em></li>
</ul>
<p style="margin-bottom: 0in;">While I had Scott on the phone, I brought up a few other subjects too. Highlights included:</p>
<ul>
<li>Teradata&#8217;s Flash-based appliance 	is doing just fine in beta test and customer POCs (Proofs of 	Concept).</li>
<li>Other kinds of Teradata appliance 	are not inconceivable.</li>
<li>Scott thinks <a href="http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/" >Michael McIntire&#8217;s 	condemnation of Active-Active architectures</a> is overstated. That 	said,
<ul>
<li>Scott does acknowledge a need for 	greater Active-Active scalability, and suggests that the reason 	Xkoto&#8217;s current products are being discontinued is their lack of 	scaling.</li>
<li>Scott seems quietly confident the 	scaling will get done.</li>
</ul>
</li>
<li>Scott is emphatic that Teradata is 	not going to go to <a href="../2009/04/20/calpont-update-you-read-it-here-first/">a 	two-tier architecture</a>. In particular, the point of splitting 	storage/lightweight database processing and heavyweight database 	processing on separate tiers is generally to save bandwidth, and 	Teradata&#8217;s BYNET is typically less than 10% loaded.</li>
<li>Scott didn&#8217;t dispute my claim that 	this all suggests <a href="../2008/10/14/teradata-virtual-storage/">Teradata 	Virtual Storage</a> is the future, at the expense of a rigid 	delineation among <a href="../2008/10/23/teradata-appliance-product-lines/">specific 	use-case-focused product lines</a>.</li>
<li>Unlike <a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/" >Netezza</a> or <a href="http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/" >Aster</a>, Teradata doesn&#8217;t seem to plan analytic capability that works outside 	the UDF (User Defined Function) framework. However, Scott noted that 	Teradata has long had the capability that Aster and Netezza now also 	have of letting you run analytic code either in “protected mode” 	(if the process fails the whole database doesn&#8217;t crash) or in the 	database kernel (best performance, if you&#8217;re sufficiently confident 	in the code&#8217;s stability to take the risk). Scott also spoke of the 	release later this quarter of Teradata FastPath, which will offer 	yet better performance (however, there&#8217;s a gotcha to Teradata 	FastPath that&#8217;s still NDA).</li>
</ul>
<p style="margin-bottom: 0in;">Putting all that together with the rest of what we know about Teradata, I&#8217;m going to call out<strong> three pillars of Teradata&#8217;s long-term product strategy:</strong></p>
<ul>
<li><strong>Same fundamentals as always.</strong> Teradata&#8217;s core product strategy is:
<ul>
<li>Single DBMS, capable of meeting 	all analytic needs while running in a single instance, usually 	running on &#8230;</li>
<li>… proprietary hardware …</li>
<li>… built from 	conservatively-chosen parts.</li>
</ul>
</li>
<li><strong>Selective vertical application 	stack.</strong> No matter how horizontally-oriented they are, many 	companies that have been in the analytic technology business for a 	while wind up with some vertical applications. It sort of just 	happens. Teradata is no exception. Teradata also likes to sell 	services to its product customers, and some of those are quite 	vertical-aware.</li>
<li><strong>Mutable, modular platform.</strong> This is what I highlighted above. Note that it&#8217;s philosophically 	attuned with the one-system-does-everything approach Teradata 	prefers. More subtly, please also note that it goes well with 	customer-by-customer price customization, which is almost a must for 	Teradata given the Innovator&#8217;s Dilemma kind of pricing box it finds 	itself in.</li>
</ul>
<p style="margin-bottom: 0in;">So far, that&#8217;s not too exciting, except in the details of how Teradata&#8217;s engineers make that all work. But there&#8217;s a <strong>fourth pillar to Teradata&#8217;s technical strategy</strong> as well, and it&#8217;s a wild card: <strong>tight partnerships.</strong> Every time I talk with Teradata hardware chief Carson Schmidt, he seems excited about some particular version of a part or other – sometimes from a reasonably established vendor (once it was LSI Logic), sometimes from a tiny one (notably <a href="../2009/10/25/teradata-hardware-strategy-and-tactics/">the “stealth” start-up on which Teradata bet its first solid-state product</a>). In the future, I expect tight business intelligence partnerships as well. Cognos BI will be increasingly integrated with IBM&#8217;s DBMS and hardware; Business Objects&#8217; BI will increasingly be integrated with SAP&#8217;s applications; and Oracle&#8217;s BI will eventually be integrated with everything. How do you compete with that if you<span style="font-style: normal;">&#8217;re Microstrategy? </span>Well, you try to have a superior product, of course – but you also partner as closely with DBMS vendors as you can, an approach Microstrategy has already started to take. Predictive analytics stalwart <a href="http://www.dbms2.com/2010/05/15/further-clarifying-in-database-mpp-sas/" >SAS</a>, of course, is on a partnership binge as well.</p>
<p style="margin-bottom: 0in;">Teradata has a larger installed base than almost all its competitors, and enjoys richer third-party software and service support as a result. But I suspect that going forward,  for Teradata to remain a leading competitor at price points it is willing to accept, Teradata&#8217;s “ecosystem” advantages will need to ratchet up one or several notches.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/12/teradata-future-product-strategy/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Big Data is Watching You!</title>
		<link>http://www.dbms2.com/2010/08/11/big-data-is-watching-you/</link>
		<comments>http://www.dbms2.com/2010/08/11/big-data-is-watching-you/#comments</comments>
		<pubDate>Wed, 11 Aug 2010 05:30:22 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2760</guid>
		<description><![CDATA[There&#8217;s a boom in large-scale analytics. The subjects of this analysis may be categorized as:

People
Financial trades
Electronic networks
Everything else

The most varied, interesting, and valuable of those four categories is the first one.

That may change some day, with the growing importance of machine-generated data, and of big-data science in particular. But I think it&#8217;s a fair assessment [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">There&#8217;s a boom in large-scale analytics. The subjects of this analysis may be categorized as:</p>
<ul>
<li>People</li>
<li>Financial trades</li>
<li>Electronic networks</li>
<li>Everything else</li>
</ul>
<p style="margin-bottom: 0in;">The most varied, interesting, and valuable of those four categories is the first one.</p>
<p><span id="more-2760"></span></p>
<p style="margin-bottom: 0in;"><em>That may change some day, with the growing importance of <a href="http://www.dbms2.com/2010/04/08/machine-generated-data-example/" >machine-generated data</a>, and of <a href="http://www.dbms2.com/2009/10/03/issues-in-scientific-data-management/" >big-data science</a> in particular. But I think it&#8217;s a fair assessment at present, and for at least the next few years.</em></p>
<p style="margin-bottom: 0in;">Some of th<span style="font-weight: normal;">e most interesting use cases are concentrated in the areas of identifying individuals, groups of people, or behaviors of (groups of) people. For example:</span></p>
<ul>
<li>comScore works hard to <strong>identify 	individual web surfers </strong><span style="font-weight: normal;">– 	i.e. to </span><strong>deanonymize</strong><span style="font-weight: normal;"> them &#8212; even</span> though they may have given incomplete or false 	personal information.</li>
<li>Other companies at least try to 	figure out <strong>which information in a user&#8217;s profile is unreliable,</strong> so as to classify them better. (Yes, there are 62-year-old 	video-game-obsessed Lady Gaga fans, but that&#8217;s generally not the way 	to bet.)</li>
<li>Multiple telecom vendors try to 	identify who their <strong>most influential customers</strong> are (to a first 	approximation, they&#8217;re the ones most often called by the most 	people, but it surely gets more sophisticated than that). This 	information is then used to reduce churn, either by working hard to 	retain those users, or – if they do churn – to move very fast to 	retain the business from their friends.</li>
<li>Other kinds of companies do 	similar kinds of analysis, to the extent that they have enough of a 	social graph to do so. (This application is a case where the term 	“<a href="http://www.dbms2.com/2010/06/08/profile-of-revealed-preferences/" >social graph</a>” is not a misnomer.)</li>
<li><strong>Turing detectives</strong> (I just 	coined that phrase) try to determine whether users are humans or 	bots.</li>
<li>Central to detecting <strong>insurance 	fraud</strong> is identifying suspiciously close connections between 	claimants, service providers, and so on.</li>
<li>Identifying groups of people is 	also important in flagging <strong>insider trading.</strong><span style="font-weight: normal;"> Even more important are other kinds of analysis, along the lines of 	“is this normal innocent trading behavior?” </span></li>
<li><span style="font-weight: normal;">Intelligence 	agencies try to detect networks of </span><strong>terrorists</strong><span style="font-weight: normal;"> and their sympathizers. They further try to identify unusual 	patterns of communication or meetings along those networks that 	might indicate terrorist acts are being planned. (Civilian law 	enforcement agencies can use similar techniques.)</span></li>
</ul>
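<p style="margin-bottom: 0in;">As a rough illustration of the telecom bullet&#8217;s first approximation (the customers called by the most people), here is a minimal Python sketch that ranks customers by their number of distinct callers. The call records, the names, and the <code>rank_by_distinct_callers</code> helper are all hypothetical; real influence scoring surely gets more sophisticated than this.</p>

```python
from collections import defaultdict


def rank_by_distinct_callers(call_records):
    """Rank customers by how many different people called them."""
    callers_of = defaultdict(set)
    for caller, callee in call_records:
        # A set counts each caller once, however many times they called.
        callers_of[callee].add(caller)
    counts = {callee: len(callers) for callee, callers in callers_of.items()}
    # Most distinct callers first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented (caller, callee) pairs standing in for call detail records.
calls = [
    ("ann", "dee"),
    ("bob", "dee"),
    ("cat", "dee"),
    ("ann", "dee"),
    ("dee", "bob"),
    ("cat", "bob"),
    ("bob", "ann"),
]

ranking = rank_by_distinct_callers(calls)
print(ranking)
```

<p style="margin-bottom: 0in;">Counting distinct callers rather than raw call volume keeps one chatty friend from making a customer look influential, which matches the &#8220;called by the most people&#8221; wording above.</p>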
<p style="margin-bottom: 0in; font-weight: normal;">In most cases, the analysis and/or run-time execution of the relevant models is done with the help of analytic DBMS. Other technologies that come into play include non-DBMS MapReduce (Hadoop), graph engines, and CEP (Complex Event Processing). The vendor most heavily represented on that list is probably Aster Data, because:</p>
<ul>
<li>Aster Data is 	focused on hard-core analytics.</li>
<li>I talk a lot 	with Aster Data, and in particular had a long, detailed use-cases 	discussion with them last week.</li>
<li><span style="font-weight: normal;">The 	comScore example happens to come from a speaker at </span><a href="http://www.dbms2.com/2010/05/07/implications-onew-analytic-technology/" ><span style="font-weight: normal;">an 	Aster event</span></a><span style="font-weight: normal;"> I also 	participated in.</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-weight: normal;">And by the way, all this only scratches the surface of what will be possible down the road. It&#8217;s based mainly on where you live, what you purchase, how you behave on websites, and who you communicate with. </span><span style="color: #000080;"><span lang="zxx"><span style="text-decoration: underline;"><a href="../2010/07/04/fair-data-use/"><span style="font-weight: normal;">Other kinds of data, which could be used to be yet more intrusive</span></a></span></span></span><span style="font-weight: normal;">, generally aren&#8217;t involved.</span></p>
<p style="margin-bottom: 0in;"><span style="font-weight: normal;">I actually have two points in drawing up this list. One is golly-gee-whiz about how a lot of analytically sophisticated applications are actually getting into production. The other is to highlight the privacy and liberty threats If This Goes On Unchecked (which is why I didn&#8217;t include some other less-people-focused examples). There&#8217;s also a related danger that, to the extent we don&#8217;t get some smart regulations to keep us safe(r), we&#8217;ll get a bunch of stupid regulations instead. </span></p>
<p style="margin-bottom: 0in;"><span style="font-weight: normal;">The Analytic Era has only just begun.<br />
</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/11/big-data-is-watching-you/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Links and observations</title>
		<link>http://www.dbms2.com/2010/08/09/links-and-observations/</link>
		<comments>http://www.dbms2.com/2010/08/09/links-and-observations/#comments</comments>
		<pubDate>Tue, 10 Aug 2010 02:37:51 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Calpont]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Northscale]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[XtremeData]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2743</guid>
		<description><![CDATA[I&#8217;m back from a trip to the SF Bay area, with a lot of writing ahead of me. I&#8217;ll dive in with some quick comments here, then write at greater length about some of these points when I can. From my trip:  

Aster Data showed me a lot of customer names and deal sizes, across [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m back from a trip to the SF Bay area, with a lot of writing ahead of me. I&#8217;ll dive in with some quick comments here, then write at greater length about some of these points when I can. From my trip:  <span id="more-2743"></span></p>
<ul>
<li>Aster Data showed me a lot of customer names and deal sizes, across a bunch of industries (mainly enterprise rather than web). Yes, Aster&#8217;s market success is for real. (But almost all those details are NDA.)</li>
<li>Sybase&#8217;s product plans for IQ are pretty impressive. (But the most interesting parts are, you guessed it, NDA.)</li>
<li>I&#8217;ve kissed and made up* with ParAccel, now that they&#8217;ve replaced their CEO, replaced their marketing chief, and stopped the worst of the <a href="http://www.dbms2.com/2010/01/15/there-sure-seem-to-be-a-lot-of-inaccuracies-on-paraccels-website/" >marketing</a> <a href="http://www.dbms2.com/2009/06/22/the-tpc-h-benchmark-is-a-blight-upon-the-industry/" >nonsense</a> I used to complain about. ParAccel has some interesting plans for ParAccel 3.0 which are, naturally, NDA.</li>
<li>The Peoplesoft guys are doing it over again at Workday. Only this time, their platform isn&#8217;t a relational DBMS. Rather, it&#8217;s an in-memory, completely object-oriented data model, with disk used only on a &#8220;Just in case the power ever goes out&#8221; basis. (Thankfully, nothing at all about our conversation was NDA.)</li>
<li>I&#8217;m finally feeling good about Northscale&#8217;s memcached-compatible persistent store Membase. The main reason is that they showed me a near-term path to interfaces that are richer than key-value. Also, Todd Hoff reassured me that even pure persistent memcached has a place.</li>
<li>Rumor says that even the one app for which Facebook was using Cassandra &#8212; in-box search &#8212; has been decommissioned. On the other hand, numerous other scale-out DBMS (SQL or otherwise) seem to have Facebook footholds. But details are &#8212; all together now! &#8212; NDA.</li>
</ul>
<p><em>*If you know ParAccel&#8217;s new marketing chief Michael Weir, you  surely guessed I mean that only in a figurative sense.</em></p>
<p>From elsewhere:</p>
<ul>
<li>Daniel Abadi offered <a href="http://dbmsmusings.blogspot.com/2010/08/thoughts-on-kickfires-apparent-demise.html" onclick="javascript:pageTracker._trackPageview('/dbmsmusings.blogspot.com');">his  analysis</a> of <a href="../2010/07/27/kickfire-unlikely-to-survive/">Kickfire&#8217;s  demise</a>. In general I agree, but Daniel neglected to mention one  hugely important factor &#8212; the chicken-egg negative effect of Kickfire&#8217;s  lack of market or marketing traction. Customers were extremely reluctant to buy from Kickfire  because they perceived, correctly, that Kickfire&#8217;s survivability was far  from assured.</li>
<li>While the <a href="http://infinidb.org/community/forums/11-general-infinidb/1000-strange-issue-with-drop-table" onclick="javascript:pageTracker._trackPageview('/infinidb.org');">InfiniDB forums</a> suggest that there are at least a couple of production users of Calpont&#8217;s free InfiniDB, Calpont seemingly has a long way to go to be even as successful as Kickfire. But Calpont does have a bit of money to spend on lead generation; maybe some day they&#8217;ll even have actual customers.</li>
<li>In a response to a question I messaged over, <a href="http://www.dbms2.com/2010/03/18/xtremedata-update/" >XtremeData</a> tells me they have actual customers now. Press releases to follow.</li>
<li>The <a href="http://news.cnet.com/8301-31021_3-20013111-260.html?part=rss&amp;subj=news&amp;tag=2547-1_3-0-20" onclick="javascript:pageTracker._trackPageview('/news.cnet.com');">admiration for the job Mark Hurd did at HP</a> is in my opinion overstated. Sure, the financial/operational management appeared to work, but HP did little on Hurd&#8217;s watch to strengthen its reputation or customers&#8217; loyalty. In particular:
<ul>
<li>HP&#8217;s analytics efforts have accomplished little.</li>
<li>HP&#8217;s data warehouse appliance efforts have failed pathetically.</li>
<li>From what I hear, HP&#8217;s execution in its Exadata partnership was not good.</li>
<li>HP&#8217;s server business in general is distinguished mainly by HP being a big company.</li>
<li>HP&#8217;s EDS acquisition has been rocky, not that EDS was sailing so smoothly on its own beforehand.</li>
<li>HP&#8217;s success in PCs amounts to &#8220;arguably, HP sucks a little less than the other guys&#8221;.</li>
<li>HP&#8217;s elite reputation is long gone (admittedly, for the most part that predates Hurd).</li>
</ul>
</li>
<li><a href="http://intelligent-enterprise.informationweek.com/blog/archives/2010/08/software_innova.html" onclick="javascript:pageTracker._trackPageview('/intelligent-enterprise.informationweek.com');">Doug Henschen</a> evidently favors really strong intellectual property protection for software, even forbidding plug-compatible reverse engineering. I agree with Doug that <a href="http://www.monashreport.com/2010/07/19/my-view-of-intellectual-property/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">it should be forbidden to copy proprietary software</a>, but I don&#8217;t see why he (or a court) would view plug-compatible reverse engineering as copying.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/09/links-and-observations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Notes on EMC&#8217;s Greenplum subsidiary</title>
		<link>http://www.dbms2.com/2010/08/09/emc-greenplum/</link>
		<comments>http://www.dbms2.com/2010/08/09/emc-greenplum/#comments</comments>
		<pubDate>Tue, 10 Aug 2010 00:02:17 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2744</guid>
		<description><![CDATA[I spent considerable time last week with my clients at both Greenplum and EMC (if we ignore the fact that the deal has closed and they&#8217;re now the same company). I also had more of a hardcore engineering discussion than I&#8217;ve had with Greenplum for quite a while (I should have been [...]]]></description>
			<content:encoded><![CDATA[<p>I spent considerable time last week with my clients at both Greenplum and EMC (if we ignore the fact that the deal has closed and they&#8217;re now the same company). I also had more of a hardcore engineering discussion than I&#8217;ve had with Greenplum for quite a while (I should have been pushier about that earlier). Takeaways included:</p>
<ul>
<li>This is starting off as a honeymoon deal. Everything Greenplum was planning to do is being continued. Additional resources are being poured into Greenplum to do more.</li>
<li>Some Greenplum execs seem to envision staying long term; others seem to envision moving on to their next startups. The ones who envision moving on are, however, going to work hard first to make the merger a success.</li>
<li>Greenplum has, for quite a while, had more of an advanced analytics/embedded predictive modeling story than I realized. Bad on them for not fleshing it out more in marketing and product packaging alike.</li>
<li>Greenplum both denies the <a href="http://www.dbms2.com/2010/07/06/emc-is-buying-greenplum/" >concurrency problems</a> I previously noted and has a very credible story as to how it will eliminate them. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> Seriously, Greenplum tells of one customer that routinely runs 150 simultaneous queries (on what I think is not a terribly big system) and of a number of POCs (Proofs of Concept) that simulated similar levels of concurrency.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/09/emc-greenplum/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Teradata, Xkoto Gridscale (RIP), and active-active clustering</title>
		<link>http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/</link>
		<comments>http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/#comments</comments>
		<pubDate>Sat, 31 Jul 2010 08:23:57 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Continuent]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Xkoto]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2708</guid>
		<description><![CDATA[Having gotten a number of questions about Teradata&#8217;s acquisition of Xkoto, I leaned on Teradata for an update, and eventually connected with Scott Gnau. Takeaways included:

Teradata is discontinuing Xkoto&#8217;s existing product Gridscale, which Scott characterized as being too OLTP-focused to be a good fit for Teradata. Teradata hopes and expects that existing Xkoto Gridscale [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Having gotten a number of questions about Teradata&#8217;s acquisition of Xkoto, I leaned on Teradata for an update, and eventually connected with Scott Gnau. Takeaways included:</p>
<ul>
<li>Teradata is discontinuing <a href="http://www.dbms2.com/2009/09/11/xkoto-gridscale-highlights/" >Xkoto&#8217;s existing product Gridscale</a>, which Scott characterized as being too OLTP-focused to be a good fit for Teradata. Teradata hopes and expects that existing Xkoto Gridscale customers won&#8217;t renew maintenance. (I&#8217;m not sure that they&#8217;ll even get the option to do so.)</li>
<li>The point of Teradata&#8217;s technology-plus-engineers acquisition of Xkoto is to enhance Teradata&#8217;s active-active or multi-active data warehousing capabilities, which it has had in some form for several years.</li>
<li>In particular, Teradata wants to tie together different products in the Teradata product line. (Note: Those typically all run pretty much the same Teradata database management software, except insofar as they might be on different releases.)</li>
<li>Scott rattled off all the plausible areas of enhancement, in multiple phrasings: performance, manageability, ease of use, tools, features, etc.</li>
<li>Teradata plans to have one or two releases based on Xkoto technology in 2011.</li>
</ul>
<p style="margin-bottom: 0in;">Frankly, I&#8217;m disappointed at the struggles of clustering efforts such as Xkoto Gridscale or <a href="http://www.dbms2.com/2009/09/03/continuent-on-clustering/" >Continuent&#8217;s pre-Tungsten products</a>, but if the DBMS vendors meet the same needs themselves, that&#8217;s OK too.</p>
<p style="margin-bottom: 0in;">The logic behind active-active database implementations actually seems pretty compelling:  <span id="more-2708"></span></p>
<ul>
<li>You may well be keeping a second copy of your database for high availability/hot standby.</li>
<li>You might even be keeping a third copy for off-site disaster recovery.</li>
<li>In some cases, you might have reasons beyond disaster recovery to distribute a database around the world.</li>
<li>So why not allow queries to be run against all the copies?</li>
<li>And by the way, splitting the workload up a bit by kind (e.g., long-running vs. short queries) might let you optimize the implementation of each copy of the database. (This last point becomes even more important with the rise of solid-state memory.)</li>
</ul>
<p style="margin-bottom: 0in;">Analytic DBMS vendors pretty much all need to offer this. (Possible exception: If they have a data-mart-only positioning so extreme that customers will never care about any form of failover.) That said, I must confess to not having done a good job of tracking who does or doesn&#8217;t have which features in this area to date; informative comments on this post in that regard would be much appreciated!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
