<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Software as a Service (SaaS)</title>
	<atom:link href="http://www.dbms2.com/category/software-as-a-service-database-saas/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 02 Sep 2010 09:06:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Workday comments on its database architecture</title>
		<link>http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/</link>
		<comments>http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/#comments</comments>
		<pubDate>Sun, 22 Aug 2010 10:20:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Workday]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2874</guid>
		<description><![CDATA[In my discussion of Workday&#8217;s technology, I gave an estimate that Workday&#8217;s database, if relationally designed, would require “1000s” of tables. That estimate came from Workday, Inc. CTO Stan Swete, in a thoughtful email that made several points about Workday&#8217;s database strategy. Workday kindly gave me permission to quote it below.


I would say thousands. The [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in; page-break-before: always;"><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">In my discussion of </span></span></span><span style="font-size: small;"><span style="font-weight: normal;"><a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" >Workday&#8217;s technology</a>,</span></span><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;"> I gave an estimate that Workday&#8217;s database, if relationally designed, would require “1000s” of tables. That estimate came from Workday, Inc. CTO Stan Swete, in a thoughtful email that made several points about Workday&#8217;s database strategy. Workday kindly gave me permission to quote it below.</span></span></span><br />
<span id="more-2874"></span></p>
<blockquote>
<p style="margin-bottom: 0in; font-style: normal; font-weight: normal;"><span style="font-size: small;">I would say thousands. The object model for our applications consists of over 2000 classes. On average these classes have multiple relationships with other classes so that would have some kind of multiplicative effect when it came to using tables.</span></p>
<p style="margin-bottom: 0in; font-style: normal; font-weight: normal;"><span style="font-size: small;">One example of where you’d be proliferating tables (and not getting as satisfactory of a solution relationally) is worktags. Currently we have a class for worktags. Instances of this class can point to various instances of detail lines (expense lines, po lines, invoice lines, etc…). A detail line can have many worktags pointing to it. To model this relationally you’d need either a separate table for each type of detail line in the system to store the tags associated with it or a single worktag for detailed line table that could be foreign keyed for all types of detail lines that would store their worktag. Either way involves more tables and more clunkiness.</span></p>
<p style="margin-bottom: 0in; font-style: normal; font-weight: normal;"><span style="font-size: small;">Another example of where our oo designs wouldn’t directly translate is our ability to describe to shared part of a detail line in one class and have all instances of detail lines inherit the fields that are shared. To do this relationally you’d probably replicate the shared fields in each table representing the various kinds of transactional details (again lines, po lines, invoice lines, etc…). You’d lose the ability to maintain and change the shared fields (and the processing logic for those fields) in one place.</span></p>
<p style="margin-bottom: 0in; font-style: normal; font-weight: normal;"><span style="font-size: small;">Anyway, I’d go with “thousands” as our answer. I do think this is an interesting question and wish we had more time to figure out a more accurate answer.</span></p>
</blockquote>
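<p>To make the worktag and shared-field points concrete, here is a minimal Python sketch of that object-oriented shape. It is only an illustration under stated assumptions: the class names (DetailLine, ExpenseLine, InvoiceLine, Worktag) and every field are hypothetical, and none of this is Workday's actual code or class model.</p>

```python
from dataclasses import dataclass, field


# eq=False keeps identity-based comparison, so the mutual references
# between lines and worktags never trigger a recursive equality check.
@dataclass(eq=False)
class Worktag:
    # Hypothetical worktag class: one instance can point at any kind of line.
    name: str
    lines: list = field(default_factory=list)


@dataclass(eq=False)
class DetailLine:
    # Shared fields are declared once and inherited by every kind of line.
    amount: float
    memo: str = ""
    worktags: list = field(default_factory=list)

    def tag(self, worktag):
        # Record the relationship in both directions.
        self.worktags.append(worktag)
        worktag.lines.append(self)


@dataclass(eq=False)
class ExpenseLine(DetailLine):
    merchant: str = ""


@dataclass(eq=False)
class InvoiceLine(DetailLine):
    invoice_number: str = ""


cost_center = Worktag("Cost Center 42")
expense = ExpenseLine(amount=120.0, merchant="Airline")
invoice = InvoiceLine(amount=900.0, invoice_number="INV-7")
expense.tag(cost_center)
invoice.tag(cost_center)
```

<p>One worktag class serves every kind of detail line, and a change to the shared fields happens in one place. A relational version would need either a tag table per line type or one tag table keyed against all of them.</p>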
<p><em><strong>This post is part of a three-post series</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2010/08/22/workday-inc-company-overview/" >Workday Inc. company overview</a> (brief)</li>
<li><a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" >Workday Inc. technology overview</a> (detailed)</li>
<li>Workday Inc. CTO Stan Swete&#8217;s <a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >comments on database strategy</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Workday architecture &#8212; a new kind of OLTP software stack</title>
		<link>http://www.dbms2.com/2010/08/22/workday-technology-stack/</link>
		<comments>http://www.dbms2.com/2010/08/22/workday-technology-stack/#comments</comments>
		<pubDate>Sun, 22 Aug 2010 10:20:08 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Workday]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2865</guid>
		<description><![CDATA[One of my coolest company visits in some time was to  SaaS  (Software as a Service) vendor Workday, Inc., earlier this month. Reasons included:

Workday has 	forward-thinking ideas about SaaS enterprise 	applications and the integration of business intelligence into same.
Workday has highly 	innovative ideas in how it manages data.
Companies founded by 	Dave Duffield tend [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><span style="font-size: small;">One of my coolest company visits in some time was to </span><span style="font-size: small;"> SaaS  (Software as a Service) vendor</span><span style="font-size: small;"> Workday, Inc., earlier this month. Reasons included:</span></p>
<ul>
<li><span style="font-size: small;">Workday has 	forward-thinking ideas about SaaS enterprise 	applications and the integration of business intelligence into same.</span></li>
<li><span style="font-size: small;">Workday has highly 	innovative ideas in how it manages data.</span></li>
<li><span style="font-size: small;">Companies founded by 	Dave Duffield tend to feature smart, likeable people who talk to one</span><span style="font-size: small;"><span style="font-style: normal;"> pleasantly and forthrightly. Workday is no exception; CTO Stan Swete 	and the other Workday folks present were a delight to talk with.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">I&#8217;d 	invited Merv Adrian to come along with me. He asked great questions, 	and I could gather myself a bit despite how sleep-deprived I was for 	the first part of that trip.</span></span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Workday kindly allowed me to post this </span></span><span style="font-size: small;"><a href="http://www.monash.com/uploads/Workday-August-2010.ppt" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">Workday slide deck</a>.</span><span style="font-size: small;"><span style="font-style: normal;"> Otherwise, I&#8217;ve split out a quick </span></span><a href="http://www.dbms2.com/2010/08/22/workday-inc-company-overview/" ><span style="font-size: small;">Workday, Inc. company overview</span></a><span style="font-size: small;"><span style="font-style: normal;"> into a separate post.</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">The biggie for me was the data and object management part. Specifically:  <span id="more-2865"></span><br />
</span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><strong>Workday&#8217;s 	applications run entirely in-memory,</strong></span></span><span style="font-size: small;"><span style="font-style: normal;"> in a highly object-oriented structure. Persistence is mainly for the 	sake of data safety …</span></span></li>
<li>… <span style="font-size: small;"><span style="font-style: normal;">but not entirely. In earlier releases, Workday kept absolutely everything in RAM. Now, however, certain things are kept only on disk, such as:</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Audit 	files.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Certain 	documents (notably resumes).</span></span></li>
</ul>
</li>
<li><strong><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	whole database</span></span></strong><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;"> – data and metadata alike – is persisted to disk in </span></span></span><strong><span style="font-size: small;"><span style="font-style: normal;">&lt;10 	MySQL/InnoDB tables. </span></span></strong><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">MySQL 	is basically just being used as a </span></span></span><strong><span style="font-size: small;"><span style="font-style: normal;">key-value 	store, </span></span></strong><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">albeit 	one with </span></span></span><strong><span style="font-size: small;"><span style="font-style: normal;">ACID 	transactional support. </span></span></strong>
<ul>
<li><span style="font-size: small;">There <span style="font-weight: normal;">are </span><strong>3 main tables: attributes, relationships, instances.</strong></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">When 	I suggested this might be like an entity-attribute-value model, 	Workday said it would be even better to think in terms of</span><span style="font-style: normal;"><strong> instanceID-attribute-value.</strong></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">As 	you might expect for a database that simple, its schema doesn&#8217;t 	change much.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">By way of comparison, Workday estimates that if its software were written relationally, there would be </span></span></span><span style="font-size: small;"><span style="font-weight: normal;"><a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >1000s of tables</a>,</span></span><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;"> which</span></span></span><span style="font-size: small;"><span style="font-style: normal;"> would take up 10-100X as much disk space. </span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">All 	write transactions are banged immediately into the MySQL database. 	I.e., RAM and disk are never allowed to get out of sync.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	database is append-only. This is exploited for effective dating 	(pretty heavily, it seems, perhaps because that&#8217;s a useful concept 	in human resources) and snapshotted reporting.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	built-in BI doesn&#8217;t have a lot of choice but to do scans, traversing 	the object model. This turns out to be fast enough.</span></span></li>
</ul>
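<p>To make the instanceID-attribute-value idea and the append-only effective dating concrete, here is a minimal Python sketch. Everything specific in it is my assumption rather than anything Workday described: the single attributes table, its column names, the helper functions, and the use of SQLite as a stand-in for MySQL/InnoDB.</p>

```python
import sqlite3


def open_store():
    # SQLite in memory stands in for MySQL/InnoDB; the schema is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attributes ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " instance_id TEXT NOT NULL,"
        " attribute TEXT NOT NULL,"
        " value TEXT,"
        " effective_date TEXT NOT NULL)"
    )
    return conn


def write(conn, instance_id, attribute, value, effective_date):
    # Append-only: every change is a new row; nothing is updated or deleted.
    # The with-block commits at once, so memory and disk stay in sync.
    with conn:
        conn.execute(
            "INSERT INTO attributes"
            " (instance_id, attribute, value, effective_date)"
            " VALUES (?, ?, ?, ?)",
            (instance_id, attribute, value, effective_date),
        )


def read_as_of(conn, instance_id, attribute, as_of):
    # Effective dating: return the latest value on or before the as_of date.
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    row = conn.execute(
        "SELECT value FROM attributes"
        " WHERE instance_id = ? AND attribute = ?"
        " AND effective_date BETWEEN '0000-01-01' AND ?"
        " ORDER BY effective_date DESC, seq DESC LIMIT 1",
        (instance_id, attribute, as_of),
    ).fetchone()
    return row[0] if row else None


conn = open_store()
write(conn, "worker-1", "title", "Analyst", "2010-01-01")
write(conn, "worker-1", "title", "Manager", "2010-07-01")
march_title = read_as_of(conn, "worker-1", "title", "2010-03-15")
august_title = read_as_of(conn, "worker-1", "title", "2010-08-22")
```

<p>Because old rows are never overwritten, asking for the value as of an earlier date is just a filtered read, which is the property effective dating and snapshotted reporting rely on.</p>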
<p style="margin-bottom: 0in;"><span style="font-size: small;">Other notes on Workday&#8217;s data and object management strategy include:</span></p>
<ul>
<li><span style="font-size: small;">Workday is 	object-oriented through and through – no object-relational mapping 	&#8211; <a href="http://en.wikipedia.org/wiki/Turtles_all_the_way_down" onclick="javascript:pageTracker._trackPageview('/en.wikipedia.org');">turtles 	all the way down</a>. On average, a class has about 2 attributes.</span></li>
<li><span style="font-size: small;">94% of requests are 	reads, traversing the object hierarchy.</span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	databases are pretty small.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">The 	biggest database Workday supports uses 17 gigabytes of RAM. </span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	databases are much smaller on disk than in RAM.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;">Workday&#8217;s “dream” 	is to move from disk to solid-state memory. </span></li>
<li><span style="font-size: small;">Workday uses GPLed 	MySQL/InnoDB. So there&#8217;s no software license reason to ever move 	away (e.g., to a pure key-value store).</span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Disaster recovery is based on local and remote MySQL slaves. </span></span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Obviously, serious apps have been built before in object-oriented and/or key-value ways, with the resulting objects then being banged to disk (or in some cases kept in memory). Examples include:</span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Numerous 	applications are built on <a href="../2010/01/15/intersystems-cache-highlights/">object-oriented 	DBMS</a>. Generally they go against disk, although <a href="../2005/11/14/defining-and-surveying-memory-centric-data-management/">memory-centric 	implementations can save a lot of pointer-chasing</a>. Often they&#8217;re 	queried via SQL.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Basho&#8217;s 	website says that its key-value store Riak was originally conceived 	in connection with a planned salesforce automation product, but I 	don&#8217;t think that the application part of that plan ever got built. </span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">SAP 	has <a href="../2005/12/09/36/">longstanding</a> doubts about relational dogma, although not nearly to Workday&#8217;s 	extreme.</span></span></li>
<li><span style="font-size: small;">Obviously, 	some major internet applications just bang data into key-value 	stores.</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Still, perhaps because it is wholly object-oriented yet doesn&#8217;t even bother with anything like a real object-oriented DBMS, Workday&#8217;s approach seems particularly cool. </span></span></p>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Other highlights of Workday, Inc.&#8217;s technical story include:</span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	has settled into a schedule of three releases per year, and has 	pretty much lived up to that for &gt;2 years.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Every 	user is always on the latest Workday release.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">You 	can delay turning on significant new Workday software functionality 	if you want to.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Pure 	UI changes to the Workday software are handled much as they are on 	various websites today. Sometimes you have no choice but to live 	with them; sometimes the prior version of the UI remains available 	to you for a while.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	navigational approaches look pretty cool.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">The 	core concept is a list of actions you can perform now, rather than 	more standard menus.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Roles/permissions 	are of course central to this.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Reports 	have lots of actionable links in them. (More than just drilldown, 	although specific examples have slipped my memory.)</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Alternatively, you can navigate via a search box, searching on names of either objects (e.g. users, divisions) or tasks. This is somewhat reminiscent of <a href="http://www.texttechnologies.com/2007/02/28/sap%E2%80%99s-%E2%80%9Csearch%E2%80%9D-strategy-isn%E2%80%99t-about-search/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">an approach SAP was considering a few years ago</a>.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;">Workday says it has 	four key design premises:</span>
<ul>
<li><span style="font-size: small;"><em>Web-Familiar Experience.</em> I&#8217;d say that&#8217;s true to the extent it makes sense. In many ways, the web needs to catch up to Workday.</span></li>
<li><span style="font-size: small;"><em>Enterprise 	Reporting.</em> The idea is that you get a report, then take actions 	based on it. Hence the report-centric options for navigation.</span></li>
<li><span style="font-size: small;"><em>Integration 	On-Demand.</em> That&#8217;s a fancy way of saying “Plays nicely with 	others.”</span></li>
<li><span style="font-size: small;"><em>Configurable 	Business Processes.</em><span style="font-style: normal;"> Duh. That&#8217;s 	pretty essential if you want to do serious SaaS applications.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday maintains a strong separation between application logic and UI development. Developers do no screen layouts. Instead, UIs are automatically generated for:</span></span>
<ul>
<li><span style="font-size: small;">Flash/FLEX</span></li>
<li><span style="font-size: small;">iPhone</span></li>
<li><span style="font-size: small;">Mobile HTML</span></li>
<li><span style="font-size: small;">PDF export</span></li>
<li><span style="font-size: small;">Excel export</span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	only talks to the outside world via web services.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	is heavily </span></span><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">into 	SOAP (Simple Object Access Protocol). </span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">The 	acquisition of OEM partner CapeClear gave Workday an Integration 	Service (i.e., enterprise service bus) that translates SOAP into 	whatever else might be needed for integration, and also does 	reliable delivery. </span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">All 	that said, Stan Swete sees integration among various SaaS offerings 	as an area needing significant future attention.</span></span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday&#8217;s 	business intelligence ideas are interesting, but I think there&#8217;s a 	long way for that technology still to go.</span></span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday&#8217;s 	BI seems to be focused on report/drilldown kinds of functionality.</span></span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">You 	can slice by up to 2 dimensions at once.</span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Then 	you can keep slicing, however, by more dimensions, as many times as 	you like.</span></span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">While 	you can take actions straight from reports, some of the specific 	BI/app integration ideas we discussed are still futures. (E.g., 	analyzing spend at the time of expense report data entry or 	approval.)</span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Of 	course, Workday&#8217;s web services interface lets you export Workday 	data into 3rd-party tools. Indeed, if you want to integrate data 	from Workday and some other source(s), that&#8217;s your only choice.</span></span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday offers a clever metaphor to illustrate that your data may be more secure offsite than on premises – the bank vault. (I have no idea whether that&#8217;s a SaaS industry standard, but I hadn&#8217;t heard it before.) Of course, that metaphor does raise some issues specific to the remote data case, such as:</span></span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">When 	your data is on premises, you know whether the government has 	insisted on looking at it.</span></span></span></li>
<li><span style="font-size: small;">More than cash, data keeps traveling back and forth to 	the remote location, which creates at least a theoretical risk of 	interception.</span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday says the toughest part of globalization is the issue of which personal data is or is not maintained. For example, in the US you&#8217;re not allowed to ask a job applicant&#8217;s religion, but in the UK you&#8217;re not only permitted but indeed required to.</span></span></span></li>
</ul>
<p><em><strong>This post is part of a three-post series</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2010/08/22/workday-inc-company-overview/" >Workday Inc. company overview</a> (brief)</li>
<li><a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" >Workday Inc. technology overview</a> (detailed)</li>
<li>Workday Inc. CTO Stan Swete&#8217;s <a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >comments on database strategy</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/22/workday-technology-stack/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Workday, Inc. company overview</title>
		<link>http://www.dbms2.com/2010/08/22/workday-inc-company-overview/</link>
		<comments>http://www.dbms2.com/2010/08/22/workday-inc-company-overview/#comments</comments>
		<pubDate>Sun, 22 Aug 2010 10:20:02 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Workday]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2878</guid>
		<description><![CDATA[My main post on Workday&#8217;s technology got really long, so I decided to split out a company backgrounder separately. Here goes.
Workday, Inc. was founded by Dave Duffield and Aneel Bhusri, who&#8217;d previously worked together at PeopleSoft. It is generally the case that the companies Dave starts:  


Develop 	application software for large or fairly large enterprise [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">My main post on </span></span><a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" ><span style="font-size: small;">Workday&#8217;s technolog</span></a><span style="font-size: small;"><span style="font-style: normal;"><a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" >y</a> got really long, so I decided to split out a company backgrounder separately. Here goes.</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Workday, Inc. was founded by Dave Duffield and Aneel Bhusri, who&#8217;d previously worked together at PeopleSoft. It is generally the case that the companies Dave starts:  <span id="more-2878"></span><br />
</span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Develop 	application software for large or fairly large enterprise customers.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Build those applications in/on their own platform technology, which is meant to be cutting-edge in its day. (For example, PeopleSoft was early in building an RDBMS-based client/server application suite, and did so with the help of a clever technology called PeopleTools, which <a href="../../../../../2008/04/13/scaledb-presents-the-revenge-of-the-pointer/">I nonetheless helped talk PeopleSoft out of further commercializing</a>.)</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Focus 	first on human resources software (Dave had another HR company 	before PeopleSoft).</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Move 	fairly early into non-profit/higher-education accounting (Dave had a 	company in that area before PeopleSoft, and PeopleSoft was fairly 	active in the area too).</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Emphasize a pleasant corporate culture.</span></span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">All these things seem true of Workday Inc., although the non-profit/higher-ed move is just underway now. Specifically: </span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	was founded in 2005, starting with an asset buy of some platform 	software a key PeopleTools developer had been working on for years.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	has had multitenant SaaS offerings from the getgo. (And that&#8217;s all Workday does.)<br />
</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	has around 150 customers.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	sells mainly to multinational corporations, generally based in North 	America. Efforts in the UK are beginning to ramp up.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	has six core application modules, among which are:</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	Human Capital Management (almost all the customers).</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	Payroll (a little under 50 customers). Workday is partnered with 	local providers for payroll in 20 countries, and is building its 	second inhouse version (Canadian) now.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	Financial Management (a little under 20 customers, for what is far 	from a complete system).</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	Benefits Network, providing connectivity to benefits providers 	(that&#8217;s the only Workday module that isn&#8217;t straight software).</span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday, 	Inc. has around 500 employees, mainly in Pleasanton, CA. About 20 	are in Dublin, Ireland, courtesy of the acquisition of CapeClear. 	About 1/3 are in development.</span></span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Workday prices its services based on metrics for the overall client business, not per-Workday-user. (Actually, the metric is basically headcount, which makes sense given Workday&#8217;s application focus.)</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Some of these points are covered in more detail in a <a href="http://www.monash.com/uploads/Workday-August-2010.ppt" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">Workday Inc. slide deck</a>.<br />
</span></span></p>
<p><em><strong>This post is part of a three-post series</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2010/08/22/workday-inc-company-overview/" >Workday Inc. company overview</a> (brief)</li>
<li><a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" >Workday Inc. technology overview</a> (detailed)</li>
<li>Workday Inc. CTO Stan Swete&#8217;s <a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >comments on database strategy</a></li>
</ul>
<p><em>Edit: Also, there&#8217;s a <a href="http://blogs.workday.com/Blog.html" onclick="javascript:pageTracker._trackPageview('/blogs.workday.com');">Workday blog</a> with only a few posts, which nonetheless seems to flesh out a few of the ideas in this post series.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/22/workday-inc-company-overview/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Akiban highlights</title>
		<link>http://www.dbms2.com/2010/04/03/akiban-highlights/</link>
		<comments>http://www.dbms2.com/2010/04/03/akiban-highlights/#comments</comments>
		<pubDate>Sat, 03 Apr 2010 05:36:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1809</guid>
		<description><![CDATA[Akiban responded quickly to my complaints about its communication style, and I chatted for a couple of hours with senior Akiban techies Ori Herrnstadt, Peter Beaman and Jack Orenstein. It&#8217;s still early days for Akiban product development, so some details haven&#8217;t been determined yet, and others I just haven&#8217;t yet pinned down. Still, I know [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><strong><span style="font-weight: normal;">Akiban responded quickly to my </span></strong><a href="http://www.dbms2.com/2010/03/22/akibanakiba/" ><strong><span style="font-weight: normal;">complaints</span></strong></a><strong><span style="font-weight: normal;"> about its communication style, and I chatted for a couple of hours with senior Akiban techies Ori Herrnstadt, Peter Beaman and Jack Orenstein. It&#8217;s still early days for Akiban product development, so some details haven&#8217;t been determined yet, and others I just haven&#8217;t yet pinned down. Still, I know a lot more than I did a day ago. Highlights of my talk with Akiban included:<span id="more-1809"></span></span></strong></p>
<ul>
<li><strong><span style="font-weight: normal;">Akiban 	is basically in the business of making OLTP (OnLine Transaction 	Processing) DBMS.</span></strong></li>
<li><strong><span style="font-weight: normal;">That 	said, Akiban does not necessarily aspire to offer the DBMS that has 	the best update efficiency or throughput. In particular, the Akiban 	DBMS stores every datum twice, even before replication (and indexes) 	are taken into account.</span></strong></li>
<li><strong><span style="font-weight: normal;">Akiban 	wants to store everything as third normal form relational databases. 	I didn&#8217;t ask whether 3NF is a hard requirement, a Really Good Idea 	if you want Akiban to run fast (that&#8217;s the one I&#8217;d guess), or merely 	a general design assumption.</span></strong></li>
<li><strong><span style="font-weight: normal;">Akiban 	characterizes its core differentiators/value proposition as being:</span></strong>
<ul>
<li><strong><span style="font-weight: normal;">Scale-out</span></strong></li>
<li><strong><span style="font-weight: normal;">No 	need to pay the traditional cost of joins</span></strong></li>
</ul>
</li>
<li><strong><span style="font-weight: normal;">Thus, 	Akiban is telling something like a </span></strong><a href="http://www.dbms2.com/2010/03/14/nosql-taxonomy/" ><strong><span style="font-weight: normal;">NoSQL</span></strong></a><strong><span style="font-weight: normal;"> story.</span></strong></li>
<li><strong><span style="font-weight: normal;">However, 	Akiban offers SQL.</span></strong></li>
<li><strong><span style="font-weight: normal;">Specifically, 	Akiban offers SQL through a MySQL front end. However, the choice of 	front-end could change (Drizzle?), and non-relational front-ends 	(object?)* could eventually also be offered. </span></strong></li>
<li><strong><span style="font-weight: normal;">Akiban&#8217;s 	first target market is SaaS providers, specifically ones that have 	true multitenancy issues. More generally, Akiban is pursuing 	cloud/private cloud applications with lots of tables. (Ori talks of 	a few thousand tables as being a small number.) At least at first, 	Akiban is conceding the market for huge-volume, scale-out, 	no-expensive-join web databases to the NoSQL contenders.</span></strong></li>
<li><strong><span style="font-weight: normal;">Akiban 	has been in prototyping/development of some kind for several years. 	However, Akiban got its first angel funding early last year and its 	first venture funding late in 2009, so development only ramped up 	recently.</span></strong></li>
<li><strong><span style="font-weight: normal;">Ori 	tells a version of the rather common “Everything I need to know in 	life I learned in the Israeli Army, and now I&#8217;m commercializing it” 	story. However, I didn&#8217;t get the sense that Akiban is necessarily a 	direct extension of a specific Israeli military project.</span></strong></li>
</ul>
<p style="margin-bottom: 0in;"><strong><em><span style="font-weight: normal;">* A lot of Boston-area DBMS developers have significant non-relational experience. E.g., Jack Orenstein was an Object Design founder, and Peter Beaman used to work for InterSystems, both object-oriented DBMS vendors.</span></em></strong></p>
<p style="margin-bottom: 0in;"><strong><span style="font-weight: normal;">Akiban technical highlights include:</span></strong></p>
<ul>
<li><strong><span style="font-weight: normal;">Somewhat 	confusingly, Akiban databases are divided into “groups” of 	tables. The point of Akiban groups is:</span></strong>
<ul>
<li><strong><span style="font-weight: normal;">Many-to-many 	relationships exist only within Akiban groups, not among tables in 	different groups.</span></strong></li>
<li><strong><span style="font-weight: normal;">Tables 	within Akiban groups are kind of pre-joined; more precisely, data is 	organized physically in a way that anticipates joins.</span></strong></li>
<li><strong><span style="font-weight: normal;">Thus, 	most Akiban joins can be executed without the cost of traditional 	join algorithms.</span></strong></li>
</ul>
</li>
<li><strong><span style="font-weight: normal;">One 	copy of the data Akiban stores is, in effect, clustered by object. 	E.g., a customer and her orders are stored together, or a patient 	and the records of her doctor visits. That&#8217;s how Akiban anticipates 	most joins.</span></strong></li>
<li><strong><span style="font-weight: normal;">The 	other copy of Akiban data is stored in columns (I&#8217;m not sure if this 	part is strictly columnar or more hybrid row/column), which are 	ordered consistently. In particular, they&#8217;re in an order dictated by 	the organization of the other copy of the data, whatever that means. 	Akiban&#8217;s goal is for this copy of the data to support reporting, 	operational BI, etc.</span></strong></li>
<li><strong><span style="font-weight: normal;">Akiban 	relies heavily on its optimizer to determine data layout, probably more than conventional DBMS do.</span></strong></li>
<li><strong><span style="font-weight: normal;">In 	essence, Akiban has a MySQL front-end and a storage engine back end, 	each running on its own hardware cluster, with each node of one 	cluster talking to each node of the other. </span></strong></li>
<li><strong><span style="font-weight: normal;">I 	gather that Akiban distributes data among nodes clustered according 	to, in effect, object identifier. Presumably, inter-node joins are 	rare. But we didn&#8217;t discuss distribution, replication, or other 	scale-out issues in any detail. Indeed, I gathered that significant 	parts of all that weren&#8217;t built yet, and perhaps even not yet 	architected.<br />
</span></strong></li>
</ul>
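<p style="margin-bottom: 0in;">To make the &#8220;clustered by object&#8221; idea concrete, here is a minimal sketch in Python. It is an illustration only, not Akiban&#8217;s actual design: the key layout, the <code>Group</code> class and the customer/order tables are all hypothetical. It shows how storing a parent row and its child rows next to each other in key order turns a join into a single contiguous range scan.</p>
<pre>
```python
import bisect

# Hypothetical table tags; a parent row sorts just before its child rows.
CUSTOMER, ORDER = 0, 1


class Group:
    """Toy 'group': customers and their orders interleaved in one sorted store.

    Assumed key layout (for illustration only):
        customer row: (customer_id, CUSTOMER, 0)
        order row:    (customer_id, ORDER, order_id)
    """

    def __init__(self, customers, orders):
        rows = [((c_id, CUSTOMER, 0), {"name": name}) for c_id, name in customers]
        rows += [((c_id, ORDER, o_id), {"total": total}) for o_id, c_id, total in orders]
        # Sort on the key only, so each customer is followed by its own orders.
        rows.sort(key=lambda row: row[0])
        self.rows = rows
        self.keys = [key for key, _ in rows]

    def customer_with_orders(self, cust_id):
        """Return the customer row plus its order rows via one range scan."""
        start = bisect.bisect_left(self.keys, (cust_id, CUSTOMER, 0))
        end = bisect.bisect_left(self.keys, (cust_id + 1, CUSTOMER, 0))
        return self.rows[start:end]


customers = [(1, "Ann"), (2, "Bea")]
orders = [(10, 2, 5.0), (11, 1, 7.5), (12, 1, 2.5)]  # (order_id, customer_id, total)
group = Group(customers, orders)
ann_and_orders = group.customer_with_orders(1)
```
</pre>
<p style="margin-bottom: 0in;">In this toy layout no hash or merge join is needed for the parent/child case; two binary searches find the slice that already holds the joined rows. Joins across different groups would get no such help.</p>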
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/04/03/akiban-highlights/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Three kinds of software innovation, and whether patents could possibly work for them</title>
		<link>http://www.dbms2.com/2010/03/23/software-innovation-patent/</link>
		<comments>http://www.dbms2.com/2010/03/23/software-innovation-patent/#comments</comments>
		<pubDate>Tue, 23 Mar 2010 08:18:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1763</guid>
		<description><![CDATA[In connection with an attempt to articulate my views on software patents (more on those below), I was thinking about the different ways in which software development can be innovative. And it turns out that most forms of software innovation can, at their core, be assigned to one or more of three overlapping categories:

Direct improvement [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">In connection with an attempt to articulate my views on software patents (more on those below), I was thinking about the different ways in which software development can be innovative. And it turns out that most forms of software innovation can, at their core, be assigned to one or more of three overlapping categories:<span id="more-1763"></span></p>
<ul>
<li><strong>Direct improvement in user 	interface or functionality.</strong> Examples (again overlapping) 	include:
<ul>
<li>True UI enhancements.</li>
<li>Application functionality that 	just lets you do more.</li>
<li>Most modern <strong>mobile, web, and/or 	social software</strong> efforts, in which a relatively small amount of 	coding effort produces features that may or may not lead to rapid 	viral adoption.</li>
<li>Ease or functionality not just for 	end users, but also for <strong>administrators.</strong> In particular, SaaS, <a href="http://www.dbms2.com/category/software-as-a-service-database-saas/cloud-computing/" >cloud</a>, <a href="http://www.dbms2.com/2009/06/08/the-future-of-data-marts/" >private cloud</a> and/or <a href="http://www.dbms2.com/category/database-management-system/data-warehouse-appliances/" >appliance</a> benefits are 	commonly concentrated in this area.</li>
<li>Languages and other <strong>programmer</strong> aids too.</li>
</ul>
</li>
<li><strong>Performance/efficiency 	improvement.</strong> Overlapping examples include:
<ul>
<li>Anything that directly purports to 	improve response time, hardware cost or utilization, or power/floor 	space consumption.</li>
<li>Anything to do with 	<a href="http://www.dbms2.com/category/parallelization/" >parallelization</a> or scale-out.</li>
<li>Many, many under-the-covers enhancements to make data more protected (against theft or loss alike), user features snazzier, and so on. With a few exceptions – which are generally regarded as unsolved artificial intelligence problems – almost anything can be hacked together quickly in some high-level programming tool, assuming performance is of no concern. It&#8217;s getting the performance remotely right that can often slow market introduction.*</li>
</ul>
</li>
<li><strong>New or enhanced logical data 	model.</strong> Examples of innovation via data model – either truly 	new or else just newly implemented in a performant way &#8212; include:
<ul>
<li><strong>A huge fraction of application 	innovation,</strong> in “traditional” functionality and workflow 	alike. In several technological eras, just about everything about 	applications has been a commodity <strong>except</strong> the data model, but 	the data model alone was enough to provide long-lasting product 	differentiation. Indeed, it probably is true today, although that 	may finally change as business intelligence integration becomes a 	large part of application software technology.</li>
<li>Most things that are called 	<strong>knowledge representation.</strong></li>
<li>Many things that are described by 	terms like <a href="http://www.dbms2.com/2010/01/17/three-broad-categories-of-data/" >“unstructured” or “semi-structured”</a> data.</li>
<li>Most innovations described by 	terms such as <strong>metadata management.</strong></li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;">To check that I&#8217;m not being too glib here, let&#8217;s consider a few categories of software technology.</p>
<ul>
<li><strong>MPP analytic DBMS</strong> are all 	about performance/efficiency improvement (whether of SQL queries or 	<a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/" >other analytics</a>), except when they&#8217;re about ease of 	administration and the like.</li>
<li><a href="http://www.dbms2.com/2009/10/10/enterprises-using-hadoo/" >Hadoop</a> is about scaling out 	cheap machines in a way that is (for some purposes) easy to program.</li>
<li>The core of <a href="http://www.dbms2.com/2010/03/14/nosql-taxonomy/" >NoSQL</a> is about 	efficient scale-out; easier programming also plays a big role.</li>
<li>Disruptive small vendor <strong>business 	intelligence</strong> innovation has a lot to do with <a href="http://www.dbms2.com/2009/05/30/reinventing-business-intelligence/" >better and more 	useful user experiences</a>,<span style="font-style: normal;"> except 	when it&#8217;s about ease of programming and/or administration. The BI 	industry is also moving to in-memory analytics, which harnesses 	better performance to provide more interactive user experiences.</span></li>
<li><em>SAS,</em><span style="font-style: normal;"> which has long competed on the basis of superior functionality for 	statistical programmers, is now also on a big performance kick via 	MPP analytic DBMS partnerships.</span></li>
<li><span style="font-style: normal;"><strong>Oracle&#8217;s 	DBMS efforts</strong></span><span style="font-style: normal;"> have long 	been focused on </span><a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >performance</a><span style="font-style: normal;"> and </span><a href="http://www.monash.com/oracle10g.html" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">administrative usability</a>.</li>
<li><span style="font-style: normal;">As noted above, </span><span style="font-style: normal;"><strong>enterprise application</strong></span><span style="font-style: normal;"> functionality is usually all about the data model. Exceptions arise when there is a major new generation of UI functionality, such as interactivity (long ago), GUIs (ditto), or BI integration (in its early days now). SaaS is also pitched as an ease-of-everything play.</span></li>
<li><span style="font-style: normal;"><strong>Administrative 	tools</strong></span><span style="font-style: normal;"> are usually about 	making administration easier. In a few cases (e.g., backups), 	they&#8217;re more about performance.</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">I&#8217;d say my proposed trichotomy is holding up pretty well.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">So what set me off on this line of reasoning? Well, </span><a href="http://redmonk.com/sogrady/2010/03/19/software-patents/" onclick="javascript:pageTracker._trackPageview('/redmonk.com');">Stephen O&#8217;Grady</a><span style="font-style: normal;"> wrote</span></p>
<blockquote>
<p style="margin-bottom: 0in;">The reason I am against software patents is … very simple. … I am against software patents because it is not reasonable to expect that the current patent system, nor even one designed to improve or replace it, will ever be able to accurately determine what might be considered legitimately patentable from the overwhelming volume of innovations in software. Even the most trivial of software applications involves hundreds, potentially thousands of design decisions which might be considered by those aggressively seeking patents as potentially protectable inventions. If even the most basic elements of these are patentable, as they are currently, the patent system will be fundamentally unable to scale to meet that demand. As it is today.</p>
<p><span style="font-style: normal;">In addition to questions of volume are issues of expertise; for some of the proposed inventions, there may only be a handful of people in the world qualified to actually make a judgment on whether a development is sufficiently innovative so as to justify a patent. None of those people, presumably, will be employed by the patent office. &#8230; Nor will two developers always come to the same conclusions as to the degree to which a given invention is unique. </span></p></blockquote>
<p><span style="font-style: normal;">In considering whether I agreed, I realized that the analysis is different for each of my three categories of innovation mentioned above.</span></p>
<ul>
<li><span style="font-style: normal;">In the case of a </span><span style="font-style: normal;"><strong>logical 	data model,</strong></span><span style="font-style: normal;"> O&#8217;Grady is 	almost surely right. Many of those are just copied from the real 	world anyway, and hence don&#8217;t meet any kind of “novel and 	non-obvious” test. The rest are so general and abstract it&#8217;s 	really hard to say what – if anything – is new and non-obvious 	about them vs. well-established, often academic prior art.</span></li>
<li><span style="font-style: normal;">In the case of </span><span style="font-style: normal;"><strong>performance 	enhancements,</strong></span><span style="font-style: normal;"> the core 	ideas can usually also be found in well-established computer science 	publications. What&#8217;s more, the true innovations may be such simple 	algorithms that they&#8217;re not patentable. What&#8217;s left over is </span><a href="http://www.dbms2.com/2009/08/21/bottleneck-whack-a-mole/" >incremental enhancement</a>.<span style="font-style: normal;"> Once again, O&#8217;Grady is right.</span></li>
<li><span style="font-style: normal;">But the case of </span><span style="font-style: normal;"><strong>user 	interface/experience enhancements</strong></span><span style="font-style: normal;"> is not so clear. Inventor comes up with a useful idea for something 	that hasn&#8217;t been built before. Inventor builds and patents it. I&#8217;m 	not sure how that&#8217;s different from the case of building physical 	devices of various kinds, which have been patented for centuries. 	Determining what&#8217;s novel or non-obvious doesn&#8217;t seem to require 	specialized technical knowledge, at least not above and beyond that 	required in other disciplines. </span></li>
</ul>
<p><span style="font-style: normal;"><strong>Bottom line:</strong></span><span style="font-style: normal;"> There are many other reasons to oppose software patents, but Stephen O&#8217;Grady&#8217;s “It&#8217;s impossible to adjudicate them fairly” argument remains unproven, at least when it is applied to software enhancements whose essence is better designs for user experiences.</span></p>
<p><em><strong>Related links:</strong></em></p>
<ul>
<li><span style="font-style: normal;">My negative comments about 	patents in the areas of <a href="http://www.dbms2.com/2010/02/11/google-mapreduce-patent/" >MapReduce</a> and <a href="http://www.dbms2.com/2010/01/15/vertica-sybase-ipatent-litigation/" >columnar DBMS</a></span></li>
<li><a href="http://www.monashreport.com/2006/04/06/microsoft-underscores-its-core-paradigm/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');"><span style="font-style: normal;">Three standpoints from which 	to view a software product strategy</span></a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/23/software-innovation-patent/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Open issues in database and analytic technology</title>
		<link>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/</link>
		<comments>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 22:04:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1507</guid>
		<description><![CDATA[The last part of my New England Database Summit talk was on open issues in database and analytic technology. This was closely intertwined with the previous section, and also relied on a lot that I&#8217;ve posted here. So I&#8217;ll just put up a few notes on that part, with lots of linkage to prior discussion [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">The last part of my <a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >New England Database Summit</a> talk was on open issues in database and analytic technology. This was closely intertwined with the <a href="http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/" >previous section</a>, and also relied on a lot that I&#8217;ve posted here. So I&#8217;ll just put up a few notes on that part, with lots of linkage to prior discussion of the same points.<span id="more-1507"></span></p>
<ul>
<li>The most important issue in 	database and analytic technology, in my opinion, isn&#8217;t technological 	at all – rather, it&#8217;s the legal and political steps needed to <a href="http://www.dbms2.com/2010/01/31/data-based-snooping-threat-libert/" > preserve liberty</a> in the face of advancing, intrusive 	technology.</li>
<li>Another important issue for 	society – and this one does involve a lot of technology – is 	scientific number crunching. In particular, <a href="http://www.dbms2.com/2009/10/03/issues-in-scientific-data-management/" >database technology for 	scientific computing</a> needs to be developed much further. I&#8217;ll have 	more to say on all this soon.</li>
<li>More generally, technology needs 	to keep advancing for parallel analytics. Fortunately, it is. Watch 	this space over the next few weeks.</li>
<li>Oracle has said, in effect, that <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" > its most important technological challenge of the decade</a> is getting 	<a href="http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/" >solid-state memory</a> right. I agree.</li>
<li>Data volumes will keep going up, 	up, up. Technology needs to keep evolving accordingly. Much of what 	I write is on that subject.</li>
<li>Data needs to be processed and analyzed at <a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >very 	different latencies</a>. And there&#8217;s much further to go in integrating 	disparate latencies.</li>
<li>Analytic database management in 	the cloud hasn&#8217;t been solved yet, especially for Big Data. Among the 	reasons are the difficulty of moving data into the cloud (unless it 	originated there), the slowness of moving it from node to node in 	shared-nothing architectures (which reduces the elasticity benefit), 	and above all the long and unpredictable latencies of interprocessor 	communication while queries are running (a key subject of discussion 	at the <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >Boston Big Data Summit</a>).</li>
<li>Better business intelligence user interfaces are increasingly available. I&#8217;m thinking particularly of approaches with buzzwords like <a href="http://www.dbms2.com/2008/08/04/qliktech-qlikview-update/" >visualization/interactive exploration</a> or <a href="http://www.texttechnologies.com/2007/08/03/the-case-for-inxight-awareness-server/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">faceted</a>. But they aren&#8217;t well-integrated into the overall analytic stack, as big BI vendors are trailing the smaller ones in this regard. (Part of the problem relates to my previous point.)</li>
<li>Application development over text 	search isn&#8217;t in the same league as application development over 	relational DBMS. The choices are mainly XML (e.g., <a href="http://www.texttechnologies.com/2008/04/29/mark-logic-viewed-as-a-different-kind-of-text-search-technology-vendor/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">MarkLogic</a>), SQL 	for text integrated into RDBMS (limited by the weakness of those 	integrations), and something like <a href="http://www.texttechnologies.com/2008/09/20/attivio-update/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">Attivio&#8217;s Java SDK</a>. There&#8217;s a 	major conceptual barrier in building those apps, namely the 	unpredictability of query results. Still, it should be possible to 	do better.</li>
<li>Similarly, text analytics and conventional analytics coexist well. They can even be in the same database and/or dashboard, although in practice that is limited by the strong <a href="http://www.texttechnologies.com/2008/10/24/attensity-update-2/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">SaaS focus of text mining vendors and users</a>. But analytic integration of them is really hard. Linguistic imprecision is, in my opinion, only the #2 reason for this difficulty. The #1 reason is that trends detected by text analytics are much less precise than trends on tabular data – e.g., a 50% increase in a certain kind of complaint may be no more significant than a 5% change in a revenue variable.</li>
<li>I&#8217;m increasingly persuaded that <a href="http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/" > graph analytics</a> can be handled without a graph-centric data model. 	But right now, it isn&#8217;t being handled well at all. Lots more needs 	to be done – although when it is, it will just exacerbate the 	privacy/liberty dangers that so concern me.</li>
</ul>
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Flash, other solid-state memory, and disk" href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, other solid-state memory, and disk</a></li>
<li><a title="Interesting trends in database and analytic technology" href="../2010/01/31/trends-database-aanalytic-technology/">Interesting trends in database and analytic technology</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>More miscellany</title>
		<link>http://www.dbms2.com/2009/12/30/more-miscellany/</link>
		<comments>http://www.dbms2.com/2009/12/30/more-miscellany/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 11:38:22 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Continuent]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Rainstor]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1370</guid>
		<description><![CDATA[Adding to yesterday&#8217;s varied quick comments:
Robert Hodges of Continuent offers a great outline of Continuent&#8217;s clustering story, with a lot of &#8220;Now we got right what we previously didn&#8217;t know/admit we got wrong.&#8221; Continuent now claims to have a strong clustering offering, both paid and free/open-source, for both MySQL and PostgreSQL, with Oracle support perhaps [...]]]></description>
			<content:encoded><![CDATA[<p>Adding to <a href="http://www.dbms2.com/2009/12/29/this-and-that/" >yesterday&#8217;s varied quick comments</a>:<span id="more-1370"></span></p>
<p><a href="http://www.dbms2.com/2009/09/03/continuent-on-clustering/" >Robert Hodges</a> of <strong>Continuent</strong> offers <a href="http://scale-out-blog.blogspot.com/2009/12/proving-masterslave-clusters-work-and.html" onclick="javascript:pageTracker._trackPageview('/scale-out-blog.blogspot.com');">a great outline of Continuent&#8217;s clustering story</a>, with a lot of &#8220;Now we got right what we previously didn&#8217;t know/admit we got wrong.&#8221; Continuent now claims to have a strong <strong>clustering</strong> offering, both paid and free/open-source, for both MySQL and PostgreSQL, with Oracle support perhaps coming really soon.</p>
<p>Merv Adrian, who has <a href="http://www.dbms2.com/2009/06/22/the-tpc-h-benchmark-is-a-blight-upon-the-industry/" >overrated the importance of TPC benchmarks</a> in the past, seems to have become more <a href="http://mervadrian.wordpress.com/2009/12/23/additional-caveats-obscure-oracles-tpc-benchmark/" onclick="javascript:pageTracker._trackPageview('/mervadrian.wordpress.com');">skeptical</a>.</p>
<p>Interim CEO <a href="http://www.infobright.com/Blog/ceo_blog" onclick="javascript:pageTracker._trackPageview('/www.infobright.com');">Mark Burton</a> laid out<strong> Infobright&#8217;s focus</strong> pretty clearly when he took over:</p>
<blockquote><p><span style="letter-spacing: 0px;"> &#8230; the focus must be in building products that fit market segments where ease-of-use and easily attainable performance are valued.  This doesn’t sound like the high end of Data Warehousing to me where highly complex MPP architectures and teams of DBAs spend their time.  It sounds like the realm of Departmental IT and SMB where business leaders are in a hurry to gain access to data and answers without the lead time and pain of complex architectures and high costs.</span></p></blockquote>
<p><span style="letter-spacing: 0px;">I&#8217;m hearing about a <strong>SaaS focus</strong> from a lot of companies. The Continuent link above mentions one. So does <a href="http://www.rainstor.com/news-blog/news/users-demand-saas-data-escrow-services" onclick="javascript:pageTracker._trackPageview('/www.rainstor.com');">RainStor&#8217;s latest blog post</a>. <a href="http://www.dbms2.com/2009/12/27/introduction-to-gooddata/" >Gooddata</a>, a SaaS vendor itself, seems focused on analyzing data that was originally created via SaaS. I haven&#8217;t talked with Cast Iron or Pervasive for a while, but when I did, their ETL market targeting was <a href="http://www.dbms2.com/2008/03/21/cast-iron-systems-focuses-on-saas-data-integration/" >all about SaaS</a>. And of course, I hear dumber SaaS-focus ideas as well. I think the biggest substantive reason for this trend is this: if you don&#8217;t have the broadest feature set, and fear large enterprises therefore won&#8217;t want your stuff, going after SMBs makes sense. And SMBs are presumed to be going SaaS. Also in the mix, of course, are a single platform to support, a small number of large SaaS vendors to sell to or partner with, and/or general trendiness.<br />
</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/30/more-miscellany/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Introduction to Gooddata</title>
		<link>http://www.dbms2.com/2009/12/27/introduction-to-gooddata/</link>
		<comments>http://www.dbms2.com/2009/12/27/introduction-to-gooddata/#comments</comments>
		<pubDate>Mon, 28 Dec 2009 03:16:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Gooddata]]></category>
		<category><![CDATA[Jaspersoft]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1341</guid>
		<description><![CDATA[Around the end of the Cold War, Esther Dyson took it upon herself to go repeatedly to Eastern Europe and do a lot of rah-rah and catalysis, hoping to spark software and other computer entrepreneurs. I don&#8217;t know how many people&#8217;s lives she significantly affected – I&#8217;d guess it&#8217;s actually quite a few – but [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Around the end of the Cold War, Esther Dyson took it upon herself to go repeatedly to Eastern Europe and do a lot of rah-rah and catalysis, hoping to spark software and other computer entrepreneurs. I don&#8217;t know how many people&#8217;s lives she significantly affected – I&#8217;d guess it&#8217;s actually quite a few – but in any case the number is not zero. Roman Stanek, who has built and sold a couple of software businesses, cites her as a key influence setting him on his path.</p>
<p style="margin-bottom: 0in;">Roman&#8217;s latest venture is business intelligence firm Gooddata. Gooddata was founded in 2007 and has been soliciting and getting attention for a while, so I was surprised to learn that Gooddata officially launched just a few weeks ago. Anyhow, some less technical highlights of the Gooddata story include:<span id="more-1341"></span></p>
<ul>
<li>Gooddata believes it makes BI easy 	to adopt, unlike every other BI vendor on the planet &#8212; not 	excluding the many other BI vendors who say the same thing about 	themselves. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
<li>Gooddata is entirely cloud-based, 	specifically in the Amazon cloud.  I.e., Gooddata is selling 	SaaS-based BI.</li>
<li>Gooddata wants to sell to 	enterprises that are large enough to have more than a couple of BI 	users, and small enough not to be well served by the BI market 	leaders.
<ul>
<li>In revenue terms, this is the ever-popular $100 million &#8211; 	$1 billion market.</li>
<li>Specifically, Gooddata believes 	that those enterprises may have decent “back office” BI, but 	don&#8217;t have much in the front office. Gooddata wants to provide them 	with front office BI, which seems to basically mean CRM analytics. 	Gooddata sees this as a market in which QlikTech is the major 	player.  Generally, Gooddata wants to emulate and go after QlikTech.</li>
<li>Even more specifically, Gooddata 	wants to sell to Salesforce.com customers, who it believes are not 	well-served by what passes for built-in analytics at Salesforce. 	Partnering with NetSuite didn&#8217;t work as well, since NetSuite&#8217;s 	customers turn out to be smaller firms than are in Gooddata&#8217;s target 	market.</li>
</ul>
</li>
<li>Something I heard from both 	Jaspersoft and Gooddata is that there&#8217;s a hot market in providing 	cloud-based BI to online gaming companies. I gather these are mainly 	games running on mass communication platforms such as Facebook or 	the iPhone. Surely not coincidentally, it seems likely that:
<ul>
<li>These are small companies whose 	success – and hence data intake – can suddenly explode.</li>
<li>The data originates in cyberspace, 	with no particular need ever to come to the game companies&#8217; own 	premises.</li>
</ul>
</li>
<li>Gooddata has 50 production 	customers.</li>
<li>Gooddata had 2500 “projects” 	at the end of beta in June, and is adding 100 more per month. (Those 	numbers look weird together.) A “project” is a lot like a 	database, with associated reports, security privileges, etc.</li>
<li>Gooddata has close to 40 people, 	mainly in development.</li>
<li>I didn&#8217;t detect much of a sales 	strategy, nor much of a marketing strategy beyond the impressive 	early buzz generation. Perhaps that&#8217;s a partial explanation as to 	why the rate of Gooddata adoption fell even before the company 	officially launched.</li>
<li>I forgot to ask what those 50 	customers were actually paying, but considering Gooddata&#8217;s price 	list, it appears a typical price range for Gooddata&#8217;s stuff would be 	$500-$2,000/month.</li>
</ul>
<p style="margin-bottom: 0in;">Gooddata technical highlights include:</p>
<ul>
<li>Gooddata is building an 	entire BI stack – reporting, dashboards, ETL, in-memory database 	management, everything. I doubt Gooddata would claim that the pieces 	are best-of-breed in many ways other than BI ease of adoption and 	use.</li>
<li>So far I&#8217;ve seen three Gooddata 	ease-of-use features or feature groups that strike me as 	differentiated – <strong>reusability</strong> (of metrics and/or reports), 	<strong>collaboration,</strong> and <strong>tag clouds.</strong> More on those below. 	Gooddata is also building toward an <strong>agility</strong> pitch, but those 	features aren&#8217;t all baked yet.</li>
<li>Gooddata is MySQL-based today, but 	plans to move to a memory-centric compressed column store in 2010. 	Roman doesn&#8217;t reject analogies to SAP&#8217;s <em>BI/BW/whatever 	Accelerator. </em><span style="font-style: normal;">Yes, folks – 	Gooddata is yet another BI vendor doing some form of memory-centric 	OLAP. That&#8217;s a big trend.</span></li>
<li>I&#8217;m guessing 	that a big reason Gooddata is reinventing so many technical wheels 	is to ensure that the Gooddata stack is seamlessly multi-tenant from 	top to bottom. (Hasso Plattner of SAP&#8217;s <a href="../2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/">comments 	on a similar idea</a> suggest a similar emphasis.)</li>
<li>Gooddata has its own multidimensional query language called MAQL (the A doesn&#8217;t seem to stand for anything). Today MAQL generates SQL for MySQL. The future columnar memory-centric data store will, I think, understand MAQL natively.</li>
</ul>
<p style="margin-bottom: 0in;">Now we get to the good stuff. When I wrote about <a href="../2009/05/30/reinventing-business-intelligence/">reinventing business intelligence</a> back in May, I focused on some interesting developments I see as actually underway &#8212; at least on an experimental basis and/or from small vendors – namely:</p>
<ul>
<li><strong>Text-search interfaces. </strong>Well, 	while I didn&#8217;t see true text search in the Gooddata demo, I did see 	tag clouds, which have some of the same benefits.</li>
<li><strong>Collaboration tools.</strong> Well, 	Gooddata has a nice-looking approach to BI collaboration, heavily 	reflected in its UI metaphors. (That said, I haven&#8217;t really compared 	Gooddata to Microsoft SharePoint or SAP&#8217;s Portal/Rooms/whatever.)</li>
<li><strong>Memory-centric analytics</strong> (for speed of exploration). As noted above, Gooddata has that coming 	soon.</li>
<li><strong>Data exploration that tries to ignore fixed relational schemas,</strong> à la Attivio or Splunk. Roman says Gooddata is interested in or working on that, but offers no timetable.</li>
</ul>
<p style="margin-bottom: 0in;">Meanwhile, something I&#8217;ve been seeking for years, but haven&#8217;t seen much progress on since enhancement stopped on Cognos Metrics Manager, is more <a href="../2007/11/13/the-key-problem-with-dashboard-functionality/">user-friendly metrics management</a>.  Well, it doesn&#8217;t have a lot of bells and whistles, but at least Gooddata has the basics – a list of already-defined metrics, and a reasonable way of compounding them into other metrics. I think that kind of thing will be a major BI feature going forward, to the point that a few years from now we&#8217;ll be worrying about how to port them from one BI vendor&#8217;s tool to another.</p>
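<p style="margin-bottom: 0in;">To make &#8220;compounding metrics into other metrics&#8221; concrete, here is a minimal Python sketch. It is not Gooddata&#8217;s implementation and has nothing to do with MAQL syntax; the <code>Metric</code> class, the <code>base</code> helper, and the column names are all hypothetical, chosen only to show reusable metrics being combined into new ones.</p>
<pre>
```python
from dataclasses import dataclass
from typing import Callable, Dict

Row = Dict[str, float]  # one record of raw measures, keyed by column name


@dataclass(frozen=True)
class Metric:
    """A named calculation over a row; every name here is hypothetical."""
    name: str
    compute: Callable[[Row], float]

    def __sub__(self, other: "Metric") -> "Metric":
        # Combining two metrics yields a new metric that can be reused in turn.
        return Metric(f"({self.name} - {other.name})",
                      lambda row: self.compute(row) - other.compute(row))

    def __truediv__(self, other: "Metric") -> "Metric":
        return Metric(f"({self.name} / {other.name})",
                      lambda row: self.compute(row) / other.compute(row))


def base(column: str) -> Metric:
    """An already-defined metric that simply reads one column."""
    return Metric(column, lambda row: row[column])


# The "list of already-defined metrics".
revenue, cost, deals = base("revenue"), base("cost"), base("deals")

# Compound metrics built from that list.
margin = (revenue - cost) / revenue
revenue_per_deal = revenue / deals

row = {"revenue": 200.0, "cost": 150.0, "deals": 4.0}
print(margin.name, margin.compute(row))
print(revenue_per_deal.name, revenue_per_deal.compute(row))
```
</pre>
<p style="margin-bottom: 0in;">Because each compound metric records the expression it was built from, its definition is just data, and data of that kind is what would have to be ported between tools.</p>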
<p style="margin-bottom: 0in;"><strong>Bottom line: If you&#8217;re interested in BI, you should look at a Gooddata demo.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/27/introduction-to-gooddata/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Boston Big Data Summit keynote outline</title>
		<link>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/</link>
		<comments>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 06:25:50 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Humor]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1227</guid>
		<description><![CDATA[Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I&#8217;m posting them below.

The top two points [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Last month, Bob Zurek asked me to give a talk on <a href="http://www.dbms2.com/2009/10/09/presentations-upcoming/" >“Big Data”, where “big” is anything from a few terabytes on up</a>, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I&#8217;m posting them below.</p>
<p><span id="more-1227"></span></p>
<p style="margin-bottom: 0in;">The top two points from Q&amp;A probably were:</p>
<ul>
<li><strong>Big Data and the cloud actually 	have relatively little to do with each other,</strong> <a href="http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/" >a few exceptions</a> notwithstanding, especially if the data is in a shared-nothing DBMS 	(as opposed to, say, a MapReduce-oriented file cluster). Two 	principal reasons are:
<ul>
<li>Redistributing data from node to 	node is a little slow, undermining some of the elasticity benefits 	of the cloud.</li>
<li><a href="http://www.dbms2.com/2009/05/29/sneakernet-to-the-cloud/" >Getting data into the cloud in the 	first place is a lot slow</a>.</li>
</ul>
</li>
<li><strong>The NoSQL movement is a lot like 	the Ron Paul campaign</strong> &#8212; it consists of people who are dissatisfied 	with the status quo, whose dissatisfaction has a lot to do with 	insufficient liberty and/or excessive expenditure, and who otherwise 	don&#8217;t have a whole lot in common with each other.</li>
</ul>
<p style="margin-bottom: 0in;">Anyhow, here are my notes for the talk, edited in just a couple of places for readability or linkage.</p>
<p style="margin-bottom: 0in;"><strong>Quick introduction</strong></p>
<ul>
<li>Big Data vs. cloud</li>
<li>How big is Big Data?</li>
<li>At the low end of that range, 	there&#8217;s little you can&#8217;t do with conventional technology if you 	have:
<ul>
<li>An unlimited budget for hardware</li>
<li>An unlimited budget for software</li>
<li>An unlimited budget for people, 	especially Oracle DBAs</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Big Data in OLTP</strong></p>
<ul>
<li>Hard-core OLTP
<ul>
<li>Focus of DBMS technology for a long time</li>
<li>Big budgets because each 	transaction has significant value</li>
<li>Tough to get users to change 	technologies</li>
</ul>
</li>
<li>Lighter-weight OLTP
<ul>
<li>Classic example = web companies
<ul>
<li>Big ones &#8212;  retail-oriented ones 	(eBay, Amazon) partially excepted &#8212; <a href="http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/" >rolled their own technology 	stacks</a></li>
<li>Reluctant to give money to anybody
<ul>
<li>Open source, etc.</li>
</ul>
</li>
</ul>
</li>
<li>Difficulty finding market
<ul>
<li>Product vs. feature
<ul>
<li>Clustering/HA/DR/whatever</li>
<li>Ditto cloud enablement</li>
</ul>
</li>
<li>True products haven&#8217;t found much 	traction yet</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Analytic Big Data use cases</strong></p>
<ul>
<li>Kinds of data for analytics
<ul>
<li>More of same != big</li>
<li>More detail and/or new kinds
<ul>
<li>Complete data sets</li>
<li>Transactions</li>
<li>Call details</li>
<li>Tick/trade history</li>
<li>Web clickstreams</li>
<li>Network event logs</li>
<li>Other machine-generated data</li>
<li>CAM bottom line
<ul>
<li>Anything human-generated should 	and will be retained in its entirety</li>
<li>Quantities of machine-generated 	data retained should and will grow roughly in line w/ computing cost 	reductions (Moore&#8217;s Law, etc.)</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li>Analytic uses of Big Data
<ul>
<li>Analytics is mainly about three 	things
<ul>
<li>Problem detection</li>
<li>Customer relationship improvement
<ul>
<li>(Those overlap when the customer 	relationship is bad)</li>
</ul>
</li>
<li>Financial statements on steroids</li>
</ul>
</li>
</ul>
<ul>
<li>Main kinds of analytics
<ul>
<li>What BI vendors traditionally sell
<ul>
<li>General reporting and dashboards</li>
<li>Ad-hoc query (now driven from 	those reports and dashboards)</li>
<li>Planning (allegedly integrated 	with BI)</li>
</ul>
</li>
<li>Research
<ul>
<li>Ad hoc relational query (worth 	mentioning twice because it drives so much of the market)</li>
<li>Data mining</li>
<li>Most web search and web mining</li>
</ul>
</li>
<li>Operational/near-real-time</li>
<li>Archiving/compliance</li>
</ul>
</li>
<li>What gets Big?
<ul>
<li>Mainly research and archiving</li>
<li>But when reporting or operational 	get Big, you have really interesting computing problems</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Technology issues and trends</strong></p>
<ul>
<li>Moore&#8217;s Law
<ul>
<li>CPUs &#8212; All about cores, hence 	parallelism is key</li>
<li>RAM</li>
<li>SSDs – hence replace disks</li>
<li>Sensors – hence generate lots 	more data</li>
</ul>
</li>
<li>Kryder&#8217;s Law
<ul>
<li>But <a href="http://www.dbms2.com/2005/11/13/breaking-the-disk-speed-barrier/" >rotational speeds up only 	12.5X since Eisenhower Administration</a></li>
<li>Hence solid-state memory (or RAM) 	will soon take over</li>
</ul>
</li>
<li>In the meantime, I/O bottlenecks have had to be beaten
<ul>
<li>Hence sequential scans</li>
<li>Hence <a href="http://www.dbms2.com/2007/03/26/index-light-mpp-data-warehouse-appliances/" >index-light</a> architectures</li>
<li>Hence columnar</li>
</ul>
</li>
<li>DBMS “overhead”
<ul>
<li>Raw license and maintenance fees – 	software increasing fraction of total</li>
<li>OLTP vestiges – locking and all 	that</li>
<li>DBAs
<ul>
<li>People costs = huge fraction of 	total</li>
<li>Index-lightness addresses</li>
<li>So does appliance</li>
</ul>
</li>
<li>Many people don&#8217;t really know how to 	write SQL</li>
</ul>
</li>
<li>Configuration
<ul>
<li>Appliance/tightly-balanced
<ul>
<li>Netezza</li>
<li>Teradata earlier</li>
<li>Greenplum/Sun</li>
<li>Oracle</li>
<li>IBM</li>
<li>Microsoft/Madison</li>
</ul>
</li>
<li>Commodity/do what you want
<ul>
<li>Vertica</li>
<li>Greenplum now</li>
<li>Infobright, Aster and others</li>
<li>MapReduce-oriented file systems</li>
</ul>
</li>
<li><a href="http://www.dbms2.com/2009/10/25/data-warehouse-balanced-hardware-configuration/" >Extreme rigidity is silly</a>
<ul>
<li><a href="http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/" >Teradata, Oracle have both 	signaled moving to more modularity</a></li>
<li>Big driver of that = heterogeneous 	storage
<ul>
<li>Cheap disk</li>
<li>Expensive disk</li>
<li>Solid-state</li>
<li>RAM</li>
</ul>
</li>
</ul>
<ul>
<li>CPU/storage ratio is even more of a 	driver</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Theoretically defensible ways to segment the market</strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >Latency requirements</a>
<ul>
<li>High availability and low latency 	go together</li>
</ul>
</li>
<li>Query types
<ul>
<li>Simultaneous users for same</li>
</ul>
</li>
<li>Database size</li>
<li>Budget</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Actual segments right now</strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/08/24/teradatas-active-enterprise-data-warehouse-story/" >Utter ADW/EDW</a></li>
<li>Data mart
<ul>
<li>Size</li>
<li>Naturally columnar vs. naturally 	row-based</li>
</ul>
</li>
<li>Operational/frontline</li>
<li>Less dramatic/smaller EDW</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Aster Data 4.0 and the evolution of &#8220;advanced analytic(s) servers&#8221;</title>
		<link>http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/</link>
		<comments>http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/#comments</comments>
		<pubDate>Sat, 31 Oct 2009 01:56:55 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1198</guid>
		<description><![CDATA[Since Linda and I are leaving on vacation in a few hours, Aster Data graciously gave me permission to morph its “12:01 am Monday, November 2” embargo into “late Friday night.”
Aster Data is officially announcing the 4.0 release of nCluster. There are two big pieces to this announcement:

Aster is 	offering a slick vision for integrating [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><em>Since Linda and I are leaving on vacation in a few hours, Aster Data graciously gave me permission to morph its “12:01 am Monday, November 2” embargo into “late Friday night.”</em></p>
<p style="margin-bottom: 0in; font-style: normal;">Aster Data is officially announcing the 4.0 release of nCluster. There are two big pieces to this announcement:</p>
<ul>
<li>Aster is 	offering a slick vision for integrating big-database management and 	general analytic processing on the same MPP cluster, under the 	not-so-slick name “Data-Application Server.”</li>
<li>Aster is also 	offering a sophisticated vision for workload management.</li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;">In addition, Aster has matured nCluster in various ways, for example cleaning up a performance problem with single-row updates.</p>
<p style="margin-bottom: 0in; font-style: normal;">Highlights of the Aster “Data-Application Server” story include:<span id="more-1198"></span></p>
<ul>
<li>At its core, 	the Aster “Data-Application Server” is the Aster nCluster MPP 	analytic DBMS, enhanced with basic application server functionality 	(I didn&#8217;t ask for details of that part), running on the same 	nCluster worker nodes that answer SQL queries.</li>
<li>Thus, Aster is 	eliminating a lot of the data movement that plagues three-tier 	architectures and other less-integrated approaches.</li>
<li>The Aster 	“Data-Application Server” further offers integrated workload 	management for applications and queries; more on that below.</li>
<li>The Aster 	“Data-Application Server” requires applications to be 	parallelized and invoked via Aster&#8217;s <a href="../2009/10/15/mapreduce-webinar-slides/">SQL/MapReduce.</a></li>
<li>As befits a 	MapReduce-based system, the Aster “Data-Application Server” lets 	you write your applications in lots of different languages (the 	usual suspects, and it also does .NET).</li>
<li>The Aster 	“Data-Application Server” runs applications in their own process 	spaces, protecting the DBMS server from crashes and other damaging 	behavior.</li>
<li>The Aster 	“Data-Application Server” allows applications to manage memory 	themselves, persistently, and not just via relational constructs. 	Thus, if you want your application to maintain a graph, mini rules 	engine, and/or finite state machine, you can, without doing SQL 	contortions.</li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;">In a compelling proof point for the Aster Data-Application Server&#8217;s slickness, Aster has leapfrogged Teradata and Netezza in the extent to which SAS functionality is integrated into Aster&#8217;s DBMS. (Aster and SAS both say that you can do full SAS modeling in parallel on Aster, but even so I wouldn&#8217;t be surprised to discover there were some parts of SAS&#8217; system that turned out to be exceptions.) Of course, Aster is hardly the only analytic DBMS vendor to have the idea of explicitly enhancing general analytic processing; that&#8217;s why we see lots of MapReduce announcements, and it&#8217;s also why Teradata enhanced its UDFs (User-Defined Functions) to have some kind of persistent memory.* But I don&#8217;t know of anybody else whose approach is quite so elegant and general at this time.</p>
<p style="margin-bottom: 0in;"><em>*Unfortunately, I don&#8217;t yet know much about Teradata&#8217;s UDF enhancements. I neglected to drill down on Global Persistent Memory when it was mentioned a couple of times at Teradata Partners last week, and Teradata was unable to accommodate my request this week for a rapid follow-up briefing on the subject.</em></p>
<p style="margin-bottom: 0in; font-style: normal;">Aster&#8217;s approach to workload management is similarly stylish. The idea is:</p>
<ul>
<li>Lots of 	variables are available to be taken into account (e.g., user role, 	expected query duration, actual duration of a running query, etc.)</li>
<li>SQL statements 	can be written against any of these variables.</li>
<li>The SQL 	statements serve as rules to set query/task priorities.</li>
<li>There seem to 	be a few different ways to measure priority, including explicit 	allocation of CPU or I/O resources, as well as more conventional 	“This group of queries gets higher priority than that one” 	kinds of metrics.</li>
<li>The whole 	thing provides integrated workload management for queries, 	applications, load jobs, data redistribution, and so on.</li>
</ul>
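<p style="margin-bottom: 0in; font-style: normal;">The rules-as-SQL idea above can be illustrated with a small Python sketch using SQLite. To be clear, this is not Aster&#8217;s schema or syntax, which I haven&#8217;t seen in detail; the <code>tasks</code> and <code>rules</code> tables, their columns, and the first-match-wins policy are assumptions made purely for illustration.</p>
<pre>
```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create a toy task queue and rule table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tasks (task_id TEXT, user_role TEXT,
                            expected_secs INTEGER, running_secs INTEGER,
                            priority TEXT);
        CREATE TABLE rules (rule_id INTEGER, predicate TEXT, priority TEXT);
    """)
    conn.executemany("INSERT INTO tasks VALUES (?, ?, ?, ?, NULL)", [
        ("dashboard", "executive", 5, 1),
        ("nightly_load", "etl", 3600, 120),
        ("runaway_report", "analyst", 30, 900),
        ("adhoc", "analyst", 20, 2),
    ])
    # Each rule is a SQL predicate written against the task variables.
    conn.executemany("INSERT INTO rules VALUES (?, ?, ?)", [
        (1, "running_secs > 600", "low"),        # demote runaway queries
        (2, "user_role = 'executive'", "high"),
        (3, "user_role = 'etl'", "low"),
    ])
    return conn


def assign_priorities(conn: sqlite3.Connection) -> dict:
    """Apply rules in rule_id order; the first matching rule wins."""
    rules = conn.execute(
        "SELECT predicate, priority FROM rules ORDER BY rule_id").fetchall()
    for predicate, priority in rules:
        # Predicates are interpolated as SQL, so they must come from
        # trusted administrators only.
        conn.execute(
            f"UPDATE tasks SET priority = ? "
            f"WHERE priority IS NULL AND ({predicate})",
            (priority,))
    # Anything no rule matched gets a default priority.
    conn.execute("UPDATE tasks SET priority = 'normal' WHERE priority IS NULL")
    return dict(conn.execute("SELECT task_id, priority FROM tasks").fetchall())


conn = build_demo()
print(assign_priorities(conn))
```
</pre>
<p style="margin-bottom: 0in; font-style: normal;">Adding or changing a policy in this sketch is just an INSERT or UPDATE on the rules table, which is the appeal of putting workload rules in SQL.</p>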
<p style="margin-bottom: 0in; font-style: normal;">Right now the interface is – well, you&#8217;re manipulating a SQL table. A more conventional workload management GUI is slated for the second quarter of 2010.</p>
<p style="margin-bottom: 0in; font-style: normal;">Discussing subjects such as mirroring and ILM (Information Lifecycle Management) with Aster can be tricky, as Aster uses the word “partition” in confusing ways. Anyhow, Aster has a few different levels of compression, and the ability to apply different levels of compression to different partitions, to change compression levels via ALTER TABLE, and to alter (presumably increase) compression on the fly when doing online backup. Aster is also part of a growing trend to eschew RAID, instead doing mirroring in its own software.  (Other examples of this strategy would be <span><a href="http://www.dbms2.com/2009/10/06/oracle-and-vertica-on-compression-and-other-physical-data-layout-features/" >Vertica</a>, <a href="http://www.dbms2.com/2008/09/28/oracle-database-machine-performance-and-compression/" >Oracle Exadata/ASM</a>, and <a href="http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/" >Teradata Fallback</a>.) </span><span>Prior to nCluster 4.0, this caused a problem, in that the block sizes for mirroring were so large as to create a lag in transactional updating. But Aster says this problem is now solved, and indeed claims that nCluster 4.0 is superior to most rivals in transactional efficiency.</span></p>
<p style="margin-bottom: 0in;">And finally, while I was talking w/ Aster Data anyway, I checked up on cloud and MapReduce customer penetration. The answers were:</p>
<ul>
<li>Aster has two serious production 	cloud users, both of which have been disclosed for a while, namely:
<ul>
<li>ShareThis, which runs Aster 		nCluster on Amazon EC2</li>
<li>Didit, which runs Aster nCluster 		on AppNexus</li>
</ul>
</li>
<li>Outside of those two, Aster sees 	some cloud use for test, development, prototyping, etc.</li>
<li>Every single Aster customer uses 	<a href="../2009/10/15/mapreduce-webinar-slides/">SQL/MapReduce</a> &#8212; i.e., they invoke MapReduce via Aster nCluster SQL queries.</li>
<li>Some of those customers use MapReduce for ETL, some use it 	for actual analytics.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
