<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Theory and architecture</title>
	<atom:link href="http://www.dbms2.com/category/database-theory-practice/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 02 Sep 2010 09:06:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>More on NoSQL and HVSP (or OLRP)</title>
		<link>http://www.dbms2.com/2010/08/26/nosql-hvsp-olrp/</link>
		<comments>http://www.dbms2.com/2010/08/26/nosql-hvsp-olrp/#comments</comments>
		<pubDate>Thu, 26 Aug 2010 09:10:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[Basho and Riak]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Clustrix]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Riptano]]></category>
		<category><![CDATA[Schooner]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Tokutek]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2907</guid>
		<description><![CDATA[Since posting last Wednesday morning that I&#8217;m looking into NoSQL and HVSP, I&#8217;ve had a lot of conversations, including with (among others):

Dwight Merriman of 10gen (MongoDB)
Damien Katz of Couchio (CouchDB)
Matt Pfeil of Riptano (Cassandra)
Todd Lipcon of Cloudera (HBase committer)
Tony Falco of Basho (Riak)
John Busch of Schooner
Ori Herrnstadt of Akiban

By no means do I have time [...]]]></description>
			<content:encoded><![CDATA[<p>Since posting last Wednesday morning that <a href="http://www.dbms2.com/2010/08/18/nosql-hvsp-adoption/" >I&#8217;m looking into NoSQL and HVSP</a>, I&#8217;ve had a lot of conversations, including with (among others):</p>
<ul>
<li>Dwight Merriman of 10gen (MongoDB)</li>
<li>Damien Katz of Couchio (CouchDB)</li>
<li>Matt Pfeil of <a href="http://www.dbms2.com/2010/07/06/riptano-and-cassandra-adoption/" >Riptano</a> (Cassandra)</li>
<li>Todd Lipcon of Cloudera (HBase committer)</li>
<li>Tony Falco of Basho (Riak)</li>
<li>John Busch of Schooner</li>
<li><strong><span style="font-weight: normal;">Ori Herrnstadt</span></strong> of <a href="http://www.dbms2.com/2010/04/03/akiban-highlights/" >Akiban</a></li>
</ul>
<p><span id="more-2907"></span>By no means do I have time to do these conversations justice, in terms of giving them the write-ups and/or immediate follow-up that they deserve. Indeed, I&#8217;ll leave for vacation Saturday morning with my 2000-word NoSQL article still unwritten. So I&#8217;ll dump as many observations as I can into one or a few posts now, and play catch-up later as circumstances allow.</p>
<p>In no particular order:</p>
<ul>
<li>A number of NoSQL offerings have had more uptake to date than most of the scale-out SQL offerings have.</li>
<li>&#8220;Document-oriented&#8221; NoSQL projects CouchDB and MongoDB have probably had the most users get into production, but perhaps for pretty small systems.</li>
<li>Cassandra and HBase &#8212; the column-group-architecture guys &#8212; have probably had the most bang-in-lots-of-writes <a href="http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/" >HVSP</a> production uptake.*</li>
<li>I didn&#8217;t talk customer count with Schooner, but the decently-stocked <a href="http://www.schoonerinfotech.com/customers" onclick="javascript:pageTracker._trackPageview('/www.schoonerinfotech.com');">Schooner customer page</a> suggests Schooner may be something of an exception to these generalities.</li>
<li>A lot of these companies are in the low-to-mid-teens of employees.</li>
<li>The SQL-oriented companies, despite having fewer or no customers, often seem to have more money. (One reason I get the impression SQL guys have more money is, frankly, that more of them are talking about engaging <a href="http://www.monash.com/advantage.html" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">my services</a>.)
<ul>
<li>Schooner cites $20 million in VC.</li>
<li><a href="http://www.dbms2.com/2010/05/12/the-clustrix-story/" >Clustrix</a> cites a figure close to that.</li>
<li>Basho cites $10 million, plus <a href="http://www.masshightech.com/stories/2010/08/02/daily35-Basho-rejects-VC-takes-late-friends-and-family-round.html" onclick="javascript:pageTracker._trackPageview('/www.masshightech.com');">a new round of $1.5 or $2 or $2.5 million</a>. The new round is at a lowered valuation.</li>
<li>That same site says <a href="http://www.dbms2.com/2009/04/16/introduction-to-tokutek/" >Tokutek</a> finally was able to <a href="http://www.masshightech.com/stories/2010/08/16/daily47-Database-software-firm-Tokutek-lands-28M.html" onclick="javascript:pageTracker._trackPageview('/www.masshightech.com');">raise some VC</a>. Congrats!</li>
</ul>
</li>
<li>It&#8217;s only a two-company trend, but I was pleased to hear that both 10gen/MongoDB and Akiban were seeing Drupal as a major use case or potential use case. No word on rescuing WordPress from its MySQL implementation, alas, but it seems that a Drupal site typically has 40-200+ tables, while a WordPress one has 10ish.</li>
<li>Another trend I think I&#8217;m seeing is serious object-oriented apps banging things straight into a simple back end. <a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >Workday</a> is a huge example of that. Akiban hopes to do something similar with Hibernate.</li>
<li>Stability and maturity are still issues for many of these products. E.g., HBase isn&#8217;t even in Release 1.0 yet. Ditto Cassandra, and surely many of the others. Unsurprisingly, <a href="http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html" onclick="javascript:pageTracker._trackPageview('/blog.mikiobraun.de');">making Cassandra stable is still a challenge</a>.</li>
</ul>
<p><em>*As is common for terms I suggest, the &#8220;HVSP&#8221; name is not getting any traction. What do you think of Marton Trencseni&#8217;s suggestion of <a href="http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/#comment-182138" >OLRP, for OnLine Request Processing</a>?</em></p>
<p>One thing that makes following this area interesting is that so many projects are open source, so there is a lot of information in the wild. I hardly have time to read the mailing list for each project; but the people I talk with do, and often they sorta kinda remember something somebody else posted one or several months back. As just one example, the mailing lists are said to confirm:</p>
<ul>
<li>Contrary to rumor, <a href="http://twitter.com/eventcloudpro/status/17872687577" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Facebook hasn&#8217;t moved in-box search off of Cassandra</a>.</li>
<li>Apparently, however, it&#8217;s true that <a href="http://www.dbms2.com/2008/07/21/project-cassandra-facebook-open-sourced-quasi-dbms/" >Cassandra inventor Facebook</a> has stopped working on Cassandra, and Facebook&#8217;s core Cassandra developers have shifted over to HBase.</li>
</ul>
<p>Also, figuring out usage of open source software can be &#8230; interesting.</p>
<ul>
<li> People who use open source software don&#8217;t have to reveal themselves, as there&#8217;s no purchase transaction to kick things off.</li>
<li>On the other hand, if they&#8217;re serious enough in their use, they often do.
<ul>
<li>There are two main ways to get tech support for open source software &#8212; the community or a company that sells support &#8212; and both ways let the main support-selling company know that one is a user.</li>
<li>Some folks even add themselves to open lists of users, for example these rather long lists for <a href="http://wiki.apache.org/hadoop/Hbase/PoweredBy" onclick="javascript:pageTracker._trackPageview('/wiki.apache.org');">HBase</a> and <a href="http://wiki.apache.org/couchdb/CouchDB_in_the_wild" onclick="javascript:pageTracker._trackPageview('/wiki.apache.org');">CouchDB</a>.</li>
<li>Or they show up at conferences. For example, <a href="http://twitter.com/spyced/status/21490457839" onclick="javascript:pageTracker._trackPageview('/twitter.com');">two</a> <a href="http://twitter.com/spyced/status/21675203015" onclick="javascript:pageTracker._trackPageview('/twitter.com');">tweets</a> from Riptano founder Jonathan Ellis suggest at least 30 production Cassandra users were represented at a recent event. That&#8217;s more detail than his colleague Matt Pfeil wanted to give me when we talked. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ul>
</li>
</ul>
<p>OK. This post has gotten pretty long, even without me saying anything resembling an overview of any of the seven companies I listed up top, or of their products&#8217; adoption. So I&#8217;ll just publish this now, and edit in links below to follow-on posts if and when they become available.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/26/nosql-hvsp-olrp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Workday comments on its database architecture</title>
		<link>http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/</link>
		<comments>http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/#comments</comments>
		<pubDate>Sun, 22 Aug 2010 10:20:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Workday]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2874</guid>
		<description><![CDATA[In my discussion of Workday&#8217;s technology, I gave an estimate that Workday&#8217;s database, if relationally designed, would require “1000s” of tables. That estimate came from Workday, Inc. CTO Stan Swete, in a thoughtful email that made several points about Workday&#8217;s database strategy. Workday kindly gave me permission to quote it below.


I would say thousands. The [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in; page-break-before: always;"><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">In my discussion of </span></span></span><span style="font-size: small;"><span style="font-weight: normal;"><a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" >Workday&#8217;s technology</a>,</span></span><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;"> I gave an estimate that Workday&#8217;s database, if relationally designed, would require “1000s” of tables. That estimate came from Workday, Inc. CTO Stan Swete, in a thoughtful email that made several points about Workday&#8217;s database strategy. Workday kindly gave me permission to quote it below.</span></span></span><br />
<span id="more-2874"></span></p>
<blockquote>
<p style="margin-bottom: 0in; font-style: normal; font-weight: normal;"><span style="font-size: small;">I would say thousands. The object model for our applications consists of over 2000 classes. On average these classes have multiple relationships with other classes so that would have some kind of multiplicative effect when it came to using tables.</span></p>
<p style="margin-bottom: 0in; font-style: normal; font-weight: normal;"><span style="font-size: small;">One example of where you’d be proliferating tables (and not getting as satisfactory of a solution relationally) is worktags. Currently we have a class for worktags. Instances of this class can point to various instances of detail lines (expense lines, po lines, invoice lines, etc…). A detail line can have many worktags pointing to it. To model this relationally you’d need either a separate table for each type of detail line in the system to store the tags associated with it or a single worktag for detailed line table that could be foreign keyed for all types of detail lines that would store their worktag. Either way involves more tables and more clunkiness.</span></p>
<p style="margin-bottom: 0in; font-style: normal; font-weight: normal;"><span style="font-size: small;">Another example of where our oo designs wouldn’t directly translate is our ability to describe the shared part of a detail line in one class and have all instances of detail lines inherit the fields that are shared. To do this relationally you’d probably replicate the shared fields in each table representing the various kinds of transactional details (again lines, po lines, invoice lines, etc…). You’d lose the ability to maintain and change the shared fields (and the processing logic for those fields) in one place.</span></p>
<p style="margin-bottom: 0in; font-style: normal; font-weight: normal;"><span style="font-size: small;">Anyway, I’d go with “thousands” as our answer. I do think this is an interesting question and wish we had more time to figure out a more accurate answer.</span></p>
</blockquote>
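<p>To make the two examples in that email concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class names (<code>DetailLine</code>, <code>ExpenseLine</code>, <code>InvoiceLine</code>, <code>Worktag</code>), the field names, and the helper <code>worktags_for</code> are all hypothetical, and nothing here is Workday&#8217;s actual object model. It shows shared fields being declared once and inherited, and a single worktag class pointing at any kind of detail line.</p>

```python
from dataclasses import dataclass, field


@dataclass
class DetailLine:
    # Shared fields are declared once; every kind of detail line inherits them.
    amount: float
    memo: str = ""


@dataclass
class ExpenseLine(DetailLine):
    merchant: str = ""


@dataclass
class InvoiceLine(DetailLine):
    invoice_number: str = ""


@dataclass
class Worktag:
    # One worktag class can point at any kind of detail line,
    # so no per-line-type tag table is needed.
    name: str
    lines: list = field(default_factory=list)

    def tag(self, line: DetailLine) -> None:
        self.lines.append(line)


def worktags_for(line, worktags):
    # A detail line can have many worktags pointing to it.
    return [t.name for t in worktags if any(l is line for l in t.lines)]


if __name__ == "__main__":
    expense = ExpenseLine(amount=42.0, merchant="Hotel")
    invoice = InvoiceLine(amount=100.0, invoice_number="INV-1")
    cost_center = Worktag("Cost Center 10")
    project = Worktag("Project X")
    cost_center.tag(expense)
    cost_center.tag(invoice)
    project.tag(expense)
    print(worktags_for(expense, [cost_center, project]))
    print(worktags_for(invoice, [cost_center, project]))
```

<p>A relational version of the same thing would need either one tag table per line type or a single tag table keyed against every line type, which is the clunkiness the email describes.</p>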
<p><em><strong>This post is part of a three-post series</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2010/08/22/workday-inc-company-overview/" >Workday Inc. company overview</a> (brief)</li>
<li><a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" >Workday Inc. technology overview</a> (detailed)</li>
<li>Workday Inc. CTO Stan Swete&#8217;s <a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >comments on database strategy</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Workday architecture &#8212; a new kind of OLTP software stack</title>
		<link>http://www.dbms2.com/2010/08/22/workday-technology-stack/</link>
		<comments>http://www.dbms2.com/2010/08/22/workday-technology-stack/#comments</comments>
		<pubDate>Sun, 22 Aug 2010 10:20:08 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Workday]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2865</guid>
		<description><![CDATA[One of my coolest company visits in some time was to  SaaS  (Software as a Service) vendor Workday, Inc., earlier this month. Reasons included:

Workday has 	forward-thinking ideas about SaaS enterprise 	applications and the integration of business intelligence into same.
Workday has highly 	innovative ideas in how it manages data.
Companies founded by 	Dave Duffield tend [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><span style="font-size: small;">One of my coolest company visits in some time was to </span><span style="font-size: small;"> SaaS  (Software as a Service) vendor</span><span style="font-size: small;"> Workday, Inc., earlier this month. Reasons included:</span></p>
<ul>
<li><span style="font-size: small;">Workday has 	forward-thinking ideas about SaaS enterprise 	applications and the integration of business intelligence into same.</span></li>
<li><span style="font-size: small;">Workday has highly 	innovative ideas in how it manages data.</span></li>
<li><span style="font-size: small;">Companies founded by 	Dave Duffield tend to feature smart, likeable people who talk to one</span><span style="font-size: small;"><span style="font-style: normal;"> pleasantly and forthrightly. Workday is no exception; CTO Stan Swete 	and the other Workday folks present were a delight to talk with.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">I&#8217;d 	invited Merv Adrian to come along with me. He asked great questions, 	and I could gather myself a bit despite how sleep-deprived I was for 	the first part of that trip.</span></span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Workday kindly allowed me to post this </span></span><span style="font-size: small;"><a href="http://www.monash.com/uploads/Workday-August-2010.ppt" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">Workday slide deck</a>.</span><span style="font-size: small;"><span style="font-style: normal;"> Otherwise, I&#8217;ve split out a quick </span></span><a href="http://www.dbms2.com/2010/08/22/workday-inc-company-overview/" ><span style="font-size: small;">Workday, Inc. company overview</span></a><span style="font-size: small;"><span style="font-style: normal;"> into a separate post.</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">The biggie for me was the data and object management part. Specifically:  <span id="more-2865"></span><br />
</span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><strong>Workday&#8217;s 	applications run entirely in-memory,</strong></span></span><span style="font-size: small;"><span style="font-style: normal;"> in a highly object-oriented structure. Persistence is mainly for the 	sake of data safety …</span></span></li>
<li>… <span style="font-size: small;"><span style="font-style: normal;">but not entirely. In earlier releases, Workday kept absolutely everything in RAM. Now, however, certain things are kept only on disk, such as:</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Audit 	files.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Certain 	documents (notably resumes).</span></span></li>
</ul>
</li>
<li><strong><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	whole database</span></span></strong><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;"> – data and metadata alike – is persisted to disk in </span></span></span><strong><span style="font-size: small;"><span style="font-style: normal;">&lt;10 	MySQL/InnoDB tables. </span></span></strong><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">MySQL 	is basically just being used as a </span></span></span><strong><span style="font-size: small;"><span style="font-style: normal;">key-value 	store, </span></span></strong><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">albeit 	one with </span></span></span><strong><span style="font-size: small;"><span style="font-style: normal;">ACID 	transactional support. </span></span></strong>
<ul>
<li><span style="font-size: small;">There <span style="font-weight: normal;">are </span><strong>3 main tables: attributes, relationships, instances.</strong></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">When 	I suggested this might be like an entity-attribute-value model, 	Workday said it would be even better to think in terms of</span><span style="font-style: normal;"><strong> instanceID-attribute-value.</strong></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">As 	you might expect for a database that simple, its schema doesn&#8217;t 	change much.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">By 	way of comparison, Workday estimates that if its software were 	written relationally, </span></span></span><span style="font-size: small;"><span style="font-style: normal;">there 	would b</span></span><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">e </span></span></span><span style="font-size: small;"><span style="font-weight: normal;"><a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >1000s 	of tables</a>,</span></span><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;"> which</span></span></span><span style="font-size: small;"><span style="font-style: normal;"> would take up 10-100X as much disk space. </span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">All 	write transactions are banged immediately into the MySQL database. 	I.e., RAM and disk are never allowed to get out of sync.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	database is append-only. This is exploited for effective dating 	(pretty heavily, it seems, perhaps because that&#8217;s a useful concept 	in human resources) and snapshotted reporting.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	built-in BI doesn&#8217;t have a lot of choice but to do scans, traversing 	the object model. This turns out to be fast enough.</span></span></li>
</ul>
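<p>The instanceID-attribute-value idea, combined with append-only writes and effective dating, can be sketched roughly as follows. This is a guess at the general shape, not Workday&#8217;s schema: SQLite stands in for MySQL/InnoDB so the example is self-contained, and every table, column, and function name below is a hypothetical chosen for illustration.</p>

```python
import sqlite3

# Hypothetical schema. SQLite stands in for MySQL/InnoDB, and all table and
# column names are illustrative assumptions, not Workday's actual design.
SCHEMA = """
CREATE TABLE instances (
    instance_id INTEGER PRIMARY KEY,
    class_name  TEXT NOT NULL
);
CREATE TABLE attributes (
    instance_id    INTEGER NOT NULL,
    attribute      TEXT NOT NULL,
    value          TEXT,
    effective_date TEXT NOT NULL
);
CREATE TABLE relationships (
    from_instance_id INTEGER NOT NULL,
    relationship     TEXT NOT NULL,
    to_instance_id   INTEGER NOT NULL,
    effective_date   TEXT NOT NULL
);
"""


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def create_instance(conn, class_name):
    cur = conn.execute(
        "INSERT INTO instances (class_name) VALUES (?)", (class_name,)
    )
    conn.commit()
    return cur.lastrowid


def set_attribute(conn, instance_id, attribute, value, effective_date):
    # Append-only: a change is a new row, never an UPDATE or DELETE.
    # Committing on every write keeps the store in step with the caller.
    conn.execute(
        "INSERT INTO attributes VALUES (?, ?, ?, ?)",
        (instance_id, attribute, value, effective_date),
    )
    conn.commit()


def get_attribute(conn, instance_id, attribute, as_of):
    # Effective dating: take the latest row dated on or before as_of.
    # ISO-8601 date strings compare correctly as plain text.
    row = conn.execute(
        "SELECT value FROM attributes "
        "WHERE instance_id = ? AND attribute = ? AND ? >= effective_date "
        "ORDER BY effective_date DESC, rowid DESC LIMIT 1",
        (instance_id, attribute, as_of),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_store()
    worker = create_instance(conn, "Worker")
    set_attribute(conn, worker, "title", "Analyst", "2010-01-01")
    set_attribute(conn, worker, "title", "Manager", "2010-07-01")
    print(get_attribute(conn, worker, "title", "2010-03-15"))
    print(get_attribute(conn, worker, "title", "2010-08-22"))
```

<p>Because old rows are never overwritten, asking for a value as of an earlier date is just a different query against the same table, which is presumably why append-only storage suits effective dating and snapshotted reporting.</p>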
<p style="margin-bottom: 0in;"><span style="font-size: small;">Other notes on Workday&#8217;s data and object management strategy include:</span></p>
<ul>
<li><span style="font-size: small;">Workday is 	object-oriented through and through – no object-relational mapping 	&#8211; <a href="http://en.wikipedia.org/wiki/Turtles_all_the_way_down" onclick="javascript:pageTracker._trackPageview('/en.wikipedia.org');">turtles 	all the way down</a>. On average, a class has about 2 attributes.</span></li>
<li><span style="font-size: small;">94% of requests are 	reads, traversing the object hierarchy.</span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	databases are pretty small.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">The 	biggest database Workday supports uses 17 gigabytes of RAM. </span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	databases are much smaller on disk than in RAM.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;">Workday&#8217;s “dream” 	is to move from disk to solid-state memory. </span></li>
<li><span style="font-size: small;">Workday uses GPLed 	MySQL/InnoDB. So there&#8217;s no software license reason to ever move 	away (e.g., to a pure key-value store).</span></li>
<li><span style="font-size: small;">Disaster recove</span><span style="font-size: small;"><span style="font-style: normal;">ry 	is based on local and remote MySQL slaves. </span></span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Obviously, serious apps have been built before in object-oriented and/or key-value ways, with the resulting objects then being banged to disk (or in some cases kept in memory). Examples include:</span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Numerous 	applications are built on <a href="../2010/01/15/intersystems-cache-highlights/">object-oriented 	DBMS</a>. Generally they go against disk, although <a href="../2005/11/14/defining-and-surveying-memory-centric-data-management/">memory-centric 	implementations can save a lot of pointer-chasing</a>. Often they&#8217;re 	queried via SQL.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Basho&#8217;s 	website says that its key-value store Riak was originally conceived 	in connection with a planned salesforce automation product, but I 	don&#8217;t think that the application part of that plan ever got built. </span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">SAP 	has <a href="../2005/12/09/36/">longstanding</a> doubts about relational dogma, although not nearly to Workday&#8217;s 	extreme.</span></span></li>
<li><span style="font-size: small;">Obviously, 	some major internet applications just bang data into key-value 	stores.</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Still, perhaps because it is wholly object-oriented yet doesn&#8217;t even bother with anything like a real object-oriented DBMS, Workday&#8217;s approach seems particularly cool. </span></span></p>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Other highlights of Workday, Inc.&#8217;s technical story include:</span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	has settled into a schedule of three releases per year, and has 	pretty much lived up to that for &gt;2 years.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Every 	user is always on the latest Workday release.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">You 	can delay turning on significant new Workday software functionality 	if you want to.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Pure 	UI changes to the Workday software are handled much as they are on 	various websites today. Sometimes you have no choice but to live 	with them; sometimes the prior version of the UI remains available 	to you for a while.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	navigational approaches look pretty cool.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">The 	core concept is a list of actions you can perform now, rather than 	more standard menus.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Roles/permissions 	are of course central to this.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Reports 	have lots of actionable links in them. (More than just drilldown, 	although specific examples have slipped my memory.)</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Alternatively, you can navigate via a search box, searching both on names of objects (e.g. users, divisions) and on names of tasks. This is somewhat reminiscent of <a href="http://www.texttechnologies.com/2007/02/28/sap%E2%80%99s-%E2%80%9Csearch%E2%80%9D-strategy-isn%E2%80%99t-about-search/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">an approach SAP was considering a few years ago</a>.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;">Workday says it has 	four key design premises:</span>
<ul>
<li><span style="font-size: small;"><em>Web-Familiar Experience.</em> I&#8217;d say that&#8217;s true to the extent it makes sense. In many ways, the web needs to catch up to Workday.</span></li>
<li><span style="font-size: small;"><em>Enterprise 	Reporting.</em> The idea is that you get a report, then take actions 	based on it. Hence the report-centric options for navigation.</span></li>
<li><span style="font-size: small;"><em>Integration 	On-Demand.</em> That&#8217;s a fancy way of saying “Plays nicely with 	others.”</span></li>
<li><span style="font-size: small;"><em>Configurable 	Business Processes.</em><span style="font-style: normal;"> Duh. That&#8217;s 	pretty essential if you want to do serious SaaS applications.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday maintains a strong separation between application logic and UI development. Developers do no screen layouts. Instead, UIs are automatically generated for:</span></span>
<ul>
<li><span style="font-size: small;">Flash/FLEX</span></li>
<li><span style="font-size: small;">iPhone</span></li>
<li><span style="font-size: small;">Mobile HTML</span></li>
<li><span style="font-size: small;">PDF export</span></li>
<li><span style="font-size: small;">Excel export</span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	only talks to the outside world via web services.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	is heavily </span></span><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">into 	SOAP (Simple Object Access Protocol). </span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">The 	acquisition of OEM partner CapeClear gave Workday an Integration 	Service (i.e., enterprise service bus) that translates SOAP into 	whatever else might be needed for integration, and also does 	reliable delivery. </span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">All 	that said, Stan Swete sees integration among various SaaS offerings 	as an area needing significant future attention.</span></span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday&#8217;s 	business intelligence ideas are interesting, but I think there&#8217;s a 	long way for that technology still to go.</span></span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday&#8217;s 	BI seems to be focused on report/drilldown kinds of functionality.</span></span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">You 	can slice by up to 2 dimensions at once.</span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Then 	you can keep slicing, however, by more dimensions, as many times as 	you like.</span></span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">While 	you can take actions straight from reports, some of the specific 	BI/app integration ideas we discussed are still futures. (E.g., 	analyzing spend at the time of expense report data entry or 	approval.)</span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Of 	course, Workday&#8217;s web services interface lets you export Workday 	data into 3rd-party tools. Indeed, if you want to integrate data 	from Workday and some other source(s), that&#8217;s your only choice.</span></span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday offers a clever metaphor to illustrate that your data may be more secure offsite than on premises: the bank vault. (I have no idea whether that&#8217;s a SaaS industry standard, but I hadn&#8217;t heard it before.) Of course, that metaphor does raise some issues specific to the remote data case, such as:</span></span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">When 	your data is on premises, you know whether the government has 	insisted on looking at it.</span></span></span></li>
<li><span style="font-size: small;">More than cash, data keeps traveling back and forth to 	the remote location, which creates at least a theoretical risk of 	interception.</span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday says the toughest part of globalization is the issue of which personal data is or is not maintained. For example, in the US you&#8217;re not allowed to ask a job applicant&#8217;s religion, but in the UK you&#8217;re not only permitted but indeed required to.</span></span></span></li>
</ul>
<p><em><strong>This post is part of a three-post series</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2010/08/22/workday-inc-company-overview/" >Workday Inc. company overview</a> (brief)</li>
<li><a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" >Workday Inc. technology overview</a> (detailed)</li>
<li>Workday Inc. CTO Stan Swete&#8217;s <a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >comments on database strategy</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/22/workday-technology-stack/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>I&#8217;m collecting data points on NoSQL and HVSP adoption</title>
		<link>http://www.dbms2.com/2010/08/18/nosql-hvsp-adoption/</link>
		<comments>http://www.dbms2.com/2010/08/18/nosql-hvsp-adoption/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 13:09:08 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Clustrix]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Groovy Corporation]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Northscale]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[ScaleDB]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2840</guid>
		<description><![CDATA[I was asked to do a magazine article on NoSQL, where by &#8220;NoSQL&#8221; is meant &#8220;whatever they talk about at NoSQL conferences.&#8221; By now the number of publications planning to run the article is up to 2, the deadline is next week and, crucially, it has been agreed that I may talk about HVSP in [...]]]></description>
			<content:encoded><![CDATA[<p>I was asked to do a magazine article on NoSQL, where by &#8220;NoSQL&#8221; is meant &#8220;whatever they talk about at NoSQL conferences.&#8221; By now the number of publications planning to run the article is up to 2, the deadline is next week and, crucially, it has been agreed that I may talk about <a href="http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/" >HVSP</a> in general, NoSQL and SQL alike.</p>
<p>It also is understood that, realistically, I can&#8217;t be expected to know and mention the very latest news for all the many products in the categories. Even so, I think this would be a fine time to check just where NoSQL and HVSP adoption stand. Here is most of what I know, or links to same; it would be great if you guys would contribute additional data in the comment thread.</p>
<p>In the NoSQL area:  <span id="more-2840"></span></p>
<ul>
<li>Back in April, the VoltDB guys told me they thought Cassandra and HBase were the two NoSQL systems with the most momentum.</li>
<li>I know distressingly little about HBase adoption, but a source who may or may not wish to remain anonymous was kind enough to alert me that Twitter and StumbleUpon each have ~30 node deployments, for analytics and analytics/HVSP respectively.</li>
<li>I wrote in detail on <a href="http://www.dbms2.com/2010/07/06/riptano-and-cassandra-adoption/" >Cassandra adoption</a> last month. News since then includes:
<ul>
<li>Facebook is rumored to have dropped Cassandra completely.</li>
<li><a href="http://engineering.twitter.com/2010/07/cassandra-at-twitter-today.html" onclick="javascript:pageTracker._trackPageview('/engineering.twitter.com');">Twitter clarified that it may not be quite as lovestruck by Cassandra as before</a>, but they&#8217;re still very close friends.</li>
<li>It&#8217;s not obvious that the <a href="http://www.riptano.com/blog/cassandra-summit-recap" onclick="javascript:pageTracker._trackPageview('/www.riptano.com');">Cassandra Summit</a> unveiled a lot of new adoption stories.</li>
</ul>
</li>
<li>Northscale&#8217;s <a href="http://www.dbms2.com/2010/08/18/northscale-membase-roadmap/" >Membase</a> is still in its early days.  Zynga is bought in, however, as is something called NHN Korea. <em>(Edit: I subsequently saw NHN Korea on a prominent SEO expert&#8217;s list of the top half dozen or so search engines in the world. Who knew?)</em></li>
<li>Basho has listed a few <a href="http://www.basho.com/customers.html" onclick="javascript:pageTracker._trackPageview('/www.basho.com');">Riak customers</a>. If memory serves (I haven&#8217;t spoken with Basho for a while, and some of my notes are misplaced due to some computer sloppiness), Basho has a few dozen customers in total.</li>
<li>Mozilla has <a href="http://blog.mozilla.com/data/2010/08/16/benchmarking-riak-for-the-mozilla-test-pilot-project/" onclick="javascript:pageTracker._trackPageview('/blog.mozilla.com');">a 4 machine, 64 core Riak cluster</a> in production.</li>
<li><a href="http://highscalability.com/hypertable-new-bigtable-clone-runs-hdfs-or-kfs" onclick="javascript:pageTracker._trackPageview('/highscalability.com');">Hypertable</a> has a few users/project sponsors, Baidu being the biggest name among them.</li>
<li>I don&#8217;t really know how the MongoDB/10gen guys are doing. I think this is at least as much my fault as theirs. Anyhow, they seem to have <a href="http://www.10gen.com/news" onclick="javascript:pageTracker._trackPageview('/www.10gen.com');">links</a> to a couple of folks who have written about MongoDB usage.</li>
<li>NimbusDB is still in stealth mode. I&#8217;d be surprised if they had users  for a while yet, since in January they didn&#8217;t yet sound as if  development was very far underway. (Actually, I forget whether NimbusDB  is supposed to be SQL-based or not.)</li>
</ul>
<p>Among the SQL or SQL-friendly guys:</p>
<ul>
<li><a href="http://www.dbms2.com/2010/05/12/the-clustrix-story/" >Clustrix</a> says it has a few production users, some big-name, but is not disclosing them yet.</li>
<li><a href="http://www.dbms2.com/2010/07/28/dbshards/" >dbShards has around 6 customers</a>, including Facebook. (Facebook may outpace even Twitter and Zynga in using the most products mentioned in this post.)</li>
<li>As of May, <a href="http://www.dbms2.com/2010/05/25/voltdb-finally-launches/" >VoltDB</a> had one paying customer, plus 150 beta customers who weren&#8217;t in production yet.</li>
<li><a href="http://www.dbms2.com/2010/04/03/akiban-highlights/" >Akiban</a> says they&#8217;ll get me up to speed on Thursday. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
<li><a href="http://www.dbms2.com/2008/04/13/scaledb-presents-the-revenge-of-the-pointer/" >ScaleDB</a> seems to be pedaling along in perennial beta. Whether ScaleDB has any actual beta users is less clear. On the plus side, checking that out uncovered a pretty funny <a href="http://scaledb.blogspot.com/2010/04/scaledb-introduces-clustered-database.html" onclick="javascript:pageTracker._trackPageview('/scaledb.blogspot.com');">April Fool blog post</a>.</li>
<li><a href="http://www.dbms2.com/2009/07/30/groovy-corp-puts-out-a-ridiculous-press-release/" >Groovy Corporation</a> seems to have disappeared, or morphed into something called <a href="http://www.groovycorp.com/home.html" onclick="javascript:pageTracker._trackPageview('/www.groovycorp.com');">uCirrus</a>, or something like that.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/18/nosql-hvsp-adoption/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Finally confirmed: Membase has a reasonable product roadmap</title>
		<link>http://www.dbms2.com/2010/08/18/northscale-membase-roadmap/</link>
		<comments>http://www.dbms2.com/2010/08/18/northscale-membase-roadmap/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 09:37:55 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Northscale]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2830</guid>
		<description><![CDATA[On my recent trip to California, neither I nor my clients at Northscale covered ourselves in meeting-arranging glory. Still, from the rushed 30 minute meeting we did wind up having, I finally came away feeling good about Membase&#8217;s product direction.
To review, Membase is a reasonably elastic persistent data store, sporting the memcached API, making memcached/Membase [...]]]></description>
			<content:encoded><![CDATA[<p>On my recent trip to California, neither I nor my clients at Northscale covered ourselves in meeting-arranging glory. Still, from the rushed 30 minute meeting we did wind up having, I finally came away feeling good about Membase&#8217;s product direction.</p>
<p>To review, Membase is a reasonably elastic persistent data store, sporting the memcached API, making memcached/Membase an attractive alternative to memcached/sharded MySQL. As of now, Membase is a pure key-value store.</p>
<p>Northscale defends pure key-value stores by arguing, in effect:  <span id="more-2830"></span></p>
<ul>
<li>You can do a lot with entity-attribute-value triples.</li>
<li>If your key looks like an entity-attribute concatenation, then  your entity-attribute-value triple can be transformed into a key-value  pair.</li>
</ul>
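<p>To make the key-concatenation argument concrete, here is a minimal sketch. It assumes a <code>:</code> separator and uses a plain Python dict as a stand-in for the store; the function names are hypothetical and are not part of any Membase or memcached client.</p>

```python
# Hypothetical sketch: flatten entity-attribute-value triples into key-value pairs.
# The ":" separator and the dict standing in for the store are assumptions.

SEP = ":"

def make_key(entity, attribute):
    """Concatenate entity and attribute into a single key."""
    if SEP in entity:
        raise ValueError("entity must not contain the separator")
    return entity + SEP + attribute

def put_triple(store, entity, attribute, value):
    """Store one (entity, attribute, value) triple as a key-value pair."""
    store[make_key(entity, attribute)] = value

def get_value(store, entity, attribute):
    """Fetch the value for one entity-attribute pair, or None if absent."""
    return store.get(make_key(entity, attribute))

store = {}
put_triple(store, "user42", "name", "Alice")
put_triple(store, "user42", "city", "Boston")
put_triple(store, "user7", "name", "Bob")
```

<p>Point lookups work fine this way. What the sketch does not give you is a way to ask for every attribute of <code>user42</code> without already knowing the attribute names, which is one example of where a slightly richer model would help.</p>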
<p>Northscale has a point. Still, I think that in most use cases you&#8217;ll want a data model and/or data access methods that are at least a little richer than pure entity-attribute-value.</p>
<p>Fortunately, that&#8217;s the direction Northscale is taking Membase. I don&#8217;t get the impression that the details have been worked out yet, but the general idea is:</p>
<ul>
<li>Northscale is putting a publish-subscribe interface into Membase it calls &#8220;tap,&#8221; useful for replication, node rebalancing, etc.</li>
<li>Tap will also serve to connect Membase data to a Membase feature Northscale calls &#8220;Node Code,&#8221; which will be code that runs in a separate process on each Membase node.</li>
<li>Node Code will include things like:
<ul>
<li>Language run-times</li>
<li>Standard libraries for things like 	index-building</li>
</ul>
</li>
</ul>
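<p>As a rough illustration of the general idea (a change feed driving index-building code), here is a sketch. Every class and method name in it is made up for illustration; none of this is the actual tap or Node Code interface, whose details are not worked out in this post. It also runs the subscriber in-process for brevity, whereas Node Code is described above as running in a separate process.</p>

```python
# Hypothetical sketch; none of these names are Membase or Northscale APIs.
from collections import defaultdict

class TinyStore:
    """In-memory key-value store that publishes every write to subscribers."""
    def __init__(self):
        self._data = {}
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callback invoked as callback(key, old_value, new_value)."""
        self._subscribers.append(callback)

    def set(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        for callback in self._subscribers:
            callback(key, old, value)

    def get(self, key):
        return self._data.get(key)

class ValueIndex:
    """Secondary index from a value to the set of keys currently holding it."""
    def __init__(self):
        self.keys_by_value = defaultdict(set)

    def on_change(self, key, old, new):
        if old is not None:
            self.keys_by_value[old].discard(key)
        self.keys_by_value[new].add(key)

store = TinyStore()
index = ValueIndex()
store.subscribe(index.on_change)
store.set("user42:city", "Boston")
store.set("user7:city", "Boston")
store.set("user42:city", "Austin")
```

<p>After those three writes, the index can answer &#8220;which keys hold Boston?&#8221; without scanning the store, which is the kind of access a pure key-value interface lacks.</p>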
<p>Will Membase Node Code be a close substitute for relational DBMS functionality, or even the <a href="http://www.dbms2.com/2010/07/06/cassandra-technical-overview/" >Cassandra</a> architecture? I doubt it, especially at first. But at least it will keep Membase developers from getting locked in to a very simple and restrictive data management paradigm.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/18/northscale-membase-roadmap/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>DB2 workload management</title>
		<link>http://www.dbms2.com/2010/08/18/ibm-db2-workload-management/</link>
		<comments>http://www.dbms2.com/2010/08/18/ibm-db2-workload-management/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 08:47:09 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2819</guid>
		<description><![CDATA[DB2 has added a lot of workload management features in recent releases. So when we talked Tuesday afternoon, Tim Vincent and I didn&#8217;t bother going through every one. Even so, we covered some interesting subjects in the area of DB2 workload management, including:  

If your goal is to keep a certain 	class of queries from [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><a href="../2009/04/24/some-db2-highlights/">DB2 has added a lot of workload management features in recent releases</a>. So when we talked Tuesday afternoon, Tim Vincent and I didn&#8217;t bother going through every one. Even so, we covered some interesting subjects in the area of DB2 workload management, including:  <span id="more-2819"></span></p>
<ul>
<li>If your goal is to keep a certain 	class of queries from taking too many resources, Tim thinks a great 	way of doing that is to control how many of them are allowed to run 	concurrently.</li>
<li>By way of contrast, Tim is 	cautious about the common approach of just lowering a query&#8217;s 	priority. His concern is that a long-running query could linger even 	longer, creating a long-lasting bottleneck in, for example, <a href="http://www.dbms2.com/2010/08/18/more-on-temp-space-compression-and-random-io/" >temp 	space</a>.</li>
<li>When running over (I believe) 	Linux and AIX, DB2 workload management is integrated with operating 	system workload management. I.e., the same “service class” or 	“workload class” (at a guess, the former is the official term 	and the latter is the term that makes sense) of queries and 	associated processes gets the same treatment in both DB2 and the OS.</li>
<li>DB2&#8217;s workload management extends 	to buffer pools, to inhibit low-priority queries from evicting a 	higher-priority query&#8217;s data from cache.</li>
<li>Sometimes, workload management doesn&#8217;t throttle a query, but just decides to collect stats for future analysis. (This is on the eminently reasonable theory that the best stats to collect are the ones that are live when performance problems are actually occurring.)</li>
</ul>
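<p>The first point, capping how many queries of a class run at once, can be sketched generically. This is not DB2 syntax or a DB2 API; the <code>ConcurrencyGate</code> class and the fake query are assumptions made purely to show the mechanism.</p>

```python
# Hypothetical sketch of concurrency-based throttling; not DB2's implementation.
import threading
import time

class ConcurrencyGate:
    """Admit at most `limit` queries of one class at a time; the rest wait."""
    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0

    def run(self, query_fn):
        with self._slots:  # blocks while `limit` queries are already running
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return query_fn()
            finally:
                with self._lock:
                    self.running -= 1

def fake_heavy_query():
    time.sleep(0.01)  # stand-in for real work
    return "done"

gate = ConcurrencyGate(limit=2)
threads = [threading.Thread(target=gate.run, args=(fake_heavy_query,))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

<p>Eight queries are submitted, but no more than two ever hold resources at the same moment. Each admitted query runs at full speed and then gets out of the way, which is the contrast with lowering priority and letting a query linger.</p>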
<p style="margin-bottom: 0in;">Finally, Tim spoke of what I regard as the weirdest workload management requirement, one I also heard about from <a href="http://www.dbms2.com/2009/07/18/netezza-on-concurrency-and-workload-management/" >Netezza</a> <span style="font-style: normal;">(but didn&#8217;t explicitly mention) in</span> June. Sometimes, it seems, you simply don&#8217;t want queries to finish too fast. Why? Because if you give great performance when the machine is lightly loaded, then business users might expect that performance too when the machine is heavily loaded and you can&#8217;t deliver it. Apparently, in some environments it&#8217;s better to never deliver great query performance than it is to do so only inconsistently.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/18/ibm-db2-workload-management/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>More on temp space, compression, and &#8220;random&#8221; I/O</title>
		<link>http://www.dbms2.com/2010/08/18/more-on-temp-space-compression-and-random-io/</link>
		<comments>http://www.dbms2.com/2010/08/18/more-on-temp-space-compression-and-random-io/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 05:44:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2805</guid>
		<description><![CDATA[My PhD was in a probability-related area of mathematics (game theory), so I tend to squirm when something is described as &#8220;random&#8221; that clearly is not. That said, a comment by Shilpa Lawande on our recent Flash/temp space discussion suggests the following way of framing a key point:

You really, really want to have multiple data [...]]]></description>
			<content:encoded><![CDATA[<p>My PhD was in a probability-related area of mathematics (game theory), so I tend to squirm when something is described as &#8220;random&#8221; that clearly is not. That said, <a href="http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/#comment-181134" >a comment by Shilpa Lawande</a> on our recent <a href="http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/" >Flash/temp space discussion</a> suggests the following way of framing a key point:</p>
<ul>
<li>You really, really want to have multiple data streams coming out of temp space, as close to simultaneously as possible.</li>
<li>The storage performance characteristics of such a workload are more reminiscent of &#8220;random&#8221; than &#8220;sequential&#8221; I/O.</li>
</ul>
<p>If everybody else is cool with it too, I can live with that. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Meanwhile, I talked again with Tim Vincent of IBM this afternoon. Tim endorsed the temp space/Flash fit, but with a different emphasis, which upon review I find I don&#8217;t really understand. The idea is:</p>
<ul>
<li>Analytic DBMS processing generally stresses reads over writes.</li>
<li>Temp space is an exception &#8212; read and write use of temp space is pretty balanced. (You spool data out once, you read it back in once, and that&#8217;s the end of that; next time it will be overwritten.)</li>
</ul>
<p>My problem with that is: Flash typically has lower write than read IOPS (I/O per second), so being (relatively) write-intensive would, to a first approximation, seem if anything to disfavor a workload for Flash.</p>
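<p>A back-of-the-envelope calculation shows the shape of that concern. The device figures below are made up, and the model is deliberately crude: it assumes operations are served one after another, so the average time per operation is the mix-weighted sum of the read and write service times.</p>

```python
# Hypothetical arithmetic sketch; the IOPS figures are invented for illustration.

def blended_iops(read_iops, write_iops, read_fraction):
    """Operations per second for a fixed read/write mix on one device.

    Average time per operation is the mix-weighted sum of 1/IOPS for each kind.
    """
    write_fraction = 1.0 - read_fraction
    seconds_per_op = read_fraction / read_iops + write_fraction / write_iops
    return 1.0 / seconds_per_op

READ_IOPS = 100_000   # assumed
WRITE_IOPS = 50_000   # assumed

read_heavy = blended_iops(READ_IOPS, WRITE_IOPS, read_fraction=0.9)
balanced = blended_iops(READ_IOPS, WRITE_IOPS, read_fraction=0.5)
```

<p>Under those assumptions the read-heavy mix lands near 91,000 operations per second and the balanced mix near 67,000, so a balanced workload gets less out of the same device. That says nothing about how either mix would fare on spinning disk, which is the comparison that actually matters.</p>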
<p>On the plus side, I was reminded of something I should have noted when I wrote about <a href="http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/" >DB2 compression</a> before:</p>
<p>Much like Vertica, <strong>DB2 operates on compressed data all the way through, including in temp space. </strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/18/more-on-temp-space-compression-and-random-io/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Vertica&#8217;s innovative architecture for Flash, plus more about temp space than you perhaps wanted to know</title>
		<link>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/</link>
		<comments>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/#comments</comments>
		<pubDate>Mon, 16 Aug 2010 08:07:33 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2788</guid>
		<description><![CDATA[Vertica is announcing:

Technology it already has 	released*, but has not published any reference architectures 	for
A 	Barney partnership**

In other words, Vertica has succumbed to the common delusion that it&#8217;s a good idea to put out half-baked press releases the week of TDWI conferences. But if we look past that kind of all-too-common nonsense, Vertica is highlighting [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Vertica is announcing:</p>
<ul>
<li>Technology it already has 	released*, but has not published any reference architectures 	for</li>
<li><span style="font-style: normal;">A 	<a href="http://www.strategicmessaging.com/barney-partnerships/2010/08/12/" onclick="javascript:pageTracker._trackPageview('/www.strategicmessaging.com');">Barney</a> partnership**</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">In other words, Vertica has succumbed to the common delusion that it&#8217;s a good idea to put out half-baked press releases the week of TDWI conferences. </span>But if we look past that kind of all-too-common nonsense, <span style="font-weight: normal;">Vertica is highlighting an interesting technical story, about </span><strong>how the analytic DBMS industry can exploit solid-state memory technology.</strong></p>
<p style="margin-bottom: 0in;"><em>*Upgrades to <a href="../2009/08/04/flexstore-and-the-rest-of-vertica-35/">Vertica FlexStore</a> to handle Flash memory, actually released as part of <a href="../2010/02/22/vertica-4/">Vertica 4.0</a></em></p>
<p style="margin-bottom: 0in;"><em>** With Fusion-io</em></p>
<p style="margin-bottom: 0in;">To set the context, let&#8217;s recall a few points I&#8217;ve noted in the past:</p>
<ul>
<li><a href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Solid-state 	memory&#8217;s price/throughput tradeoffs obviously make it the future of 	database storage</a>.</li>
<li><a href="../2010/06/25/flash-is-coming-well/">The 	Flash future is coming soon</a>, in part because Flash&#8217;s propensity 	to wear out is overstated. This is especially true in the case of 	modern analytic DBMS, which tend to write to blocks all at once, and 	most particularly the case for append-only systems such as Vertica.</li>
<li><a href="../2010/08/12/teradata-future-product-strategy/">Being 	able to intelligently split databases among various cost tiers of 	storage – e.g. Flash and disk – makes a whole lot of sense</a>.</li>
</ul>
<p style="margin-bottom: 0in;">Taken together, those points tell us:</p>
<p style="margin-bottom: 0in;"><strong>For optimal price/performance, analytic DBMS should support databases that run part on Flash, part on disk.</strong></p>
<p style="margin-bottom: 0in;">While all this is a future for some other analytic DBMS vendors, Vertica is shipping it today.* What&#8217;s more, three aspects of Vertica&#8217;s architecture make it particularly well-suited for hybrid Flash/disk storage, in each case for a similar reason – you can get most of the performance benefit of all-Flash for a relatively low actual investment in Flash chips:  <span id="more-2788"></span></p>
<ul>
<li><strong>Vertica lets you split tables 	by column, </strong><span style="font-weight: normal;">and Vertica 	FlexStore is versatile enough to let you put only the most-used 	columns in Flash. (Vertica offers a figure that 85% of usage calls 	on only 15% of columns, but I don&#8217;t know how rigorously grounded 	those numbers are.)</span></li>
<li>To the extent that Vertica data is <a href="../2008/09/24/vertica-finally-spells-out-its-compression-claims/">more compressed</a> than many of Vertica&#8217;s competitors&#8217; (which it probably is, debates over the magnitude of Vertica&#8217;s advantage notwithstanding), the total storage-hardware cost of sticking stuff in Flash is less when you use Vertica than with other systems.</li>
<li>Vertica has <span style="font-weight: normal;">relatively 	less need for </span><strong>temp space</strong> than some other systems. 	(Vertica uses figures of &lt;20% of total storage, vs. 30%+ for some 	other systems.) If you want to use Flash for temp space, so as to 	accelerate your toughest queries, that can save you some cash …</li>
<li>… and by the way, <strong>temp space 	is an especially good use of Flash, </strong>because <strong>temp space is 	accessed in a less sequential manner than data storage is.</strong></li>
</ul>
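<p style="margin-bottom: 0in;">The column-placement point can be illustrated with a small sketch. It assumes you know each column&#8217;s size and access count, and it uses a simple greedy rule (most accesses per byte first). The column names and numbers are invented, and this is not how FlexStore decides placement, which the post does not describe.</p>

```python
# Hypothetical sketch of picking columns for a limited Flash budget.

def pick_flash_columns(columns, flash_budget_bytes):
    """Greedy choice: hottest columns per byte first, until the budget is spent.

    `columns` maps column name -> (size_bytes, accesses).
    Returns (chosen_names, bytes_used).
    """
    ranked = sorted(columns.items(),
                    key=lambda item: item[1][1] / item[1][0],
                    reverse=True)
    chosen, used = [], 0
    for name, (size, _accesses) in ranked:
        if used + size > flash_budget_bytes:
            continue  # does not fit; try the next, smaller candidate
        chosen.append(name)
        used += size
    return chosen, used

# Invented figures: (size in GB, number of accesses).
columns = {
    "order_date": (10, 500),
    "customer_id": (20, 400),
    "amount": (10, 300),
    "address": (100, 10),
    "comment": (200, 5),
}

chosen, used = pick_flash_columns(columns, flash_budget_bytes=40)
flash_accesses = sum(columns[name][1] for name in chosen)
total_accesses = sum(accesses for _size, accesses in columns.values())
```

<p style="margin-bottom: 0in;">With these made-up numbers, three columns occupying 40 of 340 GB capture 1,200 of 1,215 accesses. Real workloads will be less tidy, but it shows why column-level placement can buy most of the benefit for a fraction of the Flash.</p>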
<p style="margin-bottom: 0in;">The least obvious of those points are about temp space; I only understood the particulars when Vertica development chief Shilpa Lawande explained them to me Thursday.</p>
<p style="margin-bottom: 0in;"><em>* At least in theory; customer adoption may be a different matter.</em></p>
<p style="margin-bottom: 0in;">But before drilling down on temp space, let me first note that there&#8217;s one offsetting factor to all those “We need somewhat less Flash than the other guys” Vertica advantages. Like all serious databases, a Vertica installation keeps two or more copies of all data, so that there&#8217;s no single point of failure in storage. In a flexible system like Vertica, you can put one copy on Flash and one on disk. But if you do that in Vertica, you forgo fully exploiting one possible benefit of Vertica&#8217;s architecture – the ability to store different copies of a column in different orders, which are beneficial for accelerating different groups of queries.*</p>
<p style="margin-bottom: 0in;"><em>*More precisely, you don&#8217;t get the full benefits of Flash acceleration for every query touching those columns.</em></p>
<p style="margin-bottom: 0in;">OK. Back to temp space. There are four kinds of things you can put in storage if you&#8217;re running a database management system:</p>
<ul>
<li>The <strong>software</strong> itself.</li>
<li><span style="font-weight: normal;">Persistent </span><strong>data. </strong><span style="font-weight: normal;">(I.e., tables, 	if the DBMS you&#8217;re running is relational.)</span></li>
<li><strong>Metadata,</strong> especially the 	kind that lets you find data &#8211;<strong> indexes,</strong> zone maps, catalogs, 	etc.</li>
<li><strong>Temporary data constructs</strong> built as part of, say, a <span style="font-weight: normal;">sort-merge join. These, by definition, are what populate temp space.</span></li>
</ul>
<p style="margin-bottom: 0in;">Just to be clear, those constructs are NOT temporary tables of the sort created by, say, MicroStrategy; such tables are handled like any other data. Rather, they are ephemeral creations <span style="font-weight: normal;">and, so far as I can tell, not tables at all. </span></p>
<p style="margin-bottom: 0in;">Vertica offered two theories as to why its DBMS requires less temp space than competitors do:</p>
<ul>
<li>To the extent data is decompressed before being operated on in memory by the DBMS, that decompression would of course apply to temp space as well. Vertica prides itself on <strong>keeping data compressed</strong> all the way through, and seems to get away with smaller temp space allocations as a benefit.</li>
<li>Since Vertica can store columns in 	expedient sort orders, it does less sorting overall, and sorting is 	a big use of temp space.</li>
</ul>
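<p style="margin-bottom: 0in;">Both theories can be illustrated with one toy example. It uses run-length encoding as a stand-in for &#8220;compression,&#8221; which is an assumption made for simplicity rather than a claim about which encodings Vertica applies where. The sketch shows that a sorted column encodes into fewer runs, and that an aggregate can be computed on the runs without expanding them.</p>

```python
# Hypothetical sketch: run-length encoding as a stand-in for column compression.
from itertools import groupby

def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

def count_by_value(runs):
    """Group-by count computed directly on the runs, without expanding them."""
    counts = {}
    for value, length in runs:
        counts[value] = counts.get(value, 0) + length
    return counts

unsorted_col = ["NY", "CA", "NY", "TX", "CA", "NY"]
sorted_col = sorted(unsorted_col)

runs_unsorted = rle_encode(unsorted_col)  # six runs of length 1
runs_sorted = rle_encode(sorted_col)      # three runs
```

<p style="margin-bottom: 0in;">The sorted column needs three runs where the unsorted one needs six, and <code>count_by_value</code> gives the same answer on either. An intermediate result that stays in this form takes less room wherever it is spooled, temp space included.</p>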
<p style="margin-bottom: 0in;">Obviously, no matter which DBMS you use, the amount of temp space you need is surely workload-dependent. Even so, Vertica&#8217;s claim to something of an advantage seems legit.</p>
<p style="margin-bottom: 0in;"><em>Truth be told, I&#8217;m not convinced the savings involved are great enough to </em>matter<em> a whole lot – but it&#8217;s a fun subject to think through. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></p>
<p style="margin-bottom: 0in;">And finally: One of my biggest surprises since starting to look at analytic-DBMS-on-Flash has been the centrality of temp space. Talking to Vertica Thursday, I finally uncovered a key reason why: <strong>Temp space tends to be accessed via multiple streams of data at once.</strong> I&#8217;m still struggling with WHY that is true, with two reasons suggested being:</p>
<ul>
<li>Temp space can be accessed by 	multiple operations at once. (But isn&#8217;t that also true of the rest 	of storage?)</li>
<li>Merge sorts, a common use of temp 	space, read multiple streams of data. (Couldn&#8217;t you tweak your 	software to make that not be true?)</li>
</ul>
<p style="margin-bottom: 0in;">But if we grant that temp space naturally is accessed in multiple places at once – well, that&#8217;s a lot like random I/O, and <a href="../2005/11/13/breaking-the-disk-speed-barrier/">if you&#8217;re doing a lot of random reads, you&#8217;d love to use something other than spinning disk</a>.</p>
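<p style="margin-bottom: 0in;">The merge-sort case is easy to see in miniature. The sketch below merges three sorted runs (imagine each one spooled to a different region of temp space) and records which run each output value was read from. The run contents and the bookkeeping are illustrative assumptions, not any vendor&#8217;s implementation.</p>

```python
# Hypothetical sketch: a k-way merge reads from several sorted runs in turn.
import heapq

def merge_runs(runs):
    """Merge sorted runs; also record which run each output item came from."""
    tagged = [[(value, run_id) for value in run]
              for run_id, run in enumerate(runs)]
    merged, read_order = [], []
    for value, run_id in heapq.merge(*tagged):
        merged.append(value)
        read_order.append(run_id)
    return merged, read_order

runs = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
merged, read_order = merge_runs(runs)

# Count how often consecutive reads come from different runs.
switches = sum(1 for a, b in zip(read_order, read_order[1:]) if a != b)
```

<p style="margin-bottom: 0in;">In this worst case every consecutive read comes from a different run, so the reader hops between three locations for the whole merge. Reading each run in larger buffered chunks would cut down the hopping, at the cost of memory, but the merge still has to consume all the runs at once.</p>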
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Links and observations</title>
		<link>http://www.dbms2.com/2010/08/09/links-and-observations/</link>
		<comments>http://www.dbms2.com/2010/08/09/links-and-observations/#comments</comments>
		<pubDate>Tue, 10 Aug 2010 02:37:51 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Calpont]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Northscale]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[XtremeData]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2743</guid>
		<description><![CDATA[I&#8217;m back from a trip to the SF Bay area, with a lot of writing ahead of me. I&#8217;ll dive in with some quick comments here, then write at greater length about some of these points when I can. From my trip:  

Aster Data showed me a lot of customer names and deal sizes, across [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m back from a trip to the SF Bay area, with a lot of writing ahead of me. I&#8217;ll dive in with some quick comments here, then write at greater length about some of these points when I can. From my trip:  <span id="more-2743"></span></p>
<ul>
<li>Aster Data showed me a lot of customer names and deal sizes, across a bunch of industries (mainly enterprise rather than web). Yes, Aster&#8217;s market success is for real. (But almost all those details are NDA.)</li>
<li>Sybase&#8217;s product plans for IQ are pretty impressive. (But the most interesting parts are, you guessed it, NDA.)</li>
<li>I&#8217;ve kissed and made up* with ParAccel, now that they&#8217;ve replaced their CEO, replaced their marketing chief, and stopped the worst of the <a href="http://www.dbms2.com/2010/01/15/there-sure-seem-to-be-a-lot-of-inaccuracies-on-paraccels-website/" >marketing</a> <a href="http://www.dbms2.com/2009/06/22/the-tpc-h-benchmark-is-a-blight-upon-the-industry/" >nonsense</a> I used to complain about. ParAccel has some interesting plans for ParAccel 3.0 which are, naturally, NDA.</li>
<li>The PeopleSoft guys are doing it over again at Workday. Only this time, their platform isn&#8217;t a relational DBMS. Rather, it&#8217;s an in-memory, completely object-oriented data model, with disk used only on a &#8220;Just in case the power ever goes out&#8221; basis. (Thankfully, nothing at all about our conversation was NDA.)</li>
<li>I&#8217;m finally feeling good about <a href="#">Northscale&#8217;s memcached-compatible persistent store Membase</a>. The main reason is that they showed me a near-term path to interfaces that are richer than key-value. Also, Todd Hoff reassured me that even pure persistent memcached has a place.</li>
<li>Rumor says that even the one app for which Facebook was using Cassandra &#8212; in-box search &#8212; has been decommissioned. On the other hand, numerous other scale-out DBMS (SQL or otherwise) seem to have Facebook footholds. But details are &#8212; all together now! &#8212; NDA.</li>
</ul>
<p><em>*If you know ParAccel&#8217;s new marketing chief Michael Weir, you surely guessed I mean that only in a figurative sense.</em></p>
<p>From elsewhere:</p>
<ul>
<li>Daniel Abadi offered <a href="http://dbmsmusings.blogspot.com/2010/08/thoughts-on-kickfires-apparent-demise.html" onclick="javascript:pageTracker._trackPageview('/dbmsmusings.blogspot.com');">his  analysis</a> of <a href="../2010/07/27/kickfire-unlikely-to-survive/">Kickfire&#8217;s  demise</a>. In general I agree, but Daniel neglected to mention one  hugely important factor &#8212; the chicken-egg negative effect of Kickfire&#8217;s  lack of market or marketing traction. Customers were extremely reluctant to buy from Kickfire  because they perceived, correctly, that Kickfire&#8217;s survivability was far  from assured.</li>
<li>While the <a href="http://infinidb.org/community/forums/11-general-infinidb/1000-strange-issue-with-drop-table" onclick="javascript:pageTracker._trackPageview('/infinidb.org');">InfiniDB forums</a> suggest that there are at least a couple of production users of Calpont&#8217;s free InfiniDB, Calpont seemingly has a long way to go to be even as successful as Kickfire. But Calpont does have a bit of money to spend on lead generation; maybe some day they&#8217;ll even have actual customers.</li>
<li>In a response to a question I messaged over, <a href="http://www.dbms2.com/2010/03/18/xtremedata-update/" >XtremeData</a> tells me they have actual customers now. Press releases to follow.</li>
<li>The <a href="http://news.cnet.com/8301-31021_3-20013111-260.html?part=rss&amp;subj=news&amp;tag=2547-1_3-0-20" onclick="javascript:pageTracker._trackPageview('/news.cnet.com');">admiration for the job Mark Hurd did at HP</a> is in my opinion overstated. Sure, the financial/operational management appeared to work, but HP did little on Hurd&#8217;s watch to strengthen its reputation or customers&#8217; loyalty. In particular:
<ul>
<li>HP&#8217;s analytics efforts have accomplished little.</li>
<li>HP&#8217;s data warehouse appliance efforts have failed pathetically.</li>
<li>From what I hear, HP&#8217;s execution in its Exadata partnership was not good.</li>
<li>HP&#8217;s server business in general is distinguished mainly by HP being a big company.</li>
<li>HP&#8217;s EDS acquisition has been rocky, not that EDS was sailing so smoothly on its own beforehand.</li>
<li>HP&#8217;s success in PCs amounts to &#8220;arguably, HP sucks a little less than the other guys&#8221;.</li>
<li>HP&#8217;s elite reputation is long gone (admittedly, for the most part that predates Hurd).</li>
</ul>
</li>
<li><a href="http://intelligent-enterprise.informationweek.com/blog/archives/2010/08/software_innova.html" onclick="javascript:pageTracker._trackPageview('/intelligent-enterprise.informationweek.com');">Doug Henschen</a> evidently favors really strong intellectual property protection for software, even forbidding plug-compatible reverse engineering. I agree with Doug up to the point that <a href="http://www.monashreport.com/2010/07/19/my-view-of-intellectual-property/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">it should be forbidden to copy proprietary software</a>, but I don&#8217;t see why he (or a court) would view such behavior as copying.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/09/links-and-observations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nested data structures keep coming up, especially for log files</title>
		<link>http://www.dbms2.com/2010/07/31/nested-data-structures-keep-coming-up-especially-for-log-files/</link>
		<comments>http://www.dbms2.com/2010/07/31/nested-data-structures-keep-coming-up-especially-for-log-files/#comments</comments>
		<pubDate>Sat, 31 Jul 2010 10:42:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[eBay]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2723</guid>
		<description><![CDATA[Nested data structures have come up several times now, almost always in the context of log files.

Google has published about a project called Dremel. Per Tasso Argyros, one of Dremel&#8217;s key concepts is nested data structures.
Those arrays that the XLDB/SciDB folks keep talking about are meant to be nested data structures. Scientific data is of [...]]]></description>
			<content:encoded><![CDATA[<p>Nested data structures have come up several times now, almost always in the context of log files.</p>
<ul>
<li>Google has published about a project called <a href="http://www.asterdata.com/blog/index.php/2010/07/19/google%E2%80%99s-dremel-%E2%80%93-or-can-mapreduce-itself-handle-fast-interactive-querying/" onclick="javascript:pageTracker._trackPageview('/www.asterdata.com');">Dremel</a>. Per Tasso Argyros, one of Dremel&#8217;s key concepts is nested data structures.</li>
<li>Those <a href="http://www.dbms2.com/2009/10/03/issues-in-scientific-data-management/" >arrays</a> that the XLDB/SciDB folks keep talking about are meant to be nested data structures. Scientific data is of course log-oriented. <a href="http://www.dbms2.com/2010/05/22/scidb-and-scientific-database-management/" >eBay was very interested in that project too</a>.</li>
<li>Facebook&#8217;s log files have a big nested data structure flavor.</li>
</ul>
<p>I don&#8217;t have a grasp yet on what exactly is happening here, but it&#8217;s something.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/31/nested-data-structures-keep-coming-up-especially-for-log-files/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
