<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Amazon and its cloud</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/amazon-simpledb-s3-ec2/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:21:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Quick thoughts on Oracle-on-Amazon</title>
		<link>http://www.dbms2.com/2011/05/24/quick-thoughts-on-oracle-on-amazon/</link>
		<comments>http://www.dbms2.com/2011/05/24/quick-thoughts-on-oracle-on-amazon/#comments</comments>
		<pubDate>Tue, 24 May 2011 13:16:32 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4557</guid>
		<description><![CDATA[Amazon has a page up for what it calls Amazon RDS for Oracle Database. You can rent Amazon instances suitable for running Oracle, and bring your own license (BYOL), or you can rent a &#8220;License Included&#8221; instance that includes Oracle Standard Edition One (a cheap version of Oracle that is limited to two sockets). My [...]]]></description>
			<content:encoded><![CDATA[<p>Amazon has a page up for what it calls <a href="http://aws.amazon.com/rds/oracle/">Amazon RDS for Oracle Database</a>. You can rent Amazon instances suitable for running Oracle, and bring your own license (BYOL), or you can rent a &#8220;License Included&#8221; instance that includes Oracle Standard Edition One (a cheap version of Oracle that is limited to two sockets).</p>
<p>My quick thoughts start:</p>
<ul>
<li>Mainly, this isn&#8217;t for production usage. But exceptions might arise when:
<ul>
<li>An application, from creation to abandonment, is expected to have only a short lifespan, in support of a specific project.</li>
<li>There is an extreme internal-politics bias toward operating rather than capital expenses, or something like that, forcing a user department into cloud production deployment even when it doesn&#8217;t make much rational sense.</li>
<li>An application is small enough, or the situation is sufficiently desperate, that any inefficiencies are outweighed by convenience.</li>
</ul>
</li>
<li>There is non-production appeal. In particular:
<ul>
<li>Spinning up a quick cloud instance can make a lot of sense for a developer.</li>
<li>The same goes if you want to sell an Oracle-based application and need to offer demo/test capabilities.</li>
<li>The same might go for off-site replication/disaster recovery.</li>
</ul>
</li>
</ul>
<p>Of course, those are all standard observations every time something that&#8217;s basically on-premises software is offered in the cloud. They&#8217;re only reinforced by the fact that the only Oracle software Amazon can actually license you is a particularly low-end edition.</p>
<p>And Oracle is indeed on-premises software. In particular, Oracle is hard enough to manage when it&#8217;s on your premises, with a known hardware configuration; who would want to try to manage a production instance of Oracle in the cloud?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/05/24/quick-thoughts-on-oracle-on-amazon/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Cassandra technical overview</title>
		<link>http://www.dbms2.com/2010/07/06/cassandra-technical-overview/</link>
		<comments>http://www.dbms2.com/2010/07/06/cassandra-technical-overview/#comments</comments>
		<pubDate>Tue, 06 Jul 2010 09:10:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2473</guid>
		<description><![CDATA[Back in March, I talked with Jonathan Ellis of Rackspace, who runs the Apache Cassandra project. I started drafting a blog post then, but never put it up. Then Jonathan cofounded Riptano, a company to commercialize Cassandra, and so I talked with him again in May. Well, I&#8217;m finally finding time to clear my Cassandra/Riptano [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Back in March, I talked with Jonathan Ellis of Rackspace, who runs the Apache Cassandra project. I started drafting a blog post then, but never put it up. Then Jonathan cofounded Riptano, a company to commercialize Cassandra, and so I talked with him again in May. Well, I&#8217;m finally finding time to clear my Cassandra/Riptano backlog. I&#8217;ll cover the more technical parts below, and the more business- or usage-oriented ones in <a href="http://www.dbms2.com/2010/07/06/riptano-and-cassandra-adoption/">a companion Cassandra/Riptano post</a>.</p>
<p style="margin-bottom: 0in;">Jonathan&#8217;s core claims for Cassandra include:</p>
<ul>
<li>Cassandra is shared-nothing.</li>
<li>Cassandra has good approaches to replication and partitioning, right out of the box.</li>
<li>In particular, Cassandra is good for use cases that distribute a database around the world and want to access it at “local” latencies. (Indeed, Jonathan asserts that non-local replication is a significant non-big-data Cassandra use case.)</li>
<li>Cassandra&#8217;s scale-out is application-transparent, unlike sharded MySQL&#8217;s.</li>
<li>Cassandra is fast at both appends and range queries, which would be hard to accomplish in a pure key-value store.</li>
</ul>
<p style="margin-bottom: 0in;">In general, Jonathan positions Cassandra as being best-suited to handle a small number of operations at high volume, throughput, and speed. The rest of what you do, as far as he&#8217;s concerned, may well belong in a more traditional SQL DBMS.  <span id="more-2473"></span></p>
<p style="margin-bottom: 0in;">Further highlights of our talks included, as best I understood them:</p>
<ul>
<li>Cassandra is based partly on Google&#8217;s <strong>BigTable</strong> paper of 2006 and partly on Amazon&#8217;s <strong>Dynamo</strong> paper of 2007.
<ul>
<li>The core of what Cassandra takes from BigTable is based on <strong>log-structured merge trees,</strong> which actually entered the computer science literature in 1996.</li>
<li>Cassandra&#8217;s approach to horizontal scaling, replication, failover, etc. seems to be based on Dynamo.</li>
</ul>
</li>
<li>There seems to be <strong>a logical concept of “row”</strong> in Cassandra, or it&#8217;s at least meaningful to use the SQL/relational concept of a “row” when talking about Cassandra data. However, Cassandra is closer to being a <strong>column-based data store</strong> than a row-based one. (Not the same thing, but closer.)</li>
<li>Even so, it only takes a single seek to return a whole Cassandra “row”.</li>
<li>Cassandra writes data quite differently from the way a classical OLTP DBMS would.
<ul>
<li><strong>Cassandra writes just the data elements</strong><span style="font-weight: normal;"> – i.e., fields – </span><strong>that are actually being inserted or changed,</strong> not whole rows.</li>
<li>One benefit is that Cassandra data is very <strong>sparse.</strong> NULLs aren&#8217;t stored in any way, and hence in particular take up no space.</li>
<li>Another benefit – and one of the core concepts of Cassandra – is that <strong>you can implicitly assume different schemas for different rows of the same “table.”</strong> In particular, you can add data for columns that you didn&#8217;t envision when you first started storing “rows” of the same “table.”</li>
<li><strong>Writes are collected into sorted “memtables,” which from time to time are sent to disk.</strong> Once data gets to disk, it&#8217;s <strong>immutable,</strong> except for occasional merge/reorganization/garbage collection.
<ul>
<li>Jonathan claims, plausibly, that this makes write throughput very fast (because the I/O is fundamentally sequential in nature).</li>
<li>The default rule for how long data stays in memory before it gets persisted to disk is “whichever comes first of {64 MB written, 300k updates, 1 hour}.”</li>
</ul>
</li>
<li>Cassandra has <strong>durability</strong> – guaranteed non-loss of data – assuming fsync is turned on. fsync seems to add 15% or so of overhead.</li>
<li>However, Cassandra has <strong>no concept of a “transaction.”</strong></li>
<li>As one would expect, data can be read even before it has been persisted to disk.</li>
</ul>
</li>
<li>According to Jonathan, Cassandra can do about 14,000 writes or 7,000 reads per second on a quad-core server.
<ul>
<li>Those figures scale pretty linearly with the number of servers. (There&#8217;s some overhead for network latency.)</li>
<li>Those figures assume a five-column row.</li>
<li>Cassandra&#8217;s write-performance figures are only “mildly sensitive” to the width of the row. E.g., doubling row width only gives a 15-20% throughput hit, due to some fixed per-row overhead. That said, I imagine going 100X in row width would create a major slowdown, although the relevant measure of width might be bytes more than column count.</li>
<li>Cassandra&#8217;s <a href="http://racklabs.com/%7Ebwilliam/cassandra/04vs05vs06.png">performance</a> has been improving nicely with each point release. Jonathan thinks this general trend will continue.</li>
</ul>
</li>
<li>Jonathan thinks Cassandra is pretty good at keeping your data safe.
<ul>
<li>Each node has a commit log.</li>
<li>When a node goes down, writes destined for it are buffered until it comes back up.</li>
</ul>
</li>
<li>You can run Hadoop MapReduce straight against Cassandra files.</li>
<li>A Cassandra node might hold anything from tens of gigabytes to multiple terabytes of data. You might want to go with the low end if you want to have lots of cache hits.</li>
<li>Solid-state storage would speed up Cassandra reads, not writes, and is not widely used with Cassandra yet.</li>
<li>Jonathan says Cassandra is really good at handling time series data, by which I suspect he means log files. <a href="https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/">Cloudkick</a> is a user of this capability.</li>
</ul>
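<p>To make the write path above concrete, here is a toy Python sketch, not Cassandra&#8217;s actual code. Writes land in an in-memory memtable that stores only the columns actually supplied. The memtable is flushed in key order as an immutable run once any of the default thresholds reported above (64 MB, 300k updates, 1 hour) is reached. Reads merge the flushed runs with not-yet-flushed data. The class name ToyLSMStore and the crude byte-counting rule are hypothetical simplifications. Real LSM-based stores also keep a commit log, index their on-disk files, and compact runs in the background.</p>

```python
import time
from dataclasses import dataclass, field

# Thresholds are the defaults as reported in the post; the flush logic
# below is a simplified, hypothetical illustration.
FLUSH_BYTES = 64 * 1024 * 1024   # 64 MB written
FLUSH_OPS = 300_000              # 300k updates
FLUSH_SECONDS = 60 * 60          # 1 hour


@dataclass
class ToyLSMStore:
    """Toy log-structured store: a memtable plus flushed, never-modified runs."""
    memtable: dict = field(default_factory=dict)   # row key -> {column: value}
    runs: list = field(default_factory=list)       # sorted snapshots, oldest first
    bytes_written: int = 0
    ops: int = 0
    started: float = field(default_factory=time.monotonic)

    def write(self, row_key, columns):
        # Only supplied columns are stored, so absent ("NULL") columns take no
        # space and different rows may carry different sets of columns.
        row = self.memtable.setdefault(row_key, {})
        for name, value in columns.items():
            row[name] = value
            # Crude size estimate, good enough for a threshold sketch.
            self.bytes_written += len(str(name)) + len(str(value))
        self.ops += 1
        if self.should_flush():
            self.flush()

    def should_flush(self, now=None):
        # Flush on whichever threshold is reached first.
        now = time.monotonic() if now is None else now
        return (self.bytes_written >= FLUSH_BYTES
                or self.ops >= FLUSH_OPS
                or now - self.started >= FLUSH_SECONDS)

    def flush(self):
        # Write the memtable out in key order as one run (a sequential write);
        # the run is never modified afterwards. Then start a fresh memtable.
        if self.memtable:
            snapshot = tuple((k, dict(self.memtable[k])) for k in sorted(self.memtable))
            self.runs.append(snapshot)
        self.memtable = {}
        self.bytes_written = 0
        self.ops = 0
        self.started = time.monotonic()

    def read(self, row_key):
        # Merge from the oldest run to the newest, then the memtable, so newer
        # column values win and unflushed data is readable immediately.
        merged = {}
        for run in self.runs:
            for key, cols in run:
                if key == row_key:
                    merged.update(cols)
        merged.update(self.memtable.get(row_key, {}))
        return merged


store = ToyLSMStore()
store.write("user:1", {"name": "Ann"})
store.flush()
store.write("user:1", {"email": "ann@example.com"})
print(store.read("user:1"))  # prints {'name': 'Ann', 'email': 'ann@example.com'}
```

<p>Because each flush writes a whole sorted run at once, the disk I/O is sequential, which is the reason given above for Cassandra&#8217;s fast writes. The price is that a read may have to consult several runs until they are merged.</p>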
<p style="margin-bottom: 0in;">I certainly didn&#8217;t grasp everything about Cassandra replication and partitioning strategies. That wasn&#8217;t the focus of our talks, and anyway I got the impression they are so flexible that there&#8217;s little that can firmly be said about them. But I did get the impressions:</p>
<ul>
<li>You set your consistency rules in the Cassandra API, not on a per-table basis. (I think this means that a lack of administrative tools is supposedly a feature, not a drawback.)</li>
<li>As a practical matter, Cassandra users commonly take one of two approaches to consistency:
<ul>
<li><a href="http://www.dbms2.com/2010/05/01/ryw-read-your-writes-consistency/">RYW consistency</a>, most commonly with N = 3 and R = W = 2.</li>
<li>Geographically dispersed eventual consistency.</li>
</ul>
</li>
<li>Cassandra data is most commonly distributed via consistent hashing, but other options are “pluggable.”</li>
<li>If you add a node, the busiest node automagically decides to ship some data over, reducing its load. Of course, this only works if you bring the new node online before the old node is so maxed out it doesn&#8217;t have time to do the shipping.</li>
</ul>
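<p>Consistent hashing, mentioned above as Cassandra&#8217;s most common way of distributing data, can be sketched in a few lines of Python. This is a minimal, hypothetical illustration in the spirit of a Dynamo-style ring, not Cassandra&#8217;s implementation. Each node gets one token on a hash ring, with MD5 used here only as a convenient stable hash. A key&#8217;s N replicas are the next N distinct nodes clockwise from the key&#8217;s position. Adding a node therefore takes over only part of one existing node&#8217;s range.</p>

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    # Stable 128-bit position on the ring (MD5 chosen only for convenience).
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Minimal consistent-hashing ring: one token per node, N replicas clockwise."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.tokens = []   # sorted ring positions
        self.owner = {}    # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node, token=None):
        # A caller may pick the token explicitly, e.g. to split a busy range.
        token = ring_position(node) if token is None else token
        bisect.insort(self.tokens, token)
        self.owner[token] = node

    def replicas_for(self, key):
        # Find the first token at or after the key's position (wrapping around),
        # then walk clockwise collecting the next N distinct nodes.
        start = bisect.bisect_left(self.tokens, ring_position(key)) % len(self.tokens)
        chosen = []
        for i in range(len(self.tokens)):
            node = self.owner[self.tokens[(start + i) % len(self.tokens)]]
            if node not in chosen:
                chosen.append(node)
            if len(chosen) == self.replicas:
                break
        return chosen


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=3)
print(ring.replicas_for("user:42"))  # three distinct node names
```

<p>Passing an explicit token to add_node, for example one that splits the busiest node&#8217;s range, loosely mirrors the rebalancing behavior described above. A real system also has to stream the affected data to the new node.</p>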
<p style="margin-bottom: 0in;">When we talked in March, the next release of Cassandra was going to be 0.7. Cassandra 0.7 was going to be a performance/scalability release, for example fixing the flaw that garbage collection read rows into memory one at a time. After that, Cassandra 0.8 was to be a feature release, with one planned feature being more automatic index management and/or materialized-view-like capability, so as to reduce the burden on Cassandra developers of schema management.</p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li>My March <a href="../2010/03/12/some-nosql-links/">NoSQL links post</a> included links to the Google and Amazon papers</li>
<li>The <a href="https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/">March 2, 2010 Cloudkick post</a> also linked above goes into a lot of detail, including what they think is great about Cassandra and what they think is still missing</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/06/cassandra-technical-overview/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Daniel Abadi on NoSQL design tradeoffs</title>
		<link>http://www.dbms2.com/2010/05/02/daniel-abadi-on-nosql-design-tradeoffs/</link>
		<comments>http://www.dbms2.com/2010/05/02/daniel-abadi-on-nosql-design-tradeoffs/#comments</comments>
		<pubDate>Sun, 02 May 2010 05:30:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2019</guid>
		<description><![CDATA[In a thought-provoking post, Daniel Abadi points out NoSQL-related terminological problems similar to the ones I just railed against, and argues To me, CAP should really be PACELC &#8212; if there is a partition (P) how does the system tradeoff between availability and consistency (A and C); else (E) when the system is running as [...]]]></description>
			<content:encoded><![CDATA[<p>In a thought-provoking post, <a href="http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html">Daniel Abadi</a> points out NoSQL-related terminological problems similar to <a href="http://www.dbms2.com/2010/05/01/ryw-read-your-writes-consistency/">the ones I just railed against</a>, and argues</p>
<blockquote><p>To me, CAP should really be PACELC &#8212; if there is a partition (P) how does the system tradeoff between availability and consistency (A and C); else (E) when the system is running as normal in the absence of partitions, how does the system tradeoff between latency (L) and consistency (C)?</p></blockquote>
<p>and goes on to say</p>
<blockquote><p>For example, Amazon’s Dynamo (and related systems like Cassandra and SimpleDB) are PA/EL in PACELC &#8212; upon a partition, they give up consistency for availability; and under normal operation they give up consistency for lower latency. Giving up C in both parts of PACELC makes the design simpler &#8212; once the application is configured to be able to handle inconsistencies, it makes sense to give up consistency for both availability and lower latency.</p></blockquote>
<p>However, I think Daniel&#8217;s improved formulation is still misleading, in at least two ways:</p>
<ul>
<li>Daniel implicitly assumes any given NoSQL system makes a fixed set of tradeoffs, when actually (as he in fact notes in his post) some of them offer tradeoffs that are quite tunable.</li>
<li>I think Daniel is at best oversimplifying when he appears to assert that best-case network latency is an important design criterion for many NoSQL systems. Naively, anything that acknowledges reads or writes requires two hops, while two-phase commit (2PC) requires three. A 33% latency reduction is not the kind of goal that drives dramatic DBMS redesigns, even though <a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/">tenths of seconds</a> &#8212; i.e., 100s of milliseconds &#8212; matter in the kinds of environments where NoSQL is sprouting up.</li>
</ul>
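<p>The hop arithmetic behind that 33% figure is simple enough to spell out. This sketch assumes, as the naive model above does, that every hop costs the same. The per-hop latency is a hypothetical placeholder, not a measured number.</p>

```python
def latency_reduction(hops_before: int, hops_after: int) -> float:
    """Fractional latency cut from dropping hops, if every hop costs the same."""
    return 1 - hops_after / hops_before


# Naive model from the post: 2PC takes 3 hops, a plain acknowledged read/write takes 2.
reduction = latency_reduction(hops_before=3, hops_after=2)

# Hypothetical per-hop latency, used only to show the absolute saving.
per_hop_ms = 1.0
saved_ms = (3 - 2) * per_hop_ms

print(f"{reduction:.0%} less latency, about {saved_ms} ms saved per operation")
# prints: 33% less latency, about 1.0 ms saved per operation
```

<p>Under that assumption, the relative saving is fixed at one hop in three, and the absolute saving is a single hop&#8217;s latency. That is small next to the hundreds of milliseconds mentioned above unless individual hops are very slow.</p>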
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/02/daniel-abadi-on-nosql-design-tradeoffs/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Read-your-writes (RYW), aka immediate, consistency</title>
		<link>http://www.dbms2.com/2010/05/01/ryw-read-your-writes-consistency/</link>
		<comments>http://www.dbms2.com/2010/05/01/ryw-read-your-writes-consistency/#comments</comments>
		<pubDate>Sat, 01 May 2010 04:57:37 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1985</guid>
		<description><![CDATA[In which we reveal the fundamental inequality of NoSQL, and why NoSQL folks are so negative about joins. Discussions of NoSQL design philosophies tend to quickly focus in on the matter of consistency. &#8220;Consistency&#8221;, however, turns out to be a rather overloaded concept, and confusion often ensues. In this post I plan to address one [...]]]></description>
			<content:encoded><![CDATA[<p><em>In which we reveal the fundamental inequality of NoSQL, and why NoSQL folks are so negative about joins.</em></p>
<p>Discussions of NoSQL design philosophies tend to quickly focus in on the matter of <strong>consistency.</strong> &#8220;Consistency&#8221;, however, turns out to be a rather overloaded concept, and confusion often ensues.</p>
<p>In this post I plan to address one essential subject, while ducking various related ones as hard as I can. It&#8217;s what Werner Vogels of Amazon called <a href="http://www.allthingsdistributed.com/2007/12/eventually_consistent.html">read-your-writes consistency</a> (a term to which I was actually introduced by Justin Sheehy of Basho). It&#8217;s either identical or very similar to what is sometimes called <a href="http://theryanking.com/entries/2010/04/29/potential-consistency/">immediate consistency</a>, and presumably also to what Amazon has recently called the &#8220;<a href="http://developer.amazonwebservices.com/connect/ann.jspa?annID=611">read my last write</a>&#8221; capability of SimpleDB.</p>
<p><em><strong>This is something every database-savvy person should know about, but most so far still don&#8217;t.</strong> I didn&#8217;t myself until a few weeks ago.</em></p>
<p>Considering the many different kinds of consistency outlined in the Werner Vogels link above or in the Wikipedia <a href="http://en.wikipedia.org/wiki/Consistency_model">consistency models</a> article &#8212; whose names may not always be used in, er, a wholly consistent manner &#8212; I don&#8217;t think there&#8217;s much benefit to renaming <em>read-your-writes consistency</em> yet again. Rather, let&#8217;s just call it <strong>RYW consistency,</strong> come up with a way to pronounce &#8220;RYW&#8221;, and have done with it. (I suggest<strong> &#8220;ree-ooh&#8221;</strong>, which evokes two syllables from the original phrase. Thoughts?)</p>
<p>Definition: <strong>RYW (Read-Your-Writes) consistency</strong> is achieved when<strong> the system guarantees that, once a record has been updated, any attempt to read the record will return the updated value.</strong></p>
<p><span id="more-1985"></span>Here a &#8220;record&#8221; can be a row, a key-value pair, or any similar unit of data. An &#8220;update&#8221; can be whichever of insert/append or true change the system supports.</p>
<p>A conventional relational DBMS will almost always feature RYW consistency. Some NoSQL systems feature tunable consistency, in which &#8212; depending on your settings &#8212; RYW consistency may or may not be assured.</p>
<p>The core ideas of RYW consistency, as implemented in various NoSQL systems, are:</p>
<ul>
<li>Let N = the number of copies of each record distributed across nodes of a parallel system.</li>
<li>Let W = the number of nodes that must successfully acknowledge a write for it to be successfully committed. By definition, W &lt;= N.</li>
<li>Let R = the number of nodes that must send back the same value of a unit of data for it to be accepted as read by the system. By definition, R &lt;= N.</li>
<li>The greater N-R and N-W are, the more node or network failures you can typically tolerate without blocking work.</li>
<li><strong>As long as R + W &gt; N, you are assured of RYW consistency.</strong></li>
</ul>
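<p>The definitions above translate directly into a couple of checks. This Python sketch, with hypothetical function names, tests the R + W &gt; N condition and reports how many node failures reads and writes can each ride out, i.e., N-R and N-W.</p>

```python
def ryw_guaranteed(n: int, r: int, w: int) -> bool:
    """True when R + W > N, so every read quorum overlaps every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("expected 1 <= R <= N and 1 <= W <= N")
    return r + w > n


def failures_tolerated(n: int, r: int, w: int) -> dict:
    """Node failures that still leave enough live copies to complete reads/writes."""
    return {"reads": n - r, "writes": n - w}


# The post's example setting, a conventional N = R = W setup,
# and a fast-but-not-RYW setting.
for n, r, w in [(3, 2, 2), (3, 3, 3), (3, 1, 1)]:
    print((n, r, w), ryw_guaranteed(n, r, w), failures_tolerated(n, r, w))
# prints:
# (3, 2, 2) True {'reads': 1, 'writes': 1}
# (3, 3, 3) True {'reads': 0, 'writes': 0}
# (3, 1, 1) False {'reads': 2, 'writes': 2}
```

<p>The middle case is the conventional parallel-DBMS situation, with N = R = W. RYW consistency holds, but a single failure blocks work. The first case keeps RYW consistency while tolerating one failed node for reads and one for writes.</p>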
<p>That bolded part is the key point, and I suggest that you stop and convince yourself of it before reading further.</p>
<p><em>Example: Let N = 3, W = 2, and R = 2. Suppose you write a record successfully to at least two nodes out of three. Further suppose that you then poll all three of the nodes. Then the only way you can get two values that agree with each other is if at least one of them &#8212; and hence both &#8212; return the value that was correctly and successfully written to at least two nodes in the first place.</em></p>
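<p>For small clusters, the bolded claim can also be checked by brute force. The sketch below, a hypothetical helper built on the standard quorum-overlap argument, tries every possible set of W nodes holding the new write and every possible set of R nodes answering a read. It looks for a read set that misses the write entirely and could therefore agree on a stale value. It assumes that all nodes which missed the write hold the same old value.</p>

```python
from itertools import combinations


def stale_read_possible(n: int, r: int, w: int) -> bool:
    """Search for R reading nodes that share no node with the W written ones."""
    nodes = range(n)
    for written in combinations(nodes, w):
        for read_set in combinations(nodes, r):
            if set(written).isdisjoint(read_set):
                return True  # these R nodes could all return the old value
    return False


# The example above: N = 3, W = 2, R = 2 leaves no room for a stale read.
print(stale_read_possible(3, 2, 2))  # prints False

# Exhaustively confirm the inequality for every setting with N up to 6.
for n in range(1, 7):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert stale_read_possible(n, r, w) == (r + w <= n)
```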
<p>In a conventional parallel DBMS, N = R = W, which is to say N-R = N-W = 0. Thus, a single hardware failure causes data operations to fail too. For some applications &#8212; e.g., highly parallel OLTP web apps &#8212; that kind of fragility is deemed unacceptable.</p>
<p>On the other hand, if W &lt; N, it is possible to construct edge cases in which two or more consecutive failures cause incorrect data values to actually be returned. So you want to clean up any discrepancies quickly and bring the system back to a consistent state. That is where the idea of <em>eventual consistency</em> comes in, although you definitely can &#8212; and in some famous NoSQL implementations actually do &#8212; have eventual consistency in a system that is not RYW consistent.</p>
<p>Much technology goes into eventual consistency, as well as into the data distribution and polling in the first place. And in tunable systems, the choices of N, R, and W &#8212; perhaps on a &#8220;table&#8221; by &#8220;table&#8221; basis &#8212; can get pretty interesting. I&#8217;m ducking all those subjects for now, however, not least because of how much I still have to learn about them.</p>
<p>One point I will note, however, is this &#8212; <strong>RYW consistency and table joins make for awkward companions</strong>. If you want to join two tables, each of them distributed across some kind of parallel cluster, there are only two possibilities:</p>
<ul>
<li>In most cases, the data you need to join is co-located on the same nodes.</li>
<li>You&#8217;re going to have an awful lot of network traffic.</li>
</ul>
<p>In an R = W = N scenario, co-location may be realistic. But when R &lt; N and W &lt; N, a join can return incorrect results even when both of the tables being joined would have been read correctly.</p>
<p><em>In our example above, we had N = 3 and R = W = 2. Single-table RYW consistency was ensured. But suppose you join two records, each of which had been written correctly to 2 out of 3 nodes &#8212; but with only 1 node being correct about</em> both<em> records. Then only that 1 node out of 3 will return a correct value for the join, and badness will ensue.</em></p>
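<p>The join problem can be reproduced in a few lines. This is a hypothetical Python sketch of the scenario just described. It assumes that each node joins its own copies of the two records and that the joined results are then voted on, exactly like a single-record read with R = 2. The node names and values are made up for illustration.</p>

```python
from collections import Counter

# Hypothetical cluster state with N = 3: record A was written to node0 and
# node1, record B to node1 and node2 (each a successful W = 2 write).
nodes = {
    "node0": ("A_new", "B_old"),
    "node1": ("A_new", "B_new"),
    "node2": ("A_old", "B_new"),
}
R = 2


def quorum_value(values, r):
    """Return the most commonly reported value if at least r nodes report it, else None."""
    value, count = Counter(values).most_common(1)[0]
    return value if count >= r else None


# Reading each record separately reaches a quorum on the new value.
a = quorum_value([pair[0] for pair in nodes.values()], R)
b = quorum_value([pair[1] for pair in nodes.values()], R)

# Joining locally on each node first gives three different joined rows.
joined = quorum_value([f"{x}+{y}" for x, y in nodes.values()], R)

print(a, b, joined)  # prints: A_new B_new None
```

<p>Each record on its own still reaches a quorum on its new value, but the per-node join results all differ, so none reaches R. Avoiding that means reading each record by quorum and joining afterwards, which brings back the network traffic discussed above.</p>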
<p>Any architecture I can think of to circumvent that problem results in &#8212; you guessed it &#8212; an awful lot of network traffic.</p>
<p>And that, folks, is a big part of why <a href="http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/">the NoSQL folks are so negative about joins</a>.</p>
<p><em><strong>Related links</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2009/09/13/fault-tolerant-queries/">Query fault-tolerance</a></li>
<li>Huan Liu&#8217;s skepticism as to <a href="http://huanliu.wordpress.com/tag/consistent-read/">whether RYW consistency causes a significant performance hit</a></li>
<li><a href="http://www.dbms2.com/2010/05/02/daniel-abadi-on-nosql-design-tradeoffs/">Daniel Abadi&#8217;s views on NoSQL design tradeoffs</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/01/ryw-read-your-writes-consistency/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Some NoSQL links</title>
		<link>http://www.dbms2.com/2010/03/12/some-nosql-links/</link>
		<comments>http://www.dbms2.com/2010/03/12/some-nosql-links/#comments</comments>
		<pubDate>Fri, 12 Mar 2010 23:51:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Continuent]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Tokutek]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1692</guid>
		<description><![CDATA[I plan to post a few things soon about MongoDB, Cassandra, and NoSQL in general. So I&#8217;m poking around a bit reading stuff on the subjects. Here are some links I found. A little over a year ago, Julian Browne put up a great post on Eric Brewer&#8217;s CAP conjecture/theorem, which provides much of the [...]]]></description>
			<content:encoded><![CDATA[<p>I plan to post a few things soon about MongoDB, Cassandra, and NoSQL in general. So I&#8217;m poking around a bit reading stuff on the subjects. Here are some links I found.<span id="more-1692"></span></p>
<ul>
<li>A little over a year ago, Julian Browne put up a great post on <a href="http://www.julianbrowne.com/article/viewer/brewers-cap-theorem">Eric Brewer&#8217;s CAP conjecture/theorem</a>, which provides much of the impetus to relax the traditional requirement for atomicity/consistency.</li>
<li>Even more directly inspirational to NoSQL technology development were two seminal papers: Google&#8217;s on <a href="http://labs.google.com/papers/bigtable.html">BigTable</a> and Amazon&#8217;s on <a href="http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf">Dynamo</a>. (That said, I&#8217;m having trouble getting myself to actually read them from start to finish, especially since they&#8217;ve been superseded by subsequent technology development.)</li>
<li>10gen (the MongoDB guys) hosted a NoSQL conference yesterday. Much blogging has ensued. The best post I&#8217;ve seen so far was by <a href="http://blog.marcua.net/post/442594842/notes-from-nosql-live-boston-2010">Adam Marcus</a>. I find the graph database notes near the bottom particularly interesting.</li>
<li>Mark Callaghan hit back against the <a href="http://mysqlha.blogspot.com/2010/03/plays-well-with-others.html">NoSQL <span style="text-decoration: line-through;">movement</span> hype</a>, and in particular against the &#8216;<a href="http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/">MySQL/memcached is passé</a>&#8217; meme. On the other hand, he also bemoaned many failings of MySQL. On the third hand, he praised or at least expressed hope for a variety of MySQL-related technologies, including <a href="http://www.dbms2.com/2009/04/16/introduction-to-tokutek/">Tokutek&#8217;s TokuDB</a> and <a href="http://www.dbms2.com/2009/09/03/continuent-on-clustering/">Continuent&#8217;s Tungsten</a>.</li>
<li>In connection with that debate, Mark Rendle offered a <a href="http://blog.markrendle.net/2010/03/do-you-need-relational-database.html">funny rant</a>, mainly pro-NoSQL, in the style of a Socratic dialogue.</li>
<li>John Quinn of Digg recently described <a href="http://www.stumbleupon.com/su/5099Ti/about.digg.com/node/564">Digg&#8217;s move from MySQL to Cassandra</a>, and outlined a lot of features Digg was adding to Cassandra, all of which it is open-sourcing.</li>
<li>The NoSQL guys maintain their own long <a href="http://nosql-database.org/links.html">list of NoSQL-related links</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/12/some-nosql-links/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Introduction to Gooddata</title>
		<link>http://www.dbms2.com/2009/12/27/introduction-to-gooddata/</link>
		<comments>http://www.dbms2.com/2009/12/27/introduction-to-gooddata/#comments</comments>
		<pubDate>Mon, 28 Dec 2009 03:16:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Gooddata]]></category>
		<category><![CDATA[Jaspersoft]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1341</guid>
		<description><![CDATA[Around the end of the Cold War, Esther Dyson took it upon herself to go repeatedly to Eastern Europe and do a lot of rah-rah and catalysis, hoping to spark software and other computer entrepreneurs. I don&#8217;t know how many people&#8217;s lives she significantly affected – I&#8217;d guess it&#8217;s actually quite a few – but [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Around the end of the Cold War, Esther Dyson took it upon herself to go repeatedly to Eastern Europe and do a lot of rah-rah and catalysis, hoping to spark software and other computer entrepreneurs. I don&#8217;t know how many people&#8217;s lives she significantly affected – I&#8217;d guess it&#8217;s actually quite a few – but in any case the number is not zero. Roman Stanek, who has built and sold a couple of software business, cites her as a key influence setting him on his path.</p>
<p style="margin-bottom: 0in;">Roman&#8217;s latest venture is business intelligence firm Gooddata. Gooddata was founded in 2007 and has been soliciting and getting attention for a while, so I was surprised to learn that Gooddata officially launched just a few weeks ago. Anyhow, some less technical highlights of the Gooddata story include:<span id="more-1341"></span></p>
<ul>
<li>Gooddata believes it makes BI easy to adopt, unlike every other BI vendor on the planet &#8212; not excluding the many other BI vendors who say the same thing about themselves. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
<li>Gooddata is entirely cloud-based, specifically in the Amazon cloud. I.e., Gooddata is selling SaaS-based BI.</li>
<li>Gooddata wants to sell to enterprises that are large enough to have more than a couple of BI users, and small enough not to be well served by the BI market leaders.
<ul>
<li>In revenue terms, this is the ever-popular $100 million&#8211;$1 billion segment.</li>
<li>Specifically, Gooddata believes that those enterprises may have decent “back office” BI, but don&#8217;t have much in the front office. Gooddata wants to provide them with front office BI, which seems to basically mean CRM analytics. Gooddata sees this as a market in which QlikTech is the major player. Generally, Gooddata wants to emulate and go after QlikTech.</li>
<li>Even more specifically, Gooddata wants to sell to Salesforce.com customers, who it believes are not well-served by what passes for built-in analytics at Salesforce. Partnering with NetSuite didn&#8217;t work as well, since NetSuite&#8217;s customers turn out to be smaller firms than are in Gooddata&#8217;s target market.</li>
</ul>
</li>
<li>Something I heard from both Jaspersoft and Gooddata is that there&#8217;s a hot market in providing cloud-based BI to online gaming companies. I gather these are mainly games running on mass communication platforms such as Facebook or the iPhone. Surely not coincidentally, it seems likely that:
<ul>
<li>These are small companies whose success – and hence data intake – can suddenly explode.</li>
<li>The data originates in cyberspace, with no particular need ever to come to the game companies&#8217; own premises.</li>
</ul>
</li>
<li>Gooddata has 50 production customers.</li>
<li>Gooddata had 2,500 “projects” at the end of beta in June, and is adding 100 more per month. (Those numbers look weird together.) A “project” is a lot like a database, with associated reports, security privileges, etc.</li>
<li>Gooddata has close to 40 people, mainly in development.</li>
<li>I didn&#8217;t detect much of a sales strategy, nor much of a marketing strategy beyond the impressive early buzz generation. Perhaps that&#8217;s a partial explanation of why the rate of Gooddata adoption fell even before the company officially launched.</li>
<li>I forgot to ask what those 50 customers were actually paying, but judging from Gooddata&#8217;s price list, a typical price for Gooddata&#8217;s stuff would appear to be $500-$2,000/month.</li>
</ul>
<p style="margin-bottom: 0in;">Gooddata technical highlights include:</p>
<ul>
<li>Gooddata is building an entire BI stack – reporting, dashboards, ETL, in-memory database management, everything. I doubt Gooddata would claim that the pieces are best-of-breed in many ways other than BI ease of adoption and use.</li>
<li>So far I&#8217;ve seen three Gooddata ease-of-use features or feature groups that strike me as differentiated – <strong>reusability</strong> (of metrics and/or reports), <strong>collaboration,</strong> and <strong>tag clouds.</strong> More on those below. Gooddata is also building toward an <strong>agility</strong> pitch, but those features aren&#8217;t all baked yet.</li>
<li>Gooddata is MySQL-based today, but plans to move to a memory-centric compressed column store in 2010. Roman doesn&#8217;t reject analogies to SAP&#8217;s <em>BI/BW/whatever Accelerator. </em><span style="font-style: normal;">Yes, folks – Gooddata is yet another BI vendor doing some form of memory-centric OLAP. That&#8217;s a big trend.</span></li>
<li>I&#8217;m guessing that a big reason Gooddata is reinventing so many technical wheels is to ensure that the Gooddata stack is seamlessly multi-tenant from top to bottom. (<a href="http://www.dbms2.com/2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/">Comments on a similar idea</a> by SAP&#8217;s Hasso Plattner suggest a similar emphasis.)</li>
<li>Gooddata has its own multidimensional query language called MAQL (the A doesn&#8217;t seem to stand for anything). Today MAQL generates SQL for MySQL. The future columnar memory-centric data store will – I think – understand MAQL natively.</li>
</ul>
<p style="margin-bottom: 0in;">Now we get to the good stuff. When I wrote about <a href="http://www.dbms2.com/2009/05/30/reinventing-business-intelligence/">reinventing business intelligence</a> back in May, I focused on some interesting developments I see as actually underway – at least on an experimental basis and/or from small vendors – namely:</p>
<ul>
<li><strong>Text-search interfaces. </strong>Well, I didn&#8217;t see true text search in the Gooddata demo, but I did see tag clouds, which offer some of the same benefits.</li>
<li><strong>Collaboration tools.</strong> Well, Gooddata has a nice-looking approach to BI collaboration, heavily reflected in its UI metaphors. (That said, I haven&#8217;t really compared Gooddata to Microsoft SharePoint or SAP&#8217;s Portal/Rooms/whatever.)</li>
<li><strong>Memory-centric analytics</strong> (for speed of exploration). As noted above, Gooddata has that coming soon.</li>
<li><strong>Data exploration that tries to ignore fixed relational schemas,</strong> à la Attivio or Splunk. Roman says Gooddata is interested in or working on that, but offers no timetable.</li>
</ul>
<p style="margin-bottom: 0in;">Meanwhile, something I&#8217;ve been seeking for years, but haven&#8217;t seen much progress on since enhancement stopped on Cognos Metrics Manager, is more <a href="http://www.dbms2.com/2007/11/13/the-key-problem-with-dashboard-functionality/">user-friendly metrics management</a>. Gooddata&#8217;s version doesn&#8217;t have a lot of bells and whistles, but at least it has the basics – a list of already-defined metrics, and a reasonable way of compounding them into other metrics. I think that kind of metrics management will be a major BI feature going forward, to the point that a few years from now we&#8217;ll be worrying about how to port metric definitions from one BI vendor&#8217;s tool to another.</p>
<p style="margin-bottom: 0in;"><strong>Bottom line: If you&#8217;re interested in BI, you should look at a Gooddata demo.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/27/introduction-to-gooddata/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Sneakernet to the cloud</title>
		<link>http://www.dbms2.com/2009/05/29/sneakernet-to-the-cloud/</link>
		<comments>http://www.dbms2.com/2009/05/29/sneakernet-to-the-cloud/#comments</comments>
		<pubDate>Sat, 30 May 2009 03:06:04 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=793</guid>
		<description><![CDATA[Recently, Amazon CTO Werner Vogels put up a blog post which suggested that, now and in the future, the best way to get large databases into the cloud is via sneakernet.  In some circumstances, he is surely right. Possible implications include: When sending data to the cloud, you probably want to compress it to the [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, Amazon CTO Werner Vogels put up a blog post which suggested that, now and in the future, <a href="http://www.allthingsdistributed.com/2009/05/amazon_import_export.html">the best way to get large databases into the cloud is via sneakernet</a>.  In some circumstances, he is surely right. Possible implications include:</p>
<ul>
<li>When sending data to the cloud, you probably want to <strong>compress</strong> it to the max before sending. <a href="http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/">Clearpace&#8217;s</a> new <a href="http://www.rainstor.com/">RainStor</a> structured-data archiving service emphasizes that idea. RainStor marketing says cloud, cloud, cloud &#8212; but Clearpace thinks you really should have a bit of its software onsite too, to compress the data before sending it across the wire.</li>
<li><strong>Getting data from one cloud to another cloud could be problematic.</strong> I&#8217;m fond of saying that weblog data naturally lives in the cloud at your hosting company&#8217;s location, so you should analyze it there too. But this makes the most sense if you analyze it or at least filter/reduce it in place.  (That said, the really, really big web companies have lots of different data centers, and presumably do move huge amounts of log data from place to place.)</li>
</ul>
<p>But for one-time moves of data sets &#8212; sure, sneakernet/snail mail should work just fine.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/05/29/sneakernet-to-the-cloud/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Maybe Amazon should be using a real DBMS after all</title>
		<link>http://www.dbms2.com/2009/04/14/maybe-amazon-should-be-using-a-real-dbms-after-all/</link>
		<comments>http://www.dbms2.com/2009/04/14/maybe-amazon-should-be-using-a-real-dbms-after-all/#comments</comments>
		<pubDate>Tue, 14 Apr 2009 10:06:14 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=750</guid>
		<description><![CDATA[Supposedly Amazon managers found that an employee who happened to work in France had filled out a field incorrectly and more than 50,000 items got flipped over to be flagged as &#8220;adult,&#8221; the source said. (Technically, the flag for adult content was flipped from &#8216;false&#8217; to &#8216;true.&#8217;) &#8220;It&#8217;s no big policy change, just some field [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.seattlepi.com/amazon/archives/166384.asp">Supposedly</a></p>
<blockquote><p>Amazon managers found that an employee who happened to work in France had filled out a field incorrectly and more than 50,000 items got flipped over to be flagged as &#8220;adult,&#8221; the source said. (Technically, the flag for adult content was flipped from &#8216;false&#8217; to &#8216;true.&#8217;)</p>
<p>&#8220;It&#8217;s no big policy change, just some field that&#8217;s been around forever filled out incorrectly,&#8221; the source said.</p>
<p>Amazon employees worked on the problem well past midnight, and then handed it over to an international team, he said.</p></blockquote>
<p>This was the best practice for reversing an error &#8212; how? Is SimpleDB somehow implicated? If this story is remotely true, and if there&#8217;s a sensible database architecture, I can&#8217;t imagine why there wouldn&#8217;t be a faster fix.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/04/14/maybe-amazon-should-be-using-a-real-dbms-after-all/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Amazon Elastic MapReduce</title>
		<link>http://www.dbms2.com/2009/04/03/amazon-elastic-mapreduce/</link>
		<comments>http://www.dbms2.com/2009/04/03/amazon-elastic-mapreduce/#comments</comments>
		<pubDate>Fri, 03 Apr 2009 08:57:56 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[MapReduce]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=744</guid>
		<description><![CDATA[Amazon is introducing a beta of Amazon Elastic MapReduce.  What it boils down to is cheap, on-demand Hadoop. This seems like a great way to experiment with MapReduce and see if you like it. But for serious use, I don&#8217;t know why you wouldn&#8217;t prefer MapReduce more closely integrated into a DBMS.]]></description>
			<content:encoded><![CDATA[<p>Amazon is introducing a beta of <a href="http://aws.amazon.com/elasticmapreduce/">Amazon Elastic MapReduce</a>.  What it boils down to is cheap, on-demand Hadoop.</p>
<p>This seems like a great way to experiment with MapReduce and see if you like it. But for serious use, I don&#8217;t know why you wouldn&#8217;t prefer MapReduce <a href="http://www.dbms2.com/2008/09/05/three-different-implementations-of-mapreduce/">more closely integrated into a DBMS</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/04/03/amazon-elastic-mapreduce/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>April Fool&#8217;s Day highlights</title>
		<link>http://www.dbms2.com/2009/04/01/april-fools-day-highlights/</link>
		<comments>http://www.dbms2.com/2009/04/01/april-fools-day-highlights/#comments</comments>
		<pubDate>Wed, 01 Apr 2009 07:31:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Humor]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=738</guid>
		<description><![CDATA[Amazon says it&#8217;s taking &#8220;cloud&#8221; computing to new heights, as it were. Derivative funds and large government-subsidized entities will be especially interested in FACE’s transmodal operation. They can allocate a dedicated FACE, load it up with data, and then send it out to sea to perform advanced processing in safety. The government will have absolutely [...]]]></description>
			<content:encoded><![CDATA[<p>Amazon says <a href="http://aws.typepad.com/aws/2009/03/up-up-and-away-cloud-computing-reaches-for-the-sky.html">it&#8217;s taking &#8220;cloud&#8221; computing to new heights</a>, as it were.</p>
<blockquote><p>Derivative funds and large government-subsidized entities will be especially interested in FACE’s transmodal operation. They can allocate a dedicated FACE, load it up with data, and then send it out to sea to perform advanced processing in safety. The government will have absolutely no chance of acting against them, because they will be too busy trying to decide which Federal Air Regulation (FAR) was violated, not to mention scheduling news conferences.</p></blockquote>
<p>The first excellent April Fool&#8217;s joke I saw this year was from <a href="http://www.texttechnologies.com/2009/04/01/april-fools-spoof-re-newspapers-social-media/"><em>The Guardian</em></a>. The best so far is from <a href="http://www.expedia.com/daily/mars/flights-to-mars/?mcicid=Mars_home_us">Expedia</a>. Others are linked in <a href="http://twitter.com/CurtMonash">my Twitter feed</a>. And personally, I&#8217;m encouraging the concept of <a href="http://www.networkworld.com/community/node/40460">April No-Fooling Day</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/04/01/april-fools-day-highlights/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>


