<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS2 -- DataBase Management System Services &#187; OLTP</title>
	<atom:link href="http://www.dbms2.com/category/database-management-system/online-transaction-processing-oltp/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Fri, 19 Mar 2010 15:49:58 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Naming of the Foo</title>
		<link>http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/</link>
		<comments>http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/#comments</comments>
		<pubDate>Sat, 13 Mar 2010 22:47:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Mark Logic]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1703</guid>
		<description><![CDATA[Let&#8217;s start from some reasonable premises.

No technology category name is ever perfect.
It&#8217;s particularly hard to describe NoSQL (Not Only SQL) accurately, given the basic confusion as to what NoSQL is all about.
That said, it seems pretty clear that NoSQL is about making big websites (and perhaps other cloud-like installations) run and scale.
Dwight Merriman (founder/CEO of [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s start from some reasonable premises.<span id="more-1703"></span></p>
<ul>
<li><a href="http://www.strategicmessaging.com/monashs-first-law-of-commercial-semantics-explained/2009/01/09/" onclick="javascript:pageTracker._trackPageview('/www.strategicmessaging.com');">No technology category name is 	ever perfect</a>.</li>
<li>It&#8217;s particularly hard to describe 	NoSQL (Not Only SQL) accurately, given <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >the basic confusion as to 	what NoSQL is all about</a>.</li>
<li>That said, it 	seems pretty clear that NoSQL is about making big websites (and 	perhaps other cloud-like installations) run and scale.</li>
<li>Dwight Merriman (founder/CEO of 	MongoDB vendor 10gen) is heading in the right direction when he says 	that the unifying ideas of NoSQL are that you do away with 	transactions and joins. But if he&#8217;s ever said something like “NoSQL 	is Foo without joins and transactions,” I don&#8217;t know what Foo is.</li>
<li><span style="font-style: normal;">Actually, 	I do know what Foo is – Foo is what happens when lots of people 	want to get small amounts each of information in or out of a 	database at the same time. I just don&#8217;t know what Foo is called.</span></li>
<li>Obviously, Foo is a lot like OLTP 	(OnLine Transaction Processing). However, it would be pretty silly 	for Foo to actually be OLTP, given that one of the core points of 	NoSQL is that you don&#8217;t have transactions.</li>
<li>It&#8217;s not just the “T” part of OLTP that&#8217;s fried. Calling something “OnLine” only makes sense as long as offline is an option, and offline transaction processing has been obsolete for a very long time.*</li>
</ul>
<p style="margin-bottom: 0in;"><em>*Sure, if you strain you can talk yourself into exceptions. But the point stands.</em></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">So we need a name for Foo, where Foo is what happens when</span><span style="font-style: normal;"><strong> lots of people want to get small amounts each of information in or out of a database at the same time.</strong></span><span style="font-style: normal;"> Thus, three major subcategories of more-or-less disk-based Foo are:</span></p>
<ul>
<li><span style="font-style: normal;">No-compromises 	ACID-compliant relational OLTP</span></li>
<li><span style="font-style: normal;">Sharded 	MySQL</span></li>
<li>NoSQL</li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">There may be some more purely memory-centric versions too, but let&#8217;s put those aside for the moment. </span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Absent a better idea, I can squeeze Foo into yet another four-letter acronym:</span></p>
<p style="margin-bottom: 0in;"><strong><span style="font-style: normal;">HVSP (High-Volume Simple Processing)</span></strong></p>
<p style="margin-bottom: 0in; font-style: normal;">That&#8217;s as imperfect as any other category name, and an awkward mouthful to boot. So I&#8217;d love to hear a better one; if you have such, please share it!  In the mean time, I think “HVSP” has merit because:</p>
<ul>
<li><span style="font-style: normal;">The 	“Processing” part should be noncontroversial.</span></li>
<li>“<span style="font-style: normal;">High-Volume” 	is inherent to the challenge. If RDBMS scale well enough for your 	use case, using something less powerful is probably silly.*  	Similarly, while Oracle shines at high-volume OLTP workloads, there 	are many cheaper DBMS that do a fine job of OLTP at lower volumes.</span></li>
<li>“<span style="font-style: normal;">Simple” 	is the core principle of NoSQL systems, which drop joins and 	transactions as being too much foofarah.  That only makes sense at 	all under the assumption that you have bone-simple queries and 	updates, so that programming around the lack of joins and 	transactions isn&#8217;t all that much of a burden.</span></li>
<li><span style="font-style: normal;">Something 	similar is true of sharded MySQL.</span></li>
<li><span style="font-style: normal;">Less 	obviously, “simple” is a core principle of relational OLTP as 	well. The point of the relational model is to cap the complexity of 	data operations, or more precisely to hide that complexity from 	programmers.</span></li>
<li><span style="font-style: normal;">And 	overloading the word “simple” a bit, it&#8217;s fair to say that if 	you&#8217;re reading or writing one record at a time, you&#8217;re doing 	something relatively simple, at least as opposed to what you do in 	analytic processing. The OLTP vs. OLAP distinction is preserved in 	this name change.</span></li>
<li><span style="font-style: normal;">The whole thing matches my definition above, namely &#8220;what happens when lots of people want to get small amounts each of information in or out of a database at the same time.&#8221;</span></li>
</ul>
<p style="margin-bottom: 0in;"><em>*Assuming, of course, that rows-and-tables are a good metaphor for your data structure in the first place.</em></p>
<p style="margin-bottom: 0in; font-style: normal;">Systems I&#8217;m leaving out of the HVSP and hence also NoSQL categories include:</p>
<ul>
<li><span style="font-style: normal;"><strong>Hadoop 	and other batch-oriented MapReduce.</strong></span><span style="font-style: normal;"> Hadoop isn&#8217;t part of NoSQL. I&#8217;m pretty sure that </span><a href="http://twitter.com/mikeolson/status/10388695185" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Cloudera 	CEO Mike Olson</a><span style="font-style: normal;"> agrees with me.</span></li>
<li><span style="font-style: normal;"><span style="font-weight: normal;">More 	generally, </span></span><span style="font-style: normal;"><strong>non-SQL 	data stores that don&#8217;t meet the HVSP criteria.</strong></span><span style="font-style: normal;"> Dave Kellogg stretches things when he claims that <a href="http://www.kellblog.com/2010/03/10/ieee-computer-society-article-on-nosql-an-executive-level-overview/" onclick="javascript:pageTracker._trackPageview('/www.kellblog.com');">MarkLogic 	is a NoSQL system</a>. (But then, that was in a post where he 	seemingly praised </span><a href="http://www.dbms2.com/2009/12/11/nosql-q-and-a/" >a train wreck of an article</a><span style="font-style: normal;">.)</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">But hey – what good is a categorization if it doesn&#8217;t leave some things out?</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>Cassandra and the NoSQL scalable OLTP argument</title>
		<link>http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/</link>
		<comments>http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 19:01:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1675</guid>
		<description><![CDATA[Todd Hoff put up a provocative post on High Scalability called MySQL and Memcached: End of an Era? The post itself focuses on observations like:

Facebook invented and is adopting Cassandra.
Twitter is adopting Cassandra.
Digg is adopting Cassandra.
LinkedIn invented and is adopting Voldemort.
Gee, it seems as if the super-scalable website biz has moved beyond MySQL/Memcached.

But in addition, he [...]]]></description>
			<content:encoded><![CDATA[<p>Todd Hoff put up a provocative post on High Scalability called <a href="http://highscalability.com/blog/2010/2/26/mysql-and-memcached-end-of-an-era.html" onclick="javascript:pageTracker._trackPageview('/highscalability.com');">MySQL and Memcached: End of an Era?</a> The post itself focuses on observations like:</p>
<ul>
<li>Facebook invented and is adopting Cassandra.</li>
<li>Twitter is adopting Cassandra.</li>
<li>Digg is adopting Cassandra.</li>
<li>LinkedIn invented and is adopting Voldemort.</li>
<li>Gee, it seems as if the super-scalable website biz has moved beyond MySQL/Memcached.</li>
</ul>
<p>But in addition, he provides a lot of useful links, which DBMS-oriented folks such as myself might have previously overlooked. <span id="more-1675"></span>Following those trails gets one to, among other things:</p>
<ul>
<li>A September, 2009 post outlining <a href="http://about.digg.com/blog/looking-future-cassandra" onclick="javascript:pageTracker._trackPageview('/about.digg.com');">Digg&#8217;s reasons for moving to Cassandra</a>. The core idea is that joining two tables is expensive; it&#8217;s cheaper to store the results prejoined on disk. Details are provided.</li>
<li>A February, 2010 post outlining <a href="http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king" onclick="javascript:pageTracker._trackPageview('/nosql.mypopescu.com');">Twitter&#8217;s reasons for moving to Cassandra</a>. They boil down to &#8220;sufficiently scalable, sufficiently simple, sufficiently robust, robustly open source.&#8221;</li>
<li>A <a href="http://www.niallkennedy.com/blog/uploads/flickr_php.pdf" onclick="javascript:pageTracker._trackPageview('/www.niallkennedy.com');">Flickr slide presentation</a> saying &#8220;normalization is for wimps&#8221;. They seemed to be staying with MySQL, but lusting after XPath.</li>
<li>A nice <a href="http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/" onclick="javascript:pageTracker._trackPageview('/blog.evanweaver.com');">Cassandra technical overview</a> by Evan Weaver of Twitter.</li>
</ul>
<p>I also recall seeing something that said &#8220;We have 13X as many queries as updates, so of course we should optimize for reads,&#8221; but I can&#8217;t find that now. The classical OLTP answer to that would probably be &#8220;Yeah, but by the time you&#8217;re two-phase-committing and integrity-checking all the parts of that update, it turns out updates are still what you should optimize for.&#8221; Well, what if the update is so simple that that&#8217;s no longer a valid argument?</p>
<p>There certainly seem to be some non-obvious technical choices being made here, with options being conflated that perhaps shouldn&#8217;t be. In particular, I wonder whether things are being written to cheap disk in a really fast way when it might be better to keep them in more expensive RAM or, perhaps better yet, solid-state memory. Perhaps then the functionality/performance tradeoff wouldn&#8217;t be so painful.</p>
<p>On the other hand, the designers of the world&#8217;s most scalable websites &#8212; e-commerce sites perhaps excepted &#8212; seem pretty unanimous in thinking it&#8217;s best to bake some database/integrity management into the applications, rather than offload it all to an RDBMS. Why? Because the transactions are so simple that hand-coding all that isn&#8217;t prohibitive. And of course because of their extreme performance and scalability needs.</p>
<p>I&#8217;m not sure on what basis one could argue that they&#8217;re wrong.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Two cornerstones of Oracle’s database hardware strategy</title>
		<link>http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/</link>
		<comments>http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 08:59:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1429</guid>
		<description><![CDATA[After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:

Oracle thinks flash memory is the most important hardware technology of the [...]]]></description>
			<content:encoded><![CDATA[<p>After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:</p>
<ul>
<li>Oracle      thinks <strong>flash memory is the most important hardware technology of the      decade,</strong> one that could lead to Oracle being “bumped off” if they don’t      get it right.</li>
<li>Juan      believes <strong>the “bulk” of Oracle’s business will move over to Exadata-like      technology over the next 5-10 years. </strong>Numbers-wise, this seems to be based more      on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database      management tasks.</li>
</ul>
<p>And by the way, Oracle doesn’t make its storage-tier software available to run on anything other than Oracle-designed boxes. At the moment, that means Exadata Versions 1 and 2. Since Exadata is by far Oracle’s best DBMS offering (at least in theory), that means <strong>Oracle’s best database offering only runs on specific Oracle-sold hardware platforms.<span id="more-1429"></span></strong></p>
<p><em>*E.g., I was sitting upstairs in my parents’ apartment in Columbus, OH having the call while their doctor, who I’ve never met, was visiting downstairs. He offered to make a special trip back Saturday afternoon because he missed me Wednesday, but he’s notorious for not coming when he says he will.</em> <em>Update: He didn&#8217;t come Saturday. On Saturday he said he&#8217;d come Sunday. He didn&#8217;t do that either.</em></p>
<p>Other high- and lowlights of our conversation included:</p>
<ul>
<li>Flash      is the main new hardware element in Exadata Version 2. Otherwise, Exadata      2 is just an annual refresh of Exadata Version 1 to include updated      components (Nehalem chips, bigger disk drives, etc.)</li>
<li>Juan      thinks it’s suboptimal to use flash memory through the bottleneck of disk      controllers, favoring PCIe cards instead. (I emphatically agree.)</li>
<li>Juan      resolutely ducked questions about <a href="../../../../../2009/09/25/the-hunt-for-oracle-exadata-production-references/">actual      Exadata production deployment</a>. Literally the only fact he shared in      that regard is that there are at least 2 Exadata production systems      running that each have 2 or more racks cabled together.</li>
<li>Juan      stressed that Exadata runs apps written over Oracle DBMS unchanged.</li>
<li>When      making mixed-workload claims for Exadata 2, Juan stressed consolidation of      multiple databases, some OLTP and some analytic. He didn’t really argue      with my skepticism about <a href="../../../../../2009/09/29/integration-oltp-data-warehousing-exadata-2/">integrating      OLTP and analytics in the same database</a>, with one exception:</li>
<li>Juan      pointed out that in major OLTP apps such as ERP systems, there often is      actually more processing going on in reporting and other batch stuff than      there is in true OLTP.</li>
<li>Exadata      2’s flash memory is designed as a disk cache, smarter than LRU (Least      Recently Used). The two examples Juan gave of “smarter than LRU” are that      backups and table scans don’t flush the cache.</li>
<li>I      forget whether this is new in Exadata 2 (I think it is), but anyhow –      Exadata has a “Storage Index” that’s a lot like a <a href="../../../../../2006/09/20/netezza-vs-conventional-data-warehousing-rdbms/">Netezza      zone map</a>. I.e., for each megabyte or so of data it stores the min and      max value of every column; if a query predicate rules out those ranges,      that megabyte is never retrieved.</li>
<li>Oracle      has long offered what sounds like flexible workload management capability,      and this has now been extended to specifically include I/O resources on      the storage tier.</li>
<li>This isn’t Exadata-specific, but Oracle has built a file system on top of its DBMS, optimized for speed, which helps with, e.g., ELT (Extract/Load/Transform). Evidently, it’s not at all the same thing as Marc Benioff’s 1990s Microsoft-annoying IFS (Internet File System) project, which seems to have morphed into a content management SDK.</li>
</ul>
<p>Highlights specifically in the area of parallelization included:</p>
<ul>
<li>Juan      stressed that all databases consolidated onto an Exadata machine      are/should be striped across all storage units.</li>
<li>On the      other hand, Juan said that different databases should be confined to      specific cores or CPUs on the database tier.</li>
<li>But on      the third hand, Juan also stressed – in what could be called a “private      cloud” pitch – that there’s great elasticity as to which databases are      matched to which server CPUs.</li>
<li>Contrary      to what <a href="../../../../../2008/09/28/exadata-oracle-database-machine-parallelization/">I      thought he and/or his colleagues told me a year ago</a>, Juan said RAC      (Real Application Clusters) is a big part of Oracle’s data warehouse      processing.</li>
<li>However,      Juan says that what I regard(ed) as a major objection to Oracle’s      database-tier parallelization &#8212; the need to manually specify “degrees of      parallelism” &#8212; has now been obviated by automation. Juan thinks that few      data warehouse DBAs will now need to manually tune parallelism, with minor      exceptions. One exception he cites is that if a nightly report really is      non-urgent, it can just be forced to run on a single core with no chance      to grab more resources. (However, Juan thinks manual tuning of parallelism      will continue to play a greater role in OLTP.)</li>
</ul>
<p>OK. That’s all I can get done tonight (see above re: inconvenience of timing). Follow-on subjects I’d like to and indeed plan to post about include:</p>
<ul>
<li>What      Juan said about hybrid columnar compression</li>
<li>Oracle’s      delightfully non-confidential slide deck, and a few comments about same</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Intersystems Cache&#8217; highlights</title>
		<link>http://www.dbms2.com/2010/01/15/intersystems-cache-highlights/</link>
		<comments>http://www.dbms2.com/2010/01/15/intersystems-cache-highlights/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 08:07:25 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Emulation, transparency, portability]]></category>
		<category><![CDATA[Intersystems and Cache']]></category>
		<category><![CDATA[Mid-range]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1400</guid>
		<description><![CDATA[I talked with Robert Nagle of Intersystems last week, and it went better than at least one other Intersystems briefing I&#8217;ve had. Intersystems&#8217; main product is Cache&#8217;, an object-oriented DBMS introduced in 1997 (before that Intersystems was focused on the fourth-generation programming language M, renamed from MUMPS). Unlike most other OODBMS, Cache&#8217; is used for [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I talked with Robert Nagle of Intersystems last week, and it went better than at least <a href="http://www.monashreport.com/2006/05/13/burning-issues-in-an-analysts-life/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">one other Intersystems briefing I&#8217;ve had</a>. Intersystems&#8217; main product is Cache&#8217;, an object-oriented DBMS introduced in 1997 (before that Intersystems was focused on the fourth-generation programming language M, renamed from MUMPS). Unlike most other OODBMS, Cache&#8217; is used for a lot of stuff one would think an RDBMS would be used for, across all sorts of industries. That said, there&#8217;s a distinct health-care focus to Intersystems, in that:</p>
<ul>
<li>MUMPS, the original Intersystems 	technology, was focused on health care.</li>
<li>The reasons Intersystems went 	object-oriented have a lot to do with <a href="../2008/08/16/intersystems-cache-microsoft-sql-serve/">the 	structure of health-care records</a>.</li>
<li>Intersystems&#8217; biggest and most 	visible ISVs are in the health-care area.</li>
<li>Intersystems is actually beginning 	to sell an electronic health records system called TrakCare around 	the world (but not in the US, where it has lots of large competitive 	VARs).</li>
</ul>
<p style="margin-bottom: 0in;"><em>Note: Intersystems Cache&#8217; is sold mainly through VARs (Value-Added Resellers), aka ISVs/OEMs. I.e., it&#8217;s sold by people who write applications on top of it.</em></p>
<p style="margin-bottom: 0in;">So far as I understand – and this is still pretty vague and apt to be partially erroneous – the Intersystems Cache&#8217; technical story goes something like this:<span id="more-1400"></span></p>
<ul>
<li>Intersystems Cache&#8217; is an object-oriented DBMS.</li>
<li>The preferred language for talking 	to Intersystems Cache&#8217; is Java.</li>
<li>Intersystems claims Cache&#8217; has good SQL performance for most kinds of use case.</li>
<li>Intersystems Cache&#8217; stores data in a kind of 	sparse hierarchy. It uses a lot of “common character count” 	compression, which sounds a lot to me like <a href="../2008/05/13/mcobject-extremedb-a-soliddb-alternative/">Patricia 	tries</a>.</li>
<li>Intersystems has recently bundled 	some BI/reporting tools into the Cache&#8217; stack. Surely not 	coincidentally, Intersystems once told me that some of its ISVs paid 	more to Crystal Reports than to Intersystems.</li>
<li>Intersystems Cache&#8217; has had Sybase emulation 	for several years, and just added Informix emulation. Most but not 	all stored procedures from those other DBMS run against Cache&#8217; as 	well.</li>
<li>Intersystems Cache&#8217; recently added a bunch of 	manageability, security, etc. features, the details of which 	generally inspired “Oh, you didn&#8217;t have that earlier?” reactions in me.</li>
<li>Intersystems just did a revamp of the Cache&#8217; object model to make it more Smalltalk-like, in which messages are sent to parent rather than child classes when appropriate. Thus, when you recompile a class, you don&#8217;t also have to recompile all its children, and incremental recompilation is now near-instantaneous. (Put that one in the “Oh, you didn&#8217;t have that earlier?” category too.) Versioning will be better as well.</li>
<li>In the latest release, Cache&#8217; has 	added what Intersystems calls “Java Event Processing.” This 	doesn&#8217;t sound like CEP (Complex Event Processing), and I forgot to 	ask whether it was memory-centric at all. Anyhow, the idea is to 	bang objects into the database really quickly, having them be 	immediately available for SQL query.  “Really quickly” means 	&gt;10,000 objects/core/second, with one test at the European Space 	Agency getting up to 85,000. By way of contrast, Intersystems 	asserts (based on bake-offs) that RDBMS competitors have to insert 	into BLOBs to get competitive performance, with associated loss of 	queryability.</li>
</ul>
<p style="margin-bottom: 0in;">Finally, a few financial highlights:</p>
<ul>
<li>Intersystems did a little over 	$1/4 billion in revenue in 2009.</li>
<li>85% of that was Cache&#8217;.</li>
<li>Revenue growth was slightly 	positive in 2009, and 15% in 2008.</li>
<li>Headcount growth was 25% in 2009 	and is planned to be big again in 2010, after being modest in prior 	years.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/15/intersystems-cache-highlights/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Boston Big Data Summit keynote outline</title>
		<link>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/</link>
		<comments>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 06:25:50 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Humor]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1227</guid>
		<description><![CDATA[Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I&#8217;m posting them below.

The top two points [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Last month, Bob Zurek asked me to give a talk on <a href="http://www.dbms2.com/2009/10/09/presentations-upcoming/" >“Big Data”, where “big” is anything from a few terabytes on up</a>, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I&#8217;m posting them below.</p>
<p><span id="more-1227"></span></p>
<p style="margin-bottom: 0in;">The top two points from Q&amp;A probably were:</p>
<ul>
<li><strong>Big Data and the cloud actually 	have relatively little to do with each other,</strong> <a href="http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/" >a few exceptions</a> notwithstanding, especially if the data is in a shared-nothing DBMS 	(as opposed to, say, a MapReduce-oriented file cluster). Two 	principal reasons are:
<ul>
<li>Redistributing data from node to 	node is a little slow, undermining some of the elasticity benefits 	of the cloud.</li>
<li><a href="http://www.dbms2.com/2009/05/29/sneakernet-to-the-cloud/" >Getting data into the cloud in the 	first place is a lot slow</a>.</li>
</ul>
</li>
<li><strong>The NoSQL movement is a lot like 	the Ron Paul campaign</strong> &#8212; it consists of people who are dissatisfied 	with the status quo, whose dissatisfaction has a lot to do with 	insufficient liberty and/or excessive expenditure, and who otherwise 	don&#8217;t have a whole lot in common with each other.</li>
</ul>
<p style="margin-bottom: 0in;">Anyhow, here are my notes for the talk, edited in just a couple of places for readability or linkage.</p>
<p style="margin-bottom: 0in;"><strong>Quick introduction</strong></p>
<ul>
<li>Big Data vs. cloud</li>
<li>How big is Big Data?</li>
<li>At the low end of that range, 	there&#8217;s little you can&#8217;t do with conventional technology if you 	have:
<ul>
<li>An unlimited budget for hardware</li>
<li>An unlimited budget for software</li>
<li>An unlimited budget for people, 	especially Oracle DBAs</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Big Data in OLTP</strong></p>
<ul>
<li>Hard-core OLTP
<ul>
<li>Focus of DBMS technology for a long time</li>
<li>Big budgets because each 	transaction has significant value</li>
<li>Tough to get users to change 	technologies</li>
</ul>
</li>
<li>Lighter-weight OLTP
<ul>
<li>Classic example = web companies
<ul>
<li>Big ones &#8212;  retail-oriented ones 	(eBay, Amazon) partially excepted &#8212; <a href="http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/" >rolled their own technology 	stacks</a></li>
<li>Reluctant to give money to anybody
<ul>
<li>Open source, etc.</li>
</ul>
</li>
</ul>
</li>
<li>Difficulty finding market
<ul>
<li>Product vs. feature
<ul>
<li>Clustering/HA/DR/whatever</li>
<li>Ditto cloud enablement</li>
</ul>
</li>
<li>True products haven&#8217;t found much 	traction yet</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Analytic Big Data use cases</strong></p>
<ul>
<li>Kinds of data for analytics
<ul>
<li>More of same != big</li>
<li>More detail and/or new kinds
<ul>
<li>Complete data sets</li>
<li>Transactions</li>
<li>Call details</li>
<li>Tick/trade history</li>
<li>Web clickstreams</li>
<li>Network event logs</li>
<li>Other machine-generated data</li>
<li>CAM bottom line
<ul>
<li>Anything human-generated should 	and will be retained in its entirety</li>
<li>Quantities of machine-generated 	data retained should and will grow roughly in line w/ computing cost 	reductions (Moore&#8217;s Law, etc.)</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li>Analytic uses of Big Data
<ul>
<li>Analytics is mainly about three 	things
<ul>
<li>Problem detection</li>
<li>Customer relationship improvement
<ul>
<li>(Those overlap when the customer 	relationship is bad)</li>
</ul>
</li>
<li>Financial statements on steroids</li>
</ul>
</li>
</ul>
<ul>
<li>Main kinds of analytics
<ul>
<li>What BI vendors traditionally sell
<ul>
<li>General reporting and dashboards</li>
<li>Ad-hoc query (now driven from 	those reports and dashboards)</li>
<li>Planning (allegedly integrated 	with BI)</li>
</ul>
</li>
<li>Research
<ul>
<li>Ad hoc relational query (worth 	mentioning twice because it drives so much of the market)</li>
<li>Data mining</li>
<li>Most web search and web mining</li>
</ul>
</li>
<li>Operational/near-real-time</li>
<li>Archiving/compliance</li>
</ul>
</li>
<li>What gets Big?
<ul>
<li>Mainly research and archiving</li>
<li>But when reporting or operational 	get Big, you have really interesting computing problems</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Technology issues and trends</strong></p>
<ul>
<li>Moore&#8217;s Law
<ul>
<li>CPUs &#8212; All about cores, hence 	parallelism is key</li>
<li>RAM</li>
<li>SSDs – hence replace disks</li>
<li>Sensors – hence generate lots 	more data</li>
</ul>
</li>
<li>Kryder&#8217;s Law
<ul>
<li>But <a href="http://www.dbms2.com/2005/11/13/breaking-the-disk-speed-barrier/" >rotational speeds up only 	12.5X since Eisenhower Administration</a></li>
<li>Hence solid-state memory (or RAM) 	will soon take over</li>
</ul>
</li>
<li>In the meantime, I/O bottlenecks 	have had to be beaten
<ul>
<li>Hence sequential scans</li>
<li>Hence <a href="http://www.dbms2.com/2007/03/26/index-light-mpp-data-warehouse-appliances/" >index-light</a> architectures</li>
<li>Hence columnar</li>
</ul>
</li>
<li>DBMS “overhead”
<ul>
<li>Raw license and maintenance fees – 	software increasing fraction of total</li>
<li>OLTP vestiges – locking and all 	that</li>
<li>DBAs
<ul>
<li>People costs = huge fraction of 	total</li>
<li>Index-lightness addresses</li>
<li>So does appliance</li>
</ul>
</li>
<li>Many people don&#8217;t really know how to 	write SQL</li>
</ul>
</li>
<li>Configuration
<ul>
<li>Appliance/tightly-balanced
<ul>
<li>Netezza</li>
<li>Teradata earlier</li>
<li>Greenplum/Sun</li>
<li>Oracle</li>
<li>IBM</li>
<li>Microsoft/Madison</li>
</ul>
</li>
<li>Commodity/do what you want
<ul>
<li>Vertica</li>
<li>Greenplum now</li>
<li>Infobright, Aster and others</li>
<li>MapReduce-oriented file systems</li>
</ul>
</li>
<li><a href="http://www.dbms2.com/2009/10/25/data-warehouse-balanced-hardware-configuration/" >Extreme rigidity is silly</a>
<ul>
<li><a href="http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/" >Teradata, Oracle have both 	signaled moving to more modularity</a></li>
<li>Big driver of that = heterogeneous 	storage
<ul>
<li>Cheap disk</li>
<li>Expensive disk</li>
<li>Solid-state</li>
<li>RAM</li>
</ul>
</li>
</ul>
<ul>
<li>CPU/storage ratio is even more of a 	driver</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Theoretically defensible ways to segment the market</strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >Latency requirements</a>
<ul>
<li>High availability and low latency 	go together</li>
</ul>
</li>
<li>Query types
<ul>
<li>Simultaneous users for same</li>
</ul>
</li>
<li>Database size</li>
<li>Budget</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Actual segments right now</strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/08/24/teradatas-active-enterprise-data-warehouse-story/" >Utter ADW/EDW</a></li>
<li>Data mart
<ul>
<li>Size</li>
<li>Naturally columnar vs. naturally 	row-based</li>
</ul>
</li>
<li>Operational/frontline</li>
<li>Less dramatic/smaller EDW</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Thoughts on the integration of OLTP and data warehousing, especially in Exadata 2</title>
		<link>http://www.dbms2.com/2009/09/29/integration-oltp-data-warehousing-exadata-2/</link>
		<comments>http://www.dbms2.com/2009/09/29/integration-oltp-data-warehousing-exadata-2/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 04:09:12 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=939</guid>
		<description><![CDATA[Oracle is pushing Exadata 2 as being a great system for any of OLTP (OnLine Transaction Processing), data warehousing or, presumably, the integration of same. This claim rests on a few premises, namely:

Exadata is great for data 	warehousing. At this time, that&#8217;s a claim much better supported 	by marketing and theory than by practice.
Exadata 2 [...]]]></description>
			<content:encoded><![CDATA[<p>Oracle is pushing Exadata 2 as being a great system for any of OLTP (OnLine Transaction Processing), data warehousing or, presumably, the integration of same. This claim rests on a few premises, namely:<span id="more-939"></span></p>
<ul>
<li><strong>Exadata is great for data 	warehousing.</strong> At this time, that&#8217;s a claim much better supported 	by <a href="http://intelligent-enterprise.informationweek.com/channels/information_management/showArticle.jhtml?articleID=213000356&amp;pgno=1" onclick="javascript:pageTracker._trackPageview('/intelligent-enterprise.informationweek.com');">marketing</a> and <a href="http://www.dbms2.com/2008/10/17/oracle-notes/" >theory</a> than by <a href="http://www.dbms2.com/2009/09/25/the-hunt-for-oracle-exadata-production-references/" >practice</a>.</li>
<li><strong>Exadata 2 is a suitable annual 	improvement over last year&#8217;s Exadata 1.</strong> That&#8217;s quite plausible.</li>
<li><strong>Oracle is outstanding for OLTP.</strong> That&#8217;s borne out by vast amounts of experience, especially if by 	“outstanding” you mean “Gets the job done really, really well 	at a very high cost in terms of both licenses and labor.”</li>
<li><strong>The Flash memory in Exadata 2 	makes Oracle even better for OLTP.*</strong> That&#8217;s plausible too. 	Worst-case is probably that Flash support doesn&#8217;t really work well 	in this release, but will be cleaned up soon.**</li>
<li><strong>OLTP and data warehousing uses 	for Exadata don&#8217;t interfere with each other. </strong> That one bears 	some discussion.</li>
</ul>
<p style="margin-bottom: 0in;"><em>*Oracle has repeatedly emphasized that <a href="http://www.dbms2.com/2009/09/21/notes-on-the-oracle-database-11g-release-2-white-paper/" >the Flash memory in Exadata 2 is meant to speed up OLTP</a>. By way of contrast, I&#8217;ve only noticed one vague claim that Flash memory helps data warehousing – a reference to a doubling in “user scan rates”, which perhaps was a slip of the marketing pen.</em></p>
<p style="margin-bottom: 0in;"><em>**Oracle probably has been working on Flash memory support for a long time. But it&#8217;s likely that Oracle didn&#8217;t have a strategic commitment to Sun&#8217;s specific technology until April of this year. After all, back in March it looked as if IBM would wind up owning Sun.</em></p>
<p style="margin-bottom: 0in;">The integration-versus-separation argument for OLTP and analytic databases is an old one. In the early 1980s, IBM pushed both the “Information Center” (precursor to the data warehouse) and relational DBMS (portrayed as good for query and maybe for OLTP as well).  In the early 1990s, Ted Codd opined that relational DBMS were good for OLTP but not analytics, instead favoring “OLAP” systems like Arbor Software&#8217;s Essbase (which, ironically, is now owned by Oracle). As the 1990s progressed, a consensus emerged that most large* enterprises should have at least one relational data warehouse separate from the core OLTP DBMS, a view that has persisted to this day. Until the announcement of Exadata 2, Oracle hadn&#8217;t seriously disputed this consensus, although of course it always has wanted its DBMS software to run your OLTP and analytic databases alike.</p>
<p style="margin-bottom: 0in;"><em>*At a sufficiently small enterprise, one DBMS suffices. If a single commodity server has enough power to do all your processing, without even requiring you to have the expertise to tune very seriously, that&#8217;s probably the right way to go.</em></p>
<p style="margin-bottom: 0in;">Assuming one DBMS has plenty of functionality for OLTP and analytics alike – as Oracle certainly does – the main arguments for separating OLTP and data warehousing revolve around performance. Reasons to split out a separate analytic database include:</p>
<ul>
<li><strong>You might just want to run a 	separate brand of DBMS for your OLTP and data warehousing.</strong> Oracle thinks this is a terrible idea. (I disagree, as do a whole 	lot of analytic DBMS vendors – Teradata, Netezza, Greenplum, 	Sybase, Vertica, Aster Data, Infobright, Kognitio, et al. &#8212; and 	their customers.)</li>
<li><strong>You may want to lay out or 	index your tables differently for OLTP and data warehousing.</strong> Materialized view capabilities as flexible as Oracle&#8217;s should let 	you do that in a single database.</li>
<li><strong>You may want to lay out your 	files differently for OLTP and data warehousing (e.g., in terms of 	block sizes).</strong> Oracle might claim that ASM (Automatic Storage 	Management) and, in particular, the “Stripe and Mirror Everything” 	option obviate that point. I&#8217;m far from convinced.</li>
<li><strong>OLTP and analytic workloads 	step on each other&#8217;s toes, Part 1.</strong> For example, analytic 	queries that call for table scans often don&#8217;t mix well with OLTP 	operations that call for random reads and (especially) writes.* In 	principle, Flash memory could greatly reduce the problem, if the 	OLTP workload talks mainly to Flash, while Flash talks to disk 	mainly via microbatches. But I&#8217;ll be quite surprised if Oracle has 	aced that challenge on the first try. More likely, a longish stretch 	of <a href="../2009/08/21/bottleneck-whack-a-mole/">Bottleneck 	Whack-A-Mole</a> lies ahead.</li>
<li><strong>OLTP and analytic workloads 	step on each other&#8217;s toes, Part 2.</strong> Even more fundamentally: If 	you don&#8217;t have sufficiently good workload management tools, 	combining OLTP and analytic workloads is a ghastly performance idea, 	with OLTP slowing to a crawl while analytic queries rumble to 	completion. However, I&#8217;d think Oracle is in pretty good shape in 	that area.</li>
</ul>
<p style="margin-bottom: 0in;"><em>*If this weren&#8217;t a terribly difficult problem, Oracle, IBM, and/or Teradata – all of which can do a reasonably decent job of <a href="../2009/08/24/teradatas-active-enterprise-data-warehouse-story/">mixing long and short queries in the same workload</a> &#8212; would probably have solved it years ago.</em></p>
<p style="margin-bottom: 0in;"><strong>Bottom line: Some day, Oracle Exadata may be a great system for integrated OLTP and data warehousing – but probably not in the current release.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/29/integration-oltp-data-warehousing-exadata-2/feed/</wfw:commentRss>
		<slash:comments>35</slash:comments>
		</item>
		<item>
		<title>Notes on the Oracle Database 11g Release 2 white paper</title>
		<link>http://www.dbms2.com/2009/09/21/notes-on-the-oracle-database-11g-release-2-white-paper/</link>
		<comments>http://www.dbms2.com/2009/09/21/notes-on-the-oracle-database-11g-release-2-white-paper/#comments</comments>
		<pubDate>Mon, 21 Sep 2009 17:12:33 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Oracle TimesTen]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=923</guid>
		<description><![CDATA[
The Oracle Database 11g Release 2 white paper I cited a couple of weeks ago has evidently been edited, given that a phrase I quoted last month is no longer to be found. Anyhow, here are some quotes from and comments on what evidently is the latest version.
The In-Memory Database Cache (IMDB Cache) option of [...]]]></description>
			<content:encoded><![CDATA[
<p>The <a href="http://www.oracle.com/technology/products/database/oracle11g/pdf/oracle-database-11g-release2-overview.pdf" onclick="javascript:pageTracker._trackPageview('/www.oracle.com');">Oracle Database 11g Release 2 white paper</a> I cited <a href="http://www.dbms2.com/2009/09/03/oracle-11g-exadata-hybrid-columnar-compression/" >a couple of weeks ago</a> has evidently been edited, given that a phrase I quoted last month is no longer to be found. Anyhow, here are some quotes from and comments on what evidently is the latest version.<span id="more-923"></span></p>
<blockquote><p>The In-Memory Database Cache (IMDB Cache) option of Oracle Database 11g Release 2, allows data to be cached and processed in the memory of the applications themselves, off-loading the data processing to middle tier resources. Any network latency between the middle tier and the back-end database is removed from the transaction path, with the result that individual transactions can often be executed up to 10 times faster. This is particularly useful where very high rates of transaction processing is required, such as those found under market trading systems, Telco switching systems, and Real Time manufacturing environments. All data in the middle tier is fully protected through local recovery, and asynchronous posting to the back end Oracle Database. With Oracle Database 11g Release 2, the ability to transparently deploy IMDB Cache with existing Oracle applications becomes much easier – with common data types, SQL and PL/SQL support, and native support for the Oracle Call Interface (OCI).</p></blockquote>
<p>At a guess, this sounds like it&#8217;s based on Oracle&#8217;s TimesTen acquisition.</p>
<blockquote><p>Oracle Database 11g Release 2 adds further optimizations, including capabilities to automatically determine the most optimal degree of parallelization for a query, based on available resources. With this comes automated parallel statement queuing, where the database determines that, based on current resource availability, it is more effective to queue a query for later execution once required resources have freed up.</p></blockquote>
<p>Sounds like a kind of automatic workload management &#8212; i.e., the kind of optimization vendors of mature products get around to putting into their systems. It does not sound like query pipelining, however.</p>
<blockquote><p>Oracle Database 11g Release 2 will automatically distribute a large compressed table (or a smaller non-compressed table), into the available memory across all the servers in the Grid, and will then localize parallel query processing to the data in memory on the individual nodes. This dramatically improves query performance, and is especially useful where large tables can be entirely compressed into the available memory using compression capabilities.</p></blockquote>
<p>So Oracle caches compressed data. Not stated is which compression techniques are covered.</p>
<blockquote><p>Each Exadata Storage Server stores up to 7 Terabyte [sic] of uncompressed user data, and also comes enabled with 384 GB of solid-state Flash cache. This Flash Cache automatically caches active data of the magnetic disks in the Oracle Exadata Storage Server, delivering a 10x performance gain for read and write operations under OLTP applications.</p></blockquote>
<p>Sounds like the Flash memory is positioned for OLTP use.</p>
<blockquote><p>In the past, Database Administrators and System Administrators have spent a great deal of time determining to how best place data across these disk arrays, to get maximum performance and availability. The best procedure for data placement is to simply Stripe And Mirror Everything; stripe data blocks equally across all disks in an array, and then mirror the blocks on at least two disks. This approach provides the perfect balance between performance, disk utilization, and ease of use.</p></blockquote>
<p>This is a big part of what could be called the &#8220;Administering Oracle doesn&#8217;t suck nearly as badly as it used to&#8221; pitch. (Mitchell Kertzman, who was Sybase CEO after the mid-1990s meltdown, told me his motto was &#8220;We suck less every day.&#8221; But I digress &#8230;)</p>
<blockquote><p>Automatic Storage Management (ASM), a feature of Oracle Database 11g automates the striping and mirroring of database without the need to purchase third party volume management software. As data volumes increase, additional disks can be added, and ASM will automatically restripe and rebalance the data across available disks to ensure optimal performance. Similarly, disks that report errors can be removed from the disk array, and ASM will re-adjust accordingly.</p></blockquote>
<p>I.e., you can add nodes without taking the system down. That&#8217;s becoming a pretty standard feature for serious parallel DBMS.</p>
<blockquote><p>Oracle Database 11g Release 2 improves ASM in significant areas. New intelligent data placement capabilities store infrequently accessed data on the inner rings of the physical disks, while frequently accessed data is placed on the outer rings, offering better performance optimization.</p></blockquote>
<p>Also pretty standard.</p>
<blockquote><p>Oracle has been enhancing partitioning capabilities for over ten years. Oracle Partitioning, an option of Oracle Database 11g Release 2, allows very large tables (and their associated indexes) to be partitioned into smaller, more manageable units, providing a “divide and conquer” approach to very large database management. Partitioning also improves performance, as the optimizer will prune queries to only use the relevant partitions of a table or index in a lookup. Oracle Database 11g Release 2 provides multiple methods for partitioning data, and also allows different levels of partitioning on the same table, so that a single partitioning strategy can be used to improve both performance and manageability.</p></blockquote>
<p>Even better might be a system that doesn&#8217;t lean heavily on complex partitioning to achieve good performance.</p>
<blockquote><p>Oracle Partitioning can also manage the lifecycle of information. Typically, all databases have active data – the information being processed this month or quarter, and historical data that is primarily read-only. Organizations can take advantage of the inherent lifecycle of data to implement a multi-tiered storage solution and lower their overall storage costs. For example, a large table within an order-entry system could contain all the orders processed in the last 7 years. Oracle Partitioning can be used to set up monthly partitions, with the current last four months of order data partitioned onto a high-end storage array, with all the other partitions placed on a lower-cost storage solution, often 2-3 times less cost than the high end storage environment.</p></blockquote>
<p>This is becoming a standard feature for any parallel DBMS that can support multiple kinds of storage in one system.</p>
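<p>The tiering rule in that quote (monthly partitions, with the most recent four months on the expensive array) can be sketched in a few lines of Python. This is only an illustration of the placement logic under my own assumptions; the function names and the tier labels <code>high_end</code> and <code>low_cost</code> are hypothetical, and nothing here is Oracle Partitioning syntax.</p>

```python
def month_keys(first, last):
    """List (year, month) partition keys from first through last, inclusive."""
    if first > last:
        raise ValueError("first month must not be after last month")
    keys = []
    year, month = first
    while (year, month) != last:
        keys.append((year, month))
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    keys.append(last)
    return keys


def assign_tiers(keys, hot_months=4):
    """Map each monthly partition to a storage tier (hypothetical labels)."""
    hot = set(keys[-hot_months:])  # the most recent partitions stay on fast storage
    return {key: ("high_end" if key in hot else "low_cost") for key in keys}


# Seven years of monthly order partitions, as in the white paper's example.
keys = month_keys((2003, 1), (2009, 12))
tiers = assign_tiers(keys)
```

<p>With seven years of data, 80 of the 84 partitions land on the cheaper tier, which is where the claimed storage savings would come from.</p>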
<blockquote><p>Oracle Database 11g also provides advanced compression techniques to further reduce storage requirements. Using Oracle Advanced Compression, an option to Oracle Database 11g, all data in a table can be compressed using a continuous table compression capability that achieves a 2-4 times compression ratio with little performance impact on OLTP or Data Warehousing workloads. This compression technology replaces duplicate values in a table with a single value, and continuously adapts to data changes over time, so compression ratios are always maintained.</p></blockquote>
<p>Sounds like dictionary/token compression.</p>
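<p>For readers who haven't seen the technique, here is a minimal Python sketch of what I mean by dictionary/token compression: each distinct value is stored once, and the column itself holds small integer tokens. It is my own toy illustration, with hypothetical function names, and is not a description of how Oracle Advanced Compression is actually implemented.</p>

```python
def dictionary_encode(values):
    """Store each distinct value once and replace the column with integer tokens."""
    dictionary = []   # token -> value
    token_of = {}     # value -> token
    tokens = []
    for value in values:
        if value not in token_of:
            # First time this value appears: give it the next free token.
            token_of[value] = len(dictionary)
            dictionary.append(value)
        tokens.append(token_of[value])
    return dictionary, tokens


def dictionary_decode(dictionary, tokens):
    """Rebuild the original column from the dictionary and the token list."""
    return [dictionary[token] for token in tokens]


column = ["Boston", "Boston", "Austin", "Boston", "Austin"]
dictionary, tokens = dictionary_encode(column)
restored = dictionary_decode(dictionary, tokens)
```

<p>The savings grow with the amount of repetition: five city names shrink to two dictionary entries plus five small integers, and a column with millions of rows but few distinct values compresses far better.</p>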
<blockquote><p>With Oracle Database 11g Release 2, the Exadata Storage Servers in the Sun Oracle Database Machine also enable new hybrid columnar compression technology that provides up to a 10 times compression ratio, with corresponding improvements in query performance. And, for pure historical data, a new archival level of hybrid columnar compression can be used that provides up to 50 times compression ratios.</p></blockquote>
<p>I thought they said 40X before. But even if my memory isn&#8217;t playing tricks regarding that, single-point compression ratio estimates are always very approximate.</p>
<blockquote><p>Any hardware component in an Oracle Grid can be dynamically added or removed as required. Disks can be added or removed online with ASM, with the data automatically rebalanced across the new disk infrastructure. Additional servers can also be easily added or removed to a Real Application Cluster with users connected to these nodes rebalanced across the infrastructure. This ability to migrate users from one server to another in a RAC cluster also enables rolling patching of the database software. If a patch needs to be applied, then a server can be removed from the cluster, patched, and then put back into the cluster. The same operation can be repeated for the next server in the cluster, and so on.</p></blockquote>
<p>Nice. And the paper goes on in that vein for quite a while.</p>
<blockquote><p>Oracle Total Recall, an option to Oracle Database 11g, provides a solution for the retention of historical information. With Oracle Total Recall, all changes made to data are kept to provide a complete change history of information. This means that auditors can not only see who did what when, but they can also see what the actual information was at the time – something that previously has only be [sic] available by building into the application, or by expensive backup retention policies.</p></blockquote>
<p>Timestamping/time-travel/whatever is increasingly becoming a standard feature as well, especially given the number of PostgreSQL-based DBMS on the market.</p>
<blockquote><p>New internal control requirements found in regulations can be difficult and expensive to implement in an environment with multiple applications. Oracle Database Vault, an option to Oracle Database 11g, allows access controls to be transparently applied underneath existing applications. Users can be prevented from accessing specific application data, or from accessing the database outside of normal hours; separation-of-duty requirements can be enforced for different Database Administrators without a costly least privilege exercise. And Oracle Advanced Security, an option to Oracle Database 11g, can be used to transparently encrypt data at all levels – data in transit on the network; data at rest on physical storage and in backups. Similarly, the Data Masking pack can be used to obfuscate data as it moves from production to development, reducing the potential violation of privacy regulations or risking sensitive data leaks.</p></blockquote>
<p>Oracle is the gold standard in database security.</p>
<blockquote><p>Oracle’s self-management approach takes two tacks. Firstly, wherever possible, repeatable, labor intensive and error prone tasks that can be fully automated in the database have been. For example, Storage Management, Memory Management, Statistics collection, Backup and Recovery, and SQL Tuning have all been automated. Secondly, where operations cannot be fully automated, intelligent advisors are built into the database to mentor Database Administrators on how to get the best out of their systems. Advisors are provided for Configuration Management, Patching, Indexing, Partitioning, Performance Diagnostics, Data Recovery, and, new in Oracle Database 11g Release 2, Compression and Maximum Availability.</p></blockquote>
<p>And boy are they needed.</p>
<blockquote><p>Recent studies performed by an independent research company shows that Database Administrators can expect to spend 26% less time managing their 11g environments over their 10g environments, and as much as 50% when compared to older Oracle9i deployments.</p></blockquote>
<p>50% of way too much is still way too much.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/21/notes-on-the-oracle-database-11g-release-2-white-paper/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>The Boston Globe had an article on VoltDB</title>
		<link>http://www.dbms2.com/2009/08/04/the-boston-globe-had-an-article-on-voltdb/</link>
		<comments>http://www.dbms2.com/2009/08/04/the-boston-globe-had-an-article-on-voltdb/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 09:17:10 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=856</guid>
		<description><![CDATA[The Boston Globe article has more detail than Vertica and VoltDB have ever OKed me to put out, and some business details they&#8217;ve never given me.
]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.boston.com/business/technology/innoeco/2009/08/on_the_radar_voltdb_just_the_l.html" onclick="javascript:pageTracker._trackPageview('/www.boston.com');">Boston Globe article</a> has more detail than Vertica and VoltDB have ever OKed me to put out, and some business details they&#8217;ve never given me.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/08/04/the-boston-globe-had-an-article-on-voltdb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Groovy Corp puts out a ridiculous press release</title>
		<link>http://www.dbms2.com/2009/07/30/groovy-corp-puts-out-a-ridiculous-press-release/</link>
		<comments>http://www.dbms2.com/2009/07/30/groovy-corp-puts-out-a-ridiculous-press-release/#comments</comments>
		<pubDate>Thu, 30 Jul 2009 18:13:58 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Groovy Corporation]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[OLTP]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=851</guid>
		<description><![CDATA[I knew Groovy Corp&#8217;s press release today would be bad, as it was pitched in advance as being about an awe-inspiring benchmark.  That part met my very low expectations, emphasizing how the Groovy SQL Switch massively outperformed MySQL* in a benchmark, and how this supposedly shows the Groovy SQL Switch would outperform every other competitive [...]]]></description>
			<content:encoded><![CDATA[<p>I knew Groovy Corp&#8217;s press release today would be bad, as it was pitched in advance as being about an awe-inspiring benchmark.  That part met my very low expectations, emphasizing how <a href="http://www.dbms2.com/2009/07/28/the-groovy-sql-switch/" >the Groovy SQL Switch</a> massively outperformed MySQL* in a benchmark, and how this supposedly shows the Groovy SQL Switch would outperform every other competitive RDBMS by at least similar margins.</p>
<p><em>*While a few use cases are exceptions, being &#8220;better than MySQL&#8221; for a DBMS is basically like being &#8220;better than Pabst Blue Ribbon&#8221; for a beer. Unless price is your top consideration, why are you even making the comparison?</em></p>
<p>Even worse, the press release, from its subhead and very first sentence, emphasizes the claim &#8220;the Groovy SQL Switch&#8217;s ability to significantly outperform relational databases.&#8221; As CEO Joe Ward quickly agreed by email, that&#8217;s not accurate.  As you would expect from the &#8220;SQL&#8221; in its name, the Groovy SQL Switch is just as relational as the products it&#8217;s being contrasted to.  Unfortunately for Joe, who I gather aspires to edit it to say something more sensible, <a href="http://www.individual.com/story.php?story=104608487" onclick="javascript:pageTracker._trackPageview('/www.individual.com');">the press release</a> is out already in multiple places.</p>
<p>More favorably, Renee Blodgett has <a href="http://www.weblogtheworld.com/united-kingdom/no-more-refresh-on-the-web-real-time-a-reality-with-groovy-corp/" onclick="javascript:pageTracker._trackPageview('/www.weblogtheworld.com');">a short, laudatory post</a> about Groovy, with some kind of embedded video.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/07/30/groovy-corp-puts-out-a-ridiculous-press-release/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>What are the best choices for scaling Postgres?</title>
		<link>http://www.dbms2.com/2009/07/29/scaling-postgres-choices/</link>
		<comments>http://www.dbms2.com/2009/07/29/scaling-postgres-choices/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 06:16:02 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[EnterpriseDB and Postgres Plus]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=849</guid>
		<description><![CDATA[I have a client who wants to build a new application with peak update volume of several million transactions per hour.  (Their base business is data mart outsourcing, but now they&#8217;re building update-heavy technology as well.) They have a small budget.  They&#8217;ve been a MySQL shop in the past, but would prefer to contract [...]]]></description>
			<content:encoded><![CDATA[<p>I have a client who wants to build a new application with peak update volume of several million transactions per hour.  (Their base business is data mart outsourcing, but now they&#8217;re building update-heavy technology as well.) They have a small budget.  They&#8217;ve been a MySQL shop in the past, but would prefer to contract (not eliminate) their use of MySQL rather than expand it.</p>
<p>My client actually signed a deal for EnterpriseDB&#8217;s Postgres Plus Advanced Server and GridSQL, but unwound the transaction quickly. (They say EnterpriseDB was very gracious about the reversal.) There seem to have been two main reasons for the flip-flop.  First, it seems that EnterpriseDB&#8217;s version of Postgres isn&#8217;t up to PostgreSQL&#8217;s 8.4 feature set yet, although EnterpriseDB&#8217;s timetable for catching up might have been tolerable. But GridSQL apparently is even further behind, with no timetable for up-to-date PostgreSQL compatibility.  That was the dealbreaker.</p>
<p>The current base-case plan is to use generic open source PostgreSQL, with scale-out achieved via hand sharding, Hibernate, or &#8230; ??? Experience and thoughts along those lines would be much appreciated.</p>
<p>Another route to OLTP performance and scale-out is of course a memory-centric product such as <a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/" >VoltDB</a> or <a href="http://www.dbms2.com/2009/07/28/the-groovy-sql-switch/" >the Groovy SQL Switch</a>.  But this client&#8217;s database is terabyte-scale, so hardware costs could be an issue, as could product maturity.</p>
<p>By the way, a large fraction of these updates will be actual changes, as opposed to new records, in case that matters.  I expect that the schema being updated will be very simple &#8212; i.e., clearly simpler than in a classic order entry scenario.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/07/29/scaling-postgres-choices/feed/</wfw:commentRss>
		<slash:comments>29</slash:comments>
		</item>
	</channel>
</rss>
