<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Memory-centric data management</title>
	<atom:link href="http://www.dbms2.com/category/memory-centric-data-management/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 12:22:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Terminology: Data mustering</title>
		<link>http://www.dbms2.com/2011/11/28/terminology-data-mustering/</link>
		<comments>http://www.dbms2.com/2011/11/28/terminology-data-mustering/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 19:10:11 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5736</guid>
		<description><![CDATA[I find myself in need of a word or phrase that means bring data together from various sources so that it&#8217;s ready to be used, where the use can be analysis or operations. The first words I thought of were &#8220;aggregation&#8221; and &#8220;collection,&#8221; but they both have other meanings in IT. Even &#8220;data marshalling&#8221; has [...]]]></description>
			<content:encoded><![CDATA[<p>I find myself in need of a word or phrase that means <strong>bring data together from various sources so that it&#8217;s ready to be used,</strong> where the use can be analysis or operations. The first words I thought of were &#8220;aggregation&#8221; and &#8220;collection,&#8221; but they both have other meanings in IT. Even &#8220;data marshalling&#8221; has a specific meaning different from what I want. So instead, I&#8217;ll go with <strong>data mustering.</strong></p>
<p>I mean for the term &#8220;data mustering&#8221; to encompass at least three scenarios:</p>
<ul>
<li>Integrated (relational) data warehouse.</li>
<li>Big bit bucket.</li>
<li>Big bit stream.</li>
</ul>
<p>Let me explain what I mean by each.  <span id="more-5736"></span></p>
<p><strong>&#8220;Integrated data warehouse&#8221;</strong> is a phrase Teradata has started using for enterprise data warehouses that, <a href="../../../../../2010/04/12/enterprise-data-warehouse-edw-myt/">like approximately every other EDW in the entire history of data warehousing</a>, aren&#8217;t truly enterprise-wide. In other words, it means &#8220;not just a data mart&#8221;. <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">No category name is perfect</a>, but I think that one works reasonably well.</p>
<p>I previously described the <strong><a href="../../../../../2011/06/04/dirty-data-stored-dirt-cheap/">big bit bucket</a></strong> use case as</p>
<blockquote><p>Users take a whole lot of data, often <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a> in logs of different kinds, and dump it into one place, managed by Hadoop, at open-source pricing.</p></blockquote>
<p>and quickly added</p>
<blockquote><p>Of course, there are various outfits who’d like to sell you not-so-cheap bit buckets. Contending technologies include <a href="../../../../../2011/06/02/why-you-would-want-an-appliance-and-when-you-wouldnt/">Hadoop appliances</a> (which I don’t believe in), <a href="../../../../../2009/10/18/technical-introduction-to-splunk/">Splunk</a> (which in many use cases I do), and <a href="../../../../../2010/11/29/marklogic-and-its-document-dbms/">MarkLogic</a> (ditto, but often the cases are different from Splunk’s). Cloudera and IBM, among other vendors, would also like to sell you some proprietary software to go with your standard Apache Hadoop code.</p></blockquote>
<p>I think I&#8217;ll stand pat on that explanation. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>By analogy, a <strong>big bit stream</strong> is various streams of data, assembled in the custody of a streaming engine. Sybase told me Wednesday that this scenario appears in both of the traditional markets for CEP/streaming &#8212; national intelligence, where it is a major use of streaming, and capital markets, where it shows up in some use cases. That&#8217;s consistent with what I&#8217;ve heard from other CEP/streaming vendors too.</p>
<p>As for where I got the word &#8220;mustering&#8221; &#8212; it&#8217;s a military term, for when you assemble your troops and their gear either for inspection or for actual use. The main modern usage I know of the word is as part of the phrase &#8220;pass muster&#8221;, which originally referred to the concept that the person being paid to put a regiment together should from time to time demonstrate that the regiment physically existed in the form that regimental records seemed to show.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/28/terminology-data-mustering/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Some big-vendor execution questions, and why they matter</title>
		<link>http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/</link>
		<comments>http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 11:01:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cognos]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5704</guid>
		<description><![CDATA[When I drafted a list of key analytics-sector issues in honor of look-ahead season, the first item was &#8220;execution of various big vendors&#8217; ambitious initiatives&#8221;.  By &#8220;execute&#8221; I mean mainly: &#8220;Deliver products that really meet customers&#8217; desires and needs.&#8221; &#8220;Successfully convince them that you&#8217;re doing so &#8230;&#8221; &#8220;&#8230; at an attractive overall cost.&#8221; Vendors mentioned [...]]]></description>
			<content:encoded><![CDATA[<p>When I drafted a list of key analytics-sector issues in honor of <a href="http://www.dbms2.com/2011/11/21/analytic-trends-in-2012-qa/">look-ahead season</a>, the first item was &#8220;execution of various big vendors&#8217; ambitious initiatives&#8221;.  By &#8220;execute&#8221; I mean mainly:</p>
<ul>
<li>&#8220;Deliver products that really meet customers&#8217; desires and needs.&#8221;</li>
<li> &#8220;Successfully convince them that you&#8217;re doing so &#8230;&#8221;</li>
<li>&#8220;&#8230; at an attractive overall cost.&#8221;</li>
</ul>
<p>Vendors mentioned here are Oracle, SAP, HP, and IBM. Anybody smaller got left out due to the length of this post. Among the bigger omissions were:</p>
<ul>
<li>salesforce.com (multiple subjects).</li>
<li><a href="../../../../../2011/04/21/sas-hpa-does-make-sense-after-all/">SAS HPA</a>.</li>
<li><a href="../../../../../2011/08/21/hadoop-evolution/">The evolution of Hadoop</a>.</li>
</ul>
<p><span id="more-5704"></span><strong>A (lingering) issue for SAP and Oracle alike</strong></p>
<p>As I noted in January of this year, <a href="../../../../../2011/01/03/the-six-useful-things-you-can-do-with-analytic-technology/">integration of business intelligence into operational apps is making very slow progress</a>. Even so, it&#8217;s a huge part of the apparent strategy at SAP and Oracle alike, as well it should be. Much of the benefit from automating routine desk work has already happened. The areas ripest for exploitation are the ones where analytics are part of the equation.</p>
<p>Given the lack of tangible progress, why do I think this is a genuine area of Oracle and SAP emphasis? Three reasons of many are:</p>
<ul>
<li>Why else did SAP buy Business Objects?</li>
<li>If they&#8217;re not trying to <a href="../../../../../2011/03/30/short-request-and-analytic-processing/">integrate operational apps and analytics</a>, why else does SAP&#8217;s emphasis on HANA make sense?</li>
<li>Without business intelligence in the picture, how does Oracle&#8217;s integrated-stack story promise any direct user benefits?*</li>
</ul>
<p><em>*As opposed to IT concerns &#8212; integration, administration, TCO (Total Cost of Ownership), etc.</em></p>
<p>After so many years of disappointment, I&#8217;m not going to forecast 2012 as a pivotal year for <strong>the integration of business intelligence into operational applications.</strong> But if one of SAP or Oracle ever does get a significant BI/operational app integration advantage over the other, it could be a major competitive advantage in those application market segments that are still up for grabs. It also is an opportunity for both vendors to gain BI market share in their respective application customer bases.</p>
<p><strong>A more urgent issue for SAP</strong></p>
<p>SAP has put huge amounts of credibility on the line for HANA, the integration of two different and not particularly mature in-memory database technologies. So far, it is difficult to find evidence that HANA is robust enough for widespread adoption. Whether or not SAP can fix that is a huge open question, which could have significant impact on the course of several technology areas: applications, business intelligence, in-memory DBMS, and maybe even hardware.</p>
<p>Based on current information, which is admittedly partial, I&#8217;m a short-term pessimist on HANA. Longer-term, I&#8217;m on record as saying that <a href="../../../../../2011/05/23/databases-ram/">traditional databases will eventually wind up in RAM</a>. SAP will surely get that technology right some day, whether or not the way it does so has anything to do with present-day HANA code.</p>
<p><strong>Four more issues for Oracle </strong></p>
<p>Oracle&#8217;s ambitions are near-endless, and so also therefore is its list of execution challenges. Four in the analytics area that I find particularly interesting are:</p>
<ul>
<li><strong>True hybrid columnar DBMS.</strong> <a href="../../../../../2011/09/22/teradata-columnar-compression/">I was guessing that Oracle, like Teradata, would announce true hybrid columnar the week of Oracle OpenWorld</a>. I was wrong. But if Oracle can&#8217;t bring out true hybrid columnar DBMS functionality relatively soon, Exadata will lose credibility as a competitor to more specialized analytic DBMS.</li>
<li><strong>Oracle Exalytics.</strong> With Exalytics in the mix, Oracle&#8217;s technology stack has HANA-like potential. But will Exalytics even ship in 2012? (I think so.) Will it be good for much in the first release? (I&#8217;m skeptical.)</li>
<li><strong>Oracle&#8217;s Big Data Appliance</strong>. I&#8217;m skeptical both about <a href="../../../../../2011/10/20/more-notes-on-oracle-nosql/">Oracle&#8217;s NoSQL product</a> &#8212; <a href="http://www.infoworld.com/d/data-explosion/first-look-oracle-nosql-database-179107">a favorable InfoWorld review</a> notwithstanding &#8212; and <a href="../../../../../2011/09/23/hadoop-appliances/">Hadoop appliances</a>. But if I&#8217;m wrong, and Oracle can successfully embrace/extend the new non-relational paradigms, then it really might regain control over the evolution of data management.</li>
<li><strong><a href="../../../../../2011/10/18/oracle-is-buying-endeca/">Oracle&#8217;s Endeca acquisition</a></strong> &#8212; will Oracle prove me wrong and integrate Endeca effectively into its overall analytic product line? If it does, we might finally see effective text (and eventually speech) navigation of enterprise software. (But as with all Oracle issues cited here, this is something that probably won&#8217;t amount to much in 2012 even if it does later go well.)</li>
</ul>
<p><strong>Three issues for IBM</strong></p>
<p>Like Oracle, IBM is a huge company with many ambitions and hence many execution challenges. The biggest of those is surely: <strong>How effective can IBM be at selling outside its existing customer base?</strong> I don&#8217;t hear as much competitively about IBM DataStage, IBM SPSS or now IBM Netezza as I did when their vendors were independent companies. Even Cognos may not be much of an exception to the rule, although it has its own large customer base outside of IBM&#8217;s traditional one. (To lesser extents, the same is of course true of Netezza and numerous other IBM acquisitions.)</p>
<p>Another general issue for IBM is <strong>substantively integrating its various product lines,</strong> at least to the extent that makes sense. DB2/Netezza integration sounds good, but even that is more a matter of product marketing (the admirable part of that discipline) than of actual technology. Other integrations (e.g. Cognos/DB2 in various bundles) have tended toward the dubious side.*</p>
<p><em>*I&#8217;m still waiting for IBM to get back to me with examples of how Cognos/DB2 joint tuning amounts to anything. It&#8217;s been more than a year, so I&#8217;m glad I didn&#8217;t hold my breath.</em></p>
<p>In a somewhat narrower vein, I wonder: <strong><a href="../../../../../2011/11/10/cep-streaming-catchup/">Will IBM be able to gain traction for InfoSphere Streams</a>? </strong>And if so, when and where will the traction be?</p>
<p><strong>Will HP screw up Vertica?</strong></p>
<p>Vertica has a very attractive product offering. It&#8217;s perhaps <a href="../../../../../2011/06/20/columnar-dbms-vendor-customer-metrics/">the most scalable analytic DBMS outside of Teradata</a>, running on the hardware of your reasonable choice.  It&#8217;s also the one I recommend most often to clients in the 1-50 terabyte range.</p>
<p>So far HP doesn&#8217;t seem to have done much to leadfoot Vertica. (About all I&#8217;ve heard from competitors is that Vertica seems to have faded somewhat in the financial services market, and there could be multiple explanations if that is indeed true.) But if HP Vertica does somehow manage to botch things, opportunities will open up for a range of columnar analytic DBMS competitors.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/21/big-vendor-execution-analytics/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>StreamBase LiveView &#8212; push-based real-time BI</title>
		<link>http://www.dbms2.com/2011/11/10/streambase-liveview-push-based-real-time-bi/</link>
		<comments>http://www.dbms2.com/2011/11/10/streambase-liveview-push-based-real-time-bi/#comments</comments>
		<pubDate>Fri, 11 Nov 2011 03:38:53 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[StreamBase]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5631</guid>
		<description><![CDATA[My clients at StreamBase are coming out with a new product line called LiveView, and I agreed they could launch it via this blog. Key points about StreamBase LiveView Version 1.0 include: LiveView is a business intelligence and alerting suite built on/in the rest of StreamBase&#8217;s technology, meant to operate on streaming data. LiveView is [...]]]></description>
			<content:encoded><![CDATA[<p>My clients at StreamBase are coming out with a new product line called LiveView, and I agreed they could launch it via this blog. Key points about StreamBase LiveView Version 1.0 include:</p>
<ul>
<li>LiveView is a business intelligence and alerting suite built on/in <a href="http://www.dbms2.com/2011/11/10/streambase-catchup/">the rest of </a><a href="http://www.dbms2.com/2011/11/10/streambase-catchup/">StreamBase&#8217;s technology</a>, meant to operate on streaming data.</li>
<li>LiveView is positioned by StreamBase as having a true push event-driven architecture rather than pull/poll.</li>
<li>StreamBase LiveView is designed to query in-memory data and then have the results change in real time as the data set changes.</li>
<li>The LiveView user interface is a rapidly changing work in progress.</li>
<li>LiveView has other Version 1 limitations as well.</li>
<li>LiveView is targeted squarely at StreamBase&#8217;s financial trading core market until some of the Version 1 limitations are lifted.</li>
</ul>
<p>The basic StreamBase LiveView pipeline goes something like:   <span id="more-5631"></span></p>
<ul>
<li>Data comes into the system via multiple streams.</li>
<li>Transformations upon data arrival can include but are not limited to:
<ul>
<li>Aggregations.</li>
<li>Joins to reference data.</li>
<li>Joins to other streams.</li>
</ul>
</li>
<li>The streams (transformed or perhaps otherwise) are output to tables &#8230;</li>
<li>&#8230; which are continuously updated as more data streams through.</li>
<li> The data in the resulting table can be consumed:
<ul>
<li>Via LiveView-provided BI capabilities.</li>
<li>Via an API.</li>
</ul>
</li>
</ul>
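<p>To make that pipeline concrete, here is a minimal Python sketch. It is not StreamBase code and does not use any StreamBase API; the names <code>LiveTable</code>, <code>REFERENCE</code>, and <code>run</code> are hypothetical, and the trade events are made-up sample data. It only illustrates the shape of the flow: events arrive, get aggregated and joined to reference data, and land in a table that is updated in place.</p>
<pre>
```python
from collections import defaultdict

# Hypothetical reference data for the "join to reference data" step.
REFERENCE = {"IBM": "Technology", "XOM": "Energy"}


class LiveTable:
    """A toy in-memory table keyed by symbol, updated as events arrive."""

    def __init__(self):
        self.rows = {}
        self._volume = defaultdict(int)

    def on_event(self, event):
        symbol = event["symbol"]
        # Aggregation: running traded quantity per symbol.
        self._volume[symbol] += event["qty"]
        # Join to reference data: attach the sector, if known.
        self.rows[symbol] = {
            "symbol": symbol,
            "sector": REFERENCE.get(symbol, "Unknown"),
            "last_price": event["price"],
            "total_qty": self._volume[symbol],
        }


def run(streams, table):
    """Feed events from several streams into the table, one at a time."""
    for stream in streams:
        for event in stream:
            table.on_event(event)


# Made-up sample events standing in for two input streams.
trades_a = [{"symbol": "IBM", "price": 180.0, "qty": 100}]
trades_b = [
    {"symbol": "XOM", "price": 78.5, "qty": 50},
    {"symbol": "IBM", "price": 181.0, "qty": 25},
]

table = LiveTable()
run([trades_a, trades_b], table)
print(table.rows["IBM"])
```
</pre>
<p>A real streaming engine would interleave events from the streams as they arrive rather than draining them one after another, but the table-maintenance logic would have the same general shape.</p>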
<p>When wearing my vendor consultant hat, I warmly encourage StreamBase to emphasize the lack of a batch step anywhere in this process. As an analyst, however, I&#8217;m more restrained about a claim like &#8220;We uniquely free you from batch.&#8221; I agree that avoiding batch jobs is a Very Nice Thing. But you also are spared most batch-cycle processing if you stream updates from your short-request database to an analytic DBMS, e.g. via some kind of near-real-time replication.</p>
<p>That said, the push-versus-pull continuous filtering part of the StreamBase LiveView story seems pretty real. I think having sub-second display updates is cool in all sorts of BI use cases, and seriously useful in some number of them. While I don&#8217;t have a clear opinion as to whether the StreamBase approach offers huge performance advantages for that kind of latency over &#8220;pull&#8221; alternatives, my guess is in the direction of &#8220;yes&#8221;.</p>
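<p>As a rough illustration of the push side of that distinction, the Python sketch below registers a filter as a continuous query and has the table notify the subscriber whenever a row enters, changes within, or leaves the result set. All names (<code>PushTable</code>, <code>subscribe</code>, <code>upsert</code>) are hypothetical. This is an assumption about the general technique, not a description of how LiveView is implemented.</p>
<pre>
```python
class PushTable:
    """Toy table that pushes changes to subscribers instead of being polled."""

    def __init__(self):
        self.rows = {}
        self.subscribers = []

    def subscribe(self, predicate, callback):
        """Register a continuous query: a row filter plus a callback."""
        self.subscribers.append((predicate, callback))

    def upsert(self, key, row):
        old = self.rows.get(key)
        self.rows[key] = row
        for predicate, callback in self.subscribers:
            was_in = old is not None and predicate(old)
            is_in = predicate(row)
            if is_in:
                # Row is in the result set now: either new or changed.
                callback("update" if was_in else "add", key, row)
            elif was_in:
                # Row used to match the filter and no longer does.
                callback("remove", key, row)


notifications = []
table = PushTable()
table.subscribe(
    lambda row: row["price"] > 100,
    lambda kind, key, row: notifications.append((kind, key)),
)

table.upsert("IBM", {"price": 180.0})  # enters the result set
table.upsert("XOM", {"price": 78.5})   # never matches, so nothing is pushed
table.upsert("IBM", {"price": 95.0})   # leaves the result set
print(notifications)
```
</pre>
<p>A pull-based client would instead re-run the filter over the whole table on a timer, which is where the latency and wasted work come from when little has changed.</p>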
<p>Version 1 limitations on StreamBase LiveView include:</p>
<ul>
<li>You consume data one table at a time, with no possibility of a join after the data has originally been put into a LiveView table.</li>
<li>While LiveView in principle offers rich alerting potential, you get at it via an API rather than much in the way of alerting-specific tools.</li>
<li>The first LiveView UI StreamBase put together looks a lot like 1980s stock quote machines. The next one it added looks a lot like Panopticon. Much cool-looking enhancement remains to be done.</li>
</ul>
<p><em>One competitive (non)-note: This all sounds something like what TIBCO has been pushing for years, but in fact I don&#8217;t have much knowledge of TIBCO&#8217;s efforts in the area. I had a meeting set up to learn about it some time ago, but it got canceled because TIBCO&#8217;s PR people:</em></p>
<ul>
<li><em>Didn&#8217;t want to let any kind of meeting happen without them, even though a serious CTO-type representative seemed happy to talk, but also &#8230;</em></li>
<li><em>&#8230; didn&#8217;t want to work at dinner time.</em></li>
</ul>
<p><em>I haven&#8217;t had substantive contact with TIBCO since.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/10/streambase-liveview-push-based-real-time-bi/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>StreamBase catchup</title>
		<link>http://www.dbms2.com/2011/11/10/streambase-catchup/</link>
		<comments>http://www.dbms2.com/2011/11/10/streambase-catchup/#comments</comments>
		<pubDate>Fri, 11 Nov 2011 03:31:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[StreamBase]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5630</guid>
		<description><![CDATA[While I was cryptic in my general CEP/streaming catchup, I&#8217;ll say a bit more regarding StreamBase in particular. At the highest level, non-technically: StreamBase once planned to conquer the world. However, StreamBase really only sold effectively in the financial trading and intelligence markets. StreamBase retrenched, focusing almost exclusively on the financial trading market. With StreamBase [...]]]></description>
			<content:encoded><![CDATA[<p>While I was cryptic in my general <a href="http://www.dbms2.com/2011/11/10/cep-streaming-catchup/">CEP/streaming catchup</a>, I&#8217;ll say a bit more regarding StreamBase in particular. At the highest level, non-technically:</p>
<ul>
<li>StreamBase once planned to conquer the world.</li>
<li>However, StreamBase really only sold effectively in the financial trading and intelligence markets.</li>
<li>StreamBase retrenched, focusing almost exclusively on the financial trading market.</li>
<li>With <a href="http://www.dbms2.com/2011/11/10/streambase-liveview-push-based-real-time-bi/">StreamBase LiveView</a>, StreamBase is expanding from embedded <a href="../../../../../2011/11/08/terminology-operational-analytics/">operational analytics</a> to do (also operational) business intelligence as well.</li>
<li>StreamBase is hopeful that, perhaps starting with Version 2 or so, LiveView will be successful outside the financial trading market.</li>
</ul>
<p><span id="more-5630"></span><em>Not coincidental to these shifts in focus, StreamBase was our client, then stopped being one for a while, and now is a client again.</em></p>
<p>StreamBase (the product set) consists primarily of three things (LiveView aside):</p>
<ul>
<li>A development environment, whose output is in &#8230;</li>
<li>&#8230; a visual programming language called EventFlow &#8230;</li>
<li>&#8230; which is compiled and executed by StreamBase&#8217;s execution layers.</li>
</ul>
<p>One important set of ancillary products is StreamBase&#8217;s connectors to various data sources &#8212; StreamBase offers about 125 of its own, a number that approaches 200 when <a href="../../../../../2010/02/16/quick-thoughts-on-the-streambase-component-exchange/">community contributions</a> are included.</p>
<p>StreamBase has a second programming language called StreamSQL, but that&#8217;s rarely used except for embedding in or connecting to third-party software. EventFlow and StreamSQL compile to nearly identical byte code. (The main difference seems to be that as a practical matter you&#8217;ll name things a bit differently in the two languages, focusing on verbs in EventFlow and nouns in StreamSQL.)</p>
<p>StreamBase says that in the financial trading market, great performance out of the box equates to better time-to-value, since you are spared time you&#8217;d otherwise have to spend tuning the system. Implicit in that is a claim &#8212; which competitors might dispute <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  &#8212; that StreamBase has great <a href="../../../../../2009/05/21/notes-on-cep-performance/">performance</a>. StreamBase fondly thinks that having a domain-specific language gives it a leg up in achieving great compiler optimization. (The same would presumably apply to StreamBase&#8217;s competitors, but only if they have optimizing compilers themselves.)</p>
<p>One point that&#8217;s a little unusual for me these days is that StreamBase favors big SMP (Symmetric MultiProcessing) boxes over blade-based scale-out. 16+ cores and 256 gigabytes of RAM are not uncommon. Clusters commonly include 4-8 machines, but rarely more; the largest StreamBase cluster evidently contains 36 machines.</p>
<p>And with that I&#8217;ll turn to StreamBase&#8217;s newest offering, <a href="http://www.dbms2.com/2011/11/10/streambase-liveview-push-based-real-time-bi/">LiveView</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/10/streambase-catchup/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Very brief CEP/streaming catchup</title>
		<link>http://www.dbms2.com/2011/11/10/cep-streaming-catchup/</link>
		<comments>http://www.dbms2.com/2011/11/10/cep-streaming-catchup/#comments</comments>
		<pubDate>Fri, 11 Nov 2011 03:29:37 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[StreamBase]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Truviso]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5632</guid>
		<description><![CDATA[When I agreed to launch the StreamBase LiveView product via DBMS 2, I planned to catch up on the whole CEP/streaming area first. Due to the power and internet outages last week, that didn&#8217;t entirely happen. So I&#8217;ll do a bit of that now, albeit more cryptically than I hoped and intended. The upshot of [...]]]></description>
			<content:encoded><![CDATA[<p>When I agreed to launch the StreamBase LiveView product via <em>DBMS 2,</em> I planned to catch up on the whole CEP/streaming area first. Due to the power and internet outages last week, that didn&#8217;t entirely happen. So I&#8217;ll do a bit of that now, albeit more cryptically than I hoped and intended.</p>
<ul>
<li>The upshot of my <a href="../../../../../2011/08/25/renaming-cep-or-not/">what to call CEP thread</a> in August was that &#8220;streaming&#8221; and &#8220;event processing&#8221; are not the same concept, but it so happens that they have the most traction where they intersect. That said, I both observe and endorse an apparent shift from &#8220;event&#8221; to &#8220;stream&#8221; as the core of the terminology, in <a href="../../../../../2008/03/19/what-to-call-cep/">a reversal of my opinion of several years ago</a>.</li>
<li>IBM continues to throw a lot of resources at its <a href="../../../../../2009/05/13/ibm-system-s-infosphere-streams-processing/">System S/ InfoSphere Streams</a> product, but I haven&#8217;t heard yet of much marketplace success. That said, I believe IBM is still pretty serious about Streams, as one would expect from an effort whose code name so cheekily references <a href="http://www.softwarememories.com/2008/10/02/a-bit-of-db2-history-per-ibm/">System R</a>. In particular, Streams shows up prominently on IBM&#8217;s top-level analytic architecture slide.</li>
<li>Sybase recently released its ESP (Event Stream Processor) 5.0, which it says is the full merger of the Aleri and Coral8 predecessors. You can still get Sybase ESP without buying into the full <a href="../../../../../2010/02/05/sybase-aleri-rap/">Sybase RAP</a> stack, and Sybase has no plans to change that.</li>
<li>Sybase has discontinued all <a href="../../../../../2009/03/25/aleri-update/">the business intelligence types of products Aleri and Coral8 were developing</a>. Rather, Sybase is OEMing Panopticon, which it reports has been well received. Other than the discontinuation of the BI efforts, there seem to be few Aleri or Coral8 features missing from the merged Sybase ESP product.</li>
<li>Truviso continues to be <a href="../../../../../2010/05/04/truviso-evidently-reinvents-itself/">out of the picture</a>.</li>
<li>I have more to say about <a href="http://www.dbms2.com/2011/11/10/streambase-catchup/">StreamBase</a> separately.</li>
<li>I have more to say about Sybase and IBM, which I&#8217;ll get to when I can.</li>
<li>I have nothing new on Progress Apama. I also know little about any of the open source efforts.</li>
</ul>
<p>Meanwhile, if you want to see technically nitty-gritty posts about the CEP/streaming area, you may want to look at <a href="../../../../../category/memory-centric-data-management/event-stream-processing/page/4/">my CEP/streaming coverage circa 2007-9</a>, based on conversations with (among others) <a href="../../../../../2007/06/18/mike-stonebraker-on-financial-stream-processing/">Mike Stonebraker</a>, <a href="../../../../../2007/08/03/a-deeper-dive-into-apama/">John Bates</a>, and <a href="../../../../../2007/08/10/the-essence-of-cep-according-to-coral8/">Mark Tsimelzon</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/10/cep-streaming-catchup/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The database architecture of salesforce.com, force.com, and database.com</title>
		<link>http://www.dbms2.com/2011/09/15/database-architecture-salesforce-com-force-com-and-database/</link>
		<comments>http://www.dbms2.com/2011/09/15/database-architecture-salesforce-com-force-com-and-database/#comments</comments>
		<pubDate>Thu, 15 Sep 2011 16:09:32 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[salesforce.com]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5237</guid>
		<description><![CDATA[salesforce.com, force.com, and database.com use exactly the same database infrastructure and architecture. That&#8217;s the good news. The bad news is that salesforce.com is somewhat obscure about technical details, for reasons such as: A long-ago marketing decision to not give infrastructure details, so as to convey a &#8220;Don&#8217;t worry; we&#8217;ll take care of everything&#8221; message. Even [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.dbms2.com/2011/09/15/salesforce-force-database-data-heroku/">salesforce.com, force.com, and database.com use exactly the same database infrastructure and architecture</a>. That&#8217;s the good news. The bad news is that salesforce.com is somewhat obscure about technical details, for reasons such as:</p>
<ul>
<li>A long-ago marketing decision to not give infrastructure details, so as to convey a &#8220;Don&#8217;t worry; we&#8217;ll take care of everything&#8221; message.</li>
<li>Even so, a long-ago and perhaps now-regretted marketing decision to disclose and even exaggerate salesforce.com&#8217;s reliance on Oracle, as part of an early-days attempt to prove salesforce was using enterprise-class technology.</li>
<li>A desire to hide the recipe for salesforce.com&#8217;s secret sauce.</li>
<li>Force of habit &#8212; I&#8217;m not sure salesforce even knows how to tell its technical story with any clarity.</li>
</ul>
<p>Actually, salesforce.com has moved some kinds of data out of Oracle that used to be stored there. Besides Oracle, salesforce uses at least a file system and a RAM-based data store about which I have no details. Even so, much of salesforce.com&#8217;s data is stored in Oracle &#8212; a single instance of Oracle, which it believes may be the largest instance of Oracle in the world.</p>
<p><span id="more-5237"></span>Salesforce did spell out some of its database story in <a href="http://www.salesforce.com/au/assets/pdf/Force.com_Multitenancy_WP_101508.pdf">a 2008 force.com white paper</a>, which is good stuff, but potentially misleading in one important way. The paper tells of a level of abstraction, whereby what the application sees as logical &#8220;columns&#8221; are stored in a very different schema than one might assume. However, it doesn&#8217;t spell out a second level of abstraction, whereby that logical schema also isn&#8217;t how the database is actually laid out.</p>
<p><em>Another flaw in the paper is that it spins &#8220;We had to do this, to support multitenancy, so we did.&#8221; issues as &#8220;Because we&#8217;re multitenant, we can do this, while single-tenant systems can&#8217;t.&#8221; One example is the query optimization step around &#8220;user visibility&#8221; in Figure 11. Welcome to marketing.</em></p>
<p>At the first level of abstraction, data seems to be kept mainly in a single wide table, with hundreds of columns. What&#8217;s more, many of those are &#8220;flex columns&#8221;; a flex column can hold data of many different kinds and even datatypes. Notwithstanding the second level of abstraction, I imagine the idea of stuffing different kinds of things into the same column has something to do with the fact that <a href="../../../../../2011/03/13/so-how-many-columns-can-a-single-table-have-anyway/">Oracle&#8217;s physical limit on columns</a> falls far short of the number of logical columns salesforce wants to use.</p>
<p>If we imagine that the different kinds of data in a flex column were each in their own column instead, the whole thing might sound like BigTable/Cassandra/HBase-style column-group NoSQL. Thus, much as <a href="../../../../../2010/08/22/workday-technology-stack/">Workday uses MySQL to simulate a key-value store</a>, salesforce.com can be said to use Oracle to simulate a different kind of NoSQL. In both cases, what&#8217;s going on seems to be a kind of object/relational mapping, but with the relational aspect strongly deemphasized. Or, if you take a more relational view, we could say that salesforce.com&#8217;s tables are a lot wider than any one user organization&#8217;s, because each user sees only its own custom columns (plus the standard ones common to all users).</p>
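<p>To make the flex-column idea concrete, here is a minimal sketch. It is an illustration only, not salesforce.com&#8217;s actual schema: the tenant names, the field names, the <code>flex1</code>/<code>flex2</code> slot names, and the choice to store every slot as text are all assumptions made for the example.</p>

```python
# Hypothetical sketch: one wide shared table with generic "flex" slots,
# plus per-tenant metadata mapping logical field names onto those slots.
FIELD_MAP = {
    # (tenant, logical field) -> (flex slot, datatype); all names are invented
    ("acme", "renewal_date"): ("flex1", "date"),
    ("acme", "seats"): ("flex2", "int"),
    ("globex", "region"): ("flex1", "str"),
}

# How to cast a stored string back to its logical datatype.
# Dates are simply kept as ISO strings in this sketch.
CASTS = {"int": int, "str": str, "date": str}


def to_physical(tenant, record):
    """Turn a tenant's logical record into a row of the shared wide table."""
    row = {"tenant": tenant}
    for field, value in record.items():
        slot, _datatype = FIELD_MAP[(tenant, field)]
        row[slot] = str(value)  # assumption: every flex slot stores text
    return row


def to_logical(row):
    """Rebuild the tenant's own view, casting each slot back to its datatype."""
    tenant = row["tenant"]
    record = {}
    for (owner, field), (slot, datatype) in FIELD_MAP.items():
        if owner == tenant and slot in row:
            record[field] = CASTS[datatype](row[slot])
    return record


# The same physical slot (flex1) holds a date for one tenant and a region for another.
acme_row = to_physical("acme", {"renewal_date": "2012-01-31", "seats": 40})
globex_row = to_physical("globex", {"region": "EMEA"})
```

<p>Each tenant sees only its own logical columns, while the shared table needs just as many flex slots as the widest single tenant uses.</p>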
<p>The second layer of abstraction has a lot to do with multitenancy. If you want to stick data for many different user organizations into the same huge table, then you have to label it in some way to show who is permitted to see or update each part. Logically, this leads to a join, between one table carrying data plus a simple key showing which users/roles are entitled to see it, and a second table showing who actually is that kind of user/has that kind of role. But that join makes a lot of sense to store in a denormalized way, all the more because data is partitioned across the computer cluster in line with which user organization it actually belongs to.</p>
<p><em>Multitenant security isn&#8217;t the only reason for this denormalization, but it appears to be the biggest one.</em></p>
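<p>The visibility join and its denormalized form can be sketched roughly as follows. This is a toy model under stated assumptions, not salesforce.com&#8217;s design: the row layout, the <code>share_key</code> field, and the user names are all hypothetical.</p>

```python
# Hypothetical data: each row carries a simple key saying who may see it ...
ROWS = [
    {"id": 1, "org": "acme", "share_key": "sales", "payload": "deal A"},
    {"id": 2, "org": "acme", "share_key": "finance", "payload": "invoice B"},
]
# ... and a second table says which users actually hold each key.
MEMBERS = {"sales": {"ann", "bob"}, "finance": {"cy"}}


def visible_normalized(user):
    """Logical form: join each row to the membership table at query time."""
    return [r["id"] for r in ROWS if user in MEMBERS[r["share_key"]]]


def denormalize(rows, members):
    """Denormalized form: copy the resolved user set onto every row."""
    return [dict(r, visible_to=frozenset(members[r["share_key"]])) for r in rows]


WIDE = denormalize(ROWS, MEMBERS)


def visible_denormalized(user):
    """Same answer as the join, but read straight off the row."""
    return [r["id"] for r in WIDE if user in r["visible_to"]]
```

<p>The trade-off in this sketch is the usual one for denormalization: reads skip the join, but a change in membership means refreshing the copies stored with the rows.</p>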
<p>The whole thing is doing 550 million or so transactions per day. salesforce.com thinks that fact should be regarded as evidence that it works. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/15/database-architecture-salesforce-com-force-com-and-database/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Renaming CEP &#8230; or not</title>
		<link>http://www.dbms2.com/2011/08/25/renaming-cep-or-not/</link>
		<comments>http://www.dbms2.com/2011/08/25/renaming-cep-or-not/#comments</comments>
		<pubDate>Fri, 26 Aug 2011 02:58:22 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[StreamBase]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5127</guid>
		<description><![CDATA[One of the less popular category names I deal with is &#8220;Complex Event Processing (CEP)&#8221;. The word &#8220;complex&#8221; looks weird, and many are unsure about the &#8220;event processing&#8221; part as well. CEP does have one virtue as a name, however &#8212; it&#8217;s concise. The other main alternative is to base the name on &#8220;stream processing&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p>One of the less popular category names I deal with is &#8220;Complex Event Processing (CEP)&#8221;. The word &#8220;complex&#8221; looks weird, and many are unsure about the &#8220;event processing&#8221; part as well. CEP does have one virtue as a name, however &#8212; it&#8217;s concise.</p>
<p>The other main alternative is to base the name on &#8220;stream processing&#8221; instead.* The CEP-or-whatever industry is split between these choices, with <a href="http://www.streambase.com/about-home.htm">StreamBase</a> currently favoring &#8220;CEP&#8221; (despite its company name), <a href="../../../../../2009/05/13/ibm-system-s-infosphere-streams-processing/">IBM emphatically favoring &#8220;stream&#8221;</a>, and Sybase seemingly trying to have things both ways.</p>
<p><em>*And then, of course, there is &#8220;event stream processing&#8221;, regarding which please see below.</em></p>
<p><span id="more-5127"></span>I&#8217;ve been juggling this terminological divide myself, referring to <a href="../../../../../2007/08/12/applications-for-not-so-low-latency-cep/">complex event/stream processing</a> as long as four years ago. But enough is enough. I&#8217;d like to write more about the category without repeatedly apologizing for its name. And so, always bearing in mind <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">Monash&#8217;s Third Law of Commercial Semantics</a>, here&#8217;s where I&#8217;m coming down.</p>
<p>The more I think about it, the less I like the term &#8220;event processing&#8221;. Here&#8217;s why. Events happen; data is produced; CEP systems most commonly try to identify and categorize the events based on the data. The CEP systems may then do significant further processing, but more often they just pass the information on to another system (most commonly either persistent DBMS or &#8220;real-time&#8221; business intelligence). How much of that is really &#8220;event processing&#8221;? Relatively little, I&#8217;d say. And referring specifically to &#8220;complex&#8221; events doesn&#8217;t address my complaints at all.</p>
<p>So I&#8217;d like to go with some version of &#8220;stream&#8221;. But &#8220;<a href="http://en.wikipedia.org/wiki/Stream_processing">stream processing</a>&#8221; has other computer-related uses, while &#8220;Stream management&#8221; commonly describes care and planning for small waterways. So &#8220;stream&#8221; might do best with a modifier, such as &#8220;event&#8221; or &#8220;data&#8221;. Of the two, I prefer &#8220;data stream&#8221; (or &#8220;datastream&#8221;) to &#8220;event stream&#8221;; the events aren&#8217;t really streaming, but the data is.</p>
<p>So should it be &#8220;data stream processing&#8221; or &#8220;data stream management&#8221;? Well, the only one of numerous Wikipedia definitions I&#8217;ve actually liked while researching this post is the one for &#8220;<a href="http://en.wikipedia.org/wiki/Data_Stream_Management_System">Data Stream Management System</a>&#8221;:</p>
<blockquote><p>A <strong>Data Stream Management System</strong> (<strong>DSMS</strong>) is a set of computer programs that controls the maintenance and querying of data in data streams. The use of a DSMS to manage a data stream is roughly analogous to the use of a Database Management System (DBMS) to manage a conventional database.</p>
<p>A key feature of a DSMS is the ability to execute a <em>continuous query</em> against a data stream. A conventional database query executes once and returns a set of results for a given point in time. In contrast, a continuous query continues to execute over time, as new data enters the stream. The results of the continuous query are updated as new data appears.</p></blockquote>
<p>I think the data stream/database management analogy is spot on. Your queries work a little differently, but otherwise you&#8217;re doing pretty much the same things. Indeed, you&#8217;re probably even going to persistently store some of the data, and ideally that DBMS capability would be tightly integrated into your CEP system. (In practice they&#8217;re apt to be more loosely coupled; for most purposes that works well enough.) Query execution, data ingestion, performance monitoring/tuning, workload prioritization &#8212; it&#8217;s very DBMS-like stuff. And by the way, &#8220;data stream management system&#8221; is the term that was used by the researchers &#8212; Mike Stonebraker, Stan Zdonik, Dan Abadi, et al. &#8212; who wrote a paper describing <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.67.8671&amp;rep=rep1&amp;type=pdf">the project on which StreamBase was based</a> &#8230; although some might question whether that particular observation is a strong signal of accuracy. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
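<p>The difference between a conventional query and a continuous query, as described in the definition quoted above, can be shown in a few lines. This is a minimal sketch rather than how any particular DSMS works; the function names, the price data, and the three-item sliding window are assumptions chosen for illustration.</p>

```python
from collections import deque


def one_shot_average(stored_prices):
    """Conventional query: runs once over whatever data is present right now."""
    return sum(stored_prices) / len(stored_prices)


def continuous_average(stream, window=3):
    """Continuous query: emits an updated result as each new item arrives."""
    recent = deque(maxlen=window)  # sliding window over the latest items
    for price in stream:
        recent.append(price)
        yield sum(recent) / len(recent)


ticks = [10.0, 12.0, 14.0, 16.0]

# One answer for one point in time.
snapshot = one_shot_average(ticks)

# One answer per arriving item; here the "stream" is just an iterator over a list.
running = list(continuous_average(iter(ticks)))
```

<p>With this input, <code>snapshot</code> is a single number (13.0), while <code>running</code> is a sequence of results (10.0, 11.0, 12.0, 14.0) that keeps growing for as long as data keeps arriving.</p>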
<p>This reasoning suggests <strong>Data Stream Management System</strong> is what it should be. The usual kinds of abbreviation, such as datastream (product), datastream manager, and DSMS, would no doubt follow. So should it be &#8220;Data Stream&#8221;, &#8220;Datastream&#8221;, or &#8220;Data-stream&#8221;? At that level of detail, I don&#8217;t yet have an opinion.</p>
<p>The only thing is &#8212; that&#8217;s all pretty wordy compared to <strong>CEP. </strong>So after all this, I&#8217;m still not sure which term(s) I prefer.</p>
<p>What are your thoughts?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/08/25/renaming-cep-or-not/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Couchbase technical update</title>
		<link>http://www.dbms2.com/2011/08/13/couchbase-technical-update/</link>
		<comments>http://www.dbms2.com/2011/08/13/couchbase-technical-update/#comments</comments>
		<pubDate>Sun, 14 Aug 2011 04:08:03 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5081</guid>
		<description><![CDATA[My Couchbase business update with Bob Wiederhold was very interesting, but it didn&#8217;t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven&#8217;t had their designs locked down yet anyway. But here&#8217;s at least a partial explanation [...]]]></description>
			<content:encoded><![CDATA[<p>My <a href="http://www.dbms2.com/2011/08/13/couchbase-business-update/">Couchbase business update</a> with Bob Wiederhold was very interesting, but it didn&#8217;t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven&#8217;t had their designs locked down yet anyway. But here&#8217;s at least a partial explanation of what&#8217;s up.</p>
<p>memcached, which is extremely popular among large internet companies, is a way to cache data in RAM across a cluster of servers and have it all look logically like a single memory pool. The Membase product &#8212; which is what Couchbase has been selling this year &#8212; adds persistence to memcached, an obvious improvement on requiring application developers to write both to memcached and to <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">non-transparently-sharded MySQL</a>. The main technical points in adding persistence seem to have been:</p>
<ul>
<li>A <strong>persistent backing store</strong> (duh), namely SQLite.</li>
<li>A <strong>change to the hashing algorithm,</strong> to avoid losing data when the cluster configuration is changed.</li>
</ul>
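<p>For contrast, here is roughly what the older pattern asks of application developers: writing to both the cache and a non-transparently-sharded store, with the shard routing living in application code. This is a hedged sketch with in-memory dictionaries standing in for memcached and for the MySQL instances; the shard count, the CRC32 routing, and the function names are assumptions, not anyone&#8217;s production code.</p>

```python
import zlib

NUM_SHARDS = 4                                 # assumed shard count
cache = {}                                     # stand-in for memcached
shards = [dict() for _ in range(NUM_SHARDS)]   # stand-ins for MySQL instances


def shard_for(key):
    """Non-transparent sharding: the application itself picks the shard."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def write(key, value):
    """The application has to write twice: once to the shard, once to the cache."""
    shards[shard_for(key)][key] = value
    cache[key] = value


def read(key):
    """Try the cache first; on a miss, go to the right shard and repopulate."""
    if key in cache:
        return cache[key]
    value = shards[shard_for(key)].get(key)
    if value is not None:
        cache[key] = value
    return value
```

<p>A product that adds persistence behind the memcached interface lets the application drop both <code>shard_for</code> and the double write.</p>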
<p>Couchbase is essentially Membase improved by integrating CouchDB into it, with the main changes being:</p>
<ul>
<li><strong>Changing the backing store to CouchDB</strong> (duh). This will be in the first Couchbase release.</li>
<li><strong>Adding cross data center replication on CouchDB&#8217;s consistency model.</strong> This will not, I believe, be in the first Couchbase release.</li>
<li><strong>Offering CouchDB&#8217;s programming and query interfaces as an option.</strong> So far as I can tell, this will be implemented straightforwardly in the first Couchbase release, with elegance planned for later down the road.</li>
</ul>
<p>Let&#8217;s drill down a bit into <strong>Membase/Couchbase clustering and consistency. </strong><span id="more-5081"></span></p>
<ul>
<li>When data is written to RAM in memcached, it immediately gets copied to another server. The same is of course true in Membase/Couchbase. The terminology on all this is confusing, but I think:
<ul>
<li>The portion of data that is stored as a primary copy on any given server is called a &#8220;shard&#8221;.</li>
<li>That would seem to make sense, as that data could correspond to what goes &#8212; <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">non-transparently</a> &#8212; into an instance of MySQL in a classical memcached/MySQL set-up.</li>
</ul>
</li>
<li>Updates are of course also banged to disk ASAP &#8212; but at times of heavy load, that can take a while. A few seconds to a couple of minutes is normal operation; if it takes an hour, you really should buy more hardware. (Or solid-state storage.)</li>
<li>Similarly, the replication of data to a second machine&#8217;s RAM may not happen at times of heavy load &#8212; and that&#8217;s another sign you don&#8217;t have enough machines.</li>
<li>Each Membase/Couchbase &#8220;shard&#8221; has lots of logical sub-shards.* (1024 for now, at least as default, although Dustin finds that number excessive and is looking to lower it.) So if you add a node, some of the sub-shards get sent over to the new node. Unlike the case for straight memcached, no data is lost from cache (and of course none from the persistent store either). Blocking of operations from such a move only happens in narrow time windows, and then only in edge cases.</li>
</ul>
<p><em>*Edit: They&#8217;re called <a href="http://dustin.github.com/2010/06/29/memcached-vbuckets.html">vbuckets</a>.</em></p>
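<p>The sub-shard idea can be sketched as two lookups: a key hashes to a fixed vbucket, and a separate map says which node currently owns that vbucket. What follows is a simplified illustration, not Membase&#8217;s actual code; the CRC32 hash, the round-robin starting layout, and the rebalancing rule in <code>add_node</code> are all assumptions. Only the count of 1024 comes from the discussion above.</p>

```python
import zlib

NUM_VBUCKETS = 1024  # the default mentioned above


def vbucket_for(key):
    """A key always hashes to the same vbucket, whatever the cluster size."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def initial_map(nodes):
    """Assumed starting layout: spread the vbuckets round-robin over the nodes."""
    return [nodes[v % len(nodes)] for v in range(NUM_VBUCKETS)]


def add_node(vbucket_map, new_node):
    """Reassign a roughly even share of vbuckets to the new node; the rest stay put."""
    old_count = len(set(vbucket_map))
    new_map = list(vbucket_map)
    for v in range(old_count, NUM_VBUCKETS, old_count + 1):
        new_map[v] = new_node
    return new_map


def node_for(key, vbucket_map):
    """Two-step lookup: key to vbucket, then vbucket to its current owner."""
    return vbucket_map[vbucket_for(key)]


before = initial_map(["node-a", "node-b"])
after = add_node(before, "node-c")
moved = sum(1 for old, new in zip(before, after) if old != new)
```

<p>In this sketch about a third of the vbuckets (<code>moved</code>) change owner when a third node joins, and every key in the other vbuckets stays exactly where it was. That is the property that keeps cached data from being lost on a configuration change.</p>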
<p>So if we consider Membase technology alone, Couchbase is CA in the CAP Theorem. CouchDB, however, is gloriously AP in the CAP Theorem, in that it was written to assume an occasionally connected topology.* Based on that, Couchbase will allow AP operation between data centers (i.e. &#8220;stay synchronized if you can, to within the limitations of physics and so on, but don&#8217;t beat yourself up on the rare occasions that you can&#8217;t&#8221;). I don&#8217;t know that that capability will quite be in the first release of Couchbase, but it&#8217;s coming soon.</p>
<p><em>*CouchDB also has other features friendly to occasionally-connected use cases, such as a lot of flexibility as to which parts of the database are or aren&#8217;t synced when you do reconnect. These are at the heart of the Couchbase Mobile offering.</em></p>
<p>memcached and Membase have a very simple key-value interface. CouchDB adds secondary indexes and so on. I think in the first release of Couchbase this is pretty much like having two different APIs for the same product; more elegant integration is planned down the road, and more language support as well.</p>
<p>The highest-performing way to use Couchbase will probably always be to just pretend it is Membase, which is to say memcached+. Dustin told me of Membase users who demanded 10-40 millisecond response times, and that not even for single queries but rather for sequences of several queries in succession. He further told me of customers asking for 1-200 microsecond response, and insisting on no worse than 1 millisecond. Frankly, the first requirement could be met by lots of technologies I can think of, at least if you don&#8217;t rely on disk; the second is thoroughly impossible if you rely on disk, and pretty demanding no matter what kind of hardware and storage you have.</p>
<p>Couchbase performance against disk is a work in progress. CouchDB started out 8X slower than SQLite as a backing store, apples to apples, but Couchbase is fixing that before they roll the product out. (After all, they wouldn&#8217;t want to slow the product down in the course of an upgrade.) Beyond that, when you do exploit the indexing capability of CouchDB, performance of course slows down. Work is underway to lower the performance hit; I imagine much improvement can indeed be made, given how few resources CouchDB has been able to devote to date to <a href="http://www.dbms2.com/2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/08/13/couchbase-technical-update/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>McObject and eXtremeDB</title>
		<link>http://www.dbms2.com/2011/07/22/mcobject-extremedb/</link>
		<comments>http://www.dbms2.com/2011/07/22/mcobject-extremedb/#comments</comments>
		<pubDate>Fri, 22 Jul 2011 12:32:16 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[McObject]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Objectivity and Infinite Graph]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[solidDB]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5004</guid>
		<description><![CDATA[I talked with McObject yesterday. McObject has two product lines, both of which are something like in-memory DBMS &#8212; eXtremeDB, which is the main one, and Perst. McObject has been around since at least 2003, probably has no venture capital, and probably has a very low double-digit number of employees.* *I could be wrong in [...]]]></description>
			<content:encoded><![CDATA[<p>I talked with McObject yesterday. McObject has two product lines, both of which are something like in-memory DBMS &#8212; eXtremeDB, which is the main one, and <a href="../../../../../2008/06/08/perst/">Perst</a>. McObject has been around since at least 2003, probably has no venture capital, and probably has a very low double-digit number of employees.*</p>
<p><em>*I could be wrong in those guesses; as small companies go, McObject is unusually prone to secrecy games.</em></p>
<p>As best I understand:</p>
<ul>
<li>eXtremeDB is something like an in-memory <a href="../../../../../2011/05/21/object-oriented-database-management-systems-oodbms/">object-oriented DBMS</a>, designed to be embeddable.</li>
<li>However, much as with Objectivity and other old-school OODBMS, eXtremeDB winds up being more of a toolkit with which to build DBMS than a full DBMS.</li>
<li>eXtremeDB has a few indexing schemes. The main one is good old B-trees. One customer wanted Patricia tries, so they&#8217;re in there. (Perhaps not coincidentally, solidDB relies on Patricia tries.) At least one wanted R-trees, so they&#8217;re in there too.</li>
<li>eXtremeDB has long had the option of persistent logs.</li>
<li>eXtremeDB newly has a hybrid memory-centric option, in which you can have more data in the database than fits into RAM.</li>
<li>eXtremeDB newly has multi-master two-phase-commit clustering.</li>
</ul>
<p>My guess three years ago that <a href="../../../../../2008/05/13/mcobject-extremedb-a-soliddb-alternative/">eXtremeDB might emerge as an alternative to solidDB</a> seems to have been borne out. McObject CEO Steve Graves says that the core of McObject&#8217;s business is OEMs, in sectors such as telecom equipment and defense/aerospace. That&#8217;s exactly solidDB&#8217;s traditional market, except that <a href="../../../../../2007/12/21/ibm-acquires-soliddb/">solidDB got acquired by IBM, which then deemphasized it</a>.</p>
<p>I&#8217;ve said before that if I were starting a SaaS effort &#8212; and it wasn&#8217;t just focused on analytics &#8212; <a href="../../../../../2011/05/21/object-oriented-database-management-systems-oodbms/">I&#8217;d look at using a memory-centric OODBMS</a>. Perhaps eXtremeDB is worth looking at in such scenarios.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/22/mcobject-extremedb/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Soundbites: the Facebook/MySQL/NoSQL/VoltDB/Stonebraker flap, continued</title>
		<link>http://www.dbms2.com/2011/07/15/facebook-mysql-nosql-voltdb-stonebraker/</link>
		<comments>http://www.dbms2.com/2011/07/15/facebook-mysql-nosql-voltdb-stonebraker/#comments</comments>
		<pubDate>Fri, 15 Jul 2011 08:27:18 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Clustrix]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[MongoDB and 10gen]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[ScaleBase]]></category>
		<category><![CDATA[ScaleDB]]></category>
		<category><![CDATA[Schooner Information Technology]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Tokutek]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4977</guid>
		<description><![CDATA[As a follow-up to the latest Stonebraker kerfuffle, Derrick Harris asked me a bunch of smart followup questions. My responses and afterthoughts include: Facebook et al. are in effect Software as a Service (SaaS) vendors, not enterprise technology users. In particular: They have the technical chops to rewrite their code as  needed. Unlike packaged software [...]]]></description>
			<content:encoded><![CDATA[<p>As a follow-up to the latest <a href="http://www.dbms2.com/2011/07/14/an-odd-claim-attributed-to-mike-stonebraker/">Stonebraker kerfuffle</a>, Derrick Harris asked me a bunch of smart follow-up questions. My responses and afterthoughts include:</p>
<ul>
<li>Facebook et al. are in effect Software as a Service (SaaS) vendors, not enterprise technology users. In particular:
<ul>
<li>They have the technical chops to rewrite their code as needed.</li>
<li>Unlike packaged software vendors, they&#8217;re not answerable to anybody for keeping legacy code alive after a rewrite. That makes migration a lot easier.</li>
<li>If they want to write different parts of their system on different technical underpinnings, nobody can stop them. For example &#8230;</li>
<li>&#8230;  <a href="http://www.dbms2.com/2008/07/21/project-cassandra-facebook-open-sourced-quasi-dbms/">Facebook innovated Cassandra</a>, and is now heavily committed to HBase.</li>
</ul>
</li>
<li>It makes little sense to talk of Facebook&#8217;s use of &#8220;MySQL.&#8221; Better to talk of Facebook&#8217;s use of &#8220;MySQL + memcached + non-transparent sharding.&#8221; That said:
<ul>
<li>It&#8217;s hard to see why somebody today would use MySQL + memcached + non-transparent sharding for a new project. At least one of <a href="http://www.dbms2.com/2011/02/08/couchbase-membase-couchone-couchdb/">Couchbase</a> or <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">transparently-sharded</a> MySQL is very likely a superior alternative. Other alternatives might be better yet.</li>
<li>As noted above in the example of Facebook, the many major web businesses that are using MySQL + memcached + non-transparent sharding for existing projects can be presumed able to migrate away from that stack as the need arises.</li>
</ul>
</li>
</ul>
<p>Continuing with that discussion of DBMS alternatives:</p>
<ul>
<li>If you just want to write to the memcached API anyway, why not go with Couchbase?</li>
<li>If you want to go relational, why not go with MySQL? There are many alternatives for scaling or accelerating MySQL &#8212; dbShards, Schooner, Akiban, Tokutek, ScaleBase, ScaleDB, Clustrix, and Xeround come to mind quickly, so there&#8217;s a great chance that one or more will fit your use case. (And if you don&#8217;t get the choice of MySQL flavor right the first time, porting to another one shouldn&#8217;t be all THAT awful.)</li>
<li>If you really, really want to go in-memory, and don&#8217;t mind writing Java stored procedures, and don&#8217;t need to do the kinds of joins it isn&#8217;t good at, but do need to do the kinds of joins it is, VoltDB could indeed be a good alternative.</li>
</ul>
<p>And while we&#8217;re at it &#8212; going <strong>schema-free</strong> often makes a whole lot of sense. I need to write much more about the point, but for now let&#8217;s just say that I look favorably on the Big Four schema-free/NoSQL options of MongoDB, Couchbase, HBase, and Cassandra.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/15/facebook-mysql-nosql-voltdb-stonebraker/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
	</channel>
</rss>

