<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Market share and customer counts</title>
	<atom:link href="http://www.dbms2.com/category/market-share/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Tue, 07 Feb 2012 06:49:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Sumo Logic and UIs for text-oriented data</title>
		<link>http://www.dbms2.com/2012/02/06/sumo-logic-and-uis-for-text-oriented-data/</link>
		<comments>http://www.dbms2.com/2012/02/06/sumo-logic-and-uis-for-text-oriented-data/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 13:27:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Text]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5897</guid>
		<description><![CDATA[I talked with the Sumo Logic folks for an hour Thursday. Highlights included: Sumo Logic does SaaS (Software as a Service) log management. Sumo Logic is text indexing/Lucene-based. Thus, it is reasonable to think of Sumo Logic as &#8220;Splunk-like&#8221;. (However, Sumo Logic seems to have a stricter security/trouble-shooting orientation than Splunk, which is trying to [...]]]></description>
			<content:encoded><![CDATA[<p>I talked with the Sumo Logic folks for an hour Thursday. Highlights included:</p>
<ul>
<li>Sumo Logic does SaaS (Software as a Service) log management.</li>
<li>Sumo Logic is text indexing/Lucene-based. Thus, it is reasonable to think of Sumo Logic as &#8220;Splunk-like&#8221;. (However, Sumo Logic seems to have a stricter security/trouble-shooting orientation than Splunk, which is trying to <a href="../../../../../2012/01/10/splunk-update/">branch out</a>.)</li>
<li>Sumo Logic has hacked Lucene for faster indexing, and says 10-30 second latencies are typical.</li>
<li>Sumo Logic&#8217;s main differentiation is <strong>automated classification of events. </strong></li>
<li>There&#8217;s some kind of streaming engine in the mix, to update counters and drive alerts.</li>
<li>Sumo Logic has around 30 &#8220;customers,&#8221; free (mainly) or paying (around 5) as the case may be.</li>
<li>A truly typical Sumo Logic customer has single to low double digits of gigabytes of log data per day. However, Sumo Logic seems highly confident in its ability to handle a terabyte per customer per day, give or take a factor of 2.</li>
<li>When I asked about the implications of shipping that much data to a remote data center, Sumo Logic observed that log data compresses really well.</li>
<li>Sumo Logic recently raised a bunch of venture capital.</li>
<li>Sumo Logic&#8217;s founders are out of ArcSight, a log management company HP paid a bunch of money for.</li>
<li>Sumo Logic coined a marketing term &#8220;LogReduce&#8221;, but it has nothing to do with &#8220;MapReduce&#8221;. Sumo Logic seems to find this amusing.</li>
</ul>
<p>What interests me about Sumo Logic is that automated classification story. I thought I heard Sumo Logic say:<span id="more-5897"></span></p>
<ul>
<li>It&#8217;s largely unsupervised machine learning.</li>
<li>It&#8217;s specific to a particular user/data set.</li>
<li>It can be up and running and classifying things effectively almost instantly (i.e., on seconds&#8217; or minutes&#8217; worth of data).</li>
<li>It&#8217;s informed by what different users tag as false positives. (Or maybe that is planned for future versions.)</li>
</ul>
<p><em>I have a little trouble seeing how all those points fit exactly together, so perhaps I got some details wrong.</em></p>
<p>The payoff is that <strong>machine learning directly informs the Sumo Logic user interface</strong>. In particular, large numbers of events are bundled into a small number of categories, hopefully making it much easier for network operations types to scan the UI and pick out what&#8217;s important.</p>
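<p>To make the bundling idea concrete, here is a minimal sketch that masks the variable parts of each log line (IP addresses, hex values, numbers) and groups lines by the resulting template. It is a generic, rule-based stand-in and not Sumo Logic&#8217;s actual classification method, which was described as largely unsupervised machine learning. The masking rules, function names and sample log lines are all made up for illustration.</p>

```python
import re
from collections import Counter

# Hypothetical masking rules: replace variable tokens with placeholders.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "[IP]"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "[HEX]"),
    (re.compile(r"\b\d+\b"), "[NUM]"),
]

def template_of(line):
    """Reduce a log line to a template by masking variable tokens."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    return line

def bundle(lines):
    """Count how many log lines fall under each template."""
    return Counter(template_of(line) for line in lines)

# Invented sample data, for illustration only.
sample = [
    "Failed login for user 1042 from 10.0.0.7",
    "Failed login for user 77 from 192.168.1.20",
    "Disk usage at 91 percent on volume 3",
    "Failed login for user 9 from 10.0.0.8",
]
for template, count in bundle(sample).most_common():
    print(count, template)
```

<p>Four raw events collapse into two categories here, which is the kind of reduction that would make a UI easier to scan.</p>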
<p>In general, the idea of machine-learning informing analytic UIs via some sort of classification is common in text-oriented technologies, notably in:</p>
<ul>
<li>Good ol&#8217; text search.</li>
<li>Text mining vendors&#8217; approaches to clustering hits on words or phrases that say substantially the same thing.</li>
</ul>
<p>But otherwise it seems kind of rare, if we stipulate that ad-serving/general internet personalization isn&#8217;t really an analytic UI &#8212; but I&#8217;d love to hear of any interesting examples I&#8217;ve overlooked.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/06/sumo-logic-and-uis-for-text-oriented-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Couchbase update</title>
		<link>http://www.dbms2.com/2012/02/01/couchbase-update/</link>
		<comments>http://www.dbms2.com/2012/02/01/couchbase-update/#comments</comments>
		<pubDate>Thu, 02 Feb 2012 04:00:24 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Basho and Riak]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[MongoDB and 10gen]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[Zynga]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5877</guid>
		<description><![CDATA[I checked in with James Phillips for a Couchbase update, and I understand better what&#8217;s going on. In particular: Give or take minor tweaks, what I wrote in my August 2011 Couchbase updates still applies. Couchbase now and for the foreseeable future has one product line, called Couchbase. Couchbase 2.0, the first version of Couchbase [...]]]></description>
			<content:encoded><![CDATA[<p>I checked in with James Phillips for a Couchbase update, and I understand better what&#8217;s going on. In particular:</p>
<ul>
<li>Give or take minor tweaks, what I wrote in my <a href="../../../../../2011/08/13/couchbase-business-update/">August 2011 Couchbase updates</a> still applies.</li>
<li>Couchbase now and for the foreseeable future has one product line, called Couchbase.</li>
<li>Couchbase 2.0, the first version of Couchbase (the product) to use CouchDB for persistence, has slipped &#8230;</li>
<li>&#8230; because more parts of CouchDB had to be rewritten for performance than Couchbase (the company) had hoped.</li>
<li>Think mid-year or so for the release of Couchbase 2.0, hopefully sooner.</li>
<li>In connection with the need to rewrite parts of CouchDB, Couchbase has:
<ul>
<li><a href="../../../../../2012/01/18/notes-from-the-couch-blogs/">Gotten out of the single-server CouchDB business</a>.</li>
<li>Donated its proprietary single-server CouchDB intellectual property to the Apache Foundation.</li>
</ul>
</li>
<li>The 150ish new customers in 2011 Couchbase brags about are real, subscription customers.</li>
<li>Couchbase has 60ish people, headed to &gt;100 over the next few months.</li>
</ul>
<p><span id="more-5877"></span><em>If you previously heard the brand names Couchbase Single or Couchbase Mobile, pay no further attention to them. Couchbase Single was CouchDB; Couchbase Mobile is part of Couchbase&#8217;s feature set.</em></p>
<p>The current product is Couchbase 1.8, which is a whole lot like what previously was called Membase. New features in Couchbase 1.8 (versus prior versions of Membase) were concentrated in client libraries/SDK (Software Development Kit). Not coincidentally, Couchbase has hired developer evangelists who are in charge of making Couchbase play nicely with various specific languages (e.g. C/C++).</p>
<p>Drilling down further into the CouchDB part of the story:</p>
<ul>
<li>Couchbase 2.0 will replace Couchbase 1.8/Membase&#8217;s SQLite back-end with CouchDB.</li>
<li>Parts of CouchDB that do things like read, write, or compact data have been rewritten from Erlang to C.</li>
<li>Couchbase still uses other Erlang parts of Apache CouchDB, and would be delighted if the community were to usefully enhance them.</li>
<li>Couchbase&#8217;s heavy contributions to development of open source CouchDB will, for the most part, continue.</li>
<li>CouchDB stuff donated to the Apache Foundation includes:
<ul>
<li>Documentation</li>
<li>Packaging</li>
<li>Performance enhancements</li>
</ul>
</li>
</ul>
<p>There&#8217;s at least one Couchbase user with &gt;1000 nodes (at a guess, <a href="../../../../../2011/09/05/zynga-linkedin-data-warehous/">Zynga</a>).  More typical might be 20 nodes or less. This led me to wonder how much data one puts on a Couchbase node anyway. The answer turns out to vary widely, in that you want your working set to be in RAM, and whether that&#8217;s your entire database or just a slice of it depends on the nature of the application.</p>
<p>James echoed a trend I&#8217;ve heard elsewhere as well, in which products one thinks of as being internet-specific are also sold in a few cases to conventional enterprises for &#8212; you guessed it! &#8212; their internet operations. I also asked him about competition, and he asserted:</p>
<ul>
<li>MongoDB is the big competition. He believes Couchbase has an excellent win rate vs. 10gen for actual paying accounts.</li>
<li>DataStax/Cassandra wins over Couchbase only when multi-data-center capability is important. Naturally, multi-data-center capability is planned for Couchbase. (Indeed, that&#8217;s one of the benefits of swapping in CouchDB at the back end.)</li>
<li>Redis has &#8220;dropped off the radar&#8221;, presumably because there&#8217;s no particular persistence strategy for it.</li>
<li>Riak doesn&#8217;t show up much.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/01/couchbase-update/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Notes from the Couch blogs</title>
		<link>http://www.dbms2.com/2012/01/18/notes-from-the-couch-blogs/</link>
		<comments>http://www.dbms2.com/2012/01/18/notes-from-the-couch-blogs/#comments</comments>
		<pubDate>Wed, 18 Jan 2012 07:57:09 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Open source]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5839</guid>
		<description><![CDATA[Couchbase in general, and CouchDB project founder Damien Katz in particular, are to some extent walking away from CouchDB. That is: The Couchbase product will not be upward compatible with CouchDB. Couchbase will no longer offer a CouchDB distribution, and is doing the natural and responsible thing, namely &#8230; &#8230; donating to the Apache Foundation [...]]]></description>
			<content:encoded><![CDATA[<p>Couchbase in general, and CouchDB project founder Damien Katz in particular, are to some extent walking away from CouchDB. That is:</p>
<ul>
<li>The Couchbase product will not be upward compatible with CouchDB.</li>
<li>Couchbase will no longer offer a CouchDB distribution, and is doing the natural and responsible thing, namely &#8230;</li>
<li>&#8230; donating to the Apache Foundation the previously proprietary aspects of that distribution.</li>
</ul>
<p>Even so:</p>
<ul>
<li>All &#8212; or at least &#8220;all&#8221; &#8212; the code Couchbase offers will, at least for now, be open source.</li>
</ul>
<p>The story unfolded in <a href="http://damienkatz.net/2012/01/the_future_of_couchdb.html">a bombshell post by Damien</a>, and clarification follow-ups by <a href="http://damienkatz.net/2012/01/why_couchbase.html">Damien</a> and by <a href="http://blog.couchbase.com/couchbase-commitment-to-open-source-and-couchdb">Couchbase CEO Bob Wiederhold</a>. The meatiest of the three was probably Damien&#8217;s follow-up, in which he said, among other things:<br />
<span id="more-5839"></span></p>
<blockquote><p>&#8230; maybe I should explain why I think Couchbase is the future?</p>
<p>Simple Fast Elastic.</p>
<p>That&#8217;s pretty much it. &#8230;</p>
<p>The Membase product was very fast and scalable, but a bit too simple,  with no reporting capability or cross-datacenter replication  capability.</p>
<p>The CouchDB product has a lot of features, but is too slow, unable to  keep up with high loads and inability scale-out on it&#8217;s own. &#8230;</p>
<p>Our 2.0 product is coming soon, adding CouchDB style views and  reporting with a nifty trick for extremely fast failover while  maintaining full coherency with the underling distributed data storage  (we are calling it our B-Superstar index). We&#8217;ll of course have lighting  fast reads (same as Memcached) but also very fast durable writes. For  2kb docs, we are currently getting sustained random insert/updates rates  of 25k writes/sec, fully durable, with compaction in background so it  can go all day and all night. We&#8217;ve got some more write work coming soon  which we are hoping will give us another performance boost too before  2.0. Stay tuned &#8230;</p>
<p>And so while we focus on the features and customers that most quickly  make us a viable business (and it&#8217;s growing fast), we are still looking  to build the features and technology to expand our use cases and, get  customers and developers excited. Future versions are planned to have  full CouchDB compatible replication technology, with the ability to  support all sorts of mobile and embedded databases, such as our new  TouchDB projects for iOS and Android.</p></blockquote>
<p>Meanwhile, in <a href="http://blog.couchbase.com/couchbase-2011-year-review">a separate blog post</a>, Bob said that in 2011 Couchbase</p>
<blockquote><p>&#8230; added thousands of open source deployments, as well as more than 150  paying customers who have put thousands of nodes into production  throughout the year.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/18/notes-from-the-couch-blogs/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Big data terminology and positioning</title>
		<link>http://www.dbms2.com/2012/01/08/big-data-terminology-and-positioning/</link>
		<comments>http://www.dbms2.com/2012/01/08/big-data-terminology-and-positioning/#comments</comments>
		<pubDate>Mon, 09 Jan 2012 01:35:57 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MarkLogic]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5768</guid>
		<description><![CDATA[Recently, I observed that Big Data terminology is seriously broken. It is reasonable to reduce the subject to two quasi-dimensions: Bigness &#8212; Volume, Velocity, size Structure &#8212; Variety, Variability, Complexity given that High-velocity &#8220;big data&#8221; problems are usually high-volume as well.* Variety, variability, and complexity all relate to the simply-structured/poly-structured distinction. But the conflation should [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I observed that <a href="../../../../../2011/09/11/big-data-has-jumped-the-shark/">Big Data terminology is seriously broken</a>. It is reasonable to reduce the subject to two quasi-dimensions:</p>
<ul>
<li><strong>Bigness</strong> &#8212; Volume, Velocity, size</li>
<li><strong>Structure</strong> &#8212; Variety, Variability, Complexity</li>
</ul>
<p>given that</p>
<ul>
<li>High-velocity &#8220;big data&#8221; problems are usually high-volume as well.*</li>
<li>Variety, variability, and complexity all relate to the <a href="../../../../../2011/05/17/poly-structured-database/">simply-structured/poly-structured</a> distinction.</li>
</ul>
<p>But the conflation should stop there.</p>
<p><em>*Low-volume/high-velocity problems are commonly referred to as <a href="../2011/08/25/renaming-cep-or-not/">&#8220;event processing&#8221; and/or &#8220;streaming&#8221;</a>.</em></p>
<p>When people claim that bigness and structure are the same issue, they oversimplify into mush. So I think we need four pieces of terminology, reflective of a 2&#215;2 matrix of possibilities. For want of better alternatives, my suggestions are:</p>
<ul>
<li><strong>Relational big data</strong> is data of high volume that fits well into a relational DBMS.</li>
<li><strong>Multi-structured big data</strong> is data of high volume that doesn&#8217;t fit well into a relational DBMS. <em>Alternative: Poly-structured big data.</em></li>
<li><strong>Conventional relational data</strong> is data of not-so-high volume that fits well into a relational DBMS. <em>Alternatives: Ordinary/normal/smaller relational data.</em></li>
<li><strong>Smaller poly-structured data</strong> is data for which <a href="../../../../../2011/07/31/dynamic-fixed-schema-databases/">dynamic schema</a> capabilities are important, but which doesn&#8217;t rise to &#8220;big data&#8221; volume.</li>
</ul>
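<p>The 2&#215;2 matrix above can be written down as a tiny lookup. This is only a sketch: the two boolean inputs are judgment calls left to the caller, since no numeric threshold for &#8220;high volume&#8221; is given here, and the function name is made up for illustration.</p>

```python
# The four suggested terms, keyed by (high_volume, fits_relational).
TERMS = {
    (True, True): "Relational big data",
    (True, False): "Multi-structured big data",
    (False, True): "Conventional relational data",
    (False, False): "Smaller poly-structured data",
}

def classify(high_volume, fits_relational):
    """Return the suggested term for one cell of the 2x2 matrix.

    Both arguments are judgment calls supplied by the caller; no numeric
    cutoff for "high volume" is assumed here.
    """
    return TERMS[(bool(high_volume), bool(fits_relational))]

# Log files are the paradigmatic high-volume, non-relational case.
print(classify(high_volume=True, fits_relational=False))
```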
<p><span id="more-5768"></span>Notes on all this include:</p>
<ul>
<li>&#8220;Relational big data&#8221; is commonly what you need a scalable analytic relational DBMS for. But there are non-analytic use cases as well.</li>
<li>The paradigmatic example of &#8220;multi-structured big data&#8221; is log files. Thus, multi-structured big data is commonly what you need a <a href="../../../../../2011/06/04/dirty-data-stored-dirt-cheap/">big bit bucket</a> for.</li>
<li>One might want to equate non-analytic relational big data technology to &#8220;NewSQL&#8221;. However, I&#8217;m struggling to think of a database size range in which the entire NewSQL industry can match Oracle&#8217;s market share alone.</li>
<li>One might want to equate non-analytic multi-structured big data technology to &#8220;NoSQL&#8221;. However:
<ul>
<li>&#8220;NoSQL&#8221; is also used to encompass not-so-big-data use cases, such as prototyping in MongoDB.</li>
<li><a href="../../../../../2011/10/02/defining-nosql/">&#8220;NoSQL&#8221; has non-ACID/low(er)-data-integrity connotations</a> that aren&#8217;t appropriate for all non-relational systems.</li>
</ul>
</li>
<li>Up to a point, you can analyze relational big data in a conventional relational DBMS, but an analytic RDBMS will usually win on TCO (Total Cost of Ownership). In particular, reasonable thresholds for moving an analytic database off Oracle might be:
<ul>
<li>1-2 terabytes if you&#8217;ve never bought anything past Oracle Standard Edition.</li>
<li>5-10 terabytes if you&#8217;re already paying for Oracle Enterprise Edition.</li>
<li>A lot higher than that if you actually find Oracle Exadata to be cost-effective.</li>
</ul>
</li>
<li>Depending on how big one acknowledges as &#8220;big&#8221;, the market share leader in &#8220;big bit bucket&#8221; use cases is either Splunk or Hadoop.</li>
<li>If we look at multi-structured big data management overall, MarkLogic joins the list of market share contenders, as do various NoSQL alternatives.</li>
<li>It is wrong to say that the large web companies invented &#8220;big data&#8221; technology. But it is more reasonable to say they invented much of &#8220;multi-structured big data&#8221; management. In particular (and this is just a partial list), Google, Amazon, Yahoo, Facebook, et al. can reasonably be credited with Hadoop, Cassandra, HBase and various predecessors to same.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/08/big-data-terminology-and-positioning/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Clarifying SAND&#8217;s customer metrics, positioning and technical story</title>
		<link>http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/</link>
		<comments>http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/#comments</comments>
		<pubDate>Sun, 13 Nov 2011 02:45:36 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5669</guid>
		<description><![CDATA[Talking with my clients at SAND can be confusing. That said: I need to revise my figures for SAND&#8217;s customer count way downward. SAND finally has a reasonably clear positioning. SAND&#8217;s product actually seems to have a lot of features. A few months ago, I wrote: SAND Technology reported &#62;600 total customers, including &#62;100 direct. [...]]]></description>
			<content:encoded><![CDATA[<p>Talking with my clients at SAND can be confusing. That said:</p>
<ul>
<li>I need to revise my figures for SAND&#8217;s customer count way downward.</li>
<li>SAND finally has a reasonably clear positioning.</li>
<li>SAND&#8217;s product actually seems to have a lot of features.</li>
</ul>
<p>A few months ago, I wrote:</p>
<blockquote><p>SAND Technology reported &gt;600 total customers, including &gt;100 direct.</p></blockquote>
<p>Upon talking with the company, I need to revise that figure downward, from &gt; 600 to 15.</p>
<p><span id="more-5669"></span><em>One embarrassing point: SAND is a client, and I view it as part of my job to save clients from that kind of inadvertent misstatement.</em></p>
<p>It turns out that SAND has a very impressive customer &#8212; Dunnhumby, a data mart outsourcer with 200 terabytes of data in SAND, 30 or so incoming data streams, 400 or so nodes &#8230; and 600 or so end customers, all of which SAND was counting as OEM end customers for its DBMS. But I, other industry observers, and other vendors generally don&#8217;t count that way.</p>
<p>Besides Dunnhumby, SAND has 14 other customers on maintenance, with &lt; 1 terabyte of data each. Until recently, SAND had a couple dozen more customers than that, but it <a href="http://www.sand.com/sand-technology-announces-sale-sap-ilm-product-line/">sold its SAP-oriented archiving/near-line storage product line to Informatica</a>.</p>
<p>I still don&#8217;t know where the &#8220;&gt; 100 direct&#8221; part came from.</p>
<p>After the sale of its other product line, SAND is squarely in the market for analytic DBMS. SAND&#8217;s sales efforts seem to be focused on <a href="http://www.dbms2.com/2011/03/03/investigative-analytics/">investigative analytics</a>, although some of its existing users seem to be more focused on <a href="http://www.dbms2.com/2011/11/08/terminology-operational-analytics/">operational analytics</a>. Most specifically, SAND is trying to focus on &#8220;people data&#8221; &#8212; customer loyalty, health care, etc. &#8212; rather than purely <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>, with the paradigmatic target application being personalized marketing.</p>
<p>SAND technical highlights include:</p>
<ul>
<li>SAND sells a columnar analytic DBMS.</li>
<li>The SAND DBMS operates on bitmaps, with heavy use of run-length encoding on the bitmaps. Bitmaps are used for everything except BLOBs (Binary Large OBjects).</li>
<li>Actual data compression also comes into play, e.g. as result sets are being assembled. This is based on a true global dictionary &#8212; multiple columns are tokenized together.</li>
<li>Indeed, SAND can decompose columns and tokenize their parts (e.g. time stamps).</li>
<li>SAND&#8217;s workload management sees RAM and CPU, but not explicitly I/O.</li>
<li>SAND lets you pin certain tables or even table segments in RAM if you want to.</li>
</ul>
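<p>For readers unfamiliar with the bitmap approach in the list above, the following is a minimal sketch of one bitmap per distinct column value, each run-length encoded. It illustrates the general technique only; it is not SAND&#8217;s storage format, and the function names and sample column are made up.</p>

```python
from itertools import groupby

def build_bitmaps(column):
    """Map each distinct value to a bitmap (list of 0/1), one bit per row."""
    return {
        value: [1 if cell == value else 0 for cell in column]
        for value in sorted(set(column))
    }

def rle_encode(bits):
    """Run-length encode a bitmap as (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]

def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the original bitmap."""
    return [bit for bit, length in runs for _ in range(length)]

# Invented sample column, for illustration only.
column = ["DE", "DE", "DE", "UK", "UK", "DE", "US", "US"]
for value, bits in build_bitmaps(column).items():
    print(value, rle_encode(bits))
```

<p>Long runs of identical bits, which are common when a column is sorted or has few distinct values, shrink to a handful of pairs.</p>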
<p>SAND&#8217;s update story is straightforward &#8212; when data comes in, all the columns and bitmaps are updated as needed. Still, since SAND is columnar, you wouldn&#8217;t expect true updates in place, and you&#8217;d be right. Rather, there&#8217;s a story with MVCC (MultiVersion Concurrency Control) and garbage collection, lock-free. The MVCC is also exploited for a kind of time travel, and further for some kind of virtual data mart capability.</p>
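<p>As a rough picture of how MVCC can give both no-update-in-place and time travel, here is a toy, single-threaded versioned store: writes append new versions, reads can ask for the state as of an earlier commit, and garbage collection drops versions no remaining reader needs. It is a sketch of the general idea, not SAND&#8217;s implementation, and it does not attempt the lock-free part. The class and method names are hypothetical.</p>

```python
import bisect

class VersionedStore:
    """Toy multiversion store: writes append new versions, never overwrite."""

    def __init__(self):
        self._clock = 0
        self._versions = {}  # key -> (list of commit numbers, list of values)

    def write(self, key, value):
        """Append a new version of key and return its commit number."""
        self._clock += 1
        commits, values = self._versions.setdefault(key, ([], []))
        commits.append(self._clock)
        values.append(value)
        return self._clock

    def read(self, key, as_of=None):
        """Read key as of a commit number (default: latest). None if absent."""
        commits, values = self._versions.get(key, ([], []))
        if as_of is None:
            as_of = self._clock
        idx = bisect.bisect_right(commits, as_of)
        return values[idx - 1] if idx else None

    def collect_garbage(self, oldest_needed):
        """Drop versions that no snapshot at or after oldest_needed can see."""
        for commits, values in self._versions.values():
            idx = bisect.bisect_right(commits, oldest_needed)
            keep_from = max(idx - 1, 0)
            del commits[:keep_from]
            del values[:keep_from]

store = VersionedStore()
v1 = store.write("price", 10)
v2 = store.write("price", 12)
print(store.read("price"), store.read("price", as_of=v1))
```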
<p>SAND&#8217;s parallelization story is a bit complicated.</p>
<ul>
<li>SAND has, or at least has the potential for, <a href="../../../../../2008/09/05/mpp-data-warehouse-nodes/">node specialization</a>, with database and storage nodes being different.</li>
<li>In principle, disks are specific to storage nodes, and it&#8217;s a configuration option as to whether a database node sees one, some, or all storage nodes.</li>
<li>In practice, only Dunnhumby among SAND&#8217;s customers operates on other than a shared-disk basis. Dunnhumby&#8217;s configuration is mixed/matched among various SAND sharing options.</li>
</ul>
<p>SAND is proud of its PMML (Predictive Modeling Markup Language) scoring capabilities, but otherwise hasn&#8217;t shipped much in the way of <a href="../../../../../2011/02/24/analytic-platforms/">analytic platform</a> capabilities. That said, work is underway on a user-defined table function capability that can also query external tables, fire off MapReduce jobs, and so on, under the code name UQL.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Exasol update</title>
		<link>http://www.dbms2.com/2011/11/12/exasol-update/</link>
		<comments>http://www.dbms2.com/2011/11/12/exasol-update/#comments</comments>
		<pubDate>Sun, 13 Nov 2011 02:37:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exasol]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5661</guid>
		<description><![CDATA[I last wrote about Exasol in 2008. After talking with the team Friday, I&#8217;m fixing that now. The general theme was as you&#8217;d expect: Since last we talked, Exasol has added some new management, put some effort into sales and marketing, got some customers, kept enhancing the product and so on. Top-level points included: Exasol&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p><a href="../../../../../2008/08/16/exasol-technical-briefing/">I last wrote about Exasol in 2008</a>. After talking with the team Friday, I&#8217;m fixing that now. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  The general theme was as you&#8217;d expect: Since last we talked, Exasol has added some new management, put some effort into sales and marketing, got some customers, kept enhancing the product and so on.</p>
<p>Top-level points included:</p>
<ul>
<li>Exasol&#8217;s technical philosophy is substantially the same as before, albeit not with as extreme a focus on fitting everything in RAM.</li>
<li>Exasol believes its flagship DBMS EXASolution has great performance on a load-and-go basis.</li>
<li>Exasol has 25 EXASolution customers, all in Germany.*</li>
<li>5 of those are &#8220;cloud&#8221; customers, at hosting providers engaged by Exasol.</li>
<li>EXASolution database sizes now range from the low 100s of gigabytes up to 30 terabytes.</li>
<li>Pretty much the whole company is in Nuremberg.</li>
</ul>
<p><span id="more-5661"></span><em>*That excludes some money from Hitachi. Exasol&#8217;s Hitachi partnership is still in limbo, an apparent casualty of the world economic crisis.</em></p>
<p>On the technical side:</p>
<ul>
<li>As noted in my 2008 post, EXASolution is a columnar, no-head-node MPP (Massively Parallel Processing) DBMS.</li>
<li>The main way EXASolution compresses data is via dictionary/tokenization. 5:1 is a typical compression ratio before mirroring and so on, out of a 2-10:1 range.</li>
<li>EXASolution writes data to blocks in memory that are smaller than what is otherwise its preferred size (1/2 to 5 megabytes). These are sent to disk, where merge eventually happens. Exasol insists that write performance has always been fully satisfactory to customers to date.</li>
<li>EXASolution doesn&#8217;t have much in the way of performance tuning knobs. Exasol says they aren&#8217;t needed, and says that one really can start an EXASolution POC (Proof of Concept) in a day or so.</li>
<li>EXASolution doesn&#8217;t have much in the way of workload management capabilities, except what&#8217;s automagic (e.g., short query bias). However, it does collect statistics you can query via your favorite BI tool.</li>
<li>EXASolution doesn&#8217;t have much in the way of <a href="../../../../../2011/02/24/analytic-platforms/">analytic platform</a> capabilities, although there is some Lua-based scripting. However, there&#8217;s something NDA in the analytic platform area Coming Soon.*</li>
</ul>
<p>In general, the whole thing sounds somewhat like ParAccel, at least at a high level.</p>
<p><em>*Exasol is not and never has been our client, but we can keep secrets for them even so.</em></p>
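<p>The dictionary/tokenization compression mentioned in the technical list above can be sketched in a few lines. This is a generic illustration, not EXASolution&#8217;s actual encoding; the function names and sample column are made up.</p>

```python
def dictionary_encode(column):
    """Replace each value with a small integer token.

    Returns (dictionary, tokens), where dictionary[token] is the original value.
    """
    dictionary = []
    index = {}
    tokens = []
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        tokens.append(index[value])
    return dictionary, tokens

def dictionary_decode(dictionary, tokens):
    """Rebuild the original column from the dictionary and token list."""
    return [dictionary[token] for token in tokens]

# Invented sample column, for illustration only.
column = ["Nuremberg", "Munich", "Nuremberg", "Nuremberg", "Berlin", "Munich"]
dictionary, tokens = dictionary_encode(column)
print(dictionary)
print(tokens)
```

<p>The savings come from storing each distinct string once and a short token per row; the actual ratio depends on how repetitive the column is.</p>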
<p>Naturally, Exasol believes EXASolution has fine concurrency, with at least one customer routinely running 2000 concurrent users, 200 concurrent sessions (via connection pooling), and 5-10 concurrent queries. Another customer has 3500 Cognos users. 1-200 concurrent queries appears to be the record peak load. Anyhow, Exasol says that plans to offer real workload management could be accelerated if a need were discovered.</p>
<p>Exasol says it almost never loses POCs, but admits that it competes fairly rarely against Vertica and ParAccel, no doubt for reasons of geography. Exasol boasts one visible Sybase IQ replacement (Sony Music).</p>
<p>While Exasol&#8217;s sales to date have been in Germany, there are plans to change that soon. At least one sales cycle is well underway in Eastern Europe. Offices in other Germanic countries are planned. Existing customers are planning to deploy additional copies outside Germany. Discussions are underway regarding other geographies, e.g. English-speaking ones.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/12/exasol-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MarkLogic 5, and why you might care</title>
		<link>http://www.dbms2.com/2011/11/01/marklogic-version-5/</link>
		<comments>http://www.dbms2.com/2011/11/01/marklogic-version-5/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 04:03:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MarkLogic]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Structured documents]]></category>
		<category><![CDATA[Text]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5560</guid>
		<description><![CDATA[MarkLogic is releasing MarkLogic 5. Key elements of the announcement are: More-of-the-same in line with MarkLogic’s core positioning. A new bi-directional Hadoop connector. A free MarkLogic Express edition, limited in license terms more than in actual features, as per Slide 27 of the deck MarkLogic graciously supplied for me to post. Also, MarkLogic is early [...]]]></description>
			<content:encoded><![CDATA[<p>MarkLogic is releasing MarkLogic 5. Key elements of the announcement are:</p>
<ul>
<li>More-of-the-same in line with MarkLogic’s core positioning.</li>
<li>A new bi-directional Hadoop connector.</li>
<li>A free MarkLogic Express edition, limited in license terms more than in actual features, as per Slide 27 of <a href="http://www.monash.com/uploads/MarkLogic-5-Deck.pptx">the deck MarkLogic graciously supplied for me to post</a>.</li>
</ul>
<p>Also, MarkLogic is early with a feature that most serious DBMS vendors will soon have – support for tiered storage, with writes going first to solid-state storage, then being flushed to disk via a caching-style algorithm.* And as befits a sometime search-engine-substitute, MarkLogic has finally licensed a large set of document filters, from an Australian company called <a href="http://www.isys-search.com/index.html">Isys</a>. Apparently, the special virtue of the Isys filters is that they’re good at extracting not only text, but metadata as well.</p>
<p><em>*If there’s a caching algorithm that doesn’t contain a major element of LRU (Least Recently Used), I don’t recall ever hearing about it.</em></p>
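<p>For readers who want the idea spelled out, here is a toy sketch of tiered storage with LRU-style flushing. It assumes nothing about how MarkLogic actually does it; the class and attribute names are made up for illustration. Writes land in a small fast tier, and the least recently used entries are pushed down to a slow tier when the fast tier fills up.</p>

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: writes land in a fast tier, LRU entries flush to a slow tier."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # stands in for solid-state storage
        self.slow = {}             # stands in for disk

    def write(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)  # mark as most recently used
        while len(self.fast) > self.fast_capacity:
            # Evict the least recently used entry and flush it to the slow tier.
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value

    def read(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # a read also counts as a use
            return self.fast[key]
        return self.slow[key]


store = TieredStore(fast_capacity=2)
store.write("a", 1)
store.write("b", 2)
store.read("a")      # "a" is now more recently used than "b"
store.write("c", 3)  # over capacity, so "b" is flushed to the slow tier
```

<p>A real system would also worry about durability, batching, and stale copies on disk; the sketch only shows the recency-based choice of what to flush.</p>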
<p>MarkLogic seems to have settled on a positioning that, although distressingly buzzword-heavy, is at least partly based upon reality. The real part includes:</p>
<ul>
<li>MarkLogic is a serious, enterprise-class DBMS (see for example Slide 12 of <a href="http://www.monash.com/uploads/MarkLogic-5-Deck.pptx">the MarkLogic deck</a>) …</li>
<li>… which has been optimized from the get-go for <a href="../../../../../2011/05/17/poly-structured-database/">poly-structured data</a>.</li>
<li>MarkLogic can and does scale out to handle large amounts of data.</li>
<li>MarkLogic is a general-purpose DBMS, suitable for <a href="../../../../../2011/03/30/short-request-and-analytic-processing/">both short-request and analytic tasks</a>.</li>
<li>MarkLogic is particularly well suited for analyses with long chains of “progressive enhancement” (MarkLogic’s favorite term when talking about <a href="../../../../../2011/05/30/another-category-of-derived-data/">derived data</a>).</li>
<li><a href="http://blogs.avalonconsult.com/blog/search/is-marklogic-a-search-engine/">MarkLogic often plays the role of a content assembler and/or search engine</a>, and the people who use MarkLogic in those ways are commonly doing things that can be described as research and analysis.</li>
</ul>
<p>Based on that reality, MarkLogic talks a lot about Volume, Velocity, Variety, Big Data, unstructured data, semi-structured data, and big data analytics.</p>
<p><span id="more-5560"></span><em>My <a href="../../../../../2010/11/29/marklogic-and-its-document-dbms/">November, 2010 overview of MarkLogic technology</a> remains pretty relevant. One correction, however: Node heterogeneity configurations, in which “data” and “evaluation” nodes reside on separate servers, are the exception rather than the rule.</em></p>
<p>Like <a href="../../../../../2011/10/18/vertica-community-edition/">Vertica</a>, MarkLogic has laudably said that true academic researchers can get MarkLogic for free without the severe license restrictions. Free MarkLogic should be of particular interest to researchers who:</p>
<ul>
<li>Are studying natural networks or graphs, such as social networks or biological pathways. (This might be a fit in the social or biological sciences.)</li>
<li>Are managing metadata for, say, a variety of disparate kinds of experimental files. (This might be a fit anywhere in the natural sciences.)</li>
<li>Are managing actual documents, images, videos, etc., or data about such things. (This might be a fit in the humanities or social sciences.)</li>
</ul>
<p>MarkLogic provided some disclosable financial substance by email, which I shall quote verbatim:</p>
<ul>
<li><em>MarkLogic has 45% revenue growth and 55-60% license growth year over year.</em></li>
<li><em>We expect to finish this year with over $85 million in revenue, up from $55 million last year.</em></li>
</ul>
<p>Arithmetical purists might note that 85/55 is more than 145%, but I’m just going to settle for the information I got and move on.</p>
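<p>For the purists, the check is short. The two revenue figures come from the email quoted above; the variable names are mine, and since the email says “over $85 million”, the computed growth is a lower bound.</p>

```python
last_year = 55  # $ millions, from the quoted email
this_year = 85  # $ millions ("over $85 million"), so treat the result as a floor

ratio = this_year / last_year
growth_pct = (ratio - 1) * 100
print(f"ratio = {ratio:.3f}, growth = {growth_pct:.1f}%")
# prints: ratio = 1.545, growth = 54.5%
```

<p>So the revenue figures imply growth of roughly 55%, which is closer to the stated license growth than to the stated 45% revenue growth.</p>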
<p><em>Edit: I posted separately about the <a href="http://www.dbms2.com/2011/11/03/marklogic-hadoop-connector/">MarkLogic Hadoop connector.</a></em> <span style="text-decoration: line-through;">As for that Hadoop connector – stay tuned for a short follow-up post, as writing about it now would not be convenient. (My backup discipline isn’t what it should be, and the only copy of my notes about that product is on a heavy tower computer in a house that doesn’t have working power.)</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/01/marklogic-version-5/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>NoSQL notes</title>
		<link>http://www.dbms2.com/2011/10/23/nosql-notes/</link>
		<comments>http://www.dbms2.com/2011/10/23/nosql-notes/#comments</comments>
		<pubDate>Mon, 24 Oct 2011 04:20:27 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Basho and Riak]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[MongoDB and 10gen]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parallelization]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5522</guid>
		<description><![CDATA[Last week I visited with James Phillips of Couchbase, Max Schireson and Eliot Horowitz of 10gen, and Todd Lipcon, Eric Sammer, and Omer Trajman of Cloudera. I guess it&#8217;s time for a round-up NoSQL post. Views of the NoSQL market horse race are reasonably consistent, with perhaps some elements of “Where you stand depends upon [...]]]></description>
			<content:encoded><![CDATA[<p>Last week I visited with James Phillips of Couchbase, Max Schireson and Eliot Horowitz of 10gen, and Todd Lipcon, Eric Sammer, and Omer Trajman of Cloudera. I guess it&#8217;s time for a round-up NoSQL post. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Views of the NoSQL market horse race are reasonably consistent, with perhaps some elements of “Where you stand depends upon where you sit.”</p>
<ul>
<li>As James tells it, NoSQL is simply a three-horse race between Couchbase, MongoDB, and Cassandra.</li>
<li>Max would include HBase on the list.</li>
<li>Further, Max pointed out that metrics such as job listings suggest MongoDB has the most development activity, and Couchbase/Membase/CouchDB perhaps have less.</li>
<li>The Cloudera guys remarked on some serious HBase adopters.*</li>
<li>Everybody I spoke with agreed that Riak had little current market presence, although some Basho guys could surely be found who&#8217;d disagree.</li>
</ul>
<p><span id="more-5522"></span><em>*I hope to do a separate post on HBase adoption soon. In connection with that, any info on HBase adoption by Facebook (said to be very heavy), Twitter, et al. would be much appreciated.</em></p>
<p>The reasons for using NoSQL of course are, in some order, <a href="../../../../../2011/07/31/dynamic-fixed-schema-databases/">dynamic schemas</a>, scale-out, and open source. <a href="http://www.dbms2.com/2011/10/23/transparent-relational-oltp-scale-out/">I find the scale-out argument somewhat bogus</a>,* but the data model one is very real. Depending on whom you talk with, the most important point about dynamic schemas may actually be that they’re changeable, or it may just be that you don’t have to specify a schema at the time of initial application design. MongoDB gets particular praise as a good platform on which to throw something together quickly, although predictions as to how far the application will then scale may differ depending on whether you’re talking with, say, Max or Todd.</p>
<p><em>*It’s fair to say that NoSQL systems are more proven in scale-out than most relational DBMS. Even so, I would cringe at any line of reasoning that concluded one should adopt NoSQL because it is more mature than relational alternatives.</em></p>
<p>Finally, I was perhaps too extreme when <a href="../../../../../2011/10/20/more-notes-on-oracle-nosql/">I suggested there was no good reason for Oracle to have adopted the major key/minor key approach it took in its NoSQL offering</a>. Todd offered a reason why that approach – which he characterized as similar to Project Voldemort’s – could make sense:</p>
<ul>
<li>If you have some kind of global secondary index, it’s hard to maintain that index consistently without what amounts to distributed transactions.</li>
<li>If you want to avoid the overhead of those, one alternative is a column-group system such as HBase or Cassandra. Those have no indexes at all, except in the sense that a column is its own index.</li>
<li>Another alternative is to load as much indexing information as you can into the key of a key-value store.</li>
</ul>
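<p>The last alternative in that list can be sketched as follows. This is a toy, single-process illustration, not the Oracle NoSQL or Project Voldemort API; the class, method names, and sample keys are all hypothetical. Keys are (major, minor) pairs kept in sorted order, so everything under one major key can be found with a range scan and no separate secondary index needs to be kept consistent.</p>

```python
import bisect


class SortedKV:
    """Toy ordered key-value store keyed by (major, minor) tuples."""

    def __init__(self):
        self._keys = []    # sorted list of (major, minor) tuples
        self._values = {}  # (major, minor) -> value

    def put(self, major, minor, value):
        key = (major, minor)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_major(self, major):
        """Return all (minor, value) pairs under one major key, in minor-key order."""
        # (major,) sorts before every (major, minor), so this finds the first match.
        start = bisect.bisect_left(self._keys, (major,))
        results = []
        for key in self._keys[start:]:
            if key[0] != major:
                break
            results.append((key[1], self._values[key]))
        return results


kv = SortedKV()
kv.put("user:42", "order:2011-10-20", "widget")
kv.put("user:42", "order:2011-09-01", "gadget")
kv.put("user:7", "order:2011-10-01", "gizmo")
orders = kv.scan_major("user:42")
```

<p>The trade-off is that you only get the access paths you baked into the key; a lookup by anything else still means a full scan or a second copy of the data.</p>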
<p>I’d be interested to learn about the Couchbase and MongoDB answers to that challenge.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/10/23/nosql-notes/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Ingres deemphasized, company now named Actian</title>
		<link>http://www.dbms2.com/2011/09/25/ingres-actian/</link>
		<comments>http://www.dbms2.com/2011/09/25/ingres-actian/#comments</comments>
		<pubDate>Sun, 25 Sep 2011 11:48:18 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5361</guid>
		<description><![CDATA[Ingres, the company, is: Changing its name to Actian. Deemphasizing Ingres, the product. Emphasizing a set of products that don&#8217;t exist yet (or at least aren&#8217;t shipping), namely lightweight mobile apps that are business-intelligence-plus-an-action, and technology for building them. These are called &#8220;Action Apps&#8221;, and are discussed on the Actian company blog. Positioning all this [...]]]></description>
			<content:encoded><![CDATA[<p>Ingres, the company, is:</p>
<ul>
<li>Changing its name to Actian.</li>
<li>Deemphasizing Ingres, the product.</li>
<li>Emphasizing a set of products that don&#8217;t exist yet (or at least aren&#8217;t shipping), namely lightweight mobile apps that are business-intelligence-plus-an-action, and technology for building them. These are called &#8220;Action Apps&#8221;, and are discussed on the <a href="http://blogs.actian.com/">Actian company blog</a>.</li>
<li>Positioning all this as something to do with &#8220;big data&#8221; (<a href="http://www.dbms2.com/2011/09/11/big-data-has-jumped-the-shark/">what a shock</a>).</li>
</ul>
<p>It turns out that Actian was the name of an ancient athletic competition commemorating Augustus&#8217; defeat of Antony at Actium, a battle that was more recently memorialized in the movie Cleopatra. Frankly, I think Cleopatra Software might have been a more interesting company name, although that could mean execs would have to arrive at sales calls rolled up in a carpet.</p>
<p><span id="more-5361"></span>One <a href="http://www.v3.co.uk/v3-uk/news/2111814/ingres-rebrands-actian-push">article</a> said:</p>
<blockquote><p>Greg Wood, chief financial officer for Actian, told <em>V3</em> that while the firm would continue to develop and maintain the Ingres database platform, it would be placing the spotlight on its Cloud Action Platform and its line of Action Apps.</p>
<p>&#8220;The Ingres database is well-recognised and we will continue to support it, but at the same time that brand was more associated with an older-generation technology,&#8221; Wood said.</p>
<p>&#8220;We think Actian better reflects where we are going as a company, particularly the application strategy.&#8221;</p>
<p>Wood explained that the platform would look to expand on the emerging field of big data applications by adding functionality for end users. The small, specialised applications would link up with data analytics tools, providing alerts and actions when various conditions are spotted within a database.</p></blockquote>
<p>So what about VectorWise? Notwithstanding Actian&#8217;s stated focus on &#8220;big data&#8221;, I think VectorWise&#8217;s chances for market success are slim.* Reasons include:</p>
<ul>
<li>The market for shared-disk columnar analytic DBMS is crowded (Sybase IQ, Infobright, SAND). Those vendors also have to compete with MPP columnar analytic DBMS offerings from Vertica and ParAccel.</li>
<li>I&#8217;ve never heard anything to make me believe VectorWise is getting significant market traction.</li>
<li>Indeed, Daniel Abadi&#8217;s well-known flirtation with the idea of using VectorWise in HadoopDB/Hadapt excepted, I don&#8217;t recall any marketplace mention of VectorWise at all.</li>
</ul>
<p><em>*The possibility of some kind of Action App synergy leads me to elevate them to &#8220;slim&#8221; from &#8220;none&#8221;.</em></p>
<p>The Action App idea actually sounds cool, but it&#8217;s quite a change from Ingres&#8217; previous positioning and technology, and I have no basis for judging it as likely to succeed. On the other hand, companies have occasionally made successful transitions into business intelligence from relatively unrelated businesses before, most notably Cognos in the mid-1990s.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/25/ingres-actian/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>The database architecture of salesforce.com, force.com, and database.com</title>
		<link>http://www.dbms2.com/2011/09/15/database-architecture-salesforce-com-force-com-and-database/</link>
		<comments>http://www.dbms2.com/2011/09/15/database-architecture-salesforce-com-force-com-and-database/#comments</comments>
		<pubDate>Thu, 15 Sep 2011 16:09:32 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[salesforce.com]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5237</guid>
		<description><![CDATA[salesforce.com, force.com, and database.com use exactly the same database infrastructure and architecture. That&#8217;s the good news. The bad news is that salesforce.com is somewhat obscure about technical details, for reasons such as: A long-ago marketing decision to not give infrastructure details, so as to convey a &#8220;Don&#8217;t worry; we&#8217;ll take care of everything&#8221; message. Even [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.dbms2.com/2011/09/15/salesforce-force-database-data-heroku/">salesforce.com, force.com, and database.com use exactly the same database infrastructure and architecture</a>. That&#8217;s the good news. The bad news is that salesforce.com is somewhat obscure about technical details, for reasons such as:</p>
<ul>
<li>A long-ago marketing decision to not give infrastructure details, so as to convey a &#8220;Don&#8217;t worry; we&#8217;ll take care of everything&#8221; message.</li>
<li>Even so, a long-ago and perhaps now-regretted marketing decision to disclose and even exaggerate salesforce.com&#8217;s reliance on Oracle, as part of an early-days attempt to prove salesforce was using enterprise-class technology.</li>
<li>A desire to hide the recipe for salesforce.com&#8217;s secret sauce.</li>
<li>Force of habit &#8212; I&#8217;m not sure salesforce even knows how to tell its technical story with any clarity.</li>
</ul>
<p>Actually, salesforce.com has moved some kinds of data out of Oracle. Besides Oracle, salesforce uses at least a file system and a RAM-based data store about which I have no details. Even so, much of salesforce.com&#8217;s data is stored in Oracle, specifically in a single instance of Oracle, which salesforce believes may be the largest instance of Oracle in the world.</p>
<p><span id="more-5237"></span>Salesforce did spell out some of its database story in <a href="http://www.salesforce.com/au/assets/pdf/Force.com_Multitenancy_WP_101508.pdf">a 2008 force.com white paper</a>, which is good stuff, but potentially misleading in one important way. The paper tells of a level of abstraction, whereby what the application sees as logical &#8220;columns&#8221; are stored in a very different schema than one might assume. However, it doesn&#8217;t spell out a second level of abstraction, whereby that logical schema also isn&#8217;t how the database is actually laid out.</p>
<p><em>Another flaw in the paper is that it spins &#8220;We had to do this, to support multitenancy, so we did.&#8221; issues as &#8220;Because we&#8217;re multitenant, we can do this, while single-tenant systems can&#8217;t.&#8221; One example is the query optimization step around &#8220;user visibility&#8221; in Figure 11. Welcome to marketing.</em></p>
<p>At the first level of abstraction, data seems to be kept mainly in a single wide table, with hundreds of columns. What&#8217;s more, many of those are &#8220;flex columns&#8221;; a flex column can hold data of many different kinds and even datatypes. Notwithstanding the second level of abstraction, I imagine the idea of stuffing different kinds of thing into the same column has something to do with the fact that <a href="../../../../../2011/03/13/so-how-many-columns-can-a-single-table-have-anyway/">Oracle&#8217;s physical limit on columns</a> falls far short of the number of logical columns salesforce wants to use.</p>
<p>If we imagine that the different kinds of data in a flex column were each in their own column instead, the whole thing might sound like BigTable/Cassandra/HBase-style column-group NoSQL. Thus, much as <a href="../../../../../2010/08/22/workday-technology-stack/">Workday uses MySQL to simulate a key-value store</a>, salesforce.com can be said to use Oracle to simulate a different kind of NoSQL. In both cases, what&#8217;s going on seems to be a kind of object/relational mapping, but with the relational aspect strongly deemphasized. Or, if you take a more relational view, we could say that salesforce.com&#8217;s tables are a lot wider than any one user organization&#8217;s, because each user sees only its own custom columns (plus the standard ones common to all users).</p>
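<p>Here is a minimal sketch of the flex-column idea, under loudly stated assumptions: the table layout, tenant names, and field names below are invented for illustration, and salesforce.com&#8217;s real schema is certainly more elaborate. One shared wide table holds untyped slots, and per-tenant metadata says what each slot means for that tenant.</p>

```python
# Hypothetical per-tenant metadata: logical field name -> (flex slot, datatype converter)
FIELDS = {
    ("acme", "deal_size"): (0, float),
    ("acme", "region"): (1, str),
    ("globex", "renewal_year"): (0, int),  # same slot, different meaning and type
}

# One shared wide table: every row carries its tenant and untyped flex slots.
ROWS = [
    {"org": "acme", "slots": ["125000.0", "EMEA"]},
    {"org": "globex", "slots": ["2012", None]},
]


def read_row(row):
    """Translate a physical row back into the owning tenant's logical columns."""
    logical = {}
    for (org, name), (slot, convert) in FIELDS.items():
        if org == row["org"] and row["slots"][slot] is not None:
            logical[name] = convert(row["slots"][slot])
    return logical


def rows_for(org):
    """Tenant-scoped read: a tenant only ever sees its own rows."""
    return [read_row(r) for r in ROWS if r["org"] == org]


acme_view = rows_for("acme")
globex_view = rows_for("globex")
```

<p>Slot 0 is a float for one tenant and an integer for another, which is the sense in which a single physical column can hold data of many kinds.</p>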
<p>The second layer of abstraction has a lot to do with multitenancy. If you want to stick data for many different user organizations into the same huge table, then you have to label it in some way to show who is permitted to see or update each part. Logically, this leads to a join, between one table carrying data plus a simple key showing which users/roles are entitled to see it, and a second table showing who actually is that kind of user/has that kind of role. But it makes a lot of sense to store the result of that join in a denormalized way, all the more so because data is partitioned across the computer cluster according to which user organization it belongs to.</p>
<p><em>Multitenant security isn&#8217;t the only reason for this denormalization, but it appears to be the biggest one.</em></p>
<p>The whole thing is doing 550 million or so transactions per day. salesforce.com thinks that fact should be regarded as evidence that it works. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/15/database-architecture-salesforce-com-force-com-and-database/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
	</channel>
</rss>

