<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Specific users</title>
	<atom:link href="http://www.dbms2.com/category/users/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 22:51:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Couchbase update</title>
		<link>http://www.dbms2.com/2012/02/01/couchbase-update/</link>
		<comments>http://www.dbms2.com/2012/02/01/couchbase-update/#comments</comments>
		<pubDate>Thu, 02 Feb 2012 04:00:24 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Basho and Riak]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[MongoDB and 10gen]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[Zynga]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5877</guid>
		<description><![CDATA[I checked in with James Phillips for a Couchbase update, and I understand better what&#8217;s going on. In particular: Give or take minor tweaks, what I wrote in my August, 2010 Couchbase updates still applies. Couchbase now and for the foreseeable future has one product line, called Couchbase. Couchbase 2.0, the first version of Couchbase [...]]]></description>
			<content:encoded><![CDATA[<p>I checked in with James Phillips for a Couchbase update, and I understand better what&#8217;s going on. In particular:</p>
<ul>
<li>Give or take minor tweaks, what I wrote in my <a href="../../../../../2011/08/13/couchbase-business-update/">August, 2010 Couchbase updates</a> still applies.</li>
<li>Couchbase now and for the foreseeable future has one product line, called Couchbase.</li>
<li>Couchbase 2.0, the first version of Couchbase (the product) to use CouchDB for persistence, has slipped &#8230;</li>
<li>&#8230; because more parts of CouchDB had to be rewritten for performance than Couchbase (the company) had hoped.</li>
<li>Think mid-year or so for the release of Couchbase 2.0, hopefully sooner.</li>
<li>In connection with the need to rewrite parts of CouchDB, Couchbase has:
<ul>
<li><a href="../../../../../2012/01/18/notes-from-the-couch-blogs/">Gotten out of the single-server CouchDB business</a>.</li>
<li>Donated its proprietary single-sever CouchDB intellectual property to the Apache Foundation.</li>
</ul>
</li>
<li>The 150ish new customers in 2011 Couchbase brags about are real, subscription customers.</li>
<li>Couchbase has 60ish people, headed to &gt;100 over the next few months.</li>
</ul>
<p><span id="more-5877"></span><em>If you previously heard the brand names Couchbase Single or Couchbase Mobile, pay no further attention to them. Couchbase Single was CouchDB; Couchbase Mobile is part of Couchbase&#8217;s feature set.</em></p>
<p>The current product is Couchbase 1.8, which is a whole lot like what previously was called Membase. New features in Couchbase 1.8 (versus prior versions of Membase) were concentrated in client libraries/SDK (Software Development Kit). Not coincidentally, Couchbase has hired developer evangelists who are in charge of making Couchbase play nicely with various specific languages (e.g. C/C++)</p>
<p>Drilling down further into the CouchDB part of the story:</p>
<ul>
<li>Couchbase 2.0 will replace Couchbase 1.8/Membase&#8217;s SQLite back-end with CouchDB.</li>
<li>Parts of CouchDB that do things like read, write, or compact data have been rewritten from Erlang to C.</li>
<li>Couchbase still uses other Erlang parts of Apache CouchDB, and would be delighted if the community were to usefully enhance them.</li>
<li>Couchbase&#8217;s heavy contributions to development of open source CouchDB will, for the most part, continue.</li>
<li>CouchDB stuff donated to the Apache Foundation includes:
<ul>
<li>Documentation</li>
<li>Packaging</li>
<li>Performance enhancements</li>
</ul>
</li>
</ul>
<p>There&#8217;s at least one Couchbase user with &gt;1000 nodes (at a guess, <a href="../../../../../2011/09/05/zynga-linkedin-data-warehous/">Zynga</a>).  More typical might be 20 nodes or less. This led me to wonder how much data one puts on a Couchbase node anyway. The answer turns out to vary widely, in that you want your working set to be in RAM, and whether that&#8217;s your entire database or just a slice of it depends on the nature of the application.</p>
<p>James echoed a trend I&#8217;ve heard elsewhere as well, in which products one things of as being internet-specific are also sold in a few cases to conventional enterprises for &#8212; you guessed it! &#8212; their internet operations. I also asked him about competition, and he asserted:</p>
<ul>
<li>MongoDB is the big competition. He believes Couchbase has an excellent win rate vs. 10gen for actual paying accounts.</li>
<li>DataStax/Cassandra wins over Couchbase only when multi-data-center capability is important. Naturally, multi-data-center capability is planned for Couchbase. (Indeed, that&#8217;s one of the benefits of swapping in CouchDB at the back end.)</li>
<li>Redis has &#8220;dropped off the radar&#8221;, presumably because there&#8217;s no particular persistence strategy for it.</li>
<li>Riak doesn&#8217;t show up much.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/01/couchbase-update/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Splunk update</title>
		<link>http://www.dbms2.com/2012/01/10/splunk-update/</link>
		<comments>http://www.dbms2.com/2012/01/10/splunk-update/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 05:55:08 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[Structured documents]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5791</guid>
		<description><![CDATA[Splunk is announcing the Splunk 4.3 point release. Before discussing it, let&#8217;s recall a few things about Splunk, starting with: Splunk is first and foremost an analytic DBMS &#8230; &#8230; used to manage logs and similar multistructured data. Splunk&#8217;s DML (Data Manipulation Language) is based on text search, not on SQL. Splunk has extended its [...]]]></description>
			<content:encoded><![CDATA[<p>Splunk is announcing the Splunk 4.3 point release. Before discussing it, let&#8217;s recall a few things about Splunk, starting with:</p>
<ul>
<li>Splunk is first and foremost an analytic DBMS &#8230;</li>
<li>&#8230; used to manage logs and similar multistructured data.</li>
<li>Splunk&#8217;s DML (Data Manipulation Language) is based on text search, not on SQL.</li>
<li>Splunk has extended its DML in natural ways (e.g., you can use it to do calculations and even some statistics).</li>
<li>Splunk bundles some (very) basic, Splunk-specific business intelligence capabilities.</li>
<li>The paradigmatic use of Splunk is to monitor IT operations in real time. However:
<ul>
<li>There also are plenty of non-real-time uses for Splunk.</li>
<li>Splunk is proudest of its growth in non-IT quasi-real-time uses, such as the marketing side of web operations.</li>
</ul>
</li>
</ul>
<p>As in any release, a lot of Splunk 4.3 is about &#8220;Oh, you didn&#8217;t have that before?&#8221; features and <a href="../../../../../2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a> performance speed-up. One performance enhancement is Bloom filters, which are a very hot topic these days. More important is a switch from Flash to HTML5, so as to accommodate mobile devices with less server-side rendering. Splunk reports that its users &#8212; especially the non-IT ones &#8212; really want to get Splunk information on the tablet devices. While this somewhat contradicts <a href="../../../../../2012/01/04/some-issues-in-business-intelligence/">what I wrote a few days ago pooh-poohing mobile BI</a>, let me hasten to point out:</p>
<ul>
<li>Splunk is used for a lot of (quasi) real-time monitoring.</li>
<li>Splunk&#8217;s desktop user interfaces are, by BI standards, quite primitive.</li>
</ul>
<p>That&#8217;s pretty much the ideal scenario for mobile BI: Timeliness matters and prettiness doesn&#8217;t.</p>
<p><span id="more-5791"></span><em>Hmm. Maybe <a href="../../../../../2011/11/10/streambase-liveview-push-based-real-time-bi/">StreamBase LiveView</a> needs a mobile option as well &#8230;</em></p>
<p>Splunk&#8217;s basic use is to take the text string that is a log and make sense of it. But Splunk now also supports JSON structures. It does this via something called spath, which as you might guess from the name has XPath similarities. That probably bore more discussion than we found the time to have.</p>
<p><em>By the way: If you&#8217;re interested in BI over XML, that&#8217;s what my former clients at Skytide were founded to do, before they pivoted a bit. I don&#8217;t think those capabilities have disappeared from the product</em>.</p>
<p><a href="http://www.monash.com/uploads/Splunk-4-3.pdf">Splunk has graciously allowed me to post a slide deck</a>. More stuff in there, including quotes from a customer &#8212; Expedia &#8212; that has 2700 Splunk users.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/10/splunk-update/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Big data terminology and positioning</title>
		<link>http://www.dbms2.com/2012/01/08/big-data-terminology-and-positioning/</link>
		<comments>http://www.dbms2.com/2012/01/08/big-data-terminology-and-positioning/#comments</comments>
		<pubDate>Mon, 09 Jan 2012 01:35:57 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MarkLogic]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5768</guid>
		<description><![CDATA[Recently, I observed that Big Data terminology is seriously broken. It is reasonable to reduce the subject to two quasi-dimensions: Bigness &#8212; Volume, Velocity, size Structure &#8212; Variety, Variability, Complexity given that High-velocity &#8220;big data&#8221; problems are usually high-volume as well.* Variety, variability, and complexity all relate to the simply-structured/poly-structured distinction. But the conflation should [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I observed that <a href="../../../../../2011/09/11/big-data-has-jumped-the-shark/">Big Data terminology is seriously broken</a>. It is reasonable to reduce the subject to two quasi-dimensions:</p>
<ul>
<li><strong>Bigness</strong> &#8212; Volume, Velocity, size</li>
<li><strong>Structure</strong> &#8212; Variety, Variability, Complexity</li>
</ul>
<p>given that</p>
<ul>
<li>High-velocity &#8220;big data&#8221; problems are usually high-volume as well.*</li>
<li>Variety, variability, and complexity all relate to the <a href="../../../../../2011/05/17/poly-structured-database/">simply-structured/poly-structured</a> distinction.</li>
</ul>
<p>But the conflation should stop there.</p>
<p><em>*Low-volume/high-velocity problems are commonly referred to as <a href="../2011/08/25/renaming-cep-or-not/">&#8220;event processing&#8221; and/or &#8220;streaming&#8221;</a>.</em></p>
<p>When people claim that bigness and structure are the same issue, they oversimplify into mush. So I think we need four pieces of terminology, reflective of a 2&#215;2 matrix of possibilities. For want of better alternatives, my suggestions are:</p>
<ul>
<li><strong>Relational big data</strong> is data of high volume that fits well into a relational DBMS.</li>
<li><strong>Multi-structured big data</strong> is data of high volume that doesn&#8217;t fit well into a relational DBMS. <em>Alternative: Poly-structured big data.</em></li>
<li><strong>Conventional relational data</strong> is data of not-so-high volume that fits well into a relational DBMS. <em>Alternatives: Ordinary/normal/smaller relational data.</em></li>
<li><strong>Smaller poly-structured data</strong> is data for which <a href="../../../../../2011/07/31/dynamic-fixed-schema-databases/">dynamic schema</a> capabilities are important, but which doesn&#8217;t rise to &#8220;big data&#8221; volume.</li>
</ul>
<p><span id="more-5768"></span>Notes on all this include:</p>
<ul>
<li>&#8220;Relational big data&#8221; is commonly what you need a scalable analytic relational DBMS for. But there are non-analytic use cases as well.</li>
<li>The paradigmatic example of &#8220;multi-structured big data&#8221; is log files. Thus, multi-structured big data is commonly what you need a <a href="../../../../../2011/06/04/dirty-data-stored-dirt-cheap/">big bit bucket</a> for.</li>
<li>One might want to equate non-analytic relational big data technology to &#8220;NewSQL&#8221;. However, I&#8217;m struggling to think of a database size range in which the entire NewSQL industry can match Oracle&#8217;s market share alone.</li>
<li>One might want to equate non-analytic multi-structured big data technology to &#8220;NoSQL&#8221;. However:
<ul>
<li>&#8220;NoSQL&#8221; is also used to encompass not-so-big-data use cases, such as prototyping in MongoDB.</li>
<li><a href="../../../../../2011/10/02/defining-nosql/">&#8220;NoSQL&#8221; has non-ACID/low(er)-data-integrity connotations</a> that aren&#8217;t appropriate for all non-relational systems.</li>
</ul>
</li>
<li>Up to a point, you can analyze relational big data in a conventional relational DBMS, but an analytic RDBMS will usually win on TCO (Total Cost of Ownership). In particular, reasonable thresholds for moving an analytic database off Oracle might be:
<ul>
<li>1-2 terabytes if you&#8217;ve never bought anything past Oracle Standard Edition.</li>
<li>5-10 terabytes if you&#8217;re already paying for Oracle Enterprise Edition.</li>
<li>A lot higher than that if you actually find Oracle Exadata to be cost-effective.</li>
</ul>
</li>
<li>Depending on how big one acknowledges as &#8220;big&#8221;, the market share leader in &#8220;big bit bucket&#8221; use cases is either Splunk or Hadoop.</li>
<li>If we look at multi-structured big data management overall, MarkLogic joins the list of market share contenders, as do various NoSQL alternatives.</li>
<li>It is wrong to say that the large web companies invented &#8220;big data&#8221; technology. But it is more reasonable to say they invented much of &#8220;multi-structured big data&#8221; management. In particular (and this is just a partial list), Google, Amazon, Yahoo, Facebook, et al. can reasonably be credited with Hadoop, Cassandra, HBase and various predecessors to same.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/08/big-data-terminology-and-positioning/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>QlikView 11 and the rise of collaborative BI</title>
		<link>http://www.dbms2.com/2011/11/16/qlikview-collaborative-business-intelligence/</link>
		<comments>http://www.dbms2.com/2011/11/16/qlikview-collaborative-business-intelligence/#comments</comments>
		<pubDate>Wed, 16 Nov 2011 13:19:52 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[eBay]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5681</guid>
		<description><![CDATA[QlikView 11 came out last month. Let me start by pointing out: As one might expect, QlikView 11 contains fairly leading-edge stuff, but also some &#8220;better late than never&#8221; features. The leading-edge stuff is concentrated in the general area of &#8220;collaboration&#8221;. Additionally, QlikTech is always pushing the QlikView user interface ahead in various ways. The [...]]]></description>
			<content:encoded><![CDATA[<p>QlikView 11 came out last month. Let me start by pointing out:</p>
<ul>
<li>As one might expect, QlikView 11 contains fairly leading-edge stuff, but also some &#8220;better late than never&#8221; features.</li>
<li>The leading-edge stuff is concentrated in the general area of &#8220;collaboration&#8221;.</li>
<li>Additionally, QlikTech is always pushing the QlikView user interface ahead in various ways.</li>
<li>The &#8220;Well, it&#8217;s about time!&#8221; feature list starts with the ability to load QlikView via third-party ETL tools (Informatica now, others coming).</li>
<li>QlikTech is generally good at putting up pretty pictures of its product. You can find some in the &#8220;What&#8217;s New in QlikView 11&#8243; document via a general <a href="http://www.qlikview.com/us/explore/resources/brochures-datasheets?language=english&amp;page=1">QlikView resource page</a>.*</li>
<li>Stephen Swoyer wrote <a href="http://tdwi.org/articles/2011/11/01/QlikView-Update-Enterprise-Makeover.aspx">a good article summarizing QlikView 11</a>.</li>
</ul>
<p><em>*One confusing aspect to that paper:  non-standard uses of the terms &#8220;analytic app&#8221; and &#8220;document&#8221;.</em></p>
<p>As QlikTech tells it, QlikView 11 adds two kinds of collaboration features:</p>
<ul>
<li>Integration with social media, which QlikTech calls &#8220;asynchronous integration.&#8221;</li>
<li>Direct sharing of the QlikView UI, which QlikTech calls &#8220;synchronous integration.&#8221;</li>
</ul>
<p>I&#8217;d add a third kind, because QlikView 11 also takes some baby steps toward what I regard as a key aspect of BI collaboration &#8212; the ability to define and track your own metrics. It&#8217;s way, way short of what <a href="../../../../../2010/07/25/alerts-metrics-dashboards/">I called for in metric flexibility in a post last year</a>, but at least it&#8217;s a small start.</p>
<p><span id="more-5681"></span>That <strong>direct sharing of user interfaces is a cool feature, which every business intelligence vendor should offer. </strong>In an era of distributed workforces, when people can&#8217;t be assumed able to huddle around the same desk, it has value even for use among close coworkers. But it also should prove useful in a variety of more naturally remote use cases, multiple examples of which can be found in each of the areas of:</p>
<ul>
<li>Support (internal or external).</li>
<li>Faceoffs &#8212; I mean collaborations <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  &#8212; between two or more enterprise departments. Examples might include: manufacturing and purchasing, manufacturing and sales, or accounting and anybody else.</li>
</ul>
<p>As for <strong>social media being used for BI collaboration</strong> &#8212; that&#8217;s generally in the air. For example:</p>
<ul>
<li><a href="http://www.texttechnologies.com/2011/09/14/social-technology-in-the-enterprise/">salesforce.com is pushing enterprise social media use broadly</a>, and will surely increase its emphasis on the social media/BI intersection now that Dave Kellogg is there.</li>
<li>Spotfire has announced similar features in its latest release.</li>
<li>The more cumbersome side of the feature set (portal-based collaboration, emailing of individual reports) has been available from multiple vendors for years.</li>
<li>eBay open-sourced a more dataset-centric version of the idea, just as Oliver Ratzesberger left the firm.*</li>
</ul>
<p><em>*Umm &#8212; does anybody have a link to the project, or at least a name for it? <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></p>
<p>BI has been a communication tool since the first green paper report was dumped on the first desk. And there&#8217;s been collaboration in doing analysis at least since it&#8217;s been possible to email .XLS file attachments. Still<strong>, BI is too often used as bludgeon rather than binocular. Hopefully, the current generation of technology will finally serve to change that.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/16/qlikview-collaborative-business-intelligence/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Clarifying SAND&#8217;s customer metrics, positioning and technical story</title>
		<link>http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/</link>
		<comments>http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/#comments</comments>
		<pubDate>Sun, 13 Nov 2011 02:45:36 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5669</guid>
		<description><![CDATA[Talking with my clients at SAND can be confusing. That said: I need to revise my figures for SAND&#8217;s customer count way downward. SAND finally has a reasonably clear positioning. SAND&#8217;s product actually seems to have a lot of features. A few months ago, I wrote: SAND Technology reported &#62;600 total customers, including &#62;100 direct. [...]]]></description>
			<content:encoded><![CDATA[<p>Talking with my clients at SAND can be confusing. That said:</p>
<ul>
<li>I need to revise my figures for SAND&#8217;s customer count way downward.</li>
<li>SAND finally has a reasonably clear positioning.</li>
<li>SAND&#8217;s product actually seems to have a lot of features.</li>
</ul>
<p>A few months ago, I wrote:</p>
<blockquote><p>SAND Technology reported &gt;600 total customers, including &gt;100 direct.</p></blockquote>
<p>Upon talking with the company, I need to revise that figure downward, from &gt; 600 to 15.</p>
<p><span id="more-5669"></span><em>One embarrassing point: SAND is a client, and I view it as part of my job to save clients from that kind of inadvertent misstatement.</em></p>
<p>It turns out that SAND has a very impressive customer &#8212; Dunnhumby, a data mart outsourcer with 200 terabytes of data in SAND, 30 or so incoming data streams, 400 or so nodes &#8230; and 600 or so end customers, all of which SAND was counting as OEM end customers for its DBMS. But I, other industry observers, and other vendors generally don&#8217;t count that way.</p>
<p>Besides Dunnhumby, SAND has 14 other customers on maintenance, with &lt; 1 terabyte of data each. Until recently, SAND had a couple dozen more customers than that, but it <a href="http://www.sand.com/sand-technology-announces-sale-sap-ilm-product-line/">sold its SAP-oriented archiving/near-line storage product line to Informatica</a>.</p>
<p>I still don&#8217;t know where the &#8220;&gt; 100 direct&#8221; part came from.</p>
<p>After the sale of its other product line, SAND is squarely in the market for analytic DBMS. SAND&#8217;s sales efforts seem to be focused on <a href="http://www.dbms2.com/2011/03/03/investigative-analytics/">investigative analytics</a>, although some of its existing users seem to be more focused on <a href="http://www.dbms2.com/2011/11/08/terminology-operational-analytics/">operational analytics</a>. Most specifically, SAND is trying to focus on &#8220;people data&#8221; &#8212; customer loyalty, health care, etc . &#8212; rather than purely <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>, with the paradigmatic target application being personalized marketing.</p>
<p>SAND technical highlights include:</p>
<ul>
<li>SAND sells a columnar analytic DBMS.</li>
<li>The SAND DBMS operates on bitmaps, with heavy use of run-length encoding on the bitmaps. Bitmaps are used for everything except BLOBs (Binary Large OBjects).</li>
<li>Actual data compression also comes into play, e.g. as result sets are being assembled. This is based on a true global dictionary &#8212; multiple columns are tokenized together.</li>
<li>Indeed, SAND can decompose columns and tokenize their parts (e.g. time stamps).</li>
<li>SAND&#8217;s workload management sees RAM and CPU, but not explicitly I/O.</li>
<li>SAND lets you pin certain tables or even table segments in RAM if you want to.</li>
</ul>
<p>SAND&#8217;s update story is straightforward &#8212; when data comes in, all the columns and bitmaps are updated as needed. Still, since SAND is columnar, you wouldn&#8217;t expect true updates in place, and you&#8217;d be right. Rather, there&#8217;s a story with MVCC (MultiVersion Concurrency Control) and garbage collection, lock-free. The MVCC is also exploited for a kind of time travel, and further for some kind of virtual data mart capability.</p>
<p>SAND&#8217;s parallelization story is a bit complicated.</p>
<ul>
<li>SAND has, or at least has the potential for, <a href="../../../../../2008/09/05/mpp-data-warehouse-nodes/">node specialization</a>, with database and storage nodes being different.</li>
<li>In principle, disks are specific to storage nodes, and it&#8217;s a configuration option as to whether a database node sees one, some, or all storage nodes.</li>
<li>In practice, only Dunnhumby among SAND&#8217;s customers operates on other than a shared-disk basis. Dunnhumby&#8217;s configuration is mixed/matched among various SAND sharing options.</li>
</ul>
<p>SAND is proud of its PMML (Predictive Modeling Markup Language) scoring capabilities, but otherwise hasn&#8217;t shipped much in the way of <a href="../../../../../2011/02/24/analytic-platforms/">analytic platform</a> capabilities. That said, work is underway on a user-defined table function capability that can also query external tables, fire off MapReduce jobs, and so on, under the code name UQL.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Exasol update</title>
		<link>http://www.dbms2.com/2011/11/12/exasol-update/</link>
		<comments>http://www.dbms2.com/2011/11/12/exasol-update/#comments</comments>
		<pubDate>Sun, 13 Nov 2011 02:37:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exasol]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5661</guid>
		<description><![CDATA[I last wrote about Exasol in 2008. After talking with the team Friday, I&#8217;m fixing that now. The general theme was as you&#8217;d expect: Since last we talked, Exasol has added some new management, put some effort into sales and marketing, got some customers, kept enhancing the product and so on. Top-level points included: Exasol&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p><a href="../../../../../2008/08/16/exasol-technical-briefing/">I last wrote about Exasol in 2008</a>. After talking with the team Friday, I&#8217;m fixing that now. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  The general theme was as you&#8217;d expect: Since last we talked, Exasol has added some new management, put some effort into sales and marketing, got some customers, kept enhancing the product and so on.</p>
<p>Top-level points included:</p>
<ul>
<li>Exasol&#8217;s technical philosophy is substantially the same as before, albeit not with as extreme a focus on fitting everything in RAM.</li>
<li>Exasol believes its flagship DBMS EXASolution has great performance on a load-and-go basis.</li>
<li>Exasol has 25 EXASolution customers, all in Germany.*</li>
<li>5 of those are &#8220;cloud&#8221; customers, at hosting providers engaged by Exasol.</li>
<li>EXASolution database sizes now range from the low 100s of gigabytes up to 30 terabytes.</li>
<li>Pretty much the whole company is in Nuremberg.</li>
</ul>
<p><span id="more-5661"></span><em>*That excludes some money from Hitachi. Exasol&#8217;s Hitachi partnership is still in limbo, an apparent casualty of the world economic crisis.</em></p>
<p>On the technical side:</p>
<ul>
<li>As noted in my 2008 post, EXASolution is a columnar, no-head-node MPP (Massively Parallel Processing) DBMS.</li>
<li>The main way EXASolution compresses data is via dictionary/tokenization. 5:1 is a typical compression ratio before mirroring and so on, out of a 2-10:1 range.</li>
<li>EXASolution writes data to blocks in memory that are smaller than what is otherwise its preferred size (1/2 to 5 megabytes). These are sent to disk, where merge eventually happens. Exasol insists that write performance has always been fully satisfactory to customers to date.</li>
<li>EXASolution doesn&#8217;t have much in the way of performance tuning knobs. Exasol says they aren&#8217;t needed, and says that one really can start an EXASolution POC (Proof of Concept) in a day or so.</li>
<li>EXASolution doesn&#8217;t have much in the way of workload management capabilities, except what&#8217;s automagic (e.g., short query bias). However, it does collect statistics you can query via your favorite BI tool.</li>
<li>EXASolution doesn&#8217;t have much in the way of <a href="../../../../../2011/02/24/analytic-platforms/">analytic platform</a> capabilities, although there is some Lua-based scripting. However, there&#8217;s something NDA in the analytic platform area Coming Soon.*</li>
</ul>
<p>In general, the whole thing sounds somewhat like ParAccel, at least at a high level.</p>
<p><em>*Exasol is not and never has been our client, but we can keep secrets for them even so.</em></p>
<p>Naturally, Exasol believes EXASolution has fine concurrency, with at least one customer routinely running 2000 concurrent users, 200 concurrent sessions (via connection pooling), and 5-10 concurrent queries. Another customer has 3500 Cognos users. 1-200 concurrent queries appears to be the record peak load. Anyhow, Exasol says that plans to offer real workload management could be accelerated if a need were discovered.</p>
<p>Exasol says it almost never loses POCs, but admits that it competes fairly rarely against Vertica and ParAccel, no doubt for reasons of geography. Exasol boasts one visible Sybase IQ replacement (Sony Music).</p>
<p>While Exasol&#8217;s sales to date have been in Germany, there are plans to change that soon. At least one sales cycle is well underway in Eastern Europe. Offices in other Germanic countries are planned. Existing customers are planning to deploy additional copies outside Germany. Discussions are underway regarding other geographies, e.g. English-speaking ones.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/12/exasol-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lessons from T-Mobile&#8217;s epic fail</title>
		<link>http://www.dbms2.com/2011/11/04/lessons-from-t-mobiles-epic-fail/</link>
		<comments>http://www.dbms2.com/2011/11/04/lessons-from-t-mobiles-epic-fail/#comments</comments>
		<pubDate>Fri, 04 Nov 2011 06:01:25 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Text]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5593</guid>
		<description><![CDATA[When my electric power came back on but my Verizon FiOS internet connection didn&#8217;t, it was time for a mobile hotspot/prepaid wireless internet service. T-Mobile&#8217;s 4G Mobile Hotspot/Prepaid Mobile Broadband offering seemed like a good choice. But the experience of setting it up was a nightmare, and a possible instructive nightmare at that. T-Mobile&#8217;s instructions [...]]]></description>
			<content:encoded><![CDATA[<p>When my electric power came back on but my Verizon FiOS internet connection didn&#8217;t, it was time for a mobile hotspot/prepaid wireless internet service. T-Mobile&#8217;s 4G Mobile Hotspot/Prepaid Mobile Broadband offering seemed like a good choice. But the experience of setting it up was a nightmare, and a possible instructive nightmare at that.</p>
<p>T-Mobile&#8217;s instructions tell you that you need to know the factory defaults for network name and password. That makes sense. They don&#8217;t also tell you that you need to know your SIM card number (included), IMEI number (included), or authorization number (not included).</p>
<p>That&#8217;s right &#8212; you need a number that T-Mobile doesn&#8217;t tell you you need. But the story gets a lot worse from there, because it&#8217;s almost impossible to get the number from them. I eventually talked with approximately 8 T-Mobile call center associates over the course of the evening before getting successfully connected.</p>
<p><span id="more-5593"></span><em>One of the few redeeming features in this story is that T-Mobile call center folks pick up the phone quickly. One of the many non-redeeming ones is that they efficiently give you inaccurate information after they do. The one who finally got me the right answer was a young woman who put me on hold to call an internal resource approximately four times before finally handling the situation correctly.</em></p>
<p>At one point I found somewhat helpful information by searching on T-Mobile&#8217;s website for <em>mobile hotspot activation.</em> However, the same information did not surface on my earlier searches on strings like <em>activate mobile hotspot.</em> Stemming has been a basic feature of search engines since the mid 1990s, but evidently T-Mobile&#8217;s technological choices aren&#8217;t as current as that. Other inexcusable T-Mobile mistakes include:</p>
<ul>
<li>Not providing information about the need for an &#8220;activation number&#8221; in the product&#8217;s paper documentation.</li>
<li>Taking the buyer to a sign-on screen that doesn&#8217;t lead to the call center reps responsible for the product being signed-on for.</li>
<li>Not providing call center operators with the tools they need to get callers to the right place.</li>
</ul>
<p>Perhaps Elbonian* contractors were involved.</p>
<p><em>Elbonia is a fictitious country of outsourcers in Scott Adams&#8217; </em>Dilbert<em> comic strip. <a href="http://dilbert.com/strips/comic/2006-03-25/">Elbonian work is not noted for its high quality</a>. </em></p>
<p>In case you haven&#8217;t guessed yet, the missing T-Mobile  &#8220;activation number&#8221; was tantamount to a telephone number, complete with area code. <strong>T-Mobile was insisting on assigning a telephone number for a service that had nothing to do with making or receiving telephone calls.</strong> While I can believe there was some legitimate database/application design reason for having such inflexibility under the covers, it&#8217;s hard to see why T-Mobile didn&#8217;t get a composite application tool and hack a front-end that automatically generates the number without call-center intervention.</p>
<p>I did eventually get connected, and in my limited experience with T-Mobile’s Prepaid/Mobile Hotspot 4G “Broadband” offering, I get the impression it has good speed but conventionally flaky WiFi reliability. It may well be a T-Mobile service that is worthy of great success. <em>Edit: That&#8217;s not true.*</em> But it&#8217;s not going to experience such success as long as T-Mobile idiotically infuriates its users at the relationship start.</p>
<p><em>*Edit: It turns out that the T-Mobile Mobile Hotspot device has terrible range. We can use it to get online in my office, Linda&#8217;s office, or the living room/dining room area, but no 2 of the 3 at once.</em></p>
<p>My takeaways from this story include:</p>
<ul>
<li>Use competent documentation writers.</li>
<li>Run usability testing on your entire customer-experience processes.</li>
<li>Test your site search engine for usefulness.</li>
</ul>
<p>Beyond that, there&#8217;s not a single part of this story for which there isn&#8217;t a straightforward fix, most of them alluded to above.</p>
<p>If you see anything of your organization in this story, it&#8217;s probably time to raise your standards. This is obviously a different kind of failure as, say, the one last year at <a href="http://www.dbms2.com/category/users/jpmorgan-chase/">Chase</a>. Even so, &#8220;as awful as T-Mobile&#8221; would be a sad state to endure.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/04/lessons-from-t-mobiles-epic-fail/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What those nested data structures are about</title>
		<link>http://www.dbms2.com/2011/10/19/nested-data-structures/</link>
		<comments>http://www.dbms2.com/2011/10/19/nested-data-structures/#comments</comments>
		<pubDate>Wed, 19 Oct 2011 17:29:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[eBay]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5506</guid>
		<description><![CDATA[As I&#8217;ve noted before, the very big web companies have an issue with nested data structures. The subject came up in XLDB talks yesterday too, so my big goal for lunch was to finally understand what was being talked about. Sitting at a table full of eBay and LinkedIn folks turned out to be a [...]]]></description>
			<content:encoded><![CDATA[<p>As I&#8217;ve noted before, <a href="http://www.dbms2.com/2010/07/31/nested-data-structures-keep-coming-up-especially-for-log-files/">the very big web companies have an issue with nested data structures</a>. The subject came up in XLDB talks yesterday too, so my big goal for lunch was to finally understand what was being talked about. Sitting at a table full of eBay and LinkedIn folks turned out to be a good tactic.</p>
<p>The explanation was led by Oliver Ratzesberger, late of eBay*and progenitor of <a href="http://www.dbms2.com/2010/10/06/ebay-followup-greenplum-out-teradata-10-petabytes-hadoop-has-some-value-and-more/">eBay&#8217;s Singularity project</a>. In simplest terms, <strong>one event can spawn a lot of event attribute information, perhaps in the form of name-value pairs,</strong> which it then makes sense to store together in some way. The example Oliver dwelled on was that, on any given web page, there can be 100+ pieces of information to record, including:</p>
<ul>
<li>All 50 search results you were shown, and their positions in the search rankings.</li>
<li>Every ad, image, or graphical element.</li>
<li>An ID as to which test you were participating in (every page you see on eBay has some element being tested).</li>
</ul>
<p><em>*Oliver is leaving eBay for a still-secret large company. I would conjecture that Michael McIntire is on the move too, either to replace Oliver or to go with him, but Oliver did a very good job of not commenting on the matter.</em></p>
<p>There are several reasons why one might wish to store this information in ways that grieve relational purists. First, reconstructing all this information via joins would be brutally expensive. What&#8217;s more, reconstructing all this information via joins could be impractical. Some comes from third party ad servers, which might not reproduce the same ads upon demand. Other is in the form of rankings, which can&#8217;t always be reliably reproduced from one query to the next. (That&#8217;s just one of several reasons <a href="http://www.dbms2.com/2005/12/09/relational-dbms-versus-text-data/">text search and relational DBMS are an awkward fit</a>.)</p>
<p>Also, there&#8217;s a strong <a href="http://www.dbms2.com/2011/07/31/dynamic-fixed-schema-databases/">dynamic schema</a> flavor to these databases. The list of attributes for one web click might be very different in kind from the list for the next page. Forcing that kind of variability into a fixed relational schema, while theoretically possible, doesn&#8217;t necessarily make a lot of sense.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/10/19/nested-data-structures/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Some notes on Hadoop (mainly) and appliances</title>
		<link>http://www.dbms2.com/2011/09/23/hadoop-appliances/</link>
		<comments>http://www.dbms2.com/2011/09/23/hadoop-appliances/#comments</comments>
		<pubDate>Fri, 23 Sep 2011 19:59:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapR]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[eBay]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5341</guid>
		<description><![CDATA[1. EMC Greenplum has evolved its appliance product line. As I read that, the latest announcement boils down to saying that you can neatly network together various Greenplum appliances in quarter-rack increments. If you take a quarter rack each of four different things, then Greenplum says &#8220;Hooray! Our appliance is all-in-one!&#8221; Big whoop. 2. That [...]]]></description>
			<content:encoded><![CDATA[<p>1. <a href="http://www.greenplum.com/products/greenplum-dca">EMC Greenplum has evolved its appliance product line</a>. As I read that, the latest announcement boils down to saying that you can neatly network together various Greenplum appliances in quarter-rack increments. If you take a quarter rack each of four different things, then Greenplum says &#8220;Hooray! Our appliance is all-in-one!&#8221; Big whoop.</p>
<p>2. That said, the Hadoop part of EMC &#8216;s story is based on MapR, which so far as I can tell is actually a pretty good Hadoop implementation. More precisely, MapR makes strong claims about performance and so on, and Apache Hadoop folks don&#8217;t reply &#8220;MapR is full of &amp;#$!&#8221; Rather, they say &#8220;We&#8217;re going to close the gap with MapR a lot faster than the MapR folks like to think &#8212; and by the way, guys, thanks for the butt-kick.&#8221; A lot more precision about MapR may be found in this <a href="http://www.slideshare.net/mcsrivas/design-scale-and-performance-of-maprs-distribution-for-hadoop">M. C. Srivas SlideShare</a>.</p>
<p>3. On its latest earnings call, Oracle clearly <a href="http://seekingalpha.com/article/294885-oracle-s-ceo-discusses-q1-2012-results-earnings-call-transcript?part=qanda">said it would introduce a Hadoop appliance</a>, versus just <a href="../../../../../2011/06/24/forthcoming-oracle-appliances/">hinting at a Hadoop appliance</a> the prior quarter. The money quote was:  <span id="more-5341"></span></p>
<blockquote><p>Finally, big data or the searching of large amounts of data using Hadoop. After Hadoop finishes filtering the data, the place you want to put that data is an Oracle Database, and that&#8217;s what a lot of our customers are doing. And we are exploiting the trend, the big data technology and the big data trend, if you prefer, by building a Hadoop appliance that attaches to the Oracle Exadata database or any Oracle Database for that matter. But you don&#8217;t have to buy our Hadoop appliance if you can use whatever servers you want running Hadoop, and we provide the interface between Hadoop and the Oracle Database.</p></blockquote>
<p>In other words, Oracle is saying &#8220;We&#8217;d like to sell you a Hadoop appliance, but you can run Hadoop in some other way and we&#8217;ll coexist with it just fine.&#8221; That makes sense; refusing to coexist with Hadoop is not exactly a realistic option.</p>
<p>4. Back in June, I expressed <a href="../../../../../2011/06/02/why-you-would-want-an-appliance-and-when-you-wouldnt/">great skepticism about the idea of a Hadoop appliance</a>. There was at least partial pushback in the comment thread from both Amr Awadallah and Eric Baldeschwieler. Oops.</p>
<p>Their reasoning seems to be centered around matters of installation, administration, and general packaging.</p>
<p>5. A month ago I noted aggressive near-term plans for <a href="../../../../../2011/08/21/hadoop-evolution/">Apache Hadoop evolution</a>. As noted above, one reason this is needed is competition from folks like MapR. Also, I note that:</p>
<ul>
<li>Three years ago, Oliver Ratzesberger&#8217;s group at eBay complained that <a href="../../../../../2008/10/15/ebay-doesnt-love-mapreduce/">CPU utilization running Hadoop was at 18%</a>.</li>
<li><a href="../../../../../2011/08/21/hadoop-evolution/#comment-241679">Now Oliver uses a figure of 10-15%.</a>, and attributes an even lower figure to &#8212; I&#8217;m guessing here &#8212; Yahoo. (Another possibility might be Facebook.)</li>
<li>In between eBay became one of the biggest and most prominent users of Hadoop.</li>
</ul>
<p>The moral of eBay&#8217;s Hadoop adventures, as I see it, is neither &#8220;Hadoop sucks!&#8221; nor &#8220;Hadoop doesn&#8217;t suck!&#8221;; rather, it&#8217;s that there&#8217;s a lot of scope for Hadoop to operate differently in the future than it does today.</p>
<p><em>Similarly, whatever throughput Yahoo does or doesn&#8217;t get, it clearly has adopted Hadoop at the expense of the <a href="../../../../../2008/05/29/yahoo-scales-web-analytics-database-petabyte/">columnar-in-Postgres</a> system it previously was so proud of.</em></p>
<p>Also, there has been a claim going around that &#8212; notwithstanding NameNode&#8217;s status as a single point of Hadoop failure &#8212;  no Hadoop installation has ever lost data due to a NameNode failure. The folks at MapR beg to differ, and sent over <a href="https://issues.apache.org/jira/browse/HDFS-1539">some</a> <a href="http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201107.mbox/%3CCAFUA3X2R_wH9GGGseUVSXVNVZQ+dBjZKDn0_pmDO8U31C05tMw@mail.gmail.com%3E">links</a> that sure seem to say the opposite.</p>
<p>6. Since we&#8217;ve just established that Hadoop will change, rapidly and pretty fundamentally, what exactly is the benefit of an appliance that is &#8220;balanced&#8221; for Hadoop usage today?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/23/hadoop-appliances/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>DataStax pivots back to its original strategy</title>
		<link>http://www.dbms2.com/2011/09/22/datastax-pivots-back-to-its-original-strategy/</link>
		<comments>http://www.dbms2.com/2011/09/22/datastax-pivots-back-to-its-original-strategy/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 23:23:12 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5331</guid>
		<description><![CDATA[The DataStax and Cassandra stories are somewhat confusing. Unfortunately, DataStax chose to clarify them in what has turned out to be a crazy news week. I&#8217;m going to use this post just to report on the status of the DataStax product line, without going into any analysis beyond that. Pro tip: If you choose to [...]]]></description>
			<content:encoded><![CDATA[<p>The DataStax and Cassandra stories are somewhat confusing. Unfortunately, DataStax chose to clarify them in what has turned out to be a crazy news week. I&#8217;m going to use this post just to report on the status of the DataStax product line, without going into any analysis beyond that.</p>
<p><span id="more-5331"></span><em>Pro tip: If you choose to announce at a conference where many other vendors will surely announce news also, you naturally run the risk of not garnering much attention.</em></p>
<p>For starters, it may help to realize or recall that:</p>
<ul>
<li><a href="http://www.dbms2.com/2008/07/21/project-cassandra-facebook-open-sourced-quasi-dbms/">Cassandra was originally developed and revealed at Facebook</a>, to much early NoSQL fanfare. Facebook later backed away from Cassandra use.</li>
<li>Rackspace guys in Texas became Cassandra&#8217;s biggest backers. They eventually founded a company called Riptano to commercialize Cassandra.</li>
<li>Texas company Riptano became the California company DataStax.</li>
<li>DataStax came out with a <a href="http://www.dbms2.com/2011/03/23/datastax-cassandrafs-hadoop-brisk/">Hadoop-on-Cassandra offering called Brisk</a>. For a while, it sounded as if Hadoop was as big a focus for DataStax as Cassandra is.</li>
<li>DataStax is now recommitted to being <strong>the Cassandra company,</strong> and has accordingly backed away from Hadoop and Brisk as a separate or coequal focus. However, it sees Hadoop capability as a nice, or even major, feature of its Cassandra-centric offering.</li>
<li>To finalize its open source obligations with respect to Brisk, DataStax is in essence:
<ul>
<li>Donating a Hive driver for Cassandra straight into the main Apache Cassandra project.</li>
<li>Releasing the rest of Brisk as a separate open source project.</li>
<li>Disclaiming interest in further advancing open source Brisk.</li>
</ul>
</li>
<li>There&#8217;s also something called Solandra &#8212; evidently SOLR-on-Cassandra &#8212; whose status is similar to Brisk&#8217;s.</li>
<li>There are three main ways that DataStax helps you to consume Cassandra.
<ul>
<li>DataStax is the principal sponsor of Apache Cassandra development, and presumably long will be.<strong> Apache Cassandra </strong>is both<strong> free-like-speech and free-like-beer.</strong></li>
<li>DataStax is also introducing a paid-subscription version of Cassandra called DataStax Enterprise, which features proprietary code, support, and so on. <strong>DataStax Enterprise </strong>is <strong>neither free-like-speech nor free-like-beer.</strong></li>
<li>There will also be something called DataStax Community Edition. <strong>DataStax Community Edition </strong>is<strong> free-like-beer, </strong>but<strong> not free-like-speech.</strong></li>
</ul>
</li>
</ul>
<p>Various posts on the <a href="http://www.datastax.com/blog">DataStax blog</a> give DataStax&#8217;s explanation of what it&#8217;s doing. Ben Werther, the ex-Greenplum guy who briefly worked at DataStax and was most associated with telling the Hadoop/Brisk story, has moved on to his own startup Platfora.</p>
<p>DataStax Enterprise has three main aspects:</p>
<ul>
<li><strong>DataStax Server,</strong> which is the actual database and analytics code. At this time, there is little closed-source code in DataStax Server, but DataStax reserves the right to widen that gap in the future.</li>
<li><strong>DataStax OpsCenter,</strong> which is management tools around DataStax Server. DataStax OpsCenter is entirely closed-source, even though DataStax gives a limited version away for free.</li>
<li><strong>Support.</strong></li>
</ul>
<p>To describe DataStax Community Edition, I&#8217;ll just quote the press release verbatim, which characterizes it as:</p>
<blockquote><p>&#8230; a free platform based on Apache Cassandra that bundles the open source database with smart installers, drivers and connectors for popular development languages, demo apps, documentation, and a free version of DataStax OpsCenter for Apache Cassandra.</p></blockquote>
<p>DataStax Community Edition is crippleware only in terms of feature set; there are no limitations on its database size, cluster size, or usage rights. A core mission of DataStax Community Edition is to create happy Cassandra users, who may then become customers for DataStax Enterprise.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/22/datastax-pivots-back-to-its-original-strategy/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

