<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Data mart outsourcing</title>
	<atom:link href="http://www.dbms2.com/category/analytics-technologies/data-mart-warehouse-outsourcing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 22:51:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Comments on the analytic DBMS industry and Gartner&#8217;s Magic Quadrant for same</title>
		<link>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/</link>
		<comments>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 17:17:32 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Exasol]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[illuminate Solutions]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5926</guid>
		<description><![CDATA[This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the 2010, 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying: In general, I regard Gartner Magic [...]]]></description>
			<content:encoded><![CDATA[<p>This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the <a href="http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/">2010</a>, <a href="../../../../../2010/02/10/gartner-magic-quadrant-data-warehouse-2009-2010/">2009</a>, <a href="../../../../../2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">2008</a>, <a href="../../../../../2007/10/19/gartner-2007-magic-quadrant-for-data-warehouse-database-management-systems/">2007</a>, and <a href="../../../../../2006/10/03/vendor-segmentation-for-data-warehouse-dbms/">2006</a> Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying:</p>
<ul>
<li>In general, I regard Gartner Magic Quadrants as a bad use of good research.</li>
<li>Illustrating the uselessness of &#8212; or at least poor execution on &#8212; the  overall quadrant metaphor, a large majority of the vendors covered are  lined up near the line x = y, each outpacing the one below in both of  the quadrant&#8217;s dimensions.</li>
<li>I find fewer specifics to disagree with in this Gartner Magic Quadrant than in previous years&#8217; versions. Two factors jump to mind as possible reasons:
<ul>
<li>This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is somewhat less ambitious than others; while it gives as much company detail as its predecessors, it doesn&#8217;t add as much discussion of overall trends. So there&#8217;s less to (potentially) disagree with.</li>
<li><a href="http://www.dbms2.com/2010/12/28/evolving-definitions-and-technology-categories-for-2011/">Merv Adrian is now at Gartner</a>.</li>
</ul>
</li>
<li>Whatever the problems may be with Gartner&#8217;s approach, the whole thing comes out better than do <a href="http://www.dbms2.com/2011/02/11/comments-on-the-2011-forrester-wave-for-enterprise-data-warehouse-platforms/">Forrester&#8217;s failed imitations</a>.</li>
</ul>
<p><em>*At the time of this posting, I don&#8217;t yet have a link. However, I expect that to change quickly, and I plan to edit this paragraph accordingly. If nothing else, I hope people will drop links into the comment thread. </em></p>
<p>Specific company comments, roughly following Gartner&#8217;s approximate single-dimensional rank ordering, include: <span id="more-5926"></span></p>
<ul>
<li>The Gartner Magic Quadrant&#8217;s comments on Teradata seem pretty fair. I don&#8217;t think I&#8217;m much in disagreement when I say:
<ul>
<li>Teradata has the richest, most mature analytic DBMS offering.</li>
<li>Teradata has an outstanding track record both for <a href="http://www.dbms2.com/2011/09/24/confusion-about-teradatas-big-customers/">managing large data volumes</a> and for high-concurrency mixed workloads.</li>
<li>Aster Data was a cool Teradata acquisition, even if Teradata/Aster synergies or integration have been nominal to date.</li>
<li>Teradata still needs to get out of its own way in marketing, positioning, packaging, and/or defining its premium-priced system vs. its more moderately-priced alternatives. Indeed, as necessary as this approach may have been to fending off encroachments by Netezza and others, what Teradata really needs to do is evolve to a more pick-your-own-node-combination mix-match kind of offering.</li>
</ul>
</li>
<li>Gartner has talked with a lot of Oracle Exadata users who say that the product works; Gartner has also stopped beating Oracle up for <a href="http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/">its previous policy of almost never doing onsite POCs (Proofs of Concept)</a>; both parts of that ring true with me. But Gartner also rightly dings Oracle for various issues in cost and cumbersomeness. Overall, while I agree there are organizations for which Oracle should indeed be a top-ranked choice, there are many others who shouldn&#8217;t put Oracle on their short list.</li>
<li>Third in the Gartner MQ rankings is IBM.
<ul>
<li>Gartner gets so caught up in reciting the names of various IBM product offerings that it neglects to say much good about DB2 itself. (I tend to have a similar problem.)</li>
<li>But Gartner does mention concurrency as a strength. I agree, especially if we presume that that was a reference to DB2 rather than Netezza.</li>
<li>Gartner cites Netezza&#8217;s post-acquisition annual growth rate as 30%. Gartner seems to think this is a good number. I disagree, but in Netezza&#8217;s defense, it has had to endure IBM&#8217;s post-acquisition on-boarding process.</li>
</ul>
</li>
<li>Arguably fourth in the Gartner Data Warehouse Magic Quadrant rankings is EMC/Greenplum.
<ul>
<li>In general, Gartner likes the taste of Greenplum Kool-Aid.</li>
<li>Gartner neglects to ding Greenplum for concurrency challenges, which I view as an oversight given Gartner&#8217;s general stress on that area.</li>
<li>Gartner does ding Greenplum for support challenges.</li>
<li>Gartner neglects to praise Greenplum for true <a href="http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/">hybrid row/columnar data management</a>, a feature shared by <a href="http://www.dbms2.com/2011/09/22/teradata-columnar-compression/">Teradata</a> and <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/">Vertica</a>, among others, but not by <a href="http://www.dbms2.com/2011/02/06/columnar-compression-database-storage/">Oracle</a>, DB2, or Netezza.</li>
<li>Gartner located a half-petabyte Greenplum database. This doesn&#8217;t surprise me, even though Greenplum has frequently made exaggerated claims about large-size database successes in the past.</li>
<li>Gartner reports a &gt;400 figure for Greenplum customers, which is plausible.</li>
</ul>
</li>
<li>In its first deviation from strict one-dimensional rank ordering, the Gartner Magic Quadrant ranks Sybase ahead of Greenplum in completeness of vision but behind in &#8220;ability to execute&#8221;.
<ul>
<li>If that were the other way around, it might make more sense. Greenplum promises anything and everything you might ever want for analytic data management or the associated analysis; but Sybase has vastly more analytic DBMS users than Greenplum does, running a variety of demanding workloads.</li>
<li>Gartner appears to think that Sybase IQ requires less database administration than I think it does.</li>
<li>Gartner seems concerned that SAP will position HANA and Sybase ASE as, between them, the only DBMS you&#8217;ll ever need, casting doubt on Sybase IQ&#8217;s future. I wouldn&#8217;t worry about that if you have a problem you want to solve today.</li>
</ul>
</li>
<li>The Gartner Magic Quadrant for Data Warehouse Database Management Systems ranks Microsoft sixth overall, despite noting that there isn&#8217;t a single production reference for Microsoft&#8217;s Parallel Data Warehouse. In support of this ranking, it cites, for example, the compression feature, which distinguishes Microsoft SQL Server from no other product on the list except Kognitio. If you have such an undemanding data warehousing problem that many different analytic DBMS could meet your needs, there&#8217;s a good chance Microsoft SQL Server can also do the job; and if you&#8217;ve bought into the Microsoft technology stack, you might as well keep going down that path. Otherwise, I don&#8217;t know why somebody should adopt Microsoft&#8217;s offering at this time.</li>
<li>Seventh along the main diagonal path in the Gartner Magic Quadrant is HP Vertica. I&#8217;d rank Vertica higher than that, but in fairness I note two execution concerns. First, HP has a lousy track record, both in acquisitions and in data warehousing/analytics. Second, Vertica is bad about answering my email. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Anyhow, Gartner doesn&#8217;t seem to have given Vertica credit either for <a href="http://www.dbms2.com/2011/06/20/columnar-dbms-vendor-customer-metrics/">its full customer count or for the multiple petabyte-scale databases Vertica runs</a>.</li>
<li>1010data is an outlier, with Gartner noting that it only partly fits in with other &#8220;Data Warehousing Database Management&#8221; companies, and hence more or less confessing that 1010data&#8217;s inclusion in the Magic Quadrant is somewhat arbitrary. Stuff like that is bound to happen, given <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">the inherent difficulties of defining market categories</a>. Anyhow, my thoughts on 1010data include:
<ul>
<li>I&#8217;m nervous about the fact that 1010data doesn&#8217;t actually control its own DBMS technology, but rather relies on old code from the small private company KX Systems.</li>
</ul>
<ul>
<li> There are three main reasons to consider 1010data:
<ul>
<li>You want to enter the data mart outsourcing business in a casual way, and you like its SaaS offering.</li>
<li>You want to engage in <a href="http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/">stakeholder-facing analytics</a> in a casual way, and you like its SaaS offering.</li>
<li>You love 1010data&#8217;s particular set of interactive analytic features and performance.</li>
</ul>
</li>
</ul>
</li>
<li>Back to the main path winding along the Gartner Magic Quadrant main diagonal &#8212; next up is ParAccel. While I question some of the peripheral comments, I agree with Gartner&#8217;s core messages that:
<ul>
<li>ParAccel, the product, is blazingly fast in certain use cases.</li>
<li>ParAccel, the company, is dangerously small.</li>
</ul>
</li>
<li>Eighth on the Gartner MQ&#8217;s main path is Kognitio. This is too high. Kognitio positions itself as offering in-memory DBMS, yet stubbornly refuses to do any kind of data compression. That&#8217;s an awful combination of choices. As for using Kognitio&#8217;s data warehousing SaaS offering &#8212; why would you do that, when more modern products are available on a SaaS/cloud basis as well?</li>
<li>Ninth in the Gartner Magic Quadrant main rankings is SAND.
<ul>
<li>The SAND section is not a triumph of Gartner accuracy. For example:
<ul>
<li><a href="http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/">Gartner completely missed the errors in SAND&#8217;s reported customer counts</a>.</li>
<li>Gartner refers to SAND as being &#8220;in existence for approximately nine years&#8221;, which is too low by at least a factor of 2.</li>
<li>Gartner says &#8220;SAND is a privately held company&#8221;, even though <a href="http://itmarketstrategy.com/2009/06/07/sand-technology-a-risky-bet/">Merv knows better than that</a>.</li>
</ul>
</li>
<li>Otherwise, Gartner&#8217;s opinion on SAND seems to boil down to &#8220;Interesting technology and ideas, but dangerously small company.&#8221; I agree.</li>
</ul>
</li>
<li>Tenth and too low in the Gartner MQ main rankings is Infobright.
<ul>
<li>At least by some metrics (e.g. customer count), Infobright isn&#8217;t as dangerously small as ParAccel, SAND, Kognitio, et al.</li>
<li>That said, Infobright is small and focused on <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>. So I wouldn&#8217;t be confident in Infobright&#8217;s future technology path for human-generated data use cases.</li>
<li>Infobright&#8217;s performance is uneven &#8212; blazing in cases where the Knowledge Grid helps, but not necessarily stellar by analytic DBMS standards when full table scans are called for.</li>
<li>I agree with Gartner that the possibility of Oracle/MySQL future shenanigans is a concern. But while the energy behind MySQL forking efforts doesn&#8217;t seem too great right now, I&#8217;d expect them to revive and offer a successful escape path if it seemed Oracle was indeed going to play hardball.</li>
<li>Also, given that it&#8217;s already an open source vendor, there are various kinds of assurances Infobright could give that would also help alleviate customer concerns.</li>
</ul>
</li>
<li>Actian, formerly Ingres, took a big tumble in Gartner&#8217;s rankings versus last year, when I simply wrote &#8220;<a href="http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/">What Gartner said in connection with <strong>Ingres</strong> is too inaccurate to deserve detailed attention</a>.&#8221; I&#8217;m even a little harsher about <a href="http://www.dbms2.com/2011/09/25/ingres-actian/">Ingres/Actian&#8217;s DBMS products and prospects</a> than Gartner is, but at least now we&#8217;re in the same ballpark.</li>
<li>Along with Infobright, ParAccel, and SAND, <a href="http://www.dbms2.com/2011/11/12/exasol-update/">Exasol</a> appears to be another of the &#8220;good columnar technology/small company&#8221; crowd. As with other such products, one should be careful about fit-and-finish features that are missing today, as there is no assurance they&#8217;ll be added in a timely manner going forward.</li>
<li>illuminate Solutions, which was on last year&#8217;s Gartner list, <a href="http://www.dbms2.com/2012/01/16/has-illuminate-solutions-joined-the-choir-invisible/">now appears to be an ex-company</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Departmental analytics &#8212; best practices</title>
		<link>http://www.dbms2.com/2012/01/25/departmental-analytics-best-practices/</link>
		<comments>http://www.dbms2.com/2012/01/25/departmental-analytics-best-practices/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 16:47:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5867</guid>
		<description><![CDATA[I believe IT departments should support and encourage departmental analytics efforts, where &#8220;support&#8221; and &#8220;encourage&#8221; are not synonyms for &#8220;control&#8221;, &#8220;dominate&#8221;, &#8220;overwhelm&#8221;, or even &#8220;tame&#8221;. A big part of that is: Let, and indeed help, departments have the data they want, when they want it, served with blazing performance. Three things that absolutely should NOT [...]]]></description>
			<content:encoded><![CDATA[<p><a href="../../../../../2012/01/23/departmental-analytics-general-observations/">I believe IT departments should support and encourage departmental analytics efforts</a>, where &#8220;support&#8221; and &#8220;encourage&#8221; are not synonyms for &#8220;control&#8221;, &#8220;dominate&#8221;, &#8220;overwhelm&#8221;, or even &#8220;tame&#8221;. A big part of that is:<br />
<strong>Let, and indeed help, departments have the data they want, when they want it, served with blazing performance.</strong></p>
<p>Three things that absolutely should NOT be obstacles to these ends are:</p>
<ul>
<li>Corporate DBMS standards.</li>
<li>Corporate data governance processes.</li>
<li>The difficulties of ETL.</li>
</ul>
<p><span id="more-5867"></span>Reasons they shouldn&#8217;t or don&#8217;t need to be obstacles include:</p>
<ul>
<li>Analytic DBMS are often vastly more cost-effective than general-purpose ones.</li>
<li>In particular, analytic DBMS are often much easier to install and manage than general-purpose ones.</li>
<li>Heavy data governance bureaucracy is often unnecessary because:
<ul>
<li>The department should know what the limitations on the data&#8217;s accuracy are.</li>
<li>The department should know how much data accuracy is required.</li>
<li>The side-effects on other departments of any data inaccuracy would be minimal.</li>
</ul>
</li>
<li>There are multiple good schemes for populating data marts, managed by cost-effective analytic DBMS, with data from integrated data warehouses.
<ul>
<li>ELT (Extract/Load/Transform) almost always works, because data cleaning/data quality was handled at or before the IDW level, and because the analytic DBMS has the processing power to pull it off.</li>
<li>ETL (Extract/Transform/Load) should be easy as well. (If it isn&#8217;t, something may be lacking in your ETL set-up.)</li>
<li>Analytic DBMS are increasingly adding capabilities for easy spin-out of real or virtual data marts. Other kinds of technology (e.g. virtualization) are having their database spin-out capabilities upgraded as well.</li>
</ul>
</li>
</ul>
<p>One point to remember in support of departmental autonomy <strong>is that departments&#8217; views of what data to use may be more expansive than central IT&#8217;s.</strong> One reason is that important data may be external to the company, outside IT&#8217;s natural realm  of concern. Examples of this include but are hardly limited to:</p>
<ul>
<li>Anything like &#8220;market data&#8221;.</li>
<li>Anything like &#8220;sentiment analysis&#8221;.</li>
<li>Data owned by supply chain partners.</li>
</ul>
<p>Further, even the more innovative internal data sources are commonly departmental, for example various kinds of multi-structured data (text verbatims from customers, log file data, and so on).</p>
<p>Whatever is true of data management (and ETL) is true for metadata management, even if it&#8217;s done by some kind of business intelligence tool. What I mean by that is:</p>
<ul>
<li><strong>Whoever manages data is also responsible for ingesting and emitting it &#8230;</strong></li>
<li>&#8230; and specifically for emitting it in <strong>understandable, well-organized, well-named formats, &#8230;</strong></li>
<li><strong>&#8230; </strong>so that <strong>departments can take responsibility for</strong> what amounts to <strong>lightweight analytic application development.</strong></li>
</ul>
<p>As for the &#8220;application development&#8221; itself, I&#8217;m envisioning at least three things:</p>
<ul>
<li>Math.</li>
<li>Sophisticated relational query.</li>
<li>Data visualization.</li>
</ul>
<p>I.e., I&#8217;m talking about what &#8220;analysts&#8221; and &#8220;quants&#8221; do. So to put the point even more simply:</p>
<ul>
<li><strong>Analysts and quants should be able to consume data that&#8217;s organized in a friendly manner.</strong></li>
<li><strong>Central IT should be friendly in how it serves data.</strong></li>
</ul>
<p>One corollary of this approach is that departments should try to adhere to corporate BI standards, at least for routine dashboards and reporting. Indeed, if a department brings in a business intelligence tool different from the corporate standard, there are three main possibilities:</p>
<ul>
<li>The tool is integrated with something else it makes sense to bring in, such as a third-party data supply or application.</li>
<li>The tool has an important capability the corporate standard doesn&#8217;t have, such as more flexible visualization and drilldown.</li>
<li>Central IT screwed up, making things much more difficult than they needed to be.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/25/departmental-analytics-best-practices/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Clarifying SAND&#8217;s customer metrics, positioning and technical story</title>
		<link>http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/</link>
		<comments>http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/#comments</comments>
		<pubDate>Sun, 13 Nov 2011 02:45:36 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5669</guid>
		<description><![CDATA[Talking with my clients at SAND can be confusing. That said: I need to revise my figures for SAND&#8217;s customer count way downward. SAND finally has a reasonably clear positioning. SAND&#8217;s product actually seems to have a lot of features. A few months ago, I wrote: SAND Technology reported &#62;600 total customers, including &#62;100 direct. [...]]]></description>
			<content:encoded><![CDATA[<p>Talking with my clients at SAND can be confusing. That said:</p>
<ul>
<li>I need to revise my figures for SAND&#8217;s customer count way downward.</li>
<li>SAND finally has a reasonably clear positioning.</li>
<li>SAND&#8217;s product actually seems to have a lot of features.</li>
</ul>
<p>A few months ago, I wrote:</p>
<blockquote><p>SAND Technology reported &gt;600 total customers, including &gt;100 direct.</p></blockquote>
<p>Upon talking with the company, I need to revise that figure downward, from &gt; 600 to 15.</p>
<p><span id="more-5669"></span><em>One embarrassing point: SAND is a client, and I view it as part of my job to save clients from that kind of inadvertent misstatement.</em></p>
<p>It turns out that SAND has a very impressive customer &#8212; Dunnhumby, a data mart outsourcer with 200 terabytes of data in SAND, 30 or so incoming data streams, 400 or so nodes &#8230; and 600 or so end customers, all of which SAND was counting as OEM end customers for its DBMS. But I, other industry observers, and other vendors generally don&#8217;t count that way.</p>
<p>Besides Dunnhumby, SAND has 14 other customers on maintenance, with &lt; 1 terabyte of data each. Until recently, SAND had a couple dozen more customers than that, but it <a href="http://www.sand.com/sand-technology-announces-sale-sap-ilm-product-line/">sold its SAP-oriented archiving/near-line storage product line to Informatica</a>.</p>
<p>I still don&#8217;t know where the &#8220;&gt; 100 direct&#8221; part came from.</p>
<p>After the sale of its other product line, SAND is squarely in the market for analytic DBMS. SAND&#8217;s sales efforts seem to be focused on <a href="http://www.dbms2.com/2011/03/03/investigative-analytics/">investigative analytics</a>, although some of its existing users seem to be more focused on <a href="http://www.dbms2.com/2011/11/08/terminology-operational-analytics/">operational analytics</a>. Most specifically, SAND is trying to focus on &#8220;people data&#8221; &#8212; customer loyalty, health care, etc. &#8212; rather than purely <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>, with the paradigmatic target application being personalized marketing.</p>
<p>SAND technical highlights include:</p>
<ul>
<li>SAND sells a columnar analytic DBMS.</li>
<li>The SAND DBMS operates on bitmaps, with heavy use of run-length encoding on the bitmaps. Bitmaps are used for everything except BLOBs (Binary Large OBjects).</li>
<li>Actual data compression also comes into play, e.g. as result sets are being assembled. This is based on a true global dictionary &#8212; multiple columns are tokenized together.</li>
<li>Indeed, SAND can decompose columns and tokenize their parts (e.g. time stamps).</li>
<li>SAND&#8217;s workload management sees RAM and CPU, but not explicitly I/O.</li>
<li>SAND lets you pin certain tables or even table segments in RAM if you want to.</li>
</ul>
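<p>To make the bitmap point concrete, here is a minimal sketch of one-bitmap-per-value storage with run-length encoding. It is an illustration only, not SAND&#8217;s actual implementation; the function names and the sample column are assumptions made up for this example.</p>

```python
from itertools import groupby

# Hypothetical helpers for illustration; none of these names come from SAND.

def value_bitmaps(column):
    """Build one bitmap (a list of 0/1) per distinct value in the column."""
    return {v: [1 if x == v else 0 for x in column] for v in sorted(set(column))}

def rle_encode(bits):
    """Run-length encode a bitmap as (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]

def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the original bitmap."""
    out = []
    for bit, n in runs:
        out.extend([bit] * n)
    return out

def count_ones(runs):
    """Count matching rows directly on the compressed form."""
    return sum(n for bit, n in runs if bit == 1)

# Made-up sample column.
column = ["gold", "gold", "gold", "silver", "silver", "gold", "bronze", "bronze"]
encoded = {v: rle_encode(b) for v, b in value_bitmaps(column).items()}
```

<p>In this toy version, a count for one value is answered from the run lengths alone, without expanding the bitmap. Long runs of identical bits, which sorted or low-cardinality columns tend to produce, are what make the encoding compact.</p>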
<p>SAND&#8217;s update story is straightforward &#8212; when data comes in, all the columns and bitmaps are updated as needed. Still, since SAND is columnar, you wouldn&#8217;t expect true updates in place, and you&#8217;d be right. Rather, there&#8217;s a story with MVCC (MultiVersion Concurrency Control) and garbage collection, lock-free. The MVCC is also exploited for a kind of time travel, and further for some kind of virtual data mart capability.</p>
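<p>The general MVCC idea, with time travel and garbage collection, can be sketched as follows. This is a toy single-threaded model under my own assumptions (a logical commit counter and a hypothetical <code>VersionedStore</code> class); it does not model lock-free operation and is not a description of how SAND does it.</p>

```python
class VersionedStore:
    """Toy multiversion store: writes append versions, reads pick a snapshot."""

    def __init__(self):
        self.clock = 0      # logical commit counter (assumption for this sketch)
        self.versions = {}  # key -> list of (commit_ts, value), oldest first

    def write(self, key, value):
        """Append a new version instead of updating in place."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, as_of=None):
        """Return the newest value committed at or before as_of (default: now)."""
        ts = self.clock if as_of is None else as_of
        visible = [v for commit_ts, v in self.versions.get(key, []) if ts >= commit_ts]
        return visible[-1] if visible else None

    def garbage_collect(self, oldest_needed):
        """Drop versions that no snapshot at or after oldest_needed can see."""
        for key, chain in self.versions.items():
            keep_from = 0
            for i, (commit_ts, _) in enumerate(chain):
                if oldest_needed >= commit_ts:
                    keep_from = i
            self.versions[key] = chain[keep_from:]


store = VersionedStore()
t1 = store.write("price", 10)
t2 = store.write("price", 12)
t3 = store.write("price", 15)
```

<p>Here <code>store.read("price", as_of=t1)</code> is the &#8220;time travel&#8221; read: it sees the value as of the first commit, while a plain <code>store.read("price")</code> sees the latest. After <code>store.garbage_collect(t2)</code>, the first version is gone and only snapshots from <code>t2</code> onward can be served.</p>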
<p>SAND&#8217;s parallelization story is a bit complicated.</p>
<ul>
<li>SAND has, or at least has the potential for, <a href="../../../../../2008/09/05/mpp-data-warehouse-nodes/">node specialization</a>, with database and storage nodes being different.</li>
<li>In principle, disks are specific to storage nodes, and it&#8217;s a configuration option as to whether a database node sees one, some, or all storage nodes.</li>
<li>In practice, only Dunnhumby among SAND&#8217;s customers operates on other than a shared-disk basis. Dunnhumby&#8217;s configuration is mixed/matched among various SAND sharing options.</li>
</ul>
<p>SAND is proud of its PMML (Predictive Modeling Markup Language) scoring capabilities, but otherwise hasn&#8217;t shipped much in the way of <a href="../../../../../2011/02/24/analytic-platforms/">analytic platform</a> capabilities. That said, work is underway on a user-defined table function capability that can also query external tables, fire off MapReduce jobs, and so on, under the code name UQL.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Eight kinds of analytic database (Part 2)</title>
		<link>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/</link>
		<comments>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 08:18:18 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Buying processes]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Rainstor]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[SenSage]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4867</guid>
		<description><![CDATA[In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I&#8217;ll cover four more kinds of analytic database &#8212; even newer, for the most part, with a use case/product short list [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/">Part 1</a> of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I&#8217;ll cover four more kinds of analytic database &#8212; even newer, for the most part, with a use case/product short list match that is even less clear.  <span id="more-4867"></span></p>
<p><strong><em>Bit bucket</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included: </em>Logs, other technical/external</li>
<li><em>Likely use styles:</em> Staging/ETL, investigative</li>
<li><em>Canonical example: </em>Log files in a Hadoop cluster</li>
<li><em>Stresses:</em> TCO, scale-out, transform/big-query performance, ETL functionality</li>
</ul>
<p>With the explosion of <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a> has come the need for a place to put it all, sometimes called the <a href="../../../../../2011/06/04/dirty-data-stored-dirt-cheap/">big bit bucket</a>. This is like the investigative data mart for big databases, but more <a href="../../../../../2011/05/17/poly-structured-database/">poly-structured</a>. In some cases it is focused on data staging and transformation; but it can also be used for analysis in place.</p>
<p>The list of candidate technologies to run your bit bucket starts with Hadoop and Splunk.</p>
<p><strong><em>Archival data store</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included: </em>Operational, CDR (call detail record), security log</li>
<li><em>Likely use styles:</em> Archival, reporting (for compliance), possibly also investigative</li>
<li><em>Examples:</em> Any long-term detailed historical store</li>
<li><em>Stresses: </em>TCO, compression, scale-out, performance (if multi-use)</li>
</ul>
<p>Analytic DBMS vendors have been insulting each other with the claim &#8220;that&#8217;s just an archival data store,&#8221; dating back at least to the first time Greenplum was deployed on an underpowered Sun Thumper system. Perhaps only <a href="../../../../../2010/06/11/rainstor-update/">Rainstor</a> truly embraces the archival positioning, and I&#8217;ve become pretty dubious about their technical claims and their company alike.</p>
<p>Still, there&#8217;s a legitimate need for data stores &#8212; especially relational analytic DBMS that:</p>
<ul>
<li>Store data cheaply, with high rates of compression.</li>
<li>Have decent performance if you do want to query the data.</li>
<li>May have archiving/compliance-specific features as well.</li>
</ul>
<p>Along with Rainstor, SAND and SenSage have at least partially targeted that use case. In addition, appliance vendors such as Teradata and Netezza try to have an archive-oriented product version in their lineups.</p>
<p><strong><em>Outsourced data mart</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All</li>
<li><em>Likely use styles:</em> Traditional BI, investigative analytics, staging/ETL</li>
<li><em>Examples:</em> Advertising tracking, SaaS CRM</li>
<li><em>Stresses:</em> Performance, TCO, reliability, concurrency</li>
</ul>
<p>Much of what happens in analytic database management can also be outsourced. Some applications that run via SaaS (Software as a Service) are analytic. I&#8217;ve had three different clients whose main business is picking marketing targets in various vertical segments; others who wanted to add analytics to what were historically OLTP applications; and others yet who just offered online business intelligence. Also, if your fundamental business is gathering data and reselling it to a variety of user organizations, that&#8217;s an analytic data management challenge. The possibilities expand from there.</p>
<p>Data outsourcers are in the IT business, and so their IT development is &#8212; hopefully! &#8212; more serious and less politically encumbered than at many conventional enterprises. Thus, legacy systems and master data management issues are commonly less prevalent, or at least more aggressively disposed of. The same, up to a point, goes for vendor politics.* <a href="../../../../../2011/06/26/what-to-think-about-before-you-make-a-technology-decision/">Multitenancy</a> is commonly an issue, as is running in the cloud.</p>
<p><em>*Even so, there&#8217;s often That Guy who doesn&#8217;t want to migrate away from Oracle, no matter what.</em></p>
<p>Vertica gets the nod in a number of these cases; it&#8217;s cloud-friendly, and often the problem is naturally columnar. Other columnar products can be good choices too, with added brownie points for Infobright if the shop is MySQL-oriented anyway. Running Netezza or other appliances makes sense mainly if you&#8217;re pretty sure you want to keep operating your own data centers, but some data outsourcers are just fine with that assumption.</p>
<p><strong><em>Operational analytic(s) server</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> Customer-centric, log, financial trade</li>
<li><em>Likely use styles:</em> Advanced operational analytics</li>
<li><em>Examples:</em>
<ul>
<li>Lower latency: Web or call-center personalization, anti-fraud</li>
<li>Higher latency: Customer profiling, Basel 3 risk analysis</li>
</ul>
</li>
<li><em>Stresses:</em> Performance, reliability, analytic functionality, perhaps concurrency</li>
</ul>
<p>Even with eight different choices, I need a &#8220;catch-all&#8221; category; this is it.</p>
<p>Suppose you want to do reasonably sophisticated analytics, then use the results in operations. This is the classical challenge in <a href="../../../../../2011/03/30/short-request-and-analytic-processing/">integrating short-request and analytic processing</a>. There are multiple ways to tackle it, embodying different trade-offs in cost, convenience, or analytic accuracy. If the platform on which you want to run your investigative analytics also has the reliability and concurrency appropriate for mission-critical operations, you&#8217;re set. Otherwise, you may want to pipe <a href="../../../../../2010/11/29/data-that-is-derived-augmented-enhanced-adjusted-or-cooked/">derived data</a> into a more &#8220;industrial-strength&#8221; DBMS, ideally the one that runs your operational apps anyway.</p>
<p>Another option is to integrate a limited amount of analytics immediately into your short-request processing system. For example, as bad as they are at the kinds of queries that require joins, NoSQL systems are often fast at simple aggregations. As MapReduce/NoSQL integrations mature, that option may not require pumping the data anywhere else for deeper analytics; even if it does, at least you&#8217;re starting out with the data in a convenient bit bucket.</p>
<p>Streaming/CEP-centric architectures could come into play as well. And it goes on from there. The possibilities in this last category are just too varied to generalize about.</p>
<p><em>So did I get them all? Or are there yet other analytic data management use cases that I don&#8217;t fit into my eight categories?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>More on Sybase IQ, including Version 15.2</title>
		<link>http://www.dbms2.com/2010/05/23/sybase-iq-15/</link>
		<comments>http://www.dbms2.com/2010/05/23/sybase-iq-15/#comments</comments>
		<pubDate>Sun, 23 May 2010 08:34:28 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Text]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2186</guid>
		<description><![CDATA[Back in March, Sybase was kind enough to give me permission to post a slide deck about Sybase IQ. Well, I&#8217;m finally getting around to doing so. Highlights include but are not limited to: Slide 2 has some market success figures and so on. (&#62;3100 copies at &#62;1800 users, &#62;200 sales last year) Slides 6-11 [...]]]></description>
			<content:encoded><![CDATA[<p>Back in March, Sybase was kind enough to give me permission to post <a href="http://www.monash.com/uploads/Sybase-IQ-slides-March-2010.pdf">a slide deck about Sybase IQ</a>. Well, I&#8217;m finally getting around to doing so. Highlights include but are not limited to:</p>
<ul>
<li>Slide 2 has some market success figures and so on. (&gt;3100 copies at &gt;1800 users, &gt;200 sales last year)</li>
<li>Slides 6-11 give more detail on Sybase&#8217;s indexing and data access methods than I put into my recent <a href="http://www.dbms2.com/2010/05/17/technical-basics-of-sybase-iq/">technical basics of Sybase IQ</a> post.</li>
<li>Slide 16 reminds us that in-database data mining is quite competitive with what <a href="http://www.dbms2.com/2010/05/15/further-clarifying-in-database-mpp-sas/">SAS has actually delivered with its DBMS partners</a>, even if it doesn&#8217;t have the nice architectural approach of <a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/">Aster or Netezza</a>. (I.e., Sybase IQ&#8217;s more-than-SQL advanced analytics story relies on C++ UDFs  &#8212; User Defined Functions &#8212; running in-process with the DBMS.) In particular, there&#8217;s a data mining/predictive analytics library &#8212; modeling and scoring both &#8212; licensed from a small third party.</li>
<li>A number of the other later slides also have quite a bit of technical crunch. (More on some of those points below too.)</li>
</ul>
<p>Sybase IQ may have a bit of a funky architecture (e.g., no MPP), but the age of the product and the substantial revenue it generates have allowed Sybase to put in a bunch of product features that newer vendors haven&#8217;t gotten around to yet.</p>
<p>More recently, Sybase volunteered permission for me to preannounce <strong>Sybase IQ Version 15.2</strong> by a few days (it&#8217;s scheduled to come out this week). <span id="more-2186"></span>Sybase IQ 15.2 seems to be focused in large part on the government/intelligence market, with three major features being:</p>
<ul>
<li>A kind of <strong>data federation,</strong> querying external databases, that makes sense mainly in the context of rigorous security rules. (I find that confusing, since Sybase IQ&#8217;s indexes tend to hold all the information in the database, but I didn&#8217;t push the point.)</li>
<li>An upgrade to Sybase IQ&#8217;s built-in <strong>text indexing.</strong> I doubt anybody would confuse this with best-of-breed text search, but evidently the intelligence community is satisfied with less. Even before 15.2, Sybase IQ could do both LIKE and WHERE CONTAINS searching.</li>
<li>Improved LOB (Large OBject) management.</li>
</ul>
<p>One part of my Sybase IQ conversations I haven&#8217;t blogged yet in much detail is <strong>scale-out, concurrency, </strong>and<strong> &#8220;multiplexing.&#8221;</strong></p>
<ul>
<li>Sybase feels that Sybase IQ&#8217;s competitive sweet spot, especially in terms of performance, is reached when there are 20 or more concurrent queries.</li>
<li>In general, Sybase asserts that a shared-everything architecture is great for concurrency &#8212; just run different queries on different boxes, all against the same data.</li>
<li>The ability to use a bunch of boxes to run Sybase IQ is called &#8220;multiplexing.&#8221; This is a chargeable option, without which one is limited to a single SMP box.</li>
<li>Just under 20% of the top 250 Sybase IQ customers have a multi-node scale-out configuration (vs. single-node SMP scale-up). And around 8% have it overall.</li>
<li>Sybase IQ nodes can be heterogeneous (e.g., in compute power).</li>
<li>Sybase IQ nodes can be dedicated to be read-only, or can be read-write. Indeed, Sybase IQ nodes can change roles dynamically, for example becoming write-only during nightly batch load. (I didn&#8217;t clarify whether all this applies just to nodes-as-boxes, or if some parts apply to specific processors or cores within the same box.)</li>
<li>Sybase noted that data mart outsourcers can offer differentiated SLAs (Service Level Agreements) depending upon which nodes they give which customers access to.</li>
<li>Most Sybase IQ installations start at 8 cores or more. The Sybase IQ Small Business Edition, limited to 4 cores, is not a big seller.</li>
<li>Sybase IQ has a straightforward round-robin load-balancing story via third-party technology.</li>
</ul>
<p>Finally, along the way in the discussions I picked up various tidbits about the Sybase IQ user base. Unfortunately, Sybase is pretty vague in discussing database sizes &#8212; are they user data? Are they compressed? What do the numbers mean? With that huge caveat:</p>
<ul>
<li>By some metric or other, a couple of classified customers are approaching petabyte scale.</li>
<li>The largest commercial Sybase IQ customer &#8212; a credit card company &#8212; has a couple hundred terabytes or so.</li>
<li>The largest financial services Sybase IQ databases are 50-70 terabytes. This sounds low, frankly, so maybe those are compressed figures, with user data being 200+ terabytes. But I&#8217;m just speculating there.</li>
<li>Sybase IQ has a little fewer than 100 customers in the &#8220;data aggregator&#8221; market, which is a lot like what I call &#8220;data mart outsourcer.&#8221;</li>
<li><a href="http://www.dbms2.com/2009/08/25/sybase-iq-technical-highlights/">Sybase IQ&#8217;s ILM technology</a> is a chargeable option, with Sybase being &#8220;cautious&#8221; about sales. Compliance is a big market driver for it.</li>
<li>Sybase IQ&#8217;s #1 vertical market is financial services. Other biggies are government, telecom, marketing services, and to some extent retail.</li>
<li>As of February, there were 40-45 production users of Sybase IQ 15.0 and 15.1.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/23/sybase-iq-15/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Stakeholder-facing analytics</title>
		<link>http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/</link>
		<comments>http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/#comments</comments>
		<pubDate>Sat, 15 May 2010 07:58:05 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Fox and MySpace]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2149</guid>
		<description><![CDATA[There&#8217;s a point I keep making in speeches, and used to keep making in white papers, yet have almost never spelled out in this blog. Let me now (somewhat) correct the oversight. Analytic technology isn&#8217;t only for you. It&#8217;s also for your customers, citizens, and other stakeholders. I am not referring here to what is [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a point I keep making in speeches, and used to keep making in white papers, yet have almost never spelled out in this blog. Let me now (somewhat) correct the oversight.</p>
<p><strong>Analytic technology isn&#8217;t only for you. It&#8217;s also for your customers, citizens, and other stakeholders.</strong></p>
<p>I am <strong>not</strong> referring here to what is well understood to be an important, fast-growing activity &#8212; providing data and its analysis to customers as your primary or only business &#8212; nor to the related business of taking people&#8217;s data, crunching it for them, and giving them results. That combined sector &#8212; which I am pretty much alone in aggregating into one and calling <a href="http://www.dbms2.com/category/analytics-technologies/data-mart-warehouse-outsourcing/">data mart outsourcing</a> &#8212; is one of the top several vertical markets for a lot of the analytic DBMS vendors I write about. Rather, I&#8217;m talking about enterprises that gather data for some primary purpose, and have discovered that a good <strong>secondary</strong> use of the data is to reflect it back to stakeholders, often the same ones who provided or created it in the first place.</p>
<p>For now I&#8217;ll call this category <strong>stakeholder-facing analytics,</strong> as the shorter phrase &#8220;stakeholder analytics&#8221; would be ambiguous.* I first picked up the idea early this decade from Information Builders, for whom it had become something of a specialty. I&#8217;ve been asking analytics vendors for examples of stakeholder-facing analytics ever since, and a number have been able to comply. But the whole thing is in its early days even so; almost any sufficiently large enterprise should be more active in stakeholder-facing analytics than it currently is.<br />
<span id="more-2149"></span><br />
<em>*Comments as to what the category</em> should<em> be called are welcome below.</em></p>
<p>Examples of stakeholder-facing analytics include:</p>
<ul>
<li>Enterprises report back on the business customers do with them. For example:
<ul>
<li>Credit card companies provide reports on spending back to their credit card holders, especially small businesses.</li>
<li>So do office supply retailers.</li>
<li>Brokerage firms provide reporting back to their small-institution customers.</li>
</ul>
</li>
<li>Governments expose information to their citizens online.
<ul>
<li>In an early example, New York City restaurant ratings were put online.</li>
<li><a href="http://sec.gov/edgar/searchedgar/companysearch.html">Putting SEC filings online</a> has been a huge success.</li>
<li>The Obama Administration has committed to putting <a href="http://www.data.gov/catalog">large amounts of information</a> online.</li>
</ul>
</li>
<li>Regulated companies (such as utilities) could be required to put data online directly, without even using the government as an intermediary.</li>
<li>Some part of Fox &#8212; perhaps MySpace Music? &#8212; offers free access to a PostgreSQL extract from <a href="http://www.dbms2.com/2009/03/05/fox-interactive-medias-multi-hundred-terabyte-database-running-on-greenplum/">its Greenplum database</a> to each of its largest advertisers.</li>
<li>Google Analytics offers some basic BI for free to website owners everywhere.</li>
<li>Anybody from web hosting companies to public utilities could open their kimonos and allow their customers to track adherence to actual or implied SLAs (Service Level Agreements) in areas such as uptime, length of outage, responsiveness, and the like.</li>
</ul>
<p>So what cool examples do you have of stakeholder-facing analytics?*</p>
<p><em>*Yes, this is an invitation to drop links to case studies into the comment thread below. </em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Infobright blog update</title>
		<link>http://www.dbms2.com/2010/03/19/infobright-blog-update/</link>
		<comments>http://www.dbms2.com/2010/03/19/infobright-blog-update/#comments</comments>
		<pubDate>Fri, 19 Mar 2010 13:42:01 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1733</guid>
		<description><![CDATA[I often offer that, if a company puts up a sufficiently good blog post, I&#8217;ll link to it. Well, I just noticed that Infobright CEO Mark Burton (somewhere along the way he seems to have dropped the “interim”) put up an excellent post last month. Highlights on the market share/sector side include: Infobright’s customer base [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I often offer that, if a company puts up a sufficiently good blog post, I&#8217;ll link to it. Well, I just noticed that Infobright CEO Mark Burton <span style="text-decoration: line-through;">(somewhere along the way he seems to have dropped the “interim”)</span> put up <a href="http://www.infobright.com/Blog/Entry/infobright_strategy_and_plans">an excellent post</a> last month.</p>
<p style="margin-bottom: 0in;">Highlights on the market share/sector side include:<span id="more-1733"></span></p>
<ul>
<li>Infobright’s customer base grew 500% over the past year, to 120 paying customers.</li>
<li>This included end users (60%), as well as ISVs and SaaS providers (40%) who embed Infobright&#8217;s DBMS in their application.</li>
<li>During the same period, Infobright&#8217;s open source software was downloaded 35,000 times.</li>
<li>The end user applications were heavily clustered around web and online analytics tracking, with a focus on understanding customer behavior on the web.</li>
<li>Infobright also continues to see the growth of application-specific data marts.</li>
<li>There is also continued interest and growth in using Infobright technology to analyze IT logs and telecom CDR (Call Detail Record) data, to identify fraud or security issues, to understand and improve network performance, and other purposes.</li>
</ul>
<p>Product highlights include:</p>
<ul>
<li>Infobright will be much more transparent in 2010 about its plans.</li>
<li>Infobright will start posting and commenting on future releases and themes in March of this year. (However, they haven&#8217;t run much of that by me yet, and we&#8217;re past the middle of March.)</li>
<li>Infobright expects to drop 3-4 interim releases for every major release, with at least two major releases in 2010.</li>
<li>Some of Infobright&#8217;s major improvements this year will be:
<ul>
<li>Continued SMP performance improvements “without the need for complex hardware configurations or administrative effort”.</li>
<li>Extending the “hit rate” of the Knowledge Grid, which is central to Infobright&#8217;s performance story.</li>
<li>Better international support with UTF-8 extensions.</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/19/infobright-blog-update/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Netezza Skimmer</title>
		<link>http://www.dbms2.com/2010/01/25/netezza-skimmer/</link>
		<comments>http://www.dbms2.com/2010/01/25/netezza-skimmer/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 14:39:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Pricing]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1441</guid>
		<description><![CDATA[As I previously complained, last week wasn&#8217;t a very convenient time for me to have briefings. So when Netezza emailed to say it would release its new entry-level Skimmer appliance this morning, while I asked for and got a Friday afternoon briefing, I kept it quick and basic. That said, highlights of my Netezza Skimmer [...]]]></description>
			<content:encoded><![CDATA[<p>As I previously <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/">complained</a>, last week wasn&#8217;t a very convenient time for me to have briefings. So when Netezza emailed to say it would release its new entry-level Skimmer appliance this morning, while I asked for and got a Friday afternoon briefing, I kept it quick and basic.</p>
<p>That said, highlights of my Netezza Skimmer briefing included:</p>
<ul>
<li>In essence, Netezza Skimmer is 1/3 of Netezza&#8217;s previously smallest appliance, for 1/3 the price.</li>
<li>I.e., Netezza Skimmer has 1 S-blade and 9 disks, vs. 3 S-blades and 24 disks on the Netezza TwinFin 3.</li>
<li>With 1 disk reserved as a hot spare, that boils down to a 1:1:1 ratio among CPU cores, FPGA cores, and 1-terabyte disks on Netezza Skimmer. The same could pretty much be said of Netezza TwinFin, the occasional hot-spare disk notwithstanding.</li>
<li>Netezza Skimmer costs $125K.</li>
<li>With 2.8 or so TB of space for user data before compression, that&#8217;s right in line with the <a href="http://www.dbms2.com/2009/07/30/the-netezza-price-point/">Netezza price point</a> of slightly &lt;$20K/terabyte of user data.</li>
<li>That assumes Netezza&#8217;s usual 2.25X compression: 2.8 TB &#215; 2.25 is about 6.3 TB of user data, and $125K / 6.3 TB is about $19.8K per terabyte. I forgot to ask when 4X compression was actually being shipped.</li>
<li>I forgot to ask, but it seems obvious that Netezza Skimmer uses identical or substantially similar components to Netezza TwinFin&#8217;s.</li>
<li>Netezza Skimmer is 7 rack units high.</li>
<li>In place of the SMP hosts on TwinFin Systems, Netezza Skimmer has a host blade.</li>
<li>Netezza (specifically Phil Francisco) mentioned that when Kalido uses Netezza Skimmer for its appliance, there will be an additional host computer, but when it uses TwinFin for the same software, the built-in host will suffice. (Even so, I suspect it might be too strong to say that Skimmer&#8217;s built-in host computer is underpowered.)</li>
<li>Netezza also suggested that more appliance OEMs are coming down the pike specifically focused on the affordable Skimmer.</li>
</ul>
<p><span id="more-1441"></span>Obviously, Netezza Skimmer isn&#8217;t breaking any new technical ground. If Netezza had just called Skimmer &#8220;TwinFin 1,&#8221; nobody should have objected. So the main news here is that you can buy a Netezza box for $125K, plug it in, load a few terabytes of data, and be good to go with a pretty solid data warehouse.  For enterprises and data mart outsourcers with databases of the appropriate size, that could be a pretty attractive deal.</p>
<p>Is Netezza Skimmer as cheap as buying your own hardware and putting (free) <a href="http://www.dbms2.com/2009/10/19/greenplum-free-single-node-edition/">Greenplum Single-Node Edition</a> software on it? Not even close, especially since Greenplum&#8217;s free option limits you to lower overall compute power. Does Netezza Skimmer have as high availability as more expensive alternatives? In some cases, surely not. Skimmer is neither the cheapest thing around nor an utterly high-end product.</p>
<p>But Netezza Skimmer belongs on a lot of short lists even so.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/25/netezza-skimmer/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Infobright notes</title>
		<link>http://www.dbms2.com/2009/10/14/infobright-notes/</link>
		<comments>http://www.dbms2.com/2009/10/14/infobright-notes/#comments</comments>
		<pubDate>Wed, 14 Oct 2009 19:32:36 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1091</guid>
		<description><![CDATA[I had lunch w/ Bob Zurek and Susan Davis of Infobright today. This wasn&#8217;t primarily a briefing, but a few takeaways are: Infobright now has &#62;100 paying customers. Typical database size is from the low 100s of gigabytes to the low single-digit number of terabytes. Agile development is at or approaching two-week release cycles. Like [...]]]></description>
			<content:encoded><![CDATA[<p>I had lunch w/ Bob Zurek and Susan Davis of Infobright today. This wasn&#8217;t primarily a briefing, but a few takeaways are:</p>
<ul>
<li>Infobright now has &gt;100 paying customers.</li>
<li>Typical database size is from the low 100s of gigabytes to the low single-digit number of terabytes.</li>
<li>Agile development is at or approaching two-week release cycles.</li>
<li>Like Kickfire, Infobright  has a multi-year deal with MySQL that insulates it against many potential Oracle/MySQL shenanigans.</li>
<li>From an industry perspective, Infobright&#8217;s customer base sounds a lot like other vendors&#8217;:
<ul>
<li>Data mart outsourcing/online analytics</li>
<li>Log files for websites</li>
<li>Telecommunications</li>
<li>Financial services</li>
<li>OEM, especially in the markets cited above</li>
<li>&#8220;Hey, we&#8217;re beginning to see the occasional energy deal&#8221;</li>
<li>A few random others</li>
</ul>
</li>
<li>Infobright is seeing some household-name customers, who surely have big-name analytic DBMS products, but who also have a policy that open source is the default choice, and if open source can get the job done then the favorite closed-source choices aren&#8217;t used.</li>
<li>Infobright has the usual open-source community story &#8212; lots of involvement and engagement in the forums, but contributions are limited mainly to connectivity, utility scripts, etc. (Maybe some national language translation too; I&#8217;m not sure.)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/14/infobright-notes/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>What Nielsen really uses in data warehousing DBMS</title>
		<link>http://www.dbms2.com/2009/09/29/a-c-nielsen-data-warehousing-dbms/</link>
		<comments>http://www.dbms2.com/2009/09/29/a-c-nielsen-data-warehousing-dbms/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 15:04:26 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Specific users]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=950</guid>
		<description><![CDATA[In its latest earnings call, Oracle made a reference to The Nielsen Company that was &#8212; to put it politely &#8212; rather confusing. I just plopped down in a chair next to Greg Goff, who evidently runs data warehousing at Nielsen, and had a quick chat. Here&#8217;s the real story. The Nielsen Company has over [...]]]></description>
			<content:encoded><![CDATA[<p>In its latest earnings call, Oracle made <a href="../2009/09/19/oracle-database-siz/">a reference to The Nielsen Company</a> that was &#8212; to put it politely &#8212; rather confusing. I just plopped down in a chair next to Greg Goff, who evidently runs data warehousing at Nielsen, and had a quick chat. Here&#8217;s the real story.</p>
<ul>
<li>The Nielsen Company has over half a petabyte of data on Netezza in the US. This installation is growing.</li>
<li>The Nielsen Company indeed has 45 terabytes or whatever of data on Oracle in its European (Customer) Information Factory. This is not particularly growing. Nielsen&#8217;s Oracle data warehouse has been built up over the past 9 years. It&#8217;s not new. It&#8217;s certainly not on Exadata, nor planned to move to Exadata.</li>
<li>These are not single-instance databases. Nielsen&#8217;s biggest single Netezza database is 20 terabytes or so of user data, and its biggest single Oracle database is 10 terabytes or so.</li>
<li>Much (most?) of the rest of the installations are customer data marts and the like, based in each case on the “big” central database. (That&#8217;s actually a classic <a href="../2009/06/08/the-future-of-data-marts/">data mart use case</a>.) Greg said that Netezza&#8217;s capabilities to spin out those databases seemed pretty good.</li>
<li>That 10 terabyte Oracle data warehouse instance requires a lot of partitioning effort and so on in the usual way.</li>
<li>Nielsen has no immediate plans to replace Oracle with Netezza.</li>
<li>Nielsen actually has 800 terabytes or so of Netezza equipment. Some of that is kept more lightly loaded, for performance.</li>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/29/a-c-nielsen-data-warehousing-dbms/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

