<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Infobright</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/infobright-brighthouse/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 22:51:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Comments on the analytic DBMS industry and Gartner&#8217;s Magic Quadrant for same</title>
		<link>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/</link>
		<comments>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 17:17:32 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Exasol]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[illuminate Solutions]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5926</guid>
		<description><![CDATA[This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the 2010, 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying: In general, I regard Gartner Magic [...]]]></description>
			<content:encoded><![CDATA[<p>This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the <a href="http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/">2010</a>, <a href="../../../../../2010/02/10/gartner-magic-quadrant-data-warehouse-2009-2010/">2009</a>, <a href="../../../../../2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">2008</a>, <a href="../../../../../2007/10/19/gartner-2007-magic-quadrant-for-data-warehouse-database-management-systems/">2007</a>, and <a href="../../../../../2006/10/03/vendor-segmentation-for-data-warehouse-dbms/">2006</a> Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying:</p>
<ul>
<li>In general, I regard Gartner Magic Quadrants as a bad use of good research.</li>
<li>Illustrating the uselessness of &#8212; or at least poor execution on &#8212; the overall quadrant metaphor, a large majority of the vendors covered are lined up near the line x = y, each outpacing the one below in both of the quadrant&#8217;s dimensions.</li>
<li>I find fewer specifics to disagree with in this Gartner Magic Quadrant than in previous years&#8217; versions. Two factors jump to mind as possible reasons:
<ul>
<li>This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is somewhat less ambitious than others; while it gives as much company detail as its predecessors, it doesn&#8217;t add as much discussion of overall trends. So there&#8217;s less to (potentially) disagree with.</li>
<li><a href="http://www.dbms2.com/2010/12/28/evolving-definitions-and-technology-categories-for-2011/">Merv Adrian is now at Gartner</a>.</li>
</ul>
</li>
<li>Whatever the problems may be with Gartner&#8217;s approach, the whole thing comes out better than do <a href="http://www.dbms2.com/2011/02/11/comments-on-the-2011-forrester-wave-for-enterprise-data-warehouse-platforms/">Forrester&#8217;s failed imitations</a>.</li>
</ul>
<p><em>*At the time of this posting, I don&#8217;t yet have a link. However, I expect that to change quickly, and I plan to edit this paragraph accordingly. If nothing else, I hope people will drop links into the comment thread. </em></p>
<p>Specific company comments, roughly in line with Gartner&#8217;s rough single-dimensional rank ordering, include: <span id="more-5926"></span></p>
<ul>
<li>The Gartner Magic Quadrant&#8217;s comments on Teradata seem pretty fair. I don&#8217;t think I&#8217;m much in disagreement when I say:
<ul>
<li>Teradata has the richest, most mature analytic DBMS offering.</li>
<li>Teradata has an outstanding track record both for <a href="http://www.dbms2.com/2011/09/24/confusion-about-teradatas-big-customers/">managing large data volumes</a> and for high-concurrency mixed workloads.</li>
<li>Aster Data was a cool Teradata acquisition, even if Teradata/Aster synergies or integration have been nominal to date.</li>
<li>Teradata still needs to get out of its own way in marketing, positioning, packaging, and/or defining its premium-priced system vs. its more moderately-priced alternatives. Indeed, as necessary as this approach may have been to fending off encroachments by Netezza and others, what Teradata really needs to do is evolve to a more pick-your-own-node-combination mix-match kind of offering.</li>
</ul>
</li>
<li>Gartner has talked with a lot of Oracle Exadata users who say that the product works; Gartner has also stopped beating Oracle up for <a href="http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/">its previous policy of almost never doing onsite POCs (Proofs of Concept)</a>; both parts of that ring true with me. But Gartner also rightly dings Oracle for various issues in cost and cumbersomeness. Overall, while I agree there are organizations for which Oracle should indeed be a top-ranked choice, there are many others who shouldn&#8217;t put Oracle on their short list.</li>
<li>Third in the Gartner MQ rankings is IBM.
<ul>
<li>Gartner gets so caught up in reciting the names of various IBM product offerings that it neglects to say much good about DB2 itself. (I tend to have a similar problem.)</li>
<li>But Gartner does mention concurrency as a strength. I agree, especially if we presume that that was a reference to DB2 rather than Netezza.</li>
<li>Gartner cites Netezza&#8217;s post-acquisition annual growth rate as 30%. Gartner seems to think this is a good number. I disagree, but in Netezza&#8217;s defense, it has had to endure IBM&#8217;s post-acquisition on-boarding process.</li>
</ul>
</li>
<li>Arguably fourth in the Gartner Data Warehouse Magic Quadrant rankings is EMC/Greenplum.
<ul>
<li>In general, Gartner likes the taste of Greenplum Kool-Aid.</li>
<li>Gartner neglects to ding Greenplum for concurrency challenges, which I view as an oversight given Gartner&#8217;s general stress on that area.</li>
<li>Gartner does ding Greenplum for support challenges.</li>
<li>Gartner neglects to praise Greenplum for true <a href="http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/">hybrid row/columnar data management</a>, a feature shared by <a href="http://www.dbms2.com/2011/09/22/teradata-columnar-compression/">Teradata</a> and <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/">Vertica</a>, among others, but not by <a href="http://www.dbms2.com/2011/02/06/columnar-compression-database-storage/">Oracle</a>, DB2, or Netezza.</li>
<li>Gartner located a half-petabyte Greenplum database. This doesn&#8217;t surprise me, even though Greenplum has frequently made exaggerated claims about large-size database successes in the past.</li>
<li>Gartner reports a &gt;400 figure for Greenplum customers, which is plausible.</li>
</ul>
</li>
<li>In its first deviation from strict one-dimensional rank ordering, the Gartner Magic Quadrant ranks Sybase ahead of Greenplum in completeness of vision but behind in &#8220;ability to execute&#8221;.
<ul>
<li>If that were the other way around, it might make more sense. Greenplum promises anything and everything you might ever want for analytic data management or the associated analysis; but Sybase has vastly more analytic DBMS users than Greenplum does, running a variety of demanding workloads.</li>
<li>Gartner appears to think that Sybase IQ requires less database administration than I think it does.</li>
<li>Gartner seems concerned that SAP will position HANA and Sybase ASE as, between them, the only DBMS you&#8217;ll ever need, casting doubt on Sybase IQ&#8217;s future. I wouldn&#8217;t worry about that if you have a problem you want to solve today.</li>
</ul>
</li>
<li>The Gartner Magic Quadrant for Data Warehouse Database Management Systems ranks Microsoft sixth overall, despite noting that there isn&#8217;t a single production reference for Microsoft&#8217;s Parallel Data Warehouse. In support of this ranking, it cites, for example, the compression feature, which distinguishes Microsoft SQL Server from no other product on the list except Kognitio. If you have such an undemanding data warehousing problem that many different analytic DBMS could meet your needs, there&#8217;s a good chance Microsoft SQL Server can also do the job; and if you&#8217;ve bought into the Microsoft technology stack, you might as well keep going down that path. Otherwise, I don&#8217;t know why somebody should adopt Microsoft&#8217;s offering at this time.</li>
<li>Seventh along the main diagonal path in the Gartner Magic Quadrant is HP Vertica. I&#8217;d rank Vertica higher than that, but in fairness I note two execution concerns. First, HP has a lousy track record, both in acquisitions and in data warehousing/analytics. Second, Vertica is bad about answering my email. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Anyhow, Gartner doesn&#8217;t seem to have given Vertica credit either for <a href="http://www.dbms2.com/2011/06/20/columnar-dbms-vendor-customer-metrics/">its full customer count or for the multiple petabyte-scale databases Vertica runs</a>.</li>
<li>1010data is an outlier, with Gartner noting that it only partly fits in with other &#8220;Data Warehousing Database Management&#8221; companies, and hence kind of confessing that 1010data&#8217;s presence on the Magic Quadrant is somewhat arbitrary. Stuff like that is bound to happen, given <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">the inherent difficulties of defining market categories</a>. Anyhow, my thoughts on 1010data include:
<ul>
<li>I&#8217;m nervous about the fact that 1010data doesn&#8217;t actually control its own DBMS technology, but rather relies on old code from the small private company KX Systems.</li>
</ul>
<ul>
<li> There are three main reasons to consider 1010data:
<ul>
<li>You want to enter the data mart outsourcing business in a casual way, and you like its SaaS offering.</li>
<li>You want to engage in <a href="http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/">stakeholder-facing analytics</a> in a casual way, and you like its SaaS offering.</li>
<li>You love 1010data&#8217;s particular set of interactive analytic features and performance.</li>
</ul>
</li>
</ul>
</li>
<li>Back to the main path winding along the Gartner Magic Quadrant main diagonal &#8212; next up is ParAccel. While I question some of the peripheral comments, I agree with Gartner&#8217;s core messages that:
<ul>
<li>ParAccel, the product, is blazingly fast in certain use cases.</li>
<li>ParAccel, the company, is dangerously small.</li>
</ul>
</li>
<li>Eighth on the Gartner MQ&#8217;s main path is Kognitio. This is too high. Kognitio positions itself as offering in-memory DBMS, yet stubbornly refuses to do any kind of data compression. That&#8217;s an awful combination of choices. As for using Kognitio&#8217;s data warehousing SaaS offering &#8212; why would you do that, when more modern products are available on a SaaS/cloud basis as well?</li>
<li>Ninth in the Gartner Magic Quadrant main rankings is SAND.
<ul>
<li>The SAND section is not a triumph of Gartner accuracy. For example:
<ul>
<li><a href="http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/">Gartner completely missed the errors in SAND&#8217;s reported customer counts</a>.</li>
<li>Gartner refers to SAND as being &#8220;in existence for approximately nine years&#8221;, which is too low by at least a factor of 2.</li>
<li>Gartner says &#8220;SAND is a privately held company&#8221;, even though <a href="http://itmarketstrategy.com/2009/06/07/sand-technology-a-risky-bet/">Merv knows better than that</a>.</li>
</ul>
</li>
<li>Otherwise, Gartner&#8217;s opinion on SAND seems to boil down to &#8220;Interesting technology and ideas, but dangerously small company.&#8221; I agree.</li>
</ul>
</li>
<li>Tenth and too low in the Gartner MQ main rankings is Infobright.
<ul>
<li>At least by some metrics (e.g. customer count), Infobright isn&#8217;t as dangerously small as ParAccel, SAND, Kognitio, et al.</li>
<li>That said, Infobright is small and focused on <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>. So I wouldn&#8217;t be confident in Infobright&#8217;s future technology path for human-generated data use cases.</li>
<li>Infobright&#8217;s performance is uneven &#8212; blazing in cases where the Knowledge Grid helps, but not necessarily stellar by analytic DBMS standards when full table scans are called for.</li>
<li>I agree with Gartner that the possibility of Oracle/MySQL future shenanigans is a concern. But while the energy behind MySQL forking efforts doesn&#8217;t seem too great right now, I&#8217;d expect them to revive and offer a successful escape path if it seemed Oracle was going to indeed play hardball.</li>
<li>Also, given that it&#8217;s already an open source vendor, there are various kinds of assurances Infobright could give that would also help alleviate customer concerns.</li>
</ul>
</li>
<li>Actian, formerly Ingres, took a big tumble in Gartner&#8217;s rankings versus last year, when I simply wrote &#8220;<a href="http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/">What Gartner said in connection with <strong>Ingres</strong> is too inaccurate to deserve detailed attention</a>.&#8221; I&#8217;m even a little harsher about <a href="http://www.dbms2.com/2011/09/25/ingres-actian/">Ingres/Actian&#8217;s DBMS products and prospects</a> than Gartner is, but at least now we&#8217;re in the same ballpark.</li>
<li>Along with Infobright, ParAccel, and SAND, <a href="http://www.dbms2.com/2011/11/12/exasol-update/">Exasol</a> appears to be another of the &#8220;good columnar technology/small company&#8221; crowd. As with other such products, one should be careful about fit-and-finish features that are missing today, as there is no assurance they&#8217;ll be added in a timely manner going forward.</li>
<li>illuminate Solutions, which was on last year&#8217;s Gartner list, <a href="http://www.dbms2.com/2012/01/16/has-illuminate-solutions-joined-the-choir-invisible/">now appears to be an ex-company</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Commercial software for academic use</title>
		<link>http://www.dbms2.com/2011/10/14/commercial-software-for-academic-use/</link>
		<comments>http://www.dbms2.com/2011/10/14/commercial-software-for-academic-use/#comments</comments>
		<pubDate>Fri, 14 Oct 2011 06:21:21 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Scientific research]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5483</guid>
		<description><![CDATA[As Jacek Becla explained: Academic scientists like their software to be open source, for reasons that include both free-like-speech and free-like-beer. What&#8217;s more, they like their software to be dead-simple to administer and use, since they often lack the dedicated human resources for anything else. Even so, I think that academic researchers, in the natural [...]]]></description>
			<content:encoded><![CDATA[<p>As <a href="../../../../../2009/10/04/jacek-becla-on-issues-in-scientific-data-management/">Jacek Becla</a> explained:</p>
<ul>
<li>Academic scientists like their software to be open source, for reasons that include both free-like-speech and free-like-beer.</li>
<li>What&#8217;s more, they like their software to be dead-simple to administer and use, since they often lack the dedicated human resources for anything else.</li>
</ul>
<p>Even so, I think that <strong>academic researchers,</strong> in the natural and social sciences alike, <strong>commonly overlook the wealth of commercial software</strong> that could help them in their efforts.</p>
<p>I further think that <strong>the commercial software industry could do a better job of exposing its work to academics,</strong> where by &#8220;expose&#8221; I mean:</p>
<ul>
<li>Give your stuff to academics for <strong>free.</strong></li>
<li>Call their attention to your free offering.</li>
</ul>
<p>Reasons to do so include:</p>
<ul>
<li><strong>Public benefit.</strong> Scientific research is important.</li>
<li><strong>Training future customers.</strong> There&#8217;s huge academic/commercial crossover, especially as students join the for-profit workforce.</li>
</ul>
<p><span id="more-5483"></span>The biggest issue is probably <strong>large-scale database management.</strong> There&#8217;s a feeling, permeating for example parts of the <a href="../../../../../2011/09/20/xldb-the-one-conference-i-like-to-go-to/">XLDB conference</a> and the associated SciDB project, that data stores suitable for holding large amounts of data are either:</p>
<ul>
<li>Hadoop or</li>
<li>Forbiddingly expensive.</li>
</ul>
<p>I think that&#8217;s overstated. In particular:</p>
<ul>
<li>You can put &gt;10 terabytes of machine-generated data (or any other kind) into Infobright and have it well taken care of; Infobright is open source.</li>
<li>You can put &gt;1 petabyte into [name redacted],* among others; [name redacted]* should be out soon with a generously free offering for academic users. <em>Edit: That would be <a href="http://www.dbms2.com/2011/10/18/vertica-community-edition/">Vertica</a>.</em></li>
<li>Conventional relational queries, graph analysis, statistical analysis preparation and more can all be much faster in a good analytic DBMS than in alternative kinds of data stores.</li>
<li>Integration between SQL and other analytic languages is ever improving, as analytic DBMS evolve into &#8220;<a href="../../../../../2011/02/24/analytic-platforms/">analytic platforms</a>&#8221;.</li>
</ul>
<p><em>*My permission to use the name was yanked after this post was largely drafted. I&#8217;m sufficiently pleased with the forthcoming offering itself that I can&#8217;t get upset about the procedural confusion.</em></p>
<p>With a couple of exceptions, the <strong>statistics/predictive analytics</strong> situation seems more reasonable. Industry leaders such as SAS Institute and SPSS (now an IBM company) have engaged in varying degrees of academic outreach. R is in the process of crossing over from academia to business.</p>
<p><em>Unfortunately, I know next to nothing about Stata or, elsewhere in the technical languages area, Mathworks/Matlab. (Who knew that Mathworks was a <a href="http://www.mathworks.com/company/aboutus/">$600 million company</a>, local to my geographical area?)</em></p>
<p>One statistical tool that should perhaps be more present in academia is KXEN. KXEN seems to have some nice differentiation in not making you understand in advance which of your variables are most important. Econometricians and others with large numbers of independent variables might wish to take note.</p>
<p><em>If you think the true situation is nonlinear, and you&#8217;re trying to approximate it with linear models, you almost always have a large number of variables to consider. True, monomials in independent variables aren&#8217;t actually independent, but it might be interesting to pretend that they are and see if any insights fall out that could help in more rigorous analysis.</em></p>
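<p>To put a rough number on &#8220;a large number of variables&#8221;, here is a minimal Python sketch that enumerates the monomials a linear model would have to treat as separate inputs. The variable names and the degree cutoff are illustrative assumptions, not anything taken from KXEN or any other product.</p>
<pre>
```python
from itertools import combinations_with_replacement
from math import comb


def monomials(variables, max_degree):
    """Return every monomial of degree 1..max_degree as a tuple of variable names."""
    terms = []
    for degree in range(1, max_degree + 1):
        # ("x1", "x1", "x2") stands for the monomial x1 * x1 * x2
        terms.extend(combinations_with_replacement(variables, degree))
    return terms


# Hypothetical example: 5 independent variables, monomials up to degree 3.
variables = ["x1", "x2", "x3", "x4", "x5"]
terms = monomials(variables, 3)

# Closed form for the same count: C(n + d, d) - 1 (the constant term is excluded).
print(len(terms), comb(len(variables) + 3, 3) - 1)
```
</pre>
<p>With 5 variables and a cutoff of degree 3, the model already has 55 candidate terms, and the count grows quickly as either number rises.</p>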
<p>I&#8217;d further argue that, as part of neglecting commercial analytic DBMS, the scientific community in particular neglects the potential of <strong>integrated analytic platforms. </strong>Admittedly, the early leaders in that area &#8212; Aster Data, perhaps followed by Netezza (now an IBM company) &#8212; aren&#8217;t exactly priced in an academic-friendly way. But Vertica, EMC Greenplum, et al. are playing catch-up with analogous technology, and they&#8217;re more likely to offer appealing academic pricing.</p>
<p>There&#8217;s also the <a href="../../../../../2011/03/03/investigative-analytics/">investigative analytics</a> side of business intelligence, especially in the area of visualization/discovery. While Spotfire (now a TIBCO company) got much of its start in research-oriented areas, the otherwise more visible &#8212; no pun intended &#8212; QlikTech and Tableau don&#8217;t seem to have done much in academia. Datameer and yet-younger Hadoop-oriented business intelligence startups don&#8217;t seem to be doing much on the academic front either, more&#8217;s the pity.</p>
<p>Frankly, <strong>I think that most scientific analytic technology needs are also found in the business world.*</strong> That convergence will only get closer as businesses focus more on <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>. Commercial software companies should pay more attention to scientists, and scientists should gaze out more often from their ramshackle, budget-constrained ivory towers.</p>
<p><em>*The converse isn&#8217;t as true. Businesses have issues not well reflected in science, derived (for example) from the complexity of their transactional schemas, or from office-politics considerations around &#8220;one version of the truth&#8221;.</em></p>
<p><strong><em>Edit: Some links that seem relevant to this year&#8217;s XLDB program</em></strong></p>
<ul>
<li><a href="http://www.dbms2.com/2011/09/05/zynga-linkedin-data-warehous/">Zynga and LinkedIn</a></li>
<li><a href="http://www.dbms2.com/2010/06/19/objectivity-infinite-graph/">Objectivity Infinite Graph</a></li>
<li><a href="http://www.dbms2.com/2010/10/06/ebay-followup-greenplum-out-teradata-10-petabytes-hadoop-has-some-value-and-more/">eBay as of last year&#8217;s XLDB</a> (the most expensive blog post I ever wrote, in light of Greenplum&#8217;s subsequent response)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/10/14/commercial-software-for-academic-use/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Are there any remaining reasons to put new OLTP applications on disk?</title>
		<link>http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/</link>
		<comments>http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/#comments</comments>
		<pubDate>Mon, 19 Sep 2011 18:07:07 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5257</guid>
		<description><![CDATA[Once again, I&#8217;m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include: 100s of gigabytes of data at first, growing to &#62;1 terabyte over time. High peak loads. Public cloud portability (but they have private data centers they can use today). Simple database design &#8212; not a lot [...]]]></description>
			<content:encoded><![CDATA[<p>Once again, I&#8217;m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include:</p>
<ul>
<li>100s of gigabytes of data at first, growing to &gt;1 terabyte over time.</li>
<li>High peak loads.</li>
<li>Public cloud portability (but they have <strong>private data centers they can use today).</strong></li>
<li>Simple database design &#8212; not a lot of tables, not a lot of columns, not a lot of joins, and everything can be distributed on the same customer_ID key.</li>
<li>Stream the data to a data warehouse that will grow to a few terabytes. (Keeping only one year of OLTP data online actually makes sense in this application, but of course everything should go into the DW.)</li>
</ul>
<p>So I&#8217;m leaning to saying:   <span id="more-5257"></span></p>
<ul>
<li>They should go with a scalable, MySQL-based solution.
<ul>
<li>Lots of third-party software works with MySQL, in case that&#8217;s helpful.</li>
<li>Yes, any one vendor is small and not yet firmly established, but there are numerous vendors around with interesting MySQL scaling stories.</li>
<li>In a vendor emergency, just going with Oracle&#8217;s MySQL stuff would probably work &#8230;</li>
<li>&#8230; especially because there are these lovely things in the world called <strong>solid-state drives.</strong></li>
<li>There&#8217;s also good escapability if one wants to move away from MySQL, because everybody knows how to handle MySQL data.</li>
</ul>
</li>
<li>The first product to look at is dbShards, because it meets all the topology needs:
<ul>
<li>Local scale-out (<a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">transparent sharding</a>).</li>
<li><a href="http://www.dbms2.com/2011/02/09/clarification-on-dbshards-shard-replication/">Local high availability</a>.</li>
<li>Remote disaster recovery (details of that are underway).</li>
</ul>
</li>
<li>The first analytic DBMS to look at is Infobright.
<ul>
<li>Yes, I know Infobright is focused more on machine-generated data these days, but this client&#8217;s analytic needs are so straightforward Infobright should pass with flying colors.</li>
<li>The MySQL-to-MySQL aspect should make ETL dead simple.</li>
<li>Again, there&#8217;s escapability.</li>
</ul>
</li>
</ul>
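<p>As a rough illustration of what &#8220;everything can be distributed on the same customer_ID key&#8221; buys you, here is a minimal Python sketch of hash-based shard routing. It is not how dbShards or any other product actually assigns shards; the shard count, the hash choice, and the function names are all assumptions made for the example.</p>
<pre>
```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_for(customer_id, num_shards=NUM_SHARDS):
    """Map a customer_ID to a shard number using a stable hash."""
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def group_by_shard(rows, num_shards=NUM_SHARDS):
    """Group rows (dicts with a customer_id field) by destination shard."""
    groups = {}
    for row in rows:
        shard = shard_for(row["customer_id"], num_shards)
        groups.setdefault(shard, []).append(row)
    return groups


# Hypothetical rows: every row for one customer lands on the same shard,
# so single-customer transactions and joins never cross shard boundaries.
rows = [
    {"customer_id": 1001, "amount": 25},
    {"customer_id": 1002, "amount": 40},
    {"customer_id": 1001, "amount": 15},
]
print(group_by_shard(rows))
```
</pre>
<p>Because every table is keyed the same way, a request for one customer touches exactly one shard, which is what makes this kind of schema a comfortable fit for scale-out.</p>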
<p>Mainly, this is all fine. But I&#8217;m getting pushback on the solid-state aspect, for fear that it will compromise public cloud portability.</p>
<p>Am I missing something here? As far as I&#8217;m concerned, <strong>if you&#8217;re planning an OLTP system with a many-year lifespan today, </strong>of course <strong>you should assume solid-state storage.</strong> Maybe you scale out just as far as you would with disk, striping indexes or entire databases across the RAM of multiple servers. In that case, having solid-state backing reduces the risk of bottlenecks. Maybe you don&#8217;t scale out as far as you would with disk. In that case, solid-state backing saves you money.</p>
<p><strong>As for public-cloud support for solid-state storage, that&#8217;s coming fast, right? </strong>(Actually, I have data points in support of that theory, but they&#8217;re a bit tenuous.) A large fraction of web businesses with private data centers seem to be using solid-state storage &#8212; from Facebook on down &#8212; or so the NoSQL/NewSQL/<a href="http://www.dbms2.com/2011/03/02/short-request-processing/">short-request</a> DBMS guys tell me. Surely a number of public cloud vendors are close behind.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Hadapt update</title>
		<link>http://www.dbms2.com/2011/07/06/hadapt-update/</link>
		<comments>http://www.dbms2.com/2011/07/06/hadapt-update/#comments</comments>
		<pubDate>Wed, 06 Jul 2011 23:43:49 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4925</guid>
		<description><![CDATA[I met with the Hadapt guys today.  I think I can be a bit crisper than before in positioning Hadapt and its use cases, namely: Hadapt is additional software on a cluster that also runs fully functional Hadoop/HDFS. (Cloudera Hadoop more than straight-from-Apache Hadoop to date, but that&#8217;s not a requirement.) The cluster also runs [...]]]></description>
			<content:encoded><![CDATA[<p>I met with the Hadapt guys today.  I think I can be a bit crisper than before in positioning Hadapt and its use cases, namely:</p>
<ul>
<li>Hadapt is additional software on a cluster that also runs fully functional Hadoop/HDFS. (Cloudera Hadoop more than straight-from-Apache Hadoop to date, but that&#8217;s not a requirement.)</li>
<li>The cluster also runs a DBMS on every node, such as PostgreSQL or one of Infobright/Vectorwise.</li>
<li>Hadapt&#8217;s software manages parallel SQL queries by distributing them to the DBMS living on each node. Hadapt says that the resulting query performance far outshines Hive&#8217;s.</li>
<li>Hadapt further says that, by exploiting the partner DBMS, its SQL functionality outpaces Hive&#8217;s as well.</li>
<li>Target Hadapt use cases are centered around keeping <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated</a> or other <a href="http://www.dbms2.com/2011/05/17/poly-structured-database/">poly-structured</a> data in Hadoop, and extracting, enhancing, or otherwise deriving some of it to live in the relational store.</li>
<li>In particular, Hadapt seems like an interesting choice when you want to use that relational data as you work on other data that&#8217;s still in HDFS, or if you want to keep using the relational data in other kinds of MapReduce jobs.</li>
<li>That all fits well with my thoughts about the importance of <a href="http://www.dbms2.com/2011/05/30/another-category-of-derived-data/">derived data</a>.</li>
</ul>
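<p>To make the &#8220;distribute the query to the DBMS on each node&#8221; idea concrete, here is a minimal sketch. It is not Hadapt&#8217;s code; the in-memory SQLite databases, the <code>events</code> table, and the function names are all assumptions chosen for illustration. Each &#8220;node&#8221; runs the same aggregate over its own slice of the data, and a coordinator merges the partial results.</p>

```python
import sqlite3

# Assumption: one in-memory SQLite database stands in for the DBMS on each node.
def make_node(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, bytes INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn

def parallel_sum_by_user(nodes):
    """Push the same aggregate to every node, then merge the partial sums."""
    query = "SELECT user_id, SUM(bytes) FROM events GROUP BY user_id"
    totals = {}
    for conn in nodes:  # a real system would run these concurrently
        for user_id, partial in conn.execute(query):
            totals[user_id] = totals.get(user_id, 0) + partial
    return totals

# Three hypothetical nodes, each holding a slice of the same table.
nodes = [
    make_node([(1, 100), (2, 50)]),
    make_node([(1, 25), (3, 10)]),
    make_node([(2, 5)]),
]
result = parallel_sum_by_user(nodes)
print(result)
```

<p>SUM merges cleanly because partial sums can simply be added. Aggregates such as averages or distinct counts need more care at the merge step, which is part of what a real parallel query planner has to handle.</p>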
<p>Other developments since <a href="http://www.dbms2.com/2011/03/23/hadapt-commercialized-hadoopdb/">what I wrote about Hadapt a few months ago</a> include:</p>
<ul>
<li>Hadapt is in beta now.</li>
<li>Hadapt has added adult supervision in the form of <a href="http://www.hadapt.com/wickline-announcement/">Philip Wickline</a>, late of Endeca.</li>
</ul>
<p>In other news, Hadapt is our newest client.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/06/hadapt-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Eight kinds of analytic database (Part 1)</title>
		<link>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/</link>
		<comments>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 08:17:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Buying processes]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4868</guid>
		<description><![CDATA[Analytic data management technology has blossomed, leading to many questions along the lines of &#8220;So which products should I use for which category of problem?&#8221; The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for &#8220;big data&#8221; is little help. Let&#8217;s try eight categories instead. While no categorization [...]]]></description>
			<content:encoded><![CDATA[<p>Analytic data management technology has blossomed, leading to many questions along the lines of &#8220;So which products should I use for which category of problem?&#8221; The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for &#8220;big data&#8221; is little help.</p>
<p>Let&#8217;s try eight categories instead. While <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">no categorization is ever perfect</a>, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need &#8212; and in most cases you&#8217;ll need several &#8212; is a great early step in your analytic technology planning.  <span id="more-4868"></span></p>
<p><strong><em>Enterprise data warehouse</em></strong> (Full or partial)</p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, but especially operational</li>
<li><em>Likely use styles:</em> All</li>
<li><em>Canonical example:</em> Central EDW for a big enterprise</li>
<li><em>Stresses:</em> Concurrency, reliability, workload management</li>
</ul>
<p>The enterprise data warehouse (EDW) ideal says that you copy all your data into one place, and drive all decision-making from there. <a href="../../../../../2011/06/21/its-official-the-grand-central-edw-will-never-happen/">Full EDWs are pipedreams</a>. Still, a partial EDW makes sense for most large enterprises, and many indeed already have one. The first product lines to consider for classical EDWs are Teradata, DB2, Exadata, and maybe Microsoft SQL Server, especially if you&#8217;re going to stress concurrency and/or operational use cases.</p>
<p><strong><em>Traditional data mart</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All</li>
<li><em>Likely use styles:</em> Business intelligence, budgeting/consolidation, investigative</li>
<li><em>Examples:</em> Reporting servers, planning/consolidation servers, anything MOLAP, etc.</li>
<li><em>Stresses:</em> Performance, concurrency, TCO</li>
</ul>
<p>Whether or not you have something like an enterprise data warehouse, it&#8217;s common to have lighter-weight data marts as well. A traditional data mart might drive reports and dashboards. Or it might be specialized for budgeting, planning, and/or consolidation.  Some <a href="../../../../../2011/03/03/investigative-analytics/">investigative analytics</a> may be in the mix as well.</p>
<p>Any DBMS that can support an EDW can also support a data mart, but it may not be the most cost-effective way to do so. Columnar DBMS might have more attractive performance and TCO (Total Cost of Ownership); the same goes for Netezza. Some of them &#8212; e.g. Sybase IQ and <a href="../../../../../2011/06/20/vertica-release-5/">Vertica</a> &#8212; have excellent track records in concurrent usage as well. <a href="../../../../../2011/05/29/when-to-use-relational-database-management-system/">Ted Codd</a> pushed what amounts to MOLAP (Multidimensional OnLine Analytic Processing) systems for these use cases. But relational DBMS commonly do a better job, which is one reason most major MOLAP products have wound up at RDBMS companies.</p>
<p><strong><em>Investigative data mart &#8212; agile</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, especially customer-centric</li>
<li><em>Likely use styles</em>: Investigative</li>
<li><em>Canonical example:</em> A few analysts getting a few TB to examine</li>
<li><em>Stresses:</em> Ease of setup/load, ease of admin, price/performance</li>
</ul>
<p>Besides the traditional data mart, there are at least two other kinds. Both are focused on investigative analytics, but they&#8217;re differentiated by database size.</p>
<p>If you have just a few analysts,* looking at no more than a few terabytes of data (perhaps even just some gigabytes) &#8212; and if that data is &#8220;single-subject&#8221; and fairly homogeneous &#8212; your watchwords should be &#8220;cheap&#8221;, &#8220;easy&#8221;, and &#8220;fast&#8221;. You don&#8217;t need to invest in much hardware, expensive software, or much administrative effort (the analysts can be their own DBAs), nor should you endure much set-up time. Just grab a product, grab some data, and start running queries (or extracts into the statistical tool of your choice).</p>
<p><em>*If you have dozens or even hundreds of analysts hitting the same database, you&#8217;re probably back to the more concurrency-oriented scenarios outlined above.</em></p>
<p>Infobright is often cost-effective among columnar analytic DBMS. Other vendors might cut you a price break as well. If you have multiple terabytes of data, don&#8217;t rule out Netezza&#8217;s lowest-end products (even if they&#8217;d really rather sell you something bigger). Or, if you&#8217;re in the sub-terabyte range, maybe you can get by with an in-memory BI tool such as QlikView, and not do anything special on the DBMS side at all.</p>
<p><strong><em>Investigative data mart &#8212; big</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, especially customer-centric, logs, financial trade, scientific</li>
<li><em>Likely use styles</em>: Investigative</li>
<li><em>Canonical example:</em> Single-subject 20 TB &#8211; 20 PB relational database</li>
<li><em>Stresses:</em> Performance, scale-out, analytic functionality</li>
</ul>
<p>But if you&#8217;re looking at tens of terabytes of relational data, or even more, you really do have a &#8220;big data&#8221; problem. Performance and scalability are major challenges, usually best addressed by MPP (Massively Parallel Processing) systems, such as Netezza, Vertica, Aster Data, ParAccel, Teradata, or Greenplum. Performance POCs (Proofs Of Concept) are a big part of the buying process. Vendor price negotiations are crucial too.</p>
<p><em>Actually, in the low tens of terabytes you might be able to get away with a shared-disk system that has excellent compression &#8212; e.g., columnar products like Sybase IQ, Infobright, or SAND, rather than just Vertica and ParAccel.</em></p>
<p>Assuming you have affordable, scalable query performance, the competitive differentiator can switch to additional analytic functionality. Aster, Netezza, ParAccel, Vertica, and Greenplum either offer full <a href="../../../../../2011/02/24/analytic-platforms/">analytic platforms</a>, or seem to be on the path to doing so. Teradata, which now owns Aster Data, offers substantial built-in analytic capability in its traditional products as well, and the same goes for Sybase IQ.</p>
<p><em>Continued in <a href="http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/">Part 2</a>,</em><em> where we cover some of the more difficult use cases.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Columnar DBMS vendor customer metrics</title>
		<link>http://www.dbms2.com/2011/06/20/columnar-dbms-vendor-customer-metrics/</link>
		<comments>http://www.dbms2.com/2011/06/20/columnar-dbms-vendor-customer-metrics/#comments</comments>
		<pubDate>Mon, 20 Jun 2011 05:41:54 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4742</guid>
		<description><![CDATA[Last April, I asked some columnar DBMS vendors to share customer metrics. They answered, but it took until now to iron out a couple of details. Overall, the answers are pretty impressive.  Sybase said that Sybase IQ had &#62; 2000 direct customers and &#62;500 indirect customers (i.e., end customers of OEMs). That&#8217;s counting by customers; [...]]]></description>
			<content:encoded><![CDATA[<p>Last April, I asked some columnar DBMS vendors to share customer metrics. They answered, but it took until now to iron out a couple of details. Overall, the answers are pretty impressive.  <span id="more-4742"></span></p>
<p>Sybase said that <strong>Sybase IQ</strong> had <strong>&gt;2000 direct customers</strong> and <strong>&gt;500 indirect customers</strong> (i.e., end customers of OEMs). That&#8217;s counting by customers; I know from prior discussions that Sybase IQ is running at close to two installations per customer. I also believe that Sybase counts different divisions of the same large enterprise as separate customers.</p>
<p><strong>Vertica</strong> cited a figure of <strong>500 customers</strong> as of April (end Q1?), which is close to <strong>600</strong> now, about <strong>40% or a little more direct.</strong> The difference between this and a <a href="http://www.dbms2.com/2011/02/14/now-we-know-why-vertica-has-been-so-weirdly-evasive/">2010 year-end figure of 328</a> is not only new sales, but also slow reporting by OEMs.  One cool figure &#8212; a single OEM reported 82 end sales in a single (quarterly?) report. And a number of those direct customers are substantial; Vertica&#8217;s <a href="http://www.vertica.com/customers/">customer logo</a> page features lots of telcos, lots of internet companies, and the national operation of Blue Cross/Blue Shield.</p>
<p><em>Pay no attention to small inconsistencies in the number of Vertica direct  customers (250 at year-end, no more than that now); Colin Mahony just  estimates these numbers for me from memory, and minor inaccuracies are quite excusable.</em></p>
<p>Even cooler &#8212; <strong>Vertica </strong>reports <strong>7 customers with a petabyte or more of user data each.</strong> About 5 of the 7 are obvious-suspect big-name firms; but unsurprisingly, those big names are NDA. I did secure permission to say that there are 2 telecom companies, a mobile gaming vendor, another internet company, and 3 financial services outfits of various kinds.</p>
<p><strong>SAND Technology </strong>reported <strong>&gt;600 total customers,</strong> including<strong> &gt;100 direct. </strong>Since SAND has been around since the 1990s, those aren&#8217;t great average annual figures, but they&#8217;re probably more than many people (including me) thought.</p>
<p><strong>Infobright</strong> reported around <strong>200 total paying customers, 130 direct.</strong> There are surely a lot more users of open source Infobright, but precise numbers are of course hard to come by.</p>
<p>If I asked <strong>ParAccel</strong> in the April go-round, I&#8217;ve misplaced their answer, but back in October the figure was &gt;30 customers, 2 of them over 100 terabytes. I&#8217;ve seen published figures of 40+ for ParAccel since.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/06/20/columnar-dbms-vendor-customer-metrics/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Infobright 4.0</title>
		<link>http://www.dbms2.com/2011/06/14/infobright-4-0/</link>
		<comments>http://www.dbms2.com/2011/06/14/infobright-4-0/#comments</comments>
		<pubDate>Tue, 14 Jun 2011 08:46:24 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4685</guid>
		<description><![CDATA[Infobright is announcing its 4.0 release, with imminent availability. In marketing and product alike, Infobright is betting the farm on machine-generated data. This hasn&#8217;t been Infobright&#8217;s strategy from the getgo, but it is these days, with pretty good focus and commitment. While some fraction of Infobright&#8217;s customer base is in the Sybase-IQ-like data mart market [...]]]></description>
			<content:encoded><![CDATA[<p>Infobright is announcing its 4.0 release, with imminent availability. In marketing and product alike, Infobright is betting the farm on <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>. This hasn&#8217;t been Infobright&#8217;s strategy from the get-go, but it is these days, with pretty good <a href="http://www.strategicmessaging.com/extending-the-layered-messaging-model/2011/06/13/">focus and commitment</a>. While some fraction of Infobright&#8217;s customer base is in the Sybase-IQ-like data mart market &#8212; and indeed Infobright put out <a href="http://www.prnewswire.com/news-releases/bell-helicopter-selects-zend-and-infobright-to-improve-enterprise-reporting-application-for-better-business-intelligence-123458269.html">a customer-win press release</a> in that market a few days ago &#8212; Infobright&#8217;s current customer targets seem to be mainly:</p>
<ul>
<li>Web companies, many of which are already MySQL users.</li>
<li>Telecommunication and similar log data, especially in OEM relationships.</li>
<li>Trading/financial services, especially at mid-tier companies.</li>
</ul>
<p>Key aspects of Infobright 4.0 include:  <span id="more-4685"></span></p>
<ul>
<li>&#8220;Rough Query,&#8221; which lets you get approximate query results &gt;10X faster than you could get precise ones, which is a good thing for iterative <a href="../../../../../2011/03/03/investigative-analytics/">investigative analytics</a>.</li>
<li>The start of a plan &#8212; &#8220;DomainExpert&#8221; &#8212; to compress and otherwise optimize data in specific, commonly machine-generated patterns, such as URLs or CDRs (call detail records).</li>
<li>&#8220;Distributed Load Manager&#8221; &#8212; i.e., load nodes that are separate from (and more parallelized than) query nodes.</li>
<li>A Hadoop connector.</li>
<li>Lots of cleanup and <a href="../../../../../2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a>, although I haven&#8217;t paid close attention to which parts of that are truly new, and which were already handled in recent <a href="../../../../../2010/06/27/infobright-release-3-4/">Infobright point releases</a>.</li>
</ul>
<p>Items on that list focused on the machine-generated data market include:</p>
<ul>
<li>DomainExpert &#8212; obviously.</li>
<li>The Hadoop connector &#8212; also obviously.</li>
<li>The Distributed Load Manager &#8212; why would you need such load speeds unless the data is flowing in from machines?</li>
</ul>
<p>To understand Infobright Rough Query, recall the essence of <a href="../../../../../2007/10/22/infobright-brighthouse-mysql/">Infobright&#8217;s architecture</a>:</p>
<blockquote><p>Infobright’s core technical idea is to chop columns of data into 64K chunks, called <em>data packs,</em> and then store concise information about what’s in the packs. The more basic information is stored in <em>data pack nodes,*</em> one per data pack. If you’re familiar with Netezza <a href="../../../../../2006/09/20/netezza-vs-conventional-data-warehousing-rdbms/">zone maps</a>, data pack nodes sound like zone maps on steroids. They store maximum values, minimum values, and (where meaningful) aggregates, and also encode information as to which intervals between the min and max values do or don’t contain actual data values.</p></blockquote>
<p>I.e., a concise, imprecise representation of the database is always kept in RAM, in something Infobright calls the &#8220;Knowledge Grid.&#8221; Rough Query estimates query results based solely on the information in the Knowledge Grid &#8212; i.e., <strong>Rough Query always executes against information that&#8217;s already in RAM.</strong></p>
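<p>Here is a minimal sketch of how per-pack summaries can bound an answer without touching the data itself. It is not Infobright&#8217;s algorithm: the dictionary layout and function names are hypothetical, I&#8217;m assuming &#8220;64K&#8221; means 65,536 values per pack, and I only keep min, max, and row count, whereas the real data pack nodes store more.</p>

```python
PACK_SIZE = 65536  # assumption: "64K chunks" means 65,536 values per data pack

def build_pack_nodes(column, pack_size=PACK_SIZE):
    """Summarize each data pack by its min, max, and row count."""
    nodes = []
    for start in range(0, len(column), pack_size):
        pack = column[start:start + pack_size]
        nodes.append({"min": min(pack), "max": max(pack), "count": len(pack)})
    return nodes

def rough_count_greater_than(nodes, threshold):
    """Bound COUNT(*) WHERE value > threshold using only the summaries."""
    lower = upper = 0
    for node in nodes:
        if node["min"] > threshold:    # every row in the pack qualifies
            lower += node["count"]
            upper += node["count"]
        elif node["max"] > threshold:  # some rows may qualify; only the bound widens
            upper += node["count"]
        # otherwise no row in the pack can qualify
    return lower, upper

column = list(range(200000))  # toy column: 0 .. 199999
nodes = build_pack_nodes(column)
bounds = rough_count_greater_than(nodes, 100000)
exact = sum(1 for v in column if v > 100000)  # the full scan a rough query avoids
print(bounds, exact)
```

<p>The exact count always falls between the two bounds, and only packs that straddle the threshold contribute uncertainty. That is why an answer computed purely from in-RAM summaries can be both fast and usefully tight on data that arrives roughly sorted, as machine-generated data often does.</p>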
<p>To me, Rough Query is the most impressive part of the Infobright 4.0 announcement. DomainExpert sounds like it will be somewhat better than straightforward prefix/suffix compression, but Infobright hasn&#8217;t yet convinced me that the difference is substantial. Distributed Load Manager is indeed important, but only because Infobright doesn&#8217;t have a shared-nothing MPP (Massively Parallel Processing) option at this time. And the rest is mainly catch-up toward Infobright&#8217;s larger and more expensive peers.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/06/14/infobright-4-0/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>MySQL, hash joins and Infobright</title>
		<link>http://www.dbms2.com/2011/03/24/mysql-hash-joins-and-infobright/</link>
		<comments>http://www.dbms2.com/2011/03/24/mysql-hash-joins-and-infobright/#comments</comments>
		<pubDate>Fri, 25 Mar 2011 01:41:56 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Infobright]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4066</guid>
		<description><![CDATA[Over a 24 hour or so period, Daniel Abadi, Dmitriy Ryaboy and Randolph Pullen all remarked on MySQL&#8217;s lack of hash joins. (It relies on nested loops instead, which were state-of-the-art technology around the time of the Boris Yeltsin administration.) This led me to wonder &#8212; why is this not a problem for Infobright? Per [...]]]></description>
			<content:encoded><![CDATA[<p>Over a period of 24 hours or so, <a href="http://www.dbms2.com/2011/03/23/hadapt-commercialized-hadoopdb/">Daniel Abadi</a>, Dmitriy Ryaboy, and <a href="http://www.dbms2.com/2011/03/15/mysql-soundbites/">Randolph Pullen</a> all remarked on MySQL&#8217;s lack of hash joins. (It relies on nested loops instead, which were state-of-the-art technology around the time of the Boris Yeltsin administration.) This led me to wonder &#8212; why is this not a problem for Infobright?</p>
<p>Per Infobright chief scientist Dominik Slezak, the answer is</p>
<blockquote><p>Infobright  perform joins using its own optimization/execution layers (that actually  include hash join algorithms and advanced knowledge-grid-based nested  loop optimizations in particular).</p></blockquote>
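<p>For readers who want the difference spelled out, here is a textbook-style sketch of the two join strategies. It illustrates the general algorithms only, not MySQL&#8217;s or Infobright&#8217;s implementations; the toy tables and function names are made up for the example.</p>

```python
from collections import defaultdict

def nested_loop_join(left, right):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [(lrow, rrow) for lrow in left for rrow in right if lrow[0] == rrow[0]]

def hash_join(left, right):
    """Build a hash table on one input, then probe it with the other: roughly O(n + m)."""
    buckets = defaultdict(list)
    for rrow in right:                        # build phase
        buckets[rrow[0]].append(rrow)
    joined = []
    for lrow in left:                         # probe phase
        for rrow in buckets.get(lrow[0], []):
            joined.append((lrow, rrow))
    return joined

# Hypothetical toy tables, joined on their first field.
orders = [(1, "widget"), (2, "gadget"), (1, "gizmo"), (4, "doohickey")]
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]

nested = nested_loop_join(orders, customers)
hashed = hash_join(orders, customers)
print(nested == hashed, len(hashed))
```

<p>Both functions return the same rows. The difference is the work done: the nested loop touches every pair of rows, while the hash join reads each input once, which matters a great deal when neither table is small and there is no index to help.</p>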
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/03/24/mysql-hash-joins-and-infobright/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Updating our vendor client disclosures</title>
		<link>http://www.dbms2.com/2011/02/28/updating-our-vendor-client-disclosures/</link>
		<comments>http://www.dbms2.com/2011/02/28/updating-our-vendor-client-disclosures/#comments</comments>
		<pubDate>Mon, 28 Feb 2011 08:03:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[About this blog]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[MarkLogic]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Schooner Information Technology]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Tableau Software]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3906</guid>
		<description><![CDATA[From time to time, I disclose our vendor client lists. Another iteration is below. To be clear: This is a list of Monash Advantage members. All our vendor clients are Monash Advantage members, unless &#8230; &#8230; we work with them primarily in their capacity as technology users. (A large fraction of our user clients happen [...]]]></description>
			<content:encoded><![CDATA[<p>From time to time, I <a href="http://www.monashreport.com/2010/01/06/updating-our-disclosures/">disclose</a> our vendor client lists. Another iteration is below. To be clear:</p>
<ul>
<li>This is a list of <a href="http://www.monash.com/advantage.html"><strong><em>Monash Advantage</em></strong></a> members.</li>
<li>All our vendor clients are <strong><em>Monash Advantage</em></strong> members, unless &#8230;</li>
<li>&#8230; we work with them primarily in their capacity as technology users. (A large fraction of our user clients happen to be SaaS vendors.)</li>
<li>We do not usually disclose our user clients.</li>
<li>We do not usually disclose our venture capital clients, nor those who invest in publicly-traded securities.</li>
<li>Included in the list below are two expired <strong><em>Monash Advantage</em></strong> members who haven&#8217;t said they will renew, as mentioned in <a href="http://www.strategicmessaging.com/money-analyst-attention-and-implied-analyst-endorsement/2011/02/28/">my recent post on analyst bias</a>. (You can probably imagine a couple of reasons for that obfuscation.)</li>
</ul>
<p>With that said, our vendor client disclosures at this time are:</p>
<ul>
<li>Aster Data</li>
<li>Cloudera</li>
<li>CodeFutures/dbShards</li>
<li>Couchbase</li>
<li>EMC/Greenplum</li>
<li>Endeca</li>
<li>IBM/Netezza</li>
<li>Infobright</li>
<li>Intel</li>
<li>MarkLogic</li>
<li>ParAccel</li>
<li>QlikTech</li>
<li>salesforce.com/database.com</li>
<li>SAND Technology</li>
<li>SAP/Sybase</li>
<li>Schooner Information Technology</li>
<li>Skytide</li>
<li>Splunk</li>
<li>Teradata</li>
<li>Vertica</li>
</ul>
<p><span id="more-3906"></span>That list includes the two I&#8217;m obfuscating, plus one more who just emailed to say a signed renewal contract is arriving this week. It does not include others who, less concretely, have said they will sign up soon.</p>
<p>Also, I guess there&#8217;s a bit of a gray area for Tableau. As far as I&#8217;m concerned, I&#8217;m doing <a href="http://www.dbms2.com/2011/02/12/upcoming-webinar-on-investigative-analytics/">an upcoming co-sponsored webinar</a> just for <em><strong>Monash Advantage</strong></em> member Aster Data. Indeed, I declined to contract with or bill Tableau directly for its share,  because I had no good way to do that paperwork. But even so, Tableau is a cosponsor, was involved in the planning discussions and, behind the scenes, is surely footing part of the bill.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/02/28/updating-our-vendor-client-disclosures/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Comments on the Gartner 2010/2011 Data Warehouse Database Management Systems Magic Quadrant</title>
		<link>http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/</link>
		<comments>http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/#comments</comments>
		<pubDate>Sat, 05 Feb 2011 15:49:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[1010data]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Workload management]]></category>
		<category><![CDATA[illuminate Solutions]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3744</guid>
		<description><![CDATA[Edit: Comments on the February, 2012 Gartner Magic Quadrant for Data Warehouse Database Management Systems &#8212; and on the companies reviewed in it &#8212; are now up. The Gartner 2010 Data Warehouse Database Management Systems Magic Quadrant is out. I shall now comment, just as I did to varying degrees on the 2009, 2008, 2007, [...]]]></description>
			<content:encoded><![CDATA[<p><em>Edit: Comments on the February, 2012 <a href="http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/">Gartner Magic Quadrant for Data Warehouse Database Management Systems</a> &#8212; and on the companies reviewed in it &#8212; are now up.</em></p>
<p>The <a href="http://www.gartner.com/technology/media-products/reprints/teradata/vol3/article1/article1.html">Gartner 2010 Data Warehouse Database Management Systems Magic Quadrant</a> is out. I shall now comment, just as I did to varying degrees on the <a href="../../../../../2010/02/10/gartner-magic-quadrant-data-warehouse-2009-2010/">2009</a>, <a href="../../../../../2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">2008</a>, <a href="../../../../../2007/10/19/gartner-2007-magic-quadrant-for-data-warehouse-database-management-systems/">2007</a>, and <a href="../../../../../2006/10/03/vendor-segmentation-for-data-warehouse-dbms/">2006</a> Gartner Data Warehouse Database Management System Magic Quadrants.</p>
<p><em>Note: Links to Gartner Magic Quadrants tend to be unstable. Please alert me if any problems arise; I&#8217;ll edit accordingly.</em></p>
<p>In <a href="../../../../../2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">my comments on the 2008 Gartner Data Warehouse Database Management Systems Magic Quadrant</a>, I observed that <strong>Gartner&#8217;s &#8220;completeness of vision&#8221; scores were generally pretty reasonable,</strong> but their<strong> &#8220;ability to execute&#8221; rankings were somewhat bizarre;</strong> the same remains true this year. For example, Gartner ranks Ingres higher by that metric than Vertica, Aster Data, ParAccel, or Infobright. Yet each of those companies is growing nicely and delivering products that meet serious cutting-edge analytic DBMS needs, neither of which has been true of Ingres since about 1987.  <span id="more-3744"></span></p>
<p>The general list of &#8220;market forces, end-user expectations and vendors&#8217; resulting solution approaches&#8221; at the top of the 2010 Gartner Data Warehouse Database Management System Magic Quadrant article is a mixed bag. Following Gartner&#8217;s order, I&#8217;ll address those first, and particular companies cited afterwards. Specific items and comments include:</p>
<ul>
<li><strong>&#8220;Increased demand for optimization techniques and performance enhancement.&#8221;</strong> Gartner seems to be saying that data warehouse DBMS buyers want lists of specific, esoteric performance features. Well, buyers always want their DBMS to run fast, and they&#8217;d like the products to be mature enough to have been through a few rounds of <a href="../../../../../2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a>, but otherwise I&#8217;m not sure I&#8217;d put that at the top of my list.</li>
<li><strong>&#8220;The argument made by purchasing departments that buying power increases when dealing with a single, incumbent vendor.&#8221;</strong> I agree that <a href="../../../../../2011/02/02/exadata-notes/">vendor consolidation and account control</a> are a huge part of the Oracle, Microsoft, IBM and even Teradata stories. (Vertica can prove it&#8217;s 10X more price-performant than Oracle and still not get the business.) But it&#8217;s not just about price negotiations; once annual maintenance is included, one has to squint pretty hard to see Oracle as a low-cost alternative. Also important is reducing the total number of product-specific skill sets needed on the IT staff.</li>
<li><strong>&#8220;Prepackaged, prebalanced warehouse environments delivered using data warehouse appliances.&#8221;</strong> Yep. To varying extents, Oracle, Microsoft, Teradata, and IBM are all committed to designed-hardware strategies.</li>
<li><strong>&#8220;Expectations for the delivery of on-site POCs.&#8221;</strong> Honestly, not as many buyers insist on on-site Proofs of Concept as should. Still, Oracle is shameful in its reluctance to do them. (Teradata tries to avoid them too, for obvious reasons of expense, but is much more gracious about capitulating when the buyer insists.)</li>
<li><strong>&#8220;Cost controls and data warehouse performance management.&#8221;</strong> See next comment.</li>
<li><strong>&#8220;Demands for delivering a fully mixed workload.&#8221;</strong> I&#8217;d have phrased the workload management and administrative tools points rather differently than this, but so be it.</li>
<li><strong>&#8220;Demands for departmental analytics delivered quickly via data marts.&#8221;</strong> Agreed. Data-mart-only installations are a huge part of the analytic DBMS market. <a href="../../../../../2009/06/08/the-future-of-data-marts/">Data mart spin-out</a> is also important.</li>
<li><strong>&#8220;Wider indexing and fast performance within clusters of data, delivered via column-based solutions.&#8221;</strong> This bizarrely seems to conflate column stores and parallel processing (both of which are of course highly important).</li>
<li><strong>&#8220;A wave of new data warehouse implementers seeking fast-track, low-risk delivery.&#8221;</strong> Well, yes. Netezza noticed that quite some years ago. And by now the <a href="../../../../../2010/04/12/enterprise-data-warehouse-edw-myt/">long-gestation EDW (Enterprise Data Warehouse)</a> is widely disliked.</li>
<li><strong>&#8220;Global organizations seeking distributed solutions as potential architecture.&#8221;</strong> If this is the MPP point, it&#8217;s oddly phrased. If this is a suggestion that data warehouses should be partitioned across wide-area networks, it&#8217;s just plain odd. If it&#8217;s a reiteration that departments like to control their own data marts, I agree. And if it&#8217;s a comment on keep-data-in-the-country privacy laws, it could be the most prescient thing Donald Feinberg has said in many years.</li>
</ul>
<p>Long though it is, that list of general items and issues for the 2010 Gartner Data Warehouse Database Management System Magic Quadrant has some gaps. Most glaringly, I don&#8217;t see any references to <a href="../../../../../2011/01/24/analytic-computing-system/">advanced analytics</a> in general, or even to the specific case of <a href="../../../../../2010/05/15/further-clarifying-in-database-mpp-sas/">integrated predictive analytics</a>. There&#8217;s also nothing about solid-state memory or other storage-technology considerations, although in fairness it&#8217;s still early days for much of what vendors conceive of as competitive differentiation in those respects.</p>
<p>Here are some vendor-specific comments on the 2010 Gartner Data Warehouse Database Management System Magic Quadrant:</p>
<ul>
<li>It&#8217;s pretty bizarre to compare <strong>1010data</strong> to database.com or Microsoft Azure. Kognitio would be a better choice. So would cloud-hosted instances of Vertica, Aster Data nCluster, or others.</li>
<li>Gartner&#8217;s comments on <strong>Aster Data</strong> and nCluster are actually pretty reasonable.</li>
<li>Gartner&#8217;s comments on <strong>EMC/Greenplum</strong> are a bit Kool-Aid-drinky, and don&#8217;t account for the inevitable flailing that occurs right after an acquisition. But otherwise they&#8217;re pretty reasonable.</li>
<li>I don&#8217;t take <strong>IBM&#8217;s</strong> super-comprehensive-all-inclusive architectural stories as seriously as Gartner does.</li>
<li>I don&#8217;t take <strong>Netezza&#8217;s</strong> small stable of OEM partners as seriously as Gartner does. I also don&#8217;t share Gartner&#8217;s optimism for the continuation of Netezza&#8217;s NEC partnership in the face of IBM&#8217;s Netezza ownership.</li>
<li>I&#8217;m even more skeptical about <a href="../../../../../2008/03/27/the-illuminate-guys-have-a-cto-blog/">illuminate</a> than Gartner is.</li>
<li>I&#8217;m delighted that Gartner has adopted my phrase <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a> (<strong>Infobright</strong> is one of several firms pushing that one).</li>
<li>&#8220;Only open-source column-store DBMS&#8221; is a bit exaggerated, but Infobright is indeed the only one with serious traction, or offered by a serious analytic DBMS vendor.</li>
<li>What Gartner said in connection with <strong>Ingres</strong> is too inaccurate to deserve detailed attention.</li>
<li>While Gartner&#8217;s write-up of <strong>Kognitio</strong> is a bit confused, that&#8217;s excusable. Kognitio&#8217;s strategy changes often.</li>
<li>I&#8217;m not persuaded by the claim of low <strong>Microsoft</strong> TCO. The days when Microsoft&#8217;s tools were vastly better than the competition&#8217;s are long gone. And using an OLTP DBMS for data warehousing generally takes more people effort than using something more purpose-built.</li>
<li>Gartner is right to ding <strong>Oracle</strong> for high prices, high people costs, and unwillingness to do onsite POCs.</li>
<li>Gartner is right that <strong>Exadata</strong> is a huge improvement over non-Exadata Oracle data warehousing.</li>
<li>Gartner is right to suggest that Exadata can easily handle data warehouses over 20 terabytes in size, but wrong to suggest that software-only Oracle also can. Just because the pain is less than it was with earlier releases of Oracle doesn&#8217;t mean it isn&#8217;t still bad.</li>
<li>Gartner&#8217;s comments on <strong>ParAccel</strong> are pretty reasonable.</li>
<li>Gartner&#8217;s comments on compression in connection with <strong>SAND</strong> make no technical sense (tokenization is a key form of columnar compression, not an alternative to it). Also, SAP&#8217;s acquisition of Sybase is a business challenge for SAND, not a technical one.</li>
<li>Unless I&#8217;m forgetting something, <strong>Sybase IQ</strong> has no more in-database data mining than any other Fuzzy Logix partner does.</li>
<li>Gartner failed to note that, like other DBMS dating back to the 1990s and before, Sybase IQ is more complex to administer than some newer products are.</li>
<li>Gartner&#8217;s take on <strong>Teradata</strong> is pretty reasonable.</li>
<li>Gartner&#8217;s take on <strong>Vertica</strong>, while sloppy, is basically sensible. However, Gartner failed to note that Vertica is a laggard in non-query analytics. (I am sure those deficiencies are being addressed, but Vertica&#8217;s competitors are moving ahead as well.)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
	</channel>
</rss>

