<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Microsoft and SQL*Server</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/microsoft-sqlserver/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 02 Sep 2010 09:06:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Netezza and IBM DB2 approaches to compression</title>
		<link>http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/</link>
		<comments>http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/#comments</comments>
		<pubDate>Mon, 21 Jun 2010 12:05:47 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2320</guid>
		<description><![CDATA[Thursday, I spent 3 ½ hours talking with 10 of Netezza&#8217;s more senior engineers. Friday, I talked for 1 ½ hours with IBM Fellow and DB2 Chief Architect Tim Vincent, and we agreed we needed at least 2 hours more. In both cases, the compression part of the discussion seems like a good candidate to [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Thursday, <a href="http://www.dbms2.com/2010/06/21/netezza-database-software-technology-overview/" >I spent 3 ½ hours talking with 10 of Netezza&#8217;s more senior engineers</a>. Friday, I talked for 1 ½ hours with IBM Fellow and DB2 Chief Architect Tim Vincent, and we agreed we needed at least 2 hours more. In both cases, the compression part of the discussion seems like a good candidate to split out into a separate post. So here goes.</p>
<p style="margin-bottom: 0in;">When you sell a row-based DBMS, as Netezza and IBM do, there are a couple of approaches you can take to compression. First, you can compress the blocks of rows that your DBMS naturally stores. Second, you can compress the data in a column-aware way. Both Netezza and IBM have chosen completely column-oriented compression, with no block-based techniques entering the picture to my knowledge. But that&#8217;s about as far as the similarity between Netezza and IBM compression goes.  <span id="more-2320"></span></p>
<p style="margin-bottom: 0in;"><strong>IBM&#8217;s basic DB2 compression strategy</strong> is remarkably simple. In every table (not column), or in each range partition of a range-partitioned table, <strong>the 4096 most common* values are identified; these are all encoded into 12-bit strings</strong>. And that&#8217;s that. This has been happening since DB2 9.1, released 4 ½ years ago. DB2&#8217;s compression persists through logs, buffer pools (i.e., RAM cache), and so on. In DB2 9.7, the most recent release, IBM extended compression to a few areas it hadn&#8217;t reached before, such as log-based replication, native XML, and CLOBs (Character Large OBjects) that happen not to be too big.</p>
<p style="margin-bottom: 0in;"><em>*Actually, I&#8217;d presume it&#8217;s not exactly the “most common”; there surely is some minimum length of a value to be encoded, or some bias toward length. Also, the determination of what to encode is probably a little imprecise. E.g., I forgot to ask whether the choice of values ever changes as data gets updated.</em></p>
<p style="margin-bottom: 0in;">The sophisticated part of DB2&#8217;s simple compression strategy is its breadth of applicability; DB2 compression can apply to:</p>
<ul>
<li>Values in columns (numeric, character, whatever)</li>
<li>Substrings of values in columns</li>
<li>Groups of columns (e.g., city/state/zip code)</li>
</ul>
<p style="margin-bottom: 0in;">Except for the 4096 values limit, that sounds at least as flexible as the <a href="http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/" >Rainstor/Clearpace compression approach</a>.</p>
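<p style="margin-bottom: 0in;">The table-wide dictionary idea described above can be sketched roughly as follows. This is only an illustration, not IBM&#8217;s implementation: it picks entries purely by frequency (which, per the footnote, is probably not exactly what DB2 does), it ignores substrings and column groups, and the function names and sample values are made up for the example.</p>

```python
from collections import Counter

CODE_BITS = 12
DICT_SIZE = 2 ** CODE_BITS  # 4096 entries, matching the limit in the post


def build_dictionary(values):
    """Map the DICT_SIZE most frequent values to small integer codes."""
    most_common = Counter(values).most_common(DICT_SIZE)
    return {value: code for code, (value, _count) in enumerate(most_common)}


def dictionary_encode(values, dictionary):
    """Emit ('code', n) for dictionary hits and ('raw', value) otherwise."""
    return [
        ("code", dictionary[v]) if v in dictionary else ("raw", v)
        for v in values
    ]


def dictionary_decode(tokens, dictionary):
    """Invert dictionary_encode() by looking codes up in reverse."""
    reverse = {code: value for value, code in dictionary.items()}
    return [reverse[payload] if kind == "code" else payload
            for kind, payload in tokens]


# Hypothetical sample: values from several columns of one table share a dictionary.
table_values = ["Boston", "MA", "02110", "Boston", "MA", "02110", "Waltham", "MA"]
dictionary = build_dictionary(table_values)
tokens = dictionary_encode(table_values, dictionary)
```

<p style="margin-bottom: 0in;">Because every code fits in 12 bits, a repeated value costs a fixed 12 bits no matter how long the original was, which is where the savings would come from.</p>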
<p style="margin-bottom: 0in;"><strong>Netezza,</strong> unlike IBM, takes a grab-bag approach to compression: try out a bunch of techniques, see which work best, and incorporate those in the product. <a href="http://www.enzeecommunity.com/blogs/nzblog/2008/05/15/issue-19-the-compress-engine-the-netezza-philosophy" onclick="javascript:pageTracker._trackPageview('/www.enzeecommunity.com');">Netezza first introduced compression a couple of years ago,</a> for numeric columns only, especially integer. Techniques used in Netezza numeric compression include but are not limited to:</p>
<ul>
<li>Delta compression, wherein you store the increment between a value and its predecessor rather than a whole new value.</li>
<li>Ways of indicating that a value or increment was just the same as in the row before.</li>
</ul>
<p style="margin-bottom: 0in;">This was via something called Compress Engine,* now being renamed to Compress Engine 1. Netezza&#8217;s new Compress Engine 2 improves on what Netezza did in Compress Engine 1 for numeric data, most notably by trimming away excess field length. (Netezza says it got 28% better compression on a test data set with almost no character strings, primarily from that enhancement.) Further, Netezza Compress Engine 2 adds new compression techniques, allowing it to handle VARCHAR (i.e., character strings) as well.</p>
<p style="margin-bottom: 0in;"><em>*Fortunately, the original name or at least description of “Compiled Tables” is retreating ever more from view.</em></p>
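<p style="margin-bottom: 0in;">For illustration, the two numeric techniques listed above (deltas, plus a cheap way to say &#8220;same as the row before&#8221;) might look something like this. It is a sketch under my own assumptions; the function names and the sample column are hypothetical, and run-length grouping is just one plausible way to mark repeats, not necessarily what Compress Engine does.</p>

```python
from itertools import groupby


def delta_encode(column):
    """Keep the first value, then each increment over its predecessor."""
    if not column:
        return []
    deltas = [column[0]]
    for previous, current in zip(column, column[1:]):
        deltas.append(current - previous)
    return deltas


def delta_decode(deltas):
    """Rebuild the column with a running sum of the increments."""
    column = []
    total = 0
    for delta in deltas:
        total += delta
        column.append(total)
    return column


def collapse_repeats(deltas):
    """Store each run of identical increments once, with its length."""
    return [(delta, len(list(run))) for delta, run in groupby(deltas)]


def expand_repeats(runs):
    """Invert collapse_repeats()."""
    return [delta for delta, length in runs for _ in range(length)]


# Hypothetical, mostly sequential integer column.
order_ids = [1000, 1001, 1002, 1003, 1003, 1010]
deltas = delta_encode(order_ids)   # [1000, 1, 1, 1, 0, 7]
runs = collapse_repeats(deltas)    # [(1000, 1), (1, 3), (0, 1), (7, 1)]
```

<p style="margin-bottom: 0in;">The increments are small numbers even when the values themselves are large, so they need fewer bits, and runs of equal increments collapse further.</p>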
<p style="margin-bottom: 0in;">Netezza&#8217;s Compress Engine 2 has two ways to compress character fields/text strings: <strong>prefix compression</strong> and <strong>Huffman coding</strong>. By way of contrast, Netezza tested suffix compression and decided it wasn&#8217;t beneficial enough to bother messing with.</p>
<ul>
<li>The idea behind prefix compression is that if two strings start with the same characters, for the second one you only have to record the part that&#8217;s different. Prefix compression has a lot of the same merits as delta compression; like delta compression, it works best on sorted columns. (An example of where prefix compression makes obvious sense is URLs, which tend to all start in similar ways.)</li>
<li>In Netezza&#8217;s version of Huffman coding, the alphabet is encoded symbol-by-symbol, with more common characters getting codes of shorter length. These codes are chosen on a column-by-column basis. (I presume the “/” character gets a shorter code in a URL column than it would, for example, in one that stored addresses.)</li>
</ul>
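<p style="margin-bottom: 0in;">A minimal sketch of the prefix idea, assuming a sorted column and storing each string as &#8220;how many leading characters to reuse from the previous string, plus the rest.&#8221; The function names and sample URLs are hypothetical; Netezza&#8217;s actual on-disk format was not described to me at this level.</p>

```python
import os


def prefix_encode(sorted_strings):
    """Store (shared_prefix_length, remainder) relative to the previous string."""
    encoded = []
    previous = ""
    for s in sorted_strings:
        shared = len(os.path.commonprefix([previous, s]))
        encoded.append((shared, s[shared:]))
        previous = s
    return encoded


def prefix_decode(encoded):
    """Rebuild each string from the previous one plus the stored remainder."""
    decoded = []
    previous = ""
    for shared, remainder in encoded:
        s = previous[:shared] + remainder
        decoded.append(s)
        previous = s
    return decoded


# Hypothetical sorted URL column.
urls = ["http://example.com/a", "http://example.com/about", "http://example.com/b"]
compressed = prefix_encode(urls)
# [(0, 'http://example.com/a'), (20, 'bout'), (19, 'b')]
```

<p style="margin-bottom: 0in;">Sorting matters because neighbors in a sorted column share the longest prefixes; on an unsorted column most entries would reuse little or nothing.</p>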
<p style="margin-bottom: 0in;">While I didn&#8217;t ask explicitly, it seems pretty obvious that Compress Engine 2&#8217;s functionality is a strict superset of Compress Engine 1&#8217;s. <a href="http://www.dbms2.com/2010/06/21/netezza-silicon-balance/" >Netezza is going to run Compress Engines 1 and 2 side by side</a>, but expects pages to move from Compress Engine 1&#8217;s purview to Compress Engine 2&#8217;s as part of the new “table grooming” process.</p>
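<p style="margin-bottom: 0in;">To make the per-column Huffman point from the list above concrete, here is a textbook Huffman construction run separately on two tiny, made-up columns. It is a generic sketch, not Netezza&#8217;s code; the function names and the sample data are my own assumptions.</p>

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(column):
    """Build one code table from every character stored in a column."""
    frequencies = Counter("".join(column))
    if not frequencies:
        return {}
    if len(frequencies) == 1:
        # A lone symbol still needs one bit.
        return {symbol: "0" for symbol in frequencies}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(weight, next(tiebreak), {symbol: ""})
            for symbol, weight in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; their symbols gain one leading bit.
        weight_a, _, codes_a = heapq.heappop(heap)
        weight_b, _, codes_b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes_a.items()}
        merged.update({s: "1" + c for s, c in codes_b.items()})
        heapq.heappush(heap, (weight_a + weight_b, next(tiebreak), merged))
    return heap[0][2]


def huffman_encode(text, codes):
    """Concatenate the bit strings for each character of text."""
    return "".join(codes[ch] for ch in text)


# Hypothetical columns: path-like strings versus street addresses.
url_column = ["/a/b/", "/a/c/"]
address_column = ["12 Main St", "4 Elm St / Apt 2"]
url_codes = huffman_codes(url_column)
address_codes = huffman_codes(address_column)
```

<p style="margin-bottom: 0in;">In this toy data the “/” character dominates the path column and gets a one-bit code there, while in the address column it is rare and gets a longer one, which is the effect I was presuming above.</p>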
<p><em><strong>Related links</strong></em></p>
<ul>
<li>IBM kindly permitted me to post some of <a href="http://www.monash.com/uploads/ibm-db2-compression-june-2010.pdf" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">its slides in the area of compression</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/cc280464.aspx" onclick="javascript:pageTracker._trackPageview('/msdn.microsoft.com');">Microsoft SQL Server seems to rely on prefix and dictionary compression</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Notes on SciDB and scientific data management</title>
		<link>http://www.dbms2.com/2010/05/22/scidb-and-scientific-database-management/</link>
		<comments>http://www.dbms2.com/2010/05/22/scidb-and-scientific-database-management/#comments</comments>
		<pubDate>Sat, 22 May 2010 08:04:24 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[GIS and geospatial]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[SciDB]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[eBay]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2178</guid>
		<description><![CDATA[I firmly believe that, as a community, we should look for ways to support scientific data management and related analytics. That&#8217;s why, for example, I went to XLDB3 in Lyon, France at my own expense. Eight months ago, I wrote about issues in scientific data management. Here&#8217;s some of what has transpired since then.
The main [...]]]></description>
			<content:encoded><![CDATA[<p>I firmly believe that, as a community, we should look for ways to support scientific data management and related analytics. That&#8217;s why, for example, I went to XLDB3 in Lyon, France at my own expense. Eight months ago, I wrote about <a href="http://www.dbms2.com/2009/10/03/issues-in-scientific-data-management/" >issues in scientific data management</a>. Here&#8217;s some of what has transpired since then.</p>
<p>The main new activity I know of has been in the open source <a href="http://www.scidb.org/" onclick="javascript:pageTracker._trackPageview('/www.scidb.org');">SciDB</a> project.   <span id="more-2178"></span></p>
<ul>
<li>A company called Zetics has been started to commercialize SciDB. As of now, the entire staff seems to be CEO Marilyn Matz, techie Paul Brown, and part of Mike Stonebraker. Marilyn says Zetics has some venture capital, but even under NDA didn&#8217;t tell me who it was from. Zetics does not have its own web site.</li>
<li>Marilyn tells me there are 20-25 contributors to SciDB, led by Paul Brown and Mike Stonebraker. Brown is full-time. Persistent Systems has been donating the efforts of a few of its employees. Some <a href="http://www.lsst.org/lsst" onclick="javascript:pageTracker._trackPageview('/www.lsst.org');">LSST</a> folks have been doing SciDB work backed by grant money. Most or all of the rest seem to be purer volunteers. Some Russians have been particularly active.</li>
<li>Release 0.5 of SciDB is expected in June. Release 1.0 is expected in September. This is a rewrite; prior demo code has been scrapped. Perhaps not coincidentally, it&#8217;s also a small slip from prior project plans.</li>
<li>The array data model is an example of what&#8217;s being implemented first. (Duh &#8212; you can&#8217;t have a DBMS without a data model.) Support for uncertainty is an example of what&#8217;s been deferred until later.</li>
<li>As has been clear since XLDB3 last August, one major target market for SciDB is genomic research.</li>
<li>It&#8217;s obvious that the oil and gas industry, with all its geospatial data, should be interested in SciDB. But there&#8217;s not much activity in that regard; outreach is evidently needed. If you can think of somebody in that sector (or anywhere else) who should be alerted to SciDB, please ping them.</li>
<li>Interest from web analytics users in SciDB seems to have receded a bit from the days when eBay almost funded the project.</li>
</ul>
<p>In other scientific data management news,</p>
<ul>
<li>Microsoft put out a book called <a href="http://research.microsoft.com/en-us/collaboration/fourthparadigm/" onclick="javascript:pageTracker._trackPageview('/research.microsoft.com');">The Fourth Paradigm</a> on scientific database management. The whole thing can be downloaded, very officially, as a giant PDF. I think it&#8217;s worth skimming. I don&#8217;t think it&#8217;s worth actually reading. (I did read it.)</li>
<li><a href="http://www-conf.slac.stanford.edu/xldb/" onclick="javascript:pageTracker._trackPageview('/www-conf.slac.stanford.edu');">XLDB4</a> will be at Stanford October 5-7. Unlike prior XLDBs, it will have an open (i.e., no invitation required) part.</li>
</ul>
<p>Finally, you surely are aware of the whole &#8220;Climategate&#8221; mess, in which major climate researchers&#8217; email was hacked and many unkind conclusions were drawn. Well, one of the most technical parts of the disclosure was in a long series of Read Me files, in which an unfortunate programmer lamented <a href="http://di2.nu/foia/HARRY_READ_ME-20.html" onclick="javascript:pageTracker._trackPageview('/di2.nu');">the difficulty of reconstructing published results from files at hand</a>. These turned out to illustrate a classic problem that SciDB or alternatives are meant to solve:</p>
<ul>
<li>Raw data was impossible to use without various adjustments to regularize it (the word &#8220;regridding&#8221; comes up a lot, for example). Massaging was needed before analytics could be done on it.</li>
<li>The raw data was thrown out or lost, and could not be reconstructed (why they couldn&#8217;t have asked the suppliers of the data to give it to them again was unclear in this case, since it wasn&#8217;t original experimental data).</li>
<li>It was thus impossible to massage the data in any new or improved way.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/22/scidb-and-scientific-database-management/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Intelligent Enterprise’s Editors’/Editor’s Choice list for 2010</title>
		<link>http://www.dbms2.com/2010/02/11/intelligent-enterprise-editors-choice-201/</link>
		<comments>http://www.dbms2.com/2010/02/11/intelligent-enterprise-editors-choice-201/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 23:13:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Intersystems and Cache']]></category>
		<category><![CDATA[Jaspersoft]]></category>
		<category><![CDATA[Kalido]]></category>
		<category><![CDATA[MarkLogic]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Pentaho]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Tableau Software]]></category>
		<category><![CDATA[Talend]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1578</guid>
		<description><![CDATA[As he has before, Intelligent Enterprise Editor Doug Henschen

Personally selected annual lists of 12 &#8220;Most influential&#8221; companies and 36 &#8220;Companies to watch&#8221; in analytics- and database-related sectors.
Made it clear that these are his personal selections.
Nonetheless has called it an Editors&#8217; Choice list, rather than Editor&#8217;s Choice.  

(Actually, he&#8217;s really called it an &#8220;award.&#8221;)
People advising [...]]]></description>
			<content:encoded><![CDATA[<p>As he has <a href="http://www.dbms2.com/2009/01/12/intelligent-enterprises-editorseditors-choice-list/" >before</a>, <em>Intelligent Enterprise</em> Editor Doug Henschen</p>
<ul>
<li>Personally selected <a href="http://intelligent-enterprise.informationweek.com/showArticle.jhtml;jsessionid=IANLOXCT2244BQE1GHPCKH4ATMY32JVN?articleID=222900034&amp;pgno=1" onclick="javascript:pageTracker._trackPageview('/intelligent-enterprise.informationweek.com');">annual lists</a> of 12 &#8220;Most influential&#8221; companies and 36 &#8220;Companies to watch&#8221; in analytics- and database-related sectors.</li>
<li>Made it clear that these are his personal selections.</li>
<li>Nonetheless has called it an Editors&#8217; Choice list, rather than Editor&#8217;s Choice. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ul>
<p>(Actually, he&#8217;s really called it an &#8220;award.&#8221;)</p>
<p><span id="more-1578"></span>People advising Doug &#8212; who come to think of it actually are Contributing Editors to <em>Intelligent Enterprise</em> or something like that &#8212; included Cindi Howson, Seth Grimes, three others, and me.</p>
<p>And if past is prologue, I will now get a flood of PR emails calling my attention to this award that I already have both participated in and blogged about. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>As usual, the sense:nonsense ratio on these lists was pleasingly high. Analytic DBMS vendors cited included IBM, Microsoft, Netezza, Oracle, Sybase, and Teradata in the &#8220;Most influential&#8221; group, with Aster, Greenplum, HP, Infobright, and Vertica among the &#8220;To watch&#8221; crowd. It&#8217;s tough to argue with those selections, whose most questionable element is probably the not-ridiculous supposition that HP could do something interesting over the coming year. Cloudera and Intersystems also made the list, deservedly.</p>
<p>All three of QlikTech, Tableau, and TIBCO made the list, which is appropriate given the potential for and interest in interactive data exploration technology. The BI majors, independent or otherwise, were all on as well. In text mining, Doug included Attensity and Clarabridge, which I think is exactly right. (Plus OpenCalais.) Upon reflection, I probably should have nominated Mark Logic, even though most of its business is non-enterprise; but hey, nobody&#8217;s perfect, and the same goes for lists. Open source was well represented, with Apache, Actuate, Jaspersoft, Eclipse, Infobright, Nuxeo and R all being cited (but not Ingres or Pentaho). Kalido made the list, with my endorsement, their silly I-CASE-like marketing messaging notwithstanding.</p>
<p>Speaking of imperfections: there are only a few category names, so category assignments can be pretty bizarre. (In an ideal world, middleware wouldn&#8217;t be included under &#8220;enterprise applications&#8221;.) Greenplum hasn&#8217;t really &#8220;extended&#8221; its DBMS with a &#8220;cloud&#8221; option. As much as I&#8217;d like Netezza to be more influential than SAP, that&#8217;s probably not the best way to rank them. And there are a number of &#8220;This company is on a roll!&#8221; kinds of comments that I wouldn&#8217;t necessarily endorse.</p>
<p>But those are all nitpicks. On the whole, it&#8217;s another nice job.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/11/intelligent-enterprise-editors-choice-201/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Facts and rumors</title>
		<link>http://www.dbms2.com/2009/09/30/facts-and-rumors/</link>
		<comments>http://www.dbms2.com/2009/09/30/facts-and-rumors/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 06:21:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[DATAllegro]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Telecommunications]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=963</guid>
		<description><![CDATA[
Vertica is putting out a press 	release today touting its 100th customer, and talking of triple 	digit growth last year.
Multiple sources have told me that 	the DATAllegro system is being thrown out of Dell, so evidently Dell is telling this to one and all. If that goes 	through, this would presumably leave TEOCO as DATAllegro&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<ul>
<li>Vertica is putting out a press release today touting its 100th customer, and talking of triple-digit growth last year.</li>
<li>Multiple sources have told me that the DATAllegro system is being thrown out of <a href="http://www.dbms2.com/2009/03/02/closing-the-book-on-the-datallegro-customer-base/" >Dell</a>, so evidently Dell is telling this to one and all. If that goes through, this would presumably leave <a href="http://www.dbms2.com/2008/05/23/data-warehouse-appliance-power-user-teoco/" >TEOCO</a> as DATAllegro&#8217;s single happy customer. (I haven&#8217;t checked with Microsoft for its view.)</li>
<li>A rumor has it that InfiniBand technology vendor Voltaire, Ltd. privately claims triple-digit sales of switches for Exadata 1 (I think that would be one switch per Exadata installation, not per rack). Based just on a quick glance, this is far from confirmed by Voltaire&#8217;s earnings <a href="http://seekingalpha.com/article/135775-voltaire-ltd-q1-2009-earnings-call-transcript" onclick="javascript:pageTracker._trackPageview('/seekingalpha.com');">conference call</a> <a href="http://seekingalpha.com/article/152278-voltaire-q2-2009-earnings-transcript" onclick="javascript:pageTracker._trackPageview('/seekingalpha.com');">transcripts</a> or <a href="http://sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0001401678&amp;owner=exclude&amp;count=40" onclick="javascript:pageTracker._trackPageview('/sec.gov');">SEC filings</a>. However, the most recent transcript does seem to indicate Voltaire got multiple Exadata deals in the telecommunications sector, and suggests some Exadata penetration in other sectors as well.</li>
<li>I was told of a classified-agency user that has &gt;1 petabyte of data on Exadata 1 and 600 terabytes or so on Netezza. My not-obviously-biased source says the agency is distinctly happier with Netezza than Exadata.</li>
<li>Like <a href="http://paraccel.com/data_warehouse_blog/?p=104" onclick="javascript:pageTracker._trackPageview('/paraccel.com');">ParAccel</a>, <a href="http://www.theregister.co.uk/2009/09/29/tpc_slaps_oracle/" onclick="javascript:pageTracker._trackPageview('/www.theregister.co.uk');">Oracle just got dinged for TPC-related misbehavior</a>.</li>
<li>Rumor has it that Sun has no intention of helping ParAccel rerun its withdrawn TPC-H benchmark.</li>
<li>ParAccel has withdrawn the claim from its home page to be the &#8220;CERTIFIED&#8221; price-performance leader. This seems to confirm that the claim was a reference to the TPC-H. In my opinion, that was a gross misrepresentation of what the TPC-H shows.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/30/facts-and-rumors/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Xkoto Gridscale highlights</title>
		<link>http://www.dbms2.com/2009/09/11/xkoto-gridscale-highlights/</link>
		<comments>http://www.dbms2.com/2009/09/11/xkoto-gridscale-highlights/#comments</comments>
		<pubDate>Fri, 11 Sep 2009 18:36:03 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Xkoto]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=881</guid>
		<description><![CDATA[I talked yesterday with cofounders Albert Lee and Ariff Kassam of Xkoto. Highlights included:

Xkoto sells Gridscale, a 	clustering server for DB2 and, more recently, MS SQL Server.
Xkoto Gridscale runs on a separate 	box, between the application and the database servers. This box is 	typically smaller and cheaper than the database server boxes.
Xkoto most typically sells [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I talked yesterday with cofounders Albert Lee and Ariff Kassam of Xkoto. Highlights included:<span id="more-881"></span></p>
<ul>
<li>Xkoto sells Gridscale, a clustering server for DB2 and, more recently, MS SQL Server.</li>
<li>Xkoto Gridscale runs on a separate box, between the application and the database servers. This box is typically smaller and cheaper than the database server boxes.</li>
<li>Xkoto most typically sells Gridscale into environments where there already are three database servers: one to do work, one for hot standby, and one for remote disaster recovery.</li>
<li>In such environments, Gridscale&#8217;s big benefit is that you can distribute the query workload among all three servers. Xkoto believes this big performance increase is the reason customers don&#8217;t get much past 3 database servers under Xkoto (they didn&#8217;t seem quite sure as to whether the all-time record was 4 or 5). Note that even if a remote server is a little too far away for OLTP query response, it can work fine for reporting.</li>
<li>Of course, if you don&#8217;t already have high/&#8220;continuous&#8221; availability and/or disaster recovery, then Xkoto would say those are core benefits of Gridscale as well.</li>
<li>Gridscale sends transactions (or just SQL statements?) to all servers in the cluster. Once any of them responds affirmatively, that update is reflected in queries. Gridscale maintains a small query log to make sure it gets the other database copies in sync. It also tries to make sure that queries always go to the most current copy of the database. (I didn&#8217;t ask what happens if Server A executes Transaction T but not U, while Server B executes Transaction U and not T, but that does seem like something of an edge case.)</li>
<li>Xkoto spun out of <a href="http://www.halcyoninc.com/" onclick="javascript:pageTracker._trackPageview('/www.halcyoninc.com');">Halcyon Monitoring</a> in 2006, starting with DB2 support. Microsoft SQL Server support was introduced in 2008.</li>
<li>Xkoto likes its partnerships with IBM and Microsoft. For example, IBM provides Level 1 and 2 support for Gridscale itself. Due in large part to this partnership strategy, Xkoto says it has no plans to support DBMS beyond DB2 and SQL Server.</li>
<li>Instead, Xkoto is pursuing partnerships with large application vendors and so on. (The figure &#8220;about 10&#8221; was mentioned.) I gather the idea is to make sure that neither the application support folks nor the app itself freak out from the fact that the app isn&#8217;t exactly talking to the DBMS any more.</li>
<li>Xkoto has done lab tests suggesting Gridscale offers near-linear scalability (in terms of SQL Server database throughput) on a query-only workload up to 10 servers.</li>
<li>I gather that Xkoto and IBM have demos suggesting it&#8217;s a fine idea to have your disaster recovery server be in the Amazon cloud, but they haven&#8217;t yet made any sales based on that ... er, based on that <em>premise.</em> <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </li>
<li>Gridscale pricing is measured in the same metrics as DB2 or SQL Server pricing, and in each case is around 1/3 what database pricing would be on the same box (I&#8217;m guessing that&#8217;s for enterprise editions without add-ons, but I didn&#8217;t probe). Specifically, Gridscale charges $12K per 100 PVUs for the DB2 edition, and $12K per socket for running with Microsoft SQL Server.</li>
<li>Gridscale typically runs on smaller boxes than the databases it talks to.</li>
<li>Xkoto has about 35 revenue-recognized customers. Most are on DB2, the first environment Gridscale supported.</li>
<li>Average Gridscale selling prices are $180K on DB2, and $40-50K in the early going for SQL Server.</li>
<li>Xkoto has about 40 full-time employees, with engineering in Toronto and business operations in Waltham.</li>
</ul>
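<p style="margin-bottom: 0in;">As I understood the write/read routing described in the list above, the behavior could be modeled very roughly like this. Everything here is a toy built on my own assumptions: the <code>Replica</code> and <code>Router</code> classes, the sequence numbers, and the in-order rule are hypothetical, and none of it reflects Gridscale&#8217;s real internals or API.</p>

```python
class Replica:
    """Hypothetical stand-in for one database server behind the router."""

    def __init__(self, name):
        self.name = name
        self.applied = 0   # sequence number of the last statement applied
        self.rows = []
        self.online = True

    def apply(self, seq, statement):
        # Refuse work when offline or when an earlier statement is missing.
        if not self.online or seq != self.applied + 1:
            return False
        self.rows.append(statement)
        self.applied = seq
        return True


class Router:
    """Sits between the application and the replicas."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []  # (seq, statement) pairs kept for catching replicas up

    def write(self, statement):
        """Send to every replica; succeed once any one acknowledges."""
        seq = len(self.log) + 1
        acks = [r.apply(seq, statement) for r in self.replicas]
        if any(acks):
            self.log.append((seq, statement))
            return True
        return False

    def read_target(self):
        """Route queries to the most current live copy."""
        live = [r for r in self.replicas if r.online]
        return max(live, key=lambda r: r.applied)

    def resync(self, replica):
        """Replay logged statements the replica has not applied yet."""
        for seq, statement in self.log:
            if seq > replica.applied:
                replica.apply(seq, statement)


primary, standby, remote = Replica("primary"), Replica("standby"), Replica("remote")
router = Router([primary, standby, remote])
router.write("INSERT 1")
standby.online = False     # standby misses the second statement
router.write("INSERT 2")
standby.online = True
router.write("INSERT 3")   # standby refuses it: statement 2 is still missing
router.resync(standby)     # replay the log to bring standby up to date
```

<p style="margin-bottom: 0in;">The strict in-order rule is one simple way to sidestep the edge case I mentioned, where two servers would otherwise each hold a different subset of the transactions.</p>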
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/11/xkoto-gridscale-highlights/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Not-so-great moments in planning</title>
		<link>http://www.dbms2.com/2009/07/24/not-so-great-moments-in-planning/</link>
		<comments>http://www.dbms2.com/2009/07/24/not-so-great-moments-in-planning/#comments</comments>
		<pubDate>Fri, 24 Jul 2009 06:18:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Fun stuff]]></category>
		<category><![CDATA[Humor]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=844</guid>
		<description><![CDATA[xkcd nails it again.
]]></description>
			<content:encoded><![CDATA[<p><a href="http://xkcd.com/612/" onclick="javascript:pageTracker._trackPageview('/xkcd.com');">xkcd nails it again</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/07/24/not-so-great-moments-in-planning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Update on Microsoft&#8217;s Madison and Fast Track data warehouse products</title>
		<link>http://www.dbms2.com/2009/07/15/update-on-microsofts-madison-and-fast-track-data-warehouse-products/</link>
		<comments>http://www.dbms2.com/2009/07/15/update-on-microsofts-madison-and-fast-track-data-warehouse-products/#comments</comments>
		<pubDate>Wed, 15 Jul 2009 20:52:10 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=841</guid>
		<description><![CDATA[I chatted with Stuart Frost of Microsoft yesterday.  Stuart is and remains GM of Microsoft&#8217;s data warehouse product unit, covering about $1 billion or so of revenue. While rumors of Stuart&#8217;s departure from Microsoft are clearly exaggerated, it does seem that his role is more one of coordination than actual management.
Microsoft Madison availability remains [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I chatted with Stuart Frost of Microsoft yesterday.  Stuart is and remains GM of Microsoft&#8217;s data warehouse product unit, covering about $1 billion or so of revenue. While rumors of Stuart&#8217;s departure from Microsoft are clearly exaggerated, it does seem that his role is more one of coordination than actual management.</p>
<p style="margin-bottom: 0in;">Microsoft Madison availability remains scheduled for H1 2010. Nothing new there. Tangible progress includes a few customer commitments of various sorts, including one outright planned purchase (due to some internal customer considerations around using up a budget).  At the moment various Microsoft Madison technology &#8220;previews&#8221; are going on, which seem to amount to proofs-of-concept, that:</p>
<ul>
<li>Start with actual customer data (some from Microsoft, some from outside)</li>
<li>Generate larger synthesized data sets based on those (database size seems to be 10-100 TB)</li>
<li>Run in Microsoft data centers or &#8220;technology centers&#8221;, rather than on customer premises.</li>
</ul>
<p style="margin-bottom: 0in;">The basic Microsoft Madison product distribution strategy seems to be:<span id="more-841"></span></p>
<ul>
<li>Microsoft specifies configurations with different brands of hardware.</li>
<li>The respective hardware vendors sell and deliver those configurations, with Madison software pre-installed.</li>
<li>Actual software licenses are invoiced however makes sense, which in most cases may well be directly by Microsoft as part of a wider enterprise relationship.</li>
</ul>
<p style="margin-bottom: 0in;">Most of the usual-suspect big name hardware and storage vendors seem to be involved.</p>
<p style="margin-bottom: 0in;">Microsoft Madison is focused on &#8220;high-end&#8221; data warehousing, with Stuart candidly saying that everybody at Microsoft seems to have a different definition of what &#8220;high-end&#8221; means. In practice, this &#8220;high end&#8221; probably will be whatever conventional SQL Server doesn&#8217;t do a good job on &#8212; e.g., &gt;5 terabytes, or even smaller in table-scan-oriented workloads.</p>
<p style="margin-bottom: 0in;">Microsoft Madison seems further focused on being the &#8220;hub&#8221; to SQL Server data marts, with Stuart citing a survey saying Microsoft SQL Server has 44% market share in data marts when counting by unit.  When I pressed for technical strategy as to how the data would be moved and synchronized between the hub and spokes, details were vague. Obviously, Microsoft still has considerable work to do in this regard, whether in articulating strategy or in actual product development.  The same goes double for ease-of-data-mart creation à la <a href="../2009/06/08/the-future-of-data-marts/">Greenplum&#8217;s Enterprise Data Cloud messaging</a>.</p>
<p style="margin-bottom: 0in;">While Madison is a future product, Stuart says <a href="../2009/02/23/microsoft-sql-server-fast-track/">Microsoft SQL Server Fast Track</a> is a &#8220;huge hit.&#8221;  He knows of 20 sales, and estimates the total number is in the 100s. The disparity is explained by the fact that Fast Track comprises a set of recommended hardware configurations, rather than an identifiable Microsoft software invoice line item.  The distribution model for Microsoft SQL Server Fast Track seems to be similar to that planned for Madison, with hardware partners such as HP making many of the sales.</p>
<p style="margin-bottom: 0in;">Sales of all these Microsoft SQL Server data warehousing products seem to be focused on the Microsoft SQL Server installed base. No surprise there either.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/07/15/update-on-microsofts-madison-and-fast-track-data-warehouse-products/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>My current customer list among the analytic DBMS specialists</title>
		<link>http://www.dbms2.com/2009/06/25/my-current-customer-list-among-the-analytic-dbms-specialists/</link>
		<comments>http://www.dbms2.com/2009/06/25/my-current-customer-list-among-the-analytic-dbms-specialists/#comments</comments>
		<pubDate>Fri, 26 Jun 2009 00:28:19 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[About this blog]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=821</guid>
		<description><![CDATA[(This is an updated version of an August, 2008 post.)
One of my favorite pages on the Monash Research website is the list of many current and a few notable past customers.  (Another favorite page is the one for testimonials.) For a variety of reasons, I won&#8217;t undertake to be more precise about my current [...]]]></description>
			<content:encoded><![CDATA[<p><em>(This is an updated version of <a href="http://www.dbms2.com/2008/08/24/data-warehouse-specialists/" >an August, 2008 post</a>.)</em></p>
<p>One of my favorite pages on the <a href="http://www.monash.com/" onclick="javascript:pageTracker._trackPageview('/www.monash.com');"><em>Monash Research</em></a> website is the list of <a href="http://www.monash.com/customers.html" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">many current and a few notable past customers</a>.  (Another favorite page is the one for <a href="http://www.monash.com/testimonials.html" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">testimonials</a>.) For a variety of reasons, I won&#8217;t undertake to be more precise about my current customer list than that. But I don&#8217;t think it would hurt anything to list the analytic/data warehouse DBMS/appliance specialists in the group. They are:</p>
<ul>
<li>Aster Data</li>
<li>Greenplum</li>
<li>Infobright</li>
<li>Kickfire</li>
<li>Kognitio</li>
<li>Microsoft</li>
<li>Netezza (my biggest client this year, probably, because of all the <a href="http://www.enzeeuniverse.com" onclick="javascript:pageTracker._trackPageview('/www.enzeeuniverse.com');">Enzee Universe</a> appearances)</li>
<li>Sybase</li>
<li>Teradata</li>
<li>Vertica</li>
<li>Attivio, which may or may not be construed as being in the analytic DBMS business</li>
<li>Clearpace, ditto</li>
</ul>
<p style="margin-bottom: 0in;">All of those are <a href="http://www.monash.com/advantage.html" onclick="javascript:pageTracker._trackPageview('/www.monash.com');"><em>Monash Advantage</em></a> members.</p>
<p style="margin-bottom: 0in;">If you care about all this, you may also be interested in the rest of my <a href="http://www.monashreport.com/2008/06/02/updating-my-standards-and-disclosures/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">standards and disclosures</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/06/25/my-current-customer-list-among-the-analytic-dbms-specialists/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The future of data marts</title>
		<link>http://www.dbms2.com/2009/06/08/the-future-of-data-marts/</link>
		<comments>http://www.dbms2.com/2009/06/08/the-future-of-data-marts/#comments</comments>
		<pubDate>Mon, 08 Jun 2009 08:25:07 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[DATAllegro]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[eBay]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=805</guid>
		<description><![CDATA[Greenplum is announcing today a long-term vision, under the name Enterprise Data Cloud (EDC). Key observations around the concept &#8212; mixing mine and Greenplum&#8217;s together &#8212; include:

Data marts aren&#8217;t just for 	performance (or price/performance). They also exist to give 	individual analysts or small teams control of their analytic 	destiny.
Thus, it would be really cool if [...]]]></description>
			<content:encoded><![CDATA[<p>Greenplum is announcing today a long-term vision, under the name <em>Enterprise Data Cloud (EDC). </em><span style="font-style: normal;">Key observations around the concept &#8212; mixing mine and Greenplum&#8217;s together &#8212; include:</span></p>
<ul>
<li><strong>Data marts aren&#8217;t just for 	performance</strong> (or price/performance). They also exist to give 	individual analysts or small teams control of their analytic 	destiny.</li>
<li>Thus, it would be really cool if 	business users could have their own <strong>analytic &#8220;sandboxes&#8221;</strong> &#8212; virtual or physical analytic databases that they can manipulate 	without breaking anything else.</li>
<li>In any case, business users want 	to analyze data when they want to analyze it. <strong>It is often unwise 	to ask business users to postpone analysis</strong> until after an 	enterprise data model can be extended to fully incorporate the new 	data they want to look at.</li>
<li>Whether or not you agree with 	that, it&#8217;s an empirical fact that enterprises have many <strong>legacy 	data marts</strong> (or even, especially due to M&amp;A, multiple legacy 	data warehouses).  Similarly, it&#8217;s an empirical fact that many 	business users have the clout to order up <strong>new data marts</strong> as 	well.</li>
<li><strong>Consolidating</strong> data marts 	onto one common technological platform has important benefits.</li>
</ul>
<p style="margin-bottom: 0in;">In essence, Greenplum is pitching the story:</p>
<ul>
<li>Thesis: Enterprise Data Warehouses 	(EDWs)</li>
<li>Antithesis: Data Warehouse 	Appliances</li>
<li>Synthesis: Greenplum&#8217;s Enterprise 	Data Cloud vision</li>
</ul>
<p style="margin-bottom: 0in;">When put that starkly, it&#8217;s overstated, not least because</p>
<p style="margin-left: 0.49in; margin-bottom: 0in;">Specialized Analytic DBMS != Data Warehouse Appliance</p>
<p style="margin-bottom: 0in;">But basically it makes sense, for two main reasons:</p>
<ul>
<li>Analysis is performed on all sorts 	of novel data, from sources far beyond an enterprise&#8217;s core 	transactions.  This data neither has to fit nor particularly 	benefits from being tightly fitted into the core enterprise data 	model.  Requiring it to do so is just an unnecessary and painful 	bureaucratic delay.</li>
<li>On the other hand, consolidation 	can be a good idea even when systems don&#8217;t particularly 	interoperate. Data marts, which commonly do in part interoperate 	with central data stores, have all the more reason to be 	consolidated onto a central technology platform/stack.</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-805"></span><span style="font-style: normal;">Of course, the EDC vision isn&#8217;t quite as new or differentiated as Greenplum ideally would wish one to believe.</span></p>
<ul>
<li><span style="font-style: normal;">To 	a first approximation, EDC sounds a lot like <a href="../2009/04/30/ebays-two-enormous-data-warehouses/">what 	eBay has already built on Teradata equipment.</a> </span></li>
<li><span style="font-style: normal;">Greenplum&#8217;s 	EDC vision also sounds a lot like what Stuart Frost was talking 	about at DATA</span>llegro, <a href="../2009/03/02/closing-the-book-on-the-datallegro-customer-base/">what 	Dell was planning to build on DATAllegro equipment</a>, and what Stuart 	continues to talk about now that he&#8217;s been acquired into Microsoft.</li>
<li>Something like EDC can also be 	presumed to be implicit in the strategies of the other 	one-size-fits-all vendors &#8212; i.e., Oracle and IBM.</li>
<li>Greenplum has only implemented a 	little more of the EDC vision so far than have other firms, unless 	you give it credit for being cheap/fast/MPP/running on commodity 	hardware, but deny that credit to Teradata (specialized hardware, 	and not cheap in its most popular configurations), Oracle (ditto for 	Exadata), IBM (also not cheap), or Microsoft/DATAllegro (not 	released yet).</li>
<li>Specifically: In <a href="../2009/06/05/greenplum-update-release-3-3/">Greenplum 	Release 3.3</a>, which is being announced today, Greenplum is 	introducing the (enhanced?) ability for data marts to be spun out as 	a background operation, while the database otherwise remains 	functional.  As of 3.3, spinning out a data mart is a command-line 	operation. But in Release 3.4, Greenplum plans to offer a web-based 	interface for same, at which point the &#8220;self-service data mart 	creation&#8221; discussion will become operative.  Otherwise, EDC is 	a roadmap/vision/statement-of-direction much more than it is a 	fully-baked technical project.</li>
</ul>
<p style="margin-bottom: 0in;">One particular source of potential confusion is Greenplum&#8217;s emphasis on the buzzphrase <em>self-service (data mart).</em> This seems to be a conflation of two related concepts:</p>
<ul>
<li><strong>End users should be able to 	create new data marts themselves.</strong> Strictly speaking, I view this 	ability as useless at most enterprises, and important at very few, 	because of logistical issues.  (Who gives the permissions? Who 	decides which hardware is used?)  That said, useless &#8220;end user&#8221; 	tools often wind up being important productivity aids for IT 	professionals, and this kind of &#8220;self-service&#8221; would 	surely be another example. <em> Edit: Hmm. Doug Henschen inspired me to think that over again, and I&#8217;m beginning to soften. Suppose users could order up the data mart they want, perhaps test it at a very low processing priority (if they choose), and then send the completed request to IT for approval and provisioning. That would have some value.</em></li>
<li><strong>End users should be able to 	manage data marts themselves, once created.</strong> That&#8217;s a great 	idea, full of agility and don&#8217;t-make-IT-a-roadblock goodness. Data 	miners and similar analytic professionals commonly have the 	technical ability to manage a simple database, and should be allowed 	to do so if it&#8217;s ensured that they don&#8217;t break anything for anybody 	else.</li>
</ul>
<p style="margin-bottom: 0in;">One thing that&#8217;s needed for this technology to come to full fruition is sophisticated data movement and synchronization.  Ideally, some tables in a data mart could be virtual &#8212; views against a central database. But others would be physically recopied from the center, with all the ETL/ELT/ETLT/replication issues that entails. Meanwhile, it&#8217;s not obvious that the ideal architecture is a simpleminded hub-spoke &#8212; perhaps one should be able to spin data marts out of other marts, perhaps at least somewhat reducing the proliferation of tables and the recopying of data. And it should be easy for administrators to change deployment strategies, e.g. by starting a table out as a view and changing over to making it a physical copy as usage profiles change.</p>
<p style="margin-bottom: 0in;">Oliver Ratzesberger of eBay also argues that workload management &#8212; <a href="../2009/06/08/more-on-fox-interactive-medias-use-of-greenplum/">not a current Greenplum strength</a> &#8212; can be crucial. For example, if the CEO wants the CFO to get her an answer TODAY, the fastest approach may be to create an entirely virtual data mart, with very favorable SLAs (Service Level Agreements).  More generally, if you&#8217;re setting up dozens of marts that contain views of the central database, sophisticated SLA management can be essential. There&#8217;s a big virtualization opportunity here &#8212; but virtualization requires a lot of system management infrastructure.</p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li>My recent post on <a href="http://www.dbms2.com/2009/05/30/reinventing-business-intelligence/" >reinventing 	business intelligence</a></li>
<li>Greenplum adviser Joe 	Hellerstein&#8217;s pitch for <a href="http://databeta.wordpress.com/2009/03/20/mad-skills/" onclick="javascript:pageTracker._trackPageview('/databeta.wordpress.com');">agile data warehousing</a></li>
<li>Charlie 	Bachman&#8217;s &#8220;<a href="http://www.oberon2005.ru/paper/cb2004-01e.pdf" onclick="javascript:pageTracker._trackPageview('/www.oberon2005.ru');">private database</a>&#8221; idea, which never went 	anywhere (pp. 138-139)</li>
<li>Greenplum&#8217;s <a href="http://www.prweb.com/releases/2009/06/prweb2505854.htm" onclick="javascript:pageTracker._trackPageview('/www.prweb.com');">EDC</a> and <a href="http://www.prweb.com/releases/2009/06/prweb2505844.htm" onclick="javascript:pageTracker._trackPageview('/www.prweb.com');">Release 3.3</a> press releases</li>
<li>An interview featuring Greenplum co-founder Scott Yara&#8217;s own words</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/06/08/the-future-of-data-marts/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
		<item>
		<title>Reinventing business intelligence</title>
		<link>http://www.dbms2.com/2009/05/30/reinventing-business-intelligence/</link>
		<comments>http://www.dbms2.com/2009/05/30/reinventing-business-intelligence/#comments</comments>
		<pubDate>Sat, 30 May 2009 12:38:25 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[SAP AG]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=794</guid>
		<description><![CDATA[I&#8217;ve felt for quite a while that business intelligence tools are due for a revolution. But I&#8217;ve found the subject daunting to write about because &#8212; well, because it&#8217;s so multifaceted and big.  So to break that logjam, here are some thoughts on the reinvention of business intelligence technology, with no pretense of being [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I&#8217;ve felt for quite a while that business intelligence tools are due for a revolution. But I&#8217;ve found the subject daunting to write about because &#8212; well, because it&#8217;s so multifaceted and big.  So to break that logjam, here are some thoughts on the reinvention of business intelligence technology, with no pretense of being in any way comprehensive.</p>
<p style="margin-bottom: 0in;"><strong>Natural language and classic science fiction</strong></p>
<p style="margin-bottom: 0in;">Actually, there&#8217;s a pretty well-known example of BI near-perfection &#8212; <strong>the </strong><em><strong>Star Trek</strong></em><strong> computers,</strong> usually voiced by the late Majel Barrett Roddenberry. They didn&#8217;t have a big role in the recent movie, which was so fast-paced nobody had time to analyze very much, but were a big part of the <em>Star Trek</em> universe overall. <em>Star Trek&#8217;s</em> computers integrated analytics, operations, and authentication, all with a great natural language/voice interface and visual displays. That example is at the heart of <a href="http://www.texttechnologies.com/2009/05/30/men-are-from-earth-computers-are-from-vulcan/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">a 1998 article on natural language recognition I just re-posted</a>.</p>
<p style="margin-bottom: 0in;">As for reality: For decades, dating back at least to Artificial Intelligence Corporation&#8217;s Intellect, there have been offerings that provided<strong> &#8220;natural language&#8221; command, control, and query</strong> against otherwise fairly ordinary analytic tools. Such efforts have generally fizzled, for reasons outlined at the link above. Wolfram Alpha is the latest try; fortunately for its prospects, natural language is really only a small part of the Wolfram Alpha story.</p>
<p style="margin-bottom: 0in;">A second theme has more recently emerged &#8212; <strong>using text indexing to get at data more flexibly than a relational schema would normally allow,</strong> either by searching on data values themselves (stressed by <em>Attivio</em>) or by searching on the definitions of pre-built reports (the Google OneBox story). SAP&#8217;s Explorer is the latest such offering, but I find <a href="http://www.intelligententerprise.com/blog/archives/2009/05/explorer_seems.html#comments" onclick="javascript:pageTracker._trackPageview('/www.intelligententerprise.com');">Doug Henschen&#8217;s skepticism about SAP Explorer</a> more persuasive than <a href="http://www.intelligententerprise.com/blog/archives/2009/05/explorer_splash.html#comments" onclick="javascript:pageTracker._trackPageview('/www.intelligententerprise.com');">Cindi Howson&#8217;s cautiously favorable view</a>.  Partly that&#8217;s because I know SAP (and Business Objects); partly it&#8217;s because of difficulties such as those I already noted.</p>
<p style="margin-bottom: 0in;"><strong>Flexibility and data exploration</strong></p>
<p style="margin-bottom: 0in;">It&#8217;s a truism that each generation of dashboard-like technology fails because it&#8217;s too inflexible. Users are shown the information that will provide them with the most insight.  They appreciate it at first. But eventually it&#8217;s old hat, and when they want to do something new, the baked-in data model doesn&#8217;t support it.</p>
<p style="margin-bottom: 0in;">The latest attempts to overcome this problem lie in two overlapping trends &#8212; <strong>cool data exploration/visualization tools</strong> and <strong>in-memory analytics.</strong> <span id="more-794"></span>Tableau and Spotfire are known more for the former; hot BI vendor <a href="../2008/08/04/qliktech-qlikview-update/">QlikTech</a> is known for both. And many vendors &#8212; established or otherwise &#8212; are going to <a href="../2009/04/22/clearing-some-of-my-buffer/">in-memory OLAP</a>.</p>
<p style="margin-bottom: 0in;"><strong><span style="font-style: normal;">Collaboration and communication</span></strong></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">The reason I&#8217;m finally buckling down and posting on this subject is the announcement of <a href="http://www.texttechnologies.com/2009/05/29/google-wave-finally-a-microsoft-killer/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">Google Wave, which I think foreshadows a revolution in communication and collaboration technology</a>. Google Wave augurs two primary advances. First, it shows how to make email, instant messaging, microblogging, and so on much more useful. Second, Google Wave could evolve in a way that &#8212; finally &#8212;  makes it truly practical for end-users to set up ad-hoc mini-portals that combine arbitrary URL-possessing resources, exposed to arbitrary workgroups of people.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">If and when both of those promises are fulfilled, it will become vastly easier for people to reason together about analytic questions.  That may take a little while, as Google Wave obviously wasn&#8217;t designed with business intelligence in mind. But whether from Google or from a frightened Microsoft redoubling its SharePoint efforts, there&#8217;s hope that we&#8217;ll see a leap forward in general collaboration technology. And since BI vendors are doing a generally decent job of exposing queries, charts and so on as portlets, it seems likely that business intelligence will benefit from the collaboration arms race.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">That&#8217;s important. The first time I heard that reporting was as important for communication as for analytics was from Pilot Software a quarter-century or so ago, and it&#8217;s just as true now as it was then.  In its first incarnations it probably will be a little too dumb for my tastes, focusing more on mindless reporting and same-old KPIs than on deeper analysis.  Still, it&#8217;s a move in a good direction.</span></p>
<p style="margin-bottom: 0in;"><strong><span style="font-style: normal;">Other directions</span></strong></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">As I said at the beginning, I find it too daunting to try to cover all facets of this subject in one post. So I&#8217;ll leave out, at a minimum:</span></p>
<ul>
<li><span style="font-style: normal;"><a href="http://www.dbms2.com/2009/02/25/even-more-final-version-of-my-tdwi-slide-deck/" >Data 	warehousing performance and TCO</a>, which I of course write about 	extensively</span></li>
<li><span style="font-style: normal;"><a href="http://www.dbms2.com/2009/05/21/notes-on-cep-performance/" >Complex 	event/stream processing</a>, which I&#8217;ve written quite a bit about too</span></li>
<li><span style="font-style: normal;">Data 	mining and predictive analytics</span></li>
<li><span style="font-style: normal;">Operational 	BI</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">plus some hobby horses you probably don&#8217;t want to hear about anyway until I work out a better way of articulating my opinions.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">But by all means please comment on what I&#8217;ve left out just as vigorously as on what I&#8217;ve included.  This post is just the first of many to come.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/05/30/reinventing-business-intelligence/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
	</channel>
</rss>
