<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Analytic technologies</title>
	<atom:link href="http://www.dbms2.com/category/analytics-technologies/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Tue, 07 Feb 2012 06:49:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>WibiData, derived data, and analytic schema flexibility</title>
		<link>http://www.dbms2.com/2012/02/06/wibidata-derived-data-and-analytic-schema-flexibility/</link>
		<comments>http://www.dbms2.com/2012/02/06/wibidata-derived-data-and-analytic-schema-flexibility/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 03:18:25 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Odiago and WibiData]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5907</guid>
		<description><![CDATA[My clients at Odiago, vendors of WibiData, have changed their company name simply to WibiData. Even better, they blogged with more detail as to how WibiData works, in what is essentially a follow-on to my original WibiData post last October. Among other virtues, WibiData turns out to be a poster child for my views on [...]]]></description>
			<content:encoded><![CDATA[<p>My clients at Odiago, vendors of WibiData, have changed their company name simply to WibiData. Even better, they blogged with more detail as to <a href="http://www.wibidata.com/2012/02/07/how-wibidata-works/">how WibiData works</a>, in what is essentially a follow-on to <a href="../../../../../2011/11/02/5576/">my original WibiData post</a> last October. Among other virtues, WibiData turns out to be a poster child for my views on <a href="../../../../../2011/09/06/derived-data-progressive-enhancement-and-schema-evolution/">derived data and the corresponding schema evolution</a>.</p>
<p>Interesting quotes include:</p>
<blockquote><p>WibiData is designed to store &#8230; transactional data side-by-side with profile and other derived data attributes.</p></blockquote>
<blockquote><p>&#8230; the ability to add new ad-hoc columns to a table enables more flexible analysis: output data that is the result of one analytic pipeline is stored adjacent to its input data, meaning that you can easily use this as input to second- or third-order derived data as well.</p></blockquote>
<blockquote><p>schemas can vary over time; you can easily add a field to a record, or delete a field. &#8230; But even though you start collecting that new data, your existing analysis pipelines can treat records like they always did; programs that don’t yet know about the new cookie are still compatible with both the old records already collected, and the new records with the additional field. New programs fill in default values for old data recorded before a field was added, applying the new schema at read time.</p></blockquote>
<blockquote><p>schemas for every column are stored in a data dictionary that matches column names with their schemas, as well as human-readable descriptions of the data.</p></blockquote>
<p>Interesting aspects of the post that don&#8217;t lend themselves as well to being excerpted include:</p>
<ul>
<li>How the Produce-Gather &#8220;analysis calculus&#8221; &#8212; i.e. framework &#8212; works.</li>
<li>How this all ties into Apache projects (and sub-projects) such as Hadoop, HBase, and Avro.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/06/wibidata-derived-data-and-analytic-schema-flexibility/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sumo Logic and UIs for text-oriented data</title>
		<link>http://www.dbms2.com/2012/02/06/sumo-logic-and-uis-for-text-oriented-data/</link>
		<comments>http://www.dbms2.com/2012/02/06/sumo-logic-and-uis-for-text-oriented-data/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 13:27:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Text]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5897</guid>
		<description><![CDATA[I talked with the Sumo Logic folks for an hour Thursday. Highlights included: Sumo Logic does SaaS (Software as a Service) log management. Sumo Logic is text indexing/Lucene-based. Thus, it is reasonable to think of Sumo Logic as &#8220;Splunk-like&#8221;. (However, Sumo Logic seems to have a stricter security/trouble-shooting orientation than Splunk, which is trying to [...]]]></description>
			<content:encoded><![CDATA[<p>I talked with the Sumo Logic folks for an hour Thursday. Highlights included:</p>
<ul>
<li>Sumo Logic does SaaS (Software as a Service) log management.</li>
<li>Sumo Logic is text indexing/Lucene-based. Thus, it is reasonable to think of Sumo Logic as &#8220;Splunk-like&#8221;. (However, Sumo Logic seems to have a stricter security/trouble-shooting orientation than Splunk, which is trying to <a href="../../../../../2012/01/10/splunk-update/">branch out</a>.)</li>
<li>Sumo Logic has hacked Lucene for faster indexing, and says 10-30 second latencies are typical.</li>
<li>Sumo Logic&#8217;s main differentiation is <strong>automated classification of events. </strong></li>
<li>There&#8217;s some kind of streaming engine in the mix, to update counters and drive alerts.</li>
<li>Sumo Logic has around 30 &#8220;customers,&#8221; free (mainly) or paying (around 5) as the case may be.</li>
<li>A truly typical Sumo Logic customer has single to low double digits of gigabytes of log data per day. However, Sumo Logic seems highly confident in its ability to handle a terabyte per customer per day, give or take a factor of 2.</li>
<li>When I asked about the implications of shipping that much data to a remote data center, Sumo Logic observed that log data compresses really well.</li>
<li>Sumo Logic recently raised a bunch of venture capital.</li>
<li>Sumo Logic&#8217;s founders are out of ArcSight, a log management company HP paid a bunch of money for.</li>
<li>Sumo Logic coined a marketing term &#8220;LogReduce&#8221;, but it has nothing to do with &#8220;MapReduce&#8221;. Sumo Logic seems to find this amusing.</li>
</ul>
<p>What interests me about Sumo Logic is that automated classification story. I thought I heard Sumo Logic say:<span id="more-5897"></span></p>
<ul>
<li>It&#8217;s largely unsupervised machine learning.</li>
<li>It&#8217;s specific to a particular user/data set.</li>
<li>It can be up and running and classifying things effectively almost instantly (i.e., on seconds&#8217; or minutes&#8217; worth of data).</li>
<li>It&#8217;s informed by what different users tag as false positives. (Or maybe that is planned for future versions.)</li>
</ul>
<p><em>I have a little trouble seeing how all those points fit exactly together, so perhaps I got some details wrong.</em></p>
<p>The payoff is that <strong>machine learning directly informs the Sumo Logic user interface</strong>. In particular, large numbers of events are bundled into a small number of categories, hopefully making it much easier for network operations types to scan the UI and pick out what&#8217;s important.</p>
<p>In general, the idea of machine-learning informing analytic UIs via some sort of classification is common in text-oriented technologies, notably in:</p>
<ul>
<li>Good ol&#8217; text search.</li>
<li>Text mining vendors&#8217; approaches to clustering hits on words or phrases that say substantially the same thing.</li>
</ul>
<p>But otherwise it seems kind of rare, if we stipulate that ad-serving/general internet personalization isn&#8217;t really an analytic UI &#8212; but I&#8217;d love to hear of any interesting examples I&#8217;ve overlooked.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/06/sumo-logic-and-uis-for-text-oriented-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Comments on the 2012 Forrester Wave: Enterprise Hadoop Solutions</title>
		<link>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/</link>
		<comments>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 05:16:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[MapR]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Pentaho]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5886</guid>
		<description><![CDATA[Forrester has released its Q1 2012 Forrester Wave: Enterprise Hadoop Solutions. (Googling turns up a direct link, but in case that doesn&#8217;t prove stable, here also is a registration-required link from IBM&#8217;s Conor O&#8217;Mahony.) My comments include: The Forrester Wave&#8217;s relative vendor rankings are meaningless, in that the document compares apples, peaches, almonds, and peanuts. [...]]]></description>
			<content:encoded><![CDATA[<p>Forrester has released its Q1 2012 Forrester Wave: Enterprise Hadoop Solutions. (Googling turns up a <a href="http://www.forrester.com/rb/go?docid=60755&amp;oid=1-K07LCA&amp;action=5">direct link</a>, but in case that doesn&#8217;t prove stable, here also is <a href="http://database-diary.com/2012/02/02/get-a-free-copy-of-the-forrester-wave-for-enterprise-hadoop-solutions/">a registration-required link from IBM&#8217;s Conor O&#8217;Mahony</a>.) My comments include:</p>
<ul>
<li>The Forrester Wave&#8217;s <strong>relative vendor rankings are meaningless,</strong> in that the document compares apples, peaches, almonds, and peanuts. Apparently, it covers any vendor that incorporates a distribution of Apache Hadoop MapReduce into something it offers, and that provided at least two (not necessarily full production) references for it.</li>
<li>The Forrester Wave for &#8220;enterprise Hadoop&#8221; contradicts itself on the subject of Hortonworks.
<ul>
<li>The Forrester Wave for &#8220;enterprise Hadoop&#8221; is correct when it says <strong>&#8220;Hortonworks &#8230; has Hadoop training and professional services offerings that are still embryonic.&#8221;</strong></li>
</ul>
<ul>
<li>Peculiarly, the Forrester Wave for &#8220;enterprise Hadoop&#8221; also says &#8220;Hortonworks offers an impressive Hadoop professional services portfolio&#8221;. Hortonworks will likely win one or more nice partnership deals with vendors in adjacent fields, but even so its professional services capabilities are &#8230; well, a good word might be &#8220;embryonic&#8221;.</li>
</ul>
</li>
<li><a href="http://www.dbms2.com/2011/02/11/comments-on-the-2011-forrester-wave-for-enterprise-data-warehouse-platforms/">Forrester Waves always seem to have weird implicit definitions of &#8220;data warehousing&#8221;</a>. This one is no exception.</li>
<li>Forrester gave top marks in &#8220;Functionality&#8221; to 11 of 13 &#8220;enterprise Hadoop&#8221; vendors. This seems odd.</li>
<li>I don&#8217;t know why MapR, which doesn&#8217;t like HDFS (Hadoop Distributed File System), got top marks in &#8220;Subproject integration&#8221;.</li>
<li>Forrester gave top marks in &#8220;Storage&#8221; to Datameer. It also gave higher marks to MapR than to EMC Greenplum, even though EMC Greenplum&#8217;s technology is a superset of MapR&#8217;s. Very strange. <em>(Edit: Actually, as per a comment below, there is some uncertainty about the EMC/MapR relationship.)</em></li>
<li>Forrester gave higher marks in &#8220;Acceleration and optimization&#8221; to Hortonworks than to Cloudera and IBM, and higher marks yet to Pentaho. Very odd.</li>
<li>I&#8217;m not sure what Forrester is calling a &#8220;Distributed EDW file store connector&#8221;, but it sounds like something that Cloudera has provided via partnership to a number of analytic DBMS vendors.</li>
<li>Forrester&#8217;s &#8220;Strategy&#8221; rankings seem to correlate to a metric of &#8220;We&#8217;re a large enough vendor to go in N directions at once&#8221;, for various values of N.</li>
<li>Forrester is correct to rank Cloudera&#8217;s &#8220;Adoption&#8221; as being stronger than EMC/Greenplum&#8217;s or MapR&#8217;s. But Hortonworks&#8217; strong mark for &#8220;Adoption&#8221; baffles me.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Departmental analytics &#8212; best practices</title>
		<link>http://www.dbms2.com/2012/01/25/departmental-analytics-best-practices/</link>
		<comments>http://www.dbms2.com/2012/01/25/departmental-analytics-best-practices/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 16:47:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5867</guid>
		<description><![CDATA[I believe IT departments should support and encourage departmental analytics efforts, where &#8220;support&#8221; and &#8220;encourage&#8221; are not synonyms for &#8220;control&#8221;, &#8220;dominate&#8221;, &#8220;overwhelm&#8221;, or even &#8220;tame&#8221;. A big part of that is: Let, and indeed help, departments have the data they want, when they want it, served with blazing performance. Three things that absolutely should NOT [...]]]></description>
			<content:encoded><![CDATA[<p><a href="../../../../../2012/01/23/departmental-analytics-general-observations/">I believe IT departments should support and encourage departmental analytics efforts</a>, where &#8220;support&#8221; and &#8220;encourage&#8221; are not synonyms for &#8220;control&#8221;, &#8220;dominate&#8221;, &#8220;overwhelm&#8221;, or even &#8220;tame&#8221;. A big part of that is:<br />
<strong>Let, and indeed help, departments have the data they want, when they want it, served with blazing performance.</strong></p>
<p>Three things that absolutely should NOT be obstacles to these ends are:</p>
<ul>
<li>Corporate DBMS standards.</li>
<li>Corporate data governance processes.</li>
<li>The difficulties of ETL.</li>
</ul>
<p><span id="more-5867"></span>Reasons they shouldn&#8217;t or don&#8217;t need to be obstacles include:</p>
<ul>
<li>Analytic DBMS are often vastly more cost-effective than general-purpose ones.</li>
<li>In particular, analytic DBMS are often much easier to install and manage than general-purpose ones.</li>
<li>Heavy data governance bureaucracy is often unnecessary because:
<ul>
<li>The department should know what the limitations on the data&#8217;s accuracy are.</li>
<li>The department should know how much data accuracy is required.</li>
<li>The side-effects on other departments of any data inaccuracy would be minimal.</li>
</ul>
</li>
<li>There are multiple good schemes for populating data marts, managed by cost-effective analytic DBMS, with data from integrated data warehouses.
<ul>
<li>ELT (Extract/Load/Transform) almost always works, because data cleaning/data quality was handled at or before the IDW level, and because the analytic DBMS has the processing power to pull it off.</li>
<li>ETL (Extract/Transform/Load) should be easy as well. (If it isn&#8217;t, something may be lacking in your ETL set-up.)</li>
<li>Analytic DBMS are increasingly adding capabilities for easy spin-out of real or virtual data marts. Other kinds of technology (e.g. virtualization) are having their database spin-out capabilities upgraded as well.</li>
</ul>
</li>
</ul>
<p>One point to remember in support of departmental autonomy <strong>is that departments&#8217; views of what data to use may be more expansive than central IT&#8217;s.</strong> One reason is that important data may be external to the company, outside IT&#8217;s natural realm of concern. Examples of this include but are hardly limited to:</p>
<ul>
<li>Anything like &#8220;market data&#8221;.</li>
<li>Anything like &#8220;sentiment analysis&#8221;.</li>
<li>Data owned by supply chain partners.</li>
</ul>
<p>Further, even the more innovative internal data sources are commonly departmental, for example various kinds of multi-structured data (text verbatims from customers, log file data, and so on).</p>
<p>Whatever is true of data management (and ETL) is true for metadata management, even if it&#8217;s done by some kind of business intelligence tool. What I mean by that is:</p>
<ul>
<li><strong>Whoever manages data is also responsible for ingesting and emitting it &#8230;</strong></li>
<li>&#8230; and specifically for emitting it in<strong> understandable, well-organized, well-named formats, &#8230;</strong></li>
<li><strong>&#8230; </strong>so that <strong>departments can take responsibility for</strong> what amounts to <strong>lightweight analytic application development.</strong></li>
</ul>
<p>As for the &#8220;application development&#8221; itself, I&#8217;m envisioning at least three things:</p>
<ul>
<li>Math.</li>
<li>Sophisticated relational query.</li>
<li>Data visualization.</li>
</ul>
<p>I.e., I&#8217;m talking about what &#8220;analysts&#8221; and &#8220;quants&#8221; do. So to put the point even more simply:</p>
<ul>
<li><strong>Analysts and quants should be able to consume data that&#8217;s organized in a friendly manner.</strong></li>
<li><strong>Central IT should be friendly in how it serves data.</strong></li>
</ul>
<p>One corollary of this approach is that departments should try to adhere to corporate BI standards, at least for routine dashboards and reporting. Indeed, if a department brings in a business intelligence tool different from the corporate standard, there are three main possibilities:</p>
<ul>
<li>The tool is integrated with something else it makes sense to bring in, such as a third-party data supply or application.</li>
<li>The tool has an important capability the corporate standard doesn&#8217;t have, such as more flexible visualization and drilldown.</li>
<li>Central IT screwed up, making things much more difficult than they needed to be.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/25/departmental-analytics-best-practices/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Microsoft SQL Server 2012 and enterprise database choices in general</title>
		<link>http://www.dbms2.com/2012/01/24/microsoft-sql-server-2012/</link>
		<comments>http://www.dbms2.com/2012/01/24/microsoft-sql-server-2012/#comments</comments>
		<pubDate>Tue, 24 Jan 2012 14:42:34 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Mid-range]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5859</guid>
		<description><![CDATA[Microsoft is launching SQL Server 2012 on March 7. An IM chat with a reporter resulted, and went something like this. Reporter: [Care to comment]? CAM: SQL Server is an adequate product if you don&#8217;t mind being locked into the Microsoft stack. For example, the ColumnStore feature is very partial, given that it can&#8217;t be [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.sqlserverlaunch.com/ww/Home">Microsoft is launching SQL Server 2012 on March 7</a>. An IM chat with a reporter resulted, and went something like this.</p>
<p><strong>Reporter: [Care to comment]?</strong><br />
<strong>CAM:</strong> SQL Server is an adequate product if you don&#8217;t mind being locked into the Microsoft stack. For example, the ColumnStore feature is very partial, given that <a href="http://msdn.microsoft.com/en-us/library/gg492088%28v=sql.110%29.aspx#Update">it can&#8217;t be updated</a>; but Oracle doesn&#8217;t have columnar storage at all.</p>
<p><strong>Reporter: Is the lock-in overall worse than IBM DB2, Oracle?</strong><br />
<strong>CAM:</strong> Microsoft locks you into an operating system, so yes.</p>
<p><strong>Reporter: Is this release something larger Oracle or IBM shops could consider as a lower-cost alternative in a co-habitation scenario, in the event they&#8217;re mulling whether to buy more Oracle or IBM licenses?</strong><br />
<strong>CAM:</strong> If they have a strong Microsoft-stack investment already, sure. Otherwise, why?</p>
<p><strong>Reporter: [How about] just cost?</strong><br />
<strong>CAM:</strong> DB2 works just as well to keep Oracle honest as SQL Server does, and without a major operating system commitment. For analytic databases you want an analytic DBMS or appliance anyway.</p>
<p>Best is to have one major vendor of OLTP/general-purpose DBMS, a web DBMS, a DBMS for disposable projects (that may be the same as one of the first two), plus however many different analytic data stores you need to get the job done.</p>
<p>By &#8220;web DBMS&#8221; I mean MySQL, NewSQL, or NoSQL. Actually, you might need more than one product in that area.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/24/microsoft-sql-server-2012/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Departmental analytics &#8212; general observations</title>
		<link>http://www.dbms2.com/2012/01/23/departmental-analytics-general-observations/</link>
		<comments>http://www.dbms2.com/2012/01/23/departmental-analytics-general-observations/#comments</comments>
		<pubDate>Mon, 23 Jan 2012 14:29:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5843</guid>
		<description><![CDATA[Department-level adoption of analytic technology isn&#8217;t the exception; it&#8217;s the norm. Reasons include: Many analytic challenges are inherently departmental. In many cases, central IT control of analytics isn&#8217;t needed. Departments move ahead without central approval or involvement because they can. That said, arguments for centralizing analytic technology include: A lot of data is used by [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.softwarememories.com/2012/01/17/historical-notes-on-the-departmental-adoption-of-analytics/">Department-level adoption of analytic technology isn&#8217;t the exception; it&#8217;s the norm</a>. Reasons include:</p>
<ul>
<li><strong>Many analytic challenges are inherently departmental.</strong></li>
<li>In many cases,<strong> central IT control of analytics isn&#8217;t needed.</strong></li>
<li>Departments move ahead without central approval or involvement because they can.</li>
</ul>
<p>That said, arguments for centralizing analytic technology include:</p>
<ul>
<li><strong>A lot of data is used by more than one department</strong>, for example:
<ul>
<li>Financial transactions (one or more affected departments and also the central accounting group).</li>
<li>Web logs (marketing and IT/web operations).</li>
</ul>
</li>
<li><strong>Departments may not have the requisite technical expertise </strong>(and it may be redundant/cost-ineffective for them to acquire it).</li>
</ul>
<p>What&#8217;s more, there are IT best practices to support department-level analytics. Some of the key ones boil down to:</p>
<ul>
<li>Be <strong>flexible</strong> in your <strong>analytic DBMS support.</strong></li>
<li>Be <strong>responsive</strong> to requests for <strong>ETL.</strong></li>
</ul>
<p>My conclusion is that <strong>central IT should encourage (and aid) departmental analytics. </strong>Let&#8217;s look at some details.</p>
<p><span id="more-5843"></span>I think two huge categories of analytic problem are inherently departmental:</p>
<ul>
<li><a href="../../../../../2011/03/03/investigative-analytics/">Investigative analytics</a> (pretty much all of it).</li>
<li>Routine monitoring/dashboarding if the data is tracked just by one department.</li>
</ul>
<p>Investigative analytics is a kind of research activity &#8212; you&#8217;re looking to discover previously unrecognized patterns. There are two approaches to this &#8212; you can do it in the department that has the relevant business knowledge, or you can outsource it to a special group of &#8220;discoverers&#8221; (commonly statisticians).* Either way, this is a small team/departmental kind of activity.</p>
<p><em>*Combining the two approaches is common &#8212; a department can have its own analytically adept discoverers, whether they&#8217;re called &#8220;quants&#8221; or just &#8220;business analysts&#8221;.</em></p>
<p>Reporting/monitoring BI at least has the potential to be enterprise-wide &#8212; but commonly it isn&#8217;t, as each department has its own operational data sources and metrics. Marketing departments may watch external data that the rest of the company doesn&#8217;t worry about. But it can be true across the board. Factory operations folks may track machine tool data the rest of us barely understand.</p>
<p>Even if a business need is strictly departmental, there can be at least two reasons to centralize technology implementation:</p>
<ul>
<li>The department doesn&#8217;t have the critical mass of IT expertise.</li>
<li>Departmental IT has side effects on the rest of the company.</li>
</ul>
<p>Whether those reasons hold up depends a lot on what kind of analytic scenario we&#8217;re talking about.</p>
<p>Let&#8217;s organize that part of this discussion in line with the taxonomy from my <a href="../../../../../2011/07/05/eight-kinds-of-analytic-database-part-1/">eight kinds of analytic database</a> posts last July.</p>
<ul>
<li><strong>Enterprise data warehouses</strong> fall under the purview of major IT organizations. That remains true even if we pivot to the more realistic concept of <a href="../../../../../2011/11/28/terminology-data-mustering/">integrated data warehouse</a>. However, less stuff needs to be protected in an EDW/IDW than some data authoritarians like to think.</li>
<li>I wrote that the stresses on <strong>traditional data marts</strong> were &#8220;performance, concurrency, TCO.&#8221; This is a clue that the more demanding examples are right in IT&#8217;s wheelhouse. As for the less demanding cases &#8212; IT should be able to meet those needs without breaking a sweat.</li>
<li><strong>Agile investigative data marts</strong> are inherently departmental. If you have the talent to use one, you also have the talent to, for example, train into being a part-time Netezza DBA. Who cares if you don&#8217;t have the expertise to do sophisticated tuning? Analytic DBMS are fast enough &#8212; and hardware is cheap enough &#8212; that you don&#8217;t need that skill set anyway.</li>
<li><strong>Big investigative data marts</strong> can go either way. They&#8217;re technically challenging, so IT certainly has a claim on them. But in cases where the data, while big, is fairly homogeneous, it&#8217;s also not unrealistic for departments to handle the mart themselves.</li>
<li><strong>Bit buckets</strong> are often departmental today, with the department in question happening to be central IT. And central IT is where they&#8217;re likely to flourish, as the data they hold becomes ever more diverse.</li>
<li><strong>Archival data stores</strong> are a central IT matter. Nobody else is likely to care enough to do it right.</li>
<li><strong>Outsourced data marts,</strong> by definition, don&#8217;t live inside conventional enterprises. But they are often a way for business units to get access to data and analytics without relying on central IT.</li>
<li><strong>Operational analytics servers</strong> are likely to be sufficiently mission-critical that you want them handled by IT.</li>
</ul>
<p>So in most cases I&#8217;d say: <strong>Departments can manage their own investigative data marts</strong>, and so of course can SaaS vendors and third-party data providers;<strong> other analytic databases should be run by central IT.</strong> (And of course, large departments with serious local IT can fuzz those distinctions up.) Beyond that, it would seem that whoever administers the database should administer the rest of the analytic stack as well.</p>
<p>That still leaves us with some practical questions, such as:</p>
<ul>
<li>Exactly what products should IT departments buy for which purposes? I hope a lot of posts in this blog are helpful in that consideration.</li>
<li>How should development tasks be split between departments and central IT? It may take me a while to get a post together on that subject, since in general the analytics-development picture is pretty complicated to lay out.</li>
<li>How should departments and central IT work together to manage departmental investigative data marts? I hope to post on that subject soon.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/23/departmental-analytics-general-observations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>KXEN clarifies its story</title>
		<link>http://www.dbms2.com/2012/01/18/kxen-clarifies-its-story/</link>
		<comments>http://www.dbms2.com/2012/01/18/kxen-clarifies-its-story/#comments</comments>
		<pubDate>Wed, 18 Jan 2012 05:28:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[KXEN]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5831</guid>
		<description><![CDATA[I frequently badger my clients to tell their story in the form of a company blog, where they can say what needs saying without being restricted by the rules of other formats. KXEN actually listened, and put up a pair of CTO posts that make the company story a lot clearer. Excerpts from the first [...]]]></description>
			<content:encoded><![CDATA[<p>I frequently badger my clients to tell their story in the form of a company blog, where they can say what needs saying without being restricted by the rules of other formats. KXEN actually listened, and put up a pair of CTO posts that make the company story a lot clearer.</p>
<p>Excerpts from the <a href="http://www.kxen.com/blog/2012/01/why-proprietary-algorithms/">first post</a> include (with minor edits for formatting, including added emphasis):</p>
<blockquote><p>Back in 1995, Vladimir Vapnik &#8230; changed the machine learning game with his new ‘Statistical Learning Theory’: he provided the machine learning guys with a mathematical framework that allowed them finally to understand, at the core, why some techniques were working and some others were not. All of a sudden, a new realm of algorithms could be written that would use mathematical equations instead of engineering data science tricks (don’t get me wrong here: I am an engineer at heart and I know the value of “tricks,” but tricks cannot overcome the drawbacks of a bad mathematical framework). Here was a foundation for <strong>automated data mining techniques that would perform as well as the best data scientists</strong> deploying these tricks. Luck is not enough though; it was because we knew a lot about statistics and machine learning that we were able to decipher the nuggets of gold in Vladimir’s theory.</p></blockquote>
<p><span id="more-5831"></span></p>
<blockquote><p>The market needed a system that is able to perform classification and regression  (we later added clustering/segmentation, time series analysis, association  rules and social network analysis), that has the following characteristics:</p>
<ol>
<li><strong>Non-parametric: little user intervention and tuning should be required — it  should work well out of the box.</strong></li>
<li>Independent of the data and target distribution:
<ul>
<li>Target: the classification system should be able to handle rates of positive  values even as low as 0.1% (such as in fraud, for example), or be able to  forecast a continuous value with only 1% of non zero values.</li>
<li>Data: it should automate mixing and matching and comparing influence for  ordinal, nominal, continuous, and textual variables without any user  intervention.</li>
</ul>
</li>
<li>Scalable in number of rows: the <strong>training time should be linear with the  number of rows,</strong> and the quality of the models should increase with the number of  rows.</li>
<li>Scalable in number of columns: the <strong>training time should be close to linear  with respect to the number of columns,</strong> and <strong>the quality of the models should  increase with the number of columns.</strong> It is well known that most algorithms  present a problem of over-fitting in high dimensions; it is quite ironic that  companies spend billions of dollars in collecting data but often cannot take  advantage of all this data because most first-generation analytical workbenches  collapse trying to handle the high dimensionality inherent in all this  data.</li>
<li>Descriptive: a good predictive analytics package must be able to present its  findings in a way that a business user can understand. We have always believed  that there is a continuum between predictive and descriptive analytics:  predictive models should be descriptive enough and descriptive models should be  usable in a predictive manner to make decisions.</li>
<li>Deployable: the scoring equations should be simple enough to be deployed in  any operational environment: SQL for databases, Java code for the web (or even  for smartphones), etc.</li>
</ol>
</blockquote>
<blockquote><p>Vapnik’s theory provided us with a mathematical framework for capabilities 1, 2 and 4 above; what remained was 3, 5 and 6, which we solved with a well known pattern in machine learning: by using linear systems in a properly encoded space (the trick is to have the good encoded space).</p></blockquote>
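<p>To make the &#8220;deployable&#8221; point concrete, here is a minimal sketch of why a linear model over already-encoded columns is easy to push into a database: the score is just a weighted sum, so it can be rendered as a single SQL expression. This is my own illustration, not KXEN&#8217;s code; the function name, column names, table name and weights are all hypothetical.</p>

```python
def scoring_sql(intercept, weights, table):
    """Render a linear score as one SQL SELECT statement.

    Assumes the columns named in `weights` already hold encoded numeric
    values; every identifier here is hypothetical and for illustration only.
    """
    # One "(weight * column)" term per encoded column, in a stable order.
    terms = [f"({weight!r} * {column})" for column, weight in sorted(weights.items())]
    expression = " + ".join([repr(intercept)] + terms)
    return f"SELECT customer_id, {expression} AS score FROM {table}"


def score_row(intercept, weights, row):
    """Compute the same weighted sum in Python, e.g. for a web tier."""
    return intercept + sum(weight * row[column] for column, weight in weights.items())


weights = {"recency_enc": -0.8, "spend_enc": 1.5}
print(scoring_sql(0.25, weights, "customers_encoded"))
print(score_row(0.25, weights, {"recency_enc": 1.0, "spend_enc": 2.0}))
```

<p>The same coefficients drive both the SQL string and the in-process function, which is the sense in which a simple scoring equation can be deployed in more than one operational environment.</p>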
<p>The <a href="http://www.kxen.com/blog/2012/01/how-infiniteinsight-can-make-you-a-great-data-scientist/">second post</a> seems to make some strong model-quality benchmark claims, but there also seems to be an in-house-vs.-publicly-checked mismatch going on.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/18/kxen-clarifies-its-story/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Splunk update</title>
		<link>http://www.dbms2.com/2012/01/10/splunk-update/</link>
		<comments>http://www.dbms2.com/2012/01/10/splunk-update/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 05:55:08 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[Structured documents]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5791</guid>
		<description><![CDATA[Splunk is announcing the Splunk 4.3 point release. Before discussing it, let&#8217;s recall a few things about Splunk, starting with: Splunk is first and foremost an analytic DBMS &#8230; &#8230; used to manage logs and similar multistructured data. Splunk&#8217;s DML (Data Manipulation Language) is based on text search, not on SQL. Splunk has extended its [...]]]></description>
			<content:encoded><![CDATA[<p>Splunk is announcing the Splunk 4.3 point release. Before discussing it, let&#8217;s recall a few things about Splunk, starting with:</p>
<ul>
<li>Splunk is first and foremost an analytic DBMS &#8230;</li>
<li>&#8230; used to manage logs and similar multistructured data.</li>
<li>Splunk&#8217;s DML (Data Manipulation Language) is based on text search, not on SQL.</li>
<li>Splunk has extended its DML in natural ways (e.g., you can use it to do calculations and even some statistics).</li>
<li>Splunk bundles some (very) basic, Splunk-specific business intelligence capabilities.</li>
<li>The paradigmatic use of Splunk is to monitor IT operations in real time. However:
<ul>
<li>There also are plenty of non-real-time uses for Splunk.</li>
<li>Splunk is proudest of its growth in non-IT quasi-real-time uses, such as the marketing side of web operations.</li>
</ul>
</li>
</ul>
<p>As in any release, a lot of Splunk 4.3 is about &#8220;Oh, you didn&#8217;t have that before?&#8221; features and <a href="../../../../../2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a> performance speed-ups. One performance enhancement is Bloom filters, which are a very hot topic these days. More important is a switch from Flash to HTML5, so as to accommodate mobile devices with less server-side rendering. Splunk reports that its users &#8212; especially the non-IT ones &#8212; really want to get Splunk information on their tablet devices. While this somewhat contradicts <a href="../../../../../2012/01/04/some-issues-in-business-intelligence/">what I wrote a few days ago pooh-poohing mobile BI</a>, let me hasten to point out:</p>
<ul>
<li>Splunk is used for a lot of (quasi) real-time monitoring.</li>
<li>Splunk&#8217;s desktop user interfaces are, by BI standards, quite primitive.</li>
</ul>
<p>That&#8217;s pretty much the ideal scenario for mobile BI: Timeliness matters and prettiness doesn&#8217;t.</p>
<p><span id="more-5791"></span><em>Hmm. Maybe <a href="../../../../../2011/11/10/streambase-liveview-push-based-real-time-bi/">StreamBase LiveView</a> needs a mobile option as well &#8230;</em></p>
<p>Splunk&#8217;s basic use is to take the text string that is a log and make sense of it. But Splunk now also supports JSON structures. It does this via something called spath, which, as you might guess from the name, has XPath similarities. That probably deserved more discussion than we found the time to have.</p>
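<p>For readers who haven&#8217;t seen path-based extraction over JSON, the general idea can be sketched in a few lines of Python. To be clear, this is not Splunk&#8217;s spath implementation or its grammar; the dotted-path syntax with a <code>{index}</code> suffix, the function name and the sample event are all assumptions made for illustration.</p>

```python
import json


def extract_path(document, path):
    """Follow a dotted path such as 'user.roles{1}' through parsed JSON.

    The path syntax is a simplified, hypothetical one chosen for this
    sketch. Returns None when any step of the path is missing.
    """
    current = document
    for step in path.split("."):
        # Split an optional "{index}" suffix off the key name.
        key, _, index = step.partition("{")
        if key:
            if not isinstance(current, dict) or key not in current:
                return None
            current = current[key]
        if index:
            if not isinstance(current, list):
                return None
            try:
                current = current[int(index.rstrip("}"))]
            except (IndexError, ValueError):
                return None
    return current


# A made-up log event that arrives as a JSON string.
event = json.loads('{"user": {"name": "alice", "roles": ["admin", "ops"]}, "status": 200}')
print(extract_path(event, "user.name"))      # alice
print(extract_path(event, "user.roles{1}"))  # ops
print(extract_path(event, "user.missing"))   # None
```

<p>The point of the XPath comparison is that a short path expression, rather than a fixed schema, picks a field out of a nested structure at query time.</p>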
<p><em>By the way: If you&#8217;re interested in BI over XML, that&#8217;s what my former clients at Skytide were founded to do, before they pivoted a bit. I don&#8217;t think those capabilities have disappeared from the product</em>.</p>
<p><a href="http://www.monash.com/uploads/Splunk-4-3.pdf">Splunk has graciously allowed me to post a slide deck</a>. More stuff in there, including quotes from a customer &#8212; Expedia &#8212; that has 2700 Splunk users.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/10/splunk-update/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Big data terminology and positioning</title>
		<link>http://www.dbms2.com/2012/01/08/big-data-terminology-and-positioning/</link>
		<comments>http://www.dbms2.com/2012/01/08/big-data-terminology-and-positioning/#comments</comments>
		<pubDate>Mon, 09 Jan 2012 01:35:57 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MarkLogic]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5768</guid>
		<description><![CDATA[Recently, I observed that Big Data terminology is seriously broken. It is reasonable to reduce the subject to two quasi-dimensions: Bigness &#8212; Volume, Velocity, size Structure &#8212; Variety, Variability, Complexity given that High-velocity &#8220;big data&#8221; problems are usually high-volume as well.* Variety, variability, and complexity all relate to the simply-structured/poly-structured distinction. But the conflation should [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I observed that <a href="../../../../../2011/09/11/big-data-has-jumped-the-shark/">Big Data terminology is seriously broken</a>. It is reasonable to reduce the subject to two quasi-dimensions:</p>
<ul>
<li><strong>Bigness</strong> &#8212; Volume, Velocity, Size</li>
<li><strong>Structure</strong> &#8212; Variety, Variability, Complexity</li>
</ul>
<p>given that</p>
<ul>
<li>High-velocity &#8220;big data&#8221; problems are usually high-volume as well.*</li>
<li>Variety, variability, and complexity all relate to the <a href="../../../../../2011/05/17/poly-structured-database/">simply-structured/poly-structured</a> distinction.</li>
</ul>
<p>But the conflation should stop there.</p>
<p><em>*Low-volume/high-velocity problems are commonly referred to as <a href="../2011/08/25/renaming-cep-or-not/">&#8220;event processing&#8221; and/or &#8220;streaming&#8221;</a>.</em></p>
<p>When people claim that bigness and structure are the same issue, they oversimplify into mush. So I think we need four pieces of terminology, reflective of a 2&#215;2 matrix of possibilities. For want of better alternatives, my suggestions are:</p>
<ul>
<li><strong>Relational big data</strong> is data of high volume that fits well into a relational DBMS.</li>
<li><strong>Multi-structured big data</strong> is data of high volume that doesn&#8217;t fit well into a relational DBMS. <em>Alternative: Poly-structured big data.</em></li>
<li><strong>Conventional relational data</strong> is data of not-so-high volume that fits well into a relational DBMS. <em>Alternatives: Ordinary/normal/smaller relational data.</em></li>
<li><strong>Smaller poly-structured data</strong> is data for which <a href="../../../../../2011/07/31/dynamic-fixed-schema-databases/">dynamic schema</a> capabilities are important, but which doesn&#8217;t rise to &#8220;big data&#8221; volume.</li>
</ul>
<p><span id="more-5768"></span>Notes on all this include:</p>
<ul>
<li>&#8220;Relational big data&#8221; is commonly what you need a scalable analytic relational DBMS for. But there are non-analytic use cases as well.</li>
<li>The paradigmatic example of &#8220;multi-structured big data&#8221; is log files. Thus, multi-structured big data is commonly what you need a <a href="../../../../../2011/06/04/dirty-data-stored-dirt-cheap/">big bit bucket</a> for.</li>
<li>One might want to equate non-analytic relational big data technology to &#8220;NewSQL&#8221;. However, I&#8217;m struggling to think of a database size range in which the entire NewSQL industry can match Oracle&#8217;s market share alone.</li>
<li>One might want to equate non-analytic multi-structured big data technology to &#8220;NoSQL&#8221;. However:
<ul>
<li>&#8220;NoSQL&#8221; is also used to encompass not-so-big-data use cases, such as prototyping in MongoDB.</li>
<li><a href="../../../../../2011/10/02/defining-nosql/">&#8220;NoSQL&#8221; has non-ACID/low(er)-data-integrity connotations</a> that aren&#8217;t appropriate for all non-relational systems.</li>
</ul>
</li>
<li>Up to a point, you can analyze relational big data in a conventional relational DBMS, but an analytic RDBMS will usually win on TCO (Total Cost of Ownership). In particular, reasonable thresholds for moving an analytic database off Oracle might be:
<ul>
<li>1-2 terabytes if you&#8217;ve never bought anything past Oracle Standard Edition.</li>
<li>5-10 terabytes if you&#8217;re already paying for Oracle Enterprise Edition.</li>
<li>A lot higher than that if you actually find Oracle Exadata to be cost-effective.</li>
</ul>
</li>
<li>Depending on how big one acknowledges as &#8220;big&#8221;, the market share leader in &#8220;big bit bucket&#8221; use cases is either Splunk or Hadoop.</li>
<li>If we look at multi-structured big data management overall, MarkLogic joins the list of market share contenders, as do various NoSQL alternatives.</li>
<li>It is wrong to say that the large web companies invented &#8220;big data&#8221; technology. But it is more reasonable to say they invented much of &#8220;multi-structured big data&#8221; management. In particular (and this is just a partial list), Google, Amazon, Yahoo, Facebook, et al. can reasonably be credited with Hadoop, Cassandra, HBase and various predecessors to same.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/08/big-data-terminology-and-positioning/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Some issues in business intelligence</title>
		<link>http://www.dbms2.com/2012/01/04/some-issues-in-business-intelligence/</link>
		<comments>http://www.dbms2.com/2012/01/04/some-issues-in-business-intelligence/#comments</comments>
		<pubDate>Thu, 05 Jan 2012 00:57:08 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business Objects]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Gooddata]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5763</guid>
		<description><![CDATA[In November I wrote two parts of a planned multi-post series on issues in analytic technology. Then I got caught up in year-end things and didn&#8217;t blog for a month. Well &#8230; Happy New Year! I&#8217;m back. Let&#8217;s survey a few BI-related topics. Mobile business intelligence &#8212; real business value or just a snazzy demo? [...]]]></description>
			<content:encoded><![CDATA[<p>In November I wrote <a href="../../../../../2011/11/21/analytic-trends-in-2012-qa/">two</a> <a href="../../../../../2011/11/21/big-vendor-execution-analytics/">parts</a> of a planned multi-post series on issues in analytic technology. Then I got caught up in year-end things and didn&#8217;t blog for a month. Well &#8230; Happy New Year! I&#8217;m back. Let&#8217;s survey a few BI-related topics.</p>
<p><strong>Mobile business intelligence &#8212; real business value or just a snazzy demo?</strong></p>
<p>I discussed some <a href="../../../../../2010/07/15/mobile-business-intelligence/">mobile BI use cases</a> in July 2010, but I&#8217;m still not convinced the whole area is a legitimate big deal. BI has a long history of snazzy, senior-exec-pleasing demos that have little to do with substantive business value. For now, I think mobile BI is another of those; few people will gain deep analytic insights staring into their iPhones. I don&#8217;t see anything coming that&#8217;s going to change the situation soon.</p>
<p><strong>BI-centric collaboration &#8212; real business value or just a snazzy demo?</strong></p>
<p>I&#8217;m more optimistic about <a href="../../../../../2011/11/16/qlikview-collaborative-business-intelligence/">collaborative business intelligence</a>. QlikView&#8217;s direct sharing of dashboards will, I think, be a feature competitors must and will imitate. Social media BI collaboration is still in the &#8220;mainly a demo&#8221; phase, but I think it meets a broader and deeper need than does mobile BI. Over the next few years, I expect numerous enterprises to establish strong cultures of analytic chatter (and then give frequent talks about same at industry conferences).   <span id="more-5763"></span></p>
<p><strong>Business intelligence for mid-market enterprises is problematic</strong></p>
<p>Given the saturation of the large-enterprise BI market with supposed enterprise-standard BI systems, it would seem that smaller enterprises comprise a large part of the BI growth opportunity. However, the large-enterprise and mid-range BI markets are very different. For example:</p>
<ul>
<li>Large enterprises typically have tough challenges in data integration; smaller enterprises may truly start out with their data in only a few systems.</li>
<li>There are many reasons for large enterprises not to do their BI in the cloud, such as bandwidth, internal politics, or the unsuitability of most cloud infrastructure for analytic DBMS scale-out. Smaller enterprises, however, may prefer SaaS (Software as a Service) BI.</li>
<li>The BI market for smaller enterprises is heavily OEM. But unless you&#8217;re buying some kind of data/analytics bundle, the large enterprise BI market still seems overwhelmingly standalone.</li>
<li>Large-enterprise BI tools incorporate much of a DBMS-like technology stack; at smaller enterprises, BI can often stick to its specialized-application-development-tool knitting. But on the other hand &#8230;</li>
<li>&#8230; large enterprises almost always already have a data warehousing infrastructure. Mid-range BI buyers may not have a separate analytic DBMS. Therefore &#8230;</li>
<li>&#8230; BI/DBMS bundles make more sense in the mid-market than they do at large enterprises.</li>
<li>Each large enterprise has a unique infrastructure, and commonly a unique competitive situation as well. Thus, the idea that you&#8217;ll pre-build most of an analytic application for a large enterprise &#8212; because you know what data model they need to do their BI &#8212; usually turns out to be silly. But smaller enterprises can be more homogeneous, and so for them pre-built analytic applications can actually work.</li>
</ul>
<p>I don&#8217;t know of anybody who&#8217;s really cracked the code on mid-market BI. Crystal Reports (long owned by SAP Business Objects) has huge OEM share, but somehow hasn&#8217;t parlayed that into a comprehensive mid-market BI presence. Various SaaS or on-premise vendors have cool product ideas &#8212; e.g. <a href="../../../../../2009/12/27/introduction-to-gooddata/">Gooddata</a>, <a href="../../../../../2011/10/18/oracle-is-buying-endeca/">Endeca</a>, or my clients at PivotLink &#8212; but none seems to have set the world on fire to this point.</p>
<p><strong>Departmental BI is doing better</strong></p>
<p>The news is happier in a related market &#8212; business intelligence for departments of larger enterprises. However, this is a hard market to analyze, for at least two reasons. First &#8212; <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">as is often the case</a> &#8212; the distinction among large-enterprise-wide, smaller-enterprise-wide, and departmental BI is not a clear one.* Second, &#8220;departmental BI&#8221; has at least two major strains:</p>
<ul>
<li>Simple, pedestrian BI, implemented quickly.</li>
<li><a href="../../../../../2011/03/03/investigative-analytics/">Investigative analytics</a>.</li>
</ul>
<p><em>*In particular, it has been the case since the 1990s that BI tools first get sold to departments, hopefully for fast implementations &#8212; think 4-6 weeks as a base case &#8212; and then spread out internally after their initial successes. I am frequently amused by vendors who claim to have pioneered that sales model sometime over the past decade, or even within the past few years.</em></p>
<p>That said, there are two main kinds of reason to do your BI departmentally, at arm&#8217;s length from central IT.</p>
<ul>
<li>Perhaps, for good reason or bad, <strong>IT is being insufficiently helpful at managing the data.</strong>
<ul>
<li>This can be a straightforward matter of politics and priorities &#8212; IT controls the data, but is slow about giving you access.</li>
<li>Also, you may want to include data that&#8217;s outside IT&#8217;s purview, be it third-party or just purely departmental.</li>
</ul>
</li>
<li>Further, you may want <strong>functionality that corporate-standard BI doesn&#8217;t offer.</strong> Potential examples include:
<ul>
<li>Cool analytic visualization.</li>
<li>&#8220;Real-time&#8221; data visualization.</li>
<li>The ability to play nicely with particular kinds of data sets.</li>
</ul>
</li>
</ul>
<p>I have a lot more to say about those points &#8212; but not in a post that&#8217;s already as long as this one. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/04/some-issues-in-business-intelligence/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

