<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; MOLAP</title>
	<atom:link href="http://www.dbms2.com/category/analytics-technologies/molap/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:21:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>The Ted Codd guarantee</title>
		<link>http://www.dbms2.com/2011/07/31/the-ted-codd-guarantee/</link>
		<comments>http://www.dbms2.com/2011/07/31/the-ted-codd-guarantee/#comments</comments>
		<pubDate>Sun, 31 Jul 2011 22:44:21 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5044</guid>
		<description><![CDATA[I write a lot about whether or not to use relational DBMS. For example: In May I surveyed relational vs. non-relational pros and cons at some length. Last November I mused about when it might be OK to do without joins. The question is implicit in a variety of posts about, say, document-oriented or object-oriented [...]]]></description>
			<content:encoded><![CDATA[<p>I write a lot about whether or not to use relational DBMS. For example:</p>
<ul>
<li>In May I surveyed <a href="../../../../../2011/05/29/when-to-use-relational-database-management-system/">relational vs. non-relational pros and cons</a> at some length.</li>
<li>Last November I mused about <a href="../../../../../2010/11/29/document-database-without-joins/">when it might be OK to do without joins</a>.</li>
<li>The question is implicit in a variety of posts about, say, <a href="../../../../../2011/02/07/notes-on-document-oriented-nosql/">document-oriented</a> or <a href="../../../../../2011/05/21/object-oriented-database-management-systems-oodbms/">object-oriented</a> DBMS.</li>
</ul>
<p>Before going further in that vein, I&#8217;d like to do a quick review of what E. F. &#8220;Ted&#8221; Codd was getting at with the relational model in the first place.  <span id="more-5044"></span></p>
<p>The first sentence of Codd&#8217;s famous 1970 <a href="http://www.seas.upenn.edu/%7Ezives/03f/cis550/codd.pdf">paper introducing the relational database concept</a> reads:</p>
<blockquote><p>Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).</p></blockquote>
<p>In modern terms, that means <strong>&#8220;all you have to know to use the database is its logical schema; you don&#8217;t need to know anything about its physical representation.&#8221;</strong></p>
<p>Over the next 15 years, Codd&#8217;s thinking &#8212; and his employer IBM&#8217;s technology &#8212; evolved to the point that Codd proposed <a href="http://www.cse.ohio-state.edu/%7Esgomori/570/coddsrules.html">12 rules for a relational DBMS</a>, the three most fundamental of which are:</p>
<blockquote><p><em><strong>Foundation Rule</strong><br />
</em>A relational database management system must manage its stored data using only its relational capabilities.</p>
<p><em><strong>Information Rule</strong><br />
</em>All information in the database should be represented in one and only one way &#8212; as values in a table.</p>
<p><em><strong>Guaranteed Access Rule</strong><br />
</em>Each and every datum (atomic value) is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.</p></blockquote>
<p>I.e., Codd was positively asserting that <strong>a database should have a fixed logical schema, </strong>in a<strong> tabular form. </strong>The clear implication was that programmers could or should be able to write anything they wanted to against that schema, without database performance being unduly compromised.</p>
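<p>To make the Guaranteed Access Rule concrete, here is a minimal sketch in Python, using the standard-library <code>sqlite3</code> module. The <code>employee</code> table, its columns, and the <code>fetch_datum</code> helper are hypothetical names chosen for illustration; none of them comes from Codd&#8217;s papers. The point is only that the caller supplies a table name, a primary key value, and a column name, and nothing about indexes or storage layout.</p>

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)
conn.executemany(
    "INSERT INTO employee (emp_id, name, dept) VALUES (?, ?, ?)",
    [(1, "Ada", "Research"), (2, "Grace", "Systems")],
)


def fetch_datum(conn, table, column, key_column, key_value):
    """Return one atomic value, addressed purely by logical names.

    The caller gives a table name, a column name, and a primary key value.
    Identifiers cannot be bound as SQL parameters, so they are checked
    against the catalog before being placed in the statement.
    """
    tables = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if table not in tables:
        raise ValueError("unknown table: " + table)
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    if column not in columns or key_column not in columns:
        raise ValueError("unknown column")
    row = conn.execute(
        f'SELECT "{column}" FROM "{table}" WHERE "{key_column}" = ?',
        (key_value,),
    ).fetchone()
    return None if row is None else row[0]


dept = fetch_datum(conn, "employee", "dept", "emp_id", 2)
```

<p>Whether the engine answers that lookup from an index, a scan, or a cache is its own business, which is the separation Codd was asking for.</p>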
<p>Of course, things never quite worked out that way. For most of the history of tabular DBMS, the best-performing <a href="http://www.dbms2.com/2011/03/30/short-request-and-analytic-processing/">short-request and analytic DBMS</a> have been designed quite differently from each other.* Non-relational systems &#8212; from IBM&#8217;s own IMS to various object-oriented DBMS &#8212; outperformed relational DBMS on particular applications. Designers of high-performance applications were sensitive to the database&#8217;s physical design, sometimes even going to the extreme of <a href="../../../../../2011/02/24/transparent-sharding/">non-transparent sharding</a>. But on the whole, it was generally agreed that programming against a fixed logical schema is a good thing.</p>
<p><em>*Codd acknowledged this himself by promoting multidimensional OLAP over traditional RDBMS. (I regard the multidimensional/relational divide as a distinction without significant difference; it&#8217;s all just fixed-logical-schema tabular processing with different data manipulation languages.)</em></p>
<p>In my next post, I&#8217;ll return to the subject of <a href="http://www.dbms2.com/2011/07/31/dynamic-fixed-schema-databases/">why fixed schemas might not always be such a good idea after all</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/31/the-ted-codd-guarantee/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Eight kinds of analytic database (Part 2)</title>
		<link>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/</link>
		<comments>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 08:18:18 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Buying processes]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Rainstor]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[SenSage]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4867</guid>
		<description><![CDATA[In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I&#8217;ll cover four more kinds of analytic database &#8212; even newer, for the most part, with a use case/product short list [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/">Part 1</a> of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I&#8217;ll cover four more kinds of analytic database &#8212; even newer, for the most part, with a use case/product short list match that is even less clear.  <span id="more-4867"></span></p>
<p><strong><em>Bit bucket</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included: </em>Logs, other technical/external</li>
<li><em>Likely use styles:</em> Staging/ETL, investigative</li>
<li><em>Canonical example: </em>Log files in a Hadoop cluster</li>
<li><em>Stresses:</em> TCO, scale-out, transform/big-query performance, ETL functionality</li>
</ul>
<p>With the explosion of <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a> has come the need for a place to put it all, sometimes called the <a href="../../../../../2011/06/04/dirty-data-stored-dirt-cheap/">big bit bucket</a>. This is like the investigative data mart for big databases, but more <a href="../../../../../2011/05/17/poly-structured-database/">poly-structured</a>. In some cases it is focused on data staging and transformation; but it can also be used for analysis in place.</p>
<p>The list of candidate technologies to run your bit bucket starts with Hadoop and Splunk.</p>
<p><strong><em>Archival data store</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included: </em>Operational, CDR (call detail record), security log</li>
<li><em>Likely use styles:</em> Archival, reporting (for compliance), possibly also investigative</li>
<li><em>Examples:</em> Any long-term detailed historical store</li>
<li><em>Stresses: </em>TCO, compression, scale-out, performance (if multi-use)</li>
</ul>
<p>Analytic DBMS vendors have been insulting each other with the claim &#8220;that&#8217;s just an archival data store,&#8221; dating back at least to the first time Greenplum was deployed on an underpowered Sun Thumper system. Perhaps only <a href="../../../../../2010/06/11/rainstor-update/">Rainstor</a> truly embraces the archival positioning, and I&#8217;ve become pretty dubious about their technical claims and their company alike.</p>
<p>Still, there&#8217;s a legitimate need for data stores, especially relational analytic DBMS, that:</p>
<ul>
<li>Store data cheaply, with high rates of compression.</li>
<li>Have decent performance if you do want to query the data.</li>
<li>May have archiving/compliance-specific features as well.</li>
</ul>
<p>Along with Rainstor, SAND and SenSage have at least partially targeted that use case. In addition, appliance vendors such as Teradata and Netezza try to have an archive-oriented product version in their lineups.</p>
<p><strong><em>Outsourced data mart</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All</li>
<li><em>Likely use styles:</em> Traditional BI, investigative analytics, staging/ETL</li>
<li><em>Examples:</em> Advertising tracking, SaaS CRM</li>
<li><em>Stresses:</em> Performance, TCO, reliability, concurrency</li>
</ul>
<p>Much of what happens in analytic database management can also be outsourced. Some applications that run via SaaS (Software as a Service) are analytic. I&#8217;ve had three different clients whose main business is picking marketing targets in various vertical segments; others who wanted to add analytics to what were historically OLTP applications; and others yet who just offered online business intelligence. Also, if your fundamental business is gathering data and reselling it to a variety of user organizations, that&#8217;s an analytic data management challenge. The possibilities expand from there.</p>
<p>Data outsourcers are in the IT business, and so their IT development is &#8212; hopefully! &#8212; more serious and less politically encumbered than at many conventional enterprises. Thus, legacy systems and master data management issues are commonly less prevalent, or at least more aggressively disposed of. The same, up to a point, goes for vendor politics.* <a href="../../../../../2011/06/26/what-to-think-about-before-you-make-a-technology-decision/">Multitenancy</a> is commonly an issue, as is running in the cloud.</p>
<p><em>*Even so, there&#8217;s often That Guy who doesn&#8217;t want to migrate away from Oracle, no matter what.</em></p>
<p>Vertica gets the nod in a number of these cases; it&#8217;s cloud-friendly, and often the problem is naturally columnar. Other columnar products can be good choices too, with added brownie points for Infobright if the shop is MySQL-oriented anyway. Running Netezza or other appliances makes sense mainly if you&#8217;re pretty sure you want to keep operating your own data centers, but some data outsourcers are just fine with that assumption.</p>
<p><strong><em>Operational analytic(s) server</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> Customer-centric, log, financial trade</li>
<li><em>Likely use styles:</em> Advanced operational analytics</li>
<li><em>Examples:</em>
<ul>
<li>Lower latency: Web or call-center personalization, anti-fraud</li>
<li>Higher latency: Customer profiling, Basel 3 risk analysis</li>
</ul>
</li>
<li><em>Stresses:</em> Performance, reliability, analytic functionality, perhaps concurrency</li>
</ul>
<p>Even with eight different choices, I need a &#8220;catch-all&#8221; category; this is it.</p>
<p>Suppose you want to do reasonably sophisticated analytics, then use the results in operations. This is the classical challenge in <a href="../../../../../2011/03/30/short-request-and-analytic-processing/">integrating short-request and analytic processing</a>. There are multiple ways to tackle it, embodying different trade-offs in cost, convenience, or analytic accuracy. If the platform on which you want to run your investigative analytics also has the reliability and concurrency appropriate for mission-critical operations, you&#8217;re set. Otherwise, you may want to pipe <a href="../../../../../2010/11/29/data-that-is-derived-augmented-enhanced-adjusted-or-cooked/">derived data</a> into a more &#8220;industrial-strength&#8221; DBMS, ideally the one that runs your operational apps anyway.</p>
<p>Another option is to integrate a limited amount of analytics immediately into your short-request processing system. For example, as bad as they are at the kinds of queries that require joins, NoSQL systems are often fast at simple aggregations. As MapReduce/NoSQL integrations mature, that option may not require pumping the data anywhere else for deeper analytics; even if it does, at least you&#8217;re starting out with the data in a convenient bit bucket.</p>
<p>Streaming/CEP-centric architectures could come into play as well. And it goes on from there. The possibilities in this last category are just too varied to generalize about.</p>
<p><em>So did I get them all? Or are there yet other analytic data management use cases that don&#8217;t fit into my eight categories?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Eight kinds of analytic database (Part 1)</title>
		<link>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/</link>
		<comments>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 08:17:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Buying processes]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4868</guid>
		<description><![CDATA[Analytic data management technology has blossomed, leading to many questions along the lines of &#8220;So which products should I use for which category of problem?&#8221; The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for &#8220;big data&#8221; is little help. Let&#8217;s try eight categories instead. While no categorization [...]]]></description>
			<content:encoded><![CDATA[<p>Analytic data management technology has blossomed, leading to many questions along the lines of &#8220;So which products should I use for which category of problem?&#8221; The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for &#8220;big data&#8221; is little help.</p>
<p>Let&#8217;s try eight categories instead. While <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">no categorization is ever perfect</a>, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need &#8212; and in most cases you&#8217;ll need several &#8212; is a great early step in your analytic technology planning.  <span id="more-4868"></span></p>
<p><strong><em>Enterprise data warehouse</em></strong> (Full or partial)</p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, but especially operational</li>
<li><em>Likely use styles:</em> All</li>
<li><em>Canonical example:</em> Central EDW for a big enterprise</li>
<li><em>Stresses:</em> Concurrency, reliability, workload management</li>
</ul>
<p>The enterprise data warehouse (EDW) ideal says that you copy all your data into one place, and drive all decision-making from there. <a href="../../../../../2011/06/21/its-official-the-grand-central-edw-will-never-happen/">Full EDWs are pipedreams</a>. Still, a partial EDW makes sense for most large enterprises, and many indeed already have one. The first product lines to consider for classical EDWs are Teradata, DB2, Exadata, and maybe Microsoft SQL Server, especially if you&#8217;re going to stress concurrency and/or operational use cases.</p>
<p><strong><em>Traditional data mart</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All</li>
<li><em>Likely use styles:</em> Business intelligence, budgeting/consolidation, investigative</li>
<li><em>Examples:</em> Reporting servers, planning/consolidation servers, anything MOLAP, etc.</li>
<li><em>Stresses:</em> Performance, concurrency, TCO</li>
</ul>
<p>Whether or not you have something like an enterprise data warehouse, it&#8217;s common to have lighter-weight data marts as well. A traditional data mart might drive reports and dashboards. Or it might be specialized for budgeting, planning, and/or consolidation.  Some <a href="../../../../../2011/03/03/investigative-analytics/">investigative analytics</a> may be in the mix as well.</p>
<p>Any DBMS that can support an EDW can also support a data mart, but it may not be the most cost-effective way to do so. Columnar DBMS might have more attractive performance and TCO (Total Cost of Ownership); the same goes for Netezza. Some of them &#8212; e.g. Sybase IQ and <a href="../../../../../2011/06/20/vertica-release-5/">Vertica</a> &#8212; have excellent track records in concurrent usage as well. <a href="../../../../../2011/05/29/when-to-use-relational-database-management-system/">Ted Codd</a> pushed what amounts to MOLAP (Multidimensional OnLine Analytic Processing) systems for these use cases. But relational DBMS commonly do a better job, which is one reason most major MOLAP products have wound up at RDBMS companies.</p>
<p><strong><em>Investigative data mart &#8212; agile</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, especially customer-centric</li>
<li><em>Likely use styles</em>: Investigative</li>
<li><em>Canonical example:</em> A few analysts getting a few TB to examine</li>
<li><em>Stresses:</em> Ease of setup/load, ease of admin, price/performance</li>
</ul>
<p>Besides the traditional data mart, there are at least two other kinds. Both are focused on investigative analytics, but they&#8217;re differentiated by database size.</p>
<p>If you have just a few analysts,* looking at no more than a few terabytes of data (perhaps even just some gigabytes) &#8212; and if that data is &#8220;single-subject&#8221; and fairly homogeneous &#8212; your watchwords should be &#8220;cheap&#8221;, &#8220;easy&#8221;, and &#8220;fast&#8221;. You don&#8217;t need to invest in much hardware, expensive software, or much administrative effort (the analysts can be their own DBAs), nor should you endure much set-up time. Just grab a product, grab some data, and start running queries (or extracts into the statistical tool of your choice).</p>
<p><em>*If you have dozens or even hundreds of analysts hitting the same database, you&#8217;re probably back to the more concurrency-oriented scenarios outlined above.</em></p>
<p>Infobright is often cost-effective among columnar analytic DBMS. Other vendors might cut you a price break as well. If you have multiple terabytes of data, don&#8217;t rule out Netezza&#8217;s lowest-end products (even if they&#8217;d really rather sell you something bigger). Or, if you&#8217;re in the sub-terabyte range, maybe you can get by with an in-memory BI tool such as QlikView, and not do anything special on the DBMS side at all.</p>
<p><strong><em>Investigative data mart &#8212; big</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, especially customer-centric, logs, financial trade, scientific</li>
<li><em>Likely use styles</em>: Investigative</li>
<li><em>Canonical example:</em> Single-subject 20 TB &#8211; 20 PB relational database</li>
<li><em>Stresses:</em> Performance, scale-out, analytic functionality</li>
</ul>
<p>But if you&#8217;re looking at tens of terabytes of relational data, or even more, you really do have a &#8220;big data&#8221; problem. Performance and scalability are major challenges, usually best addressed by MPP (Massively Parallel Processing) systems, such as Netezza, Vertica, Aster Data, ParAccel, Teradata, or Greenplum. Performance POCs (Proofs Of Concept) are a big part of the buying process. Vendor price negotiations are crucial too.</p>
<p><em>Actually, in the low tens of terabytes you might be able to get away with a shared-disk system that has excellent compression &#8212; e.g., columnar products like Sybase IQ, Infobright, or SAND, rather than just Vertica and ParAccel.</em></p>
<p>Assuming you have affordable, scalable query performance, the competitive differentiator can switch to additional analytic functionality. Aster, Netezza, ParAccel, Vertica, and Greenplum either offer full <a href="../../../../../2011/02/24/analytic-platforms/">analytic platforms</a>, or seem to be on the path to doing so. Teradata, which now owns Aster Data, offers substantial built-in analytic capability in its traditional products as well, and the same goes for Sybase IQ.</p>
<p><em>Continued in <a href="http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/">Part 2</a>, where we cover some of the more difficult use cases.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>When it&#8217;s still best to use a relational DBMS</title>
		<link>http://www.dbms2.com/2011/05/29/when-to-use-relational-database-management-system/</link>
		<comments>http://www.dbms2.com/2011/05/29/when-to-use-relational-database-management-system/#comments</comments>
		<pubDate>Sun, 29 May 2011 19:56:37 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4569</guid>
		<description><![CDATA[There are plenty of viable alternatives to relational database management systems. For short-request processing, both document stores and fully object-oriented DBMS can make sense. Text search engines have an important role to play. E. F. &#8220;Ted&#8221; Codd himself once suggested that relational DBMS weren&#8217;t best for analytics.* Analysis of machine-generated log data doesn&#8217;t always have [...]]]></description>
			<content:encoded><![CDATA[<p>There are plenty of viable alternatives to relational database management systems. For <a href="../../../../../2011/03/30/short-request-and-analytic-processing/">short-request processing</a>, both <a href="../../../../../2011/02/07/notes-on-document-oriented-nosql/">document stores</a> and <a href="../../../../../2011/05/21/object-oriented-database-management-systems-oodbms/">fully object-oriented DBMS</a> can make sense. Text search engines have an important role to play. E. F. &#8220;Ted&#8221; Codd himself once suggested that <a href="http://www.minet.uni-jena.de/dbis/lehre/ss2005/sem_dwh/lit/Cod93.pdf">relational DBMS weren&#8217;t best for analytics</a>.* Analysis of <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated</a> log data doesn&#8217;t always have a naturally relational aspect. And I could go on with more examples yet.</p>
<p><em>*Actually, he didn&#8217;t admit that what he was advocating was a different kind of DBMS, namely a MOLAP one &#8212; but he was. And he was wrong anyway about the necessity for MOLAP. But let&#8217;s overlook those details. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></p>
<p>Nonetheless, relational DBMS dominate the market. As I see it, the reasons for relational dominance cluster into four areas (which of course overlap):</p>
<ul>
<li><strong>Data re-use.</strong> Ted Codd&#8217;s famed original paper referred to <a href="http://www.seas.upenn.edu/%7Ezives/03f/cis550/codd.pdf">shared data banks</a> for a reason.</li>
<li>The benefits of <strong>normalization,</strong> which include:
<ul>
<li>You only have to do the programming work of writing something once &#8230;</li>
<li>&#8230; and you don&#8217;t have to do the programming work of keeping multiple versions of the information consistent.</li>
<li>You only have to do the processing work of writing something once.</li>
<li>You only have to buy storage to hold each fact once.</li>
</ul>
</li>
<li>Separation of concerns.
<ul>
<li>Different people can worry about programming and &#8220;database stuff.&#8221;</li>
<li>Indeed, even performance optimization can sometimes be separated from programming (i.e., when all you have to do to get speed is implement the correct indexes).</li>
</ul>
</li>
<li>Maturity and momentum, as reflected in the availability of:
<ul>
<li>People.</li>
<li>A broad variety of mature relational DBMS.</li>
<li>Vast amounts of packaged software that &#8220;talks&#8221; SQL.</li>
</ul>
</li>
</ul>
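<p>The normalization benefits can be sketched in a few lines of Python against the standard-library <code>sqlite3</code> module. The <code>customer</code> and <code>purchase</code> tables and their contents are hypothetical, made up purely for illustration. Each customer&#8217;s city is stored once, so changing it is a single-row update, and a join reassembles the full picture on demand.</p>

```python
import sqlite3

# Hypothetical two-table schema: customer facts live in one place only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT,
        city    TEXT
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        cust_id     INTEGER REFERENCES customer (cust_id),
        amount      REAL
    );
    INSERT INTO customer VALUES (1, 'Acme', 'Boston');
    INSERT INTO purchase VALUES (10, 1, 25.0);
    INSERT INTO purchase VALUES (11, 1, 40.0);
""")

# The city is stored once, so correcting it touches exactly one row,
# no matter how many purchases refer to that customer.
conn.execute("UPDATE customer SET city = 'Chicago' WHERE cust_id = 1")

# A join puts the customer facts back together with each purchase.
rows = conn.execute("""
    SELECT p.purchase_id, c.name, c.city, p.amount
    FROM purchase AS p
    JOIN customer AS c ON c.cust_id = p.cust_id
    ORDER BY p.purchase_id
""").fetchall()
```

<p>Had the city been copied into every purchase row instead, the same correction would mean finding and rewriting every copy, which is exactly the consistency work normalization spares you.</p>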
<p>Generally speaking, I find the reasons for sticking with relational technology compelling in cases such as:  <span id="more-4569"></span></p>
<ul>
<li><strong>You&#8217;re building a low-volume, medium-complexity suite of applications that will evolve over time.</strong> This is the use case for which relational DBMS were invented, and they&#8217;re still great for it.</li>
<li><strong>Your (duplicated) data volumes would be ridiculous if you didn&#8217;t do a reasonable amount of normalization.</strong> Once you need to normalize, you need to do joins &#8212; and if you&#8217;re doing joins, you&#8217;re in relational territory.</li>
<li><strong>You simply don&#8217;t see a cost/benefit advantage to moving away from proven legacy technology.</strong> If you&#8217;re looking for an off-the-shelf answer to your needs &#8212; or if you&#8217;re inventorying your own technological shelves &#8212; relational-oriented technology has overwhelming share.</li>
</ul>
<p>For many enterprises, that third point alone should be decisive in a large fraction of cases.</p>
<p>But the advantages of relational technology are less clear when you&#8217;re doing <strong>serious engineering of path-breaking new applications, </strong>where by &#8220;serious engineering&#8221; I mean:</p>
<ul>
<li>The problem is big enough that you simply want the best solution, with only loose coupling needed to the rest of your technical environment.</li>
<li>Long-lasting &#8220;strategic&#8221; or legacy technology is not a great concern; you&#8217;re willing to keep &#8220;rebuilding the 747 while it&#8217;s flying&#8221; if that&#8217;s what&#8217;s necessary to get the best possible result.</li>
<li>You have access to sufficient quantities of sufficiently smart people.</li>
</ul>
<p>For example:</p>
<ul>
<li>I recently suggested that <a href="../../../../../2011/05/21/object-oriented-database-management-systems-oodbms/">innovative SaaS vendors could adopt object-oriented database technology.</a></li>
<li>Major web applications are rarely very relational. Until recently, the default approach to scaling out web databases was memcached/sharded MySQL, hardly a whole-hearted adoption of relational technology. Now NoSQL DBMS are vigorous competitors.*</li>
<li>Analytic challenges that amount to teasing out signals from streams of data are sometimes handled non-relationally as well, although it&#8217;s often nice to be able to do a few joins to mix in information from more relationally-structured data.</li>
</ul>
<p>Not coincidentally, in a lot of those cases, throwing performance concerns &#8220;over the wall&#8221; to the database administrator isn&#8217;t going to work.</p>
<p><em>*I do expect the pendulum to swing back a bit as high-performance/highly-scalable MySQL implementations mature, but there are relatively few supporting examples to date.</em></p>
<p>To look at it another way, it&#8217;s right to be skeptical about relational DBMS when you can defeat all of the reasons to favor them. For example:</p>
<ul>
<li>Data re-use may not arise when applications are self-contained and rapidly-changing.</li>
<li>Sometimes you don&#8217;t need to normalize your data.</li>
<li>It&#8217;s not obvious that the relational approach to separation of concerns is the best one. Perhaps you&#8217;d be better off with the people who understand a specific application best being responsible for all the decisions connected with it.</li>
<li>As for that maturity and momentum:
<ul>
<li>People don&#8217;t actually learn much SQL in school.</li>
<li>Are any of the mature relational DBMS what you really want?</li>
<li>Is any of that packaged software out there really helpful for your specific problem?</li>
</ul>
</li>
</ul>
<p>I should probably stop there. But in an appeal to authority, I&#8217;ll close instead with a quote from Codd&#8217;s own OLAP paper:</p>
<blockquote><p>IT should never forget that technology is a means to an end, and not an end in itself. Technologies must be evaluated individually in terms of their ability to satisfy the needs of their respective users. IT should never be reluctant to use the most appropriate interface to satisfy users’ requirements. Attempting to force one technology or tool to satisfy a particular need for which another tool is more effective and efficient is like attempting to drive a screw into a wall with a hammer when a screwdriver is at hand: the screw may eventually enter the wall but at what cost?</p></blockquote>
<p><strong><em>Related link</em></strong></p>
<ul>
<li><a href="../../../../../2008/02/15/database-management-system-choices-overview/">My exchange with Mike Stonebraker highlighting our shared advocacy for database diversity</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/05/29/when-to-use-relational-database-management-system/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Evolving definitions and technology categories for 2011</title>
		<link>http://www.dbms2.com/2010/12/28/evolving-definitions-and-technology-categories-for-2011/</link>
		<comments>http://www.dbms2.com/2010/12/28/evolving-definitions-and-technology-categories-for-2011/#comments</comments>
		<pubDate>Tue, 28 Dec 2010 09:27:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3450</guid>
		<description><![CDATA[It seems my prediction of a limited blogging schedule in December came emphatically true. I shall re-start with a collection of quick thoughts, clearing the decks for more detailed posts to follow. If you&#8217;d like to contribute thoughts on these subjects, now might be a really good time. 1.  Not many terms I coin get [...]]]></description>
			<content:encoded><![CDATA[<p>It seems my prediction of <a href="http://www.dbms2.com/2010/11/29/im-partway-back/">a limited blogging schedule in December</a> came emphatically true. I shall re-start with a collection of quick thoughts, clearing the decks for more detailed posts to follow. <span id="more-3450"></span>If you&#8217;d like to contribute thoughts on these subjects, now might be a really good time.</p>
<p>1.  Not many terms I coin get marketing traction, but <a href="http://www.dbms2.com/2010/04/08/machine-generated-data-example/">machine-generated data</a> has grown some legs. Clients (Infobright, Cloudera) and non-clients alike have adopted it. I need to follow up with a more official description/definition of the concept. The Wikipedia article on same doesn&#8217;t get the job done yet. <em>(Edit: Here&#8217;s my take on <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">defining machine-generated data</a>. Be sure to read through to Daniel Abadi&#8217;s response.)</em></p>
<p>2.  Merv Adrian is going to Gartner Group. Expect great improvement in Gartner&#8217;s DBMS coverage, in areas beyond the straightforward &#8220;This is what users say they are doing&#8221; Gartner already excels at. That said, Merv is probably not starting at Gartner soon enough to help make the 2010 analytic DBMS Magic Quadrant any better than the <a href="http://www.dbms2.com/2010/02/10/gartner-magic-quadrant-data-warehouse-2009-2010/">Gartner 2009 data warehouse database management system magic quadrant</a>, <a href="http://www.dbms2.com/2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">the Gartner 2008 data warehouse database management system magic quadrant</a>, and so on.</p>
<p>In particular, Merv has a good understanding of trends and technology in analytic DBMS and related markets. Judging by his Twitter stream, James Kobielus at Forrester if anything overrates the shift to general <strong>&#8220;analytic platforms.&#8221;</strong> And I of course am expected to help define the &#8220;analytic platform&#8221;/&#8220;advanced analytics&#8221;/whatever category. Taking all those analyst efforts together, it&#8217;s reasonable to expect a lot more market awareness &#8212; and also market confusion &#8212; around these areas.</p>
<p>3.  All that plugs into a larger project I was working on before my family issues came crashing in. The <a href="http://www.dbms2.com/2010/04/12/enterprise-data-warehouse-edw-myt/">enterprise data warehouse is a myth</a>, and that&#8217;s just the first reason that the old EDW vs. data mart bifurcation is grossly inadequate for understanding analytic data management choices. So I&#8217;m working on some ideas to <strong>categorize types of data warehouse/mart/whatever</strong> according to what kind of data you have and how you use that data. Multiple industry players (OK, vendors) have offered interesting and useful feedback in this process, although I&#8217;m still waiting for <span style="text-decoration: line-through;">Teradata and</span> IBM. <em>(Edit: My bad. Teradata actually had sent a helpful response some time ago.)</em></p>
<p>In connection with that effort, the last outline I did back in October of <strong>analytic data use styles</strong> read:</p>
<ul>
<li>Traditional BI
<ul>
<li>Reporting, dashboards, &amp; light-weight ad-hoc query</li>
<li>(Even if you make this more into data exploration, you&#8217;re probably not stressing the underlying DBMS much more than traditional BI does)</li>
<li>(If integrated into operational apps, your DBMS choice for this may be constrained by your choice of operational apps)</li>
</ul>
</li>
<li>Near-real-time BI
<ul>
<li>E.g., dashboards w/ constant or 1-minute refresh</li>
<li>(Actually, this isn&#8217;t a great fit for most analytic DBMS yet)</li>
<li>(Also, it&#8217;s not a big market yet, except in specialized niches such as trading or network control)</li>
</ul>
</li>
<li>Budgeting &amp; consolidation
<ul>
<li>(MOLAP is still strong here)</li>
<li>(I took out the word &#8220;planning&#8221; because it has several meanings)</li>
</ul>
</li>
<li><a href="http://www.dbms2.com/2011/03/03/investigative-analytics/">Investigative analytics</a>*
<ul>
<li>Can be but doesn&#8217;t have to be long-running</li>
<li>Example technologies include:
<ul>
<li>Heavy ad-hoc query</li>
<li>Data mining/machine learning/predictive analytics modeling</li>
<li>Simulation</li>
<li>Other advanced analytics</li>
</ul>
</li>
</ul>
</li>
<li>(Advanced) operational analytics
<ul>
<li>Inputs to operational apps</li>
<li>Technologically similar to investigative analytics
<ul>
<li>Data mining/machine learning/predictive analytics scoring</li>
<li>Simulation</li>
<li>Other advanced analytics</li>
</ul>
</li>
<li>Example applications include:
<ul>
<li>Customer classification or scoring</li>
<li>Wholesale telecom pricing</li>
<li>Basel 3 risk analysis</li>
</ul>
</li>
</ul>
</li>
<li>Pre-processing, staging, and ETL</li>
<li>Archive &amp; compliance</li>
<li>(Test/dev)</li>
</ul>
<p>The data warehouse/mart categories weren&#8217;t in exact one-to-one correlation to those use styles, but the connection was of course pretty close.</p>
<p><em>*I&#8217;ve really struggled with terminology in the area of data exploration (over-used already)/discovery analytics (sounds weird)/research analytics (caused confusion when I tried it).</em> Investigative analytics<em> is my latest try.</em></p>
<p>4.  And finally &#8212; like most people, I find the terms <em>unstructured</em> or <em>semi-structured data</em> to be misleading, for at least two reasons:</p>
<ul>
<li>When the data is human-generated, what&#8217;s really happening is usually that the structure is just in a different place &#8212; structured databases generally tend to hold unstructured data, and vice-versa.</li>
<li>In the case of machine-generated data, you really can start out with unstructured sets of individually unstructured logs. So what do you do then? You <a href="http://www.dbms2.com/2010/11/29/data-that-is-derived-augmented-enhanced-adjusted-or-cooked/">derive</a> data, which has some kind of structure, and do most of your operations on that.</li>
</ul>
<p>So I&#8217;ve been playing for a couple of years with the thought of introducing the term <em>polystructured data</em>. This is not a finished concept, because there are at least three different things I could mean by it:</p>
<ul>
<li>&#8220;Polystructured data is data that has considerable structure, but whose structure is in some important way unpredictable.&#8221; That&#8217;s a direct quote from a draft of a never-published paper. The paper, conceived before the days of NoSQL, was meant to be very XML-centric.</li>
<li>&#8220;Polystructured data is data whose structure is apt to be interpreted in different ways at different times&#8221; &#8212; e.g., data that will variously get referenced by free text and structured searches. The example I gave illustrates part of the problem with that version, as increasingly many software vendors think it&#8217;s a dandy idea to do free-text searches across various columns of relational tables.</li>
<li>&#8220;Polystructured data is data that gets restructured over time.&#8221; That&#8217;s the derived data point.</li>
</ul>
<p>It may take a while to find, but I think there&#8217;s a pony in there somewhere.</p>
<p><em>Edit: Here&#8217;s the definition of <a href="http://www.dbms2.com/2011/05/17/poly-structured-database/">poly-structured database</a> I eventually came up with.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/12/28/evolving-definitions-and-technology-categories-for-2011/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Ray Wang on SAP</title>
		<link>http://www.dbms2.com/2009/12/11/ray-wang-on-sap/</link>
		<comments>http://www.dbms2.com/2009/12/11/ray-wang-on-sap/#comments</comments>
		<pubDate>Fri, 11 Dec 2009 23:16:54 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Solid-state memory]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1286</guid>
		<description><![CDATA[Ray Wang made a terrific post based on SAP&#8217;s annual influencer love-in, an event which I no longer attend. Ray believes SAP has been in a &#8220;crisis&#8221;, and sums up his views as The Bottom Line  &#8211; SAP’s Turning The Corner Credit must be given to SAP for charting a new course.  A shift in [...]]]></description>
			<content:encoded><![CDATA[<p>Ray Wang made <a href="http://blog.softwareinsider.org/2009/12/11/event-report-2009-sap-influencer-summit-sap-must-put-strategy-to-execution-in-order-to-prove-clarity-of-vision/">a terrific post based on SAP&#8217;s annual influencer love-in</a>, an event which <a href="http://www.monashreport.com/2007/01/03/sap-nonsense-ethics/">I no longer attend</a>. Ray believes SAP has been in a &#8220;crisis&#8221;, and sums up his views as</p>
<blockquote><p><strong>The Bottom Line  &#8211; SAP’s Turning The Corner<br />
</strong></p>
<p>Credit must be given to SAP for charting a new course.  A shift in the management philosophy and product direction will take years to realize, however, its not too late for change.  SAP must remember its roots and become more German and less American.  The renewed focus must put customer requests and priorities ahead of SAP’s bureaucracy.  The emphasis must focus on the <a href="http://blog.softwareinsider.org/2009/03/16/mondays-musings-its-the-relationship-stupid-part-1-commoditizing-the-workforce/">relationship</a>.  When that reemerges in how SAP works with customers, partners, influencers, and its own employees, SAP will be back in good graces. In the meantime, its  time to get to work and deliver.  Oracle’s Fusions Apps are coming soon and competitors such as IBM, Microsoft, Epicor, IFS, and SalesForce.com will not relent.</p></blockquote>
<p>I recall the 1980s, when SAP&#8217;s main differentiator, at least in the English-speaking US, was a total commitment to customer success, and when it could be taken for granted that SAP would do business ethically. Things change, and not always for the better.</p>
<p>Anyhow, the reason I&#8217;m highlighting Ray&#8217;s post is that he makes reference to a number of interesting SAP-centric technology trends or initiatives.<span id="more-1286"></span> In no particular order, Ray suggests:</p>
<ul>
<li>SAP&#8217;s and Oracle&#8217;s (Fusion) <a href="http://www.dbms2.com/2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/">efforts to meld memory-centric analytics with operational apps</a> will be crucial for large enterprises &#8212; but perhaps only around the middle of the next decade. (I basically agree, although I&#8217;d note that:
<ul>
<li>Wisely, Ray suggested a very long time frame.</li>
<li>BI/operational app integration has been, on the whole, glacial.</li>
<li>The idea that you have to put pre-built aggregates into RAM to get performance is an indictment of market-leading RDBMS &#8212; but it&#8217;s a fair indictment.</li>
<li>I&#8217;m not sure whether memory-centric OLAP will wind up in RAM or Flash. If the data stores are updated at near-transactional speeds, RAM may make more sense. Otherwise, Flash should have major advantages.)</li>
</ul>
</li>
<li>SAP&#8217;s long-standing attempts to support third-party development of SAP add-ons are a technological mess, in line with <a href="http://www.dbms2.com/2007/10/12/sap-is-losing-crucial-managerial-talent/">my fears of a couple of years ago</a>. However, the business-relationship part of the effort is vastly stronger.</li>
<li>As SAP focuses more on the mid-market, it is partnering closely with Microsoft. (If you think about it, that makes all kinds of sense.)</li>
<li>Energy/environmental/safety tracking &#8212; i.e., sustainability &#8212; tools are a big deal. (See also <em><a href="http://www.economist.com/businessfinance/displaystory.cfm?story_id=15022465">The Economist</a></em> on that point.)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/11/ray-wang-on-sap/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A question on MDX performance</title>
		<link>http://www.dbms2.com/2009/10/30/a-question-on-mdx-performance/</link>
		<comments>http://www.dbms2.com/2009/10/30/a-question-on-mdx-performance/#comments</comments>
		<pubDate>Fri, 30 Oct 2009 05:11:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[MOLAP]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1186</guid>
		<description><![CDATA[An enterprise user wrote in with a question that boils down to: What are reasonable MDX performance expectations? MDX doesn&#8217;t come up in my life very much, and I don&#8217;t have much intuition about it. E.g., I don&#8217;t know whether one can slap an MDX-to-SQL converter on top of a fast analytic RDBMS and go [...]]]></description>
			<content:encoded><![CDATA[<p>An enterprise user wrote in with a question that boils down to:</p>
<p><strong>What are reasonable MDX performance expectations?</strong></p>
<p>MDX doesn&#8217;t come up in my life very much, and I don&#8217;t have much intuition about it. E.g., I don&#8217;t know whether one can slap an MDX-to-SQL converter on top of a fast analytic RDBMS and go to town. What&#8217;s more, I&#8217;m heading off on vacation and don&#8217;t feel like researching the matter myself in the immediate future. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>So here&#8217;s the long form of the question. Any thoughts?</p>
<p style="padding-left: 30px;">I have a general question on assessing  the performance of an OLAP technology using a set of MDX queries. I would be  interested to know if there are any benchmark MDX performance tests/results  comparing different OLAP technologies (which may be based on different  underlying DBMS&#8217;s if appropriate) on similar hardware setup, or even comparisons  of complete appliance solutions. More generally, I want to determine what  performance limits I could reasonably expect on what I think are fairly standard servers.</p>
<p style="padding-left: 30px;">In my own work, I have set up a star  schema model centered on a Fact table of 100 million rows (approx 60 columns), with dimensions ranging in cardinality from 5 to 10,000. In ad hoc analytics, is  it expected that any query against such a dataset should return a result within  a minute or two (i.e. before a user gets impatient), regardless of whether that query returns 100 cells or 50,000 cells (without relying on any aggregate table  or caching mechanism)? Or is that level of performance only expected with a high  end massively parallel software/hardware solution? The server specs I&#8217;m testing  with are: 32-bit 4 core, 4GB RAM, 7.2k RPM SATA drive, running Windows Server 2003; 64-bit 8 core, 32GB RAM, 3 Gb/s  SAS drive, running Windows Server 2003 (x64).</p>
<p style="padding-left: 30px;">I realise that caching of query results  and pre-aggregation mechanisms can significantly improve performance, but I&#8217;m  coming from the viewpoint that in purely exploratory analytics, it is not  possible to have all combinations of dimensions calculated in advance, in  addition to being maintained.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/30/a-question-on-mdx-performance/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Clearing some of my buffer</title>
		<link>http://www.dbms2.com/2009/04/22/clearing-some-of-my-buffer/</link>
		<comments>http://www.dbms2.com/2009/04/22/clearing-some-of-my-buffer/#comments</comments>
		<pubDate>Wed, 22 Apr 2009 17:21:46 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Expressor]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Tableau Software]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=763</guid>
		<description><![CDATA[I have a large number of posts still in backlog.  For starters, there are ones based on recent visits with Aster, Greenplum, Sybase, Vertica, and a Very Large User.  I suspect I&#8217;ll write more soon on Oracle as well.  Plus there&#8217;s my whole future-of-online-media area.  And quite a bit more will grow out of planned [...]]]></description>
			<content:encoded><![CDATA[<p>I have a large number of posts still in backlog.  For starters, there are ones based on recent visits with Aster, Greenplum, Sybase, Vertica, and a Very Large User.  I suspect I&#8217;ll write more soon on Oracle as well.  Plus there&#8217;s my whole future-of-online-media area.  And quite a bit more will grow out of planned research.</p>
<p>So there are a whole lot of other worthy subjects I doubt I&#8217;ll be getting to any time soon.  In some cases, of course, other people are doing great jobs of writing about same. Here are pointers to a few links that I am glad to recommend:</p>
<ul>
<li>I wrote <a href="http://www.dbms2.com/2009/04/01/business-intelligence-notes-and-trends/">recently</a> that I&#8217;ve discovered a number of different in-memory OLAP engines. Cindi Howson far outdid that, writing at length for <em>Intelligent Enterprise</em> on <a href="http://www.intelligententerprise.com/channels/business_intelligence/showArticle.jhtml?articleID=216900096&amp;pgno=3">in-memory analytics</a>, in an article that seems to itself be a teaser for a longer, free white paper on the subject.</li>
<li>CouchDB posted <a href="http://www.slideshare.net/mattetti/couchdb-perform-like-a-pr0n-star">an eye-catching, risque slide presentation</a> promoting CouchDB and, more generally, key-value stores, at least for internet applications.  And yes, they&#8217;ve integrated MapReduce.</li>
<li>Merv Adrian <a href="http://mervadrian.wordpress.com/2009/04/22/birst-hopes-to-ride-on-demand-bi-wave/">posted favorably about Birst</a>, with special reference to its OEM efforts.  As <a href="http://www.dbms2.com/2009/04/01/business-intelligence-notes-and-trends/">previously noted</a>, I was highly unimpressed with Birst&#8217;s end-user BI story at the time of its September roll-out, and Jerome Pineau&#8217;s recent examination did nothing to reassure me.  But perhaps OEM is a different matter.</li>
<li>Merv also offers an interesting post about data integration upstart <a href="http://mervadrian.wordpress.com/2009/04/18/expressor-software-hits-the-complex-data-integration-market-running/">Expressor</a>, and a highly favorable one about &#8220;visualization&#8221; vendor <a href="http://mervadrian.wordpress.com/2009/04/19/tableau-software-visibly-catching-on-and-catching-up/">Tableau</a>.</li>
<li>Ann All interviewed <a href="http://www.itbusinessedge.com/cm/community/features/interviews/blog/bi-vendors-tell-users-what-they-want-but-are-users-listening/?cs=31761">Nigel Pendse</a>, who grumped that BI features are overrated, and what end users really want is great query performance. I&#8217;m not so sure about the features side of that, but I&#8217;m hugely in agreement about the performance. That&#8217;s a big part of why the analytic DBMS industry is so vibrant. It&#8217;s also why in-memory OLAP is suddenly so hot.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/04/22/clearing-some-of-my-buffer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Aleri update</title>
		<link>http://www.dbms2.com/2009/03/25/aleri-update/</link>
		<comments>http://www.dbms2.com/2009/03/25/aleri-update/#comments</comments>
		<pubDate>Thu, 26 Mar 2009 03:03:28 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aleri and Coral8]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=732</guid>
		<description><![CDATA[My skeptical remarks on the Aleri/Coral8 merger generated some pushback. Today I actually got around to talking with John Morell, who was marketing chief at Coral8 and has remained with the combined company. First, some quick metrics: The combined Aleri has around 100 employees, 60-40 from Aleri vs. Coral8. The combined Aleri has around 80 [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.dbms2.com/2009/03/09/independent-cep-vendors-continue-to-flounder/">My skeptical remarks on the Aleri/Coral8 merger</a> generated some <a href="http://www.dbms2.com/2009/03/20/the-cep-guys-are-getting-a-bit-chippy/">pushback</a>. Today I actually got around to talking with John Morell, who was marketing chief at Coral8 and has remained with the combined company. First, some quick metrics:</p>
<ul>
<li>The combined Aleri has around 100 employees, 60-40 from Aleri vs. Coral8.</li>
<li>The combined Aleri has around 80 customers. All of Aleri&#8217;s, with one sort-of exception at <a href="http://www.aleri.com/news/press-releases/ecommerce-portal-provider-installs-aleri-cep-engine-develop-real-time-targeted-i">Banks.com</a>, were in financial services. A large minority of Coral8&#8217;s were in financial services too.</li>
<li>However, half of Aleri&#8217;s marketing spend going forward is budgeted outside the financial services markets. Not unreasonably, John presents this as a proof point Aleri is serious about selling to other markets.</li>
<li>Aleri had 12-14 people in the UK pre-merger. Coral8 had none in Europe.</li>
<li>Coral8 had 15 OEMs pre-merger, some actually generating revenue. Aleri had substantially none.</li>
<li>Coral8 had been closing a &#8220;couple&#8221; of customers/quarter in online commerce. But recently, that rate ramped up to a &#8220;few.&#8221;</li>
<li>Aleri&#8217;s engine is used to handle &#8220;many&#8221; hundreds of thousands of messages per second. Coral8&#8217;s highest-throughput user processes 100,000-150,000 messages/second.</li>
</ul>
<p style="margin-bottom: 0in;">John is sticking by the company line that there will be an integrated Aleri/Coral8 engine in around 12 months, with all the performance optimization of Aleri and flexibility of Coral8, that compiles and runs code from any of the development tools either Aleri or Coral8 now has. While this is a lot faster than, say, the Informix/Illustra or Oracle/IRI Express integrations, John insists that integrating CEP engines is a lot easier. We&#8217;ll see.</p>
<p style="margin-bottom: 0in;">I focused most of the conversation on Aleri&#8217;s forthcoming efforts outside the financial services market.  John sees these as being focused around <a href="http://www.dbms2.com/2008/10/20/coral8-proposes-cep-as-a-bi-data-platform/">Coral8&#8217;s old &#8220;Continuous (Business) Intelligence&#8221; message</a>, enhanced by Aleri&#8217;s Live OLAP.  Aleri Live OLAP is an in-memory OLAP engine, real-time/event-driven, fed by CEP. Queries can be submitted via ODBO/MDX today.  XMLA is coming.  John reports that quite a few Coral8 customers are interested in Live OLAP, and positions the capability as one Coral8 would have had to develop had the company remained independent.<span id="more-732"></span></p>
<p style="margin-bottom: 0in;"><em>I&#8217;m a bit confused about how new or mature Aleri Live OLAP is. Although <a href="http://www.aleri.com/news/press-releases/aleri-live-olap-50-delivers-first-true-real-time-multi-dimensional-analysis-dyna">a &#8220;5.0&#8221; version was announced last May</a>, John seemed to describe Live OLAP as a new technology just coming to market.</em></p>
<p style="margin-bottom: 0in;">Generally, the applications that kept coming up were anti-fraud and ad/interaction-targeting, across multiple markets (including online gaming). An energy management deal to be announced soon seems to be an exception.  I&#8217;m a little unclear as to how much is dashboards and how much is integrated operational BI.  John views a natural progression as being dashboard-to-hard-coded-operational-BI. But I&#8217;m not yet clear as to whether the initial dashboards provide much business value, or whether they are more just tools to get executive buy-in for the real opportunities.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/03/25/aleri-update/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Analytics&#8217; role in a frightening economy</title>
		<link>http://www.dbms2.com/2009/02/07/analytics-role-in-a-frightening-economy/</link>
		<comments>http://www.dbms2.com/2009/02/07/analytics-role-in-a-frightening-economy/#comments</comments>
		<pubDate>Sat, 07 Feb 2009 11:55:15 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cognos]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[MOLAP]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=683</guid>
		<description><![CDATA[I chatted yesterday with the general business side (as opposed to the trading operation) of a household-name brokerage firm, one that&#8217;s in no immediate financial peril. It seems their #1 analytic-technology priority right now is changing planning from an annual to a monthly cycle.* That&#8217;s a smart idea. While it&#8217;s especially important in their business, [...]]]></description>
			<content:encoded><![CDATA[<p>I chatted yesterday with the general business side (as opposed to the trading operation) of a household-name brokerage firm, one that&#8217;s in no immediate financial peril. It seems their #1 analytic-technology priority right now is changing planning from an annual to a monthly cycle.* That&#8217;s a smart idea.  While it&#8217;s especially important in their business, larger enterprises of all kinds should consider following suit.<span id="more-683"></span></p>
<p><em>*By the way, they seem to want to use Applix technology, now owned by IBM/Cognos, to do it, more for the planning tools than for the cool in-memory OLAP engine itself. Your mileage may vary.</em></p>
<p>If you don&#8217;t go for fancy slice/dice tools, then do something else to make sure your drilldown and exploration are up to snuff. Just about every enterprise is going to be seeing some distressing numbers right now. But that doesn&#8217;t mean every part of every line of business is in equal trouble. Teasing that apart is important at times like this, or employment and investment could get strangled due to overreaction.</p>
<p><em>I have a small enough business that I can keep things like this in my head, without bothering to run precise three-significant-figures calculations. But I&#8217;ll tell you this &#8212; my revenue in analytics is sure healthier than my business in OLTP or custom publishing.</em></p>
<p>And don&#8217;t necessarily stop there. We&#8217;ve heard the horror stories of investment models failing in this unusual economy. Well, what about your own predictive analytic models? If you have formal models of buyer behavior, what makes you think the future will be like the past? Times have changed.  No matter what your usual schedule is, you should be revalidating and perhaps strengthening your more important models <strong>now. </strong><span> If that overtaxes your infrastructure, and you can&#8217;t afford the capital investment to do something about that &#8212; well, <a href="http://www.dbms2.com/category/pricing/">appliances are pretty cheap</a>, and some <a href="http://www.dbms2.com/2008/07/01/jerry-held-cloud-data-warehousing-business-intelligence/">SaaS data warehousing</a> offerings are (at least in the short term) even cheaper.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/02/07/analytics-role-in-a-frightening-economy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

