<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Teradata</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/teradata/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 12:22:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Terminology: Data mustering</title>
		<link>http://www.dbms2.com/2011/11/28/terminology-data-mustering/</link>
		<comments>http://www.dbms2.com/2011/11/28/terminology-data-mustering/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 19:10:11 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5736</guid>
		<description><![CDATA[I find myself in need of a word or phrase that means bring data together from various sources so that it&#8217;s ready to be used, where the use can be analysis or operations. The first words I thought of were &#8220;aggregation&#8221; and &#8220;collection,&#8221; but they both have other meanings in IT. Even &#8220;data marshalling&#8221; has [...]]]></description>
			<content:encoded><![CDATA[<p>I find myself in need of a word or phrase that means <strong>bring data together from various sources so that it&#8217;s ready to be used,</strong> where the use can be analysis or operations. The first words I thought of were &#8220;aggregation&#8221; and &#8220;collection,&#8221; but they both have other meanings in IT. Even &#8220;data marshalling&#8221; has a specific meaning different from what I want. So instead, I&#8217;ll go with <strong>data mustering.</strong></p>
<p>I mean for the term &#8220;data mustering&#8221; to encompass at least three scenarios:</p>
<ul>
<li>Integrated (relational) data warehouse.</li>
<li>Big bit bucket.</li>
<li>Big bit stream.</li>
</ul>
<p>Let me explain what I mean by each.  <span id="more-5736"></span></p>
<p><strong>&#8220;Integrated data warehouse&#8221;</strong> is a phrase Teradata has started using for enterprise data warehouses that, <a href="../../../../../2010/04/12/enterprise-data-warehouse-edw-myt/">like approximately every other EDW in the entire history of data warehousing</a>, aren&#8217;t truly enterprise-wide. In other words, it means &#8220;not just a data mart&#8221;. <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">No category name is perfect</a>, but I think that one works reasonably well.</p>
<p>I previously described the <strong><a href="../../../../../2011/06/04/dirty-data-stored-dirt-cheap/">big bit bucket</a></strong> use case as</p>
<blockquote><p>Users take a whole lot of data, often <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a> in logs of different kinds, and dump it into one place, managed by Hadoop, at open-source pricing.</p></blockquote>
<p>and quickly added</p>
<blockquote><p>Of course, there are various outfits who’d like to sell you not-so-cheap bit buckets. Contending technologies include <a href="../../../../../2011/06/02/why-you-would-want-an-appliance-and-when-you-wouldnt/">Hadoop appliances</a> (which I don’t believe in), <a href="../../../../../2009/10/18/technical-introduction-to-splunk/">Splunk</a> (which in many use cases I do), and <a href="../../../../../2010/11/29/marklogic-and-its-document-dbms/">MarkLogic</a> (ditto, but often the cases are different from Splunk’s). Cloudera and IBM, among other vendors, would also like to sell you some proprietary software to go with your standard Apache Hadoop code.</p></blockquote>
<p>I think I&#8217;ll stand pat on that explanation. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>By analogy, a <strong>big bit stream</strong> is various streams of data, assembled in the custody of a streaming engine. Sybase told me Wednesday that this scenario appears in both of the traditional markets for CEP/streaming: national intelligence, where it is a major use of streaming, and capital markets, in some use cases. That&#8217;s consistent with what I&#8217;ve heard from other CEP/streaming vendors as well.</p>
<p>As for where I got the word &#8220;mustering&#8221; &#8212; it&#8217;s a military term, for when you assemble your troops and their gear either for inspection or for actual use. The main modern usage I know of the word is as part of the phrase &#8220;pass muster&#8221;, which originally referred to the concept that the person being paid to put a regiment together should from time to time demonstrate that the regiment physically existed in the form that regimental records seemed to show.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/28/terminology-data-mustering/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Teradata Unity and the idea of active-active data warehouse replication</title>
		<link>http://www.dbms2.com/2011/10/03/teradata-unity-active-replication/</link>
		<comments>http://www.dbms2.com/2011/10/03/teradata-unity-active-replication/#comments</comments>
		<pubDate>Mon, 03 Oct 2011 09:07:02 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5398</guid>
		<description><![CDATA[Teradata is having its annual conference, Teradata Partners, at the same time as Oracle OpenWorld this week. That made it an easy decision for Teradata to preannounce its big news, Teradata Columnar and the rest of Teradata 14. But of course it held some stuff back, notably Teradata Unity, which is the name chosen for [...]]]></description>
			<content:encoded><![CDATA[<p>Teradata is having its annual conference, Teradata Partners, at the same time as Oracle OpenWorld this week. That made it an easy decision for Teradata to preannounce its big news, <a href="../../../../../2011/09/22/teradata-columnar-compression/">Teradata Columnar</a> and the rest of <a href="../../../../../2011/09/25/workload-management-and-ram/">Teradata 14</a>. But of course it held some stuff back, notably Teradata Unity, which is the name chosen for replication technology based on <a href="../../../../../2010/07/31/teradata-xkoto-gridscale-rip-and-active-active-clustering/">Teradata&#8217;s Xkoto acquisition</a>.</p>
<p>The core mission of Teradata Unity is asynchronous, near-real-time replication across Teradata systems. The point of &#8220;asynchronous&#8221; is performance. The point of &#8220;near-real-time&#8221; is that Teradata Unity can be used for high availability and disaster recovery, and further can be used to allow real work on HA and DR database copies. Teradata Unity works request-at-a-time, which limits performance somewhat.* Unity also has a lock manager that makes sure updates are applied in the same order on all copies, in cases where locks are needed at all.</p>
<p><span id="more-5398"></span><em>*Other options, more suitable for bulk loading and so on, are on the Teradata Unity roadmap.</em></p>
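<p>To make the ordering idea concrete, here is a minimal Python sketch of sequenced, asynchronous replication. A sequencer stands in for Unity&#8217;s lock manager by numbering each write request, and each copy applies requests one at a time in that order, buffering any that arrive early. All class and method names are hypothetical; this illustrates the general technique, not Teradata&#8217;s implementation, and it does not model the cases where no locks are needed.</p>

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    seq: int                          # global order assigned by the sequencer
    sql: str = field(compare=False)   # the request text; not used for ordering


class LockManager:
    """Hypothetical sequencer: hands out a monotonically increasing number
    per request so every copy can apply writes in the same order."""

    def __init__(self):
        self.next_seq = 0

    def sequence(self, sql):
        req = Request(self.next_seq, sql)
        self.next_seq += 1
        return req


class Replica:
    """Applies requests one at a time, strictly in sequence order, buffering
    any that arrive early (asynchronous delivery may reorder them)."""

    def __init__(self, name):
        self.name = name
        self.expected = 0     # sequence number this copy must apply next
        self.pending = []     # min-heap of requests that arrived early
        self.applied = []     # log of statements applied, in order

    def receive(self, req):
        heapq.heappush(self.pending, req)
        # Drain every buffered request whose turn has come.
        while self.pending and self.pending[0].seq == self.expected:
            ready = heapq.heappop(self.pending)
            self.applied.append(ready.sql)
            self.expected += 1


if __name__ == "__main__":
    lm = LockManager()
    reqs = [lm.sequence(s) for s in ["INSERT a", "UPDATE a", "DELETE a"]]
    primary, dr_copy = Replica("primary"), Replica("dr_copy")
    for r in reqs:
        primary.receive(r)
    for r in reversed(reqs):      # simulate out-of-order delivery
        dr_copy.receive(r)
    print(primary.applied, dr_copy.applied)
```

<p>Because every copy drains requests strictly by sequence number, two copies that receive the same writes in different orders still end up with identical logs.</p>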
<p>The idea of doing real work on your high availability or disaster recovery database copies is an important one. Teradata systems are often used for the kinds of mission-critical purposes that call for such extra 2- or 3-way mirroring; so<strong> the ability to use all the systems for real work offers, if not exactly 2-3X price/performance savings, at least something significant. </strong>Teradata reports low but non-zero penetration in its customer base for active-active replication today. But I&#8217;m hopeful that number will increase, as Teradata Unity looks to be a big improvement over the possibilities that existed before.</p>
<p>In theory, the whole workload could be split among mirror-copy systems, although I&#8217;m sure we could construct various edge-case scenarios in which doing so would be a Bad Idea. In practice, I&#8217;d normally think of using second/third copies of a data warehouse for specific workloads, such as:</p>
<ul>
<li>Long-running queries or other analytic exercises.</li>
<li>Virtual data marts.</li>
<li>Backups, exports, and so on.</li>
</ul>
<p>Another possibility to consider is mirroring only part of your database for HA or DR, since not all missions are equally critical. Yet another possibility is to mirror the whole thing, but on systems with different performance characteristics; in case of failover, you might only keep the most crucial applications up, while turning the others off until you can again run on a system powerful enough to handle them.</p>
<p>As Teradata tells it, Teradata Unity has two key aspects:</p>
<ul>
<li><strong>Multi-System Synchronization.</strong> DDL, DML and DCL (Data Definition/Manipulation/Control Language) are all replicated. I.e., data gets copied around, and so does everything else.</li>
<li><strong>Query Management. </strong>Queries get shipped around to appropriate systems based on:
<ul>
<li>Which systems manage all the data needed to execute the query.</li>
<li>Which systems are up and running at the moment.</li>
<li>Which systems are backlogged at the moment. (I get the impression Teradata Unity load balancing is fairly basic in the first release, but there is some.)</li>
</ul>
</li>
</ul>
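<p>The three routing criteria can be sketched as a simple filter-then-pick function. This is a hypothetical Python illustration, not Teradata Unity&#8217;s actual algorithm: the <code>System</code> fields and the use of a queue-length &#8220;backlog&#8221; number as the load signal are assumptions made for the example.</p>

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    tables: frozenset   # tables replicated to this system (assumed field)
    up: bool            # is the system currently available?
    backlog: int        # queued requests, a crude load signal (assumed)


def route(query_tables, systems):
    """Pick a system for a query: it must hold every table the query
    touches, it must be up, and among the survivors the least backlogged
    one wins. Returns None if no system qualifies."""
    needed = frozenset(query_tables)
    candidates = [s for s in systems if s.up and needed.issubset(s.tables)]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.backlog)


if __name__ == "__main__":
    fleet = [
        System("td1", frozenset({"sales", "customers"}), True, 5),
        System("td2", frozenset({"sales", "customers", "clicks"}), True, 2),
        System("td3", frozenset({"sales", "customers", "clicks"}), False, 0),
    ]
    print(route(["sales", "clicks"], fleet).name)
```

<p>Note that the down system <code>td3</code> is never chosen even though it has the shortest queue; availability and data placement act as hard filters, and backlog only breaks ties among eligible systems.</p>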
<p>Further details may be seen in the <a href="http://www.monash.com/uploads/Teradata-Unity.pdf">slide deck</a> Teradata graciously sent over for posting.</p>
<p>And finally, here&#8217;s some Teradata product name housekeeping:</p>
<ul>
<li>The first release of Teradata Unity will be numbered 13.10. Not coincidentally, that&#8217;s the version number of Teradata&#8217;s latest database software. Teradata Unity 13.10 will support Teradata 13 and 13.10.</li>
<li>Teradata Unity 14 will ship soon after Teradata 14. It will support Teradata 13, 13.10, and 14.</li>
<li>Teradata Unity runs on a &#8220;Managed Server&#8221;. &#8220;Managed Servers&#8221; are nodes inside Teradata&#8217;s cabinets, managed by Teradata&#8217;s system software and so on, but which do not run Teradata database software.</li>
<li>In the 14.10 release, Teradata Unity:
<ul>
<li>Will scale out across multiple Managed Servers.</li>
<li>Will be able to serve as a general load facility for Teradata.</li>
</ul>
</li>
<li>Teradata Unity replaces Teradata Query Direct.</li>
<li>While I suspect Teradata Unity may replace Teradata Data Mover (bulk data copy) in the future, it surely doesn&#8217;t yet.</li>
<li>Teradata Unity works with Teradata Multi-System Manager, which does things like end-to-end job management.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/10/03/teradata-unity-active-replication/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Highlights of a busy news week</title>
		<link>http://www.dbms2.com/2011/09/26/highlights-of-a-busy-news-week/</link>
		<comments>http://www.dbms2.com/2011/09/26/highlights-of-a-busy-news-week/#comments</comments>
		<pubDate>Mon, 26 Sep 2011 05:50:35 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5372</guid>
		<description><![CDATA[I put up 14 posts over the past week, so perhaps you haven&#8217;t had a chance yet to read them all. Highlights included: My most important post of the week was a general guide to IT vendor strategy. That one has already spawned discussion at many companies, from the tiny to the multi-billion-dollar. The best [...]]]></description>
			<content:encoded><![CDATA[<p>I put up 14 posts over the past week, so perhaps you haven&#8217;t had a chance yet to read them all. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Highlights included:</p>
<ul>
<li>My most important post of the week was a general <a href="http://www.strategicmessaging.com/strategy-for-it-vendors-a-worksheet/2011/09/18/">guide to IT vendor strategy</a>. That one has already spawned discussion at many companies, from the tiny to the multi-billion-dollar.</li>
<li>The best comment thread of the week was probably on my post about <a href="http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/">scale-out relational OLTP choices</a>, in which people discussed the merits of various particular alternatives.</li>
<li>I recommended that people strongly consider attending <a href="http://www.dbms2.com/2011/09/20/xldb-the-one-conference-i-like-to-go-to/">XLDB 5 in Menlo Park on October 18-19</a>.</li>
</ul>
<p>Most of the posts, however, were reactions to news events. In particular:</p>
<ul>
<li>Teradata announced that <a href="http://www.dbms2.com/2011/09/22/teradata-columnar-compression/">Teradata 14 will be hybrid-columnar</a>, more in Vertica&#8217;s way than in Greenplum&#8217;s or Aster Data&#8217;s. (Pay no attention to the <em>Wall Street Journal&#8217;s</em> apparent belief that <a href="http://www.dbms2.com/2011/09/22/hybrid-columnar-soundbites/">no other analytic DBMS is hybrid-columnar at all</a>.)</li>
<li>Aster announced the unsurprising news that there will be a Teradata Aster appliance. Also, <a href="http://www.dbms2.com/2011/09/22/aster-database-release-5-and-teradata-aster-appliance/">Aster talked about greater analytic flexibility in the forthcoming Aster 5.0</a>.</li>
<li>With Oracle OpenWorld coming up, Oracle decided to get some of its announcing out of the way early. In particular, it announced the <a href="http://www.dbms2.com/2011/09/21/oracle-database-appliance-soundbites/">Oracle Database Appliance</a>, which is small-business-friendly hardware for running the Oracle DBMS. However, the Oracle Database Appliance doesn&#8217;t seem to do much about the complexity of running the Oracle DBMS software.</li>
<li>In <a href="http://www.dbms2.com/2011/09/23/hadoop-appliances/">a catch-all Hadoop post</a>, I noted that:
<ul>
<li>Oracle has now clearly said it has a Hadoop appliance coming, no doubt next week at OpenWorld.</li>
<li>I still can&#8217;t see why Hadoop appliances would succeed, but a lot of smart folks seem to disagree with me.</li>
<li>Greenplum announced what looks like a nice but unimportant little product upgrade.</li>
<li>It&#8217;s a really good thing that previously reported plans to revamp Hadoop are underway.</li>
</ul>
</li>
<li>DataStax announced that <a href="http://www.dbms2.com/2011/09/22/datastax-pivots-back-to-its-original-strategy/">it really is a Cassandra company after all</a>. Pay no attention to previous marketing that seemed to put DataStax in the same Hadoop-alternative category as, say, MapR.</li>
<li><a href="http://www.dbms2.com/2011/09/25/ingres-actian/">Ingres has changed its name to Actian</a>. The announcement seems like a confession that Ingres and VectorWise are going nowhere.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/26/highlights-of-a-busy-news-week/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Workload management and RAM</title>
		<link>http://www.dbms2.com/2011/09/25/workload-management-and-ram/</link>
		<comments>http://www.dbms2.com/2011/09/25/workload-management-and-ram/#comments</comments>
		<pubDate>Sun, 25 Sep 2011 05:04:35 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5354</guid>
		<description><![CDATA[Closing out my recent round of Teradata-related posts, here&#8217;s a little anomaly: Teradata is proud that Teradata 14&#8217;s workload management now explicitly manages I/O, to go with Teradata&#8217;s long-standing management of CPU. Teradata&#8217;s WLM still does not explicitly manage RAM. Aster is proud that Aster 5&#8217;s workload management now explicitly manages RAM, to go along [...]]]></description>
			<content:encoded><![CDATA[<p>Closing out my recent round of Teradata-related posts, here&#8217;s a little anomaly:</p>
<ul>
<li>Teradata is proud that <a href="../../../../../2011/09/22/teradata-columnar-compression/">Teradata 14&#8217;s</a> workload management now explicitly manages I/O, to go with Teradata&#8217;s long-standing management of CPU. Teradata&#8217;s WLM still does not explicitly manage RAM.</li>
<li>Aster is proud that <a href="../../../../../2011/09/22/aster-database-release-5-and-teradata-aster-appliance/">Aster 5&#8217;s workload management now explicitly manages RAM</a>, to go along with <a href="../../../../../2009/10/30/aster-data-application-server-ncluster/">the WLM capabilities Aster has had for a while managing CPU and I/O</a>. Aster&#8217;s Tasso Argyros believes this is an important capability, at least in some edge cases.</li>
<li>Mike Pilcher of SAND emailed me that SAND&#8217;s WLM capabilities to explicitly manage CPU, I/O, and RAM are very well-received by the marketplace.</li>
</ul>
<p><span id="more-5354"></span>One would think that Teradata&#8217;s workload management is more sophisticated and powerful than Aster Data&#8217;s.* So I asked Scott Gnau what gives (he was pretty much the ideal guy to comment, since he runs development for Teradata and oversees Teradata&#8217;s Aster acquisition as well).</p>
<p><em>*Except, of course, that Aster was a pioneer in having workload management cover all kinds of analytic processes, rather than just traditional database requests.</em></p>
<p>Scott&#8217;s main response was that Aster&#8217;s system was much more consumptive of RAM than Teradata&#8217;s; indeed, he reminded me that in the very old days, Teradata could make do with as little as 4 megabytes. Scott also did not argue when I suggested that Aster&#8217;s not-just-database analytic processes might require large amounts of RAM as well.</p>
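<p>Why explicit RAM management matters can be shown with a toy admission controller. The hypothetical Python sketch below admits a request only if it fits within every resource budget the manager actually enforces; one manager checks CPU, I/O and RAM, the other only CPU and I/O. The resource units and the idea of a per-request demand estimate are assumptions for illustration, not how Teradata, Aster or SAND implement workload management.</p>

```python
from dataclasses import dataclass


@dataclass
class Demand:
    cpu: float   # e.g. cores (assumed unit)
    io: float    # e.g. MB/s of disk bandwidth (assumed unit)
    ram: float   # e.g. GB of working memory (assumed unit)


RESOURCES = ("cpu", "io", "ram")


class WorkloadManager:
    """Hypothetical admission controller. `managed` lists the resources it
    enforces; any resource not listed is simply never checked."""

    def __init__(self, capacity, managed=RESOURCES):
        self.capacity = capacity
        self.managed = managed
        self.in_use = Demand(0.0, 0.0, 0.0)

    def try_admit(self, d):
        for r in self.managed:
            if getattr(self.in_use, r) + getattr(d, r) > getattr(self.capacity, r):
                return False   # would exceed a managed budget; caller queues it
        # Admitted: usage grows on every resource, managed or not.
        for r in RESOURCES:
            setattr(self.in_use, r, getattr(self.in_use, r) + getattr(d, r))
        return True


if __name__ == "__main__":
    cap = Demand(cpu=16, io=1000, ram=64)
    hungry = Demand(cpu=2, io=100, ram=48)   # RAM-heavy analytic job
    full = WorkloadManager(cap)
    cpu_io_only = WorkloadManager(cap, managed=("cpu", "io"))
    print([full.try_admit(hungry) for _ in range(2)])
    print([cpu_io_only.try_admit(hungry) for _ in range(2)], cpu_io_only.in_use.ram)
```

<p>With a light CPU and I/O footprint, the second RAM-hungry job slips past the CPU/I/O-only manager and overcommits memory (96 against a 64 budget), while the manager that also tracks RAM queues it.</p>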
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/25/workload-management-and-ram/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Confusion about Teradata&#8217;s big customers</title>
		<link>http://www.dbms2.com/2011/09/24/confusion-about-teradatas-big-customers/</link>
		<comments>http://www.dbms2.com/2011/09/24/confusion-about-teradatas-big-customers/#comments</comments>
		<pubDate>Sun, 25 Sep 2011 03:50:02 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5349</guid>
		<description><![CDATA[Evidently further attempts to get information on this subject would be fruitless, but anyhow: Teradata emailed me a couple of months ago saying something like that at that point they could count 16 petabyte-level customers. In response to my repeated requests for clarification, Teradata has explicitly refused to identify the metric used in reaching that [...]]]></description>
			<content:encoded><![CDATA[<p>Evidently further attempts to get information on this subject would be fruitless, but anyhow:</p>
<ul>
<li>Teradata emailed me a couple of months ago saying, roughly, that at that point they could count 16 petabyte-level customers. In response to my repeated requests for clarification, Teradata has explicitly refused to identify the metric used in reaching that conclusion.</li>
<li>At some point Teradata did something to convince Neil Raden, as per a tweet of his, that they have 20 petabyte-class users.</li>
<li>That tweet was made around the time that Teradata apparently showed a slide naming big users at the Strata conference (last week).</li>
<li>If Teradata is counting <a href="http://www.dbms2.com/2008/10/15/teradatas-petabyte-power-players/">the way they did three years ago</a>, that count of 16 or 20 or whatever is probably inflated compared to, say, <a href="http://www.dbms2.com/2011/06/20/columnar-dbms-vendor-customer-metrics/">Vertica&#8217;s figure of 7</a> a few months back.</li>
<li>Even so, it&#8217;s obvious &#8212; and not just from the <a href="http://www.dbms2.com/2010/10/06/ebay-followup-greenplum-out-teradata-10-petabytes-hadoop-has-some-value-and-more/">eBay</a> example &#8212; that Teradata has one of the most scalable analytic DBMS offerings around.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/24/confusion-about-teradatas-big-customers/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Hybrid-columnar soundbites</title>
		<link>http://www.dbms2.com/2011/09/22/hybrid-columnar-soundbites/</link>
		<comments>http://www.dbms2.com/2011/09/22/hybrid-columnar-soundbites/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 18:06:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5326</guid>
		<description><![CDATA[Busy couple of days talking with reporters. A few notes on hybrid-columnar analytic DBMS, all backed up by yesterday&#8217;s post on Teradata columnar: Oracle does not actually offer columnar I/O; the other three systems do. But see the &#8220;I won&#8217;t be surprised&#8221; part in yesterday&#8217;s Teradata post. Aster does not offer columnar compression; the other [...]]]></description>
			<content:encoded><![CDATA[<p>Busy couple of days talking with reporters. A few notes on hybrid-columnar analytic DBMS, all backed up by <a href="http://www.dbms2.com/2011/09/22/teradata-columnar-compression/">yesterday&#8217;s post on Teradata columnar</a>:</p>
<ul>
<li>Oracle does not actually offer columnar I/O; the other three systems do. But see the &#8220;I won&#8217;t be surprised&#8221; part in yesterday&#8217;s Teradata post.</li>
<li>Aster does not offer columnar compression; the other three do.</li>
<li>EMC Greenplum and Teradata offer different ways to mix column and row storage in the same table; each has its advantages.</li>
<li>Teradata generally has a more mature and capable offering than EMC Greenplum, for most purposes, whichever way you choose to organize your tables.</li>
</ul>
<p><em>Edit: The <a href="http://online.wsj.com/article/BT-CO-20110921-715547.html">Wall Street Journal</a> got this wrong, writing that Teradata was the first-ever hybrid columnar system. Specifically, they wrote</em></p>
<blockquote><p><em>While columnar technology has been around for years, Teradata says its product is unique because it allows users to include both columns and rows in the same database.</em></p></blockquote>
<p><em>Googling on &#8220;Teradata To Unveil New Analytics Product To Speed Business Adoption&#8221; might get you around the paywall to see the offending piece.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/22/hybrid-columnar-soundbites/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Aster Database Release 5 and Teradata Aster appliance</title>
		<link>http://www.dbms2.com/2011/09/22/aster-database-release-5-and-teradata-aster-appliance/</link>
		<comments>http://www.dbms2.com/2011/09/22/aster-database-release-5-and-teradata-aster-appliance/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 05:56:45 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5304</guid>
		<description><![CDATA[It was obviously just a matter of time before there would be an Aster appliance from Teradata and some tuned bidirectional Teradata-Aster connectivity. These have now been announced. I didn&#8217;t notice anything particularly surprising in the details of either. About the biggest excitement is that Aster is traditionally a Red Hat shop, but for the [...]]]></description>
			<content:encoded><![CDATA[<p>It was obviously just a matter of time before there would be an Aster appliance from Teradata and some tuned bidirectional Teradata-Aster connectivity. These have now been announced. I didn&#8217;t notice anything particularly surprising in the details of either. About the biggest excitement is that Aster is traditionally a Red Hat shop, but for the purposes of appliance delivery has now embraced SUSE Linux.</p>
<p>Along with the announcements comes updated positioning such as:</p>
<ul>
<li>Better SQL than the MapReduce alternatives have.</li>
<li>Better MapReduce than the SQL alternatives have.</li>
<li>Easy(ier) way to do complex analytics on <a href="../../../../../2011/05/15/what-to-do-about-unstructured-data/">multi-structured data</a>. (Aster has embraced that term.)</li>
</ul>
<p>and of course</p>
<ul>
<li>Now also with Teradata&#8217;s beautifully engineered hardware and system management software!</li>
</ul>
<p><span id="more-5304"></span>As might also be expected, the announcements are accompanied by pictures along the lines of &#8220;There are your various data sources; there&#8217;s Teradata; there&#8217;s Aster; there&#8217;s Hadoop; look at all the nice arrows connecting them!&#8221;</p>
<p>Teradata Aster further decided it was time for a 5.0 DBMS release. Highlights include:</p>
<ul>
<li>Aster&#8217;s SQL-MapReduce has more flexible inputs. Specifically, if you view SQL-MapReduce functions as steroid-enhanced table functions, those functions can now each take multiple tables as input. Aster is rightly positioning this as the key feature of the Aster 5.0 release.</li>
<li>Workload management now explicitly manages not only CPU and I/O, but also RAM. That surely makes it safer to use algorithms which aggressively create temporary data structures. And the allocation is dynamic, in that it can be throttled back if workloads require.</li>
<li>There&#8217;s more SQL functionality &#8212; I think this is minor, as Aster seems to have had pretty good SQL coverage already.</li>
<li>Performance has been improved; i.e., <a href="../../../../../2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a> has progressed in multiple ways. One improvement Aster thinks is cutting-edge is a hybrid kind of join that starts as a hash join, then reverts to a merge join if it has to spill out of memory (e.g., if the available RAM is throttled back).</li>
</ul>
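<p>The hash-then-merge join idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Aster&#8217;s algorithm: the memory budget is modeled as a maximum number of build-side rows (<code>memory_rows</code>, a hypothetical parameter), and on overflow the sketch abandons the hash table entirely and does a sort-merge join, whereas production systems often spill hash partitions to disk instead.</p>

```python
def hash_or_merge_join(left, right, key, memory_rows):
    """Inner-join two lists of dict rows on `key`.

    Tries an in-memory hash join first, building on `right`. If the build
    side would exceed `memory_rows` (a stand-in for a RAM budget that a
    workload manager may throttle), it falls back to a sort-merge join.
    Returns (strategy_used, joined_rows)."""
    table = {}
    for built, row in enumerate(right, start=1):
        if built > memory_rows:
            return "merge", _sort_merge(left, right, key)
        table.setdefault(row[key], []).append(row)
    out = []
    for lrow in left:                       # probe phase
        for rrow in table.get(lrow[key], []):
            out.append({**lrow, **rrow})
    return "hash", out


def _sort_merge(left, right, key):
    """Classic sort-merge inner join; handles duplicate keys on both sides."""
    ls = sorted(left, key=lambda r: r[key])
    rs = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(ls) and j < len(rs):
        lk, rk = ls[i][key], rs[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(rs) and rs[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(ls) and ls[i][key] == lk:
                for rrow in rs[j:j_end]:
                    out.append({**ls[i], **rrow})
                i += 1
            j = j_end
    return out
```

<p>Both paths produce the same set of joined rows; only the strategy (and output order) differs, which is what lets a system switch strategies when RAM is throttled back without changing query results.</p>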
<p>Also, Aster is always expanding its library of <a href="../../../../../2010/06/27/lots-of-aster-data-analytic-packages/">prebuilt analytic functions/packages</a> &#8212; often in connection with specific customer engagements &#8212; and took this opportunity to mention numerous recent or near-future additions to the list.</p>
<p>Part of Aster&#8217;s motivation in making multiple input tables available to its parallel analytic functions seems to be to allow the use of intermediate result sets alongside raw data. In some ways, this seems to be an alternative to <a href="../../../../../2011/04/21/sas-hpa-does-make-sense-after-all/">the MPI-based approach favored by SAS</a>, and highlights limitations of the vanilla MapReduce paradigm. The specific examples given were k-means clustering and &#8212; which I&#8217;d never heard of before &#8212; SAX pattern matching.</p>
<p>For an example of two true data tables being used as inputs, Aster offered a case of advertising attribution, with the data being about impressions and also conversions. Frankly, I suspect a &#8220;join them all and let MapReduce sort them out&#8221; strategy would also work for that application; if you join on something like Customer_ID, just how big would the result set really be? Even so, we can imagine other cases in which messy boundaries for graphs or time series make that strategy unappealing, and &#8212; you read it here first! &#8212; <a href="../../../../../2011/09/08/aster-data-business-trends/">Aster&#8217;s target use cases are focused on time series and graphs</a>.</p>
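<p>For readers who want to picture a two-input function, here is a hypothetical Python sketch of the attribution case: impressions and conversions arrive as separate inputs, both are grouped by customer ID, and each conversion&#8217;s revenue is credited to that customer&#8217;s most recent prior impression. The last-touch rule and the tuple layouts are assumptions chosen for the example; the post does not say which attribution model Aster used, and this is not SQL-MapReduce syntax.</p>

```python
from collections import defaultdict


def last_touch_attribution(impressions, conversions):
    """impressions: iterable of (customer_id, ts, campaign)
    conversions: iterable of (customer_id, ts, revenue)
    Credits each conversion's revenue to the customer's most recent
    impression at or before the conversion time; conversions with no
    prior impression are ignored. Returns {campaign: revenue}."""
    # Group the first input by customer, time-ordered.
    by_customer = defaultdict(list)
    for cust, ts, campaign in impressions:
        by_customer[cust].append((ts, campaign))
    for events in by_customer.values():
        events.sort()
    # Walk the second input, looking only at the same customer's impressions.
    credit = defaultdict(float)
    for cust, ts, revenue in conversions:
        prior = [c for t, c in by_customer.get(cust, []) if t <= ts]
        if prior:
            credit[prior[-1]] += revenue
    return dict(credit)


if __name__ == "__main__":
    imps = [(1, 10, "A"), (1, 20, "B"), (2, 5, "A"), (3, 50, "C")]
    convs = [(1, 25, 100.0), (1, 15, 40.0), (2, 4, 10.0), (2, 6, 5.0)]
    print(last_touch_attribution(imps, convs))
```

<p>Grouping each input by customer before combining them is what keeps the work bounded per customer; a full cross join of impressions and conversions on Customer_ID would produce the same answer but materializes every impression/conversion pair first.</p>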
<p>And finally: Whenever I ask the Aster folks &#8220;So, how big are Aster databases that are actually in production?&#8221;, they try to convince me that this is the wrong thing to ask. But &#8212; without actually answering the question &#8212; they did say:</p>
<ul>
<li>The new Teradata Aster appliance has been tested to a couple hundred terabytes.</li>
<li>They are very confident about scaling Aster to a few hundred terabytes.</li>
<li>They don&#8217;t have much in the way of proof in the 1 petabyte range.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/22/aster-database-release-5-and-teradata-aster-appliance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Teradata Columnar and Teradata 14 compression</title>
		<link>http://www.dbms2.com/2011/09/22/teradata-columnar-compression/</link>
		<comments>http://www.dbms2.com/2011/09/22/teradata-columnar-compression/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 05:25:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Rainstor]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5296</guid>
		<description><![CDATA[Teradata is pre-announcing Teradata 14, for delivery by the end of this year, where by &#8220;Teradata 14&#8221; I mean the latest version of the DBMS that drives the classic Teradata product line. Teradata 14&#8217;s flagship feature is Teradata Columnar, a hybrid-columnar offering that follows in the footsteps of Greenplum (now part of EMC) and Aster [...]]]></description>
			<content:encoded><![CDATA[<p>Teradata is pre-announcing Teradata 14, for delivery by the end of this year, where by &#8220;Teradata 14&#8243; I mean the latest version of the DBMS that drives the classic Teradata product line. Teradata 14&#8242;s flagship feature is Teradata Columnar, a hybrid-columnar offering that follows in the footsteps of <a href="../../../../../2009/10/14/greenplum-hybrid-columnar/">Greenplum</a> (now part of EMC) and <a href="../../../../../2010/09/15/aster-data-ncluster-version-4-6/">Aster Data</a> (now part of Teradata).</p>
<p>The basic idea of Teradata Columnar is:</p>
<ul>
<li>Each table can be stored in Teradata in row format, column format, or a mix.</li>
<li>You can do almost anything with a Teradata columnar table that you can do with a row-based one.</li>
<li>If you choose column storage, you also get some new compression choices.</li>
</ul>
<p><span id="more-5296"></span>The &#8220;mix&#8221; option is like Vertica&#8217;s <a href="../../../../../2009/08/04/flexstore-and-the-rest-of-vertica-35/">FlexStore</a>, in that different columns (e.g. different components of a street address) can be grouped into a mini-row, even if you otherwise choose to store that table in a columnar way. Teradata does not at this time offer the Greenplum or Aster way of mixing rows and columns, whereby some of the rows in a table can be stored in a column-store way, while other rows are stored in entire-row row-store solidarity.</p>
<p>Thus, Teradata Columnar gives you many of the basic I/O and compression benefits of columnar DBMS, along with all the usual Teradata goodness of concurrency, workload management, system management, SQL support, and so on. By way of comparison:</p>
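<p>To make the row, column, and &#8220;mix&#8221; options concrete, here is a minimal Python sketch of how one small table might be laid out each way. The table, its column names, and the helper functions are hypothetical illustrations of the general idea, not Teradata&#8217;s actual on-disk format.</p>

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical four-column table; names and values are illustrative only.
COLUMNS = ("street", "city", "zip", "amount")
Row = Tuple[str, str, str, int]


def row_layout(rows: Sequence[Row]) -> List[Row]:
    # Row format: each record is stored whole, one after another.
    return list(rows)


def column_layout(rows: Sequence[Row]) -> Dict[str, list]:
    # Column format: all values of one column are stored together.
    return {name: [r[i] for r in rows] for i, name in enumerate(COLUMNS)}


def mixed_layout(
    rows: Sequence[Row], groups: Sequence[Tuple[str, ...]]
) -> Dict[Tuple[str, ...], list]:
    # Mixed format: columns in the same group are stored together as "mini-rows",
    # while each group is stored separately from the others.
    out: Dict[Tuple[str, ...], list] = {}
    for group in groups:
        idx = [COLUMNS.index(c) for c in group]
        out[group] = [tuple(r[i] for i in idx) for r in rows]
    return out


rows = [("1 Main St", "Boston", "02101", 10), ("2 Elm St", "Salem", "01970", 25)]
address_group = ("street", "city", "zip")
mixed = mixed_layout(rows, [address_group, ("amount",)])
print(mixed[("amount",)])  # [(10,), (25,)]
```

<p>Grouping the street-address columns means a query that needs the whole address reads one group, while a query that only sums amount reads just the amount group.</p>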
<ul>
<li>Similar things are true of Greenplum&#8217;s offering (except for the parts about concurrency, advanced workload management, and so on).</li>
<li>Aster doesn&#8217;t have columnar compression.</li>
<li>Oracle has <a href="../../../../../2011/02/06/columnar-compression-database-storage/">columnar compression but no true columnar storage</a>.*</li>
</ul>
<p>Also, as I noted above, Teradata mixes rows and columns in a different way than Aster or EMC Greenplum do.</p>
<p><em>*However, I won&#8217;t be surprised if Oracle soon announces true hybrid-columnar as well. I originally heard about Teradata Columnar and Oracle&#8217;s efforts to develop true hybrid-columnar storage the same week, 23 months ago.</em></p>
<p>Going hybrid-columnar is a big deal. Aster Data, for example, told me that a considerable fraction of all its workloads ran faster with columnar than row-based storage.* And it&#8217;s of extra importance to a vendor that, like Teradata, needs to play catch-up in the compression derby.</p>
<p><em>*Anything in which the queries eliminated more than half or so of the columns (60%, if I recall correctly, but it was definitely an approximate figure). That pretty much means any query except full and near-full table scans.</em></p>
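<p>The I/O arithmetic behind that footnote can be sketched in a few lines of Python. This is a simplified model under stated assumptions (hypothetical column widths; no compression, seeks, or row-reassembly costs), so it shows the direction of the effect rather than reproducing Aster&#8217;s roughly 60% break-even figure.</p>

```python
from typing import Dict, Set, Tuple


def bytes_scanned(col_widths: Dict[str, int], needed: Set[str], n_rows: int) -> Tuple[int, int]:
    """Bytes read by a full scan in a row store vs. a column store.

    Simplification (an assumption): ignores compression, block headers,
    seeks, and the cost of stitching columns back into rows.
    """
    row_store = n_rows * sum(col_widths.values())  # every column is read
    col_store = n_rows * sum(w for c, w in col_widths.items() if c in needed)  # only needed columns
    return row_store, col_store


widths = {"a": 8, "b": 8, "c": 4, "d": 20, "e": 10}  # hypothetical byte widths
row_b, col_b = bytes_scanned(widths, {"a", "c"}, 1_000_000)
print(row_b, col_b, col_b / row_b)  # 50000000 12000000 0.24
```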
<p>Teradata&#8217;s columnar compression story is pretty complicated. To quote from a forthcoming press release:</p>
<blockquote><p>Teradata automatically chooses from among six types of compression: run length, dictionary, trim, delta on mean, null and UTF8, based on the column demographics.</p></blockquote>
<p>The trickiest words in that are &#8220;automatic&#8221; and &#8220;dictionary&#8221;. Teradata divides column-store data into &#8220;column containers&#8221; of, say, 8 KB. (Current thinking is 8 KB default, 65 KB maximum, but that could change by the time of product release.) By default, Teradata software decides separately for each column container which compression algorithm(s) to use. It can even change its mind dynamically over time, as the contents of the container change.</p>
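<p>The press release doesn&#8217;t describe the selection logic, so what follows is only a minimal Python sketch of the general idea of choosing a compression method per container from its demographics. It covers just three of the six methods (run length, dictionary, null), sizes containers by value count rather than by bytes, and uses thresholds that are purely hypothetical.</p>

```python
from typing import List, Optional, Sequence

Value = Optional[str]


def run_count(values: Sequence[Value]) -> int:
    # Number of maximal runs of identical adjacent values.
    runs = 0
    prev: object = object()  # sentinel that equals nothing
    for v in values:
        if v != prev:
            runs += 1
            prev = v
    return runs


def choose_codec(container: Sequence[Value]) -> str:
    # Thresholds below are hypothetical, not Teradata's.
    n = len(container)
    if n == 0:
        return "none"
    if all(v is None for v in container):
        return "null"
    if n >= 4 * run_count(container):  # average run length of at least 4
        return "run_length"
    if n >= 2 * len(set(container)):  # on average, each distinct value appears at least twice
        return "dictionary"
    return "none"


def choose_per_container(column: Sequence[Value], container_size: int) -> List[str]:
    # Decide separately for each fixed-size container of the column.
    return [
        choose_codec(column[i:i + container_size])
        for i in range(0, len(column), container_size)
    ]


column = ["NY"] * 8 + list("abcdefgh") + [None] * 8 + ["x", "y"] * 4
print(choose_per_container(column, 8))
# ['run_length', 'none', 'null', 'dictionary']
```

<p>Re-running choose_codec whenever a container&#8217;s contents change would be one simple way to model the &#8220;change its mind dynamically&#8221; behavior described above.</p>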
<p>What I find weird about Teradata&#8217;s columnar dictionary compression is that the dictionary is container-specific. One benefit versus having a more global dictionary is that, since you compress fewer items, compression tokens can each be shorter. (A fixed-width token needs about log2(n) bits, where n is the number of entries in the dictionary.) Another benefit is that smaller dictionaries are faster to search. The obvious offsetting drawback is that a larger and more global dictionary has the potential to compress various items that wind up being left uncompressed in this smaller-scale scheme.</p>
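<p>The token-length point can be made concrete. Assuming fixed-width tokens (an assumption; real encodings vary), a dictionary with n entries needs ceil(log2(n)) bits per token, so a small container-local dictionary yields shorter tokens than a large global one. The build_local_dictionary helper below is a hypothetical illustration, not Teradata&#8217;s implementation.</p>

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def token_bits(cardinality: int) -> int:
    # Fixed-width tokens need ceil(log2(cardinality)) bits, with a floor of 1.
    return max(1, math.ceil(math.log2(cardinality)))


def build_local_dictionary(container: Sequence[Hashable]) -> Tuple[Dict[Hashable, int], List[int]]:
    # Container-local dictionary: only the values present in this container get tokens,
    # numbered in first-seen order.
    dictionary = {v: i for i, v in enumerate(dict.fromkeys(container))}
    tokens = [dictionary[v] for v in container]
    return dictionary, tokens


print(token_bits(1_000_000))  # 20 bits per token with a hypothetical 1M-entry global dictionary
print(token_bits(200))        # 8 bits if one container holds only 200 distinct values
d, t = build_local_dictionary(["MA", "NY", "MA", "CA"])
print(d, t, token_bits(len(d)))  # {'MA': 0, 'NY': 1, 'CA': 2} [0, 1, 0, 2] 2
```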
<p>Other notes about Teradata compression include:</p>
<ul>
<li>Teradata has for a while had a more manual form of dictionary compression.</li>
<li>Teradata also has block-level compression.</li>
<li>You can do block-level compression even on top of the columnar compression described above.</li>
<li>The Teradata/Rainstor partnership for archiving-level compression that Rainstor made so much fuss about doesn&#8217;t seem to actually be happening; Teradata seems content with the other compression choices it offers.</li>
</ul>
<p>And finally, Teradata 14 extends <a href="../../../../../2008/10/14/teradata-virtual-storage/">Teradata Virtual Storage</a> with a feature called Compress on Cold. The idea is that &#8220;cold&#8221; data can safely get (extra) compression &#8212; that block-level stuff &#8212; automatically. If the data heats up again (e.g. by becoming relevant for a while to the latest year-over-year comparisons) it can be just as automatically removed from compression. Teradata thinks this is significantly better than the alternative of making manual compression choices based on not-so-granular range partitions.</p>
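<p>A minimal Python sketch of the Compress on Cold idea, assuming access counts per block are tracked over some observation window. The thresholds, the Block class, and the use of zlib as a stand-in for Teradata&#8217;s block-level compression are all hypothetical.</p>

```python
import zlib
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; the gap between them (hysteresis) keeps a block
# from flapping between compressed and uncompressed states.
COLD_MAX_ACCESSES = 2
HOT_MIN_ACCESSES = 10


@dataclass
class Block:
    data: bytes
    compressed: bool = False
    recent_accesses: int = 0  # accesses observed in the current window


def rebalance(blocks: List[Block]) -> None:
    """Compress cold blocks and decompress blocks that have heated up again."""
    for b in blocks:
        if not b.compressed and b.recent_accesses <= COLD_MAX_ACCESSES:
            b.data = zlib.compress(b.data)  # stand-in for block-level compression
            b.compressed = True
        elif b.compressed and b.recent_accesses >= HOT_MIN_ACCESSES:
            b.data = zlib.decompress(b.data)
            b.compressed = False
        b.recent_accesses = 0  # start a new observation window


old = Block(b"2009 sales " * 100)
recent = Block(b"2011 sales " * 100, recent_accesses=50)
rebalance([old, recent])
print(old.compressed, recent.compressed)  # True False
```

<p>If the old block later gets heavy use (say, 20 accesses in one window during a year-over-year comparison), the next rebalance call decompresses it. The gap between the two thresholds is a design choice of this sketch, meant to keep a block near the boundary from being recompressed and decompressed on every pass.</p>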
<p>Unsurprisingly, Teradata lacks some features and benefits found in certain columnar-first analytic DBMS. One biggie is that, absent clever workarounds such as Vertica&#8217;s in-memory write-optimized store, columnar DBMS have a single-row-update performance problem, because you are putting the information in many places on disk rather than just one. I generally take it for granted that a columnar-first vendor has such a workaround. Row-based vendors gone columnar, however, are a different story. Teradata et al. are also likely to decompress data and reassemble it into full rows as soon as it hits RAM, which forfeits the potential benefit that you have less data per row clogging up cache.*<em> (Edit: As per Todd Walter&#8217;s comments below, this is not accurate &#8212; and that&#8217;s a potentially important feature.)</em></p>
<p><em>*Late decompression actually depends on columnar compression, not columnar storage, and hence can also be enjoyed by row-based DBMS such as </em><a href="../../../../../2010/06/21/netezza-ibm-db2-compression/"><em>DB2</em></a><em>. </em></p>
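<p>The single-row write problem, and the kind of workaround columnar-first vendors use, can be sketched as a toy Python class: single-row writes (inserts, in this simplified version) go to an in-memory row buffer, and only a batch flush touches every column. This illustrates the general write-buffer idea under hypothetical names; it is not Vertica&#8217;s write-optimized store or anything Teradata ships, and the column_writes counter merely stands in for per-column disk writes.</p>

```python
from typing import Dict, List, Sequence, Tuple


class BufferedColumnStore:
    """Toy column store fronted by an in-memory row buffer (hypothetical design).

    Single-row inserts land in the buffer; only a flush touches every column.
    """

    def __init__(self, columns: Sequence[str], flush_threshold: int = 1000) -> None:
        self.columns = list(columns)
        self.column_data: Dict[str, list] = {c: [] for c in self.columns}
        self.write_buffer: List[Tuple] = []
        self.flush_threshold = flush_threshold
        self.column_writes = 0  # number of per-column append operations performed

    def insert(self, row: Tuple) -> None:
        self.write_buffer.append(row)  # one in-memory append per row
        if len(self.write_buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Move buffered rows into the columns in one batch: one append per column.
        if not self.write_buffer:
            return
        for i, c in enumerate(self.columns):
            self.column_data[c].extend(r[i] for r in self.write_buffer)
            self.column_writes += 1
        self.write_buffer.clear()

    def scan(self, column: str) -> list:
        # Queries must read both the column data and the not-yet-flushed buffer.
        i = self.columns.index(column)
        return self.column_data[column] + [r[i] for r in self.write_buffer]


store = BufferedColumnStore(["id", "amount"], flush_threshold=3)
for n in range(5):
    store.insert((n, n * 10))
print(store.column_writes, store.scan("amount"))  # 2 [0, 10, 20, 30, 40]
```

<p>Without the buffer, each of the five inserts would touch both columns, for 10 column writes; with a flush threshold of 3, the sketch performs 2. The cost is that every scan must also read the unflushed buffer.</p>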
<p>To use Teradata Columnar, you need to be using round-robin data distribution rather than, say, hash. Teradata jargon for this is NoPI, where the &#8220;PI&#8221; stands for Primary Index.* Drawbacks to that include:</p>
<ul>
<li>You don&#8217;t get the hash distribution benefit of saving a data redistribution step on joins whose join key happens to be the same as the hash key.</li>
<li>In Teradata-land, NoPI implies append-only, so you get the garbage collection/compaction that implies.</li>
</ul>
<p>However, that&#8217;s a physical append-only; you can still do logical updates.</p>
<p><em>*PI is not to be confused with PPI, which stands for Partitioned Primary Index, and is Teradata&#8217;s name for range (or case-statement-based) partitioning. PPI works just fine with Teradata Columnar. As of Teradata 14, you can do PPI up to 62 levels deep.</em></p>
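<p>The join benefit of hash distribution that NoPI gives up can be sketched in Python. Here zlib.crc32 stands in for Teradata&#8217;s row hash, &#8220;units&#8221; stand in for the AMPs that own the data, and the orders/customers tables are hypothetical.</p>

```python
import zlib
from typing import Dict, List, Sequence, Set, Tuple

Row = Tuple
Placement = Dict[int, List[Row]]


def hash_unit(key: object, n_units: int) -> int:
    # Deterministic hash so equal keys always map to the same unit.
    return zlib.crc32(repr(key).encode("utf-8")) % n_units


def place_by_hash(rows: Sequence[Row], key_idx: int, n_units: int) -> Placement:
    units: Placement = {u: [] for u in range(n_units)}
    for r in rows:
        units[hash_unit(r[key_idx], n_units)].append(r)
    return units


def place_round_robin(rows: Sequence[Row], n_units: int) -> Placement:
    units: Placement = {u: [] for u in range(n_units)}
    for i, r in enumerate(rows):
        units[i % n_units].append(r)  # ignores the row's contents entirely
    return units


def key_locations(units: Placement, key_idx: int) -> Dict[object, Set[int]]:
    loc: Dict[object, Set[int]] = {}
    for u, rs in units.items():
        for r in rs:
            loc.setdefault(r[key_idx], set()).add(u)
    return loc


def join_is_local(a: Placement, key_a: int, b: Placement, key_b: int) -> bool:
    # True if every shared join key lives on one unit, the same one in both tables,
    # so each unit can join its own rows without a redistribution step.
    la, lb = key_locations(a, key_a), key_locations(b, key_b)
    return all(len(la[k]) == 1 and la[k] == lb[k] for k in la.keys() & lb.keys())


orders = [(order_id, order_id % 5) for order_id in range(20)]      # (order_id, cust_id)
customers = [(cust_id, f"cust{cust_id}") for cust_id in range(5)]  # (cust_id, name)

print(join_is_local(place_by_hash(orders, 1, 4), 1, place_by_hash(customers, 0, 4), 0))  # True
print(join_is_local(place_round_robin(orders, 4), 1, place_round_robin(customers, 4), 0))  # False
```

<p>With both tables hashed on the customer key, each unit can join its own rows directly. Under round-robin placement a given customer&#8217;s orders end up scattered across units, so one table or the other has to be redistributed before the join.</p>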
<p>The Teradata folks also sent along a slide deck laying out parts of the <a href="http://www.monash.com/uploads/Teradata-Columnar-September-2011.ppt">Teradata Columnar</a> story. But it&#8217;s not one of the better Teradata decks I&#8217;ve ever posted.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/22/teradata-columnar-compression/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Aster Data business trends</title>
		<link>http://www.dbms2.com/2011/09/08/aster-data-business-trends/</link>
		<comments>http://www.dbms2.com/2011/09/08/aster-data-business-trends/#comments</comments>
		<pubDate>Thu, 08 Sep 2011 05:33:56 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[Liberty and privacy]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5204</guid>
		<description><![CDATA[Last month, I reviewed with the Aster Data folks which markets they were targeting and selling into, subsequent to acquisition by their new orange overlords. The answers aren&#8217;t what they used to be. Aster no longer focuses much on what it used to call frontline (i.e., low-latency, operational) applications; those are of course a key [...]]]></description>
			<content:encoded><![CDATA[<p>Last month, I reviewed with the Aster Data folks which markets they were targeting and selling into, subsequent to <a href="../../../../../2011/03/04/teradata-aster-data-ncluster/">acquisition</a> by their new orange overlords. The answers aren&#8217;t what they used to be. Aster no longer focuses much on what it used to call <a href="../../../../../2008/10/22/aster-data-systems-ncluster/">frontline</a> (i.e., low-latency, operational) applications; those are of course a key strength for Teradata. Rather, Aster focuses on <a href="../../../../../2011/03/03/investigative-analytics/">investigative analytics</a> &#8212; they&#8217;ve long <a href="../../../../../2011/02/12/upcoming-webinar-on-investigative-analytics/">endorsed</a> my use of the term &#8212; and on the batch run/scoring kinds of applications that inform operational systems.</p>
<p><span id="more-5204"></span>Also, Aster no longer focuses much on the general internet industry where it got its earliest sales, its <a href="../../../../../2011/09/05/zynga-linkedin-data-warehous/">continued success at LinkedIn</a> and a recent win at <span style="text-decoration: line-through;">an (NDA) fairly-big-name internet new account</span> <em>Razorfish</em> notwithstanding. That said, the first target market Aster did share with me was &#8220;digital marketing optimization,&#8221; which includes &#8220;marketing optimization&#8221; (duh), search engine optimization (SEO), clickstream analysis, and the like. Also, Aster is going after &#8220;data scientists&#8221; in general, and that&#8217;s a term I&#8217;m still seeing used most frequently in the internet area.</p>
<p><em>I&#8217;m seeing ever more granularity as companies break down internet-related market segments. DataStax showed me a chart last week of 15 different market segments it had sold into, and at least 14 were in some way internet-related.</em></p>
<p>Rather, if Aster is to name three industries in which it has pleasingly strong sales traction, it would say manufacturing (which in Teradata lingo includes resource extraction), financial services (including insurance), and retail. A cynic might note that that breakdown, like many similar ones, adds up to fairly large swaths of the economy and the computer market, but never mind that part. (Other firms might have thrown in telecommunications and health care as well, to get even more coverage.)</p>
<p>Two of Aster&#8217;s other favorite application areas are social network analysis/influencer identification and &#8212; which is analytically very similar &#8212; fraud detection/prevention. Taken together, that&#8217;s a whole lot of graph analysis. And I note with interest that the influencer identification stuff does NOT seem to be concentrated in telecom, which is the traditional sector one would imagine it being used in; all those call records are a lovely source of graph edges. Rather, the influencers seem to be identified from sources such as social media and credit card data.</p>
<p><em>Once again, this kind of thing gives me privacy jitters.</em></p>
<p>The match between Aster&#8217;s favorite industries and application areas is pretty much as you might expect &#8212; fraud in financial services, influencer analysis in retailing (and probably consumer financial services too), and digital marketing in both. As for manufacturing, the opportunities there seem to be focused on machine-generated data. That applies at least to high-tech manufacturing (I bet especially in flow-oriented stuff such as semiconductor fab) and oil/gas. Smart grid opportunities don&#8217;t seem to have arisen yet for Aster the way they have for a couple of other vendors.</p>
<p>As for general Aster business trends, I think they&#8217;re good, while Aster would perhaps want to portray them as very good. Aster named a couple of impressive joint Teradata/Aster wins under NDA, but only a couple. Ramping up sales headcount is proving challenging, and some sales leadership turnover probably hasn&#8217;t helped. I do believe Aster&#8217;s spin that this is a matter of somebody being promoted quickly to a bigger job, and am optimistic about the current team &#8212; still, such moves tend to have at least short-term cost.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/08/aster-data-business-trends/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Eight kinds of analytic database (Part 1)</title>
		<link>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/</link>
		<comments>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 08:17:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Buying processes]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4868</guid>
		<description><![CDATA[Analytic data management technology has blossomed, leading to many questions along the lines of &#8220;So which products should I use for which category of problem?&#8221; The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for &#8220;big data&#8221; is little help. Let&#8217;s try eight categories instead. While no categorization [...]]]></description>
			<content:encoded><![CDATA[<p>Analytic data management technology has blossomed, leading to many questions along the lines of &#8220;So which products should I use for which category of problem?&#8221; The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for &#8220;big data&#8221; is little help.</p>
<p>Let&#8217;s try eight categories instead. While <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">no categorization is ever perfect</a>, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need &#8212; and in most cases you&#8217;ll need several &#8212; is a great early step in your analytic technology planning.  <span id="more-4868"></span></p>
<p><strong><em>Enterprise data warehouse</em></strong> (Full or partial)</p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, but especially operational</li>
<li><em>Likely use styles:</em> All</li>
<li><em>Canonical example:</em> Central EDW for a big enterprise</li>
<li><em>Stresses:</em> Concurrency, reliability, workload management</li>
</ul>
<p>The enterprise data warehouse (EDW) ideal says that you copy all your data into one place, and drive all decision-making from there. <a href="../../../../../2011/06/21/its-official-the-grand-central-edw-will-never-happen/">Full EDWs are pipe dreams</a>. Still, a partial EDW makes sense for most large enterprises, and many indeed already have one. The first product lines to consider for classical EDWs are Teradata, DB2, Exadata, and maybe Microsoft SQL Server, especially if you&#8217;re going to stress concurrency and/or operational use cases.</p>
<p><strong><em>Traditional data mart</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All</li>
<li><em>Likely use styles:</em> Business intelligence, budgeting/consolidation, investigative</li>
<li><em>Examples:</em> Reporting servers, planning/consolidation servers, anything MOLAP, etc.</li>
<li><em>Stresses:</em> Performance, concurrency, TCO</li>
</ul>
<p>Whether or not you have something like an enterprise data warehouse, it&#8217;s common to have lighter-weight data marts as well. A traditional data mart might drive reports and dashboards. Or it might be specialized for budgeting, planning, and/or consolidation.  Some <a href="../../../../../2011/03/03/investigative-analytics/">investigative analytics</a> may be in the mix as well.</p>
<p>Any DBMS that can support an EDW can also support a data mart, but it may not be the most cost-effective way to do so. Columnar DBMS might have more attractive performance and TCO (Total Cost of Ownership); the same goes for Netezza. Some of them &#8212; e.g. Sybase IQ and <a href="../../../../../2011/06/20/vertica-release-5/">Vertica</a> &#8212; have excellent track records in concurrent usage as well. <a href="../../../../../2011/05/29/when-to-use-relational-database-management-system/">Ted Codd</a> pushed what amounts to MOLAP (Multidimensional OnLine Analytic Processing) systems for these use cases. But relational DBMS commonly do a better job, which is one reason most major MOLAP products have wound up at RDBMS companies.</p>
<p><strong><em>Investigative data mart &#8212; agile</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, especially customer-centric</li>
<li><em>Likely use styles</em>: Investigative</li>
<li><em>Canonical example:</em> A few analysts getting a few TB to examine</li>
<li><em>Stresses:</em> Ease of setup/load, ease of admin, price/performance</li>
</ul>
<p>Besides the traditional data mart, there are at least two other kinds. Both are focused on investigative analytics, but they&#8217;re differentiated by database size.</p>
<p>If you have just a few analysts,* looking at no more than a few terabytes of data (perhaps even just some gigabytes) &#8212; and if that data is &#8220;single-subject&#8221; and fairly homogeneous &#8212; your watchwords should be &#8220;cheap&#8221;, &#8220;easy&#8221;, and &#8220;fast&#8221;. You don&#8217;t need to invest in much hardware, in expensive software, in much administrative effort (the analysts can be their own DBAs), nor should you endure much set-up time. Just grab a product, grab some data, and start running queries (or extracts into the statistical tool of your choice).</p>
<p><em>*If you have dozens or even hundreds of analysts hitting the same database, you&#8217;re probably back to the more concurrency-oriented scenarios outlined above.</em></p>
<p>Infobright is often cost-effective among columnar analytic DBMS. Other vendors might cut you a price break as well. If you have multiple terabytes of data, don&#8217;t rule out Netezza&#8217;s lowest-end products (even if they&#8217;d really rather sell you something bigger). Or, if you&#8217;re in the sub-terabyte range, maybe you can get by with an in-memory BI tool such as QlikView, and not do anything special on the DBMS side at all.</p>
<p><strong><em>Investigative data mart &#8212; big</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All, especially customer-centric, logs, financial trade, scientific</li>
<li><em>Likely use styles</em>: Investigative</li>
<li><em>Canonical example:</em> Single-subject 20 TB &#8211; 20 PB relational database</li>
<li><em>Stresses:</em> Performance, scale-out, analytic functionality</li>
</ul>
<p>But if you&#8217;re looking at tens of terabytes of relational data, or even more, you really do have a &#8220;big data&#8221; problem. Performance and scalability are major challenges, usually best addressed by MPP (Massively Parallel Processing) systems, such as Netezza, Vertica, Aster Data, ParAccel, Teradata, or Greenplum. Performance POCs (Proofs Of Concept) are a big part of the buying process. Vendor price negotiations are crucial too.</p>
<p><em>Actually, in the low tens of terabytes you might be able to get away with a shared-disk system that has excellent compression &#8212; e.g., columnar products like Sybase IQ, Infobright, or SAND, rather than just Vertica and ParAccel.</em></p>
<p>Assuming you have affordable, scalable query performance, the competitive differentiator can switch to additional analytic functionality. Aster, Netezza, ParAccel, Vertica, and Greenplum either offer full <a href="../../../../../2011/02/24/analytic-platforms/">analytic platforms</a>, or seem to be on the path to doing so. Teradata, which now owns Aster Data, offers substantial built-in analytic capability in its traditional products as well, and the same goes for Sybase IQ.</p>
<p><em>Continued in <a href="http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/">Part 2</a>, where we cover some of the more difficult use cases.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

