<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Kognitio</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/kognitio/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 12:22:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Database implications if IBM acquires Sun</title>
		<link>http://www.dbms2.com/2009/03/18/database-implications-if-ibm-acquires-sun/</link>
		<comments>http://www.dbms2.com/2009/03/18/database-implications-if-ibm-acquires-sun/#comments</comments>
		<pubDate>Wed, 18 Mar 2009 14:48:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EnterpriseDB and Postgres Plus]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Mid-range]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[solidDB]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=722</guid>
		<description><![CDATA[Reported or rumored merger discussions between IBM and Sun are generating huge amounts of discussion today (some links below). Here are some quick thoughts around the subject of how the IBM/Sun deal &#8212; if it happens &#8212; might affect the database management system industry. IBM is already serious about supporting multiple database management systems. DB2 [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Reported or rumored merger discussions between IBM and Sun are generating huge amounts of discussion today (some links below).  Here are some quick thoughts around the subject of how the IBM/Sun deal &#8212; <strong>if</strong> it happens &#8212; might affect the database management system industry.<span id="more-722"></span></p>
<ul>
<li><strong>IBM is already serious about supporting multiple database management systems.</strong> DB2 on open systems is IBM&#8217;s flagship DBMS.  But DB2 on mainframes and at least one flavor of Informix seem to be getting maintained and enhanced fairly seriously as well.  And IBM has further DBMS products too (e.g., DB/2 on the AS/400). <strong>There&#8217;s little reason to think IBM would orphan MySQL or any other DBMS product.</strong></li>
<li><strong>IBM is very 	open-source-friendly. </strong><span>For a 	company that grew up for decades on proprietary  software &#8212; and 	still is a huge software products vendor &#8212; IBM is very serious 	about open source.  If you doubt that, I have two words for you:  	&#8220;Linux&#8221; and &#8220;Eclipse&#8221;.</span></li>
<li><strong>MySQL might finally get its 	industrial-strength act together.</strong> IBM is good at database 	management and good at open source.  MySQL becoming a no-apologies 	transactional DBMS would obviously put pressure on Ingres, 	PostgreSQL, and EnterpriseDB, although there surely would be lots of 	happy talk about the open source DBMS market being validated, 	lifting all the vendors and so on. Also, a better MySQL could be bad 	news for Microsoft SQL Server too.</li>
<li><strong>Sun has a lot of DBMS partnerships 	right now.</strong> Obviously, Sun owns MySQL, and has partnerships with 	MySQL storage engine vendors such as Infobright and Kickfire. Sun 	also has a substantial partnership with Greenplum, and a 	Barneyesque* one with ParAccel.  And of course Sun has strong 	working relationships with major database vendors such as Oracle and 	Sybase. What&#8217;s more, on a case-by-case basis, Sun may cooperate in 	the field with yet other DBMS sellers.  E.g., I&#8217;ve confirmed at 	least one instance of a Sun sales rep recommending a Kognitio DBMS.</li>
<li><strong>IBM partners with outside DBMS 	vendors too.</strong> You&#8217;d think IBM&#8217;s gazillion DBMS product lines 	would be enough. But nooooo. I frequently hear rumblings of IBM&#8217;s 	hardware or services operations working with other DBMS products as 	well.  (This is, of course, actually to their credit.)</li>
<li><strong>Short-term, there probably 	would be little effect on partnerships.</strong> Greenplum runs on Sun&#8217;s 	Thumper/Thor line of boxes. DB2 doesn&#8217;t, and certainly isn&#8217;t 	optimized for same. In the short term, to sell Thors, Sun would 	presumably continue to sell Greenplum.</li>
<li><strong>Longer-term, there could be a 	DBMS rationalization.</strong> DB2, Informix, MySQL + storage engines, 	and big independent vendors such as Oracle and Sybase would surely 	always get attention.  That&#8217;s a lot. There might not be room for 	much mind share for many database products and vendors beyond that 	list.</li>
</ul>
<p style="margin-bottom: 0in;"><em>*A Barney partnership is one in which two or more vendors get on stage and do a song and dance about how much they love each other, with little substance beyond that. </em></p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li>Larry Dignan thinks <a href="http://blogs.zdnet.com/BTL/?p=14817">the IBM/Sun 	deal is sensible and ripe to happen</a>.</li>
<li>Dana Gardner thinks <a href="http://blogs.zdnet.com/Gardner/?p=2857">otherwise</a>.</li>
<li>Matt Asay seems to agree that <a href="http://news.cnet.com/8301-13505_3-10198900-16.html">IBM 	understands the open source business</a>.</li>
<li>Before IBM acquired it, <a href="http://www.dbms2.com/2006/04/26/solidmysql-fit/">solidDB 	was scheduled to provide a serious MySQL transaction processing 	engine</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/03/18/database-implications-if-ibm-acquires-sun/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>One vendor&#8217;s trash is another&#8217;s treasure</title>
		<link>http://www.dbms2.com/2009/02/02/one-vendors-trash-is-anothers-treasure/</link>
		<comments>http://www.dbms2.com/2009/02/02/one-vendors-trash-is-anothers-treasure/#comments</comments>
		<pubDate>Mon, 02 Feb 2009 07:05:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=676</guid>
		<description><![CDATA[A few months ago, CEO Mayank Bawa of Aster Data commented to me on his surprise at how &#8220;profound&#8221; the relationship was between design choices in one aspect of a data warehouse DBMS and choices in other parts. The word choice in that was all Mayank, but the underlying thought is one I&#8217;ve long shared, [...]]]></description>
			<content:encoded><![CDATA[<p>A few months ago, CEO Mayank Bawa of Aster Data commented to me on his surprise at how &#8220;profound&#8221; the relationship was between design choices in one aspect of a data warehouse DBMS and choices in other parts. The word choice in that was all Mayank, but the underlying thought is one I&#8217;ve long shared, and that I&#8217;m certain architects of many analytic DBMS share as well.</p>
<p style="margin-bottom: 0in; font-style: normal;">For that matter, the observation is no doubt true in many other product categories as well.  But in the analytic database management arena, where there are literally 10-20+ competitors with different, non-stupid approaches, it seems most particularly valid.  Here are some examples of what I mean.<span id="more-676"></span></p>
<p style="margin-bottom: 0in; font-style: normal;"><strong>Hash <span style="text-decoration: line-through;">partitioning</span> distribution. </strong><span>In shared-nothing or shared-not-very-much database architectures, multiple processors pull data off disk in parallel. Ideally, it will be the case that for each long-running query, the amount of data retrieved at each node is almost identical. That way, each node is done at the same time, with no wasteful waiting.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>Consequently, data should be distributed more or less randomly across the nodes. That can be done through &#8220;round-robin&#8221; allocation &#8212; each node takes a turn in strict order receiving new records or blocks. Or it can be done by </span><em><span>hashing</span></em><span> on a particular key &#8212; in essence, by assigning data to different disks depending on the value in some particular field or combination of fields.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>Hash <span style="text-decoration: line-through;">partitioning</span> distribution is a wonderful optimization.  For most large tables, there&#8217;s an obvious join key that will be relevant to a significant fraction of all long-running queries. Pre-hashing on that key saves a huge step in the execution of hash joins involving that key, and hence can provide a significant reduction in the total query processing workload.  Nor is this benefit confined to single-fact-table or single-primary-key schemas. When different kinds of data are stored in the same warehouse, each large fact table can be hash <span style="text-decoration: line-through;">partitioned</span> distributed on its own key.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>For almost all databases on almost all shared-nothing vendors&#8217; systems, hash <span style="text-decoration: line-through;">partitioning</span> distribution is the way to go.  Even so, a couple of products don&#8217;t even bother supporting it.  Oracle Exadata isn&#8217;t going to perform joins of that kind anyway until data is moved from the storage to the database tier, so hash <span style="text-decoration: line-through;">partitioning</span> distribution has no benefit in Exadata&#8217;s multi-tier architecture.  Kognitio, while not having such a clean proof of why hash <span style="text-decoration: line-through;">partitioning</span> distribution is utterly beside the point, thinks the costs of violating strict randomness outweigh the benefits in its <a href="../2008/12/14/kognitio-and-wx-2-update/">silicon-centric approach</a>.</span></p>
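<p style="margin-bottom: 0in; font-style: normal;">To make the contrast between round-robin and hash distribution concrete, here is a minimal Python sketch. It is an illustration only, not any vendor&#8217;s implementation: the four-node cluster, the <code>customers</code> and <code>orders</code> tables, the helper names, and the use of <code>zlib.crc32</code> as the hash function are all assumptions made for the example.</p>

```python
import zlib

NUM_NODES = 4  # hypothetical cluster size, chosen only for illustration


def round_robin_node(row_index, num_nodes=NUM_NODES):
    """Round-robin: each node takes a turn in strict order."""
    return row_index % num_nodes


def hash_node(key, num_nodes=NUM_NODES):
    """Hash distribution: the node depends only on the key value.

    crc32 is an illustrative stand-in for whatever hash a real DBMS uses.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, key_field, scheme):
    """Assign rows to nodes using either the 'hash' or the round-robin scheme."""
    nodes = [[] for _ in range(NUM_NODES)]
    for i, row in enumerate(rows):
        if scheme == "hash":
            target = hash_node(row[key_field])
        else:
            target = round_robin_node(i)
        nodes[target].append(row)
    return nodes


def placement(nodes, key_field):
    """Map each key value to the set of nodes holding rows with that key."""
    where = {}
    for node_id, rows in enumerate(nodes):
        for row in rows:
            where.setdefault(row[key_field], set()).add(node_id)
    return where


def join_is_local(nodes_a, nodes_b, key_field):
    """True if every shared key lives on exactly one node, the same in both tables."""
    place_a = placement(nodes_a, key_field)
    place_b = placement(nodes_b, key_field)
    shared = set(place_a).intersection(place_b)
    return all(place_a[k] == place_b[k] and len(place_a[k]) == 1 for k in shared)


# Hypothetical tables: 100 customers, 1000 orders referencing them.
customers = [{"customer_id": c} for c in range(100)]
orders = [{"order_id": o, "customer_id": (o * 7) % 100} for o in range(1000)]

hash_local = join_is_local(
    distribute(customers, "customer_id", "hash"),
    distribute(orders, "customer_id", "hash"),
    "customer_id",
)
rr_local = join_is_local(
    distribute(customers, "customer_id", "round_robin"),
    distribute(orders, "customer_id", "round_robin"),
    "customer_id",
)
rr_sizes = [len(n) for n in distribute(orders, "customer_id", "round_robin")]
```

<p style="margin-bottom: 0in; font-style: normal;">In this toy setup, round-robin spreads the orders perfectly evenly but leaves matching customer and order rows on different nodes, so a join on <code>customer_id</code> would need data movement. Hashing both tables on <code>customer_id</code> puts matching rows on the same node, at the price of an evenness that depends on the key values and the hash.</p>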
<p style="margin-bottom: 0in; font-style: normal;"><strong>Indexing alternatives.</strong><span> More generally, analytic DBMS differ from OLTP DBMS in that they&#8217;re optimized to run more table scans and fewer updates and pinpoint queries. I&#8217;ve written about that many times, even coining the phrase <a href="../2007/03/26/index-light-mpp-data-warehouse-appliances/">index-light</a> to encapsulate the story.  The general idea is that if you&#8217;re retrieving a lot of rows per query, it becomes inefficient to keep spinning the disk to ensure you get only the rows you want.  You get a lot more bytes/second doing sequential than random reads, so if a sufficiently large fraction of the rows are ones you actually want, it&#8217;s better to just scan them all.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>If you&#8217;re going to follow an extreme form of that approach (e.g. Netezza, DATAllegro), you might as well have huge block sizes for your data (1 megabyte+). If you think indexes of various kinds will actually be useful a reasonable fraction of the time, you might go with smaller sizes, such as 128K, which is what Teradata and HP (Neoview) favor.</span></p>
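<p style="margin-bottom: 0in; font-style: normal;">The scan-versus-index tradeoff can be sketched as back-of-the-envelope arithmetic. The Python below is a deliberately crude model, and every number in it is an assumption chosen for illustration rather than a measurement of any product: 200-byte rows, 100 megabytes/second of sequential throughput, 5 milliseconds per random read, and one random read per wanted row (which ignores several wanted rows sharing a block).</p>

```python
def scan_seconds(num_rows, row_bytes, seq_bytes_per_sec):
    """Time to read the whole table sequentially."""
    return num_rows * row_bytes / seq_bytes_per_sec


def index_seconds(num_rows, fraction_wanted, seek_seconds):
    """Time to fetch only the wanted rows, one random read per row (crude)."""
    return num_rows * fraction_wanted * seek_seconds


def break_even_fraction(row_bytes, seq_bytes_per_sec, seek_seconds):
    """Fraction of rows wanted at which the two costs are equal in this model."""
    return row_bytes / (seq_bytes_per_sec * seek_seconds)


# Assumed, illustrative hardware and table parameters.
ROWS = 1_000_000_000
ROW_BYTES = 200
SEQ_BYTES_PER_SEC = 100_000_000
SEEK_SECONDS = 0.005

full_scan = scan_seconds(ROWS, ROW_BYTES, SEQ_BYTES_PER_SEC)
one_percent_via_index = index_seconds(ROWS, 0.01, SEEK_SECONDS)
threshold = break_even_fraction(ROW_BYTES, SEQ_BYTES_PER_SEC, SEEK_SECONDS)
```

<p style="margin-bottom: 0in; font-style: normal;">Under these assumed figures the full scan takes about 2,000 seconds, while fetching just 1% of the rows by random reads takes roughly 25 times as long, and the break-even point sits at a small fraction of a percent of the rows. Real systems differ in many ways, but the shape of the argument is the same.</p>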
<p style="margin-bottom: 0in; font-style: normal;"><span>Meanwhile, columnar vendor Vertica recreates some of the benefits of indexes by storing the same column in multiple sort orders.  And that leads me to the next point.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><strong>High availability/failover alternatives.</strong><span> Most analytic DBMS mirror the data on-the-fly. But strategies differ. Some just rely on a storage vendor&#8217;s technology; others build in their own forms of redundancy. </span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>Particularly interesting is Vertica&#8217;s approach. Not only does Vertica allow multiple copies of the data to each be used for querying; it encourages the storage of the same columns in different sort orders, with the optimizer obviously choosing to query the copy that&#8217;s sorted in the way most useful for a specific query&#8217;s execution plan.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>Redundancy and failover strategies are tightly tied to other administration issues too. For example, <a href="../2008/09/02/introduction-to-aster-data-and-ncluster/">Aster Data</a> and <a href="../2008/10/01/automatic-redistribution-of-data-warehouse-data/">other vendors</a> brag, with varying degrees of emphasis, that a new node can be added to a system, and the whole thing reconfigures itself automagically with zero down time. Similarly, different systems respond differently to node failure, in terms of metrics such as time to reestablish normal operation, performance hit (if any) after normal operation resumes, performance hit before normal operation resumes, and time window (if any) that redundancy is lost &#8212; so that a second failure would crash the whole system.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><strong>Bottom line:  There never will be an analytic DBMS that simultaneously  possesses <em>all</em> highly desirable architectural attributes for the product category.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/02/02/one-vendors-trash-is-anothers-treasure/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
		<item>
		<title>Database SaaS gains a little visibility</title>
		<link>http://www.dbms2.com/2009/01/12/database-saas-gains-a-little-visibility/</link>
		<comments>http://www.dbms2.com/2009/01/12/database-saas-gains-a-little-visibility/#comments</comments>
		<pubDate>Mon, 12 Jan 2009 15:47:32 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[1010data]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Information Builders]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=657</guid>
		<description><![CDATA[Way back in the 1970s, a huge fraction of analytic database management was done via timesharing, specifically in connection with the RAMIS and FOCUS business-intelligence-precursor fourth-generation languages.  (Both were written by Gerry Cohen, who built his company Information Builders around the latter one.)  The market for remote-computing business intelligence has never wholly gone away since. [...]]]></description>
			<content:encoded><![CDATA[<p>Way back in the 1970s, a huge fraction of analytic database management was done via timesharing, specifically in connection with the RAMIS and FOCUS business-intelligence-precursor fourth-generation languages.  (Both were written by Gerry Cohen, who built his company Information Builders around the latter one.)  The market for remote-computing business intelligence has never wholly gone away since. Indeed, it&#8217;s being revived now, via everything from the analytics part of Salesforce.com to the service category I call <a href="http://www.dbms2.com/2008/05/08/outsourced-data-marts/">data mart outsourcing</a>.</p>
<p>Less successful to date are efforts in the area of pure database software-as-a-service.  It seems that if somebody is going for SaaS anyway, they usually want a more complete, integrated offering. The most noteworthy exceptions I can think of to this general rule are Kognitio and Vertica, and they only have a handful of database SaaS customers each. To wit:<span id="more-657"></span></p>
<p>1.  <strong>Kognitio</strong> has built a lot of its marketing around database SaaS, which it calls DaaS for data-as-a-service, and runs primarily from its own facility.  On a small sample size, it reports a very roughly 50-50 split in new business activity (that&#8217;s customers/prospects, not revenue) between DaaS and conventionally licensed software.</p>
<p>2.  <strong>Vertica</strong> has expressed <a href="http://www.dbms2.com/2008/07/01/jerry-held-cloud-data-warehousing-business-intelligence/">high hopes</a> for its <a href="http://www.dbms2.com/2008/05/13/vertica-in-the-cloud/">Amazon cloud offering</a>. Actual production usage has so far only matched part of that, but it isn&#8217;t exactly zero either. Specifically, marketing chief Dave Menninger writes by email:</p>
<blockquote><p>In addition to approximately a dozen POCs running on the cloud at any point in time we have five customers using the cloud on a regular  basis. Three of these customers do short lived projects so they start up instances, run them for the duration of a project, and shut them  down. They are three different types of orgs: govt agency, pharma  consulting org and SaaS provider.</p>
<p>Two financial services companies use the cloud as spare resource/capacity.  When they need additional computing resource or capacity they will temporarily move some projects onto the cloud with the anticipation of moving them back off once the capacity constraint  is relieved (new hardware arrives, other projects or systems come to an end, etc.)</p></blockquote>
<p>3.  <strong><a href="http://www.dbms2.com/2008/05/08/outsourced-data-marts/">1010data</a> </strong>offers its data warehousing product by remote service only.  However, <a href="http://www.dbms2.com/2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">unlike Gartner</a> I&#8217;m not totally convinced 1010data should be regarded as comparable to DBMS vendors; perhaps it&#8217;s more like a SaaS business intelligence provider.</p>
<p><em>Edits:</em></p>
<ul>
<li><em>A comment below says Gerry Cohen wrote Nomad too.<br />
</em></li>
<li><em>Kognitio commented on Twitter that they actually use DaaS to mean Data warehouse As A Service.</em></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/01/12/database-saas-gains-a-little-visibility/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Kognitio and WX-2 update</title>
		<link>http://www.dbms2.com/2008/12/14/kognitio-and-wx-2-update/</link>
		<comments>http://www.dbms2.com/2008/12/14/kognitio-and-wx-2-update/#comments</comments>
		<pubDate>Sun, 14 Dec 2008 23:17:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Scientific research]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=636</guid>
		<description><![CDATA[I went to Bracknell Wednesday to spend time with the Kognitio team. I think I came away with a better understanding of what the technology is all about, and why certain choices have been made. Like almost every other contender in the market,* Kognitio WX-2 queries disk-based data in the usual way. Even so, WX-2&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I went to Bracknell Wednesday to spend time with the Kognitio team.  I think I came away with a better understanding of what the technology is all about, and why certain choices have been made.</p>
<p style="margin-bottom: 0in;">Like almost every other contender in the market,* Kognitio WX-2 queries disk-based data in the usual way. Even so, WX-2&#8217;s design is very RAM-centric.  Data gets on and off disk in mind-numbingly simple ways – table scans only, round-robin partitioning only (as opposed to the more common hash), and no compression.  However, once the data is in RAM, WX-2 gets to work, happily redistributing as seems optimal, with little concern about which node retrieved the data in the first place. (I must confess that I don&#8217;t yet understand why this strategy doesn&#8217;t create ridiculous network bottlenecks.)  How serious is Kognitio about RAM? Well, they believe they&#8217;re in the process of selling a system that will include 40 terabytes of the stuff.  Apparently, the total hardware cost will be in the $4 million range.</p>
<p style="margin-bottom: 0in;"><em>*Exasol is the big exception.  They basically use disk as a source from which to instantiate in-memory databases.</em></p>
<p style="margin-bottom: 0in;">Other technical highlights of the Kognitio WX-2 story include:<span id="more-636"></span></p>
<ul>
<li><strong>WX-2 is designed for 	shared-nothing MPP. </strong> But like most other shared-nothing vendors, 	Kognitio often winds up supporting SAN-in-a-box disk arrays.</li>
<li>WX-2 is fairly silicon-heavy.  In a typical installation, <strong>8-core nodes will each manage 140-300 gigabytes of disk. </strong> I get the impression WX-2 is more CPU- than disk-bound, which may be why Kognitio has little interest in disk-based data compression.</li>
<li><strong>WX-2 has complete equality 	among nodes;</strong> there is no head/queen. For example, any node can 	receive, optimize, or compile a query.</li>
<li><strong>WX-2 compiles queries into 	low-level code.</strong> Roger says this reduces code path length by a 	factor of 10.</li>
<li><strong>WX-2&#8217;s optimizer is aware of what data is in RAM.</strong> A WX-2 DBA can deliberately replicate part or all of the database to RAM, in a way that the optimizer is aware of. <strong>This is not an ordinary cache,</strong> although I forgot to ask whether there&#8217;s an ordinary cache in addition.  Roger says that most WX-2 clients use this capability.</li>
<li>Typical WX-2 installations have a 	little less than <strong>one data storage process per disk</strong> assigned 	to a node.  Those disks that set aside a little space for the actual 	software get mirrored in their entirety, and the default is one 	process for that mirrored pair and one process for each other disk.</li>
<li>Typical WX-2 installations have 	<strong>one query execution process per core.</strong></li>
<li>WX-2 has its own RAID-like scheme, 	rather than relying on RAID from storage providers.</li>
<li>For those who care about such 	things, Kognitio claims WX-2 has <strong>linear scalability.</strong></li>
</ul>
<p style="margin-bottom: 0in;">Non-technical highlights include:</p>
<ul>
<li><strong>Kognitio 	is privately held.</strong> The investors at this point seem mainly to be 	two individuals, one of whom is Geoff Squire of Oracle fame.</li>
<li>Kognitio has 	<strong>$16 million in revenue. </strong> Almost $10 million of that is in the 	WX-2 business.</li>
<li>Kognitio 	names <strong>14 customers for WX-2,</strong> all of whom are references.</li>
<li>Long little more than a UK national champion, Kognitio now has four WX-2 customers in the US market.</li>
<li>Current 	business activity is about 50-50 license and SaaS (which Kognitio 	calls DaaS for Data As A Service).</li>
<li>Installed 	WX-2 customers top out around 10 terabytes of user data. But a 50 	terabyte deal has been sold, and a 100 terabyte deal looks really 	good in the pipeline.</li>
<li>On the 	strength of two academic customers, genetic research is a bit of a 	focus vertical for Kognitio right now.  Or, since Kognitio also has 	a university astronomy deal, we could say science is a focus market 	overall.  (Their slides also mention oil &amp; gas.)</li>
<li><span>Most 	other WX-2 users fall into the usual verticals – telecom, analytic 	outsourcing, media/advertising analysis, retailing, etc.</span></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/12/14/kognitio-and-wx-2-update/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Big scientific databases need to be stored somehow</title>
		<link>http://www.dbms2.com/2008/11/07/big-scientific-databases-need-to-be-stored-somehow/</link>
		<comments>http://www.dbms2.com/2008/11/07/big-scientific-databases-need-to-be-stored-somehow/#comments</comments>
		<pubDate>Fri, 07 Nov 2008 18:36:21 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Scientific research]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=619</guid>
		<description><![CDATA[A year ago, Mike Stonebraker observed that conventional DBMS don&#8217;t necessarily do a great job on scientific data, and further pointed out that different kinds of science might call for different data access methods. Even so, some of the largest databases around are scientific ones, and they have to be managed somehow. For example: Microsoft [...]]]></description>
			<content:encoded><![CDATA[<p>A year ago, Mike Stonebraker observed that conventional DBMS don&#8217;t necessarily do a great job on scientific data, and further pointed out that <a href="http://www.databasecolumn.com/2007/11/databases-for-big-science.html">different kinds of science might call for different data access methods</a>.   Even so, some of the largest databases around are scientific ones, and they have to be managed somehow.  For example:</p>
<ul>
<li>Microsoft just put out an <a href="http://www.microsoft.com/presspass/press/2008/nov08/11-06AlzHeavensPR.mspx">overwrought press release</a>.  The substance seems to be that Pan-STARRS &#8212; a Jim Gray legacy also discussed in <a href="http://www.computerworld.com/action/article.do?command=printArticleBasic&amp;articleId=9112018">an August, 2008 <em>Computerworld</em> article</a> &#8212; is adding 1.4 terabytes of image data per night, and one not so new database adds 15 terabytes per year of some kind of computer simulation output used to analyze protein folding.  Both run on SQL Server, of course.</li>
<li>Kognitio has an astronomical database too, at <a href="http://kognitio.com/news/pressreleases/index.php?id=45">Cambridge University</a>, adding 1/2 a terabyte of data per night.</li>
<li>Oracle is used for a McGill University proteomics database called <a href="http://www.genomequebecplatforms.com/mcgill/services/proteomics/bioinfo.aspx">CellMapBase</a>.  A figure of 50 terabytes of &#8220;mass storage&#8221; is included, which doesn&#8217;t include tape backup and so on.</li>
<li>The Large Hadron Collider, once it actually starts functioning, is projected to generate <a href="http://lcg.web.cern.ch/LCG/">15 petabytes of data</a> annually, which will be initially stored on tape and then distributed to various computing centers around the world.</li>
<li>Netezza is proud of its ability to serve images and the like quickly, although off the top of my head I&#8217;m not thinking of a major customer it has in that area.  (But then, if you just sell software, your academic discount can approach 100%; but if like Netezza you have an actual cost of goods sold, that&#8217;s not as appealing an option.)</li>
</ul>
<p>Long-term, I imagine that the most suitable DBMS for these purposes will be MPP systems with strong datatype extensibility &#8212; e.g., DB2, PostgreSQL-based Greenplum, PostgreSQL-based Aster nCluster, or maybe Oracle.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/11/07/big-scientific-databases-need-to-be-stored-somehow/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Netezza overseas</title>
		<link>http://www.dbms2.com/2008/09/17/netezza-overseas/</link>
		<comments>http://www.dbms2.com/2008/09/17/netezza-overseas/#comments</comments>
		<pubDate>Wed, 17 Sep 2008 07:38:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=553</guid>
		<description><![CDATA[22% of Netezza&#8217;s revenue comes from outside the US, at least if we use last quarter&#8217;s figures as a guide.  At first blush, that doesn&#8217;t sound like much.  Indeed, percentage-wise it surely lags behind Teradata, Greenplum (which has sold a lot in Asia/Pacific under Netezza&#8217;s former head of that region), and a few smaller competitors [...]]]></description>
			<content:encoded><![CDATA[<p>22% of Netezza&#8217;s revenue comes from outside the US, at least if we use <a href="http://www.dbms2.com/2008/09/12/some-netezza-customer-metrics/">last quarter&#8217;s figures</a> as a guide.  At first blush, that doesn&#8217;t sound like much.  Indeed, percentage-wise it surely lags behind Teradata, Greenplum (which has sold a lot in Asia/Pacific under Netezza&#8217;s former head of that region), and a few smaller competitors headquartered outside the US.  But a few conversations I had today suggest a rosier view.  <span id="more-553"></span></p>
<p>1.  Dave Shuttleworth of <a href="http://www.edge-a.co.uk">Edge Associates</a>, a UK consultancy, told me that his firm had done a number of installations in the UK over the past 2 1/2 years.  He believes Kognitio has had a few UK sales in that time period, and Teradata &#8212; at least that he&#8217;s aware of &#8212; has had no new UK customers in that time frame at all.   (This dovetails with other indications I&#8217;d had that Netezza is strong in the UK.)</p>
<p>2.  Giovanni Faccioll of <a href="http://www.icare.it/english/english.htm">ICare</a>, an Italian Netezza partner, said that Netezza had gotten 4 customers in Italy in a relatively short period of time.  By way of contrast, he thinks Teradata only has 15 or so Italian customers, despite many years of trying.</p>
<p><em>Note:  I haven&#8217;t run either of these data points by Teradata, which surely has different information about those markets.</em></p>
<p>Netezza&#8217;s own list of international offices is <a href="http://netezza.com/company/locations.aspx">here</a>.  The Germanic operation is just getting set up; ditto South Korea and India (the latter is not even on that list yet).  The first whole-rack sale in Japan was just two quarters ago, with the first two-rack one coming last quarter.  There also are 5 Netezza systems installed in Slovenia, of all places, at a total of 2 or 3 customers.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/09/17/netezza-overseas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Positioning the data warehouse appliances and specialty DBMS</title>
		<link>http://www.dbms2.com/2008/04/05/positioning-the-data-warehouse-appliances-and-specialty-dbms/</link>
		<comments>http://www.dbms2.com/2008/04/05/positioning-the-data-warehouse-appliances-and-specialty-dbms/#comments</comments>
		<pubDate>Sun, 06 Apr 2008 02:10:07 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[DATAllegro]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Dataupia]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Relational database management systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/2008/04/05/positioning-the-data-warehouse-appliances-and-specialty-dbms/</guid>
		<description><![CDATA[There now are four hardware vendors that each offer or seem about to announce two different tiers of data warehouse appliances: Sun, HP, EMC, and Teradata. Specifically: Sun partners with both Greenplum and ParAccel. HP sells Neoview, and also is partnered with Vertica. EMC (together with Dell in North America and Bull in Europe) sells [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in">There now are four hardware vendors that each offer or seem about to announce two different tiers of data warehouse appliances:  Sun, HP, EMC, and Teradata.  Specifically:</p>
<ul>
<li>
<p style="margin-bottom: 0in">Sun partners with both Greenplum 	and ParAccel.</p>
</li>
<li>
<p style="margin-bottom: 0in">HP sells Neoview, and also is 	<a href="http://www.dbms2.com/2007/11/07/vertica-hp-appliance-and-customers/">partnered with Vertica</a>.</p>
</li>
<li>
<p style="margin-bottom: 0in">EMC (together with Dell in North 	America and Bull in Europe) sells DATAllegro. Now EMC is also 	entering a <a href="http://www.dbms2.com/2008/04/05/emc-is-partnering-with-paraccel/">partnership with ParAccel</a>.</p>
</li>
<li>
<p style="margin-bottom: 0in">Teradata is pretty far down the road toward releasing a <a href="http://www.dbms2.com/2008/01/23/is-teradata-bringing-out-a-low-end-data-warehouse-appliance/">low-end product</a>.</p>
</li>
</ul>
<p><span id="more-397"></span></p>
<p>In addition, multiple hardware vendors have “reference architecture” technical arrangements with Oracle, to try to capture some of the benefits of appliances.  And IBM is constantly in partnership discussions with data warehouse specialists, notwithstanding having multiple data warehouse offerings of its own.</p>
<p>Positioning of these various offerings is confused.  Part of the reason is the large vendors&#8217; posture of “We&#8217;re big and trustworthy, and those little upstart vendors aren&#8217;t – until the moment we partner with one of them.” Part of the reason is the small vendors&#8217; stance of “We can do all things for all people – and by the way, 9 of the 14 customers we&#8217;ve ever had are all doing pretty much the same thing.”  And part of the reason is just an industry penchant for secrecy.</p>
<p>To a first approximation, I think there are two sensible ways to define the tiers.  In each case, we&#8217;re talking about what kinds of databases the various products are suited for.</p>
<ul>
<li>
<p style="margin-bottom: 0in; font-style: normal">Criterion S (for 	“Size”).  “Bigger than Oracle can handle” vs. “Small 	enough that Oracle can handle it” (but that <a href="http://www.dbms2.com/2008/03/14/data-warehousing-with-paper-clips-and-duct-tape/">depends on what the 	definition of “handle” is</a>).</p>
</li>
<li>
<p style="margin-bottom: 0in; font-style: normal">Criterion U (for “Usage”). 	  “Full enterprise data warehouse” vs. “big honking data 	mart”.</p>
</li>
</ul>
<p style="margin-bottom: 0in; font-style: normal">But those are very different classification rules – many products that might be upper-tier by Criterion S are lower-tier by Criterion U, and vice-versa.  For example:</p>
<ul>
<li>
<p style="margin-bottom: 0in; font-style: normal"><strong>Teradata&#8217;s</strong> current products are at the upper end by either criterion.  Even so, 	a significant fraction of older Teradata installations are below 5 	terabytes or even 1 terabyte in size.</p>
</li>
<li>
<p style="margin-bottom: 0in; font-style: normal"><span>More generally,</span><strong> Teradata </strong><span>emphasizes Criterion U.  Hence any future low-end products will surely be positioned as lower-tier by that criterion.  Beyond that, I wouldn&#8217;t be surprised if the release is delayed, with the final version of those products being different from what was previously leaked.  E.g., they might well be designed to compete with newer vendors that are upper-tier by Criterion S.</span></p>
</li>
<li>
<p style="margin-bottom: 0in; font-style: normal"><strong>Netezza</strong> has clearly made it into the upper tier by the Size criterion. Most 	of its installations are lower-tier by Criterion U, but it trumpets 	a few exceptions that it describes as “enterprise data warehouses” 	in success stories.</p>
</li>
<li>
<p style="margin-bottom: 0in; font-style: normal"><strong>DATAllegro</strong> is upper tier by Criterion S &#8212; more so than any other vendor except 	Teradata, in that there are at least two credible stories of 	DATAllegro warehouses at or above the quarter-petabyte mark.  Even 	so, DATAllegro is still mainly in the lower tier by Criterion U.  	I.e., the most natural use of DATAllegro technology is to build Very 	Big data marts.</p>
</li>
<li>
<p style="margin-bottom: 0in; font-style: normal"><strong>Vertica</strong> is a purely lower-tier Criterion U player, given its <a href="http://www.dbms2.com/2007/10/23/vertica-star-snowflake-schema/">focus on 	single fact table schemas</a>.  But it&#8217;s well on its way into the 	upper tier by Criterion S.</p>
</li>
<li>
<p style="margin-bottom: 0in; font-style: normal"><strong>Dataupia</strong> straddles the boundary of the tiers by Criterion S. That is, it&#8217;s meant to offload existing Oracle, SQL Server, or DB2 databases, or in some OEM cases to be a cheaper alternative.  That sounds lower-tier.  On the other hand, it has <a href="http://www.dbms2.com/2008/03/14/dataupia-catch-up/">one 120 terabyte reference</a>, which puts it squarely in the upper tier. By Criterion U it&#8217;s pretty clearly lower-tier.</p>
</li>
<li>
<p style="margin-bottom: 0in; font-style: normal"><strong>ParAccel</strong><span> seems lower-tier by either criterion. And I&#8217;m too burned out on 	ParAccel&#8217;s secrecy to probe hard for exceptions.</span></p>
</li>
<li>
<p style="margin-bottom: 0in; font-style: normal"><strong>Oracle, MS 	SQL Server, et al. </strong><span>are – 	pretty much by definition – lower-tier by Criterion S, but 	upper-tier by Criterion U.</span></p>
</li>
<li>
<p style="margin-bottom: 0in; font-style: normal"><strong>HP Neoview</strong> is obviously meant to get to the higher end by both criteria. But like most specialty products, right now it&#8217;s further along by the Size criterion than the Usage one.  Even so, it seems no further along by Criterion S than HP&#8217;s partner Vertica is.</p>
</li>
<li>
<p style="margin-bottom: 0in; font-style: normal"><strong>Greenplum</strong><span> has clearly gotten to the upper tier by the Size criterion.  But 	like most of the competition, it still seems to be in the lower tier 	by Usage.</span></p>
</li>
<li>
<p style="margin-bottom: 0in; font-style: normal"><strong>Infobright</strong><span> is in the lower tier by either criterion.  (They don&#8217;t even have an MPP offering yet.) </span></p>
</li>
<li>
<p style="margin-bottom: 0in; font-style: normal"><strong>Kognitio WX2</strong> is in the lower tier by either criterion.  However, Kognitio aspires to move up when measured by Usage.</p>
</li>
<li>
<p style="margin-bottom: 0in; font-style: normal">The last time 	I looked, <strong>Sybase IQ</strong> was lower tier by either criterion.</p>
</li>
</ul>
<p><strong>Related links:</strong></p>
<ul>
<li><a href="http://www.dbms2.com/2007/12/14/data-warehouse-database-management/">A quick survey of data warehouse management technology</a></li>
<li><a href="http://www.dbms2.com/2007/12/03/data-warehouse-appliances-%e2%80%93-fact-and-fiction/">Data warehouse appliances &#8212; fact and fiction</a></li>
</ul>
<p style="margin-bottom: 0in; font-style: normal"><em><strong>Please <a href="http://www.monash.com/signup.html">subscribe</a> to our feed!</strong></em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/04/05/positioning-the-data-warehouse-appliances-and-specialty-dbms/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Why not database SaaS?</title>
		<link>http://www.dbms2.com/2008/01/31/why-not-database-saas/</link>
		<comments>http://www.dbms2.com/2008/01/31/why-not-database-saas/#comments</comments>
		<pubDate>Thu, 31 Jan 2008 14:26:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[SaaS]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/2008/01/31/why-not-database-saas/</guid>
		<description><![CDATA[After a flurry of recent announcements of database SaaS (Software as a Service), eWeek has published a backlash article. The angle is that database SaaS is too expensive, because you can get decent DBMS for free and per-gig usage charges might be expensive for big databases. I think that&#8217;s missing the point. Most OLTP databases [...]]]></description>
			<content:encoded><![CDATA[<p>After a flurry of recent announcements of database SaaS (Software as a Service), <a href="http://www.eweek.com/c/a/Database/Pricing-Clouds-Enterprise-Adoption-of-SAAS-Databases/">eWeek</a> has published a backlash article.  The angle is that database SaaS is too expensive, because you can get decent DBMS for free and per-gig usage charges might be expensive for big databases.</p>
<p>I think that&#8217;s missing the point.  Most OLTP databases are pretty small.  Or, if they&#8217;re big, they get that way through a <em>lot</em> of transactions.  In the first case, hosted management is cheap.  In the second case, hosted management is taking care of a large burden for you.<span id="more-344"></span></p>
<p>Indeed, even data warehouse SaaS has a market, which should be a strong plausibility argument for any other kind of database SaaS.  Part of it exists for regulatory reasons &#8212; there&#8217;s a whole lot of CRM analysis using credit data that maybe shouldn&#8217;t be imported onto the analyzer&#8217;s premises.  But Kognitio insists, apparently with a small amount of customer evidence backing it up, that there&#8217;s a general SaaS market for data warehousing as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/01/31/why-not-database-saas/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Kognitio WX2 overview</title>
		<link>http://www.dbms2.com/2008/01/26/kognitio-wx2/</link>
		<comments>http://www.dbms2.com/2008/01/26/kognitio-wx2/#comments</comments>
		<pubDate>Sat, 26 Jan 2008 06:09:16 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Relational database management systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/2008/01/26/kognitio-wx2/</guid>
		<description><![CDATA[I had a call today with Kognitio execs Paul Groom and John Thompson. Hopefully I can now clear up some confusion that was created in this comment thread. (Most of what I wrote about Kognitio in October, 2006 still applies.) Here are some highlights. With one exception, Kognitio WX2 gets data on and off disk [...]]]></description>
			<content:encoded><![CDATA[<p>I had a call today with Kognitio execs Paul Groom and John Thompson.  Hopefully I can now clear up some confusion that was created in <a href="http://www.dbms2.com/2007/12/14/data-warehouse-database-management/">this</a> comment thread.  (Most of what I wrote about Kognitio in <a href="http://www.dbms2.com/2006/10/05/introduction-to-kognitio-wx-2/">October, 2006</a> still applies.)  Here are some highlights.<span id="more-337"></span></p>
<ul>
<li><strong>With one exception, Kognitio WX2 gets data on and off disk in the simplest possible way.</strong> Data goes on via round-robin partitioning.  It comes off via table scans (with no indexes at all).</li>
<li>The exception is that WX2 lets you put <strong>local bitmapped indexes</strong> on each disk, which Kognitio thinks are useful for cardinalities up into the 1000s.  These indexes record which data values appear anywhere in a block, so the system knows which blocks it needs to scan.  The benefit is similar to that of a Netezza zone map, but less, in that the Kognitio bitmap only works well for equalities and not ranges.</li>
<li><strong>The bitmaps are compressed.</strong> <strong>Otherwise, Kognitio uses no compression.</strong> They tried compression in the past and it didn&#8217;t go well.</li>
<li><strong>Where Kognitio gets fancy is in RAM.</strong> WX2 can have tables or views in memory that are kept synced up with disk.  (Or even run purely in memory, an extreme transience that seems useful mainly for ELT.)  These tables can be replicated, hashed, or whatever else makes sense.</li>
<li>The biggest (measured by data) WX2 customer has bought a license for<strong> 9 ½ terabytes of user data.</strong> Kognitio expresses optimism about competing in the 10s of terabytes range, and thinks its technology actually scales up to the 100s of terabytes.</li>
<li>Typical Kognitio WX2 configurations have <strong>a couple of hundred gigs of user data per CPU core. </strong>The biggest current system measured by nodes has 300 servers. (A past system had 900.)  If you multiply that out it would seem there&#8217;s an extra zero, so I presume the servers in question are particularly small and well-aged.</li>
<li>Kognitio stresses that <strong>WX2 runs on a broad variety of systems,</strong> just so long as the chips are x86 and the operating system is one of Kognitio&#8217;s preferred flavors of Linux.  Blades, SMP nodes  &#8212; WX2 doesn&#8217;t care.  The nodes can even have heterogeneous hardware, although that&#8217;s sub-optimal since system performance is gated by the least powerful node.  Kognitio seems to think that the cutoff for where bigger boxes are better than blades is probably in the 30-50 terabyte range, although as noted above that&#8217;s mainly a theoretical point right now.</li>
<li>Kognitio also stresses a <strong>diversity of deployment models.</strong> WX2 runs on server farms in the cloud.  You can install it on hardware of your choice.  Or Kognitio will build a turnkey system for you, out of the brand of hardware of your choice.</li>
<li><strong>WX2 running over solid-state disks is likely not in the cards.</strong> Due to Kognitio&#8217;s lack of compression, this would be a very expensive solution.</li>
<li>Kognitio is proud of its <strong>“plug-ins,”</strong> which amount to user-defined functions.  There&#8217;s one on the price list for telecom call repricing that sounds a lot like what one member of the <a href="http://www.dbms2.com/2007/09/27/the-netezza-developer-network/">Netezza Developer Network</a> is doing.  There&#8217;s also a set of half a dozen for astronomical research, which I&#8217;m hoping somebody from Kognitio will describe in an email I can post.</li>
<li><strong>Kognitio&#8217;s sales have been focused in the UK.</strong> There&#8217;s one US customer, whose name I forget.  John Thompson has been hired back into the company to expand US operations, but that seems to be waiting for a VC round to complete.  Large hardware companies and systems integrators seem to play a big part in Kognitio&#8217;s distribution strategy.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/01/26/kognitio-wx2/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>A quick survey of data warehouse management technology</title>
		<link>http://www.dbms2.com/2007/12/14/data-warehouse-database-management/</link>
		<comments>http://www.dbms2.com/2007/12/14/data-warehouse-database-management/#comments</comments>
		<pubDate>Fri, 14 Dec 2007 14:41:03 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cognos]]></category>
		<category><![CDATA[DATAllegro]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Dataupia]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[SAS Institute]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Relational database management systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/2007/12/14/data-warehouse-database-management/</guid>
		<description><![CDATA[There are at least 16 different vendors offering appliances and/or software that do database management primarily for analytic purposes.* That&#8217;s a lot to keep up with. So I&#8217;ve thrown together a little overview of the analytic data management landscape, liberally salted with links to information about specific vendors, products, or technical issues. In some ways, [...]]]></description>
			<content:encoded><![CDATA[<p>There are at least 16 different vendors offering appliances and/or software that do database management primarily for analytic purposes.*  That&#8217;s a lot to keep up with.  So I&#8217;ve thrown together a little overview of the analytic data management landscape, liberally salted with links to information about specific vendors, products, or technical issues.  In some ways, this is a companion piece to my prior post about <a href="http://www.dbms2.com/2007/12/03/data-warehouse-appliances-%e2%80%93-fact-and-fiction/">data warehouse appliance myths and realities</a>.<br />
<em><br />
*And that&#8217;s just the tabular/alphanumeric guys.  Add in text search and you run the total a lot higher.</em></p>
<p><strong>Numerous data warehouse specialists offer traditional row-based relational DBMS architectures, but optimize them for analytic workloads.</strong>  These include Teradata, Netezza, DATAllegro, <a href="http://www.dbms2.com/2007/03/13/greenplum-strategy/">Greenplum</a>, <a href="http://www.dbms2.com/2007/07/26/dataupia-low-end-appliance/">Dataupia</a>, and <a href="http://www.dbms2.com/2006/10/04/sas-intelligence-storage/">SAS</a>.  All of those except SAS are wholly or primarily vendors of MPP/shared-nothing data warehouse appliances.  EDIT:  See the comment thread for a correction re <a href="http://www.dbms2.com/2006/10/05/introduction-to-kognitio-wx-2/">Kognitio</a>.</p>
<p><strong>Numerous data warehouse specialists offer column-based relational DBMS architectures.  </strong>These include Sybase (with the Sybase IQ product, originally from Expressway), Vertica, <a href="http://www.dbms2.com/2007/10/29/paraccel-opens-the-kimono-slightly/">ParAccel</a>, <a href="http://www.dbms2.com/2007/10/22/infobright-brighthouse-mysql/">Infobright</a>, <del datetime="2007-12-14T16:35:05+00:00"><a href="http://www.dbms2.com/2006/10/05/introduction-to-kognitio-wx-2/">Kognitio</a> (formerly White Cross),</del> and Sand. <span id="more-301"></span> Their products are generally available in software-only formats, although Vertica and ParAccel package their offerings as appliances too. </p>
<p><strong>There are some array-based MOLAP (Multidimensional OnLine Analytical Processing) systems left.</strong>   But the major ones are all now at Oracle, Microsoft, and IBM.  Essbase wound up at Oracle, via the <a href="http://www.dbms2.com/2007/03/01/how-hyperion-will-change-oracle/">Hyperion acquisition</a>. Express went to Oracle long ago, and got tightly integrated into the Oracle DBMS.  Microsoft Analysis Services contains a MOLAP engine federated to Microsoft SQL Server.  <a href="http://www.dbms2.com/2007/09/06/applix-%e2%80%93-three-huge-opportunities-cognos-will-probably-ignore/">Applix</a>&#8217;s memory-centric TM1 went to Cognos, which had a couple of other MOLAP engines as well; Cognos is being bought by IBM.</p>
<p><strong>There aren&#8217;t any star-schema specialists of note left.</strong>  Most of them – actually just two, namely Red Brick and Stanford Technology Group &#8212; merged into Informix a decade ago.  Informix was later bought (in two stages) by IBM. Star schemas are now just a feature of general-purpose systems.</p>
<p>Of course, <strong>every general-purpose relational database management system can be used for a lot of analytic purposes. </strong> That&#8217;s the whole reason Codd introduced the relational model.  What&#8217;s more, the leading SMP/shared-everything DBMS – Oracle, DB2 mainframe, and to a lesser extent Microsoft SQL Server – can be used even for very large databases, if you partition carefully and write your SQL code accordingly.</p>
<p>That&#8217;s 14 vendors already, without mentioning Calpont (hasn&#8217;t briefed me recently enough), HP (ditto, and partly <a href="http://www.dbms2.com/2007/11/07/vertica-hp-appliance-and-customers/">working through Vertica</a>), Sun (working through Greenplum and ParAccel), <a href="http://www.texttechnologies.com/2007/12/12/attivio-tries-to-do-it-all/">Attivio</a>, the memory-centric engines of BI vendors such as <a href="http://www.dbms2.com/2006/08/10/qlik-view-%e2%80%93-a-leader-in-memory-centric-bi/">QlikTech</a> and <a href="http://www.dbms2.com/2006/09/20/saps-bi-accelerator/">SAP</a> (not exactly database management), or the complex event/stream processing vendors such as <a href="http://www.dbms2.com/2007/08/10/coral8-versus-streambase/">Coral8, StreamBase</a>, or <a href="http://www.dbms2.com/2007/08/03/a-deeper-dive-into-apama/">Progress Apama</a> (ditto).  Methinks there&#8217;s some consolidation ahead.</p>
<p><strong>Yet more links:</strong></p>
<ul>
<li><a href="http://www.dbms2.com/2007/03/06/why-oracle-and-microsoft-will-lose-in-vldb-data-warehousing/">Why Oracle and Microsoft are losing in VLDB data warehousing</a></li>
<li><a href="http://www.dbms2.com/2007/10/12/three-ways-oracle-and-microsoft-could-go-mpp/">Three ways Oracle and Microsoft could catch up in MPP data warehousing</a></li>
<li><a href="http://www.dbms2.com/2007/10/05/the-four-horsemen-of-data-warehousing/">IBM is oddly weak in the data warehouse market</a></li>
<li><a href="http://www.dbms2.com/2007/10/09/marketing-versus-reality-on-the-one-petabyte-barrier/">Some very big Teradata sites</a></li>
<li>Extensive and overlapping coverage of <a href="http://www.dbms2.com/category/products-and-vendors/netezza/">Netezza</a>, <a href="http://www.dbms2.com/category/products-and-vendors/vertica-systems/">Vertica</a>,  <a href="http://www.dbms2.com/category/database-theory-practice/database-compression/">database compression</a>, and <a href="http://www.dbms2.com/category/database-theory-practice/columnar-database-management/">column-oriented database architectures</a>.</li>
<li>DATAllegro as an exemplar of <a href="http://www.dbms2.com/2007/05/10/another-short-white-paper-on-mpp-data-warehouse-appliances/">non-proprietary index-light MPP data warehouse appliances</a></li>
<li>An <em>old</em> article on <a href="http://www.monash.com/oracleOLAP.html">Oracle&#8217;s integration of Express</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2007/12/14/data-warehouse-database-management/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
	</channel>
</rss>

