<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS2 -- DataBase Management System Services &#187; Vertica Systems</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/vertica-systems/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Sun, 14 Mar 2010 23:24:45 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>February 2010 data warehouse DBMS news roundup</title>
		<link>http://www.dbms2.com/2010/02/22/data-warehouse-dbms-news-roundup/</link>
		<comments>http://www.dbms2.com/2010/02/22/data-warehouse-dbms-news-roundup/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 08:30:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1628</guid>
		<description><![CDATA[February is usually a busy month for data warehouse DBMS product releases, product announcements, and other real or contrived data warehouse DBMS news, and it can get pretty confusing trying to keep those categories of “news” apart.*  This year is no exception, although several vendors – including Teradata and Netezza – are taking “rolling thunder” [...]]]></description>
			<content:encoded><![CDATA[<p>February is usually a busy month for data warehouse DBMS product releases, product announcements, and other real or contrived data warehouse DBMS news, and it can get pretty confusing trying to keep those categories of “news” apart.*  This year is no exception, although several vendors – including Teradata and Netezza – are taking “rolling thunder” approaches, doing some of their announcements this month while holding others back for March or April.</p>
<p><em>*I probably have it worse than most people in that regard, because my clients run tentative feature lists and announcement schedules by me well in advance, which may get changed multiple times before the final dates roll around. I also occasionally miss some detail, if it wasn&#8217;t in a pre-briefing but gets added at the end.</em></p>
<p>Anyhow, the three big themes of this month&#8217;s announcements are probably:</p>
<ul>
<li><strong>Integrating different kinds of analytic processing into databases and DBMS. </strong></li>
<li><strong>Taking advantage of hardware advances.</strong></li>
<li><strong>Playing catchup</strong> in areas where small vendors&#8217; products weren&#8217;t mature yet.</li>
</ul>
<p><span id="more-1628"></span>For example, the three biggest data warehouse DBMS product announcements this month are probably:</p>
<ul>
<li><strong>Aster Data nCluster 4.5.</strong> Much like Aster&#8217;s prior release, <a href="../../../../../2009/10/30/aster-data-application-server-ncluster/">Aster Data nCluster 4.0</a>, <a href="http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/" >Aster Data nCluster 4.5</a> has a major focus on integrating analytics and database processing. This time, the emphasis is on application development tools and pre-built analytic packages. In addition, Aster&#8217;s management tool GUIs have been upgraded, building on catch-up functionality in Aster Data nCluster 4.0.</li>
<li><strong>Netezza&#8217;s “i” add-on to its existing TwinFin products.</strong> With <a href="../../../../../2010/02/22/netezza-twinfin/">Netezza TwinFin(i)</a>, Netezza becomes the second MPP RDBMS vendor with a comprehensive “Big Data Analytic Platform” kind of strategy. (Netezza would surely argue that it was the first, but that depends on how seriously one took <a href="../../../../../2007/09/27/the-netezza-developer-network/">Netezza&#8217;s prior attempt</a>.) Many of the details are different from Aster&#8217;s, of course, but the general philosophy is similar. So far, Netezza has announced one interesting proprietary library of analytic packages (for linear/matrix algebra), plus the port of 4,000 or so functions in open source libraries.</li>
<li><strong>Vertica 4.0.</strong> Vertica has had a highly innovative columnar DBMS architecture from the get-go, but at the cost of some restrictions or awkwardness in the relationship between data layout and SQL processing. Vertica says that <a href="../../../../../2010/02/22/vertica-4/">Vertica 4.0</a> fixes all that. In addition, it has some analytic processing enhancements, especially in the time series area, where Vertica doesn&#8217;t vigorously dispute that Sybase IQ previously had an advantage.</li>
</ul>
<p>In addition,</p>
<ul>
<li><strong>Teradata is announcing its Data Warehouse Appliance 2580, the successor to the Teradata 2550.</strong> This is purely a hardware refresh; Teradata&#8217;s hardware and software upgrades are not generally synced. The Teradata 2580 upgrades CPUs from Harpertown to Nehalem, includes 3X the RAM of its predecessor, and offers an option for 1 TB disks (thus lowering the bottom price/TB a lot, to $31K list).</li>
<li>Aster, Vertica, and ParAccel have all called attention to the fact that, if solid-state drives have interfaces like those of disk drives, and if a DBMS supports disk drives, then that DBMS supports solid-state drives as well. At least Aster and ParAccel have signaled that they have at least one customer or prospect each interested in Fusion-io&#8217;s solid-state technology, especially in the retail sector. This is basically a hardware matter as well, and a big deal only for those who were somehow unaware of <a href="../../../../../2010/01/31/flash-pcmsolid-state-memory-disk/">the impending dominance of solid-state memory technology</a>.</li>
<li>Sybase announced its <a href="../../../../../2010/02/05/sybase-aleri-rap/">Aleri</a> acquisition earlier this month.</li>
<li>Various vendors have bragged about various rankings, awards, or benchmarks, or – sometimes less tediously – about last year&#8217;s sales results.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/22/data-warehouse-dbms-news-roundup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Vertica 4.0</title>
		<link>http://www.dbms2.com/2010/02/22/vertica-4/</link>
		<comments>http://www.dbms2.com/2010/02/22/vertica-4/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 08:19:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1607</guid>
		<description><![CDATA[Vertica briefed me last month on its forthcoming Vertica 4.0 release. I think it&#8217;s fair to say that Vertica 4.0 is mainly a cleanup/catchup release, washing away some of the tradeoffs Vertica had previously made in support of its innovative DBMS architecture.
For starters, there&#8217;s a lot of new analytic functionality. This isn&#8217;t Aster/Netezza-style ambitious. Rather, [...]]]></description>
			<content:encoded><![CDATA[<p>Vertica briefed me last month on its forthcoming Vertica 4.0 release. I think it&#8217;s fair to say that Vertica 4.0 is mainly a cleanup/catchup release, washing away some of the tradeoffs Vertica had previously made in support of its innovative DBMS architecture.</p>
<p>For starters, there&#8217;s a lot of new analytic functionality. This isn&#8217;t Aster/Netezza-style ambitious. Rather, there&#8217;s a lot more SQL-99 functionality, plus some time series extensions of the sort that financial services firms – an important market for Vertica – need and love. Vertica did suggest a couple of these time series extensions are innovative, but I haven&#8217;t yet gotten detail about those.</p>
<p>Perhaps even more important, Vertica is cleaning up a lot of its previous SQL optimization and execution weirdnesses. In no particular order, I was told:<span id="more-1607"></span></p>
<ul>
<li>Vertica&#8217;s delete performance is up “literally” 30-100X, at least in the case of “large” deletes. Performance for “large” updates has been enhanced as well.</li>
<li>Vertica has finally cleaned up all vestiges of its prior <a href="http://www.dbms2.com/2007/10/23/vertica-star-snowflake-schema/" >bias to star schemas</a>. For example, Vertica concedes that its product previously would sometimes force a star execution plan that wasn&#8217;t really appropriate.</li>
<li>It is no longer the case that you need to define projections before you load a table into Vertica. This is now fully automatic.</li>
<li>Vertica 4.0 automatically redesigns the database when new nodes are added to the system.</li>
<li>When a database designer does hand-tune projections – and there&#8217;s no shame in this still being a possibility in Vertica 4.0 – that hand-tuning is now pulled back into the automatic generation/recommendation/whatever wizards for further projections. I.e., there&#8217;s a kind of DBA round-trip engineering going on.</li>
<li>Vertica used to require that tables being joined be identically “segmented” (I think this means distributed across nodes). That is no longer the case in 4.0.</li>
<li>In connection with this new-found flexibility, Vertica now supports full outer joins directly, rather than requiring the left outer join/right outer join/UNION kluge.</li>
<li>The Vertica 4.0 optimizer is smarter than its predecessor about things like predicate pushdown into subqueries, or exploiting commonality between predicates and partition keys.</li>
<li>There&#8217;s a fundamental change that I don&#8217;t understand very well in the Vertica execution engine basic unit of work. It sounds as if in the past all the disk-based data containers the query needed got opened at once and read into memory, whether or not there was enough RAM and CPU cores to handle them, and this problem has now been fixed.</li>
<li>Vertica always seemed to say that you could query immediately on new data, because even if it hadn&#8217;t hit disk yet – the ROS (Read-Optimized Store) – it was available in memory – the WOS (Write-Optimized Store). And queries were in essence federated between the ROS and WOS. But apparently it&#8217;s a new feature in Vertica 4.0 that you can read totally fresh data without locking. I confess to not understanding this very well either. (It has something to do with what Vertica calls “Epochs”.)</li>
<li>Temporary tables can now be created in Vertica on a local/session basis without any DDL. Making temporary tables easier and more performant is important for a variety of reasons:
<ul>
<li>MicroStrategy, Company V* et al. use lots of temp tables. E.g., Company V on Vertica has 3,000 permanent tables and 5,000-7,000 temporary ones.</li>
<li>Vertica rightly points out that temporary tables are also important for ELT (Extract/Load/Transform).</li>
<li>Vertica further says that single-node OEMs such as security appliance vendors use lots of temp tables.</li>
</ul>
</li>
</ul>
<p><em>*Company V = one of the more prominent vertical-market application providers.</em></p>
<p>In other Vertica highlights:</p>
<ul>
<li>It sounds as if 4.0 is the first Vertica release with what I would regard as serious workload management.</li>
<li>While Vertica has stored and retrieved Unicode since Vertica 3.5 or so, 4.0 will be the first Vertica release in which Unicode is sorted and collated properly.</li>
<li>Stored-procedure-like functionality is still a future for Vertica.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/22/vertica-4/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Intelligent Enterprise’s Editors’/Editor’s Choice list for 2010</title>
		<link>http://www.dbms2.com/2010/02/11/intelligent-enterprise-editors-choice-201/</link>
		<comments>http://www.dbms2.com/2010/02/11/intelligent-enterprise-editors-choice-201/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 23:13:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Intersystems and Cache']]></category>
		<category><![CDATA[Jaspersoft]]></category>
		<category><![CDATA[Kalido]]></category>
		<category><![CDATA[Mark Logic]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Pentaho]]></category>
		<category><![CDATA[QlikTech and QlikView]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Tableau Software]]></category>
		<category><![CDATA[Talend]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1578</guid>
		<description><![CDATA[As he has before, Intelligent Enterprise Editor Doug Henschen

Personally selected annual lists of 12 &#8220;Most influential&#8221; companies and 36 &#8220;Companies to watch&#8221; in analytics- and database-related sectors.
Made it clear that these are his personal selections.
Nonetheless has called it an Editors&#8217; Choice list, rather than Editor&#8217;s Choice.  

(Actually, he&#8217;s really called it an &#8220;award.&#8221;)
People advising [...]]]></description>
			<content:encoded><![CDATA[<p>As he has <a href="http://www.dbms2.com/2009/01/12/intelligent-enterprises-editorseditors-choice-list/" >before</a>, <em>Intelligent Enterprise</em> Editor Doug Henschen</p>
<ul>
<li>Personally selected <a href="http://intelligent-enterprise.informationweek.com/showArticle.jhtml;jsessionid=IANLOXCT2244BQE1GHPCKH4ATMY32JVN?articleID=222900034&amp;pgno=1" onclick="javascript:pageTracker._trackPageview('/intelligent-enterprise.informationweek.com');">annual lists</a> of 12 &#8220;Most influential&#8221; companies and 36 &#8220;Companies to watch&#8221; in analytics- and database-related sectors.</li>
<li>Made it clear that these are his personal selections.</li>
<li>Nonetheless has called it an Editors&#8217; Choice list, rather than Editor&#8217;s Choice. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ul>
<p>(Actually, he&#8217;s really called it an &#8220;award.&#8221;)</p>
<p><span id="more-1578"></span>People advising Doug &#8212; who come to think of it actually are Contributing Editors to <em>Intelligent Enterprise</em> or something like that &#8212; included Cindi Howson, Seth Grimes, three others, and me.</p>
<p>And if past is prologue, I will now get a flood of PR emails calling my attention to this award that I already have both participated in and blogged about. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>As usual, the sense:nonsense ratio on these lists was pleasingly high. Analytic DBMS vendors cited included IBM, Microsoft, Netezza, Oracle, Sybase, and Teradata in the &#8220;Most influential&#8221; group, with Aster, Greenplum, HP, Infobright, and Vertica among the &#8220;To watch&#8221; crowd. It&#8217;s tough to argue with those selections, whose most questionable element is probably the not-ridiculous supposition that HP could do something interesting over the coming year. Cloudera and Intersystems also made the list, deservedly.</p>
<p>All three of QlikTech, Tableau, and TIBCO made the list, which is appropriate given the potential for and interest in interactive data exploration technology. The BI majors, independent or otherwise, were all on as well. In text mining, Doug included Attensity and Clarabridge, which I think is exactly right. (Plus OpenCalais.) Upon reflection, I probably should have nominated Mark Logic, even though most of its business is non-enterprise; but hey, nobody&#8217;s perfect, and the same goes for lists. Open source was well represented, with Apache, Actuate, Jaspersoft, Eclipse, Infobright, Nuxeo and R all being cited (but not Ingres or Pentaho). Kalido made the list, with my endorsement, their silly I-CASE-like marketing messaging notwithstanding.</p>
<p>Speaking of imperfections &#8212; there are only a few category names, and so category assignments can be pretty bizarre. (In an ideal world, middleware wouldn&#8217;t be included under &#8220;enterprise applications&#8221;.) Greenplum hasn&#8217;t really &#8220;extended&#8221; its DBMS with a &#8220;cloud&#8221; option. As much as I&#8217;d like Netezza to be more influential than SAP, that&#8217;s probably not the best way to rank them. And there are a number of &#8220;This company is on a roll!&#8221; kinds of comments that I wouldn&#8217;t necessarily endorse.</p>
<p>But those are all nitpicks. On the whole, it&#8217;s another nice job.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/11/intelligent-enterprise-editors-choice-201/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Vertica slaughters Sybase in patent litigation</title>
		<link>http://www.dbms2.com/2010/01/15/vertica-sybase-ipatent-litigation/</link>
		<comments>http://www.dbms2.com/2010/01/15/vertica-sybase-ipatent-litigation/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 13:07:32 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1409</guid>
		<description><![CDATA[Back in August, 2008, I pooh-poohed Sybase&#8217;s patent lawsuit against Vertica. Filed in the notoriously patent-holder-friendly East Texas courts, the suit basically claimed patent rights over the whole idea of a columnar RDBMS. It was pretty clear that this suit was meant to be a model for claims against other columnar RDBMS vendors as well, [...]]]></description>
			<content:encoded><![CDATA[<p>Back in August, 2008, <a href="http://www.dbms2.com/2008/08/14/patent-nonsense-in-the-data-warehouse-dbms-market/" >I pooh-poohed Sybase&#8217;s patent lawsuit against Vertica</a>. Filed in the notoriously patent-holder-friendly East Texas courts, the suit basically claimed patent rights over the whole idea of a columnar RDBMS. It was pretty clear that this suit was meant to be a model for claims against other columnar RDBMS vendors as well, should they ever achieve material marketplace success.</p>
<p>If a recent Vertica press release is to be believed, <a href="http://www.vertica.com/company/news/Vertica-prevails-in-Sybase-patent-lawsuit" onclick="javascript:pageTracker._trackPageview('/www.vertica.com');">Sybase got clobbered</a>. The meat is:</p>
<blockquote><p>&#8230;  Sybase has admitted that under the claim construction order issued by the Court on November 9, 2009, <em>&#8220;Vertica does not infringe Claims 1-15 of U.S. Patent No. 5,794,229.&#8221;</em> Sybase further acknowledged that because the Court ruled that all the remaining claims in the patent (claims 16-24) were invalid, <em>&#8220;Sybase cannot prevail on those claims.&#8221; </em></p></blockquote>
<p>For those counting along at home &#8212; the patent only has 24 claims in total.</p>
<p>I have no idea whether Sybase can still cobble together grounds for appeal, or claims under some other patent. But for now, this sounds like a total victory for Vertica.</p>
<p><em>Edit: I&#8217;ve now seen a PDF of a filing suggesting the grounds under which Sybase will appeal. Basically, it alleges that the judge erred in defining a &#8220;page&#8221; of data too narrowly. Note that if Sybase prevails on appeal on that point, Vertica has a bunch of other defenses that haven&#8217;t been litigated yet. It further seems that Sybase may have recently filed another patent case against Vertica, in a different venue, based on a different patent.</em></p>
<p>One annoying blog troll excepted, is anybody surprised at this outcome?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/15/vertica-sybase-ipatent-litigation/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>This and that</title>
		<link>http://www.dbms2.com/2009/12/29/this-and-that/</link>
		<comments>http://www.dbms2.com/2009/12/29/this-and-that/#comments</comments>
		<pubDate>Tue, 29 Dec 2009 09:14:46 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Mark Logic]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Structured documents]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1348</guid>
		<description><![CDATA[I have various subjects backed up that I don&#8217;t really want to write about at traditional blog-post length.  Here are a few of them.
Vertica offers a post on its 3.5 release, with a riff on the popular theme &#8220;We&#8217;ve fixed some weaknesses in our prior versions that we didn&#8217;t previously say we had.&#8221; More important, [...]]]></description>
			<content:encoded><![CDATA[<p>I have various subjects backed up that I don&#8217;t really want to write about at traditional blog-post length.  Here are a few of them.<span id="more-1348"></span></p>
<p><strong>Vertica</strong> offers a post on <a href="http://databasecolumn.vertica.com/database-innovation/vertica-3-5-flexstoretm-the-next-generation-of-column-stores/" onclick="javascript:pageTracker._trackPageview('/databasecolumn.vertica.com');">its 3.5 release</a>, with a riff on the popular theme &#8220;We&#8217;ve fixed some weaknesses in our prior versions that we didn&#8217;t previously say we had.&#8221; More important, Vertica is pretty clear on the virtues of its <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/" >hybrid columnar architecture</a>.</p>
<p>Speaking of which &#8212; <strong>Oracle is going true hybrid columnar</strong> as well. I don&#8217;t have details or timing, however.</p>
<p>Dave Kellogg of <strong>Mark Logic</strong> wrote in to amusedly point out <a href="http://www.oracle.com/technology/tech/xml/xmldb/Current/marklogicserver_4.1_v1.0.pdf" onclick="javascript:pageTracker._trackPageview('/www.oracle.com');" target="_blank">Oracle&#8217;s anti-MarkLogic collateral</a>. The very first charge Oracle levies is that MarkLogic goes beyond the emerging XQuery standard to add additional functionality. Considering Oracle&#8217;s approach to SQL standards, I tend to share Dave&#8217;s amusement.</p>
<p>Bill Conniff of <a href="http://www.xponentsoftware.com/" onclick="javascript:pageTracker._trackPageview('/www.xponentsoftware.com');">Xponent LLC</a> wrote in to tell of a vastly cheaper and less functional approach to <strong>XML management,</strong> apparently geared to looking at very large XML files one at a time.</p>
<p><strong>Cayuga</strong> is a Cornell research project in complex event processing (CEP). There&#8217;s a <a href="http://www.cs.cornell.edu/bigreddata/cayuga/" onclick="javascript:pageTracker._trackPageview('/www.cs.cornell.edu');">Cayuga academic home page</a>, a Sourceforge page for some <a href="http://sourceforge.net/projects/cayuga/" onclick="javascript:pageTracker._trackPageview('/sourceforge.net');">open source Cayuga CEP code</a>, and so on. Minsheng Hong, writing from a Vertica email address, tipped me off some months ago. The basic idea seems to be to do <em>lots</em> of queries very quickly, rather than a smaller number of queries over and over again. Whether this is an advance in anything but open-sourceness over Apama or Aleri I couldn&#8217;t say, but I do think it&#8217;s a different focus than that of StreamBase or pre-Aleri Coral8.</p>
<p>And finally, editor Doug Henschen listed his <a href="http://intelligent-enterprise.informationweek.com/blog/archives/2009/12/intelligent_ent_2.html;jsessionid=0YRB5UUISPBXLQE1GHRSKH4ATMY32JVN" onclick="javascript:pageTracker._trackPageview('/intelligent-enterprise.informationweek.com');">15 favorite <em>Intelligent Enterprise</em> blog posts of 2009</a> &#8212; four each by Seth Grimes and Doug himself, three by Cindi Howson, two by me,* and one each by Mark Smith and Neil Raden.</p>
<p><em>*Doug selects up to three posts a month from here to republish.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/29/this-and-that/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How 30+ enterprises are using Hadoop</title>
		<link>http://www.dbms2.com/2009/10/10/enterprises-using-hadoo/</link>
		<comments>http://www.dbms2.com/2009/10/10/enterprises-using-hadoo/#comments</comments>
		<pubDate>Sat, 10 Oct 2009 10:19:29 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Structured documents]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Text]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1073</guid>
		<description><![CDATA[MapReduce is definitely gaining traction, especially but by no means only in the form of Hadoop. In the aftermath of Hadoop World, Jeff Hammerbacher of Cloudera walked me quickly through 25 customers he pulled from Cloudera&#8217;s files. Facts and metrics ranged widely, of course:

Some are in heavy production with Hadoop, and closely engaged with Cloudera. [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">MapReduce is definitely gaining traction, especially but by no means only in the form of Hadoop. In the aftermath of <a href="http://www.dbms2.com/2009/10/01/mapreduce-tidbits/" >Hadoop World</a>, Jeff Hammerbacher of Cloudera walked me quickly through 25 customers he pulled from Cloudera&#8217;s files. Facts and metrics ranged widely, of course:</p>
<ul>
<li>Some are in heavy production with Hadoop, and closely engaged with Cloudera. Others are active Hadoop users but are very secretive. Yet others signed up for initial Hadoop training last week.</li>
<li>Some have Hadoop clusters in the thousands of nodes. Many have Hadoop clusters in the 50-100 node range. Others are just prototyping Hadoop use. And one seems to be &#8220;OEMing&#8221; a small Hadoop cluster in each piece of equipment sold.</li>
<li>Many export data from Hadoop to a relational DBMS; many others just leave it in HDFS (Hadoop Distributed File System), e.g. with <a href="http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/" >Hive</a> as the query language, or in exactly one case Jaql.</li>
<li>Some are household names, in web businesses or otherwise. Others seem to be pretty obscure.</li>
<li>Industries include financial services, telecom (Asia only, and quite new), bioinformatics (and other research), intelligence, and lots of web and/or advertising/media.</li>
<li>Application areas mentioned &#8212; and these overlap in some cases &#8212; include:
<ul>
<li>Log and/or clickstream analysis of various kinds</li>
<li>Marketing analytics</li>
<li>Machine learning and/or sophisticated data mining</li>
<li>Image processing</li>
<li>Processing of XML messages</li>
<li>Web crawling and/or text processing</li>
<li>General archiving, including of relational/tabular data, e.g. for compliance</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-1073"></span>We went over this list so quickly that we didn&#8217;t go into much detail on any one user. But one example that stood out was of an ad serving firm that had an &#8220;aggregation pipeline&#8221; consisting of 70-80 MapReduce jobs.</p>
<p style="margin-bottom: 0in;">I also talked again yesterday with Omer Trajman of Vertica, who surprised me by indicating a high single-digit number of Vertica&#8217;s customers were in production with Hadoop &#8212; i.e., over 10% of Vertica&#8217;s production customers. (Vertica recently made its 100th sale, and of course not all those buyers are in production yet.) <a href="http://www.dbms2.com/2009/08/04/verticas-version-of-mapreduce-integration/" >Vertica/Hadoop</a> usage seems to have started in Vertica&#8217;s financial services stronghold &#8212; specifically in financial trading &#8212; with web analytics and the like coming on afterwards. Based on current prototyping efforts, Omer expects bioinformatics to be the third production market for Vertica/Hadoop, with telecommunications coming in fourth.</p>
<p style="margin-bottom: 0in;">Unsurprisingly, the general Vertica/Hadoop usage model seems to be:</p>
<ul>
<li>Do something to the data in Hadoop</li>
<li>Dump it into Vertica to be queried</li>
</ul>
<p style="margin-bottom: 0in;">What I did find surprising is that the data often isn&#8217;t reduced by this analysis, but rather exploded in size. E.g., a complete store of mortgage trading data might be a few terabytes in size, but Hadoop-based post-processing can increase that by 1 or 2 orders of magnitude. (Analogies to the importance and magnitude of <em>&#8220;cooked&#8221; data</em> in scientific data processing come to mind.)</p>
<p style="margin-bottom: 0in;">And finally, I talked to Aster a few days ago about the usage of its nCluster/Hadoop connector. Aster characterized Aster/Hadoop users&#8217; Hadoop usage as being of the batch/ETL variety, which is the classic use case one concedes to Hadoop even if one believes that MapReduce should commonly be done right in the DBMS.</p>
<p style="margin-bottom: 0in;"><em><strong>Related link</strong></em></p>
<ul>
<li><a href="../2008/08/26/known-applications-of-mapreduce/">An August 2008 round-up of MapReduce applications</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/10/enterprises-using-hadoo/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Oracle and Vertica on compression and other physical data layout features</title>
		<link>http://www.dbms2.com/2009/10/06/oracle-and-vertica-on-compression-and-other-physical-data-layout-features/</link>
		<comments>http://www.dbms2.com/2009/10/06/oracle-and-vertica-on-compression-and-other-physical-data-layout-features/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 12:18:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1042</guid>
		<description><![CDATA[In my recent post on Exadata pricing, I highlighted the importance of Oracle&#8217;s compression figures to the discussion, and the uncertainty about same. This led to a Twitter discussion featuring Greg Rahn* of Oracle and Dave Menninger and Omer Trajman of Vertica.  I also followed up with Omer on the phone.
*Guys like Greg Rahn and [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">In my recent post on <a href="http://www.dbms2.com/2009/10/05/oracle-exadata-2-capacity-pricing/" >Exadata pricing</a>, I highlighted the importance of Oracle&#8217;s compression figures to the discussion, and the uncertainty about same. This led to a Twitter discussion featuring <a href="http://twitter.com/GregRahn" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Greg Rahn</a>* of Oracle and <a href="http://twitter.com/dmenninger" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Dave Menninger</a> and <a href="http://twitter.com/otrajman" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Omer Trajman</a> of Vertica.  I also followed up with Omer on the phone.<span id="more-1042"></span></p>
<p style="margin-bottom: 0in;"><em>*Guys like Greg Rahn and Kevin Closson are huge assets to Oracle, which is absurdly and self-defeatingly unhelpful through conventional public/analyst relations channels.<br />
</em>
</p>
<p style="margin-bottom: 0in; font-style: normal;"><a href="http://twitter.com/GregRahn/status/4611513531" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Six</a> <a href="http://twitter.com/GregRahn/status/4612142101" onclick="javascript:pageTracker._trackPageview('/twitter.com');">key</a> <a href="http://twitter.com/GregRahn/status/4612190133" onclick="javascript:pageTracker._trackPageview('/twitter.com');">tweets</a> <a href="http://twitter.com/GregRahn/status/4612253629" onclick="javascript:pageTracker._trackPageview('/twitter.com');">by</a> <a href="http://twitter.com/GregRahn/status/4612966887" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Greg</a> <a href="http://twitter.com/GregRahn/status/4613110620" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Rahn</a> said:</p>
<blockquote>
<p style="margin-bottom: 0in; font-style: normal;">I think the HCC 10x compression is a slideware (common) number. Personally I&#8217;ve seen it in the 12-17x range on customer data&#8230;</p>
<p style="margin-bottom: 0in; font-style: normal;">This was on a dimensional model. Can&#8217;t speak to the specific industry. I do believe Oracle is working on getting industry #s.</p>
<p style="margin-bottom: 0in; font-style: normal;">As far as I know, Exadata HCC uses a superset of compression algorithms that the commonly known column stores use&#8230;</p>
<p style="margin-bottom: 0in; font-style: normal;">&#8230;and it doesn&#8217;t require the compression type be in the DDL like Vertica or ParAccel. It figures out the best algo to apply.</p>
<p style="margin-bottom: 0in; font-style: normal;">The compression number I quoted is sizeof(uncompressed)/sizeof(hcc compressed). No indexes were used in this case.</p>
<p style="margin-bottom: 0in; font-style: normal;">Exadata HCC is applicable for bulk loaded (fact table) data, so a significant portion (size wise) of most DWs.</p>
</blockquote>
<p style="margin-bottom: 0in; font-style: normal;">Summing up, that seems to say:</p>
<ul>
<li>Oracle claims 12-17X compression on a kind of data similar to that on which Vertica &#8212; which also uses 10X as a single-point overall compression marketing estimate where needed &#8212; claims 20X.</li>
<li>Oracle selects compression algorithms automagically.</li>
<li>Oracle&#8217;s compression doesn&#8217;t quite apply to all the data. Actually, this may be more of an issue for the caching benefits of compression than for the I/O or disk storage gains. (If you join a retail transaction fact table to a customer dimension table, and you have a lot of customers, fitting the uncompressed customer table into RAM could be problematic.)</li>
</ul>
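<p style="margin-bottom: 0in; font-style: normal;">To make those figures concrete, here is a back-of-the-envelope sketch in Python. It uses the definition from Greg&#8217;s tweet, sizeof(uncompressed)/sizeof(compressed). The 10 TB starting size is purely hypothetical, and the function names are mine, not anything from Oracle or Vertica.</p>

```python
def compression_ratio(uncompressed_bytes, compressed_bytes):
    """Ratio as defined in the quoted tweet: sizeof(uncompressed) / sizeof(compressed)."""
    return uncompressed_bytes / compressed_bytes


def compressed_size(uncompressed_size, ratio):
    """Size on disk after compressing at the given ratio (same unit as the input)."""
    return uncompressed_size / ratio


# Hypothetical 10 TB of raw fact-table data; the ratios are the ones quoted above.
RAW_TB = 10.0
for ratio in (10, 12, 17, 20):
    print(f"{ratio}X leaves {compressed_size(RAW_TB, ratio):.2f} TB on disk")
```

<p style="margin-bottom: 0in; font-style: normal;">Note that none of this accounts for indexes, redundant copies, or data the compression doesn&#8217;t apply to, all of which affect real-world disk usage.</p>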
<p style="margin-bottom: 0in; font-style: normal;">Omer and I happened to have a call scheduled to discuss MapReduce yesterday evening, but wound up using most of the time to talk about Vertica&#8217;s compression and physical layout features instead. Highlights included:</p>
<ul>
<li>Greg, like many Vertica competitors, was wrong about Vertica requiring manual, low-level DDL (Data Definition Language) for &#8212; well, for much of anything. Vertica does all that automatically, at least in theory, and suggests that in real life you can indeed often get by without manual intervention.</li>
<li>Vertica can do trickle feeds into its compressed columnar storage. Greg seemed to suggest Oracle Exadata cannot. (However, I won&#8217;t be surprised if, when his comments are expanded to more than 140 characters, he winds up saying the opposite. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  )</li>
<li>Omer characterized the lowest latency with which you can get data into Vertica and have it be available for query as &#8220;seconds&#8221;, vs. &#8220;minutes&#8221; for other columnar vendors.</li>
<li>Vertica 	recommends often keeping multiple copies of a column, for high 	availability and/or performance. This is not directly reflected in 	compression estimates.  In particular, if you&#8217;re going to keep 	redundant copies of data for data-safety reasons anyway, Vertica 	recommends that you:
<ul>
<li>Run queries 	against more than one copy of the data, for performance/throughput.</li>
<li>Store 	different copies of the columns in different sort orders &#8212; e.g., 	according to different likely join keys &#8212; so that the copies are 	optimized for performance on different classes of queries.</li>
</ul>
</li>
<li>Vertica 	doesn&#8217;t have indexes.</li>
<li>Vertica sorts 	columns on ingest. This sorting is, of course, commonly based on 	attributes from columns other than the one being sorted. Even so, 	Omer maintains that sorting helps compression, because of the 	correlation between columns. Examples (and I didn&#8217;t get these all 	from him) might include:
<ul>
<li>City/postal 	code</li>
<li>Customer_ID/store 	location</li>
<li>Customer_ID/product_ID</li>
<li>Product_ID/price</li>
</ul>
</li>
<li>Vertica, based on the recent introduction of <a href="../2009/08/25/sybase-iq-technical-highlights/">FlexStore</a>, has an ILM (Information Lifecycle Management) story much like <a href="../2009/08/25/sybase-iq-technical-highlights/">Sybase IQ&#8217;s</a>. E.g., you can keep different data ranges for different columns on fast storage, while the rest of the data is relegated to slower/cheaper equipment.</li>
</ul>
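<p style="margin-bottom: 0in; font-style: normal;">The claim that sorting on one column helps compress a different, correlated column can be illustrated with a toy example. The Python sketch below uses simple run-length encoding and made-up city/postal code rows. It is only an illustration of the general idea, and makes no claim about which encodings Vertica actually applies.</p>

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def postal_runs(table):
    """Run-length encode only the postal-code column of (city, postal_code) rows."""
    return run_length_encode([postal for _city, postal in table])


# Hypothetical rows in arrival order; postal code is correlated with city.
rows = [
    ("Boston", "02108"), ("Acton", "01720"), ("Boston", "02108"),
    ("Acton", "01720"), ("Boston", "02109"), ("Acton", "01720"),
]

# Same postal-code column, before and after sorting the rows by city alone.
unsorted_runs = postal_runs(rows)
sorted_runs = postal_runs(sorted(rows, key=lambda row: row[0]))

print(len(unsorted_runs), "runs unsorted;", len(sorted_runs), "runs when sorted by city")
```

<p style="margin-bottom: 0in; font-style: normal;">The postal code column is never sorted on its own values here, yet it collapses into fewer runs because equal cities tend to bring equal postal codes along with them.</p>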
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/06/oracle-and-vertica-on-compression-and-other-physical-data-layout-features/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>FlexStore and the rest of Vertica 3.5</title>
		<link>http://www.dbms2.com/2009/08/04/flexstore-and-the-rest-of-vertica-35/</link>
		<comments>http://www.dbms2.com/2009/08/04/flexstore-and-the-rest-of-vertica-35/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 10:50:48 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=860</guid>
		<description><![CDATA[Today, Vertica is announcing its 3.5 release, timed in line with a TDWI conference.  Vertica 3.5 is scheduled to go into beta test in mid-August and be released to general availability in early October.  Vertica 3.5 highlights include:

Vertica/MapReduce 	integration, which I&#8217;m covering in a separate post
A new storage architecture called 	Vertica FlexStore, which [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Today, Vertica is announcing its 3.5 release, timed in line with a TDWI conference.  Vertica 3.5 is scheduled to go into beta test in mid-August and be released to general availability in early October.  Vertica 3.5 highlights include:</p>
<ul>
<li><a href="../2009/08/04/verticas-version-of-mapreduce-integration/">Vertica/MapReduce integration</a>, which I&#8217;m covering in a separate post</li>
<li>A new storage architecture called 	Vertica FlexStore, which seems to boil down essentially to three 	things:
<ul>
<li>A sort of <a href="../2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/">row/column hybridization</a> &#8212; Vertica would probably prefer to call it something like a <em>column clustering</em> feature &#8212; that I&#8217;m also covering in a separate post.</li>
<li>The beginnings of a multi-temperature capability, somewhat akin to <a href="../2008/10/14/teradata-virtual-storage/">Teradata Virtual Storage</a>.</li>
<li>Enhancements to Vertica&#8217;s WOS (Write-Optimized Store, the in-memory part of Vertica that first receives updates). I don&#8217;t understand WOS architecture well enough to write about that yet.</li>
</ul>
</li>
<li>Load-balancing, to route queries evenly among Vertica nodes &#8212; probably just round-robin &#8212; rather than having them just be processed by whichever node happens to receive them.</li>
</ul>
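<p style="margin-bottom: 0in;">For readers who haven&#8217;t met the term, round-robin routing simply hands each incoming query to the next node in a fixed rotation. Here is a minimal Python sketch of the idea. The class, node names, and queries are all hypothetical, and since I&#8217;m only guessing that Vertica uses round-robin at all, this should not be read as a description of Vertica&#8217;s implementation.</p>

```python
from itertools import cycle


class RoundRobinRouter:
    """Hand each incoming query to the next node in a fixed rotation."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        self._rotation = cycle(nodes)

    def route(self, query):
        """Return the (node, query) pair chosen for this query."""
        return next(self._rotation), query


# Hypothetical three-node cluster receiving four queries.
router = RoundRobinRouter(["node1", "node2", "node3"])
assignments = [router.route(query)[0] for query in ["q1", "q2", "q3", "q4"]]
print(assignments)
```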
<p style="margin-bottom: 0in;"><span id="more-860"></span>And Vertica 3.5 surely includes some lesser features as well.</p>
<p style="margin-bottom: 0in;">Like Teradata Virtual Storage, Vertica FlexStore wants to partition data among different parts of storage automatically. But, unlike Teradata&#8217;s technology, it will grudgingly let you do it by hand if you insist.  Since Vertica installations seem to generally have only one kind of disk each &#8212; and that kind spinning rather than solid-state &#8212; early tests concentrate on allocating data among the inner and outer tracks of a disk. Vertica said that one typically gets 80% of the benefit by dividing the data into just two partitions, and 90% of the benefit if one divides it into three. However, I don&#8217;t recall getting a clear estimate of just how large that benefit is.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/08/04/flexstore-and-the-rest-of-vertica-35/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>PAX Analytica? Row- and column-stores begin to come together</title>
		<link>http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/</link>
		<comments>http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 10:40:34 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VectorWise]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=859</guid>
		<description><![CDATA[Column-store proponents are prone to argue, in effect, that the only reason to implement an analytic DBMS with row-based storage is laziness.  Their case generally runs along the lines:

Analytic queries commonly return 	only a fraction of all possible columns.
Only returning the columns needed

Saves I/O
Saves cache space
Reduces processing
Facilitates compression


Presumably all those row-based MPP 	vendors just [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Column-store proponents are prone to argue, in effect, that the only reason to implement an analytic DBMS with row-based storage is laziness.  Their case generally runs along the lines:</p>
<ul>
<li>Analytic queries commonly return 	only a fraction of all possible columns.</li>
<li>Only returning the columns needed
<ul>
<li>Saves I/O</li>
<li>Saves cache space</li>
<li>Reduces processing</li>
<li>Facilitates compression</li>
</ul>
</li>
<li>Presumably all those row-based MPP 	vendors just went row-based because they had a fine row-based DBMS 	(usually but not always PostgreSQL) to build on.</li>
</ul>
<p style="margin-bottom: 0in;">Pushbacks to this argument from row-based vendors include:</p>
<ul>
<li>Yes, but it&#8217;s harder to update a 	column store</li>
<li>Yes, but there are more steps to 	retrieving a bunch of columns than there are to retrieving the same 	information from row stores</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-859"></span>plus generous dollops of:</p>
<ul>
<li>We&#8217;re doing just fine, thank you</li>
<li>We&#8217;re not seeing column stores 	much in the marketplace</li>
<li>Don&#8217;t believe all that academic 	hype</li>
<li>Column stores reek of 	elderberries, and are powered by hamster wheels</li>
</ul>
<p style="margin-bottom: 0in;">(OK, I made that last one up, but I do hear the other claims frequently.)</p>
<p style="margin-bottom: 0in;">However, <strong>there are at least two ways in which row- and column-stores are beginning to come together.</strong> First, there are lots of rumors about <strong>row-store vendors bringing out column-store options,</strong> even beyond the recent <a href="../2009/08/04/vectorwise-ingres-and-monetdb/">Ingres/VectorWise announcement</a>.  (But anything I may know about same beyond noticing the rumors fly by is surely under NDA.) Second, column-store vendors Vertica and VectorWise are bringing out a kind of <strong>row/column hybrid storage</strong> option.</p>
<p style="margin-bottom: 0in;"><a href="http://www.dbms2.com/2009/08/04/flexstore-and-the-rest-of-vertica-35/" >Vertica 3.5</a> introduces what Vertica calls &#8220;FlexStore.&#8221; A key part of <strong>FlexStore</strong> is the ability to store data not just in pure columnar format, but also to group columns together in what amounts to sub-rows. This is advantageous when data is retrieved together and, I presume, when it is updated.  There&#8217;s a tradeoff in giving up column stores&#8217; compression advantages, however, and use of this feature is not recommended for columns that are frequently retrieved independently.  Vertica also notes that since it typically uses 1 megabyte block sizes, any table smaller than that shouldn&#8217;t be broken into columns at all.</p>
<p style="margin-bottom: 0in;">VectorWise, of course, doesn&#8217;t have a product right now, but has gotten a bunch of recent publicity around the column-store product it plans to ship via its partner Ingres in 2010.  When I asked Peter Boncz about row/column hybridization inside VectorWise (not federating between Ingres and VectorWise, but rather truly within VectorWise), he said one of the storage options was <strong>PAX,</strong> and pointed me at <a href="http://www.cs.wisc.edu/multifacet/papers/vldb01_pax.pdf" onclick="javascript:pageTracker._trackPageview('/www.cs.wisc.edu');">a 2001 paper</a> by a group of academics that includes the ubiquitous Dave DeWitt. <em>PAX</em> turns out to stand, in creative spelling, for <em>Partition Attributes Across. </em></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">The PAX idea is to store as many rows of data as can fit into a block, but within the block store them in columns.  This preserves some of the compression and cache-efficiency benefits of column stores, while also bringing back whole rows in a single step. (I think Vertica&#8217;s FlexStore does something similar to this, but I&#8217;m not sure.) </span></p>
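<p style="margin-bottom: 0in;">As a rough illustration of that layout, here is a toy Python sketch. It counts rows per block instead of bytes, ignores compression entirely, and uses function names I made up, so treat it as a picture of the idea rather than of what VectorWise or the PAX paper actually implements.</p>

```python
def to_pax_blocks(rows, rows_per_block):
    """Split rows into blocks; inside each block, store the values column by column."""
    blocks = []
    for start in range(0, len(rows), rows_per_block):
        chunk = rows[start:start + rows_per_block]
        # zip(*chunk) transposes the chunk, giving one list of values per column.
        blocks.append([list(column) for column in zip(*chunk)])
    return blocks


def read_row(blocks, rows_per_block, row_index):
    """Rebuild a whole row by touching exactly one block."""
    block = blocks[row_index // rows_per_block]
    offset = row_index % rows_per_block
    return tuple(column[offset] for column in block)


def scan_column(blocks, column_index):
    """Read one column across all blocks, skipping the other columns' values."""
    return [value for block in blocks for value in block[column_index]]


# Hypothetical three-column table, two rows per block.
rows = [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0)]
blocks = to_pax_blocks(rows, rows_per_block=2)
print(read_row(blocks, 2, 2))
print(scan_column(blocks, 1))
```

<p style="margin-bottom: 0in;">The point of the exercise: a whole row comes back from a single block, while a single-column scan still reads contiguous values within each block.</p>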
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Further confusing things, Peter Boncz of VectorWise told me <strong>VectorWise can support &#8220;any hybrid&#8221; of columnar storage and PAX.</strong></span></p>
<p style="margin-bottom: 0in;"><strong><span style="font-style: normal;">Bottom line: The distinction between row- and column-stores isn&#8217;t going to go away any time soon, but it is at least beginning to blur a bit.</span></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Vertica&#8217;s version of MapReduce integration</title>
		<link>http://www.dbms2.com/2009/08/04/verticas-version-of-mapreduce-integration/</link>
		<comments>http://www.dbms2.com/2009/08/04/verticas-version-of-mapreduce-integration/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 10:29:14 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VectorWise]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=858</guid>
		<description><![CDATA[I talked with Omer Trajman of Vertica Monday night about Vertica&#8217;s MapReduce integration, part of its Vertica 3.5 release.  Highlights included:

By &#8220;integrating Vertica and 	MapReduce,&#8221; Vertica means &#8220;integrating Vertica and 	Hadoop.&#8221;
Vertica&#8217;s Hadoop integration is 	based on Cloudera&#8217;s 	DBInputFormat.
Omer called out for me several 	features of Vertica&#8217;s Hadoop integration that didn&#8217;t just come from 	Cloudera, [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I talked with Omer Trajman of Vertica Monday night about Vertica&#8217;s MapReduce integration, part of its <a href="http://www.dbms2.com/2009/08/04/flexstore-and-the-rest-of-vertica-35/" >Vertica 3.5 release</a>.  Highlights included:</p>
<ul>
<li>By &#8220;integrating Vertica and 	MapReduce,&#8221; Vertica means &#8220;integrating Vertica and 	Hadoop.&#8221;</li>
<li>Vertica&#8217;s Hadoop integration is 	based on <a href="http://www.cloudera.com/blog/2009/03/06/database-access-with-hadoop/" onclick="javascript:pageTracker._trackPageview('/www.cloudera.com');">Cloudera&#8217;s 	DBInputFormat.</a></li>
<li>Omer called out for me several 	features of Vertica&#8217;s Hadoop integration that didn&#8217;t just come from 	Cloudera, namely:
<ul>
<li>Cloudera&#8217;s DBInputFormat assumes the database runs on a single computer, or a single head node of an MPP system. Vertica&#8217;s technology, however, runs on peer parallel nodes with no head, and so Vertica adapted the DBInputFormat technology accordingly.</li>
<li>Vertica lets you push down Map functions to the database. Omer reports a roughly even division among users and prospects between those who want to do this and those who don&#8217;t.</li>
<li>Vertica lets you do Reduce functions (or Map functions, if you don&#8217;t push them down to the database) on a separate cluster from the one that runs the database software. Vertica asserts that its customers and prospects all want to do this.  Right here is <strong>the big difference between Vertica&#8217;s MapReduce integration and <a href="../2008/09/05/three-different-implementations-of-mapreduce/">Aster&#8217;s or Greenplum&#8217;s</a>. </strong><span> (Aster would also say that Vertica&#8217;s weaker MapReduce/SQL programming integration is a big difference as well.)</span></li>
<li>Indeed, Vertica lets you Reduce into a different DBMS than Vertica, if you choose.</li>
<li>Vertica gives you flexibility on the size of the Map and Reduce clusters. Omer agreed with me when I said there were some limits on how fast one can add or subtract nodes in a Vertica grid, because there&#8217;s data redistribution involved. But one can add/change/delete Hadoop clusters extremely quickly.</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;">Apparently, the use cases for Vertica/Hadoop integration to date lie in algorithmic trading and two kinds of web analytics. Specifically:<span id="more-858"></span></p>
<ul>
<li>One or more Vertica customers are using MapReduce in production to do relatively simple transforms of web log data.</li>
<li>Vertica customers are experimenting with &#8212; but have not yet put into production &#8212; more sophisticated pattern analysis of web log data.</li>
<li>Financial services customers are using MapReduce for a lot of <strong>experimentation in discovering new algorithms.</strong> The idea is that DBMS/MapReduce integration offers rapid prototyping of algorithmic ideas. Those that pan out are then reimplemented for production, presumably in some kind of <a href="http://www.dbms2.com/category/memory-centric-data-management/event-stream-processing/" >CEP (Complex Event Processing)</a> system. These users seem to be ones that are pushing down a lot of Map functions to the Vertica DBMS.</li>
</ul>
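<p style="margin-bottom: 0in;">For a sense of what a &#8220;relatively simple transform of web log data&#8221; looks like in MapReduce terms, here is a toy sketch in plain Python. It does not use the Hadoop or Vertica APIs; the log format, function names, and sample lines are all invented for illustration, and the whole job runs in a single process.</p>

```python
from collections import defaultdict


def map_log_line(line):
    """Map step: emit (url, 1) for one hypothetical 'ip url status' log line."""
    _ip, url, _status = line.split()
    yield url, 1


def reduce_counts(url, counts):
    """Reduce step: sum the hits emitted for a single URL."""
    return url, sum(counts)


def run_job(lines):
    """Single-process stand-in for the shuffle: group map output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_log_line(line):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in sorted(grouped.items()))


# Made-up sample log lines.
log_lines = [
    "10.0.0.1 /home 200",
    "10.0.0.2 /cart 200",
    "10.0.0.1 /home 304",
]
hits_per_url = run_job(log_lines)
print(hits_per_url)
```

<p style="margin-bottom: 0in;">In the usage model Vertica describes, output like this would then be loaded into the DBMS for querying, and the Map step is the part that could instead be pushed down to the database.</p>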
<p style="margin-bottom: 0in;">By the way, Vertica is based on C-Store, the Ph.D. thesis project of Daniel Abadi, who recently <a href="http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html?showComment=1248302563267#c4299748243209968660" onclick="javascript:pageTracker._trackPageview('/dbmsmusings.blogspot.com');">wrote</a>:</p>
<blockquote>
<p style="margin-bottom: 0in;">To me, it is far more efficient from a performance and a &#8220;green&#8221; perspective to push the computation to the data. Hence, I am not a fan of decoupling the compute grid and the data grid.</p>
</blockquote>
<p style="margin-bottom: 0in; font-style: normal;">Not coincidentally, Daniel also recently <a href="http://dbmsmusings.blogspot.com/2009/07/watch-out-for-vectorwise.html" onclick="javascript:pageTracker._trackPageview('/dbmsmusings.blogspot.com');">wrote</a> that</p>
<blockquote>
<p style="margin-bottom: 0in; font-style: normal;">If the VectorWise/Ingres solution does get released open source, I believe they will be an excellent column-store storage engine for <a href="http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html" onclick="javascript:pageTracker._trackPageview('/dbmsmusings.blogspot.com');">HadoopDB</a>. I have already requested an academic preview edition of their software to play with.</p>
</blockquote>
<p style="margin-bottom: 0in; font-style: normal;">The <a href="http://www.dbms2.com/2009/08/04/vectorwise-ingres-and-monetdb/" >VectorWise</a> guys also told me they are looking forward to seeing how the two projects work together.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/08/04/verticas-version-of-mapreduce-integration/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
