<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; VectorWise</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/vectorwise/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:21:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Highlights of a busy news week</title>
		<link>http://www.dbms2.com/2011/09/26/highlights-of-a-busy-news-week/</link>
		<comments>http://www.dbms2.com/2011/09/26/highlights-of-a-busy-news-week/#comments</comments>
		<pubDate>Mon, 26 Sep 2011 05:50:35 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5372</guid>
		<description><![CDATA[I put up 14 posts over the past week, so perhaps you haven&#8217;t had a chance yet to read them all. Highlights included: My most important post of the week was a general guide to IT vendor strategy. That one has already spawned discussion at many companies, from the tiny to the multi-billion-dollar. The best [...]]]></description>
			<content:encoded><![CDATA[<p>I put up 14 posts over the past week, so perhaps you haven&#8217;t had a chance yet to read them all. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Highlights included:</p>
<ul>
<li>My most important post of the week was a general <a href="http://www.strategicmessaging.com/strategy-for-it-vendors-a-worksheet/2011/09/18/">guide to IT vendor strategy</a>. That one has already spawned discussion at many companies, from the tiny to the multi-billion-dollar.</li>
<li>The best comment thread of the week was probably on my post about <a href="http://www.dbms2.com/2011/09/19/oltp-disk-solid-state/">scale-out relational OLTP choices</a>, in which people discussed the merits of various particular alternatives.</li>
<li>I recommended that people strongly consider attending <a href="http://www.dbms2.com/2011/09/20/xldb-the-one-conference-i-like-to-go-to/">XLDB 5 in Menlo Park on October 18-19</a>.</li>
</ul>
<p>Most of the posts, however, were reactions to news events. In particular:</p>
<ul>
<li>Teradata announced that <a href="http://www.dbms2.com/2011/09/22/teradata-columnar-compression/">Teradata 14 will be hybrid-columnar</a>, more in Vertica&#8217;s way than in Greenplum&#8217;s or Aster Data&#8217;s. (Pay no attention to the <em>Wall Street Journal&#8217;s</em> apparent belief that <a href="http://www.dbms2.com/2011/09/22/hybrid-columnar-soundbites/">no other analytic DBMS is hybrid-columnar at all</a>.)</li>
<li>Aster announced the unsurprising news that there will be a Teradata Aster appliance. Also, <a href="http://www.dbms2.com/2011/09/22/aster-database-release-5-and-teradata-aster-appliance/">Aster talked about greater analytic flexibility in the forthcoming Aster 5.0</a>.</li>
<li>With Oracle OpenWorld coming up, Oracle decided to get some of its announcing out of the way early. In particular, it announced the <a href="http://www.dbms2.com/2011/09/21/oracle-database-appliance-soundbites/">Oracle Database Appliance</a>, which is small-business-friendly hardware for running the Oracle DBMS. However, the Oracle Database Appliance doesn&#8217;t seem to do much about the complexity of running the Oracle DBMS software.</li>
<li>In <a href="http://www.dbms2.com/2011/09/23/hadoop-appliances/">a catch-all Hadoop post</a>, I noted that:
<ul>
<li>Oracle has now clearly said it has a Hadoop appliance coming, no doubt next week at OpenWorld.</li>
<li>I still can&#8217;t see why Hadoop appliances would succeed, but a lot of smart folks seem to disagree with me.</li>
<li>Greenplum announced what looks like a nice but unimportant little product upgrade.</li>
<li>It&#8217;s a really good thing that previously reported plans to revamp Hadoop are underway.</li>
</ul>
</li>
<li>DataStax announced that <a href="http://www.dbms2.com/2011/09/22/datastax-pivots-back-to-its-original-strategy/">it really is a Cassandra company after all</a>. Pay no attention to previous marketing that seemed to put DataStax in the same Hadoop-alternative category as, say, MapR.</li>
<li><a href="../2011/09/25/ingres-actian/">Ingres has changed its name to Actian</a>. The announcement seems like a confession that Ingres and VectorWise are going nowhere.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/26/highlights-of-a-busy-news-week/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ingres deemphasized, company now named Actian</title>
		<link>http://www.dbms2.com/2011/09/25/ingres-actian/</link>
		<comments>http://www.dbms2.com/2011/09/25/ingres-actian/#comments</comments>
		<pubDate>Sun, 25 Sep 2011 11:48:18 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5361</guid>
		<description><![CDATA[Ingres, the company, is: Changing its name to Actian. Deemphasizing Ingres, the product. Emphasizing a set of products that don&#8217;t exist yet (or at least aren&#8217;t shipping), namely lightweight mobile apps that are business-intelligence-plus-an-action, and technology for building them. These are called &#8220;Action Apps&#8221;, and are discussed on the Actian company blog. Positioning all this [...]]]></description>
			<content:encoded><![CDATA[<p>Ingres, the company, is:</p>
<ul>
<li>Changing its name to Actian.</li>
<li>Deemphasizing Ingres, the product.</li>
<li>Emphasizing a set of products that don&#8217;t exist yet (or at least aren&#8217;t shipping), namely lightweight mobile apps that are business-intelligence-plus-an-action, and technology for building them. These are called &#8220;Action Apps&#8221;, and are discussed on the <a href="http://blogs.actian.com/">Actian company blog</a>.</li>
<li>Positioning all this as something to do with &#8220;big data&#8221; (<a href="http://www.dbms2.com/2011/09/11/big-data-has-jumped-the-shark/">what a shock</a>).</li>
</ul>
<p>It turns out that Actian was the name of an ancient athletic competition commemorating Augustus&#8217; defeat of Antony at Actium, a battle that was more recently memorialized in the movie Cleopatra. Frankly, I think Cleopatra Software might have been a more interesting company name, although that could mean execs would have to arrive at sales calls rolled up in a carpet.</p>
<p><span id="more-5361"></span>One <a href="http://www.v3.co.uk/v3-uk/news/2111814/ingres-rebrands-actian-push">article</a> said:</p>
<blockquote><p>Greg Wood, chief financial officer for Actian, told <em>V3</em> that while the firm would continue to develop and maintain the Ingres  database platform, its would be placing the spotlight on its Cloud  Action Platform and its line of Action Apps.</p>
<p>&#8220;The Ingres database is well-recognised  and we will continue to support it, but at the same time that brand was  more associated with an older-generation technology,&#8221; Wood said.</p>
<p>&#8220;We think Actian better reflects where we are going as a company, particularly the application strategy.&#8221;</p>
<p>Wood explained that the platform would  look to expand on the emerging field of big data applications by adding  functionality for end users. The small, specialised applications would  link up with data analytics tools, providing alerts and actions when  various conditions are spotted within a database.</p></blockquote>
<p>So what about VectorWise? Notwithstanding Actian&#8217;s stated focus on &#8220;big data&#8221;, I think VectorWise&#8217;s chances for market success are slim.* Reasons include:</p>
<ul>
<li>The market for shared-disk columnar analytic DBMS is crowded (Sybase IQ, Infobright, SAND). Those vendors also have to compete with MPP columnar analytic DBMS offerings from Vertica and ParAccel.</li>
<li>I&#8217;ve never heard anything to make me believe VectorWise is getting significant market traction.</li>
<li>Indeed, Daniel Abadi&#8217;s well-known flirtation with the idea of using VectorWise in HadoopDB/Hadapt excepted, I don&#8217;t recall any marketplace mention of VectorWise at all.</li>
</ul>
<p><em>*The possibility of some kind of Action App synergy leads me to elevate them to &#8220;slim&#8221; from &#8220;none&#8221;.</em></p>
<p>The Action App idea actually sounds cool, but it&#8217;s quite a change from Ingres&#8217; previous positioning and technology, and I have no basis for judging it as likely to succeed. On the other hand, companies have occasionally made successful transitions into business intelligence from relatively unrelated businesses before, most notably Cognos in the mid-1990s.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/25/ingres-actian/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Hadapt update</title>
		<link>http://www.dbms2.com/2011/07/06/hadapt-update/</link>
		<comments>http://www.dbms2.com/2011/07/06/hadapt-update/#comments</comments>
		<pubDate>Wed, 06 Jul 2011 23:43:49 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4925</guid>
		<description><![CDATA[I met with the Hadapt guys today.  I think I can be a bit crisper than before in positioning Hadapt and its use cases, namely: Hadapt is additional software on a cluster that also runs fully functional Hadoop/HDFS. (Cloudera Hadoop more than straight-from-Apache Hadoop to date, but that&#8217;s not a requirement.) The cluster also runs [...]]]></description>
			<content:encoded><![CDATA[<p>I met with the Hadapt guys today.  I think I can be a bit crisper than before in positioning Hadapt and its use cases, namely:</p>
<ul>
<li>Hadapt is additional software on a cluster that also runs fully functional Hadoop/HDFS. (Cloudera Hadoop more than straight-from-Apache Hadoop to date, but that&#8217;s not a requirement.)</li>
<li>The cluster also runs a DBMS on every node, such as PostgreSQL or one of Infobright/VectorWise.</li>
<li>Hadapt&#8217;s software manages parallel SQL queries by distributing them to the DBMS living on each node. Hadapt says that the resulting query performance far outshines Hive&#8217;s.</li>
<li>Hadapt further says that, by exploiting the partner DBMS, its SQL functionality outpaces Hive&#8217;s as well.</li>
<li>Target Hadapt use cases are centered around keeping <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated</a> or other <a href="http://www.dbms2.com/2011/05/17/poly-structured-database/">poly-structured</a> data in Hadoop, and extracting, enhancing, or otherwise deriving some of it to live in the relational store.</li>
<li>In particular, Hadapt seems like an interesting choice when you want to use that relational data as you work on other data that&#8217;s still in HDFS, or if you want to keep using the relational data in other kinds of MapReduce jobs.</li>
<li>That all fits well with my thoughts about the importance of <a href="http://www.dbms2.com/2011/05/30/another-category-of-derived-data/">derived data</a>.</li>
</ul>
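<p>To make the &#8220;distribute the query to the DBMS on each node&#8221; idea concrete, here is a minimal Python sketch. It is not Hadapt&#8217;s code and says nothing about how Hadapt actually plans queries. In-memory SQLite databases stand in for the per-node PostgreSQL or VectorWise instances, and the <code>events</code> table and the function names <code>make_node</code>, <code>run_on_node</code> and <code>parallel_sum_by_host</code> are all made up for illustration.</p>
<pre>
```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_node(rows):
    """Create one stand-in "node": an in-memory SQLite DB holding a data partition."""
    # check_same_thread=False lets a worker thread use a connection made here.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE events (host TEXT, bytes INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn


def run_on_node(conn, sql):
    """Run the pushed-down SQL on a single node and return its partial result."""
    return conn.execute(sql).fetchall()


def parallel_sum_by_host(nodes):
    """Scatter a GROUP BY to every node, then merge the partial sums."""
    local_sql = "SELECT host, SUM(bytes) FROM events GROUP BY host"
    # Scatter: each node aggregates its own partition, one thread per node.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda conn: run_on_node(conn, local_sql), nodes))
    # Gather: SUM is decomposable, so adding the partial sums gives the total.
    totals = {}
    for partial in partials:
        for host, subtotal in partial:
            totals[host] = totals.get(host, 0) + subtotal
    return totals
```
</pre>
<p>The point of the design is that each node&#8217;s DBMS does the heavy lifting on local data, and only small partial results travel to the coordinator. In a real deployment the merge step would presumably be a MapReduce job rather than a Python loop.</p>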
<p>Other evolution from <a href="http://www.dbms2.com/2011/03/23/hadapt-commercialized-hadoopdb/">what I wrote about Hadapt a few months ago</a> includes:</p>
<ul>
<li>Hadapt is in beta now.</li>
<li>Hadapt has added adult supervision in the form of <a href="http://www.hadapt.com/wickline-announcement/">Philip Wickline</a>, late of Endeca.</li>
</ul>
<p>In other news, Hadapt is our newest client.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/06/hadapt-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hadapt (commercialized HadoopDB)</title>
		<link>http://www.dbms2.com/2011/03/23/hadapt-commercialized-hadoopdb/</link>
		<comments>http://www.dbms2.com/2011/03/23/hadapt-commercialized-hadoopdb/#comments</comments>
		<pubDate>Wed, 23 Mar 2011 12:35:52 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4042</guid>
		<description><![CDATA[The HadoopDB company Hadapt is finally launching, based on the HadoopDB project, albeit with code rewritten from scratch. As you may recall, the core idea of HadoopDB is to put a DBMS on every node, and use MapReduce to talk to the whole database. The idea is to get the same SQL/MapReduce integration as you [...]]]></description>
			<content:encoded><![CDATA[<p>The HadoopDB company Hadapt is finally launching, based on <a href="../../../../../2009/09/13/hadoopdb/">the HadoopDB project</a>, albeit with code rewritten from scratch. As you may recall, the core idea of HadoopDB is to put a DBMS on every node, and use MapReduce to talk to the whole database. The idea is to get the same SQL/MapReduce integration as you get if you use Hive, but with much better performance* and perhaps somewhat better SQL functionality.** Advantages vs. a DBMS-based <a href="../../../../../2011/02/24/analytic-platforms/">analytic platform</a> that includes MapReduce &#8212; e.g. Aster Data &#8212; are less clear.  <span id="more-4042"></span></p>
<p><em>*At least if the underlying DBMS is a fast one. Hadapt likes <a href="../../../../../2010/06/11/ingres-vectorwise-technical-highlights/">VectorWise</a> for that purpose, and <a href="http://gigaom.com/cloud/making-hadoop-work-in-more-places-with-hadapt/">is showing performance comparisons</a> that assume VectorWise is underneath.</em></p>
<p><em>**It seems that Hadapt in the future is assured of having more SQL coverage than Hive does today.</em></p>
<p>It&#8217;s still early days for the Hadapt company. Funding is on the angel level. There seem to be six employees &#8212; Yale professor Daniel Abadi, CEO Justin Borgman, Chief Scientist Kamil Bajda-Pawlikowski,* and three other coders. The Hadapt product will go into beta at an unspecified future time; there currently are a couple of alpha users/design partners. The Hadapt company, a Yale spin-off, obviously needs to move from Connecticut soon. I wasn&#8217;t able to detect any particular outside experience in the form of directors or advisors. And <a href="http://www.strategicmessaging.com/public-and-analyst-relations-an-example-of-epic-fail/2011/03/22/">Hadapt&#8217;s marketing efforts are still somewhat ragged</a>. So basically, the reasons for believing in Hadapt pretty much boil down to:</p>
<ul>
<li>Daniel Abadi is a star.**</li>
<li>Hadapt&#8217;s own tests show that Hadapt is a whole lot faster than Hive.</li>
</ul>
<p><em>*Bajda-Pawlikowski is one of the two Abadi students who did the HadoopDB work. It turns out he had numerous years of coding experience before entering graduate school. (The other student, Azza Abouzeid, is pursuing an academic career.)</em></p>
<p><em>**Vertica was built around Daniel&#8217;s C-Store Ph.D. thesis. He was involved in <a href="../../../../../2008/02/19/h-store-architecture/">H-Store</a> as well. He has <a href="http://dbmsmusings.blogspot.com/">a really good blog</a>. He&#8217;s a really nice guy. Etc.</em></p>
<p>As you might have guessed from the name, the Hadapt guys are proud that their technology is &#8220;adaptive,&#8221; which communicates their fond belief that Hadapt&#8217;s query optimization and planning are more modern and cool than other folks&#8217; query planning and optimization. In particular, Daniel suggested that Hadapt is more thoughtful than most DBMS are about looking at the size of intermediate result sets and  then replanning queries accordingly.</p>
<p>However, the really cool adaptivity point is that Hadapt watches the performance of individual nodes, and takes that into account in query replanning. Daniel asserts, credibly, that this is a Really Good Feature to have in cloud and/or virtualized environments, where Hadapt might not have full control and use of its nodes. I&#8217;d add that it could also give Hadapt a lot of flexibility to be run on clusters of non-identical machines.</p>
<p>On the negative side, Hadapt will not at first have any awareness of how its underlying DBMS are optimized; it will plan for VectorWise the same way it does for PostgreSQL. In that regard, this is a DATAllegro 1.0 story. If I understood correctly, Hadapt has specific connectors for a couple of DBMS (probably exactly those two), and can also talk JDBC to anything. PostgreSQL was apparently 5X faster than MySQL when tested (with either MyISAM or InnoDB); Daniel snorted about, for example, MySQL&#8217;s apparent fondness for nested-loop joins over hybrid hash joins. On the other hand, he was more circumspect about his reasons for favoring VectorWise over, to name another open source columnar DBMS, Infobright.</p>
<p>And finally, a couple of other points:</p>
<ul>
<li>Hadapt will be closed source, although it will of course rely on large amounts of other people&#8217;s open source software. Pay no attention to the importance Daniel previously ascribed to HadoopDB&#8217;s open source nature.</li>
<li>Hadapt decompresses data before moving it from node to node, and also before doing non-SQL MapReduce operations on it. Pay no attention to the years Daniel spent insisting columnar DBMS absolutely must operate on data in compressed form.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/03/23/hadapt-commercialized-hadoopdb/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Ingres VectorWise technical highlights</title>
		<link>http://www.dbms2.com/2010/06/11/ingres-vectorwise-technical-highlights/</link>
		<comments>http://www.dbms2.com/2010/06/11/ingres-vectorwise-technical-highlights/#comments</comments>
		<pubDate>Fri, 11 Jun 2010 11:28:18 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2261</guid>
		<description><![CDATA[After working through problems w/ travel, cell phones, and so on, Peter Boncz of VectorWise finally caught up with me for a regrettably brief call. Peter gave me the strong impression that what I&#8217;d written in the past about VectorWise had been and remained accurate, so I focused on filling in the gaps. Highlights included:  [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">After working through problems w/ travel, cell phones, and so on, Peter Boncz of VectorWise finally caught up with me for a regrettably brief call. Peter gave me the strong impression that what <a href="http://www.dbms2.com/2009/08/04/vectorwise-ingres-and-monetdb/">I&#8217;d written in the past about VectorWise</a> had been and remained accurate, so I focused on filling in the gaps. Highlights included:  <span id="more-2261"></span></p>
<ul>
<li>VectorWise is indeed a shared-everything analytic DBMS.</li>
<li>The VectorWise front-end is Ingres. Ingres VectorWise supports almost all SQL that Ingres does (there are a few edge-case exceptions).</li>
<li>Conversely, Ingres VectorWise doesn&#8217;t support any SQL Ingres doesn&#8217;t, most notably SQL-99 Analytics. Naturally, SQL-99 Analytics is a roadmap item for Ingres/VectorWise.</li>
<li>Ingres VectorWise 1.0 is pretty purely columnar. There&#8217;s a bit of <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/">PAX</a>, but it&#8217;s mainly automagic/under the covers. The one user-controlled exception I understood was that one can ensure that composite keys are stored together.</li>
<li>The main Ingres VectorWise performance secret sauce ingredients we touched on were:
<ul>
<li>Vectorization of operations (hence VectorWise&#8217;s name).</li>
<li>Compression that is tuned for speed rather than to minimize storage utilization.</li>
</ul>
</li>
<li>We unfortunately didn&#8217;t have time to revisit the other big part of the Ingres VectorWise performance story, namely clever design for modern microprocessor architectures. High-level generalities about that do pervade <a href="http://www.dbms2.com/2010/06/10/vectorwise-press-release/">the Ingres VectorWise press release</a>, but – well, they&#8217;re very high level.</li>
<li>Unlike Vertica but like most other columnar DBMS vendors, Ingres VectorWise wants you to store your data once. You can index-organize the data. You can also organize multiple tables in the same order, to make joins among them fast.</li>
<li>Support for actual join indexes is an Ingres VectorWise roadmap item.</li>
<li>As do ever more analytic DBMS, Ingres VectorWise has something akin to <a href="http://www.dbms2.com/2006/09/20/netezza-vs-conventional-data-warehousing-rdbms/">Netezza zone maps</a>.</li>
<li>When I asked Peter what had changed most from the initial VectorWise development plan, other than the above, he basically said that their performance priorities had shifted a bit. Specifically, he said:
<ul>
<li>They had originally been “blinded” (his word) by the TPC-H benchmark, but figured out that they were overly focused on it. (<a href="http://www.dbms2.com/2009/06/22/the-tpc-h-benchmark-is-a-blight-upon-the-industry/">Well, duh</a>.)</li>
<li>They learned about the importance of other things such as data loading speeds.</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/11/ingres-vectorwise-technical-highlights/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Fun with quotes in the VectorWise press release</title>
		<link>http://www.dbms2.com/2010/06/10/vectorwise-press-release/</link>
		<comments>http://www.dbms2.com/2010/06/10/vectorwise-press-release/#comments</comments>
		<pubDate>Thu, 10 Jun 2010 11:38:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Ingres]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2242</guid>
		<description><![CDATA[Ingres forgot to prebrief me on the VectorWise announcement, and despite valiant efforts hasn&#8217;t succeeded in connecting with me since they realized the lapse. Meanwhile, I took a look at the VectorWise press release, and found the quotes to be somewhat amusing. Daniel Abadi was quoted as saying: “VectorWise &#8230; is now proven to be [...]]]></description>
			<content:encoded><![CDATA[<p>Ingres forgot to prebrief me on the VectorWise announcement, and despite valiant efforts hasn&#8217;t succeeded in connecting with me since they realized the lapse. Meanwhile, I took a look at the <a href="http://www.pr-inside.com/ingres-vectorwise-delivers-business-analytics-r1936188.htm">VectorWise press release</a>, and found the quotes to be somewhat amusing.<br />
<span id="more-2242"></span></p>
<p>Daniel Abadi was quoted as saying:</p>
<blockquote><p>“VectorWise &#8230; is now proven to be a terrific fit as database technology underpinning large-scale HadoopDB applications.”</p></blockquote>
<p>notwithstanding that <a href="http://www.dbms2.com/2009/09/13/hadoopdb/">HadoopDB</a> has never been used for a production application, large-scale or otherwise. Unsurprisingly, Daniel is quite emphatic in noting he didn&#8217;t actually say that.*</p>
<p><em>*Daniel actually said &#8220;has proven,&#8221; which to him means &#8220;[not] officially proven, but proven in a more colloquial sense.&#8221;</em></p>
<p>Meanwhile, a quote from the Rohatyn Group starts:</p>
<blockquote><p>“For the past 20 years, we’ve been searching for the killer database &#8230;&#8221;</p></blockquote>
<p>notwithstanding that the Rohatyn Group hasn&#8217;t been around for even 10 years, let alone 20. (I infer this from the LinkedIn profile of a &#8220;founder&#8221; who&#8217;s been working there only since 2002.) However, Peter Boncz assures me via e-mail that this is a genuine quote even so, and speculates quite reasonably that this database quest started at some other entity.</p>
<p>Otherwise, the quotes are the usual stuff, with highlights including:</p>
<ul>
<li>At least some queries run much faster on VectorWise than they do on a non-analytic DBMS.</li>
<li>A couple of system integration partners are enthused about how happy their VectorWise customers will be if and when they ever have any.</li>
<li>Ingres and VectorWise executives think their product is really cool.</li>
</ul>
<p><em><strong>Related link</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2009/11/23/fabricated-press-release-quote/">A press release quote that REALLY went awry</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/10/vectorwise-press-release/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>PAX Analytica? Row- and column-stores begin to come together</title>
		<link>http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/</link>
		<comments>http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 10:40:34 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VectorWise]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=859</guid>
		<description><![CDATA[Column-store proponents are prone to argue, in effect, that the only reason to implement an analytic DBMS with row-based storage is laziness. Their case generally runs along the lines: Analytic queries commonly return only a fraction of all possible columns. Only returning the columns needed Saves I/O Saves cache space Reduces processing Facilitates compression Presumably [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Column-store proponents are prone to argue, in effect, that the only reason to implement an analytic DBMS with row-based storage is laziness.  Their case generally runs along the lines:</p>
<ul>
<li>Analytic queries commonly return only a fraction of all possible columns.</li>
<li>Only returning the columns needed
<ul>
<li>Saves I/O</li>
<li>Saves cache space</li>
<li>Reduces processing</li>
<li>Facilitates compression</li>
</ul>
</li>
<li>Presumably all those row-based MPP vendors just went row-based because they had a fine row-based DBMS (usually but not always PostgreSQL) to build on.</li>
</ul>
<p style="margin-bottom: 0in;">Pushbacks to this argument from row-based vendors include:</p>
<ul>
<li>Yes, but it&#8217;s harder to update a column store</li>
<li>Yes, but there are more steps to retrieving a bunch of columns than there are to retrieving the same information from row stores</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-859"></span>plus generous dollops of:</p>
<ul>
<li>We&#8217;re doing just fine, thank you</li>
<li>We&#8217;re not seeing column stores much in the marketplace</li>
<li>Don&#8217;t believe all that academic hype</li>
<li>Column stores reek of elderberries, and are powered by hamster wheels</li>
</ul>
<p style="margin-bottom: 0in;">(OK, I made that last one up, but I do hear the other claims frequently.)</p>
<p style="margin-bottom: 0in;">However, <strong>there are at least two ways in which row- and column-stores are beginning to come together.</strong> First, there are lots of rumors about <strong>row-store vendors bringing out column-store options,</strong> even beyond the recent <a href="../2009/08/04/vectorwise-ingres-and-monetdb/">Ingres/VectorWise announcement</a>. (But anything I may know about same beyond noticing the rumors fly by is surely under NDA.) Second, column-store vendors Vertica and VectorWise are bringing out a kind of <strong>row/column hybrid storage</strong> option.</p>
<p style="margin-bottom: 0in;"><a href="http://www.dbms2.com/2009/08/04/flexstore-and-the-rest-of-vertica-35/">Vertica 3.5</a> introduces what Vertica calls &#8220;FlexStore.&#8221; A key part of <strong>FlexStore</strong> is the ability to store data not just in pure columnar format, but also to group columns together in what amounts to sub-rows. This is advantageous when data is retrieved together and, I presume, when it is updated.  There&#8217;s a tradeoff in giving up column stores&#8217; compression advantages, however, and use of this feature is not recommended for columns that are frequently retrieved independently.  Vertica also notes that since it typically uses 1 megabyte block sizes, any table smaller than that shouldn&#8217;t be broken into columns at all.</p>
<p style="margin-bottom: 0in;">VectorWise, of course, doesn&#8217;t have a product right now, but has gotten a bunch of recent publicity around the column-store product it plans to ship via its partner Ingres in 2010.  When I asked Peter Boncz about row/column hybridization inside VectorWise (not federating between Ingres and VectorWise, but rather truly within VectorWise), he said one of the storage options was <strong>PAX,</strong> and pointed me at <a href="http://www.cs.wisc.edu/multifacet/papers/vldb01_pax.pdf">a 2001 paper</a> by a group of academics that includes the ubiquitous Dave DeWitt. <em>PAX</em> turns out to stand, in creative spelling, for <em>Partition Attributes Across.</em></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">The PAX idea is to store as many rows of data as can fit into a block, but within the block store them in columns.  This preserves some of the compression and cache-efficiency benefits of column stores, while also bringing back whole rows in a single step. (I think Vertica&#8217;s FlexStore does something similar to this, but I&#8217;m not sure.) </span></p>
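<p style="margin-bottom: 0in;">To make the PAX layout concrete, here is a minimal Python sketch. It is an illustration only, not code from the PAX paper or from VectorWise: the three-column schema, the function names, and the fixed <code>rows_per_page</code> count are all assumptions (a real page would be sized in bytes, not rows). Rows are split into pages, and inside each page the values are kept column by column.</p>

```python
# Hypothetical schema for illustration: (id, name, price).
COLUMNS = ("id", "name", "price")


def build_pax_pages(rows, rows_per_page):
    """Split rows into pages; inside each page keep one list per column."""
    pages = []
    for start in range(0, len(rows), rows_per_page):
        chunk = rows[start:start + rows_per_page]
        page = {name: [row[i] for row in chunk] for i, name in enumerate(COLUMNS)}
        pages.append(page)
    return pages


def read_row(pages, rows_per_page, row_number):
    """Rebuild one whole row by touching a single page."""
    page = pages[row_number // rows_per_page]
    slot = row_number % rows_per_page
    return tuple(page[name][slot] for name in COLUMNS)


def scan_column(pages, name):
    """Read one attribute from every page without touching the other columns."""
    return [value for page in pages for value in page[name]]


rows = [(1, "ant", 2.5), (2, "bee", 4.0), (3, "cat", 1.0), (4, "dog", 3.5), (5, "eel", 9.0)]
pages = build_pax_pages(rows, rows_per_page=2)
```

<p style="margin-bottom: 0in;">With this layout, <code>read_row(pages, 2, 3)</code> needs only the second page to return the fourth row, while <code>scan_column(pages, "price")</code> reads a contiguous run of prices from each page. Those are the two properties the paragraph above describes.</p>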
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Further confusing things, Peter Boncz of VectorWise told me <strong>VectorWise can support &#8220;any hybrid&#8221; of columnar storage and PAX.</strong></span></p>
<p style="margin-bottom: 0in;"><strong><span style="font-style: normal;">Bottom line: The distinction between row- and column-stores isn&#8217;t going to go away any time soon, but it is at least beginning to blur a bit.</span></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Vertica&#8217;s version of MapReduce integration</title>
		<link>http://www.dbms2.com/2009/08/04/verticas-version-of-mapreduce-integration/</link>
		<comments>http://www.dbms2.com/2009/08/04/verticas-version-of-mapreduce-integration/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 10:29:14 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VectorWise]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=858</guid>
		<description><![CDATA[I talked with Omer Trajman of Vertica Monday night about Vertica&#8217;s MapReduce integration, part of its Vertica 3.5 release. Highlights included: By &#8220;integrating Vertica and MapReduce,&#8221; Vertica means &#8220;integrating Vertica and Hadoop.&#8221; Vertica&#8217;s Hadoop integration is based on Cloudera&#8217;s DBInputFormat. Omer called out for me several features of Vertica&#8217;s Hadoop integration that didn&#8217;t just come [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I talked with Omer Trajman of Vertica Monday night about Vertica&#8217;s MapReduce integration, part of its <a href="http://www.dbms2.com/2009/08/04/flexstore-and-the-rest-of-vertica-35/">Vertica 3.5 release</a>.  Highlights included:</p>
<ul>
<li>By &#8220;integrating Vertica and 	MapReduce,&#8221; Vertica means &#8220;integrating Vertica and 	Hadoop.&#8221;</li>
<li>Vertica&#8217;s Hadoop integration is 	based on <a href="http://www.cloudera.com/blog/2009/03/06/database-access-with-hadoop/">Cloudera&#8217;s 	DBInputFormat.</a></li>
<li>Omer called out for me several 	features of Vertica&#8217;s Hadoop integration that didn&#8217;t just come from 	Cloudera, namely:
<ul>
<li>Cloudera&#8217;s DBInputFormat assumes 	the database runs on a single computer, or a single head node of an 	MPP system. Vertica&#8217;s technology, however, runs on peer parallel 	nodes with no head, and so Vertica adapted the DBInputFormat 	technology accordingly.</li>
<li>Vertica lets you push down Map functions to the database. Omer reports a roughly even division among users and prospects between those who want to do this and those who don&#8217;t.</li>
<li>Vertica lets you do Reduce functions (or Map functions, if you don&#8217;t push them down to the database) on a separate cluster from the one on which you run the database software. Vertica asserts that its customers and prospects all want to do this.  Right here is <strong>the big difference between Vertica&#8217;s MapReduce integration and <a href="../2008/09/05/three-different-implementations-of-mapreduce/">Aster&#8217;s or Greenplum&#8217;s</a>. </strong><span> (Aster would also say that Vertica&#8217;s weaker MapReduce/SQL programming integration is a big difference as well.)</span></li>
<li>Indeed, Vertica lets you Reduce 	into a different DBMS than Vertica, if you choose.</li>
<li>Vertica gives you flexibility on 	the size of the Map and Reduce clusters. Omer agreed with me when I 	said there were some limits on how fast one can add or subtract 	nodes in a Vertica grid, because there&#8217;s data redistribution 	involved. But one can add/change/delete Hadoop clusters extremely 	quickly.</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;">Apparently, the use cases for Vertica/Hadoop integration to date lie in algorithmic trading and two kinds of web analytics. Specifically:<span id="more-858"></span></p>
<ul>
<li>One or more Vertica customers are using MapReduce in production to do relatively simple transforms of web log data.</li>
<li>Vertica customers are 	experimenting with &#8212; but have not yet put into production &#8212; more 	sophisticated pattern analysis of web log data.</li>
<li>Financial services customers are 	using MapReduce for a lot of <strong>experimentation in discovering new 	algorithms.</strong> The idea is that DBMS/MapReduce integration offers 	rapid prototyping of algorithmic ideas. Those that pan out are then 	reimplemented for production, presumably in some kind of <a href="http://www.dbms2.com/category/memory-centric-data-management/event-stream-processing/">CEP (Complex Event Processing)</a> system. 	 These users seem to be ones that are pushing down a lot of Map 	functions to the Vertica DBMS.</li>
</ul>
<p style="margin-bottom: 0in;">By the way, Vertica is based on C-Store, the Ph.D. thesis project of Daniel Abadi, who recently <a href="http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html?showComment=1248302563267#c4299748243209968660">wrote</a>:</p>
<blockquote>
<p style="margin-bottom: 0in;">To me, it is far more efficient from a performance and a &#8220;green&#8221; perspective to push the computation to the data. Hence, I am not a fan of decoupling the compute grid and the data grid.</p>
</blockquote>
<p style="margin-bottom: 0in; font-style: normal;">Not coincidentally, Daniel also recently <a href="http://dbmsmusings.blogspot.com/2009/07/watch-out-for-vectorwise.html">wrote</a> that</p>
<blockquote>
<p style="margin-bottom: 0in; font-style: normal;">If the VectorWise/Ingres solution does get released open source, I believe they will be an excellent column-store storage engine for <a href="http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html">HadoopDB</a>. I have already requested an academic preview edition of their software to play with.</p>
</blockquote>
<p style="margin-bottom: 0in; font-style: normal;">The <a href="http://www.dbms2.com/2009/08/04/vectorwise-ingres-and-monetdb/">VectorWise</a> guys also told me they are looking forward to seeing how the two projects work together.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/08/04/verticas-version-of-mapreduce-integration/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>VectorWise, Ingres, and MonetDB</title>
		<link>http://www.dbms2.com/2009/08/04/vectorwise-ingres-and-monetdb/</link>
		<comments>http://www.dbms2.com/2009/08/04/vectorwise-ingres-and-monetdb/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 10:14:34 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[MonetDB]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=857</guid>
		<description><![CDATA[I talked with Peter Boncz and Marcin Zukowski of VectorWise last Wednesday, but didn&#8217;t get around to writing about VectorWise immediately. Since then, VectorWise and its partner Ingres have gotten considerable coverage, especially from an enthusiastic Daniel Abadi. Basic facts that you may already know include: VectorWise, the product, will be an open-source columnar analytic [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I talked with Peter Boncz and Marcin Zukowski of VectorWise last Wednesday, but didn&#8217;t get around to writing about VectorWise immediately. Since then, VectorWise and its partner Ingres have gotten considerable coverage, especially from an enthusiastic <a href="http://dbmsmusings.blogspot.com/2009/07/watch-out-for-vectorwise.html">Daniel Abadi</a>.  Basic facts that you may already know include:</p>
<ul>
<li>VectorWise, the product, will be 	an <strong>open-source</strong> columnar analytic DBMS. (But that&#8217;s not quite 	true. Pending productization, it&#8217;s more accurate to call the 	VectorWise technology a <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/"><em><strong>row/column hybrid</strong></em></a><em>.</em>)</li>
<li>VectorWise is due to be introduced 	in <strong>2010. </strong><span>(Peter Boncz said 	that to me more clearly than I&#8217;ve seen in other coverage.)</span></li>
<li>VectorWise and <strong>Ingres</strong> have 	a deal in which Ingres will at least be the exclusive seller of the 	VectorWise technology, and hopefully will buy the whole company.</li>
<li>Notwithstanding that it was once 	named something like &#8220;MonetDB,&#8221; VectorWise actually is <strong>not 	the same thing as MonetDB,</strong> another open source columnar analytic 	DBMS from the same research group.</li>
<li>The MonetDB and VectorWise research groups consist in large part of <strong>academics in Holland,</strong> specifically at CWI (<span style="font-style: normal;">Centrum voor Wiskunde en Informatica</span>). But Ingres has a research group working on the project too. (Right now there are about seven &#8220;highly experienced&#8221; people each on the VectorWise and Ingres sides, although at least the VectorWise folks aren&#8217;t all full-time. More are being added.)</li>
<li>Ingres and VectorWise haven&#8217;t 	agreed exactly how VectorWise and Ingres Classic will play together 	in the Ingres product line. (All of the obvious possibilities are 	still on the table.)</li>
<li>VectorWise is shared-everything, 	just as Ingres is. But plans &#8212; still tentative &#8212; are afoot to 	integrate VectorWise with MapReduce in Daniel Abadi&#8217;s 	<a href="http://www.dbms2.com/2009/09/13/hadoopdb/">HadoopDB</a> project.</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-857"></span>The MonetDB project is led by Martin Kersten, with whom I chatted at SIGMOD in June (standing up and not taking notes, so I may have some details wrong). Based on that conversation, my VectorWise call, and other data, I get the impression that:</p>
<ul>
<li>Martin has been researching 	analytic DBMS (mainly but not only relational) since the late 1970s, 	and has been based at CWI since 1985.</li>
<li>Peter Boncz has been either second 	in command of that crew or close to it.</li>
<li>Martin Kersten, Peter Boncz, and 	the CWI/MonetDB team in general have gotten all sorts of computer 	science glory for their work.</li>
<li>Martin has enjoyed generously stable government research funding for his group, but has found commercialization of the technology more difficult than he might at, say, Stanford.  The figure of 15 MonetDB researchers comes to mind, although I see from Martin&#8217;s bio that he oversees a team of ~55 in total.</li>
<li>One early attempt at 	commercializing MonetDB turned into a company called Data 	Distilleries that was sold to SPSS. Peter Boncz was chief architect 	of Data Distilleries.</li>
<li>Besides VectorWise, there are at 	least two other recent spin-off companies from the MonetDB project. 	One is a zero-headcount shell, set up to facilitate MonetDB project 	members (and others) consulting to users of the open source MonetDB 	technology. The other is in stealth mode, focusing on some vertical 	market.</li>
</ul>
<p style="margin-bottom: 0in;">I further get the impression that VectorWise was actually Marcin Zukowski&#8217;s <span style="text-decoration: line-through;">Master&#8217;s</span> Ph.D. project, with Peter Boncz being his advisor. VectorWise also boasts another Peter Boncz student, who wrote about updating column stores.</p>
<p style="margin-bottom: 0in;">As one might expect from the name, VectorWise does <strong>vector processing.</strong> I.e., the hard part of Marcin&#8217;s work was developing vectorized algorithms for one SQL operation after another.  Vectorization, pipelining, and FPGAs might all seem to go together &#8212; <a href="../2009/07/27/xtremedata-announces-its-dbx-data-warehouse-appliance/">XtremeData certainly seems to think so</a> &#8212; but the VectorWise folks preferred to develop for Intel CPUs anyway, for pretty much the usual reasons.  Another major theme is trying to get the right things into CPU cache, because in their opinion RAM cache is just sooooo painfully slow.</p>
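<p style="margin-bottom: 0in;">For readers who have not seen vector-at-a-time execution, the following Python sketch contrasts it with the classic tuple-at-a-time loop. It is a generic illustration of the idea, not VectorWise code: the function names, the tiny <code>VECTOR_SIZE</code>, and the sample query (sum of price times quantity for rows above a minimum price) are all assumptions. Python will not show the speed benefit, which on real hardware comes from tight loops over cache-resident arrays.</p>

```python
VECTOR_SIZE = 4  # hypothetical; a real engine would size vectors to fit CPU cache


def vectors(column, size=VECTOR_SIZE):
    """Yield successive fixed-size slices of a column."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def select_greater_than(vector, threshold):
    """Primitive: positions in the vector whose value exceeds the threshold."""
    return [i for i, value in enumerate(vector) if value > threshold]


def multiply_selected(left, right, selected):
    """Primitive: element-wise product, computed only at the selected positions."""
    return [left[i] * right[i] for i in selected]


def revenue_vectorized(price, quantity, min_price):
    """Each operator handles a whole vector before the next operator runs."""
    total = 0.0
    for price_vec, quantity_vec in zip(vectors(price), vectors(quantity)):
        selected = select_greater_than(price_vec, min_price)
        total += sum(multiply_selected(price_vec, quantity_vec, selected))
    return total


def revenue_tuple_at_a_time(price, quantity, min_price):
    """Baseline: filter and multiply one row at a time."""
    total = 0.0
    for p, q in zip(price, quantity):
        if p > min_price:
            total += p * q
    return total


price = [1.0, 5.0, 3.0, 8.0, 2.0, 7.0]
quantity = [2, 1, 4, 1, 3, 2]
```

<p style="margin-bottom: 0in;">Both functions return the same answer for the sample data. The difference is that in the vectorized version each primitive is a simple loop over an array, which is the shape of work that had to be written for one SQL operation after another.</p>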
<p style="margin-bottom: 0in;">Our discussion of VectorWise&#8217;s <strong>compression</strong> was interesting. Highlights included:</p>
<ul>
<li>The design requirement is that 	decompression work at a rate of 3 gigabytes/second or so. That way 	the system is faster overall than if it operated at 1 	gigabyte/second on uncompressed data, which I gather is the 	alternative.</li>
<li>VectorWise takes 4-5 <span style="text-decoration: line-through;">steps</span> CPU cycles to 	decompress a tuple.</li>
<li>VectorWise says it sacrificed 	compression ratio to achieve speed. That said, VectorWise claims 	3-4X compression on TPC-H data, which is no worse than <a href="http://www.dbms2.com/2009/06/22/the-tpc-h-benchmark-is-a-blight-upon-the-industry/">what 	ParAccel reported</a>, and enjoys higher compression rates on other 	kinds of data.</li>
<li>VectorWise decompresses data 	before manipulating it, and claims that the advantages of operating 	on compressed data are only significant if &#8212; like Vertica but 	apparently unlike VectorWise &#8212; the database stores columns in 	multiple sort orders each.</li>
<li>VectorWise&#8217;s compression is mainly 	on numerical and numerical-like (e.g. date) datatypes. An exception 	is that VectorWise uses dictionary compression on string data when 	it makes sense to do so.</li>
</ul>
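<p style="margin-bottom: 0in;">As a reminder of what dictionary compression on string data looks like, here is a minimal Python sketch. It shows the general technique only and says nothing about VectorWise internals: the function names and the rule of thumb in <code>worth_encoding</code> (encode only if at most half the values are distinct) are assumptions made for illustration.</p>

```python
def dictionary_encode(values):
    """Replace each string with a small integer code into a dictionary."""
    dictionary = []   # code -> string
    code_of = {}      # string -> code
    codes = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Decompression is one array lookup per value."""
    return [dictionary[code] for code in codes]


def worth_encoding(values):
    """Hypothetical rule of thumb: encode only if at most half the values are distinct."""
    return len(values) >= 2 * len(set(values))


countries = ["NL", "US", "NL", "NL", "US", "DE"]
dictionary, codes = dictionary_encode(countries)
```

<p style="margin-bottom: 0in;">Decoding is a single lookup per value, which is the kind of cheap operation a design that puts decompression speed ahead of compression ratio would favor. A column of mostly unique strings gains nothing from the dictionary, hence the check in <code>worth_encoding</code>.</p>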
<p style="margin-bottom: 0in;">Other notes include:</p>
<ul>
<li>VectorWise has technology akin to 	Microsoft SQL Server&#8217;s Shared Scans, in which multiple queries that 	require similar table scans don&#8217;t have to repeat all the redundant 	scanning work. I need to get better at figuring out which other 	analytic DBMS do similar things.</li>
<li>While VectorWise hasn&#8217;t yet been 	open-sourced, its code is in the hands of some other academic 	institutions, used mainly for computer science research (as opposed 	to, say, as a data store for some kind of scientific experiment).</li>
<li>VectorWise&#8217;s scalability has only 	been tested up to eight cores.</li>
</ul>
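<p style="margin-bottom: 0in;">The shared-scan idea in the first note above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of how VectorWise or Microsoft SQL Server implements it: the <code>shared_scan</code> function, the representation of a query as an initial state plus a step function, and the sample table are all hypothetical.</p>

```python
def shared_scan(table, queries):
    """Scan the table once and feed every row to each registered query.

    queries maps a query name to (initial_state, step), where step takes
    the running state and a row and returns the new state.
    """
    results = {name: initial for name, (initial, _) in queries.items()}
    for row in table:  # the only pass over the data
        for name, (_, step) in queries.items():
            results[name] = step(results[name], row)
    return results


table = [
    {"region": "EU", "sales": 10},
    {"region": "US", "sales": 7},
    {"region": "EU", "sales": 5},
]
queries = {
    "total_sales": (0, lambda acc, row: acc + row["sales"]),
    "eu_rows": (0, lambda acc, row: acc + (1 if row["region"] == "EU" else 0)),
}
results = shared_scan(table, queries)
```

<p style="margin-bottom: 0in;">Two queries are answered with one pass over the table instead of two, which is the redundant scanning work the note refers to.</p>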
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/08/04/vectorwise-ingres-and-monetdb/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

