<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; MonetDB</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/monetdb/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:21:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>VectorWise, Ingres, and MonetDB</title>
		<link>http://www.dbms2.com/2009/08/04/vectorwise-ingres-and-monetdb/</link>
		<comments>http://www.dbms2.com/2009/08/04/vectorwise-ingres-and-monetdb/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 10:14:34 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[MonetDB]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=857</guid>
		<description><![CDATA[I talked with Peter Boncz and Marcin Zukowski of VectorWise last Wednesday, but didn&#8217;t get around to writing about VectorWise immediately. Since then, VectorWise and its partner Ingres have gotten considerable coverage, especially from an enthusiastic Daniel Abadi. Basic facts that you may already know include: VectorWise, the product, will be an open-source columnar analytic [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I talked with Peter Boncz and Marcin Zukowski of VectorWise last Wednesday, but didn&#8217;t get around to writing about VectorWise immediately. Since then, VectorWise and its partner Ingres have gotten considerable coverage, especially from an enthusiastic <a href="http://dbmsmusings.blogspot.com/2009/07/watch-out-for-vectorwise.html">Daniel Abadi</a>.  Basic facts that you may already know include:</p>
<ul>
<li>VectorWise, the product, will be an <strong>open-source</strong> columnar analytic DBMS. (But &#8220;columnar&#8221; isn&#8217;t quite true. Pending productization, it&#8217;s more accurate to call the VectorWise technology a <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/"><em><strong>row/column hybrid</strong></em></a><em>.</em>)</li>
<li>VectorWise is due to be introduced 	in <strong>2010. </strong><span>(Peter Boncz said 	that to me more clearly than I&#8217;ve seen in other coverage.)</span></li>
<li>VectorWise and <strong>Ingres</strong> have 	a deal in which Ingres will at least be the exclusive seller of the 	VectorWise technology, and hopefully will buy the whole company.</li>
<li>Notwithstanding that it was once 	named something like &#8220;MonetDB,&#8221; VectorWise actually is <strong>not 	the same thing as MonetDB,</strong> another open source columnar analytic 	DBMS from the same research group.</li>
<li>The MonetDB and VectorWise 	research groups consist in large part of <strong>academics in Holland,</strong> specifically at CWI  (<span style="font-style: normal;">Centrum voor 	Wiskunde en Informatica).</span> But Ingres has a research group 	working on the project too. (Right now there are about seven &#8220;highly 	experienced&#8221; people each on the VectorWise and Ingres sides, 	although at least the VectorWise folks aren&#8217;t all full-time. More 	are being added.)</li>
<li>Ingres and VectorWise haven&#8217;t 	agreed exactly how VectorWise and Ingres Classic will play together 	in the Ingres product line. (All of the obvious possibilities are 	still on the table.)</li>
<li>VectorWise is shared-everything, 	just as Ingres is. But plans &#8212; still tentative &#8212; are afoot to 	integrate VectorWise with MapReduce in Daniel Abadi&#8217;s 	<a href="http://www.dbms2.com/2009/09/13/hadoopdb/">HadoopDB</a> project.</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-857"></span>The MonetDB project is led by Martin Kersten, with whom I chatted at SIGMOD in June (standing up and not taking notes, so I may have some details wrong). Based on that conversation, my VectorWise call, and other data, I get the impression that:</p>
<ul>
<li>Martin has been researching 	analytic DBMS (mainly but not only relational) since the late 1970s, 	and has been based at CWI since 1985.</li>
<li>Peter Boncz has been either second 	in command of that crew or close to it.</li>
<li>Martin Kersten, Peter Boncz, and 	the CWI/MonetDB team in general have gotten all sorts of computer 	science glory for their work.</li>
<li>Martin has enjoyed generously stable government research funding for his group, but has found commercialization of the technology more difficult than he might have at, say, Stanford. The figure of 15 MonetDB researchers comes to mind, although I see from Martin&#8217;s bio that he oversees a team of ~55 in total.</li>
<li>One early attempt at 	commercializing MonetDB turned into a company called Data 	Distilleries that was sold to SPSS. Peter Boncz was chief architect 	of Data Distilleries.</li>
<li>Besides VectorWise, there are at 	least two other recent spin-off companies from the MonetDB project. 	One is a zero-headcount shell, set up to facilitate MonetDB project 	members (and others) consulting to users of the open source MonetDB 	technology. The other is in stealth mode, focusing on some vertical 	market.</li>
</ul>
<p style="margin-bottom: 0in;">I further get the impression that VectorWise was actually Marcin Zukowski&#8217;s <span style="text-decoration: line-through;">Master&#8217;s</span> Ph.D. project, with Peter Boncz as his advisor. VectorWise also boasts another Peter Boncz student, who wrote about updating column stores.</p>
<p style="margin-bottom: 0in;">As one might expect from the name, VectorWise does <strong>vector processing.</strong> I.e., the hard part of Marcin&#8217;s work was developing vectorized algorithms for one SQL operation after another. Vectorization, pipelining, and FPGAs might all seem to go together &#8212; <a href="../2009/07/27/xtremedata-announces-its-dbx-data-warehouse-appliance/">XtremeData certainly seems to think so</a> &#8212; but the VectorWise folks preferred to develop for Intel CPUs anyway, for pretty much the usual reasons. Another major theme is trying to get the right things into CPU cache, because in their opinion RAM is just sooooo painfully slow.</p>
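<p style="margin-bottom: 0in;">For readers who want the vector-at-a-time idea made concrete, here is a minimal Python sketch. It is emphatically not VectorWise code: the function names, the <code>VECTOR_SIZE</code> of 1024, and the example query (sum of price times quantity over rows meeting a minimum quantity) are all assumptions of mine for illustration. The point is only that each primitive handles a whole batch of column values per call, rather than one tuple per call.</p>
<pre>
```python
from operator import ge, mul

VECTOR_SIZE = 1024  # hypothetical batch size, chosen only for illustration


def select_ge(column, bound):
    """Primitive: positions in the vector whose value is at least bound."""
    return [i for i, v in enumerate(column) if ge(v, bound)]


def map_mul(left, right, positions):
    """Primitive: multiply two column vectors at the selected positions."""
    return [mul(left[i], right[i]) for i in positions]


def vectorized_revenue(price, qty, min_qty, vector_size=VECTOR_SIZE):
    """SUM(price * qty) over rows with qty at least min_qty, one vector at a time."""
    total = 0
    for start in range(0, len(qty), vector_size):
        # Slice one vector out of each column; each primitive then works on
        # the whole batch in a single call.
        qty_vec = qty[start:start + vector_size]
        price_vec = price[start:start + vector_size]
        positions = select_ge(qty_vec, min_qty)
        total += sum(map_mul(price_vec, qty_vec, positions))
    return total


if __name__ == "__main__":
    print(vectorized_revenue([10, 20, 30, 40], [1, 5, 2, 7], 2, vector_size=3))
```
</pre>
<p style="margin-bottom: 0in;">In a real engine the primitives would be tight compiled loops over arrays small enough to stay in CPU cache. Python lists merely stand in for those arrays here.</p>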
<p style="margin-bottom: 0in;">Our discussion of VectorWise&#8217;s <strong>compression</strong> was interesting. Highlights included:</p>
<ul>
<li>The design requirement is that 	decompression work at a rate of 3 gigabytes/second or so. That way 	the system is faster overall than if it operated at 1 	gigabyte/second on uncompressed data, which I gather is the 	alternative.</li>
<li>VectorWise takes 4-5 <span style="text-decoration: line-through;">steps</span> CPU cycles to 	decompress a tuple.</li>
<li>VectorWise says it sacrificed compression ratio to achieve speed. That said, VectorWise claims 3-4X compression on TPC-H data, which is no worse than <a href="http://www.dbms2.com/2009/06/22/the-tpc-h-benchmark-is-a-blight-upon-the-industry/">what ParAccel reported</a>, and says it enjoys higher compression ratios on other kinds of data.</li>
<li>VectorWise decompresses data 	before manipulating it, and claims that the advantages of operating 	on compressed data are only significant if &#8212; like Vertica but 	apparently unlike VectorWise &#8212; the database stores columns in 	multiple sort orders each.</li>
<li>VectorWise&#8217;s compression is mainly 	on numerical and numerical-like (e.g. date) datatypes. An exception 	is that VectorWise uses dictionary compression on string data when 	it makes sense to do so.</li>
</ul>
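<p style="margin-bottom: 0in;">The dictionary compression mentioned in the last bullet can be sketched in a few lines of Python. This is a generic textbook version, not VectorWise&#8217;s actual scheme, and the function names are hypothetical. Each distinct string is stored once, the column itself becomes a list of small integer codes, and decompression is a plain lookup, which is the kind of cheap per-value work a decompress-before-processing design needs.</p>
<pre>
```python
def dictionary_encode(values):
    """Replace each string with a small integer code; return (dictionary, codes)."""
    code_of = {}      # string to code, used only while encoding
    dictionary = []   # code to string, stored alongside the compressed column
    codes = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Decompress before processing: look each code up in the dictionary."""
    return [dictionary[code] for code in codes]


if __name__ == "__main__":
    column = ["NL", "US", "NL", "NL", "DE", "US"]
    dictionary, codes = dictionary_encode(column)
    print(dictionary, codes)
    print(dictionary_decode(dictionary, codes) == column)
```
</pre>
<p style="margin-bottom: 0in;">This only pays off when a column has few distinct values relative to its length, which is presumably what &#8220;when it makes sense to do so&#8221; refers to.</p>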
<p style="margin-bottom: 0in;">Other notes include:</p>
<ul>
<li>VectorWise has technology akin to 	Microsoft SQL Server&#8217;s Shared Scans, in which multiple queries that 	require similar table scans don&#8217;t have to repeat all the redundant 	scanning work. I need to get better at figuring out which other 	analytic DBMS do similar things.</li>
<li>While VectorWise hasn&#8217;t yet been 	open-sourced, its code is in the hands of some other academic 	institutions, used mainly for computer science research (as opposed 	to, say, as a data store for some kind of scientific experiment).</li>
<li>VectorWise&#8217;s scalability has only 	been tested up to eight cores.</li>
</ul>
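<p style="margin-bottom: 0in;">As for the shared-scan idea in the first bullet above, here is a toy Python sketch of the general principle. I don&#8217;t know how VectorWise or SQL Server actually implement it, so the <code>shared_scan</code> function and its query representation are purely my own assumptions. Several queries register interest in the same table, and a single pass over the rows feeds all of them.</p>
<pre>
```python
def shared_scan(rows, queries):
    """Read rows once and feed every row to each registered query.

    queries maps a query name to (initial_state, step), where
    step(state, row) returns the updated state for that query.
    """
    states = {name: initial for name, (initial, _step) in queries.items()}
    rows_read = 0
    for row in rows:
        rows_read += 1  # the table is scanned once, however many queries share it
        for name, (_initial, step) in queries.items():
            states[name] = step(states[name], row)
    return states, rows_read


if __name__ == "__main__":
    table = [{"qty": 1, "price": 10}, {"qty": 5, "price": 20}, {"qty": 2, "price": 30}]
    queries = {
        "total_qty": (0, lambda state, row: state + row["qty"]),
        "max_price": (0, lambda state, row: max(state, row["price"])),
    }
    print(shared_scan(table, queries))
```
</pre>
<p style="margin-bottom: 0in;">Running the two queries separately would read every row twice; the shared version reads each row once.</p>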
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/08/04/vectorwise-ingres-and-monetdb/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

