<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS2 -- DataBase Management System Services &#187; Columnar database management</title>
	<atom:link href="http://www.dbms2.com/category/database-theory-practice/columnar-database-management/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 18 Mar 2010 05:19:19 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Vertica 4.0</title>
		<link>http://www.dbms2.com/2010/02/22/vertica-4/</link>
		<comments>http://www.dbms2.com/2010/02/22/vertica-4/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 08:19:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1607</guid>
		<description><![CDATA[Vertica briefed me last month on its forthcoming Vertica 4.0 release. I think it&#8217;s fair to say that Vertica 4.0 is mainly a cleanup/catchup release, washing away some of the tradeoffs Vertica had previously made in support of its innovative DBMS architecture.
For starters, there&#8217;s a lot of new analytic functionality. This isn&#8217;t Aster/Netezza-style ambitious. Rather, [...]]]></description>
			<content:encoded><![CDATA[<p>Vertica briefed me last month on its forthcoming Vertica 4.0 release. I think it&#8217;s fair to say that Vertica 4.0 is mainly a cleanup/catchup release, washing away some of the tradeoffs Vertica had previously made in support of its innovative DBMS architecture.</p>
<p>For starters, there&#8217;s a lot of new analytic functionality. This isn&#8217;t Aster/Netezza-style ambitious. Rather, there&#8217;s a lot more SQL-99 functionality, plus some time series extensions of the sort that financial services firms – an important market for Vertica – need and love. Vertica did suggest a couple of these time series extensions are innovative, but I haven&#8217;t yet gotten detail about those.</p>
<p>Perhaps even more important, Vertica is cleaning up a lot of its previous SQL optimization and execution weirdnesses. In no particular order, I was told:<span id="more-1607"></span></p>
<ul>
<li>Vertica&#8217;s delete performance is up “literally” 30-100X, at least in the case of “large” deletes. Performance for “large” updates has been enhanced as well.</li>
<li>Vertica has finally cleaned up all vestiges of its prior <a href="http://www.dbms2.com/2007/10/23/vertica-star-snowflake-schema/" >bias to star schemas</a>. For example, Vertica concedes that its product previously would sometimes force a star execution plan that wasn&#8217;t really appropriate.</li>
<li>It is no longer the case that you need to define projections before you load a table into Vertica. This is now fully automatic.</li>
<li>Vertica 4.0 automatically redesigns the database when new nodes are added to the system.</li>
<li>When a database designer does hand-tune projections – and there&#8217;s no shame in this still being a possibility in Vertica 4.0 – that hand-tuning is now pulled back into the automatic generation/recommendation/whatever wizards for further projections. I.e., there&#8217;s a kind of DBA round-trip engineering going on.</li>
<li>Vertica used to require that tables being joined be identically “segmented” (I think this means distributed across nodes). That is no longer the case in 4.0.</li>
<li>In connection with this new-found flexibility, Vertica now supports full outer joins directly, rather than requiring the left outer join/right outer join/UNION kluge.</li>
<li>The Vertica 4.0 optimizer is smarter than its predecessor about things like predicate pushdown into subqueries, or exploiting commonality between predicates and partition keys.</li>
<li>There&#8217;s a fundamental change that I don&#8217;t understand very well in the Vertica execution engine basic unit of work. It sounds as if in the past all the disk-based data containers the query needed got opened at once and read into memory, whether or not there was enough RAM and CPU cores to handle them, and this problem has now been fixed.</li>
<li>Vertica always seemed to say that you could query immediately on new data, because even if it hadn&#8217;t hit disk yet – the ROS (Read-Optimized Store) – it was available in memory – the WOS (Write-Optimized Store). And queries were in essence federated between the ROS and WOS. But apparently it&#8217;s a new feature in Vertica 4.0 that you can read totally fresh data without locking. I confess to not understanding this very well either. (It has something to do with what  Vertica calls “Epochs”.)</li>
<li>Temporary tables can now be created in Vertica on a local/session basis without any DDL. Making temporary tables easier and more performant is important for a variety of reasons:
<ul>
<li>MicroStrategy, Company V* et al. use lots of temp tables. E.g., Company V on Vertica has 3,000 permanent tables and 5,000-7,000 temporary ones.</li>
<li>Vertica rightly points out that temporary tables are also important for ELT (Extract/Load/Transform).</li>
<li>Vertica further says that single-node OEMs such as security appliance vendors use lots of temp tables.</li>
</ul>
</li>
</ul>
<p><em>*Company V = one of the more prominent vertical-market application providers.</em></p>
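<p>To make the full outer join point in the list above concrete, here is a minimal Python sketch of the left outer join/right outer join/UNION workaround. It uses Python&#8217;s built-in sqlite3 module purely as a stand-in, not Vertica. The <code>customers</code> and <code>orders</code> tables and their contents are invented for illustration, and the right outer join is written as a left outer join with the tables swapped.</p>
<pre>
```python
import sqlite3


def emulated_full_outer_join(conn):
    """Emulate customers FULL OUTER JOIN orders without native support."""
    query = """
        SELECT c.cust_id, c.name, o.amount
        FROM customers c
        LEFT JOIN orders o ON o.cust_id = c.cust_id
        UNION ALL
        SELECT o.cust_id, NULL, o.amount
        FROM orders o
        LEFT JOIN customers c ON c.cust_id = o.cust_id
        WHERE c.cust_id IS NULL
        ORDER BY 1
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data: customer 2 has no order, order 3 has no customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER, name TEXT);
    CREATE TABLE orders (cust_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 100), (3, 250);
""")
rows = emulated_full_outer_join(conn)
for row in rows:
    print(row)
```
</pre>
<p>The second half keeps only the order rows with no matching customer, so matched rows are not emitted twice. With direct full outer join support, the two halves collapse into a single join clause.</p>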
<p>In other Vertica highlights:</p>
<ul>
<li>It sounds as if 4.0 is the first Vertica release with what I would regard as serious workload management.</li>
<li>While Vertica has stored and retrieved Unicode since Vertica 3.5 or so, 4.0 will be the first Vertica release in which Unicode is sorted and collated properly.</li>
<li>Stored-procedure-like functionality is still a future for Vertica.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/22/vertica-4/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Vertica slaughters Sybase in patent litigation</title>
		<link>http://www.dbms2.com/2010/01/15/vertica-sybase-ipatent-litigation/</link>
		<comments>http://www.dbms2.com/2010/01/15/vertica-sybase-ipatent-litigation/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 13:07:32 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1409</guid>
		<description><![CDATA[Back in August, 2008, I pooh-poohed Sybase&#8217;s patent lawsuit against Vertica. Filed in the notoriously patent-holder-friendly East Texas courts, the suit basically claimed patent rights over the whole idea of a columnar RDBMS. It was pretty clear that this suit was meant to be a model for claims against other columnar RDBMS vendors as well, [...]]]></description>
			<content:encoded><![CDATA[<p>Back in August, 2008, <a href="http://www.dbms2.com/2008/08/14/patent-nonsense-in-the-data-warehouse-dbms-market/" >I pooh-poohed Sybase&#8217;s patent lawsuit against Vertica</a>. Filed in the notoriously patent-holder-friendly East Texas courts, the suit basically claimed patent rights over the whole idea of a columnar RDBMS. It was pretty clear that this suit was meant to be a model for claims against other columnar RDBMS vendors as well, should they ever achieve material marketplace success.</p>
<p>If a recent Vertica press release is to be believed, <a href="http://www.vertica.com/company/news/Vertica-prevails-in-Sybase-patent-lawsuit" onclick="javascript:pageTracker._trackPageview('/www.vertica.com');">Sybase got clobbered</a>. The meat is:</p>
<blockquote><p>&#8230;  Sybase has admitted that under the claim construction order issued by the Court on November 9, 2009, <em>&#8220;Vertica does not infringe Claims 1-15 of U.S. Patent No. 5,794,229.&#8221;</em> Sybase further acknowledged that because the Court ruled that all the remaining claims in the patent (claims 16-24) were invalid, <em>&#8220;Sybase cannot prevail on those claims.&#8221; </em></p></blockquote>
<p>For those counting along at home &#8212; the patent only has 24 claims in total.</p>
<p>I have no idea whether Sybase can still cobble together grounds for appeal, or claims under some other patent. But for now, this sounds like a total victory for Vertica.</p>
<p><em>Edit: I&#8217;ve now seen a PDF of a filing suggesting the grounds under which Sybase will appeal. Basically, it alleges that the judge erred in defining a &#8220;page&#8221; of data too narrowly. Note that if Sybase prevails on appeal on that point, Vertica has a bunch of other defenses that haven&#8217;t been litigated yet. It further seems that Sybase may have recently filed another patent case against Vertica, in a different venue, based on a different patent.</em></p>
<p>One annoying blog troll excepted, is anybody surprised at this outcome?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/15/vertica-sybase-ipatent-litigation/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>There sure seem to be a lot of inaccuracies on ParAccel&#8217;s website</title>
		<link>http://www.dbms2.com/2010/01/15/there-sure-seem-to-be-a-lot-of-inaccuracies-on-paraccels-website/</link>
		<comments>http://www.dbms2.com/2010/01/15/there-sure-seem-to-be-a-lot-of-inaccuracies-on-paraccels-website/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 04:47:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Telecommunications]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1393</guid>
		<description><![CDATA[In what is actually an interesting post on database compression, ParAccel CTO Barry Zane threw in
Anyone who has met with us knows ParAccel shies away from hype.
But like many things ParAccel says, that is not true.
The latest whoppers came in the form of several customers ParAccel listed on its website who hadn&#8217;t actually bought ParAccel&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>In what is actually an <a href="http://paraccel.com/data_warehouse_blog/?p=192" onclick="javascript:pageTracker._trackPageview('/paraccel.com');">interesting post on database compression</a>, ParAccel CTO Barry Zane threw in</p>
<blockquote><p>Anyone who has met with us knows ParAccel shies away from hype.</p></blockquote>
<p>But like many things ParAccel says, that is not true.</p>
<p>The latest whoppers came in the form of several customers ParAccel listed on its website who hadn&#8217;t actually bought ParAccel&#8217;s DBMS, nor even decided to do so. It is fairly common to claim a customer win, then retract the claim due to lack of permission to disclose. But that&#8217;s not what happened in these cases. Based on emails helpfully shared by a ParAccel competitor competing in some of those accounts, it seems clear that <strong>ParAccel actually posted fabricated claims of customer wins.</strong> <span id="more-1393"></span></p>
<p>Another thing that was both technically and substantively false was ParAccel&#8217;s claim to be <a href="http://www.dbms2.com/2009/09/30/facts-and-rumors/" >CERTIFIED price-performance leader</a>. Obviously, this was meant to give the impression that ParAccel had been &#8220;certified&#8221; as the leader in price/performance, when the closest thing to that that was remotely true was that ParAccel had a leading position in the category of &#8220;price/performance measurements that happen to have a certification process.&#8221; At least, that was true for a short time; then ParAccel&#8217;s certification was found to have been erroneous, and got revoked, which did not however inspire ParAccel to immediately take the claim off the front page of its website.</p>
<p>ParAccel&#8217;s website also reflects a lot of praise from flagship customer LatiNode. What it perhaps understandably neglects to mention is that LatiNode is in a <a href="http://www.pepperlaw.com/publications_update.aspx?ArticleKey=1651" onclick="javascript:pageTracker._trackPageview('/www.pepperlaw.com');">dormant state</a>, placed there by acquirer Elandia due to LatiNode&#8217;s criminally corrupt customer acquisition practices.</p>
<p>I also don&#8217;t believe ParAccel&#8217;s endlessly-repeated claim that it has never lost a benchmark on performance. However, I must in fairness note that while I&#8217;ve been given names of customers who are supposed counterexamples to this claim by somebody I trust, I&#8217;ve never been able to actually verify those supposed ParAccel losses.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/15/there-sure-seem-to-be-a-lot-of-inaccuracies-on-paraccels-website/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
		<item>
		<title>This and that</title>
		<link>http://www.dbms2.com/2009/12/29/this-and-that/</link>
		<comments>http://www.dbms2.com/2009/12/29/this-and-that/#comments</comments>
		<pubDate>Tue, 29 Dec 2009 09:14:46 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Mark Logic]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Structured documents]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1348</guid>
		<description><![CDATA[I have various subjects backed up that I don&#8217;t really want to write about at traditional blog-post length.  Here are a few of them.
Vertica offers a post on its 3.5 release, with a riff on the popular theme &#8220;We&#8217;ve fixed some weaknesses in our prior versions that we didn&#8217;t previously say we had.&#8221; More important, [...]]]></description>
			<content:encoded><![CDATA[<p>I have various subjects backed up that I don&#8217;t really want to write about at traditional blog-post length.  Here are a few of them.<span id="more-1348"></span></p>
<p><strong>Vertica</strong> offers a post on <a href="http://databasecolumn.vertica.com/database-innovation/vertica-3-5-flexstoretm-the-next-generation-of-column-stores/" onclick="javascript:pageTracker._trackPageview('/databasecolumn.vertica.com');">its 3.5 release</a>, with a riff on the popular theme &#8220;We&#8217;ve fixed some weaknesses in our prior versions that we didn&#8217;t previously say we had.&#8221; More important, Vertica is pretty clear on the virtues of its <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/" >hybrid columnar architecture</a>.</p>
<p>Speaking of which &#8212; <strong>Oracle is going true hybrid columnar</strong> as well. I don&#8217;t have details or timing, however.</p>
<p>Dave Kellogg of <strong>Mark Logic</strong> wrote in to amusedly point out <a href="http://www.oracle.com/technology/tech/xml/xmldb/Current/marklogicserver_4.1_v1.0.pdf" onclick="javascript:pageTracker._trackPageview('/www.oracle.com');" target="_blank">Oracle&#8217;s anti-MarkLogic collateral</a>. The very first charge Oracle levies is that MarkLogic goes beyond the emerging XQuery standard to add functionality. Considering Oracle&#8217;s approach to SQL standards, I tend to share Dave&#8217;s amusement.</p>
<p>Bill Conniff of <a href="http://www.xponentsoftware.com/" onclick="javascript:pageTracker._trackPageview('/www.xponentsoftware.com');">Xponent LLC</a> wrote in to tell of a vastly cheaper and less functional approach to <strong>XML management,</strong> apparently geared to looking at very large XML files one at a time.</p>
<p><strong>Cayuga</strong> is a Cornell research project in complex event processing (CEP). There&#8217;s a <a href="http://www.cs.cornell.edu/bigreddata/cayuga/" onclick="javascript:pageTracker._trackPageview('/www.cs.cornell.edu');">Cayuga academic home page</a>, a Sourceforge page for some <a href="http://sourceforge.net/projects/cayuga/" onclick="javascript:pageTracker._trackPageview('/sourceforge.net');">open source Cayuga CEP code</a>, and so on. Minsheng Hong, writing from a Vertica email address, tipped me off some months ago. The basic idea seems to be to do <em>lots</em> of queries very quickly, rather than a smaller number of queries over and over again. Whether this is an advance in anything but open-sourceness over Apama or Aleri I couldn&#8217;t say, but I do think it&#8217;s a different focus than that of StreamBase or pre-Aleri Coral8.</p>
<p>And finally, editor Doug Henschen listed his <a href="http://intelligent-enterprise.informationweek.com/blog/archives/2009/12/intelligent_ent_2.html;jsessionid=0YRB5UUISPBXLQE1GHRSKH4ATMY32JVN" onclick="javascript:pageTracker._trackPageview('/intelligent-enterprise.informationweek.com');">15 favorite <em>Intelligent Enterprise</em> blog posts of 2009</a> &#8212; four each by Seth Grimes and Doug himself, three by Cindi Howson, two by me,* and one each by Mark Smith and Neil Raden.</p>
<p><em>*Doug selects up to three posts a month from here to republish.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/29/this-and-that/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Boston Big Data Summit keynote outline</title>
		<link>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/</link>
		<comments>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 06:25:50 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Humor]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1227</guid>
		<description><![CDATA[Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I&#8217;m posting them below.

The top two points [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Last month, Bob Zurek asked me to give a talk on <a href="http://www.dbms2.com/2009/10/09/presentations-upcoming/" >“Big Data”, where “big” is anything from a few terabytes on up</a>, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I&#8217;m posting them below.</p>
<p><span id="more-1227"></span></p>
<p style="margin-bottom: 0in;">The top two points from Q&amp;A probably were:</p>
<ul>
<li><strong>Big Data and the cloud actually 	have relatively little to do with each other,</strong> <a href="http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/" >a few exceptions</a> notwithstanding, especially if the data is in a shared-nothing DBMS 	(as opposed to, say, a MapReduce-oriented file cluster). Two 	principal reasons are:
<ul>
<li>Redistributing data from node to 	node is a little slow, undermining some of the elasticity benefits 	of the cloud.</li>
<li><a href="http://www.dbms2.com/2009/05/29/sneakernet-to-the-cloud/" >Getting data into the cloud in the 	first place is a lot slow</a>.</li>
</ul>
</li>
<li><strong>The NoSQL movement is a lot like 	the Ron Paul campaign</strong> &#8212; it consists of people who are dissatisfied 	with the status quo, whose dissatisfaction has a lot to do with 	insufficient liberty and/or excessive expenditure, and who otherwise 	don&#8217;t have a whole lot in common with each other.</li>
</ul>
<p style="margin-bottom: 0in;">Anyhow, here are my notes for the talk, edited in just a couple of places for readability or linkage.</p>
<p style="margin-bottom: 0in;"><strong>Quick introduction</strong></p>
<ul>
<li>Big Data vs. cloud</li>
<li>How big is Big Data?</li>
<li>At the low end of that range, 	there&#8217;s little you can&#8217;t do with conventional technology if you 	have:
<ul>
<li>An unlimited budget for hardware</li>
<li>An unlimited budget for software</li>
<li>An unlimited budget for people, 	especially Oracle DBAs</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Big Data in OLTP</strong></p>
<ul>
<li>Hard-core OLTP
<ul>
<li>Focus of DBMS technology for a long time</li>
<li>Big budgets because each 	transaction has significant value</li>
<li>Tough to get users to change 	technologies</li>
</ul>
</li>
<li>Lighter-weight OLTP
<ul>
<li>Classic example = web companies
<ul>
<li>Big ones &#8212;  retail-oriented ones 	(eBay, Amazon) partially excepted &#8212; <a href="http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/" >rolled their own technology 	stacks</a></li>
<li>Reluctant to give money to anybody
<ul>
<li>Open source, etc.</li>
</ul>
</li>
</ul>
</li>
<li>Difficulty finding market
<ul>
<li>Product vs. feature
<ul>
<li>Clustering/HA/DR/whatever</li>
<li>Ditto cloud enablement</li>
</ul>
</li>
<li>True products haven&#8217;t found much 	traction yet</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Analytic Big Data use cases</strong></p>
<ul>
<li>Kinds of data for analytics
<ul>
<li>More of same != big</li>
<li>More detail and/or new kinds
<ul>
<li>Complete data sets</li>
<li>Transactions</li>
<li>Call details</li>
<li>Tick/trade history</li>
<li>Web clickstreams</li>
<li>Network event logs</li>
<li>Other machine-generated data</li>
<li>CAM bottom line
<ul>
<li>Anything human-generated should 	and will be retained in its entirety</li>
<li>Quantities of machine-generated 	data retained should and will grow roughly in line w/ computing cost 	reductions (Moore&#8217;s Law, etc.)</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li>Analytic uses of Big Data
<ul>
<li>Analytics is mainly about three 	things
<ul>
<li>Problem detection</li>
<li>Customer relationship improvement
<ul>
<li>(Those overlap when the customer 	relationship is bad)</li>
</ul>
</li>
<li>Financial statements on steroids</li>
</ul>
</li>
</ul>
<ul>
<li>Main kinds of analytics
<ul>
<li>What BI vendors traditionally sell
<ul>
<li>General reporting and dashboards</li>
<li>Ad-hoc query (now driven from 	those reports and dashboards)</li>
<li>Planning (allegedly integrated 	with BI)</li>
</ul>
</li>
<li>Research
<ul>
<li>Ad hoc relational query (worth 	mentioning twice because it drives so much of the market)</li>
<li>Data mining</li>
<li>Most web search and web mining</li>
</ul>
</li>
<li>Operational/near-real-time</li>
<li>Archiving/compliance</li>
</ul>
</li>
<li>What gets Big?
<ul>
<li>Mainly research and archiving</li>
<li>But when reporting or operational 	get Big, you have really interesting computing problems</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Technology issues and trends</strong></p>
<ul>
<li>Moore&#8217;s Law
<ul>
<li>CPUs &#8212; All about cores, hence 	parallelism is key</li>
<li>RAM</li>
<li>SSDs – hence replace disks</li>
<li>Sensors – hence generate lots 	more data</li>
</ul>
</li>
<li>Kryder&#8217;s Law
<ul>
<li>But <a href="http://www.dbms2.com/2005/11/13/breaking-the-disk-speed-barrier/" >rotational speeds up only 	12.5X since Eisenhower Administration</a></li>
<li>Hence solid-state memory (or RAM) 	will soon take over</li>
</ul>
</li>
<li>In the meantime, I/O bottlenecks have had to be beaten
<ul>
<li>Hence sequential scans</li>
<li>Hence <a href="http://www.dbms2.com/2007/03/26/index-light-mpp-data-warehouse-appliances/" >index-light</a> architectures</li>
<li>Hence columnar</li>
</ul>
</li>
<li>DBMS “overhead”
<ul>
<li>Raw license and maintenance fees – 	software increasing fraction of total</li>
<li>OLTP vestiges – locking and all 	that</li>
<li>DBAs
<ul>
<li>People costs = huge fraction of 	total</li>
<li>Index-lightness addresses</li>
<li>So does appliance</li>
</ul>
</li>
<li>Many people don&#8217;t really know how to 	write SQL</li>
</ul>
</li>
<li>Configuration
<ul>
<li>Appliance/tightly-balanced
<ul>
<li>Netezza</li>
<li>Teradata earlier</li>
<li>Greenplum/Sun</li>
<li>Oracle</li>
<li>IBM</li>
<li>Microsoft/Madison</li>
</ul>
</li>
<li>Commodity/do what you want
<ul>
<li>Vertica</li>
<li>Greenplum now</li>
<li>Infobright, Aster and others</li>
<li>MapReduce-oriented file systems</li>
</ul>
</li>
<li><a href="http://www.dbms2.com/2009/10/25/data-warehouse-balanced-hardware-configuration/" >Extreme rigidity is silly</a>
<ul>
<li><a href="http://www.dbms2.com/2009/10/25/teradata-hardware-strategy-and-tactics/" >Teradata, Oracle have both 	signaled moving to more modularity</a></li>
<li>Big driver of that = heterogeneous 	storage
<ul>
<li>Cheap disk</li>
<li>Expensive disk</li>
<li>Solid-state</li>
<li>RAM</li>
</ul>
</li>
</ul>
<ul>
<li>CPU/storage ratio is even more of a 	driver</li>
</ul>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Theoretically defensible ways to segment the market</strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >Latency requirements</a>
<ul>
<li>High availability and low latency 	go together</li>
</ul>
</li>
<li>Query types
<ul>
<li>Simultaneous users for same</li>
</ul>
</li>
<li>Database size</li>
<li>Budget</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Actual segments right now</strong></p>
<ul>
<li><a href="http://www.dbms2.com/2009/08/24/teradatas-active-enterprise-data-warehouse-story/" >Utter ADW/EDW</a></li>
<li>Data mart
<ul>
<li>Size</li>
<li>Naturally columnar vs. naturally 	row-based</li>
</ul>
</li>
<li>Operational/frontline</li>
<li>Less dramatic/smaller EDW</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Calpont&#8217;s InfiniDB</title>
		<link>http://www.dbms2.com/2009/11/07/calponts-infinidb/</link>
		<comments>http://www.dbms2.com/2009/11/07/calponts-infinidb/#comments</comments>
		<pubDate>Sun, 08 Nov 2009 01:35:25 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Calpont]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open source]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1207</guid>
		<description><![CDATA[Since its inception, Calpont has gone through multiple management teams, strategies, and investor groups. What it hadn&#8217;t ever done is actually ship a product. Last week, however, Calpont introduced a free/open source DBMS, InfiniDB, with technical details somewhat reminiscent of what Calpont was promising last April. Highlights include:

Like Infobright, Calpont&#8217;s 	InfiniDB is a columnar DBMS [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Since its inception, Calpont has gone through multiple management teams, strategies, and investor groups. What it hadn&#8217;t ever done is actually ship a product. Last week, however, Calpont introduced a free/open source DBMS, InfiniDB, with technical details somewhat reminiscent of <a href="../2009/04/20/calpont-update-you-read-it-here-first/">what Calpont was promising last April</a>. Highlights include:</p>
<ul>
<li>Like Infobright, Calpont&#8217;s 	InfiniDB is a columnar DBMS consisting of a MySQL front end and a 	columnar storage engine.</li>
<li>Community edition InfiniDB runs on 	a single server.</li>
<li>One of commercial/enterprise 	edition InfiniDB&#8217;s main claims to fame will be MPP support.</li>
<li>There&#8217;s no announced time frame 	for commercial edition InfiniDB.</li>
<li>InfiniDB&#8217;s current compression story is dictionary/token only, with decompression occurring before joins are executed. Improvement is a roadmap item.</li>
<li>Indeed, InfiniDB has many roadmap 	items, a few of which can be found <a href="http://infinidb.org/resources/tech-articles/120-infinidb-community-edition-roadmap" onclick="javascript:pageTracker._trackPageview('/infinidb.org');">here</a>. 	Also, a great overview of InfiniDB&#8217;s current state and roadmap can 	be found in <a href="http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/" onclick="javascript:pageTracker._trackPageview('/www.mysqlperformanceblog.com');">this 	MySQL Performance Blog</a> thread. (And follow the links there to 	find performance discussions of other free analytic DBMS.)</li>
<li>One thing InfiniDB already has 	that is still a roadmap item for Infobright is the ability to run a 	query across multiple cores at once.</li>
<li>One thing free InfiniDB has that 	Infobright only offers in its Enterprise Edition is ACID-compliant 	Insert/Update/Delete. <em>(Note: I wish people would stop saying that Infobright Enterprise Edition isn&#8217;t ACID-compliant, since that point was cleared up <a href="http://www.dbms2.com/2009/04/20/infobright-update-3/" >a while ago</a>.)</em></li>
<li>InfiniDB has no indexes or 	materialized views.</li>
<li>However, InfiniDB&#8217;s retrieval is 	expedited by something called “Extents,” which sounds a lot like 	Netezza&#8217;s zone maps.</li>
</ul>
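<p>For readers unfamiliar with the zone map idea mentioned in the last bullet, here is a rough Python sketch of how such a structure is generally understood to work: keep the minimum and maximum value for each block of rows, and skip any block whose range cannot contain the value being sought. This is an assumption-laden illustration, not InfiniDB&#8217;s or Netezza&#8217;s actual implementation; the function names, the extent size, and the sample dates are all hypothetical.</p>
<pre>
```python
def build_extent_map(column, extent_size):
    """Record (start, end, min, max) for each fixed-size run of rows."""
    extent_map = []
    for start in range(0, len(column), extent_size):
        chunk = column[start:start + extent_size]
        extent_map.append((start, start + len(chunk), min(chunk), max(chunk)))
    return extent_map


def scan_equal(column, extent_map, target):
    """Return matching row numbers and how many extents had to be read."""
    hits = []
    extents_read = 0
    for start, end, lo, hi in extent_map:
        # Skip the extent entirely if target lies outside its min/max range.
        if lo <= target <= hi:
            extents_read += 1
            hits.extend(i for i in range(start, end) if column[i] == target)
    return hits, extents_read


# Hypothetical date column, loaded in roughly chronological order.
dates = [20091101, 20091101, 20091102, 20091103,
         20091104, 20091104, 20091105, 20091106]
extent_map = build_extent_map(dates, extent_size=2)
rows, extents_read = scan_equal(dates, extent_map, 20091104)
print(rows, extents_read, len(extent_map))
```
</pre>
<p>The approach pays off mainly when data arrives roughly sorted on the filtered column, as timestamps often do, since then most extents can be ruled out without being read.</p>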
<p><em>Being on vacation, I&#8217;ll stop there for now. (If it weren&#8217;t for Tropical Storm/ depression Ida, I might not even be posting this much until I get back.)</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/11/07/calponts-infinidb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introduction to SenSage</title>
		<link>http://www.dbms2.com/2009/10/18/introduction-to-sensage/</link>
		<comments>http://www.dbms2.com/2009/10/18/introduction-to-sensage/#comments</comments>
		<pubDate>Sun, 18 Oct 2009 16:02:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[SenSage]]></category>
		<category><![CDATA[Telecommunications]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1115</guid>
		<description><![CDATA[I visited with SenSage on my two most recent trips to San Francisco. Both visits were, through no fault of SenSage&#8217;s, hasty.  Still, I think I have enough of a handle on SenSage basics to be worth writing up.
General SenSage highlights include:


SenSage used to be known as 	Addamark.
SenSage used to characterize 	itself as being [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I visited with SenSage on my two most recent trips to San Francisco. Both visits were, through no fault of SenSage&#8217;s, hasty.  Still, I think I have enough of a handle on SenSage basics to be worth writing up.</p>
<p style="margin-bottom: 0in;">General SenSage highlights include:</p>
<p><span id="more-1115"></span></p>
<ul>
<li>SenSage used to be known as 	Addamark.</li>
<li>SenSage used to characterize 	itself as being in the Security Information Management (SIM) market.</li>
<li>Now SenSage characterizes itself 	(approximately) as selling technology built around a columnar DBMS 	that happens to be pretty good at log analysis, compliance, and/or 	archiving.</li>
<li>More concisely, SenSage says it is 	in the <a href="http://sensage.com/company/index.php" onclick="javascript:pageTracker._trackPageview('/sensage.com');">event data 	warehouse</a> category.  (The same could arguably be said of 	<a href="http://www.dbms2.com/?p=1119" >Splunk</a>.)</li>
<li>SenSage says it has &gt;400 paying 	customers, of which ~200 are direct.</li>
<li>SenSage has &gt;120 employees and, 	like Splunk, is profitable.</li>
<li>SenSage has enjoyed &gt;50% annual 	revenue growth the past four years.</li>
<li>Some SenSage deals are in the multi-million-dollar range.</li>
<li>A major SenSage channel partner, with dozens of installations, is SAP, which resells SenSage software on HP hardware as a “Compliance Log Warehouse.”</li>
<li>A hot market for SenSage is CDRs 	(Call Detail Records).</li>
<li>SenSage says that, among analytic 	DBMS vendors, it competes with Oracle, IBM, Teradata, Netezza and, 	to some extent, Vertica and Greenplum.</li>
</ul>
<p>Technical SenSage highlights include:</p>
<ul>
<li>SenSage&#8217;s core technology is an 	append-only columnar DBMS, with no master node.</li>
<li>SenSage&#8217;s DBMS uses no indexes and 	requires “no” database administration.</li>
<li>SenSage&#8217;s database is 	range-partitioned, with the range-partition key always being time.</li>
<li>SenSage has something it calls SQO 	(Sparse Query Optimization), which sounds a lot like Netezza zone 	maps. SQO never yields a false negative on whether data is in a 	block, never yields a false positive on equality predicates, and 	only rarely yields a false positive on range predicates.</li>
<li>SenSage&#8217;s database uses large 	block sizes – typically 250,000 records/block, at 200-250 bytes 	per record.  (That&#8217;s in the range of 64 megabytes/block.)</li>
<li>SenSage says its software can load 10,000-50,000 records/second/node. If I&#8217;m doing the arithmetic correctly, that&#8217;s roughly 7-40 gigabytes/node/hour.</li>
<li>SenSage collects log data into its 	event data warehouse in what it characterizes as an agentless 	manner. Even so, it seems that for a majority of kinds of data 	sources one does have to write custom agents. The two other ways to 	get data into SenSage – and presumably most of the data volume 	comes through these – are:
<ul>
<li>File transfer in the usual way</li>
<li>syslog</li>
</ul>
</li>
<li>SenSage says its software can read 	100s of data sources, and that this is a huge competitive advantage. 	I&#8217;m not totally sure how that jibes with the prior point.</li>
<li>SenSage says it gets 5X 	compression on CDR data, 10-20X on other kinds of logs. That&#8217;s not 	too far off from <a href="../2008/09/24/vertica-finally-spells-out-its-compression-claims/">Vertica&#8217;s 	compression figures</a>.</li>
<li>SenSage says that it has 	datatype-aware compression as well as more standard stuff, with 	VARCHAR compressing particularly well.</li>
<li>In particular, SenSage uses both 	dictionary/token and delta compression.</li>
<li>SenSage&#8217;s software is pretty 	agnostic with respect to storage kind – DAS (Direct Attached 	Storage), SAN (Storage-Area Network), or content-addressable. In 	particular, there&#8217;s only about a 4% performance hit for using 	content-addressable storage.</li>
<li>When using WORM (Write Once Read 	Many) storage like EMC&#8217;s Centera, SenSage leaves record locator 	information behind on ordinary storage and otherwise queries the 	WORM storage just like it queries anything else.</li>
<li>SenSage says it has been using 	MapReduce since “Day 1”.</li>
<li>Probably not coincidentally, you 	can use Perl and other aggregates in SenSage SQL statements.</li>
<li>Perhaps also not coincidentally, 	SenSage says it has a number of advanced built-in analytic 	functions, including some focused on sessionization.</li>
</ul>
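<p>The block-size and load-rate arithmetic above can be checked with a short sketch. It assumes decimal megabytes and gigabytes and takes the 200-250 bytes/record figure at face value; the function names are just for illustration.</p>

```python
# Back-of-envelope check of the block-size and load-rate figures quoted above.
RECORDS_PER_BLOCK = 250_000
BYTES_PER_RECORD = (200, 250)          # low and high record sizes from the post
RECORDS_PER_SECOND = (10_000, 50_000)  # low and high load rates per node
SECONDS_PER_HOUR = 3_600


def block_megabytes(bytes_per_record):
    """Size of one block in decimal megabytes."""
    return RECORDS_PER_BLOCK * bytes_per_record / 1e6


def gigabytes_per_node_hour(records_per_second, bytes_per_record):
    """Load rate in decimal gigabytes per node per hour."""
    return records_per_second * bytes_per_record * SECONDS_PER_HOUR / 1e9


if __name__ == "__main__":
    for size in BYTES_PER_RECORD:
        print(size, "bytes/record:", block_megabytes(size), "MB/block")
    for rate in RECORDS_PER_SECOND:
        for size in BYTES_PER_RECORD:
            print(rate, "records/s at", size, "bytes/record:",
                  gigabytes_per_node_hour(rate, size), "GB/node/hour")
```

<p>Under those assumptions the low end comes out a little over 7 gigabytes/node/hour, and the high end lands somewhere between the mid-30s and mid-40s depending on which record size is paired with the top load rate.</p>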
<p style="margin-bottom: 0in;">In addition to all that, SenSage offers a built-in event processing engine, consisting of:</p>
<ul>
<li>A finite-state machine correlation 	engine.</li>
<li>A proprietary event processing 	language.</li>
<li>A GUI to “abstract” (i.e., 	generate?) the event processing language.</li>
</ul>
<p style="margin-bottom: 0in;">The SenSage event processing engine is used to generate alerts. Data that comes into SenSage actually is passed to two places at once, namely to both the event processing engine and the database itself.</p>
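<p style="margin-bottom: 0in;">To make the idea concrete, here is a minimal sketch of a finite-state correlation rule fed in parallel with an append-only store. Everything in it is hypothetical: the rule (several failed logins followed by a success), the event fields, and the names <code>FailedLoginCorrelator</code> and <code>ingest</code> are illustrations, not SenSage&#8217;s actual engine or language.</p>

```python
from collections import defaultdict


class FailedLoginCorrelator:
    """Hypothetical finite-state rule: N failures then a success raises an alert."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = defaultdict(int)  # per-user state: consecutive failures

    def feed(self, event):
        """Advance the state machine for one event; return an alert or None."""
        user, outcome = event["user"], event["outcome"]
        if outcome == "fail":
            self.failures[user] += 1
            return None
        # A success resets the state, alerting if enough failures preceded it.
        count = self.failures.pop(user, 0)
        if count >= self.threshold:
            return f"ALERT: {user} logged in after {count} failed attempts"
        return None


def ingest(events, correlator, store):
    """Send every event both to the correlator and to the append-only store."""
    alerts = []
    for event in events:
        store.append(event)
        alert = correlator.feed(event)
        if alert is not None:
            alerts.append(alert)
    return alerts
```

<p style="margin-bottom: 0in;">The point of the sketch is only the shape of the data flow: each incoming event is both stored for later querying and run through the alerting rule as it arrives.</p>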
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/18/introduction-to-sensage/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Kickfire capacity and pricing</title>
		<link>http://www.dbms2.com/2009/10/18/kickfire-capacity-and-pricing/</link>
		<comments>http://www.dbms2.com/2009/10/18/kickfire-capacity-and-pricing/#comments</comments>
		<pubDate>Sun, 18 Oct 2009 09:16:14 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[Pricing]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1109</guid>
		<description><![CDATA[Kickfire&#8217;s marketing communication efforts are still a work in progress. Kickfire did finally relax its secrecy about FPGA-vs.-custom-silicon – not coincidentally during Netezza&#8217;s recent publicity cycle. That wise choice helped Kickfire get some favorable attention recently for its technical and market strategy, e.g. from Daniel Abadi, Merv Adrian and, kicking things off &#8212; as it [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Kickfire&#8217;s marketing communication efforts are still a work in progress. Kickfire did finally relax its secrecy about FPGA-vs.-custom-silicon – not coincidentally during Netezza&#8217;s recent publicity cycle. That wise choice helped Kickfire get some favorable attention recently for its technical and market strategy, e.g. from <a href="http://dbmsmusings.blogspot.com/2009/09/kickfires-approach-to-parallelism.html" onclick="javascript:pageTracker._trackPageview('/dbmsmusings.blogspot.com');">Daniel Abadi</a>, <a href="http://mervadrian.wordpress.com/2009/10/06/kickfire-disrupts-dw-economics-targets-mainstream-adbms-opportunities/" onclick="javascript:pageTracker._trackPageview('/mervadrian.wordpress.com');">Merv Adrian</a> and, kicking things off &#8212; as it were &#8212; <a href="http://www.dbms2.com/2009/08/21/kickfires-fpga-based-technical-strategy/" >me</a>. Weeks after a recent Kickfire product release, there&#8217;s finally a fairly accurate <a href="http://www.kickfire.com/media/Datasheet_200910.pdf" onclick="javascript:pageTracker._trackPageview('/www.kickfire.com');">data sheet</a> up, although there&#8217;s still one self-defeatingly misleading line I&#8217;ll comment on below. Pricing is a whole other area of confusion, although it seems that current list prices have been inadvertently* leaked in Merv&#8217;s post linked above, with only one inaccuracy that I can detect.**</p>
<p style="margin-bottom: 0in;"><em>*I gather from the company that they forgot to tell Merv pricing was NDA. </em></p>
<p style="margin-bottom: 0in;"><em>** Merv cited a price as “starting” that I believe to be top-of-the-line. No criticism of Merv is implied in that; Kickfire has not been very clear in communicating hard numbers.</em></p>
<p style="margin-bottom: 0in; font-style: normal;">All that said, if one takes Kickfire&#8217;s marketing statements literally, Kickfire list pricing is around <strong>$20-50K per terabyte for a few small, fixed, high-performance configurations.</strong><span> That&#8217;s all-in, for plug-and-play appliances.  What&#8217;s more, that range is based on the actual published user data capacity numbers for various Kickfire models, which I think are low for several reasons:</span></p>
<ul>
<li><span>Kickfire 	doesn&#8217;t officially admit that its model with 14.4 terabytes of disk 	can manage more than 6 terabytes of data, even though it clearly 	can. </span></li>
<li><span>Actually, 	those 14.4 terabytes of disk can be increased or lowered as you 	choose.</span></li>
<li><span>The basic 	compression figures implied in those calculations seem conservative.</span></li>
<li><span>The compression figures are more conservative still, in that Kickfire assumes you&#8217;ll have a lot of actual indexes on your data. I&#8217;m not sure that&#8217;s necessary for most workloads.</span></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/18/kickfire-capacity-and-pricing/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Greenplum is going hybrid columnar as well</title>
		<link>http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/</link>
		<comments>http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/#comments</comments>
		<pubDate>Wed, 14 Oct 2009 05:36:25 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1083</guid>
		<description><![CDATA[Over 	the past summer, Vertica, VectorWise, and Oracle all announced flavors of hybrid row/columnar storage. Now 	it&#8217;s Greenplum&#8217;s turn.  Greenplum 	is actually offering true columnar storage, as opposed to Oracle&#8217;s 	PAX-like scheme &#8212; and also as opposed to the kind of Frankencolumn 	storage Daniel Abadi decries. For example, you don&#8217;t have to do 	a [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><span style="font-style: normal;">Over 	the past summer, <a href="../2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/">Vertica, VectorWise</a>, and <a href="../2009/09/03/oracle-11g-exadata-hybrid-columnar-compression/">Oracle</a> all an</span>nounced flavors of hybrid row/columnar storage. Now 	it&#8217;s Greenplum&#8217;s turn.  <span style="font-style: normal;">Greenplum 	is actually offering true columnar storage, as opposed to Oracle&#8217;s 	PAX-like scheme &#8212; and also as opposed to the kind of <a href="http://databasecolumn.vertica.com/2008/07/debunking-another-myth-columns.html" onclick="javascript:pageTracker._trackPageview('/databasecolumn.vertica.com');">Frankencolumn 	storage</a> Daniel Abadi decries. For example, you don&#8217;t have to do 	a join to retrieve multiple columns; you just ask for them and there 	they are. Similarly, Greenplum doesn&#8217;t maintain explicit row IDs – 	whether in row-oriented or column-oriented append-only storage – 	relying instead on block-level header information. <span id="more-1083"></span></span></p>
<p style="margin-bottom: 0in;">Highlights include:</p>
<ul>
<li>Column orientation is a special 	case of what Greenplum is calling <em>Polymorphic Data Storage.*</em><span style="font-style: normal;"> </span></li>
<li>As per product management chief 	Ben Werther&#8217;s bl<span style="font-style: normal;">og post, what 	<a href="http://www.greenplum.com/news/250/231/Beyond-Rows-and-Columns-Greenplum-s-Polymorphic-Data-Storage----Part-2/" onclick="javascript:pageTracker._trackPageview('/www.greenplum.com');">Greenplum&#8217;s 	polymorphic data storage</a> boils down to is that you can store 	different</span> tables in different storage paradigms. This is 	transparent to the SQL or any other API; it&#8217;s just a performance 	choice.</li>
<li>Indeed, Greenplum lets you store different partitions of the same table in different storage and/or compression schemes. So Greenplum now has a kind of ILM (Information Lifecycle Management) story, although it doesn&#8217;t offer the faster vs. cheaper storage media differentiation options of <a href="../2009/08/25/sybase-iq-technical-highlights/">Sybase IQ</a> or Vertica.</li>
<li><span style="font-style: normal;">Greenplum 	now has, depending on how one counts, three or four main types of 	table:</span>
<ul>
<li><span style="font-style: normal;">Traditional 	PostgreSQL, which has been available since Day One<br />
</span></li>
<li><span style="font-style: normal;">Row-oriented 	append-only (compressible and scan-optimized), available since 	Greenplum 3.2 (July, 2008)</span></li>
<li><span style="font-style: normal;">Columnar 	append-only (new in Greenplum 3.3.4, shipping now)</span></li>
<li><span style="font-style: normal;">External, 	in which Greenplum treats something external – in a relational 	DBMS or otherwise – as if it were a Greenplum table</span></li>
</ul>
</li>
<li><span style="font-style: normal;">Greenplum 	offers multiple versions of LZ (Lempel-Ziv) and gzip compression, 	any of which you can choose on a table-by-table or 	partition-by-partition basis. </span></li>
<li><span style="font-style: normal;">Greenplum 	offers the same compression algorithms for both row-oriented and 	column-oriented tables.</span></li>
<li><span style="font-style: normal;">Greenplum 	says that compression is typically at least 50% better (i.e., to 2/3 	as much space) in columnar vs. row storage, for the same algorithm. </span></li>
<li><span style="font-style: normal;">Just 	as it doesn&#8217;t offer columnar-specific compression algorithms, 	Greenplum also doesn&#8217;t sport other columnar features Daniel loves, 	such as <a href="http://databasecolumn.vertica.com/2008/12/debunking_yet_another_myth_col.html" onclick="javascript:pageTracker._trackPageview('/databasecolumn.vertica.com');">in-memory 	compression or late materialization</a>. (But then, <a href="../2009/08/04/vectorwise-ingres-and-monetdb/">VectorWise 	doesn&#8217;t do in-memory compression either</a>, and <a href="http://dbmsmusings.blogspot.com/2009/07/watch-out-for-vectorwise.html" onclick="javascript:pageTracker._trackPageview('/dbmsmusings.blogspot.com');">Daniel 	likes VectorWise</a>.)</span></li>
<li><span style="font-style: normal;">All 	the Greenplum choices I&#8217;ve mentioned have to be made manually by 	DBAs.</span></li>
<li><span style="font-style: normal;">Similarly, 	I doubt Greenplum can match Vertica&#8217;s engineering for getting 	updates and trickle feeds quickly into a column store – a 	traditional columnar Achilles heel that Vertica has invested a lot 	of effort to circumvent.</span></li>
</ul>
<p style="margin-bottom: 0in;"><em>*The term “polymorphic” is somewhat, shall we say, overloaded these days.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Oracle and Vertica on compression and other physical data layout features</title>
		<link>http://www.dbms2.com/2009/10/06/oracle-and-vertica-on-compression-and-other-physical-data-layout-features/</link>
		<comments>http://www.dbms2.com/2009/10/06/oracle-and-vertica-on-compression-and-other-physical-data-layout-features/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 12:18:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1042</guid>
		<description><![CDATA[In my recent post on Exadata pricing, I highlighted the importance of Oracle&#8217;s compression figures to the discussion, and the uncertainty about same. This led to a Twitter discussion featuring Greg Rahn* of Oracle and Dave Menninger and Omer Trajman of Vertica.  I also followed up with Omer on the phone.
*Guys like Greg Rahn and [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">In my recent post on <a href="http://www.dbms2.com/2009/10/05/oracle-exadata-2-capacity-pricing/" >Exadata pricing</a>, I highlighted the importance of Oracle&#8217;s compression figures to the discussion, and the uncertainty about same. This led to a Twitter discussion featuring <a href="http://twitter.com/GregRahn" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Greg Rahn</a>* of Oracle and <a href="http://twitter.com/dmenninger" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Dave Menninger</a> and <a href="http://twitter.com/otrajman" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Omer Trajman</a> of Vertica.  I also followed up with Omer on the phone.<span id="more-1042"></span></p>
<p style="margin-bottom: 0in;"><em>*Guys like Greg Rahn and Kevin Closson are huge assets to Oracle, which is absurdly and self-defeatingly unhelpful through conventional public/analyst relations channels.<br />
</em>
</p>
<p style="margin-bottom: 0in; font-style: normal;"><a href="http://twitter.com/GregRahn/status/4611513531" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Six</a> <a href="http://twitter.com/GregRahn/status/4612142101" onclick="javascript:pageTracker._trackPageview('/twitter.com');">key</a> <a href="http://twitter.com/GregRahn/status/4612190133" onclick="javascript:pageTracker._trackPageview('/twitter.com');">tweets</a> <a href="http://twitter.com/GregRahn/status/4612253629" onclick="javascript:pageTracker._trackPageview('/twitter.com');">by</a> <a href="http://twitter.com/GregRahn/status/4612966887" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Greg</a> <a href="http://twitter.com/GregRahn/status/4613110620" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Rahn</a> said:</p>
<blockquote>
<p style="margin-bottom: 0in; font-style: normal;">I think the HCC 10x compression is a slideware (common) number. Personally I&#8217;ve seen it in the 12-17x range on customer data&#8230;</p>
<p style="margin-bottom: 0in; font-style: normal;">This was on a dimensional model. Can&#8217;t speak to the specific industry. I do believe Oracle is working on getting industry #s.</p>
<p style="margin-bottom: 0in; font-style: normal;">As far as I know, Exadata HCC uses a superset of compression algorithms that the commonly known column stores use&#8230;</p>
<p style="margin-bottom: 0in; font-style: normal;">&#8230;and it doesn&#8217;t require the compression type be in the DDL like Vertica or ParAccel. It figures out the best algo to apply.</p>
<p style="margin-bottom: 0in; font-style: normal;">The compression number I quoted is sizeof(uncompressed)/sizeof(hcc compressed). No indexes were used in this case.</p>
<p style="margin-bottom: 0in; font-style: normal;">Exadata HCC is applicable for bulk loaded (fact table) data, so a significant portion (size wise) of most DWs.</p>
</blockquote>
<p style="margin-bottom: 0in; font-style: normal;">Summing up, that seems to say:</p>
<ul>
<li>Oracle claims 	12-17X compression on a kind of data similar to that on which 	Vertica &#8212; which also uses 10X as a single-point overall compression 	marketing estimate where needed &#8212; claims 20X.</li>
<li>Oracle selects 	compression algorithms automagically.</li>
<li>Oracle&#8217;s 	compression doesn&#8217;t quite apply to all the data. Actually, this may 	be more of an issue for the caching benefits of compression than for 	the I/O or disk storage gains. (If you join a retail transaction 	fact table to a customer dimension table, and you have a lot of 	customers, fitting the uncompressed customer table into RAM could be 	problematic.)</li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;">Omer and I happened to have a call scheduled to discuss MapReduce yesterday evening, but wound up using most of the time to talk about Vertica&#8217;s compression and physical layout features instead. Highlights included:</p>
<ul>
<li>Greg, like 	many Vertica competitors, was wrong about Vertica requiring manual, 	low-level DDL (Data Description Language) for &#8212; well, for much of 	anything. Vertica does all that automatically, at least in theory, 	and suggests that in real life you can indeed often get by without 	manual intervention.</li>
<li>Vertica can do trickle feeds into its compressed columnar storage. Greg seemed to suggest Oracle Exadata cannot. (However, I won&#8217;t be surprised if, when his comments are expanded to more than 140 characters, he winds up saying the opposite. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  )</li>
<li>Omer 	characterized the lowest latency with which you can get data into 	Vertica and have it be available for query as &#8220;seconds&#8221;, 	vs. &#8220;minutes&#8221; for other columnar vendors.</li>
<li>Vertica 	recommends often keeping multiple copies of a column, for high 	availability and/or performance. This is not directly reflected in 	compression estimates.  In particular, if you&#8217;re going to keep 	redundant copies of data for data-safety reasons anyway, Vertica 	recommends that you:
<ul>
<li>Run queries 	against more than one copy of the data, for performance/throughput.</li>
<li>Store 	different copies of the columns in different sort orders &#8212; e.g., 	according to different likely join keys &#8212; so that the copies are 	optimized for performance on different classes of queries.</li>
</ul>
</li>
<li>Vertica 	doesn&#8217;t have indexes.</li>
<li>Vertica sorts 	columns on ingest. This sorting is, of course, commonly based on 	attributes from columns other than the one being sorted. Even so, 	Omer maintains that sorting helps compression, because of the 	correlation between columns. Examples (and I didn&#8217;t get these all 	from him) might include:
<ul>
<li>City/postal 	code</li>
<li>Customer_ID/store 	location</li>
<li>Customer_ID/product_ID</li>
<li>Product_ID/price</li>
</ul>
</li>
<li>Vertica, based 	on the recent introduction of <a href="../2009/08/25/sybase-iq-technical-highlights/">FlexStore</a>, 	has an ILM (Information Lifecycle Management) story much like <a href="../2009/08/25/sybase-iq-technical-highlights/">Sybase 	IQ&#8217;s</a>. E.g., you can keep different data ranges for different 	columns on fast storage, while the rest of the data is relegated to 	slower/cheaper equipment.</li>
</ul>
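<p style="margin-bottom: 0in; font-style: normal;">The claim that sorting on one column helps compress a correlated column can be illustrated with a toy run-length encoding. This is a sketch with made-up city/postal code rows, and run-length encoding is used only as a stand-in; it is not a description of Vertica&#8217;s actual encodings.</p>

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


# Hypothetical rows of (city, postal_code); the two columns are correlated.
rows = [
    ("Boston", "02108"), ("Austin", "73301"), ("Boston", "02108"),
    ("Austin", "73301"), ("Boston", "02109"), ("Austin", "73301"),
]


def postal_runs(rows):
    """Run-length encoding of the postal_code column in the given row order."""
    return run_length_encode([postal for _city, postal in rows])


unsorted_runs = postal_runs(rows)
sorted_runs = postal_runs(sorted(rows, key=lambda row: row[0]))  # sort by city only

if __name__ == "__main__":
    print("runs in arrival order:", len(unsorted_runs))
    print("runs after sorting by city:", len(sorted_runs))
```

<p style="margin-bottom: 0in; font-style: normal;">Because postal codes cluster within a city, ordering the rows by city alone already groups equal postal codes together, so the postal code column needs fewer runs even though it was never the sort key.</p>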
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/06/oracle-and-vertica-on-compression-and-other-physical-data-layout-features/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
	</channel>
</rss>

