<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS2 -- DataBase Management System Services &#187; Database compression</title>
	<atom:link href="http://www.dbms2.com/category/database-theory-practice/database-compression/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Sun, 14 Mar 2010 23:24:45 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>There sure seem to be a lot of inaccuracies on ParAccel&#8217;s website</title>
		<link>http://www.dbms2.com/2010/01/15/there-sure-seem-to-be-a-lot-of-inaccuracies-on-paraccels-website/</link>
		<comments>http://www.dbms2.com/2010/01/15/there-sure-seem-to-be-a-lot-of-inaccuracies-on-paraccels-website/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 04:47:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Telecommunications]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1393</guid>
		<description><![CDATA[In what is actually an interesting post on database compression, ParAccel CTO Barry Zane threw in
Anyone who has met with us knows ParAccel shies away from hype.
But like many things ParAccel says, that is not true.
The latest whoppers came in the form of several customers ParAccel listed on its website who hadn&#8217;t actually bought ParAccel&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>In what is actually an <a href="http://paraccel.com/data_warehouse_blog/?p=192" onclick="javascript:pageTracker._trackPageview('/paraccel.com');">interesting post on database compression</a>, ParAccel CTO Barry Zane threw in</p>
<blockquote><p>Anyone who has met with us knows ParAccel shies away from hype.</p></blockquote>
<p>But like many things ParAccel says, that is not true.</p>
<p>The latest whoppers came in the form of several customers ParAccel listed on its website who hadn&#8217;t actually bought ParAccel&#8217;s DBMS, nor even decided to do so. It is fairly common to claim a customer win, then retract the claim due to lack of permission to disclose. But that&#8217;s not what happened in these cases. Based on emails helpfully shared by a ParAccel competitor competing in some of those accounts, it seems clear that <strong>ParAccel actually posted fabricated claims of customer wins.</strong> <span id="more-1393"></span></p>
<p>Another thing that was both technically and substantively false was ParAccel&#8217;s claim to be <a href="http://www.dbms2.com/2009/09/30/facts-and-rumors/" >CERTIFIED price-performance leader</a>. Obviously, this was meant to give the impression that ParAccel had been &#8220;certified&#8221; as the leader in price/performance, when the closest thing to that that was remotely true was that ParAccel had a leading position in the category of &#8220;price/performance measurements that happen to have a certification process.&#8221; At least, that was true for a short time; then ParAccel&#8217;s certification was found to have been erroneous, and got revoked, which did not however inspire ParAccel to immediately take the claim off the front page of its website.</p>
<p>ParAccel&#8217;s website also reflects a lot of praise from flagship customer LatiNode. What it perhaps understandably neglects to mention is that LatiNode is in a <a href="http://www.pepperlaw.com/publications_update.aspx?ArticleKey=1651" onclick="javascript:pageTracker._trackPageview('/www.pepperlaw.com');">dormant state</a>, placed there by acquirer Elandia due to LatiNode&#8217;s criminally corrupt customer acquisition practices.</p>
<p>I also don&#8217;t believe ParAccel&#8217;s endlessly-repeated claim that it has never lost a benchmark on performance. However, I must in fairness note that while I&#8217;ve been given names of customers who are supposed counterexamples to this claim by somebody I trust, I&#8217;ve never been able to actually verify those supposed ParAccel losses.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/15/there-sure-seem-to-be-a-lot-of-inaccuracies-on-paraccels-website/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
		<item>
		<title>Calpont&#8217;s InfiniDB</title>
		<link>http://www.dbms2.com/2009/11/07/calponts-infinidb/</link>
		<comments>http://www.dbms2.com/2009/11/07/calponts-infinidb/#comments</comments>
		<pubDate>Sun, 08 Nov 2009 01:35:25 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Calpont]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open source]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1207</guid>
		<description><![CDATA[Since its inception, Calpont has gone through multiple management teams, strategies, and investor groups. What it hadn&#8217;t done, ever, was actually ship a product. Last week, however, Calpont introduced a free/open source DBMS, InfiniDB, with technical details somewhat reminiscent of what Calpont was promising last April. Highlights include:

Like Infobright, Calpont&#8217;s 	InfiniDB is a columnar DBMS [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Since its inception, Calpont has gone through multiple management teams, strategies, and investor groups. What it hadn&#8217;t done, ever, was actually ship a product. Last week, however, Calpont introduced a free/open source DBMS, InfiniDB, with technical details somewhat reminiscent of <a href="../2009/04/20/calpont-update-you-read-it-here-first/">what Calpont was promising last April</a>. Highlights include:</p>
<ul>
<li>Like Infobright, Calpont&#8217;s 	InfiniDB is a columnar DBMS consisting of a MySQL front end and a 	columnar storage engine.</li>
<li>Community edition InfiniDB runs on 	a single server.</li>
<li>One of commercial/enterprise 	edition InfiniDB&#8217;s main claims to fame will be MPP support.</li>
<li>There&#8217;s no announced time frame 	for commercial edition InfiniDB.</li>
<li>InfiniDB&#8217;s current compression 	story is dictionary/token only, with decompression occurring  before 	joins are executed. Improvement is a roadmap item.</li>
<li>Indeed, InfiniDB has many roadmap 	items, a few of which can be found <a href="http://infinidb.org/resources/tech-articles/120-infinidb-community-edition-roadmap" onclick="javascript:pageTracker._trackPageview('/infinidb.org');">here</a>. 	Also, a great overview of InfiniDB&#8217;s current state and roadmap can 	be found in <a href="http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/" onclick="javascript:pageTracker._trackPageview('/www.mysqlperformanceblog.com');">this 	MySQL Performance Blog</a> thread. (And follow the links there to 	find performance discussions of other free analytic DBMS.)</li>
<li>One thing InfiniDB already has 	that is still a roadmap item for Infobright is the ability to run a 	query across multiple cores at once.</li>
<li>One thing free InfiniDB has that 	Infobright only offers in its Enterprise Edition is ACID-compliant 	Insert/Update/Delete. <em>(Note: I wish people would stop saying that Infobright Enterprise Edition isn&#8217;t ACID-compliant, since that point was cleared up <a href="http://www.dbms2.com/2009/04/20/infobright-update-3/" >a while ago</a>.)</em></li>
<li>InfiniDB has no indexes or 	materialized views.</li>
<li>However, InfiniDB&#8217;s retrieval is 	expedited by something called “Extents,” which sounds a lot like 	Netezza&#8217;s zone maps.</li>
</ul>
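<p>For readers who haven&#8217;t run into zone maps: the general idea is to keep a small min/max summary for each block of a column and to skip any block whose range cannot satisfy the query predicate. The Python sketch below illustrates that generic technique only. It is not InfiniDB&#8217;s or Netezza&#8217;s actual implementation, and the names <code>build_zone_map</code> and <code>blocks_to_scan</code>, the block size, and the sample data are all made up for illustration.</p>

```python
def build_zone_map(values, block_size):
    """Return one (min, max) summary per fixed-size block of the column."""
    zone_map = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        zone_map.append((min(block), max(block)))
    return zone_map


def blocks_to_scan(zone_map, low, high):
    """Indexes of blocks whose [min, max] range overlaps [low, high]."""
    return [i for i, (lo, hi) in enumerate(zone_map)
            if not (hi < low or lo > high)]


# Hypothetical, roughly time-ordered column split into blocks of 4 values.
order_dates = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9]
zone_map = build_zone_map(order_dates, block_size=4)

# A range predicate of 5..6 only needs the middle block.
candidates = blocks_to_scan(zone_map, low=5, high=6)
print(zone_map)    # [(1, 3), (4, 6), (7, 9)]
print(candidates)  # [1]
```

<p>With plain min/max summaries like these, a skipped block can never contain a match, but a block that survives the check may still contain none, so the rows in it still have to be tested. The approach pays off most when the data is at least roughly ordered on the filtered column.</p>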
<p><em>Being on vacation, I&#8217;ll stop there for now. (If it weren&#8217;t for Tropical Storm/ depression Ida, I might not even be posting this much until I get back.)</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/11/07/calponts-infinidb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introduction to SenSage</title>
		<link>http://www.dbms2.com/2009/10/18/introduction-to-sensage/</link>
		<comments>http://www.dbms2.com/2009/10/18/introduction-to-sensage/#comments</comments>
		<pubDate>Sun, 18 Oct 2009 16:02:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[SenSage]]></category>
		<category><![CDATA[Telecommunications]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1115</guid>
		<description><![CDATA[I visited with SenSage on my two most recent trips to San Francisco. Both visits were, through no fault of SenSage&#8217;s, hasty.  Still, I think I have enough of a handle on SenSage basics to be worth writing up.
General SenSage highlights include:


SenSage used to be known as 	Addamark.
SenSage used to characterize 	itself as being [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I visited with SenSage on my two most recent trips to San Francisco. Both visits were, through no fault of SenSage&#8217;s, hasty.  Still, I think I have enough of a handle on SenSage basics to be worth writing up.</p>
<p style="margin-bottom: 0in;">General SenSage highlights include:</p>
<p><span id="more-1115"></span></p>
<ul>
<li>SenSage used to be known as 	Addamark.</li>
<li>SenSage used to characterize 	itself as being in the Security Information Management (SIM) market.</li>
<li>Now SenSage characterizes itself 	(approximately) as selling technology built around a columnar DBMS 	that happens to be pretty good at log analysis, compliance, and/or 	archiving.</li>
<li>More concisely, SenSage says it is 	in the <a href="http://sensage.com/company/index.php" onclick="javascript:pageTracker._trackPageview('/sensage.com');">event data 	warehouse</a> category.  (The same could arguably be said of 	<a href="http://www.dbms2.com/?p=1119" >Splunk</a>.)</li>
<li>SenSage says it has &gt;400 paying 	customers, of which ~200 are direct.</li>
<li>SenSage has &gt;120 employees and, 	like Splunk, is profitable.</li>
<li>SenSage has enjoyed &gt;50% annual 	revenue growth the past four years.</li>
<li>Some SenSage deals are in the 	multiple-million dollar range.</li>
<li>A major SenSage channel partner – dozens of installations – is SAP, which resells SenSage software on HP hardware as a “Compliance Log Warehouse.”</li>
<li>A hot market for SenSage is CDRs 	(Call Detail Records).</li>
<li>SenSage says that, among analytic 	DBMS vendors, it competes with Oracle, IBM, Teradata, Netezza and, 	to some extent, Vertica and Greenplum.</li>
</ul>
<p>Technical SenSage highlights include:</p>
<ul>
<li>SenSage&#8217;s core technology is an 	append-only columnar DBMS, with no master node.</li>
<li>SenSage&#8217;s DBMS uses no indexes and 	requires “no” database administration.</li>
<li>SenSage&#8217;s database is 	range-partitioned, with the range-partition key always being time.</li>
<li>SenSage has something it calls SQO 	(Sparse Query Optimization), which sounds a lot like Netezza zone 	maps. SQO never yields a false negative on whether data is in a 	block, never yields a false positive on equality predicates, and 	only rarely yields a false positive on range predicates.</li>
<li>SenSage&#8217;s database uses large 	block sizes – typically 250,000 records/block, at 200-250 bytes 	per record.  (That&#8217;s in the range of 64 megabytes/block.)</li>
<li>SenSage says its software can load 10,000-50,000 records/second/node. If I&#8217;m doing the arithmetic correctly, that&#8217;s roughly 7-40 gigabytes/node/hour.</li>
<li>SenSage collects log data into its 	event data warehouse in what it characterizes as an agentless 	manner. Even so, it seems that for a majority of kinds of data 	sources one does have to write custom agents. The two other ways to 	get data into SenSage – and presumably most of the data volume 	comes through these – are:
<ul>
<li>File transfer in the usual way</li>
<li>syslog</li>
</ul>
</li>
<li>SenSage says its software can read 	100s of data sources, and that this is a huge competitive advantage. 	I&#8217;m not totally sure how that jibes with the prior point.</li>
<li>SenSage says it gets 5X 	compression on CDR data, 10-20X on other kinds of logs. That&#8217;s not 	too far off from <a href="../2008/09/24/vertica-finally-spells-out-its-compression-claims/">Vertica&#8217;s 	compression figures</a>.</li>
<li>SenSage says that it has 	datatype-aware compression as well as more standard stuff, with 	VARCHAR compressing particularly well.</li>
<li>In particular, SenSage uses both 	dictionary/token and delta compression.</li>
<li>SenSage&#8217;s software is pretty 	agnostic with respect to storage kind – DAS (Direct Attached 	Storage), SAN (Storage-Area Network), or content-addressable. In 	particular, there&#8217;s only about a 4% performance hit for using 	content-addressable storage.</li>
<li>When using WORM (Write Once Read 	Many) storage like EMC&#8217;s Centera, SenSage leaves record locator 	information behind on ordinary storage and otherwise queries the 	WORM storage just like it queries anything else.</li>
<li>SenSage says it has been using 	MapReduce since “Day 1”.</li>
<li>Probably not coincidentally, you 	can use Perl and other aggregates in SenSage SQL statements.</li>
<li>Perhaps also not coincidentally, 	SenSage says it has a number of advanced built-in analytic 	functions, including some focused on sessionization.</li>
</ul>
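<p>The block-size and load-rate arithmetic in the list above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch under stated assumptions: decimal megabytes and gigabytes, the load rate taken as 10,000 to 50,000 records per second per node, and the same 200-250 bytes per record applied to both calculations. The constant names are mine, not SenSage&#8217;s.</p>

```python
RECORDS_PER_BLOCK = 250_000
BYTES_PER_RECORD = (200, 250)          # low and high record sizes
RECORDS_PER_SECOND = (10_000, 50_000)  # low and high load rates, per node

MB = 10 ** 6  # decimal megabyte (assumption)
GB = 10 ** 9  # decimal gigabyte (assumption)

# Block size range in megabytes.
block_mb = [RECORDS_PER_BLOCK * b / MB for b in BYTES_PER_RECORD]

# Load rate range in gigabytes per node per hour:
# slowest rate with smallest records, fastest rate with largest records.
slowest = RECORDS_PER_SECOND[0] * BYTES_PER_RECORD[0] * 3600 / GB
fastest = RECORDS_PER_SECOND[1] * BYTES_PER_RECORD[1] * 3600 / GB

print(block_mb)          # [50.0, 62.5]
print(slowest, fastest)  # 7.2 45.0
```

<p>So blocks come out at 50-62.5 megabytes, and the load rate at about 7 gigabytes/node/hour on the low end and 36-45 on the high end, depending on which record size is assumed.</p>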
<p style="margin-bottom: 0in;">In addition to all that, SenSage offers a built-in event processing engine, consisting of:</p>
<ul>
<li>A finite-state machine correlation 	engine.</li>
<li>A proprietary event processing 	language.</li>
<li>A GUI to “abstract” (i.e., 	generate?) the event processing language.</li>
</ul>
<p style="margin-bottom: 0in;">The SenSage event processing engine is used to generate alerts. Data that comes into SenSage actually is passed to two places at once, namely to both the event processing engine and the database itself.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/18/introduction-to-sensage/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Kickfire capacity and pricing</title>
		<link>http://www.dbms2.com/2009/10/18/kickfire-capacity-and-pricing/</link>
		<comments>http://www.dbms2.com/2009/10/18/kickfire-capacity-and-pricing/#comments</comments>
		<pubDate>Sun, 18 Oct 2009 09:16:14 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[Pricing]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1109</guid>
		<description><![CDATA[Kickfire&#8217;s marketing communication efforts are still a work in progress. Kickfire did finally relax its secrecy about FPGA-vs.-custom-silicon – not coincidentally during Netezza&#8217;s recent publicity cycle. That wise choice helped Kickfire get some favorable attention recently for its technical and market strategy, e.g. from Daniel Abadi, Merv Adrian and, kicking things off &#8212; as it [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Kickfire&#8217;s marketing communication efforts are still a work in progress. Kickfire did finally relax its secrecy about FPGA-vs.-custom-silicon – not coincidentally during Netezza&#8217;s recent publicity cycle. That wise choice helped Kickfire get some favorable attention recently for its technical and market strategy, e.g. from <a href="http://dbmsmusings.blogspot.com/2009/09/kickfires-approach-to-parallelism.html" onclick="javascript:pageTracker._trackPageview('/dbmsmusings.blogspot.com');">Daniel Abadi</a>, <a href="http://mervadrian.wordpress.com/2009/10/06/kickfire-disrupts-dw-economics-targets-mainstream-adbms-opportunities/" onclick="javascript:pageTracker._trackPageview('/mervadrian.wordpress.com');">Merv Adrian</a> and, kicking things off &#8212; as it were &#8212; <a href="http://www.dbms2.com/2009/08/21/kickfires-fpga-based-technical-strategy/" >me</a>. Weeks after a recent Kickfire product release, there&#8217;s finally a fairly accurate <a href="http://www.kickfire.com/media/Datasheet_200910.pdf" onclick="javascript:pageTracker._trackPageview('/www.kickfire.com');">data sheet</a> up, although there&#8217;s still one self-defeatingly misleading line I&#8217;ll comment on below. Pricing is a whole other area of confusion, although it seems that current list prices have been inadvertently* leaked in Merv&#8217;s post linked above, with only one inaccuracy that I can detect.**</p>
<p style="margin-bottom: 0in;"><em>*I gather from the company that they forgot to tell Merv pricing was NDA. </em></p>
<p style="margin-bottom: 0in;"><em>** Merv cited a price as “starting” that I believe to be top-of-the-line. No criticism of Merv is implied in that; Kickfire has not been very clear in communicating hard numbers.</em></p>
<p style="margin-bottom: 0in; font-style: normal;">All that said, if one takes Kickfire&#8217;s marketing statements literally, Kickfire list pricing is around <strong>$20-50K per terabyte for a few small, fixed, high-performance configurations.</strong><span> That&#8217;s all-in, for plug-and-play appliances.  What&#8217;s more, that range is based on the actual published user data capacity numbers for various Kickfire models, which I think are low for several reasons:</span></p>
<ul>
<li><span>Kickfire 	doesn&#8217;t officially admit that its model with 14.4 terabytes of disk 	can manage more than 6 terabytes of data, even though it clearly 	can. </span></li>
<li><span>Actually, 	those 14.4 terabytes of disk can be increased or lowered as you 	choose.</span></li>
<li><span>The basic 	compression figures implied in those calculations seem conservative.</span></li>
<li><span>Compression 	figures are a lot more conservative yet, in that Kickfire assumes 	you&#8217;ll have a lot of actual indexes on your data. I&#8217;m not sure 	that&#8217;s necessary for most workloads.</span></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/18/kickfire-capacity-and-pricing/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Greenplum is going hybrid columnar as well</title>
		<link>http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/</link>
		<comments>http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/#comments</comments>
		<pubDate>Wed, 14 Oct 2009 05:36:25 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1083</guid>
		<description><![CDATA[Over 	the past summer, Vertica, VectorWise, and Oracle all announced flavors of hybrid row/columnar storage. Now 	it&#8217;s Greenplum&#8217;s turn.  Greenplum 	is actually offering true columnar storage, as opposed to Oracle&#8217;s 	PAX-like scheme &#8212; and also as opposed to the kind of Frankencolumn 	storage Daniel Abadi decries. For example, you don&#8217;t have to do 	a [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><span style="font-style: normal;">Over 	the past summer, <a href="../2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/">Vertica, VectorWise</a>, and <a href="../2009/09/03/oracle-11g-exadata-hybrid-columnar-compression/">Oracle</a> all an</span>nounced flavors of hybrid row/columnar storage. Now 	it&#8217;s Greenplum&#8217;s turn.  <span style="font-style: normal;">Greenplum 	is actually offering true columnar storage, as opposed to Oracle&#8217;s 	PAX-like scheme &#8212; and also as opposed to the kind of <a href="http://databasecolumn.vertica.com/2008/07/debunking-another-myth-columns.html" onclick="javascript:pageTracker._trackPageview('/databasecolumn.vertica.com');">Frankencolumn 	storage</a> Daniel Abadi decries. For example, you don&#8217;t have to do 	a join to retrieve multiple columns; you just ask for them and there 	they are. Similarly, Greenplum doesn&#8217;t maintain explicit row IDs – 	whether in row-oriented or column-oriented append-only storage – 	relying instead on block-level header information. <span id="more-1083"></span></span></p>
<p style="margin-bottom: 0in;">Highlights include:</p>
<ul>
<li>Column orientation is a special 	case of what Greenplum is calling <em>Polymorphic Data Storage.*</em><span style="font-style: normal;"> </span></li>
<li>As per product management chief 	Ben Werther&#8217;s bl<span style="font-style: normal;">og post, what 	<a href="http://www.greenplum.com/news/250/231/Beyond-Rows-and-Columns-Greenplum-s-Polymorphic-Data-Storage----Part-2/" onclick="javascript:pageTracker._trackPageview('/www.greenplum.com');">Greenplum&#8217;s 	polymorphic data storage</a> boils down to is that you can store 	different</span> tables in different storage paradigms. This is 	transparent to the SQL or any other API; it&#8217;s just a performance 	choice.</li>
<li>Indeed, Greenplum lets you store different partitions of the same table in different storage and/or compression schemes. So Greenplum now has a kind of ILM (Information Lifecycle Management) story, although it doesn&#8217;t offer the faster vs. cheaper storage media differentiation options of <a href="../2009/08/25/sybase-iq-technical-highlights/">Sybase IQ</a> or Vertica.</li>
<li><span style="font-style: normal;">Greenplum 	now has, depending on how one counts, three or four main types of 	table:</span>
<ul>
<li><span style="font-style: normal;">Traditional 	PostgreSQL, which has been available since Day One<br />
</span></li>
<li><span style="font-style: normal;">Row-oriented 	append-only (compressible and scan-optimized), available since 	Greenplum 3.2 (July, 2008)</span></li>
<li><span style="font-style: normal;">Columnar 	append-only (new in Greenplum 3.3.4, shipping now)</span></li>
<li><span style="font-style: normal;">External, 	in which Greenplum treats something external – in a relational 	DBMS or otherwise – as if it were a Greenplum table</span></li>
</ul>
</li>
<li><span style="font-style: normal;">Greenplum 	offers multiple versions of LZ (Lempel-Ziv) and gzip compression, 	any of which you can choose on a table-by-table or 	partition-by-partition basis. </span></li>
<li><span style="font-style: normal;">Greenplum 	offers the same compression algorithms for both row-oriented and 	column-oriented tables.</span></li>
<li><span style="font-style: normal;">Greenplum 	says that compression is typically at least 50% better (i.e., to 2/3 	as much space) in columnar vs. row storage, for the same algorithm. </span></li>
<li><span style="font-style: normal;">Just 	as it doesn&#8217;t offer columnar-specific compression algorithms, 	Greenplum also doesn&#8217;t sport other columnar features Daniel loves, 	such as <a href="http://databasecolumn.vertica.com/2008/12/debunking_yet_another_myth_col.html" onclick="javascript:pageTracker._trackPageview('/databasecolumn.vertica.com');">in-memory 	compression or late materialization</a>. (But then, <a href="../2009/08/04/vectorwise-ingres-and-monetdb/">VectorWise 	doesn&#8217;t do in-memory compression either</a>, and <a href="http://dbmsmusings.blogspot.com/2009/07/watch-out-for-vectorwise.html" onclick="javascript:pageTracker._trackPageview('/dbmsmusings.blogspot.com');">Daniel 	likes VectorWise</a>.)</span></li>
<li><span style="font-style: normal;">All 	the Greenplum choices I&#8217;ve mentioned have to be made manually by 	DBAs.</span></li>
<li><span style="font-style: normal;">Similarly, 	I doubt Greenplum can match Vertica&#8217;s engineering for getting 	updates and trickle feeds quickly into a column store – a 	traditional columnar Achilles heel that Vertica has invested a lot 	of effort to circumvent.</span></li>
</ul>
<p style="margin-bottom: 0in;"><em>*The term “polymorphic” is somewhat, shall we say, overloaded these days.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Oracle and Vertica on compression and other physical data layout features</title>
		<link>http://www.dbms2.com/2009/10/06/oracle-and-vertica-on-compression-and-other-physical-data-layout-features/</link>
		<comments>http://www.dbms2.com/2009/10/06/oracle-and-vertica-on-compression-and-other-physical-data-layout-features/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 12:18:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1042</guid>
		<description><![CDATA[In my recent post on Exadata pricing, I highlighted the importance of Oracle&#8217;s compression figures to the discussion, and the uncertainty about same. This led to a Twitter discussion featuring Greg Rahn* of Oracle and Dave Menninger and Omer Trajman of Vertica.  I also followed up with Omer on the phone.
*Guys like Greg Rahn and [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">In my recent post on <a href="http://www.dbms2.com/2009/10/05/oracle-exadata-2-capacity-pricing/" >Exadata pricing</a>, I highlighted the importance of Oracle&#8217;s compression figures to the discussion, and the uncertainty about same. This led to a Twitter discussion featuring <a href="http://twitter.com/GregRahn" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Greg Rahn</a>* of Oracle and <a href="http://twitter.com/dmenninger" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Dave Menninger</a> and <a href="http://twitter.com/otrajman" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Omer Trajman</a> of Vertica.  I also followed up with Omer on the phone.<span id="more-1042"></span></p>
<p style="margin-bottom: 0in;"><em>*Guys like Greg Rahn and Kevin Closson are huge assets to Oracle, which is absurdly and self-defeatingly unhelpful through conventional public/analyst relations channels.<br />
</em>
</p>
<p style="margin-bottom: 0in; font-style: normal;"><a href="http://twitter.com/GregRahn/status/4611513531" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Six</a> <a href="http://twitter.com/GregRahn/status/4612142101" onclick="javascript:pageTracker._trackPageview('/twitter.com');">key</a> <a href="http://twitter.com/GregRahn/status/4612190133" onclick="javascript:pageTracker._trackPageview('/twitter.com');">tweets</a> <a href="http://twitter.com/GregRahn/status/4612253629" onclick="javascript:pageTracker._trackPageview('/twitter.com');">by</a> <a href="http://twitter.com/GregRahn/status/4612966887" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Greg</a> <a href="http://twitter.com/GregRahn/status/4613110620" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Rahn</a> said:</p>
<blockquote>
<p style="margin-bottom: 0in; font-style: normal;">I think the HCC 10x compression is a slideware (common) number. Personally I&#8217;ve seen it in the 12-17x range on customer data&#8230;</p>
<p style="margin-bottom: 0in; font-style: normal;">This was on a dimensional model. Can&#8217;t speak to the specific industry. I do believe Oracle is working on getting industry #s.</p>
<p style="margin-bottom: 0in; font-style: normal;">As far as I know, Exadata HCC uses a superset of compression algorithms that the commonly known column stores use&#8230;</p>
<p style="margin-bottom: 0in; font-style: normal;">&#8230;and it doesn&#8217;t require the compression type be in the DDL like Vertica or ParAccel. It figures out the best algo to apply.</p>
<p style="margin-bottom: 0in; font-style: normal;">The compression number I quoted is sizeof(uncompressed)/sizeof(hcc compressed). No indexes were used in this case.</p>
<p style="margin-bottom: 0in; font-style: normal;">Exadata HCC is applicable for bulk loaded (fact table) data, so a significant portion (size wise) of most DWs.</p>
</blockquote>
<p style="margin-bottom: 0in; font-style: normal;">Summing up, that seems to say:</p>
<ul>
<li>Oracle claims 	12-17X compression on a kind of data similar to that on which 	Vertica &#8212; which also uses 10X as a single-point overall compression 	marketing estimate where needed &#8212; claims 20X.</li>
<li>Oracle selects 	compression algorithms automagically.</li>
<li>Oracle&#8217;s 	compression doesn&#8217;t quite apply to all the data. Actually, this may 	be more of an issue for the caching benefits of compression than for 	the I/O or disk storage gains. (If you join a retail transaction 	fact table to a customer dimension table, and you have a lot of 	customers, fitting the uncompressed customer table into RAM could be 	problematic.)</li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;">Omer and I happened to have a call scheduled to discuss MapReduce yesterday evening, but wound up using most of the time to talk about Vertica&#8217;s compression and physical layout features instead. Highlights included:</p>
<ul>
<li>Greg, like 	many Vertica competitors, was wrong about Vertica requiring manual, 	low-level DDL (Data Description Language) for &#8212; well, for much of 	anything. Vertica does all that automatically, at least in theory, 	and suggests that in real life you can indeed often get by without 	manual intervention.</li>
<li>Vertica can do trickle feeds into its compressed columnar storage. Greg seemed to suggest Oracle Exadata cannot. (However, I won&#8217;t be surprised if, when his comments are expanded to more than 140 characters, he winds up saying the opposite. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  )</li>
<li>Omer 	characterized the lowest latency with which you can get data into 	Vertica and have it be available for query as &#8220;seconds&#8221;, 	vs. &#8220;minutes&#8221; for other columnar vendors.</li>
<li>Vertica 	recommends often keeping multiple copies of a column, for high 	availability and/or performance. This is not directly reflected in 	compression estimates.  In particular, if you&#8217;re going to keep 	redundant copies of data for data-safety reasons anyway, Vertica 	recommends that you:
<ul>
<li>Run queries 	against more than one copy of the data, for performance/throughput.</li>
<li>Store 	different copies of the columns in different sort orders &#8212; e.g., 	according to different likely join keys &#8212; so that the copies are 	optimized for performance on different classes of queries.</li>
</ul>
</li>
<li>Vertica 	doesn&#8217;t have indexes.</li>
<li>Vertica sorts 	columns on ingest. This sorting is, of course, commonly based on 	attributes from columns other than the one being sorted. Even so, 	Omer maintains that sorting helps compression, because of the 	correlation between columns. Examples (and I didn&#8217;t get these all 	from him) might include:
<ul>
<li>City/postal 	code</li>
<li>Customer_ID/store 	location</li>
<li>Customer_ID/product_ID</li>
<li>Product_ID/price</li>
</ul>
</li>
<li>Vertica, based 	on the recent introduction of <a href="../2009/08/25/sybase-iq-technical-highlights/">FlexStore</a>, 	has an ILM (Information Lifecycle Management) story much like <a href="../2009/08/25/sybase-iq-technical-highlights/">Sybase 	IQ&#8217;s</a>. E.g., you can keep different data ranges for different 	columns on fast storage, while the rest of the data is relegated to 	slower/cheaper equipment.</li>
</ul>
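<p style="margin-bottom: 0in; font-style: normal;">The point about sorting and correlated columns can be illustrated with a small Python sketch. It assumes run-length encoding as the compression scheme (Vertica uses other schemes as well), and the city/postal code rows are made up for illustration. The rows are sorted by city only, yet the postal code column ends up with fewer runs.</p>

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Hypothetical (city, postal_code) rows in arrival order.
rows = [
    ("Boston", "02110"), ("Acton", "01720"), ("Boston", "02110"),
    ("Acton", "01720"), ("Boston", "02111"), ("Acton", "01720"),
    ("Boston", "02110"), ("Boston", "02111"),
]

unsorted_postal = [postal for _, postal in rows]

# Sort by city only; Python's sort is stable, so postal codes are never sorted directly.
by_city = sorted(rows, key=lambda row: row[0])
sorted_postal = [postal for _, postal in by_city]

print(len(run_length_encode(unsorted_postal)), "runs before sorting")
print(len(run_length_encode(sorted_postal)), "runs after sorting by city")
```

<p style="margin-bottom: 0in; font-style: normal;">Fewer runs means fewer (value, count) pairs to store. The gain here is only partial, because the postal code column is correlated with the sort key rather than being the sort key itself.</p>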
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/06/oracle-and-vertica-on-compression-and-other-physical-data-layout-features/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Oracle Exadata 2 capacity pricing</title>
		<link>http://www.dbms2.com/2009/10/05/oracle-exadata-2-capacity-pricing/</link>
		<comments>http://www.dbms2.com/2009/10/05/oracle-exadata-2-capacity-pricing/#comments</comments>
		<pubDate>Mon, 05 Oct 2009 12:20:19 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1021</guid>
		<description><![CDATA[Summary of Oracle Exadata 2 capacity pricing
Analyzing Oracle Exadata pricing is always harder than one would first think. But I&#8217;ve finally gotten around to doing an Oracle Exadata 2 pricing spreadsheet. The main takeaways are:

If 	we believe Oracle&#8217;s claims of 10X compression, Exadata 2 costs more 	per terabyte of user data than Netezza TwinFin &#8212; [...]]]></description>
			<content:encoded><![CDATA[<p><strong><span style="font-style: normal;">Summary of Oracle Exadata 2 capacity pricing</span></strong></p>
<p><span style="font-style: normal;">Analyzing Oracle Exadata pricing is always harder than one would first think. But I&#8217;ve finally gotten around to doing an Oracle Exadata 2 pricing </span><a href="http://www.monash.com/uploads/Oracle-Exadata-pricing-estimates-Oct-2009.xls" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">spreadsheet</a>.<span style="font-style: normal;"> The main takeaways are:</span></p>
<ul>
<li><span style="font-style: normal;">If 	we believe Oracle&#8217;s claims of 10X compression, Exadata 2 costs more 	per terabyte of user data than Netezza TwinFin &#8212; $22-26K/TB vs. 	TwinFin&#8217;s &lt;$20K &#8212; but less than the Teradata 2550.</span></li>
<li><span style="font-style: normal;">These 	figures are highly sensitive to assumptions about Oracle&#8217;s </span><a href="http://www.dbms2.com/2009/09/03/oracle-11g-exadata-hybrid-columnar-compression/" >hybrid 	columnar compression</a><span style="font-style: normal;">.</span></li>
<li><span style="font-style: normal;">Similarly, 	if Netezza or Teradata were to significantly upgrade their own 	compression, the price comparison would look quite different.</span></li>
<li><span style="font-style: normal;">Options 	such as Data Mining or Oracle Spatial add 12% or so each to 	Exadata&#8217;s total system price.</span></li>
</ul>
<p><strong><span style="font-style: normal;">Longer version</span></strong></p>
<p><span style="font-style: normal;">When Oracle introduced Exadata last year it was, well, <a href="../2008/09/30/oracle-database-machine-exadata-pricing-part-2/">expensive</a>. Exadata 2 has now been announced, and it is significantly cheaper than Exadata 1 per terabyte of user data, based on:</span></p>
<ul>
<li>Similar overall pricing</li>
<li>Twice the disk capacity</li>
<li>Better compression</li>
</ul>
<p style="margin-bottom: 0in;"><strong><span id="more-1021"></span>Compression is the big question mark.</strong> Row-based DBMS vendors have traditionally been, if anything, conservative in their compression claims, although Netezza recently went with a not-sandbagged 2.25X compression estimate to get below <a href="http://www.dbms2.com/2009/07/30/the-netezza-price-point/" >the $20K/terabyte price point</a>. <span style="font-style: normal;">Columnar software vendors have tended to be more aggressive, with figures of 10X or more casually thrown around, or 40-50X for archival storage. But since columnar vendors sell mainly on a software-only basis, those claims haven&#8217;t generally shown up in per-terabyte total system cost comparisons, which most commonly focus on data warehouse appliance product lines.*</span></p>
<p style="margin-bottom: 0in;"><em>*Kickfire, the one columnar pure-play appliance vendor, has to date been quite conservative in its compression marketing claims.</em></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Oracle, however, recently announced a feature called </span><em>hybrid columnar compression</em><span style="font-style: normal;">, and is now making compression claims with the usual columnar grandiosity. Oracle&#8217;s story is 10X compression, and they&#8217;re sticking to it, perhaps because 10 is such a nice round number.* <span style="text-decoration: line-through;">Since Oracle hybrid columnar compression is part of 11g Release 2, and isn&#8217;t Exadata-specific, w</span>We can hope to eventually get a sense from the field of what levels of compression are actually realistic. <em>(Edit: Actually, it seems that <a href="http://blog.tanelpoder.com/2009/09/01/oracle-11gr2-has-been-released-and-with-column-oriented-storage-option/" onclick="javascript:pageTracker._trackPageview('/blog.tanelpoder.com');">hybrid columnar compression only works with Exadata</a>, at least at this time.)</em> But for now, we don&#8217;t have much to go on except Oracle&#8217;s claims.</span></p>
<p style="margin-bottom: 0in;"><em>*Greg Rahn of Oracle tweeted me yesterday that one customer is getting 12-17X compression on &#8220;dimensional model&#8221; data. That sounds comparable to <a href="http://www.dbms2.com/2008/09/24/vertica-finally-spells-out-its-compression-claims/" >Vertica&#8217;s claim of 20X on &#8220;marketing analytics&#8221; and 30X on &#8220;consumer data&#8221; datasets</a>.<br />
</em>
</p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;"><strong>Based on 10X compression</strong></span><span style="font-style: normal;"> (vs. Netezza&#8217;s 2.25X and Teradata&#8217;s lower figure), </span><span style="font-style: normal;"><strong>Oracle Exadata 2 is somewhat more expensive than Netezza TwinFin, and significantly cheaper than Teradata&#8217;s 2550.</strong></span> Specifically, Oracle Exadata 2 comes in around $22K/TB of user data for a full rack, and $26K/TB for a quarter rack, which is the Exadata 2 configuration more comparable to a TwinFin rack in user data capacity. This is if you look at the Exadata hardware version that uses 600 GB SAS drives (vs. 1 TB SAS drives for TwinFin and 300 GB SAS drives for Teradata). With 2 TB SATA drives, at the same system pricing, Exadata prices are 70% lower, getting down to $6K/TB for a full rack. You can see how I got these figures on my Oracle Exadata 2 pricing spreadsheet linked above.</p>
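<p style="margin-bottom: 0in;">To make the sensitivity concrete, here is a minimal Python sketch of how a per-terabyte figure rescales. It starts from the roughly $22K/TB full-rack figure at 10X compression and 600 GB drives quoted above. The function name and the 5X alternative ratio are illustrative assumptions, not numbers from Oracle or from the spreadsheet, and the sketch holds system price and drive count fixed.</p>

```python
def rescale_price_per_tb(base_price, base_compression, base_drive_tb,
                         new_compression, new_drive_tb):
    """Rescale a $/TB-of-user-data figure, holding system price and drive count fixed.

    User data capacity is proportional to (drive size x compression ratio),
    so price per terabyte is inversely proportional to that product.
    """
    base_capacity = base_compression * base_drive_tb
    new_capacity = new_compression * new_drive_tb
    return base_price * base_capacity / new_capacity


# Figures from the text: about $22K/TB for a full rack at 10X compression, 600 GB drives.
BASE = dict(base_price=22_000, base_compression=10.0, base_drive_tb=0.6)

# Same system price with 2 TB drives: 0.6 / 2.0 = 0.3, i.e. 70% lower.
sata = rescale_price_per_tb(new_compression=10.0, new_drive_tb=2.0, **BASE)

# Hypothetical: real-world compression turns out to be 5X rather than 10X.
half = rescale_price_per_tb(new_compression=5.0, new_drive_tb=0.6, **BASE)

print(f"2 TB drives, 10X: ${sata:,.0f}/TB")
print(f"600 GB drives, 5X: ${half:,.0f}/TB")
```

<p style="margin-bottom: 0in;">The first result is 70% below the starting figure, in the same ballpark as the roughly $6K/TB cited above. The second shows that halving the assumed compression ratio doubles the price per terabyte of user data.</p>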
<p style="margin-bottom: 0in;">Obviously, price/terabyte is just one metric. Throughput is often even more important, but is also a lot harder to quantify simply. Oracle Exadata 2 offers more raw I/O than Netezza TwinFin. Netezza TwinFin, with its FPGA-based pipelining, probably has more processing oomph than Oracle Exadata. Oracle&#8217;s compression could lead to better use of RAM cache. And so it goes.</p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Meanwhile, two factors that in my opinion don&#8217;t matter much to the analysis are:</span></p>
<ul>
<li><em>The 	re-usability of Oracle licenses on other hardware.</em><span style="font-style: normal;"> Most of Exadata&#8217;s cost is either for hardware, or for server 	software that&#8217;s priced on a per-core basis. Neither of those is 	going to manage much (or any) more data three years from now than it 	can today.</span></li>
<li><em>What 	Oracle claims as pricing metrics.</em><span style="font-style: normal;"> Oracle&#8217;s comments on Exadata pricing generally sow confusion, which 	is why I do my own spreadsheets.</span></li>
</ul>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2009/09/25/the-hunt-for-oracle-exadata-production-references/" >The hunt for Oracle Exadata production references</a> (but hopefully some will be revealed at Oracle Open World)</li>
<li><a href="http://www.dbms2.com/2009/09/30/facts-and-rumors/" >A favorable rumor about Exadata sales</a></li>
<li><a href="http://www.dbms2.com/2009/09/29/integration-oltp-data-warehousing-exadata-2/" >Issues in integrating OLTP and data warehousing in a single system</a></li>
<li>James Kobielus of Forrester tweets that <a href="http://twitter.com/jameskobielus/status/4627010008" onclick="javascript:pageTracker._trackPageview('/twitter.com');">there are &#8220;plenty&#8221; of pleased Exadata users</a></li>
<li>More on <a href="http://www.dbms2.com/2009/10/06/oracle-and-vertica-on-compression-and-other-physical-data-layout-features/" >Oracle Exadata hybrid columnar compression</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/05/oracle-exadata-2-capacity-pricing/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Notes on the Oracle Database 11g Release 2 white paper</title>
		<link>http://www.dbms2.com/2009/09/21/notes-on-the-oracle-database-11g-release-2-white-paper/</link>
		<comments>http://www.dbms2.com/2009/09/21/notes-on-the-oracle-database-11g-release-2-white-paper/#comments</comments>
		<pubDate>Mon, 21 Sep 2009 17:12:33 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Oracle TimesTen]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=923</guid>
		<description><![CDATA[
The Oracle Database 11g Release 2 white paper I cited a couple of weeks ago has evidently been edited, given that a phrase I quoted last month is no longer to be found. Anyhow, here are some quotes from and comments on what evidently is the latest version.
The In-Memory Database Cache (IMDB Cache) option of [...]]]></description>
			<content:encoded><![CDATA[
<p>The <a href="http://www.oracle.com/technology/products/database/oracle11g/pdf/oracle-database-11g-release2-overview.pdf" onclick="javascript:pageTracker._trackPageview('/www.oracle.com');">Oracle Database 11g Release 2 white paper</a> I cited <a href="http://www.dbms2.com/2009/09/03/oracle-11g-exadata-hybrid-columnar-compression/" >a couple of weeks ago</a> has evidently been edited, given that a phrase I quoted then is no longer to be found. Anyhow, here are some quotes from and comments on what appears to be the latest version.<span id="more-923"></span></p>
<blockquote><p>The In-Memory Database Cache (IMDB Cache) option of Oracle Database 11g Release 2, allows data to be cached and processed in the memory of the applications themselves, off-loading the data processing to middle tier resources. Any network latency between the middle tier and the back-end database is removed from the transaction path, with the result that individual transactions can often be executed up to 10 times faster. This is particularly useful where very high rates of transaction processing is required, such as those found under market trading systems, Telco switching systems, and Real Time manufacturing environments. All data in the middle tier is fully protected through local recovery, and asynchronous posting to the back end Oracle Database. With Oracle Database 11g Release 2, the ability to transparently deploy IMDB Cache with existing Oracle applications becomes much easier – with common data types, SQL and PL/SQL support, and native support for the Oracle Call Interface (OCI).</p></blockquote>
<p>At a guess, this sounds like it&#8217;s based on Oracle&#8217;s TimesTen acquisition.</p>
<blockquote><p>Oracle Database 11g Release 2 adds further optimizations, including capabilities to automatically determine the most optimal degree of parallelization for a query, based on available resources. With this comes automated parallel statement queuing, where the database determines that, based on current resource availability, it is more effective to queue a query for later execution once required resources have freed up.</p></blockquote>
<p>Sounds like a kind of automatic workload management &#8212; i.e., the kind of optimization vendors of mature products get around to putting into their systems. It does not sound like query pipelining, however.</p>
<blockquote><p>Oracle Database 11g Release 2 will automatically distribute a large compressed table (or a smaller non-compressed table), into the available memory across all the servers in the Grid, and will then localize parallel query processing to the data in memory on the individual nodes. This dramatically improves query performance, and is especially useful where large tables can be entirely compressed into the available memory using compression capabilities.</p></blockquote>
<p>So Oracle caches compressed data. Not stated is which compression techniques are covered.</p>
<blockquote><p>Each Exadata Storage Server stores up to 7 Terabyte [sic] of uncompressed user data, and also comes enabled with 384 GB of solid-state Flash cache. This Flash Cache automatically caches active data of the magnetic disks in the Oracle Exadata Storage Server, delivering a 10x performance gain for read and write operations under OLTP applications.</p></blockquote>
<p>Sounds like the Flash memory is positioned for OLTP use.</p>
<blockquote><p>In the past, Database Administrators and System Administrators have spent a great deal of time determining to how best place data across these disk arrays, to get maximum performance and availability. The best procedure for data placement is to simply Stripe And Mirror Everything; stripe data blocks equally across all disks in an array, and then mirror the blocks on at least two disks. This approach provides the perfect balance between performance, disk utilization, and ease of use.</p></blockquote>
<p>This is a big part of what could be called the &#8220;Administering Oracle doesn&#8217;t suck nearly as badly as it used to&#8221; pitch. (Mitchell Kertzman, who was Sybase CEO after the mid-1990s meltdown, told me his motto was &#8220;We suck less every day.&#8221; But I digress &#8230;)</p>
<blockquote><p>Automatic Storage Management (ASM), a feature of Oracle Database 11g automates the striping and mirroring of database without the need to purchase third party volume management software. As data volumes increase, additional disks can be added, and ASM will automatically restripe and rebalance the data across available disks to ensure optimal performance. Similarly, disks that report errors can be removed from the disk array, and ASM will re-adjust accordingly.</p></blockquote>
<p>I.e., you can add disks without taking the system down. That&#8217;s becoming a pretty standard feature for serious parallel DBMS.</p>
<blockquote><p>Oracle Database 11g Release 2 improves ASM in significant areas. New intelligent data placement capabilities store infrequently accessed data on the inner rings of the physical disks, while frequently accessed data is placed on the outer rings, offering better performance optimization.</p></blockquote>
<p>Also pretty standard.</p>
<blockquote><p>Oracle has been enhancing partitioning capabilities for over ten years. Oracle Partitioning, an option of Oracle Database 11g Release 2, allows very large tables (and their associated indexes) to be partitioned into smaller, more manageable units, providing a “divide and conquer” approach to very large database management. Partitioning also improves performance, as the optimizer will prune queries to only use the relevant partitions of a table or index in a lookup. Oracle Database 11g Release 2 provides multiple methods for partitioning data, and also allows different levels of partitioning on the same table, so that a single partitioning strategy can be used to improve both performance and manageability.</p></blockquote>
<p>Even better might be a system that doesn&#8217;t lean heavily on complex partitioning to achieve good performance.</p>
<blockquote><p>Oracle Partitioning can also manage the lifecycle of information. Typically, all databases have active data – the information being processed this month or quarter, and historical data that is primarily read-only. Organizations can take advantage of the inherent lifecycle of data to implement a multi-tiered storage solution and lower their overall storage costs. For example, a large table within an order-entry system could contain all the orders processed in the last 7 years. Oracle Partitioning can be used to set up monthly partitions, with the current last four months of order data partitioned onto a high-end storage array, with all the other partitions placed on a lower-cost storage solution, often 2-3 times less cost than the high end storage environment.</p></blockquote>
<p>This is becoming a standard feature for any parallel DBMS that can support multiple kinds of storage in one system.</p>
<blockquote><p>Oracle Database 11g also provides advanced compression techniques to further reduce storage requirements. Using Oracle Advanced Compression, an option to Oracle Database 11g, all data in a table can be compressed using a continuous table compression capability that achieves a 2-4 times compression ratio with little performance impact on OLTP or Data Warehousing workloads. This compression technology replaces duplicate values in a table with a single value, and continuously adapts to data changes over time, so compression ratios are always maintained.</p></blockquote>
<p>Sounds like dictionary/token compression.</p>
<blockquote><p>With Oracle Database 11g Release 2, the Exadata Storage Servers in the Sun Oracle Database Machine also enable new hybrid columnar compression technology that provides up to a 10 times compression ratio, with corresponding improvements in query performance. And, for pure historical data, a new archival level of hybrid columnar compression can be used that provides up to 50 times compression ratios.</p></blockquote>
<p>I thought they said 40X before. But even if my memory isn&#8217;t playing tricks regarding that, single-point compression ratio estimates are always very approximate.</p>
<blockquote><p>Any hardware component in an Oracle Grid can be dynamically added or removed as required. Disks can be added or removed online with ASM, with the data automatically rebalanced across the new disk infrastructure. Additional servers can also be easily added or removed to a Real Application Cluster with users connected to these nodes rebalanced across the infrastructure. This ability to migrate users from one server to another in a RAC cluster also enables rolling patching of the database software. If a patch needs to be applied, then a server can be removed from the cluster, patched, and then put back into the cluster. The same operation can be repeated for the next server in the cluster, and so on.</p></blockquote>
<p>Nice. And the paper goes on in that vein for quite a while.</p>
<blockquote><p>Oracle Total Recall, an option to Oracle Database 11g, provides a solution for the retention of historical information. With Oracle Total Recall, all changes made to data are kept to provide a complete change history of information. This means that auditors can not only see who did what when, but they can also see what the actual information was at the time – something that previously has only be [sic] available by building into the application, or by expensive backup retention policies.</p></blockquote>
<p>Timestamping/time-travel/whatever is increasingly becoming a standard feature as well, especially given the number of PostgreSQL-based DBMS on the market.</p>
<blockquote><p>New internal control requirements found in regulations can be difficult and expensive to implement in an environment with multiple applications. Oracle Database Vault, an option to Oracle Database 11g, allows access controls to be transparently applied underneath existing applications. Users can be prevented from accessing specific application data, or from accessing the database outside of normal hours; separation-of-duty requirements can be enforced for different Database Administrators without a costly least privilege exercise. And Oracle Advanced Security, an option to Oracle Database 11g, can be used to transparently encrypt data at all levels – data in transit on the network; data at rest on physical storage and in backups. Similarly, the Data Masking pack can be used to obfuscate data as it moves from production to development, reducing the potential violation of privacy regulations or risking sensitive data leaks.</p></blockquote>
<p>Oracle is the gold standard in database security.</p>
<blockquote><p>Oracle’s self-management approach takes two tacks. Firstly, wherever possible, repeatable, labor intensive and error prone tasks that can be fully automated in the database have been. For example, Storage Management, Memory Management, Statistics collection, Backup and Recovery, and SQL Tuning have all been automated. Secondly, where operations cannot be fully automated, intelligent advisors are built into the database to mentor Database Administrators on how to get the best out of their systems. Advisors are provided for Configuration Management, Patching, Indexing, Partitioning, Performance Diagnostics, Data Recovery, and, new in Oracle Database 11g Release 2, Compression and Maximum Availability.</p></blockquote>
<p>And boy are they needed.</p>
<blockquote><p>Recent studies performed by an independent research company shows that Database Administrators can expect to spend 26% less time managing their 11g environments over their 10g environments, and as much as 50% when compared to older Oracle9i deployments.</p></blockquote>
<p>50% of way too much is still way too much.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/21/notes-on-the-oracle-database-11g-release-2-white-paper/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Oracle Exadata hybrid columnar compression</title>
		<link>http://www.dbms2.com/2009/09/03/oracle-11g-exadata-hybrid-columnar-compression/</link>
		<comments>http://www.dbms2.com/2009/09/03/oracle-11g-exadata-hybrid-columnar-compression/#comments</comments>
		<pubDate>Thu, 03 Sep 2009 09:33:45 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=873</guid>
		<description><![CDATA[Oracle Database 11g Release 2 is out, and as usual I wasn&#8217;t briefed &#8212; perhaps because Oracle is more scared than its competitors are of hard questions, perhaps for some other reason entirely.*  Anyhow, Oracle Database 11g Release 2 contains an Exadata-only feature called hybrid columnar compression. The Oracle Database 11g Release 2 white paper [...]]]></description>
			<content:encoded><![CDATA[<p>Oracle Database 11g Release 2 is out, and as usual I wasn&#8217;t briefed &#8212; perhaps because Oracle is more scared than its competitors are of hard questions, perhaps for some other reason entirely.*  Anyhow, Oracle Database 11g Release 2 contains an Exadata-only feature called hybrid columnar compression. The <a href="http://www.oracle.com/technology/products/database/oracle11g/pdf/oracle-database-11g-release2-overview.pdf" onclick="javascript:pageTracker._trackPageview('/www.oracle.com');">Oracle Database 11g Release 2 white paper</a> says &#8220;data is grouped, ordered, and stored one column at a time.&#8221; But <a href="http://kevinclosson.wordpress.com/2009/09/01/oracle-switches-to-columnar-store-technology-with-oracle-database-11g-release-2/" onclick="javascript:pageTracker._trackPageview('/kevinclosson.wordpress.com');">Kevin Closson</a> clarifies:</p>
<blockquote><p>The word hybrid is important.</p>
<p>Rows are still used. They are stored in an object called a Compression Unit. Compression Units can span multiple blocks. Like values are stored in the compression unit with metadata that maps back to the rows.</p>
<p>So, “hybrid” is the word. But, none of that matters as much as the effectiveness. This form of compression is extremely effective.</p></blockquote>
<p>That sounds a whole lot like <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/" >PAX</a>. Specifically, in Oracle&#8217;s case I would guess &#8220;hybrid columnar compression&#8221; provides the compression benefits of column stores, but not column stores&#8217; I/O benefits, and also not any kind of in-memory compression.<span id="more-873"></span></p>
<p><em>*Actually, Oracle has indicated to me multiple times that the reason is I won&#8217;t let Oracle review what I write before I publish it. My stance is that such &#8220;review&#8221; is an extremely time-wasting courtesy, in which one spends a lot of time diplomatically explaining to a vendor that, contrary to what it hopes, one really does know the difference between marketing puffery and sober fact.  I rarely do white paper projects any more, notwithstanding that my fee for those now exceeds $2,000/page. I&#8217;m not about to go through the &#8220;review&#8221; hassle for something I write for free, about a vendor who isn&#8217;t otherwise a paying client.</em></p>
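<p>For readers who want a picture of the PAX-like layout guessed at above, here is a minimal Python sketch. It is an illustration of the general idea only, not Oracle&#8217;s on-disk format: rows are grouped into fixed-size units, and within each unit the values are pivoted so that like values sit together. The function names, the unit size, and the sample rows are all hypothetical, and zlib stands in for whatever compression Oracle actually applies.</p>

```python
import json
import zlib


def to_compression_units(rows, rows_per_unit):
    """Split rows into fixed-size groups and pivot each group into columns."""
    units = []
    for start in range(0, len(rows), rows_per_unit):
        chunk = rows[start:start + rows_per_unit]
        # zip(*chunk) transposes the chunk: one tuple per column.
        units.append([list(column) for column in zip(*chunk)])
    return units


def from_compression_units(units):
    """Rebuild the original rows by transposing each unit back."""
    return [tuple(row) for unit in units for row in zip(*unit)]


# Hypothetical fact-table rows: (store, product, quantity).
rows = [("store_%d" % (i % 3), "product_%d" % (i % 5), i % 2) for i in range(1000)]

units = to_compression_units(rows, rows_per_unit=250)
assert from_compression_units(units) == rows  # whole rows are still recoverable

row_bytes = json.dumps(rows).encode()
unit_bytes = json.dumps(units).encode()
print("row-major, compressed: ", len(zlib.compress(row_bytes)))
print("unit-major, compressed:", len(zlib.compress(unit_bytes)))
```

<p>Every column of a given row lives in the same unit, so fetching one column still means reading the whole unit. That is why a layout like this can deliver columnar compression without columnar I/O savings. How the two compressed sizes compare depends heavily on the data.</p>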
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/09/03/oracle-11g-exadata-hybrid-columnar-compression/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Sybase IQ technical highlights</title>
		<link>http://www.dbms2.com/2009/08/25/sybase-iq-technical-highlights/</link>
		<comments>http://www.dbms2.com/2009/08/25/sybase-iq-technical-highlights/#comments</comments>
		<pubDate>Tue, 25 Aug 2009 09:16:07 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=871</guid>
		<description><![CDATA[General highlights of the Sybase IQ technical story include:

Sybase IQ is an analytic DBMS with 	a columnar/column-store architecture
Unlike most analytic DBMS, Sybase 	IQ has a shared-disk architecture.
The Sybase IQ indexing story is a bit complicated, with a bunch of different index kinds. Most are focused on columns with low cardinality, and at least in some [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">General highlights of the Sybase IQ technical story include:</p>
<ul>
<li>Sybase IQ is an analytic DBMS with 	a columnar/column-store architecture</li>
<li>Unlike most analytic DBMS, Sybase 	IQ has a shared-disk architecture.</li>
<li>The Sybase IQ indexing story is a bit complicated, with a bunch of different index kinds. Most are focused on columns with low cardinality, and at least in some cases are a lot like bitmaps. (Sybase IQ when first introduced was a pure bitmap index product, with a single index type, “Fast Project”.) But one index kind, “High Group” (designed for columns with high cardinality), is an exception to most generalities about other Sybase IQ index kinds, and instead is more akin to a b-tree.</li>
<li>Unlike Vertica, Sybase stores each 	column of data only once.  I don&#8217;t see how it would make sense to 	have multiple indexes on the same column, but I didn&#8217;t actually ask 	whether doing so is possible or common.</li>
<li>Sybase estimates that Sybase IQ 	requires ¼ the DBA effort of, say, Oracle. (Frankly, that&#8217;s 	not a particularly good figure.) Obviously, this is just a 	broad-brush average.</li>
<li>Sybase recently repurposed an 	acquired ETL tool to be focused on Sybase IQ. IQ of course also 	works with various third-party tools, certified or otherwise.</li>
<li>Sybase&#8217;s PowerDesigner CASE (Computer-Aided Software Engineering)/database design tool works with Sybase IQ.</li>
<li><a href="http://blogs.sybase.com/sybaseiq/2009/07/sybase-iq-151-more-than-meets-the-eye%E2%80%A6/" onclick="javascript:pageTracker._trackPageview('/blogs.sybase.com');">Sybase 	is proud of Sybase IQ&#8217;s new in-database analytics capabilities</a>, 	but I haven&#8217;t yet grasped what, if anything, is differentiated about 	them.</li>
<li>Sybase has an ILM (Information 	Lifecycle Management) story built around the point that different 	columns can be stored on different kinds of media.</li>
</ul>
<p style="margin-bottom: 0in;">Highlights of the Sybase IQ compression story include:<span id="more-871"></span></p>
<ul>
<li>Sybase IQ applies compression to 	both columns and pages</li>
<li>A (the?) major kind of column compression is called “projection” &#8212; why? &#8212; but boils down to token/dictionary compression. Tokens can be 1, 2, or 3 bytes in length, whichever is the best fit for the column&#8217;s cardinality.</li>
<li>I don&#8217;t have details about the 	other kinds of compression.</li>
<li>Data is kept compressed in memory 	“until the latest point possible.”</li>
</ul>
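<p style="margin-bottom: 0in;">The token/dictionary idea above can be sketched in a few lines. This is a generic illustration under my own assumptions, not Sybase&#8217;s code: the names <code>token_width</code>, <code>dictionary_encode</code>, and <code>dictionary_decode</code> are hypothetical, and I am assuming the token size is simply the smallest of 1, 2, or 3 bytes that can number every distinct value in the column.</p>

```python
def token_width(cardinality):
    """Smallest token size in bytes (1, 2, or 3) that can number every distinct value.

    Assumption for illustration: tokens are plain unsigned integers.
    """
    for width in (1, 2, 3):
        if 256 ** width >= cardinality:
            return width
    raise ValueError("cardinality too high for a 3-byte token")


def dictionary_encode(column):
    """Replace each value with a fixed-width token that indexes a dictionary."""
    dictionary = sorted(set(column))
    width = token_width(len(dictionary))
    position = {value: i for i, value in enumerate(dictionary)}
    encoded = b"".join(position[v].to_bytes(width, "big") for v in column)
    return dictionary, width, encoded


def dictionary_decode(dictionary, width, encoded):
    """Turn the token stream back into the original column values."""
    return [
        dictionary[int.from_bytes(encoded[i:i + width], "big")]
        for i in range(0, len(encoded), width)
    ]


states = ["NY", "CA", "NY", "TX"]
dictionary, width, encoded = dictionary_encode(states)
restored = dictionary_decode(dictionary, width, encoded)
```

<p style="margin-bottom: 0in;">With only three distinct values, each entry in the sample column shrinks to a single byte, and a column with up to about 16.7 million distinct values would still fit in 3-byte tokens.</p>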
<p style="margin-bottom: 0in;">Highlights of the Sybase IQ update and load story include:</p>
<ul>
<li>Sybase claims that only the “High 	Group” index is costly to update.  Specifically, “High Group” 	costs about as much to update as the database itself. Other indexes 	are fairly trivial to update. (Upon reflection, I don&#8217;t immediately 	see why that makes sense.)</li>
<li>There&#8217;s pipelining of some sort 	when a High Group index is updated.</li>
<li>Sybase claims that bulk loads of 	Sybase IQ are very fast.</li>
<li>Loading Sybase IQ doesn&#8217;t block 	queries. Rather, Sybase IQ has some kind of versioning system in 	which a query just executes against older data.</li>
<li>Sybase IQ updating is done in 	parallel. (That would be parallel among servers, of course, since 	Sybase IQ is shared-disk.)</li>
<li>Trickle feed loading of Sybase IQ 	is slow. When you need to do microbatch loading with latency in the 	2-15 minute range, Sybase recommends staging via an OLTP DBMS, 	whether from Sybase or otherwise. Sybase PowerDesigner generates 	scripts for this, and Sybase Replication Server helps with the 	execution.</li>
</ul>
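<p style="margin-bottom: 0in;">I don&#8217;t know how Sybase IQ&#8217;s versioning actually works, but the general idea of loads not blocking queries can be sketched as follows. Everything here is an assumption for illustration: the class <code>VersionedTable</code> and its methods are hypothetical, and a whole-table immutable snapshot stands in for whatever finer-grained scheme a real product would use.</p>

```python
import threading


class VersionedTable:
    """Toy snapshot versioning: readers never wait for loaders.

    Illustrative only; not a description of Sybase IQ internals.
    """

    def __init__(self, rows=()):
        self._current = tuple(rows)          # immutable committed version
        self._write_lock = threading.Lock()  # serializes loaders, not readers

    def snapshot(self):
        """Return the committed version; a query keeps using it until it finishes."""
        return self._current

    def load(self, new_rows):
        """Build a new version on the side, then publish it in one assignment."""
        with self._write_lock:
            self._current = self._current + tuple(new_rows)


table = VersionedTable([("2009-08-24", 10)])
query_view = table.snapshot()        # a long-running query starts here
table.load([("2009-08-25", 12)])     # a load commits while the query runs
later_view = table.snapshot()        # a query started after the load
```

<p style="margin-bottom: 0in;">The query that started before the load still sees one row, while a query started afterward sees two. That matches the description of a query that “just executes against older data.”</p>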
<p style="margin-bottom: 0in;">Highlights of the Sybase IQ concurrency, scalability, and workload management story include:</p>
<ul>
<li>Sybase points out that, because of 	Sybase IQ&#8217;s shared-disk architecture, queries can execute on a 	single server in the “grid.” Thus, if you have enough cores, it 	can be possible to isolate long-running queries from shorter ones.</li>
<li>Similarly, Sybase notes that you 	can meet different SLAs by putting different users&#8217; queries on more- 	or less-crowded Sybase IQ servers.</li>
<li>Sybase further observes that not 	having to move data among nodes saves Sybase IQ from a lot of 	overhead true MPP systems endure.</li>
<li>Sybase makes the usual claim that, 	because Sybase IQ is so efficient, queries finish quickly, and hence 	there&#8217;s less stress on concurrency than one might otherwise think.</li>
<li>I don&#8217;t get the sense that Sybase 	IQ actually boasts a lot of direct workload management features. 	However, there are such features in Sybase&#8217;s flagship ASE product, 	so hopefully adding something similar to Sybase IQ is a product 	future.</li>
</ul>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2009/08/25/sybase-iq-business-notes/" >Sybase IQ business notes</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/08/25/sybase-iq-technical-highlights/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>
