<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Rainstor</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/clearpace/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:21:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Teradata Columnar and Teradata 14 compression</title>
		<link>http://www.dbms2.com/2011/09/22/teradata-columnar-compression/</link>
		<comments>http://www.dbms2.com/2011/09/22/teradata-columnar-compression/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 05:25:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Rainstor]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5296</guid>
		<description><![CDATA[Teradata is pre-announcing Teradata 14, for delivery by the end of this year, where by &#8220;Teradata 14&#8221; I mean the latest version of the DBMS that drives the classic Teradata product line. Teradata 14&#8217;s flagship feature is Teradata Columnar, a hybrid-columnar offering that follows in the footsteps of Greenplum (now part of EMC) and Aster [...]]]></description>
			<content:encoded><![CDATA[<p>Teradata is pre-announcing Teradata 14, for delivery by the end of this year, where by &#8220;Teradata 14&#8221; I mean the latest version of the DBMS that drives the classic Teradata product line. Teradata 14&#8217;s flagship feature is Teradata Columnar, a hybrid-columnar offering that follows in the footsteps of <a href="../../../../../2009/10/14/greenplum-hybrid-columnar/">Greenplum</a> (now part of EMC) and <a href="../../../../../2010/09/15/aster-data-ncluster-version-4-6/">Aster Data</a> (now part of Teradata).</p>
<p>The basic idea of Teradata Columnar is:</p>
<ul>
<li>Each table can be stored in Teradata in row format, column format, or a mix.</li>
<li>You can do almost anything with a Teradata columnar table that you can do with a row-based one.</li>
<li>If you choose column storage, you also get some new compression choices.</li>
</ul>
<p><span id="more-5296"></span>The &#8220;mix&#8221; option is like Vertica&#8217;s <a href="../../../../../2009/08/04/flexstore-and-the-rest-of-vertica-35/">FlexStore</a>, in that different columns (e.g. different components of a street address) can be grouped into a mini-row, even if you otherwise choose to store that table in a columnar way. Teradata does not at this time offer the Greenplum or Aster way of mixing rows and columns, whereby some of the rows in a table can be stored in a column-store way, while other rows are stored in entire-row row-store solidarity.</p>
<p>Thus, Teradata Columnar gives you many of the basic I/O and compression benefits of columnar DBMS, along with all the usual Teradata goodness of concurrency, workload management, system management, SQL support, and so on. By way of comparison:</p>
<ul>
<li>Similar things are true of Greenplum&#8217;s offering (except for the parts about concurrency, advanced workload management, and so on).</li>
<li>Aster doesn&#8217;t have columnar compression.</li>
<li>Oracle has <a href="../../../../../2011/02/06/columnar-compression-database-storage/">columnar compression but no true columnar storage</a>.*</li>
</ul>
<p>Also, as I noted above, Teradata mixes rows and columns in a different way than Aster or EMC Greenplum do.</p>
<p><em>*However, I won&#8217;t be surprised if Oracle soon announces true hybrid-columnar as well. I originally heard about Teradata Columnar and Oracle&#8217;s efforts to develop true hybrid-columnar storage the same week, 23 months ago.</em></p>
<p>Going hybrid-columnar is a big deal. Aster Data, for example, told me that a considerable fraction of all its workloads ran faster with columnar than row-based storage.* And it&#8217;s of extra importance to a vendor that, like Teradata, needs to play catch-up in the compression derby.</p>
<p><em>*Anything in which the queries eliminated more than half or so of the columns (60%, if I recall correctly, but it was definitely an approximate figure). That pretty much means any query except full and near-full table scans.</em></p>
<p>Teradata&#8217;s columnar compression story is pretty complicated. To quote from a forthcoming press release:</p>
<blockquote><p>Teradata automatically chooses from among six types of compression: run length, dictionary, trim, delta on mean, null and UTF8. based on the column demographics.</p></blockquote>
<p>The trickiest words in that are &#8220;automatic&#8221; and &#8220;dictionary&#8221;. Teradata divides column-store data into &#8220;column containers&#8221; of, say, 8 KB. (Current thinking is 8 KB default, 65 KB maximum, but that could change by the time of product release.) By default, Teradata software decides separately for each column container which compression algorithm(s) to use. It can even change its mind dynamically over time, as the contents of the container change.</p>
<p>What I find weird about Teradata&#8217;s columnar dictionary compression is that the dictionary is container-specific. One benefit versus having a more global dictionary is that, since you compress fewer items, compression tokens can each be shorter. (The length of a typical token is a lot like the log of the cardinality of the dictionary.) Another benefit is that smaller dictionaries are faster to search. The obvious offsetting drawback is that a larger and more global dictionary has the potential to compress various items that wind up being left uncompressed in this smaller-scale scheme.</p>
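<p>To make the token-length point concrete, here is a rough sketch in Python. It is an illustration only, not Teradata&#8217;s actual implementation: the function name <code>token_bits</code> and the cardinalities (200 distinct values in one container, 50,000 across a whole column) are assumptions picked to show the arithmetic, and the sketch assumes fixed-width tokens.</p>

```python
def token_bits(distinct_values):
    """Bits per fixed-width token needed to address every dictionary entry."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n of 2 or more;
    # a one-entry dictionary is given a 1-bit token to keep the sketch simple.
    return max(1, (distinct_values - 1).bit_length())


# Hypothetical cardinalities, for illustration only.
container_distinct = 200      # distinct values seen in one 8 KB container
global_distinct = 50_000      # distinct values across the whole column

print(token_bits(container_distinct))  # 8
print(token_bits(global_distinct))     # 16
```

<p>In this made-up case the container-specific dictionary needs tokens half as wide as the global one, at the cost of keeping a separate dictionary in every container.</p>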
<p>Other notes about Teradata compression include:</p>
<ul>
<li>Teradata has for a while had a more manual form of dictionary compression.</li>
<li>Teradata also has block-level compression.</li>
<li>You can do block-level compression even on top of the columnar compression described above.</li>
<li>The Teradata/Rainstor partnership for archiving-level compression that Rainstor made so much fuss about doesn&#8217;t seem to actually be happening; Teradata seems content with the other compression choices it offers.</li>
</ul>
<p>And finally, Teradata 14 extends <a href="../../../../../2008/10/14/teradata-virtual-storage/">Teradata Virtual Storage</a> with a feature called Compress on Cold. The idea is that &#8220;cold&#8221; data can safely get (extra) compression &#8212; that block-level stuff &#8212; automatically. If the data heats up again (e.g. by becoming relevant for a while to the latest year-over-year comparisons) it can be just as automatically removed from compression. Teradata thinks this is significantly better than the alternative of making manual compression choices based on not-so-granular range partitions.</p>
<p>Unsurprisingly, Teradata lacks some features and benefits found in certain columnar-first analytic DBMS. One biggie is that, absent clever workarounds such as Vertica&#8217;s in-memory write-optimized store, columnar DBMS have a single-row-update performance problem, because you are putting the information in many places on disk rather than just one. I generally take it for granted that a columnar-first vendor has such a workaround. Row-based vendors gone columnar, however, are a different story. Teradata et al. are also likely to decompress data and reassemble it into full rows as soon as it hits RAM, which obviates the potential benefit that you have less data per row clogging up cache.*<em> (Edit: As per Todd Walter&#8217;s comments below, this is not accurate &#8212; and that&#8217;s a potentially important feature.)</em></p>
<p><em>*Late decompression actually depends on columnar compression, not columnar storage, and hence can also be enjoyed by row-based DBMS such as </em><a href="../../../../../2010/06/21/netezza-ibm-db2-compression/"><em>DB2</em></a><em>. </em></p>
<p>To use Teradata Columnar, you need to be using round-robin data distribution rather than, say, hash. Teradata jargon for this is NoPI, where the &#8220;PI&#8221; stands for Primary Index.* Drawbacks to that include:</p>
<ul>
<li>You don&#8217;t get the hash distribution benefit of saving a data redistribution step on joins whose join key happens to be the same as the hash key.</li>
<li>In Teradata-land, NoPI implies append-only, so you get the garbage collection/compactification that implies.</li>
</ul>
<p>However, that&#8217;s a physical append-only; you can still do logical updates.</p>
<p><em>*PI is not to be confused with PPI, which stands for Partitioned Primary Index, and is Teradata&#8217;s name for range (or case-statement-based) partitioning. PPI works just fine with Teradata Columnar. As of Teradata 14, you can do PPI up to 62 levels deep.</em></p>
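<p>The redistribution drawback noted above can be illustrated with a toy model in Python. Everything here is hypothetical: the table contents, the four-node cluster, and the helper names are assumptions for illustration, and CRC32 merely stands in for whatever hash function a real system uses. The sketch only shows why hashing both tables on the join key keeps matching rows on the same node, while round-robin placement generally does not.</p>

```python
import zlib
from collections import defaultdict

N_NODES = 4  # hypothetical cluster size


def hash_distribute(rows, key):
    """Place each row on the node chosen by a hash of its key column."""
    nodes = defaultdict(list)
    for row in rows:
        node = zlib.crc32(str(row[key]).encode()) % N_NODES
        nodes[node].append(row)
    return nodes


def round_robin_distribute(rows):
    """Deal rows out to nodes in turn, ignoring their contents."""
    nodes = defaultdict(list)
    for i, row in enumerate(rows):
        nodes[i % N_NODES].append(row)
    return nodes


def local_join_matches(left_nodes, right_nodes, key):
    """Count join matches that can be found without moving rows between nodes."""
    matches = 0
    for node in range(N_NODES):
        right_keys = [r[key] for r in right_nodes[node]]
        for row in left_nodes[node]:
            matches += right_keys.count(row[key])
    return matches


# Made-up tables: 8 customers, 16 orders, each order matching one customer.
customers = [{"customer_id": c} for c in range(8)]
orders = [{"order_id": j, "customer_id": (3 * j) % 8} for j in range(16)]

hashed = local_join_matches(hash_distribute(orders, "customer_id"),
                            hash_distribute(customers, "customer_id"),
                            "customer_id")
dealt = local_join_matches(round_robin_distribute(orders),
                           round_robin_distribute(customers),
                           "customer_id")
print(hashed, dealt)
```

<p>With both tables hashed on the join key, all 16 orders find their customer on the same node. With round-robin placement only some of them do, and the rest would need a redistribution step before the join.</p>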
<p>The Teradata folks also sent along a slide deck laying out parts of the <a href="http://www.monash.com/uploads/Teradata-Columnar-September-2011.ppt">Teradata Columnar</a> story. But it&#8217;s not one of the better Teradata decks I&#8217;ve ever posted.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/09/22/teradata-columnar-compression/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Eight kinds of analytic database (Part 2)</title>
		<link>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/</link>
		<comments>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 08:18:18 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Buying processes]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MOLAP]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Rainstor]]></category>
		<category><![CDATA[SAND Technology]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[SenSage]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4867</guid>
		<description><![CDATA[In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I&#8217;ll cover four more kinds of analytic database &#8212; even newer, for the most part, with a use case/product short list [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/">Part 1</a> of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I&#8217;ll cover four more kinds of analytic database &#8212; even newer, for the most part, with a use case/product short list match that is even less clear.  <span id="more-4867"></span></p>
<p><strong><em>Bit bucket</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included: </em>Logs, other technical/external</li>
<li><em>Likely use styles:</em> Staging/ETL, investigative</li>
<li><em>Canonical example: </em>Log files in a Hadoop cluster<em> </em></li>
<li><em>Stresses:</em> TCO, scale-out, transform/big-query performance, ETL functionality</li>
</ul>
<p>With the explosion of <a href="../../../../../2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a> has come the need for a place to put it all, sometimes called the <a href="../../../../../2011/06/04/dirty-data-stored-dirt-cheap/">big bit bucket</a>. This is like the investigative data mart for big databases, but more <a href="../../../../../2011/05/17/poly-structured-database/">poly-structured</a>. In some cases it is focused on data staging and transformation; but it can also be used for analysis in place.</p>
<p>The list of candidate technologies to run your bit bucket starts with Hadoop and Splunk.</p>
<p><strong><em>Archival data store</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included: </em>Operational, CDR (call detail record), security log</li>
<li><em>Likely use styles:</em> Archival, reporting (for compliance), possibly also investigative</li>
<li><em>Examples:</em> Any long-term detailed historical store</li>
<li><em>Stresses: </em>TCO, compression, scale-out, performance (if multi-use)<em> </em></li>
</ul>
<p>Analytic DBMS vendors have been insulting each other with the claim &#8220;that&#8217;s just an archival data store,&#8221; dating back at least to the first time Greenplum was deployed on an underpowered Sun Thumper system. Perhaps only <a href="../../../../../2010/06/11/rainstor-update/">Rainstor</a> truly embraces the archival positioning, and I&#8217;ve become pretty dubious about their technical claims and their company alike.</p>
<p>Still, there&#8217;s a legitimate need for data stores, especially relational analytic DBMS, that:</p>
<ul>
<li>Store data cheaply, with high rates of compression.</li>
<li>Have decent performance if you do want to query the data.</li>
<li>May have archiving/compliance-specific features as well.</li>
</ul>
<p>Along with Rainstor, SAND and SenSage have at least partially targeted that use case. In addition, appliance vendors such as Teradata and Netezza try to have an archive-oriented product version in their lineups.</p>
<p><strong><em>Outsourced data mart</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> All</li>
<li><em>Likely use styles:</em> Traditional BI, investigative analytics, staging/ETL</li>
<li><em>Examples:</em> Advertising tracking, SaaS CRM</li>
<li><em>Stresses:</em> Performance, TCO, reliability, concurrency</li>
</ul>
<p>Much of what happens in analytic database management can also be outsourced. Some applications that run via SaaS (Software as a Service) are analytic. I&#8217;ve had three different clients whose main business is picking marketing targets in various vertical segments; others who wanted to add analytics to what were historically OLTP applications; and others yet who just offered online business intelligence. Also, if your fundamental business is gathering data and reselling it to a variety of user organizations, that&#8217;s an analytic data management challenge. The possibilities expand from there.</p>
<p>Data outsourcers are in the IT business, and so their IT development is &#8212; hopefully! &#8212; more serious and less politically encumbered than at many conventional enterprises. Thus, legacy systems and master data management issues are commonly less prevalent, or at least more aggressively disposed of. The same, up to a point, goes for vendor politics.*  <a href="../../../../../2011/06/26/what-to-think-about-before-you-make-a-technology-decision/">Multitenancy</a> is commonly an issue, as is running in the cloud.<em> </em></p>
<p><em>*Even so, there&#8217;s often That Guy who doesn&#8217;t want to migrate away from Oracle, no matter what.</em></p>
<p>Vertica gets the nod in a number of these cases; it&#8217;s cloud-friendly, and often the problem is naturally columnar. Other columnar products can be good choices too, with added brownie points for Infobright if the shop is MySQL-oriented anyway. Running Netezza or other appliances makes sense mainly if you&#8217;re pretty sure you want to keep operating your own data centers, but some data outsourcers are just fine with that assumption.</p>
<p><strong><em>Operational analytic(s) server</em></strong></p>
<ul>
<li><em>Kinds of data likely to be included:</em> Customer-centric, log, financial trade</li>
<li><em>Likely use styles:</em> Advanced operational analytics</li>
<li><em>Examples:</em>
<ul>
<li>Lower latency: Web or call-center personalization, anti-fraud</li>
<li>Higher latency: Customer profiling, Basel 3 risk analysis</li>
</ul>
</li>
<li><em>Stresses:</em> Performance, reliability, analytic functionality, perhaps concurrency</li>
</ul>
<p>Even with eight different choices, I need a &#8220;catch-all&#8221; category; this is it.</p>
<p>Suppose you want to do reasonably sophisticated analytics, then use the results in operations. This is the classical challenge in <a href="../../../../../2011/03/30/short-request-and-analytic-processing/">integrating short-request and analytic processing</a>. There are multiple ways to tackle it, embodying different trade-offs in cost, convenience, or analytic accuracy. If the platform on which you want to run your investigative analytics also has the reliability and concurrency appropriate for mission-critical operations, you&#8217;re set. Otherwise, you may want to pipe <a href="../../../../../2010/11/29/data-that-is-derived-augmented-enhanced-adjusted-or-cooked/">derived data</a> into a more &#8220;industrial-strength&#8221; DBMS, ideally the one that runs your operational apps anyway.</p>
<p>Another option is to integrate a limited amount of analytics immediately into your short-request processing system. For example, as bad as they are at the kinds of queries that require joins, NoSQL systems are often fast at simple aggregations. As MapReduce/NoSQL integrations mature, that option may not require pumping the data anywhere else for deeper analytics; even if it does, at least you&#8217;re starting out with the data in a convenient bit bucket.</p>
<p>Streaming/CEP-centric architectures could come into play as well. And it goes on from there. The possibilities in this last category are just too varied to generalize about.</p>
<p><em>So did I get them all? Or are there yet other analytic data management use cases that don&#8217;t fit into my eight categories?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-2/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Rainstor update</title>
		<link>http://www.dbms2.com/2010/06/11/rainstor-update/</link>
		<comments>http://www.dbms2.com/2010/06/11/rainstor-update/#comments</comments>
		<pubDate>Fri, 11 Jun 2010 10:54:09 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Rainstor]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2252</guid>
		<description><![CDATA[I was tired and cranky when I talked with my former clients at Rainstor (formerly Clearpace) yesterday, so our call was shorter than it otherwise might have been. Anyhow, there&#8217;s a new version called Rainstor 4, the two main themes of which are: Compliance-specific features. Bottleneck Whack-A-Mole. The point is that Rainstor is focusing its [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I was tired and cranky when I talked with my former clients at <a href="http://www.dbms2.com/2009/12/11/rainstor-clearpace/">Rainstor (formerly Clearpace)</a> yesterday, so our call was shorter than it otherwise might have been. Anyhow, there&#8217;s a new version called Rainstor 4, the two main themes of which are:</p>
<ul>
<li>Compliance-specific features.</li>
<li><a href="http://www.dbms2.com/2009/08/21/bottleneck-whack-a-mole/">Bottleneck Whack-A-Mole</a>.</li>
</ul>
<p style="margin-bottom: 0in;">The point is that Rainstor is focusing its efforts on enterprises that:  <span id="more-2252"></span></p>
<ul>
<li>Have a compliance mandate to keep detailed information, either now or coming down the pike.</li>
<li>Would like to query the information, either as part of the compliance mandate or for the usual business reasons one does analysis (or for that matter pinpoint lookup of historical information).</li>
<li>Might want to delete the information as soon as the compliance mandate runs out. (That&#8217;s a new feature. Frankly, I think the clients demanding it are being foolish. Information is valuable and <a href="http://www.dbms2.com/2010/04/04/the-retention-of-everything/">should never be thrown away</a> if one can afford to keep it.)</li>
<li>Might want to annotate the information, even though it is being preserved immutably. (Also a new feature. I think that one is smart.)</li>
</ul>
<p style="margin-bottom: 0in;">“Application retirement” was mentioned only in the context of Rainstor&#8217;s flagship Informatica partnership, and even then mainly for clients who had a compliance reason to keep old application data around. “Cloud” and “private cloud” get mentioned, but they don&#8217;t seem to be as central as Rainstor was previously hoping they would be. (This is one area we could and probably should have touched on more had I been more awake.)</p>
<p style="margin-bottom: 0in;">One thing that hasn&#8217;t changed:  “<a href="http://www.dbms2.com/2008/12/16/database-archiving-and-information-preservation/">Information preservation</a>,” which I coined for Rainstor at our first meeting, is still the company catchphrase.</p>
<p style="margin-bottom: 0in;">So far as I could tell, the big point on Rainstor 4 Bottleneck Whack-A-Mole is this: When you load data into Rainstor (bulk or otherwise), it likes to do some metadata analysis first. (I imagine this is related to the sophisticated <a href="http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/">Rainstor compression scheme</a>.) Well, that isn&#8217;t much of a performance hit for schemas with small numbers of tables, but is a bigger deal for more complex schemas. The Rainstor 4 fix is to remember/persist some of that analysis from one time the database is updated until the next time. Sounds obvious, but so do a lot of bottleneck fixes once they are made.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/11/rainstor-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More miscellany</title>
		<link>http://www.dbms2.com/2009/12/30/more-miscellany/</link>
		<comments>http://www.dbms2.com/2009/12/30/more-miscellany/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 11:38:22 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Continuent]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Rainstor]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1370</guid>
		<description><![CDATA[Adding to yesterday&#8217;s varied quick comments: Robert Hodges of Continuent offers a great outline of Continuent&#8217;s clustering story, with a lot of &#8220;Now we got right what we previously didn&#8217;t know/admit we got wrong.&#8221; Continuent now claims to have a strong clustering offering, both paid and free/open-source, for both MySQL and PostgreSQL, with Oracle support [...]]]></description>
			<content:encoded><![CDATA[<p>Adding to <a href="http://www.dbms2.com/2009/12/29/this-and-that/">yesterday&#8217;s varied quick comments</a>:<span id="more-1370"></span></p>
<p><a href="http://www.dbms2.com/2009/09/03/continuent-on-clustering/">Robert Hodges</a> of <strong>Continuent</strong> offers <a href="http://scale-out-blog.blogspot.com/2009/12/proving-masterslave-clusters-work-and.html">a great outline of Continuent&#8217;s clustering story</a>, with a lot of &#8220;Now we got right what we previously didn&#8217;t know/admit we got wrong.&#8221; Continuent now claims to have a strong <strong>clustering</strong> offering, both paid and free/open-source, for both MySQL and PostgreSQL, with Oracle support perhaps coming really soon.</p>
<p>Merv Adrian, who has <a href="http://www.dbms2.com/2009/06/22/the-tpc-h-benchmark-is-a-blight-upon-the-industry/">overrated the importance of TPC benchmarks</a> in the past, seems to have become more <a href="http://mervadrian.wordpress.com/2009/12/23/additional-caveats-obscure-oracles-tpc-benchmark/">skeptical</a>.</p>
<p>Interim CEO <a href="http://www.infobright.com/Blog/ceo_blog">Mark Burton</a> laid out<strong> Infobright&#8217;s focus</strong> pretty clearly when he took over:</p>
<blockquote><p><span style="letter-spacing: 0px;"> &#8230; the focus must be in building products that fit market segments where ease-of-use and easily attainable performance are valued.  This doesn’t sound like the high end of Data Warehousing to me where highly complex MPP architectures and teams of DBAs spend their time.  It sounds like the realm of Departmental IT and SMB where business leaders are in a hurry to gain access to data and answers without the lead time and pain of complex architectures and high costs.</span></p></blockquote>
<p><span style="letter-spacing: 0px;">I&#8217;m hearing about a <strong>SaaS focus</strong> from a lot of companies. The Continuent link above mentions one. So does <a href="http://www.rainstor.com/news-blog/news/users-demand-saas-data-escrow-services">RainStor&#8217;s latest blog post</a>. <a href="http://www.dbms2.com/2009/12/27/introduction-to-gooddata/">Gooddata</a>, a SaaS vendor itself, seems focused on analyzing data that was originally created via SaaS. I haven&#8217;t talked with Cast Iron or Pervasive for a while, but when I did, their ETL market targeting was <a href="http://www.dbms2.com/2008/03/21/cast-iron-systems-focuses-on-saas-data-integration/">all about SaaS</a>. And of course, I hear dumber SaaS-focus ideas as well. I think the biggest substantive reason for this trend is &#8212; if you don&#8217;t have the broadest feature set, and fear large enterprises therefore won&#8217;t want your stuff, going after SMBs makes sense. And SMBs are presumed to be going SaaS. Also in the mix, of course, are a single platform to support, a small number of large SaaS vendors to sell to or partner with, and/or general trendiness.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/30/more-miscellany/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Notes on RainStor, the company formerly known as Clearpace</title>
		<link>http://www.dbms2.com/2009/12/11/rainstor-clearpace/</link>
		<comments>http://www.dbms2.com/2009/12/11/rainstor-clearpace/#comments</comments>
		<pubDate>Sat, 12 Dec 2009 00:15:02 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Rainstor]]></category>
		<category><![CDATA[SenSage]]></category>
		<category><![CDATA[Telecommunications]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1295</guid>
		<description><![CDATA[Information preservation* DBMS vendor Clearpace officially changed its name to RainStor this week. RainStor is also relocating its CEO John Bantleman and more generally its headquarters to San Francisco. This all led to a visit with John and his colleague Ramon Chen, highlights of which included: RainStor expects to finish the year with &#62; 50 [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><a href="http://www.dbms2.com/2008/12/16/database-archiving-and-information-preservation/">Information preservation</a>* DBMS vendor Clearpace officially changed its name to RainStor this week. RainStor is also relocating its CEO John Bantleman and more generally its headquarters to San Francisco. This all led to a visit with John and his colleague Ramon Chen, highlights of which included:<span id="more-1295"></span><!--more--></p>
<ul>
<li>RainStor expects to finish the year with &gt; 50 users (overwhelmingly via partners).</li>
<li>A big market for RainStor (at least in terms of signed partnerships and large deal activity) is retention of telecom records, for compliance purposes, typically for a 1-3 year period. This includes:
<ul>
<li>CDRs (Call Detail Records)</li>
<li>Mobile phone records including CDRs and missed calls</li>
<li>SMS (Short Message Service), including the complete text of same</li>
</ul>
</li>
<li>RainStor thinks a number of larger telcos have the need to store a billion records per day each. (I&#8217;m not sure how many subscribers such a telco would have to have.)</li>
<li>John further thinks that, for the same query performance, RainStor can handle such a database on 4 blades. More precisely, he says that&#8217;s what happened at a test conducted by a major technology firm. In the same test case, SenSage required 40 blades, and Oracle required 80 or more cores on a pair of big SMP machines. John further says that the Oracle solution required a new table and new tablespace every day, while RainStor&#8217;s took 3 days for initial installation and required no DBA afterwards. However, I&#8217;m in no position to verify this report independently.</li>
<li>In a different kind of proof point, so extreme it gives even the RainStor folks pause, a user has retired 300 different applications and put their databases onto a single 2-core box. (Presumably, this is via RainStor&#8217;s OEM relationship with Informatica.)</li>
<li>Coming Very Soon are some services tying RainStor&#8217;s DBMS to obvious-suspect SaaS offerings. The core positioning is “SaaS data escrow”, i.e., RainStor will help you ensure that, in a worst-case scenario, there&#8217;s a nice safe copy of your data you can get at. RainStor also encourages you to do basic reporting and BI against the RainStor copy of the data, if you choose.</li>
<li>The idea I&#8217;ve been pushing lately of taking a heterogeneous replication offering like Continuent&#8217;s and having it feed an archiving store like RainStor&#8217;s has hit a rather basic snag. RainStor doesn&#8217;t actually consume change data capture kinds of information directly, at least as of yet, because of difficulties fitting such a stream into its guaranteed-data-immutability model.</li>
</ul>
<p><em>*I coined that category description for John in the tea room of the Park Lane Hotel. He&#8217;s subsequently embraced it enthusiastically, and I kind of like it myself. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li>
<p style="margin-bottom: 0in;">RainStor&#8217;s approach to 	compression, as described by <a href="http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/">me</a> and by <a href="http://www.rainstor.com/news-blog/blog/rainstors-secret-sauce-data-and-pattern-deduplication">RainStor itself</a></p>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/11/rainstor-clearpace/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The secret sauce to Clearpace&#8217;s compression</title>
		<link>http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/</link>
		<comments>http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/#comments</comments>
		<pubDate>Thu, 14 May 2009 05:51:09 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Rainstor]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=782</guid>
		<description><![CDATA[In an introduction to archiving vendor Clearpace last December, I noted that Clearpace claimed huge compression successes for its NParchive product (Clearpace likes to use a figure of 40X), but didn&#8217;t give much reason that NParchive could compress a lot more effectively than other columnar DBMS. Let me now follow up on that. To the [...]]]></description>
			<content:encoded><![CDATA[<p>In an <a href="http://www.dbms2.com/2008/12/16/introduction-to-clearpace/">introduction to archiving vendor Clearpace last December</a>, I noted that Clearpace claimed huge compression successes for its NParchive product (Clearpace likes to use a figure of 40X), but didn&#8217;t give much reason that NParchive could compress a lot more effectively than other columnar DBMS. Let me now follow up on that.</p>
<p>To the extent there&#8217;s a Clearpace secret sauce, it seems to lie in NParchive&#8217;s unusual data access method.  NParchive doesn&#8217;t just tokenize the values in individual columns; it tokenizes multi-column fragments of rows.  Which particular columns to group together in that way seems to be decided automagically; the obvious guess is that this is based on estimates of the cardinality of their Cartesian products.</p>
<p>Off the top of my head, examples for which this strategy might be particularly successful include:</p>
<ul>
<li>Denormalized databases</li>
<li>Message stores with lots of header information</li>
<li>Addresses</li>
</ul>
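<p>To make the multi-column idea concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not Clearpace&#8217;s actual algorithm; the sample address rows, the function names, and the comparison of distinct-combination counts against the Cartesian-product worst case are all my own assumptions.</p>

```python
# Hypothetical sample rows: (city, state, zip, customer_id)
rows = [
    ("Boston", "MA", "02101", 1),
    ("Boston", "MA", "02101", 2),
    ("Acton", "MA", "01720", 3),
    ("Boston", "MA", "02101", 4),
    ("Acton", "MA", "01720", 5),
    ("Salem", "MA", "01970", 6),
]


def distinct_count(rows, cols):
    """Number of distinct value combinations in the given column positions."""
    return len({tuple(row[c] for c in cols) for row in rows})


def tokenize(rows, cols):
    """Replace a column group with one integer token per distinct fragment."""
    dictionary = {}
    tokens = []
    for row in rows:
        fragment = tuple(row[c] for c in cols)
        # A fragment seen for the first time gets the next unused token.
        tokens.append(dictionary.setdefault(fragment, len(dictionary)))
    return dictionary, tokens


# Candidate group (assumed for illustration): city, state, zip.
group = (0, 1, 2)

# Worst case if the columns were independent: product of per-column counts.
worst_case = 1
for c in group:
    worst_case *= distinct_count(rows, (c,))

# What the data actually contains.
actual = distinct_count(rows, group)

dictionary, tokens = tokenize(rows, group)
print(worst_case, actual, tokens)
```

<p>When the actual number of distinct fragments is far below the worst case, as with addresses, a single token per row can stand in for several column values at once.</p>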
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Database archiving and information preservation</title>
		<link>http://www.dbms2.com/2008/12/16/database-archiving-and-information-preservation/</link>
		<comments>http://www.dbms2.com/2008/12/16/database-archiving-and-information-preservation/#comments</comments>
		<pubDate>Tue, 16 Dec 2008 14:42:46 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Rainstor]]></category>
		<category><![CDATA[SAND Technology]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=642</guid>
		<description><![CDATA[Two similar companies reached out to me recently – SAND Technology and Clearpace. Their current market focus is somewhat different: Clearpace talks mainly of archiving, and sells first and foremost into the compliance market, while SAND has the most traction providing “near-line” storage for SAP databases.* But both stories boil down to pretty much the [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Two similar companies reached out to me recently – <a href="http://www.dbms2.com/2008/12/16/introduction-to-sand-technology/">SAND Technology</a> <span style="font-style: normal;">and</span> <a href="http://www.dbms2.com/2008/12/16/introduction-to-clearpace/">Clearpace</a>. <span style="font-style: normal;">Their current market focus is somewhat different: Clearpace talks mainly of archiving, and sells first and foremost into the compliance market, while SAND has the most traction providing “near-line” storage for SAP databases.*  But both stories boil down to pretty much the same thing: </span><span style="font-style: normal;"><strong>Cheap, trustworthy data storage with good-enough query capabilities. </strong></span><span style="font-style: normal;"><span> E.g., I think both companies would agree the following is a not-too-misleading first-approximation characterization of their respective products:</span></span></p>
<ul>
<li>Fully 	functional relational DBMS.</li>
<li>Claims of fast 	query performance, but that&#8217;s not how they&#8217;re sold.</li>
<li>Huge 	compression.</li>
<li>Careful 	attention to time-stamping and auditability.</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-642"></span><em>*Actually, SAND has two products, one of which really </em><span style="font-style: normal;">is</span><em> sold as a DBMS, competing with Sybase IQ or Netezza. But I&#8217;m talking about the other one, which is the current main focus of SAND&#8217;s sales efforts.</em></p>
<p style="margin-bottom: 0in; font-style: normal;">When Clearpace CEO John Bantleman and I chatted last week, he spoke of such uses as:</p>
<ul>
<li>Cheap 	compliance with data-retention regulations</li>
<li>Keeping data 	accessible even though the application that created it has been 	decommissioned</li>
<li>Cheap 	duplication for disaster recovery</li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;">He also invoked the buzzphrase “information lifecycle management” (ILM).</p>
<p style="margin-bottom: 0in; font-style: normal;">When I pointed out that all of this could be construed as being aspects of “information preservation,” John enthusiastically agreed. Yesterday I bounced that phrase off SAND&#8217;s marketing chief Linda Arens, and she liked it too.</p>
<p style="margin-bottom: 0in; font-style: normal;">And that makes perfect sense. What do “archives” and “archivists” do in the classical senses of the terms?  First and foremost, they preserve information.  They don&#8217;t feel they&#8217;ve done their job well if it&#8217;s too difficult to access, but utter ease-of-use is not their top concern.</p>
<p style="margin-bottom: 0in;"><em>Digression: I actually spent a day once with a university archivist (retired).  She came to my house to check out a portrait of one of my Monasch ancestors and to rummage through my 19<sup>th</sup> Century family photos.  Australian readers &#8212; and WW1 history buffs &#8212; will have little trouble guessing which university she was from. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </em></p>
<p style="margin-bottom: 0in; font-style: normal;">So far, so good.  But why use a specialty product for the purpose of information preservation, when you  can instead just dump everything into your data warehouse environment?  Well, the vast majority of large enterprises do just that, getting by without specialized technology from SAND, Clearpace, or any close competitor.   And of course data warehouse technology is getting cheaper very quickly.  So not all enterprises will ever need what SAND and Clearpace have to offer.</p>
<p style="margin-bottom: 0in; font-style: normal;">But every enterprise does need to think about a comprehensive information preservation strategy.  Too often ILM puts the cart before the horse, focusing on throwing stuff away more than on keeping it.  Notwithstanding the excessive popularity of some inherently shady legal tricks &#8212; “Let&#8217;s make sure to destroy the evidence before somebody can think of ordering us to preserve it” &#8212; and also notwithstanding some legitimate rules about privacy &#8212; <strong>preserving information is almost always better than losing it,</strong> whether accidentally or on purpose.</p>
<p style="margin-bottom: 0in; font-style: normal;">So I&#8217;d like to propose a deceptively simple exercise for any  enterprise, really of any size.  Inventory all the sources of potentially valuable information that are already being tracked in your enterprise.   Then   make a matching list of the preservation strategies for each. Some of those strategies will be very good.   Others will fall into that ever-popular category “not ideal, but also not bad enough to bother fixing.” Then see which kinds of information are covered neither by a good preservation strategy, nor one that&#8217;s good enough. And think about whether you should move all those into one or two* information preservation environments of last resort.</p>
<p style="margin-bottom: 0in;"><em>*Two = one for tabular data + one for documents and media</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/12/16/database-archiving-and-information-preservation/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Introduction to Clearpace</title>
		<link>http://www.dbms2.com/2008/12/16/introduction-to-clearpace/</link>
		<comments>http://www.dbms2.com/2008/12/16/introduction-to-clearpace/#comments</comments>
		<pubDate>Tue, 16 Dec 2008 14:41:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Rainstor]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=641</guid>
		<description><![CDATA[Clearpace is a UK-based startup in a similar market to what SAND Technology has gotten into – DBMS archiving, with a strong focus on compression and general cost-effectiveness. Clearpace launched its product NParchive a couple of quarters ago, and says it now has 25 people and $1 million or so in revenue. Clearpace NParchive technical [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Clearpace is a UK-based startup in a similar market to what <a href="http://www.dbms2.com/2008/12/16/introduction-to-sand-technology/">SAND Technology</a> has gotten into – DBMS archiving, with a strong focus on compression and general cost-effectiveness.  Clearpace launched its product NParchive a couple of quarters ago, and says it now has 25 people and $1 million or so in revenue.  Clearpace NParchive technical highlights include:<span id="more-641"></span></p>
<ul>
<li>NParchive takes a multi-version concurrency control approach.  Data is never updated in place; new information is just appended.  Clearpace is careful to “time-proof” the data, keeping track of, and allowing the unwinding of, for example, changes in schema table structure.</li>
<li>Data is stored in very large 	blocks – the default is 1 million rows. Currently any change to 	actual data values – as opposed to just database design changes &#8211; 	requires rewriting a whole block, but a redo log is on the roadmap.</li>
<li>NParchive has four different approaches to compression, which can be used in series.  Clearpace says that if any two of the four work well on a particular data set, 20X compression is realistic.  If all four work well, 50-100X can be achieved.  Presumably, not all <em>have</em> to be turned on for any particular database.</li>
<li>Three of NParchive&#8217;s approaches to 	compression are pretty standard – tokenization, a “collection of 	cheap, standard compression algorithms” (including delta, which 	often works well), and EDLIB.</li>
<li>The fourth part of the NParchive 	compression story has something to do with representing records as 	trees, and noticing when patterns are repeated and deduping them.  	I&#8217;m still fuzzy on how that all works.  <em>(Edit: I subsequently posted an <a href="http://www.dbms2.com/2009/05/14/the-secret-sauce-to-clearpaces-compression/">explanation</a> of that part.)</em></li>
<li>Clearpace believes NParchive&#8217;s 	query performance is competitive with Oracle&#8217;s but not, say, 	Netezza&#8217;s. (And yes, that&#8217;s a meaningful assertion, even if you 	believe that all Oracle performance problems are solely due to poor 	implementation practices.)</li>
<li>Clearpace says that no database 	administration is ever needed.  Everything happens automagically – 	or as they say nowadays, “autonomically.”</li>
</ul>
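<p>As a reminder of why delta encoding often works well on archive-style data, here is a minimal Python sketch. It is generic textbook delta encoding, not NParchive&#8217;s implementation, and the timestamp values are made up for illustration.</p>

```python
def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values by taking a running sum of the deltas."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values


# Hypothetical log timestamps (seconds), sorted as they would be in an append-only store.
timestamps = [1229438502, 1229438503, 1229438503, 1229438507, 1229438510]

encoded = delta_encode(timestamps)
decoded = delta_decode(encoded)
print(encoded)
```

<p>After the first entry, the encoded values are small integers, which a later stage in the series (tokenization or a general-purpose compressor) can then pack far more tightly than the raw timestamps.</p>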
<p style="margin-bottom: 0in;">According to Clearpace CEO John Bantleman, NParchive use cases include:</p>
<ul>
<li>Archiving data warehouses</li>
<li>Archiving log files and similar 	kinds of data that never made it into a data warehouse</li>
<li>Storing – and making available 	for query – data from decommissioned old applications</li>
</ul>
<p style="margin-bottom: 0in;">If I understood a couple of actual OEM stories correctly, we can also add to the list the archiving of transaction processing databases.  Buzzphrases mentioned included information lifecycle management (ILM) and disaster recovery.</p>
<p style="margin-bottom: 0in;">And then I coined a <a href="http://www.dbms2.com/2008/12/16/database-archiving-and-information-preservation/">database archiving buzzphrase</a> of my own &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/12/16/introduction-to-clearpace/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

