<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Greenplum is in the big leagues</title>
	<atom:link href="http://www.dbms2.com/2008/08/25/greenplum-is-in-the-big-leagues/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2008/08/25/greenplum-is-in-the-big-leagues/</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 04 Feb 2010 06:32:15 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Greenplum customer notes &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2008/08/25/greenplum-is-in-the-big-leagues/comment-page-1/#comment-144520</link>
		<dc:creator>Greenplum customer notes &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Sun, 18 Oct 2009 18:43:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=495#comment-144520</guid>
		<description>[...] As of the past quarter or two, &lt;10% of Greenplum&#8217;s sales activity is on Sun, which works out to maybe one sale per quarter and at most a small number of sales cycles. (That&#8217;s down from 50%+ not that long ago.) [...]</description>
		<content:encoded><![CDATA[<p>[...] As of the past quarter or two, &lt;10% of Greenplum&#8217;s sales activity is on Sun, which works out to maybe one sale per quarter and at most a small number of sales cycles. (That&#8217;s down from 50%+ not that long ago.) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greenplum update &#8212; Release 3.3 and so on &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2008/08/25/greenplum-is-in-the-big-leagues/comment-page-1/#comment-141019</link>
		<dc:creator>Greenplum update &#8212; Release 3.3 and so on &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Mon, 21 Sep 2009 08:52:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=495#comment-141019</guid>
		<description>[...] Greenplum had about 65 paying customers at the end of Q1. I&#8217;ve forgotten how that jibes with a figure of 50 customers last August. [...]</description>
		<content:encoded><![CDATA[<p>[...] Greenplum had about 65 paying customers at the end of Q1. I&#8217;ve forgotten how that jibes with a figure of 50 customers last August. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greenplum &#8211; Reaching Escape Velocity &#171; Market Strategies for IT Suppliers</title>
		<link>http://www.dbms2.com/2008/08/25/greenplum-is-in-the-big-leagues/comment-page-1/#comment-121062</link>
		<dc:creator>Greenplum &#8211; Reaching Escape Velocity &#171; Market Strategies for IT Suppliers</dc:creator>
		<pubDate>Mon, 11 May 2009 22:50:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=495#comment-121062</guid>
		<description>[...] there&#8217;s no need for me to do that here &#8211; Curt Monash does an excellent job in this post from 2008, and he recently talked with eBay about their use of Greenplum on a massive scale in this article. [...]</description>
		<content:encoded><![CDATA[<p>[...] there&#8217;s no need for me to do that here &#8211; Curt Monash does an excellent job in this post from 2008, and he recently talked with eBay about their use of Greenplum on a massive scale in this article. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Infology.Ru &#187; Blog Archive &#187; Evaluating storage system efficiency: what share of a storage system&#8217;s capacity holds user data</title>
		<link>http://www.dbms2.com/2008/08/25/greenplum-is-in-the-big-leagues/comment-page-1/#comment-100119</link>
		<dc:creator>Infology.Ru &#187; Blog Archive &#187; Evaluating storage system efficiency: what share of a storage system&#8217;s capacity holds user data</dc:creator>
		<pubDate>Tue, 21 Oct 2008 21:14:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=495#comment-100119</guid>
		<description>[...] to get all or almost all of that back. For example, for a customer whose warehouse size is several p... and who is currently loading data into its systems [...]</description>
		<content:encoded><![CDATA[<p>[...] to get all or almost all of that back. For example, for a customer whose warehouse size is several p&#8230; and who is currently loading data into its systems [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greenplum pushes envelope with MapReduce and parallelism enhancements to its extreme-scale data offering &#124; Dana Gardner&#8217;s BriefingsDirect &#124; ZDNet.com</title>
		<link>http://www.dbms2.com/2008/08/25/greenplum-is-in-the-big-leagues/comment-page-1/#comment-98149</link>
		<dc:creator>Greenplum pushes envelope with MapReduce and parallelism enhancements to its extreme-scale data offering &#124; Dana Gardner&#8217;s BriefingsDirect &#124; ZDNet.com</dc:creator>
		<pubDate>Mon, 29 Sep 2008 13:50:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=495#comment-98149</guid>
		<description>[...] promise to wrap MapReduce into the newest version of its data solutions. The announcement from the data warehousing and analytics supplier comes to a fast-changing landscape, given last week&#8217;s HP-Oracle Exadata [...]</description>
		<content:encoded><![CDATA[<p>[...] promise to wrap MapReduce into the newest version of its data solutions. The announcement from the data warehousing and analytics supplier comes to a fast-changing landscape, given last week&#8217;s HP-Oracle Exadata [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Web analytics &#8212; clickstream and network event data &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2008/08/25/greenplum-is-in-the-big-leagues/comment-page-1/#comment-97765</link>
		<dc:creator>Web analytics &#8212; clickstream and network event data &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Mon, 22 Sep 2008 10:10:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=495#comment-97765</guid>
		<description>[...] believe that both of the previously mentioned petabyte+ databases on Greenplum will feature clickstream [...]</description>
		<content:encoded><![CDATA[<p>[...] believe that both of the previously mentioned petabyte+ databases on Greenplum will feature clickstream [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Phil Rack</title>
		<link>http://www.dbms2.com/2008/08/25/greenplum-is-in-the-big-leagues/comment-page-1/#comment-97212</link>
		<dc:creator>Phil Rack</dc:creator>
		<pubDate>Tue, 09 Sep 2008 23:01:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=495#comment-97212</guid>
		<description>Interesting indeed. I&#039;ve been writing some software for WPS as well as SAS so that users can have access to R routines and R graphics. Of course, part of the problem is the memory constraint issue. I&#039;ve been playing with executing R where the user can determine which R routines/programs they want to run in parallel and have WPS or SAS collect the output and write it back into the appropriate windows. This actually works but I&#039;m not satisfied with what I have done.

Since I don&#039;t see R going 64-bit on Windows anytime soon, I&#039;m starting the process of speccing out a system where R runs in a 64-bit Linux OS and has access to a lot more memory space to solve statistical problems. Currently, the idea is to make the Linux system a VM that is easily installed and has quite a few of the R libraries already installed.

All I need is time!</description>
		<content:encoded><![CDATA[<p>Interesting indeed. I&#8217;ve been writing some software for WPS as well as SAS so that users can have access to R routines and R graphics. Of course, part of the problem is the memory constraint issue. I&#8217;ve been playing with executing R where the user can determine which R routines/programs they want to run in parallel and have WPS or SAS collect the output and write it back into the appropriate windows. This actually works but I&#8217;m not satisfied with what I have done.</p>
<p>Since I don&#8217;t see R going 64-bit on Windows anytime soon, I&#8217;m starting the process of speccing out a system where R runs in a 64-bit Linux OS and has access to a lot more memory space to solve statistical problems. Currently, the idea is to make the Linux system a VM that is easily installed and has quite a few of the R libraries already installed.</p>
<p>All I need is time!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Luke Lonergan</title>
		<link>http://www.dbms2.com/2008/08/25/greenplum-is-in-the-big-leagues/comment-page-1/#comment-97208</link>
		<dc:creator>Luke Lonergan</dc:creator>
		<pubDate>Tue, 09 Sep 2008 22:15:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=495#comment-97208</guid>
		<description>I see - it&#039;s actually still an in-memory proposition in Greenplum within the R functions themselves, but we can stream data through the R functions and the output may end up spooling to disk if our optimizer thinks it has to.

An example use-case where we&#039;ve used R as a UDF: doing various forms of linear regression required the use of a matrix pseudo-inverse routine to solve the eigenvalue problem.  Rather than writing our own pseudo-inverse routine, we used the one that comes with R to evaluate different approaches.  The matrix solve part is actually pretty small, so we were able to do it in memory as the final stage of processing and the R routine was a good fit.

In the end, we implemented our own pseudo-inverse routine, now available as &#039;pinv()&#039; within Greenplum.  It&#039;s written in C internally and is blazingly fast.

So - the embedded R UDF capability within Greenplum is useful, but it&#039;s often good to rewrite the routine for performance optimization when moving to production.  We provide many of these kinds of functions to our customers in the form of libraries.  Note that we also provide a large array of built-in matrix manipulation routines.</description>
		<content:encoded><![CDATA[<p>I see &#8211; it&#8217;s actually still an in-memory proposition in Greenplum within the R functions themselves, but we can stream data through the R functions and the output may end up spooling to disk if our optimizer thinks it has to.</p>
<p>An example use-case where we&#8217;ve used R as a UDF: doing various forms of linear regression required the use of a matrix pseudo-inverse routine to solve the eigenvalue problem.  Rather than writing our own pseudo-inverse routine, we used the one that comes with R to evaluate different approaches.  The matrix solve part is actually pretty small, so we were able to do it in memory as the final stage of processing and the R routine was a good fit.</p>
<p>In the end, we implemented our own pseudo-inverse routine, now available as &#8216;pinv()&#8217; within Greenplum.  It&#8217;s written in C internally and is blazingly fast.</p>
<p>So &#8211; the embedded R UDF capability within Greenplum is useful, but it&#8217;s often good to rewrite the routine for performance optimization when moving to production.  We provide many of these kinds of functions to our customers in the form of libraries.  Note that we also provide a large array of built-in matrix manipulation routines.</p>
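<p>The general pattern described above (stream the data, keep only a small matrix in memory, solve it as the final stage) can be sketched roughly as follows. This is an illustrative sketch in plain Python, not Greenplum&#8217;s pinv() or its R integration: the function names accumulate, solve and fit are hypothetical, and it assumes the design matrix has full column rank, in which case solving the normal equations (X^T X) beta = X^T y gives the same answer as applying the pseudo-inverse.</p>

```python
def accumulate(rows):
    """Stream (x, y) rows once, keeping only X^T X and X^T y in memory."""
    xtx, xty = None, None
    for x, y in rows:
        k = len(x)
        if xtx is None:
            xtx = [[0.0] * k for _ in range(k)]
            xty = [0.0] * k
        for i in range(k):
            xty[i] += x[i] * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    if xtx is None:
        raise ValueError("no rows to fit")
    return xtx, xty


def solve(a, b):
    """Solve a * beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            # Assumption violated: a rank-deficient X needs a true
            # pseudo-inverse (for example via SVD), which this sketch omits.
            raise ValueError("X^T X is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows):
    """Least-squares coefficients: streaming pass, then a small in-memory solve."""
    xtx, xty = accumulate(rows)
    return solve(xtx, xty)


# Toy data lying exactly on y = 1 + 2*x; the leading 1.0 is the intercept term.
sample = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)]
print(fit(sample))
```

<p>The point of the split is that accumulate() touches each row once and needs memory proportional only to the square of the number of columns, so the final solve stays small no matter how many rows are streamed through.</p>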
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.dbms2.com/2008/08/25/greenplum-is-in-the-big-leagues/comment-page-1/#comment-97207</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Tue, 09 Sep 2008 22:05:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=495#comment-97207</guid>
		<description>Luke,

It might be helpful if you listed a few ways R results might wind up on disk -- if indeed there are a few different ways. :)

Thanks,

CAM</description>
		<content:encoded><![CDATA[<p>Luke,</p>
<p>It might be helpful if you listed a few ways R results might wind up on disk &#8212; if indeed there are a few different ways. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Thanks,</p>
<p>CAM</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Luke Lonergan</title>
		<link>http://www.dbms2.com/2008/08/25/greenplum-is-in-the-big-leagues/comment-page-1/#comment-97206</link>
		<dc:creator>Luke Lonergan</dc:creator>
		<pubDate>Tue, 09 Sep 2008 22:01:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=495#comment-97206</guid>
		<description>Hi Phil,

That&#039;s one problem with R in general: it holds its results in RAM.

With Greenplum, we enable you to run R programs as stored procedures, which lets you reuse the math routines in R to some extent, specifically to help you calculate intermediate results as part of WINDOW functions or other OLAP use-cases.

We have also re-implemented some of the routines that R provides as native parallel functions within Greenplum, including multi-variable linear regression, a naive Bayes classifier and some others.</description>
		<content:encoded><![CDATA[<p>Hi Phil,</p>
<p>That&#8217;s one problem with R in general: it holds its results in RAM.</p>
<p>With Greenplum, we enable you to run R programs as stored procedures, which lets you reuse the math routines in R to some extent, specifically to help you calculate intermediate results as part of WINDOW functions or other OLAP use-cases.</p>
<p>We have also re-implemented some of the routines that R provides as native parallel functions within Greenplum, including multi-variable linear regression, a naive Bayes classifier and some others.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

