<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; SAS Institute</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/sas-institute/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:21:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Comments on SAS</title>
		<link>http://www.dbms2.com/2012/02/08/comments-on-sas/</link>
		<comments>http://www.dbms2.com/2012/02/08/comments-on-sas/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 22:51:11 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[KXEN]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[SAS Institute]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5939</guid>
		<description><![CDATA[A reporter interviewed me via IM about how CIOs should view SAS Institute and its products. Naturally, I have edited my comments (lightly) into a blog post. They turned out to be clustered into three groups, as follows: SAS faces a number of challenges, not unlike those faced by other high-priced legacy technology vendors. It [...]]]></description>
			<content:encoded><![CDATA[<p>A reporter interviewed me via IM about how CIOs should view SAS Institute and its products. Naturally, I have edited my comments (lightly) into a blog post. They turned out to be clustered into three groups, as follows:</p>
<ul>
<li>SAS faces a number of challenges, not unlike those faced by other high-priced legacy technology vendors.
<ul>
<li>It is used by organizations that have large budgets, both to pay for the product and to pay people to become expert in the product&#8217;s intricacies.</li>
<li>SAS has not integrated with scale-out analytic DBMS technologies as well or as quickly as had been hoped, or as earlier marketing suggested was likely.</li>
<li>SAS has not been strong in helping its users do <a href="http://www.dbms2.com/2011/11/28/agile-predictive-analytics-the-easy-parts/">agile predictive analytics</a>.</li>
</ul>
</li>
<li>SAS&#8217; strengths are concentrated in product breadth:
<ul>
<li>Lots of statistical algorithms.</li>
<li>Various vertical products that make the modeling techniques more accessible in specific application domains.</li>
<li><a href="http://www.dbms2.com/2011/04/21/sas-hpa-does-make-sense-after-all/">Various approaches to engineering for scalability</a> &#8212; none of them has been a table-thumping success to date, but SAS has the resources to keep trying.</li>
<li>Some level of integration with its own business intelligence and text analytics products.</li>
</ul>
</li>
<li>For any particular use case, the burden of proof is on SAS alternatives to show that they have enough pieces in the toolkit to meet the needs.
<ul>
<li>SPSS (now owned by IBM) also has legacy issues.</li>
<li>KXEN is focused on marketing use cases.</li>
<li>Mahout has been one of the less successful Hadoop-related open source projects.</li>
<li>R-based technology is still maturing.</li>
<li>Modeling capabilities (as opposed to mere scoring) that are bundled into RDBMS and well parallelized tend to be pretty limited. Apparent exceptions tend to be just R repackaged.</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/08/comments-on-sas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Agile predictive analytics &#8211; the heart of the matter</title>
		<link>http://www.dbms2.com/2011/11/28/agile-predictive-analytics-the-heart-of-the-matter/</link>
		<comments>http://www.dbms2.com/2011/11/28/agile-predictive-analytics-the-heart-of-the-matter/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 19:40:26 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[SAS Institute]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5746</guid>
		<description><![CDATA[I&#8217;ve already suggested that several apparent issues in predictive analytic agility can be dismissed by straightforwardly applying best-of-breed technology, for example in analytic data management. At first blush, the same could be said about the actual analysis, which comprises: Data preparation, which is tedious unless you do a good job of automating it. Running the [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve already suggested that several apparent issues in <a href="http://www.dbms2.com/2011/11/28/agile-predictive-analytics-the-easy-parts/">predictive analytic agility</a> can be dismissed by straightforwardly applying best-of-breed technology, for example in analytic data management. At first blush, the same could be said about the actual analysis, which comprises:</p>
<ul>
<li>Data preparation, which is tedious unless you do a good job of automating it.</li>
<li>Running the actual algorithms.</li>
</ul>
<p>Numerous statistical software vendors (or open source projects) help you with the second part; some make strong claims in the first area as well (e.g., my clients at KXEN). Even so, large enterprises typically have statistical silos, commonly featuring expensive annual SAS licenses and seemingly slow-moving SAS programmers.</p>
<p>As I see it, the predictive analytics workflow goes something like this<span id="more-5746"></span>:</p>
<ul>
<li>Business-knowledgeable people develop a theory as to what kinds of information and segmentation could be valuable in making better business micro-decisions.</li>
<li>Statistics-knowledgeable people determine a structure for modeling that reflects this theory.</li>
<li>Statistics-knowledgeable people tweak the model over time, within a fixed general structure, as new data comes in.</li>
<li>(Optional) Somebody arranges to acquire any needed data that the organization doesn&#8217;t already have (and won&#8217;t get in the ordinary course of ongoing business).</li>
</ul>
<p>The optional last part can be a purchase of third-party information (relatively fast and easy) or the development of a business process (and, if necessary, associated software) to capture the information (not always so easy). But even if that&#8217;s taken care of, or isn&#8217;t needed, we have at least two hand-offs where agility can be lost:</p>
<ul>
<li>Businesspeople may throw a request &#8220;over the wall&#8221; to the statisticians, who then work on it as their schedule permits.</li>
<li>Once created, a model may be so set in stone that even small changes are as hard as building a new model from scratch.</li>
</ul>
<p>The second problem can be solved by the statisticians themselves, without outside involvement. Model research and model refinement should be separate processes. You can recheck your clustering on one schedule, but recalibrate your regressions against each cluster more frequently. If that all sounds forbiddingly difficult, perhaps your model recalibration process needs another level of automation.</p>
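<p>A minimal Python sketch of that research/refinement split follows. It assumes a one-dimensional segmentation variable, centroids already produced by an earlier (infrequent) clustering run, and a simple per-cluster least-squares line standing in for &#8220;your regressions&#8221;; all data and function names are hypothetical. Only <code>recalibrate</code> would need to run each time new data arrives.</p>

```python
from statistics import mean


def assign_clusters(xs, centroids):
    """Research step (run rarely): map each value to the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: abs(x - centroids[k])) for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def recalibrate(xs, ys, labels, n_clusters):
    """Refinement step (run often): refit each cluster's regression on fresh data
    while keeping the cluster structure itself fixed."""
    models = {}
    for k in range(n_clusters):
        cx = [x for x, lab in zip(xs, labels) if lab == k]
        cy = [y for y, lab in zip(ys, labels) if lab == k]
        models[k] = fit_line(cx, cy)
    return models


# Hypothetical centroids from an earlier, infrequent clustering run.
centroids = [0.0, 10.0]

# Hypothetical fresh data arriving between clustering runs.
xs = [0, 1, 2, 10, 11, 12]
ys = [1, 3, 5, 20, 21, 22]

labels = assign_clusters(xs, centroids)
models = recalibrate(xs, ys, labels, len(centroids))
print(labels)  # [0, 0, 0, 1, 1, 1]
print(models)  # {0: (1.0, 2.0), 1: (10.0, 1.0)}
```

<p>Because the cluster assignment is reused rather than recomputed, the frequent step is just a handful of small regressions.</p>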
<p>So I&#8217;ve finally gotten to the point of saying what may have been obvious from the start: <strong>The only excusable impediment to predictive analytic agility is the hand-off from the people who know the business to the people who know the math.</strong> So let&#8217;s examine ways that difficulty can be resolved.</p>
<p>At big internet companies, the usual answer is something like</p>
<blockquote><p>Hey, it&#8217;s just data. From web logs. And network event logs. The data scientists know how to handle that.</p></blockquote>
<p>In financial trading firms, the answer is more like</p>
<blockquote><p>The traders and analysts work closely together. Very closely. In fact, when the traders rip out their phones and throw them across the room, the analysts need to duck to avoid getting clobbered.</p></blockquote>
<p>In credit card or telecom marketing or insurance actuarial organizations, the answer may be</p>
<blockquote><p>Don&#8217;t worry; the stats geeks have been at this for a long time; they really do understand our business.</p></blockquote>
<p>All three approaches work.</p>
<p>But what about conventional enterprises, where line-of-business people may not be as math-savvy as internet developers or financial traders, and where the math experts may not have the business issues down cold? My flippant answer is that businesspeople should know some math too.* My more serious answer is that <strong>the &#8220;business analyst&#8221; role should be expanded </strong>beyond BI and planning<strong> to include lightweight predictive analytics as well.</strong></p>
<p><em>*I wasn&#8217;t being entirely flippant, of course. Statistics is even being taught in high school these days. And when I got a PhD in game theory, 2/3 of my thesis committee was at the Harvard Business School.</em></p>
<p>For example, at retailers:</p>
<ul>
<li>Market basket analysis is pretty simplistic (it only looks at small subsets of a basket at a time).</li>
<li>Seasonality is tricky. (Weather and so on can skew it.)</li>
<li>Each store or region can be its own universe.</li>
<li>Some of the results of analytics are rather coarse-grained &#8212; e.g., merchandise adjacencies &#8212; so precision in statistical analysis may not matter much anyway.</li>
</ul>
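<p>To make &#8220;small subsets of a basket&#8221; concrete, here is a hypothetical Python sketch that only ever looks at item pairs. The baskets are invented; real market basket tools typically use frequent-itemset algorithms such as Apriori or FP-growth to go beyond pairs, but the unit of analysis is still a small itemset rather than the whole basket.</p>

```python
from collections import Counter
from itertools import combinations


def count_items_and_pairs(baskets):
    """Count single items and unordered item pairs, one basket at a time."""
    items, pairs = Counter(), Counter()
    for basket in baskets:
        unique = sorted(set(basket))  # ignore duplicates; sorting makes pairs order-independent
        items.update(unique)
        pairs.update(combinations(unique, 2))
    return items, pairs


def confidence(items, pairs, a, b):
    """Estimated P(b in basket | a in basket) from pair and item counts."""
    pair = tuple(sorted((a, b)))
    return pairs[pair] / items[a]


# Hypothetical transactions.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
]

items, pairs = count_items_and_pairs(baskets)
print(pairs.most_common(2))
print(confidence(items, pairs, "bread", "milk"))  # 2 of the 3 bread baskets contain milk
```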
<p>And so truly rigorous statistical analysis may be both infeasible and unnecessary; a lot of business-informed seat-of-the-pants reasoning needs to be mixed in. Consequently, there&#8217;s a lot to be said for pushing at least some retail predictive analytics pretty close to the merchandising department(s).</p>
<p>Similar stories could be told in many other industries and pursuits, including but emphatically not limited to:</p>
<ul>
<li>Event marketing.</li>
<li>College admissions.</li>
<li>Political campaigning.</li>
<li>Field maintenance at utility companies.</li>
<li>Price-setting (across many industries).</li>
</ul>
<p>In each case, it&#8217;s easy to see how statistical and predictive analytic techniques could add real value to the business. But it&#8217;s hard to imagine how the enterprise could support the kind of large, experienced, business-knowledgeable analytic operation one might find in hedge fund investing or telecom churn analysis. And absent that, it&#8217;s tough to see why the only people doing predictive analytics for the organization should sit in some silo of statistical expertise.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/11/28/agile-predictive-analytics-the-heart-of-the-matter/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Application areas for SAS HPA</title>
		<link>http://www.dbms2.com/2011/04/21/application-areas-for-sas-hpa/</link>
		<comments>http://www.dbms2.com/2011/04/21/application-areas-for-sas-hpa/#comments</comments>
		<pubDate>Thu, 21 Apr 2011 08:24:17 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Liberty and privacy]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[SAS Institute]]></category>
		<category><![CDATA[Telecommunications]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4342</guid>
		<description><![CDATA[When I talked with SAS about its forthcoming in-memory parallel SAS HPA offering, we talked briefly about application areas. The three SAS cited were: Consumer financial services. The idea here is to combine information about customers&#8217; use of all kinds of services &#8212; banking, credit cards, loans, etc. SAS believes this is both for marketing [...]]]></description>
			<content:encoded><![CDATA[<p>When I talked with SAS about <a href="http://www.dbms2.com/2011/04/21/sas-hpa-does-make-sense-after-all/">its forthcoming in-memory parallel SAS HPA offering</a>, we talked briefly about application areas. The three SAS cited were:</p>
<ul>
<li><strong>Consumer financial services.</strong> The idea here is to combine information about customers&#8217; use of all kinds of services &#8212; banking, credit cards, loans, etc. SAS believes this serves both marketing and risk analysis purposes.</li>
<li><strong>Insurance.</strong> We didn&#8217;t go into detail.</li>
<li><strong>Mobile communications.</strong> SAS&#8217; customers aren&#8217;t giving it details, but they&#8217;re excited about geocoding/geospatial data.</li>
</ul>
<p>Meanwhile, in another interview I heard about, SAS emphasized <strong>retailers.</strong> Indeed, that&#8217;s what spawned my recent post about <a href="http://www.dbms2.com/2011/04/06/so-can-logistic-regression-be-parallelized-or-not/">logistic regression</a>.</p>
<p>The mobile communications one is a bit scary. Your cell phone &#8212; and hence your cellular company &#8212; <a href="http://petewarden.github.com/iPhoneTracker/">knows where you are</a>, pretty much from moment to moment. Even without advanced analytic technology applied to it, that&#8217;s a pretty direct privacy threat. Throw in some analytics, and your cell company might know, for example, who you hang out with (in person), where you shop, and how those things predict your future behavior. And so the government &#8212; or just your employer &#8212; might know those things too.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/04/21/application-areas-for-sas-hpa/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>In-memory, parallel, not-in-database SAS HPA does make sense after all</title>
		<link>http://www.dbms2.com/2011/04/21/sas-hpa-does-make-sense-after-all/</link>
		<comments>http://www.dbms2.com/2011/04/21/sas-hpa-does-make-sense-after-all/#comments</comments>
		<pubDate>Thu, 21 Apr 2011 08:23:41 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[SAS Institute]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4343</guid>
		<description><![CDATA[I talked with SAS about its new approach to parallel modeling. The two key points are: SAS no longer plans to go as far with in-database modeling as it previously intended. Rather, SAS plans to run in RAM on MPP DBMS appliances, exploiting MPI (Message Passing Interface). The whole thing is called SAS HPA (High-Performance [...]]]></description>
			<content:encoded><![CDATA[<p>I talked with SAS about its <a href="../../../../../2011/03/13/so-how-many-columns-can-a-single-table-have-anyway/">new approach to parallel modeling</a>. The two key points are:</p>
<ul>
<li><strong>SAS no longer plans to go as far with in-database modeling as it previously intended.</strong></li>
<li>Rather, <strong>SAS plans to run in RAM on MPP DBMS appliances,</strong> exploiting MPI (Message Passing Interface).</li>
</ul>
<p>The whole thing is called SAS HPA (High-Performance Analytics), in an obvious reference to HPC (High-Performance Computing). It will run initially on RAM-heavy appliances from Teradata and EMC Greenplum.</p>
<p>A lot of what&#8217;s going on here is that SAS found it annoyingly difficult to parallelize modeling within the framework of a massively parallel DBMS such as Teradata. Notes on that aspect include:</p>
<ul>
<li><strong>SAS wasn&#8217;t exploiting the capabilities of individual DBMS to their fullest;</strong> rather, it was looking for an approach that would work across multiple brands of DBMS. Thus, for example, the fact that Aster&#8217;s analytic platform architecture is more flexible or powerful than Teradata&#8217;s didn&#8217;t help much with making SAS run within the Aster nCluster database.</li>
<li>Notwithstanding everything else, <strong>SAS did make a certain set of modeling procedures run in-database.</strong></li>
<li><strong>SAS&#8217; previous plans to run in-database modeling in Aster and/or Netezza DBMS may never come to fruition.</strong></li>
</ul>
<p><span id="more-4343"></span>SAS&#8217; problems developing in-database modeling stem from, in essence, the limitations of UDFs (User Defined Functions). So why weren&#8217;t, for example, <a href="../../../../../2009/08/02/teradata-13-focuses-on-advanced-analytic-performance/">Teradata&#8217;s 2009 enhancements to its UDF capabilities</a> enough? The clearest example SAS gave me is that, while <a href="../../../../../2011/03/13/so-how-many-columns-can-a-single-table-have-anyway/">database tables are commonly limited to something on the order of 1000 columns</a> (their figure as well as mine), SAS might need 50-100,000 columns. One reason seems to be interactions between variables; SAS used the word &#8220;multiplied&#8221; a few times, but even so was coy about whether this could simply be regarded as quadratic terms in a regression. Another reason seems to be that in some cases, every value in a column spawns a new column in an intermediate table/array; indeed, this seems to be going on in the previously discussed case of <a href="../../../../../2011/04/06/so-can-logistic-regression-be-parallelized-or-not/">logistic regression</a>.</p>
<p>SAS code will be launched by the DBMS/data warehouse appliances, so potentially it can run under their native workload management. Teradata presumably has enough workload management richness to exploit that; EMC Greenplum, as of my August 2010 notes, probably did not.</p>
<p>SAS was gracious enough to let me post its slide deck, in both <a href="http://www.monash.com/uploads/SAS_HPA_2011-Shorter.pdf">shorter</a> and <a href="http://www.monash.com/uploads/SAS_HPA_2011-Longer.pdf">longer</a> versions. Due to a technical glitch during the call, I neither looked at the slides nor took notes. I think the biggest loss from those difficulties is that I didn&#8217;t learn what the futures at the end of the longer deck were all about.</p>
<p><strong><em>Related links</em></strong></p>
<ul>
<li><a href="http://www.dbms2.com/2011/04/21/application-areas-for-sas-hpa/">Application areas for SAS HPA</a> (April, 2011)</li>
<li><a href="../../../../../2010/05/15/further-clarifying-in-database-mpp-sas/">SAS&#8217; MPP story as of May, 2010</a></li>
<li><a href="../../../../../2007/10/10/sas-goes-mpp-on-teradata-first/">SAS&#8217; plans to run in-database on Teradata</a> (October, 2007)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/04/21/sas-hpa-does-make-sense-after-all/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Revolution Analytics update</title>
		<link>http://www.dbms2.com/2011/04/08/revolution-analytics-update/</link>
		<comments>http://www.dbms2.com/2011/04/08/revolution-analytics-update/#comments</comments>
		<pubDate>Fri, 08 Apr 2011 09:45:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Health care]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Revolution Analytics]]></category>
		<category><![CDATA[SAS Institute]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4224</guid>
		<description><![CDATA[I wasn&#8217;t too impressed when I spoke with Revolution Analytics at the time of its relaunch last year. But a conversation Thursday evening was much clearer. And I even learned some cool stuff about general predictive modeling trends (see the bottom of this post). Revolution Analytics business and business model highlights include: Revolution Analytics is [...]]]></description>
			<content:encoded><![CDATA[<p>I wasn&#8217;t too impressed when <a href="../../../../../2010/05/04/revolution-analytics-confused/">I spoke with Revolution Analytics at the time of its relaunch last year</a>. But a conversation Thursday evening was much clearer. And I even learned some cool stuff about general predictive modeling trends (see the bottom of this post).</p>
<p>Revolution Analytics business and business model highlights include:</p>
<ul>
<li><strong>Revolution Analytics is an open-core vendor built around the R language.</strong> That is, Revolution Analytics offers proprietary code and support, with subscription pricing, that help in the use of open source software.</li>
<li>Unlike most open-core vendors I can think of, <strong>Revolution Analytics takes little responsibility for the actual open source part.</strong> Some &#8220;grants&#8221; for developing certain open source R pieces seem to be the main exception. While this has caused some hard feelings, I don&#8217;t have an accurate sense of their scope or severity.</li>
<li>Revolution Analytics also sells a single-user/workstation version of its product, freely admitting that this is mainly a lead generation strategy or, in my lingo, a &#8220;break-even leader.&#8221;</li>
<li>Revolution Analytics boasts <strong>around 100 customers, split about 70-30</strong> between the workstation seeding stuff and the real server product.</li>
<li>Revolution Analytics has &#8220;about&#8221; 37 employees. Headquarters are at 101 University Avenue (do I have to say in what city? <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  ). There are also a development office in Seattle and a sales office in New York.</li>
<li>Revolution Analytics&#8217; pricing is by size of server. &#8220;Small&#8221; servers &#8212; i.e., up to 12 cores &#8212; start at $25K/year.</li>
<li>Unsurprisingly, adoption is more alongside SAS et al. than rip-and-replace.</li>
</ul>
<p><span id="more-4224"></span>Revolution Analytics&#8217; top market sector by far appears to be financial services, both in trading/investment banks/hedge funds and in credit cards/risk analysis. Pharma/life sciences is second, but sales cycles are slow. There&#8217;s also been at least a little activity each in a variety of internet/media/entertainment/gaming/telecom sectors.</p>
<p>When I asked Revolution Analytics why one would use R rather than, say, SAS, Revolution cited three reasons that seemed to be driving customer interest:</p>
<ul>
<li><strong>You can do more with R. </strong>That may be debatable, but what&#8217;s harder to dispute is that there are a bunch of things you can do straightforwardly in R and its thousands of routines that would at best be more difficult in SAS.</li>
<li>Students today are learning R, so you have access to (affordable?) <strong>talent</strong>. That&#8217;s pretty clearly correct, although I do note SPSS&#8217; long history of use in the academic social sciences.</li>
<li>R is <strong>cheaper.</strong> It&#8217;s hard to argue with that one. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ul>
<p>Revolution Analytics&#8217; parallelized-R story starts something like this:</p>
<ul>
<li>Although R is generally thought of as requiring all data to be in RAM, Revolution also offers external memory algorithms. (&#8220;External memory algorithms&#8221; seems to be the discipline-standard way of saying &#8220;Not all data has to be in RAM.&#8221;)</li>
<li>In principle, Revolution is willing to parallelize external memory algorithms for you any which way &#8212; MapReduce, MPI (Message Passing Interface), and more.</li>
<li>Revolution parallelized for multi-core last fall. Multi-server scale-out is coming this summer.</li>
<li>Revolution is working on Netezza support. Revolution expects to use nzMatrix in the effort.</li>
<li>Yes, <a href="../../../../../2011/04/06/so-can-logistic-regression-be-parallelized-or-not/">logistic regression</a> is one of the algorithms Revolution parallelizes.</li>
</ul>
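<p>Revolution&#8217;s own external memory algorithms aren&#8217;t described here, so the following is only a generic Python sketch of the idea: data is read one chunk at a time and only a few running totals stay in memory. It uses Welford&#8217;s online algorithm for mean and variance; <code>io.StringIO</code> stands in for a file too large for RAM, and the helper names are hypothetical.</p>

```python
import io


def iter_chunks(lines, chunk_size):
    """Yield lists of at most chunk_size floats from any line iterator (e.g. an open file)."""
    chunk = []
    for line in lines:
        chunk.append(float(line))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def streaming_mean_var(chunks):
    """Welford's online algorithm: one pass, constant state, however large the input."""
    n, mean, m2 = 0, 0.0, 0.0
    for chunk in chunks:
        for x in chunk:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else float("nan")  # sample variance
    return n, mean, variance


# io.StringIO stands in for a file too large to load into RAM.
data = io.StringIO("2\n4\n4\n4\n5\n5\n7\n9\n")
n, mean, variance = streaming_mean_var(iter_chunks(data, chunk_size=3))
print(n, mean, variance)
```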
<p>Like Netezza with nzMatrix or Greenplum (now EMC) with its sparse vector routine, Revolution has some useful underpinnings to help with parallelization/scale-out as well. The main one seems to be a variance/covariance matrix, which can be arbitrarily large and can be computed in a very distributed way. Revolution notes that you can use this not just on data but also, for example, on parameters.</p>
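<p>A hedged Python sketch of why a variance/covariance matrix distributes so well: each node reduces its own rows to a count, column sums, and cross-product sums; those summaries add together in any order; and the covariance falls out at the end. This is a generic technique, not Revolution&#8217;s implementation, and the three &#8220;nodes&#8221; are just hypothetical in-process lists.</p>

```python
from functools import reduce


def partial_stats(rows):
    """One node's summary of its rows: count, per-column sums, and cross-product sums."""
    d = len(rows[0])
    sums = [sum(r[i] for r in rows) for i in range(d)]
    cross = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    return len(rows), sums, cross


def merge(a, b):
    """Combine two summaries; the order and grouping of merges do not matter."""
    (n1, s1, c1), (n2, s2, c2) = a, b
    return (
        n1 + n2,
        [x + y for x, y in zip(s1, s2)],
        [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(c1, c2)],
    )


def covariance(stats):
    """Sample covariance matrix from a (possibly merged) summary."""
    n, s, c = stats
    d = len(s)
    return [[(c[i][j] - s[i] * s[j] / n) / (n - 1) for j in range(d)] for i in range(d)]


# Hypothetical data split across three nodes; column 1 is exactly 2 * column 0.
nodes = [[[1.0, 2.0], [2.0, 4.0]], [[3.0, 6.0]], [[4.0, 8.0]]]
merged = reduce(merge, (partial_stats(rows) for rows in nodes))
print(covariance(merged))
```

<p>The raw sums-of-products formula is the simplest to merge but can lose precision when values are large relative to their spread; production code often merges centered statistics instead.</p>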
<p>One analytic approach &#8212; if not meta-approach &#8212; that Revolution sees as hot is <a href="http://en.wikipedia.org/wiki/Ensemble_learning">ensemble learning</a>. Specifically mentioned was <a href="http://cran.r-project.org/web/packages/caret/vignettes/caretTrain.pdf">Max Kuhn&#8217;s caret package</a>, which evidently automates ensemble techniques. Also specifically mentioned was the Netflix Prize, which I gather was won by an ensemble approach. The idea behind ensemble techniques is that, rather than pick a particular kind of model, you throw a bunch against the wall. The first benefit is that you get to see what works best. The second benefit is that you can combine results and hopefully outperform any one of the models.</p>
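<p>Here is a toy Python illustration of both benefits, using plain averaging, one of the simplest ways to combine models. The &#8220;models&#8221; are just hypothetical prediction lists; nothing here is meant to represent caret or the Netflix Prize entries.</p>

```python
def mse(preds, actual):
    """Mean squared error."""
    return sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(actual)


def average_ensemble(*prediction_lists):
    """Combine models by averaging their predictions point by point."""
    return [sum(ps) / len(ps) for ps in zip(*prediction_lists)]


actual = [3.0, 5.0, 7.0, 9.0]
# Hypothetical predictions from three different kinds of model.
models = {
    "a": [2.5, 5.5, 6.0, 9.5],
    "b": [3.5, 4.0, 7.5, 8.0],
    "c": [3.0, 6.0, 7.0, 10.0],
}

scores = {name: mse(p, actual) for name, p in models.items()}
best = min(scores, key=scores.get)             # benefit 1: see what works best
ensemble = average_ensemble(*models.values())  # benefit 2: combine the results
print(scores, best, mse(ensemble, actual))
```

<p>For squared error, the averaged prediction can never do worse than the individual models do on average (squared error is convex), although it is not guaranteed to beat the single best model.</p>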
<p>Obviously, ensemble techniques can require vastly more performance than just running a single model. I wouldn&#8217;t be surprised if, going forward, they turned out to be one of analytics&#8217; biggest performance challenges.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/04/08/revolution-analytics-update/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>So can logistic regression be parallelized or not?</title>
		<link>http://www.dbms2.com/2011/04/06/so-can-logistic-regression-be-parallelized-or-not/</link>
		<comments>http://www.dbms2.com/2011/04/06/so-can-logistic-regression-be-parallelized-or-not/#comments</comments>
		<pubDate>Wed, 06 Apr 2011 10:04:33 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[SAS Institute]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4177</guid>
		<description><![CDATA[A core point in SAS&#8217; pitch for its new MPI (Message-Passing Interface) in-memory technology seems to be that logistic regression is really important, and shared-nothing MPP doesn&#8217;t let you parallelize it. The Mahout/Hadoop folks also seem to despair of parallelizing logistic regression. On the other hand, Aster Data said it had parallelized logistic regression a year [...]]]></description>
			<content:encoded><![CDATA[<p>A core point in SAS&#8217; pitch for its new MPI (Message-Passing Interface) in-memory technology seems to be that logistic regression is really important, and shared-nothing MPP doesn&#8217;t let you parallelize it. The Mahout/Hadoop folks also seem to <a href="https://cwiki.apache.org/MAHOUT/logistic-regression.html">despair of parallelizing logistic regression</a>.</p>
<p>On the other hand, <a href="http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/">Aster Data said it had parallelized logistic regression a year ago</a>. (Slides 6-7 from <a href="http://www.dbms2.com/2010/06/27/lots-of-aster-data-analytic-packages/">a mid-2010 Aster deck</a> may be clearer.) I&#8217;m guessing <a href="http://www.fuzzyl.com/in-database_analytics.php">Fuzzy Logix</a> might make a similar claim, although I&#8217;m not really sure.</p>
<p>What gives?</p>
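<p>One common way to see how the answer can be &#8220;yes&#8221;: the gradient of the logistic log-loss is a sum over rows, so each shared-nothing node can compute its share and a coordinator only has to add a few numbers per iteration. The catch is that the iterations themselves are sequential, so every step costs a round of communication. The Python sketch below uses hypothetical data and plain batch gradient descent rather than the iteratively reweighted least squares used by, for example, R&#8217;s <code>glm</code> (which decomposes over rows the same way); it is not a claim about how Aster, SAS, or Mahout actually do it.</p>

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def partial_gradient(rows, w):
    """Log-loss gradient summed over one node's rows; each row is (features, label)."""
    g = [0.0] * len(w)
    for x, y in rows:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for i, xi in enumerate(x):
            g[i] += err * xi
    return g


def fit(partitions, steps=2000, lr=0.5):
    """Batch gradient descent in which every iteration is one map plus one reduce."""
    n = sum(len(p) for p in partitions)
    w = [0.0] * len(partitions[0][0][0])
    for _ in range(steps):
        partials = [partial_gradient(p, w) for p in partitions]  # map: node-local work
        total = [sum(col) for col in zip(*partials)]              # reduce: an MPI allreduce across real nodes
        w = [wi - lr * gi / n for wi, gi in zip(w, total)]
    return w


# Hypothetical two-node data; features are [intercept, x].
node1 = [([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, 1.0], 1)]
node2 = [([1.0, 2.0], 1), ([1.0, -0.5], 1), ([1.0, 0.5], 0)]
w = fit([node1, node2])
print(w, sigmoid(w[0] + w[1] * 2.0))
```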
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/04/06/so-can-logistic-regression-be-parallelized-or-not/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Comments on EMC Greenplum</title>
		<link>http://www.dbms2.com/2011/04/05/comments-on-emc-greenplum/</link>
		<comments>http://www.dbms2.com/2011/04/05/comments-on-emc-greenplum/#comments</comments>
		<pubDate>Wed, 06 Apr 2011 00:57:28 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[SAS Institute]]></category>
		<category><![CDATA[Solid-state memory]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=4163</guid>
		<description><![CDATA[I am annoyed with my former friends at Greenplum, who took umbrage at a brief sentence I wrote in October, namely &#8220;eBay has thrown out Greenplum&#8221;. Their reaction included: EMC Greenplum no longer uses my services. EMC Greenplum no longer briefs me. EMC Greenplum reneged on a commitment to fund an effort in the area [...]]]></description>
			<content:encoded><![CDATA[<p>I am annoyed with my former friends at Greenplum, who took umbrage at a brief sentence I wrote in October, namely &#8220;<a href="../../../../../2010/10/06/ebay-followup-greenplum-out-teradata-10-petabytes-hadoop-has-some-value-and-more/">eBay has thrown out Greenplum</a>&#8221;. Their reaction included:</p>
<ul>
<li>EMC Greenplum no longer uses my services.</li>
<li>EMC Greenplum no longer briefs me.</li>
<li>EMC Greenplum reneged on a commitment to fund an effort in the area of privacy.</li>
</ul>
<p>The last one really hurt, because in trusting them, I put in quite a bit of effort, and discussed their promise with quite a few other people.</p>
<p><em><span id="more-4163"></span>Yes, that five-word sentence really seems to have been the problem. I&#8217;ve heard that from more than one source.</em></p>
<p>I think the rest is overwrought too, and not just because I regret the loss of revenue, or of what seemed to be a warm, friendly, hug-laden, and sushi-intensive relationship with Scott Yara and some other folks. At various times, on the subject of its eBay installation:</p>
<ul>
<li>Greenplum overoptimistically told me that eBay&#8217;s Teradata installation would be replaced with Greenplum gear.</li>
<li>Greenplum exaggerated the pace of its eBay installation; unfortunately, I believed them, and later had to publish a <a href="../../../../../2009/03/02/named-customer-silliness/">retraction</a>.</li>
<li>Greenplum neglected to tell me when eBay had its Greenplum equipment removed.</li>
</ul>
<p>Now the same Scott Yara who hovered over me for months in marketing micromanagement before I broke the news of <a href="../../../../../2009/04/30/ebays-two-enormous-data-warehouses/">the Greenplum and Teradata eBay installations</a> &#8212; he could do that because the whole discussion started out under NDA &#8212; doesn&#8217;t answer my email. Evidently, Greenplum thinks it&#8217;s OK to repeatedly be misleading, but doesn&#8217;t think it&#8217;s OK if my nuance is one they disagree with.</p>
<p><em>The most entertaining example I recall of Greenplum BS was when CTO Luke Lonergan told 50+ academics at the 2009 XLDB that Greenplum had 10 customers with half a petabyte each of data. I followed him out of the room and said &#8220;10 customers &#8212; half a petabyte each &#8212; I presume that&#8217;s for sufficiently small values of &#8216;one half&#8217;?&#8221; We eventually settled on a value of &#8220;one half&#8221; in the 0.2 range &#8212; which is actually a pretty impressive claim in itself.</em></p>
<p>Be all that as it may, EMC Greenplum has a couple of press releases out on which I&#8217;ve been asked to comment. One is a deal with <a href="http://www.greenplum.com/news/345/388/SAS-to-offer-high-performance-analytics-on-EMC-Greenplum-database-appliance/d,press-releases/">SAS</a>, less impressive than SAS&#8217; deals with Teradata and Aster Data in that it offers no actual in-database modeling. Yes, it sounds like modeling on the same nodes where the data sits, but it sounds less desirable than true in-database modeling in that:</p>
<ul>
<li>You can only get great performance if the amount of data modeled is small enough to fit into RAM.</li>
<li>Integration with other database processing, MapReduce, etc. may be limited.</li>
</ul>
<p>Also, <a href="http://www.greenplum.com/news/346/388/EMC-Expands-Greenplum-Big-Data-Analytics-Appliance-Family/d,press-releases/">EMC Greenplum expanded its line of appliances</a>, to include one that seems optimized for price-per-terabyte and one with solid-state drives. So far, that&#8217;s very standard stuff. There&#8217;s also a new data loading appliance, which seems to catch up with Aster Data&#8217;s 2008 strategy of having <a href="../../../../../2008/09/05/mpp-data-warehouse-nodes/">separate nodes for bulk loading</a>.</p>
<p><em>Ironically, when <a href="../../../../../2010/10/10/partnering-with-cloudera/">Aster moved away from a total reliance on that strategy,</a> it was becoming more Greenplum-like. As is so often the case, it seems that different vendors&#8217; feature sets are converging.</em></p>
<p>Meanwhile, the last I heard about Greenplum&#8217;s previously very strategic <a href="../../../../../2010/04/12/greenplumchorus/">Chorus</a> effort is that it&#8217;s being revamped. I don&#8217;t get the impression it&#8217;s nearly as central to Greenplum&#8217;s strategy as it used to be.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2011/04/05/comments-on-emc-greenplum/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Notes and links October 22, 2010</title>
		<link>http://www.dbms2.com/2010/10/22/notes-and-links-october-22-2010/</link>
		<comments>http://www.dbms2.com/2010/10/22/notes-and-links-october-22-2010/#comments</comments>
		<pubDate>Fri, 22 Oct 2010 06:47:05 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Liberty and privacy]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[SAS Institute]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>
		<category><![CDATA[eBay]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3346</guid>
		<description><![CDATA[A number of recent posts have had good comments. This time, I won&#8217;t call them out individually. Evidently Mike Olson of Cloudera is still telling the machine-generated data story, exactly as he should be. The Information Arbitrage/IA Ventures folks said something similar, focusing specifically on &#8220;sensor data&#8221; &#8230; &#8230; and, even better, went on to [...]]]></description>
			<content:encoded><![CDATA[<p>A number of recent posts have had good comments. This time, I won&#8217;t call them out individually.</p>
<p>Evidently <a href="http://www.cscyphers.com/blog/2010/10/12/hadoop-world-2010/">Mike Olson of Cloudera is still telling the machine-generated data story</a>, exactly as he should be. The <a href="http://informationarbitrage.com/post/1359525958/big-ideas-around-big-problems-in-big-data">Information Arbitrage/IA Ventures</a> folks said something similar, focusing specifically on &#8220;sensor data&#8221; &#8230;</p>
<p>&#8230; and, even better, went on to say:  <span id="more-3346"></span></p>
<blockquote><p><strong>Privacy is dead</strong>.<br />
What do we consider to be the boundaries of privacy, especially with respect to items like medical data? In a data privacy-free world, should we be regulating data usage instead? How do we deal with asymmetric access to our personal data, e.g., how is it that insurance companies claim the right to our personal information?</p></blockquote>
<p>Obviously, <a href="http://www.dbms2.com/2010/04/04/privacy-liberty-continued/">my answer to the second question is Yes!!!!</a></p>
<p>Also from Hadoop World &#8212; Dave Menninger, now an analyst, reports on <a href="http://www.ventanaresearch.com/blog/commentblog.aspx?id=4003">some Hadoop metrics</a>:</p>
<blockquote><p><span id="Contentblock1"><span>How big is “big data”? In his opening remarks, Mike shared some statistics from a survey of attendees. The average Hadoop cluster among respondents was 66 nodes and 114 terabytes of data. However there is quite a range. The largest in the survey responses was a cluster of 1,300 nodes and more than 2 petabytes of data. (Presenters from eBay blew this away, describing their production cluster of 8,500 nodes and 16 petabytes of storage.) Over 60 percent of respondents had 10 terabytes or less, and half were running 10 nodes or less.</span></span></p></blockquote>
<p><a href="http://www.dbms2.com/2010/10/06/ebay-followup-greenplum-out-teradata-10-petabytes-hadoop-has-some-value-and-more/">That eBay comment was particularly interesting</a>. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>A while back, Doug Henschen noted that Netezza flagship reference Catalina Marketing is now at <a href="http://intelligent-enterprise.informationweek.com/blog/archives/2010/07/big_data_the_ea.html#more">2.5 petabytes</a>. Most of that is in one 600-billion-row table. Oddly, the article talks of the Netezza/SAS partnership accelerating model-building via in-database scoring (not modeling) technology. Doug also wrote of a lot of <a href="http://intelligent-enterprise.informationweek.com/blog/archives/2010/08/whats_at_stake.html#more">analytic DBMS replacements</a>, including:</p>
<ul>
<li>Microsoft by ParAccel</li>
<li>Oracle by Aster Data, IBM, Oracle Exadata, probably Netezza, and probably Hadoop</li>
<li>Netezza by Greenplum</li>
<li>IBM by Teradata</li>
</ul>
<p>Carl Olofson pointed out on Twitter that <a href="http://www.oracle.com/us/corporate/Acquisitions/datascaler/index.html">DataScaler was an in-memory database technology just bought by Oracle</a>. This inspired me to google on them, and I found a sparse <a href="http://www.svadventure.com/">DataScaler CEO blog</a>. I link it because of an amusing juxtaposition &#8212; the second-to-last post says, in effect, &#8220;We make appliances and we recommend all these awesome technology design partners who helped us design the hardware,&#8221; while the very last post says &#8220;Designing our own hardware was a mistake.&#8221; <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><a href="http://www.dbms2.com/2010/07/23/some-interesting-links/">Fred Holahan</a> is now VP of Marketing at <a href="http://www.dbms2.com/2010/05/25/voltdb-finally-launches/">VoltDB</a>, which is a lesson to me about giving free consulting &#8230; Anyhow, Fred tells me that VoltDB has about a dozen users on their way to production, some of whom are headed toward becoming paying VoltDB customers, some of whom are not.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/10/22/notes-and-links-october-22-2010/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>It can be hard to analyze analytics</title>
		<link>http://www.dbms2.com/2010/10/10/it-can-be-hard-to-analyze-analytics/</link>
		<comments>http://www.dbms2.com/2010/10/10/it-can-be-hard-to-analyze-analytics/#comments</comments>
		<pubDate>Sun, 10 Oct 2010 10:31:26 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[SAS Institute]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3135</guid>
		<description><![CDATA[When vendors talk about the integration of advanced analytics into database technology, confusion tends to ensue. For example: Aster Data is generally an exception to this rule, as it should be, since that integration is at the core of its positioning. Even so, in the last paragraph of that link, I called Aster out for [...]]]></description>
			<content:encoded><![CDATA[<p>When vendors talk about the integration of advanced analytics into database technology, confusion tends to ensue. For example: <span id="more-3135"></span></p>
<ul>
<li><a href="http://www.dbms2.com/2010/06/27/lots-of-aster-data-analytic-packages/">Aster Data is generally an exception to this rule</a>, as it should be, since that integration is at the core of its positioning. Even so, in the last paragraph of that link, I called Aster out for what at the time was some product-description nonsense, specifically in an area that many vendors explain confusingly, namely &#8230;</li>
<li>&#8230; the distinction between <strong>three kinds of parallelization.</strong>
<ul>
<li>If you do something entirely in SQL on an MPP system that parallelizes SQL &#8212; then it&#8217;s parallel!</li>
<li>If you have a parallelization framework such as SQL or MapReduce that can invoke the same function on every node &#8212; well, then that&#8217;s parallel!</li>
<li>Many algorithms &#8212; including almost every important statistical one &#8212; have to be explicitly coded to be parallel if they&#8217;re actually going to run in parallel. The <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.71.4156&amp;rep=rep1&amp;type=pdf">seminal paper on parallel data mining</a> shows that such parallelization is, in many important cases, straightforward &#8212; but somebody still has to take the trouble to actually do it.</li>
</ul>
</li>
<li>Netezza TwinFin i-Class was <a href="http://www.dbms2.com/2010/06/21/netezza-database-software-technology-overview/">renamed/repackaged/repriced</a> before it ever shipped. Even so, when Tim Young or Phil Francisco tries to recall exactly what the &#8220;i&#8221; stands for, comedy ensues. And the post I promised to write about Netezza TwinFin i-Class in June (as per the last sentence of <a href="http://www.dbms2.com/2010/06/21/notes-on-a-spate-of-netezza-related-blog-posts/">this post</a>) hasn&#8217;t happened yet, for reasons other than lack of interest on my part.</li>
<li><a href="http://www.dbms2.com/2010/05/07/in-database-sas-teradata-netezza-aster/">SAS/DBMS integration</a> tends to be a multi-year process, with in-database scoring coming long before <strong>in-database modeling.</strong> The drip-drip-drip of big-company PR over that time period can be quite bewildering &#8230;</li>
<li>&#8230; especially since SAS partners in some cases are shipping home-grown in-database modeling long before SAS gives it to them.
<ul>
<li>See Slides 7-8 of a recent <a href="http://www.dbms2.com/2010/06/27/lots-of-aster-data-analytic-packages/">Aster Data slide deck</a>.</li>
<li>The main point of Netezza&#8217;s nzMatrix is that Netezza needed it for <a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/">in-database modeling algorithms</a> it was building.</li>
</ul>
</li>
<li>After backing off from its early <a href="http://www.dbms2.com/2008/08/26/why-mapreduce-matters-to-sql-data-warehousing/">endorsement of MapReduce</a>, <a href="http://www.dbms2.com/2010/10/10/emc-greenplum-notes/">Greenplum</a> pretty much went to the other extreme and didn&#8217;t talk about its advanced analytics capabilities at all.</li>
</ul>
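<p>To make the third kind of parallelization a bit more concrete, here is a minimal Python sketch, not drawn from any vendor&#8217;s product, of one common way a statistical algorithm gets explicitly parallelized: the &#8220;summation form,&#8221; in which each node computes partial sums over its own rows and a coordinator adds them up. Simple least-squares regression (y = a + b*x) is used as the assumed example; the function names and the toy data are hypothetical. It also hints at why in-database scoring tends to arrive before in-database modeling: scoring is just a per-row function, while model fitting needs the explicit local/global split.</p>

```python
# Hypothetical sketch: fitting y = a + b*x in "summation form".
# Each partition (think: one MPP node's slice of a table) computes
# partial sums locally; only five numbers per node go to a coordinator.


def partial_sums(rows):
    """Local step, run independently on each node's (x, y) rows."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(parts):
    """Coordinator step: partial sums from all nodes simply add up."""
    return tuple(sum(values) for values in zip(*parts))


def fit(totals):
    """Solve the least-squares normal equations from the global sums."""
    n, sx, sy, sxx, sxy = totals
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def score(model, x):
    """Scoring: applying a fitted model needs no cross-row information."""
    intercept, slope = model
    return intercept + slope * x


# Three hypothetical "nodes", each holding a slice of data with y = 1 + 2x.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(4.0, 9.0)],
]
model = fit(combine([partial_sums(p) for p in partitions]))
print(model)  # (1.0, 2.0)
print([score(model, x) for x in (10.0, 20.0)])  # [21.0, 41.0]
```

<p>Because the partial sums are additive, the coordinator gets the same answer however the rows happen to be distributed across nodes. Algorithms that need many passes over the data (iterative optimization, for instance) have to repeat this exchange on every pass, which is part of why somebody has to take the trouble to code the parallelism in.</p>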
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/10/10/it-can-be-hard-to-analyze-analytics/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Further thoughts on previous posts</title>
		<link>http://www.dbms2.com/2010/09/27/further-thoughts-on-previous-posts/</link>
		<comments>http://www.dbms2.com/2010/09/27/further-thoughts-on-previous-posts/#comments</comments>
		<pubDate>Mon, 27 Sep 2010 11:28:50 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[About this blog]]></category>
		<category><![CDATA[Calpont]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[SAS Institute]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=3063</guid>
		<description><![CDATA[One thing I love about DBMS 2 is the really smart comments a number of readers &#8212; that would be you guys &#8212; make. However, not all the smart comments are made in the first 5 minutes a post is up, so some readers (unless you circle back) might miss great points other readers make. [...]]]></description>
			<content:encoded><![CDATA[<p>One thing I love about <em><a href="http://www.dbms2.com">DBMS 2</a></em> is the really smart comments a number of readers &#8212; that would be you guys &#8212; make. However, not all the smart comments are made in the first 5 minutes a post is up, so some readers (unless you circle back) might miss great points other readers make. Well, here are some pointers to some of what you might have missed, along with other follow-up comments to old posts while I&#8217;m at it.<span id="more-3063"></span></p>
<ul>
<li>Both on this blog and <a href="http://www.reddit.com/r/programming/comments/dfb7z/details_of_the_jpmorgan_chase_oracle_database/">Reddit</a>, there&#8217;s been considerable pushback against my idea that web-usage-style user profile data shouldn&#8217;t be cluttering up an ACID-compliant database. But there&#8217;s also been considerable support, e.g. from <a href="http://www.dbms2.com/2010/09/24/a-little-more-on-the-jpmorgan-chase-oracle-outage/#comment-185219">Dan Weinreb</a>, who knows quite a lot about huge OLTP systems.</li>
<li>Meanwhile, RJP supplied <a href="http://www.dbms2.com/2010/09/17/jp-morgan-chase-oracle-database-outage/#comment-184381">details about the JP Morgan Chase Oracle outage that my actual source didn&#8217;t know</a>.</li>
<li>For obvious reasons, IBM wasn&#8217;t in a position to go into much <a href="http://www.dbms2.com/2010/09/20/ibm-netezza-acquisition/">IBM/Netezza</a> detail when we happened to chat after the merger announcement. But they did want to set me straight on SAS being kicked out for SPSS, pointing out that SAS runs in the DB2 database today (scoring, not modeling).</li>
<li>Product marketer Stephanie McReynolds added on to my post about <a href="http://www.dbms2.com/2010/09/15/aster-data-ncluster-version-4-6/">Aster Data nCluster 4.6</a> in exactly the way I wish all vendors would. She added information I had been unsure about when I did the post &#8212; or had simply left out &#8212; and she was fast in doing so. I encourage all vendors I write about to follow her example.</li>
<li>The comments on my post about <a href="http://www.dbms2.com/2010/08/21/the-substance-of-pentahos-hadoop-strategy/">Pentaho&#8217;s ETL-for-HDFS</a> made the product sound more appealing than the post itself did.</li>
<li>My August 18 <a href="http://www.dbms2.com/2010/08/18/nosql-hvsp-adoption/">NoSQL</a> post was tailor-made for people to add pitches for their own favorite products, NoSQL-oriented websites, etc. A number of interesting additions of that kind showed up accordingly.</li>
<li>There were many thoughtful responses to my question about <a href="http://www.dbms2.com/2010/07/29/how-should-somebody-teach-themselves-programming-skills/">how somebody should teach themselves database programming skills</a>. Indeed, whole other blog posts were written and linked back. That&#8217;s a great resource if a friend or acquaintance ever asks you that question.</li>
<li>The flame war that erupted in response to <a href="http://www.dbms2.com/2010/07/30/advice-for-some-non-clients/">my comments on vendor and analyst ethics</a> spawned a number of <a href="http://www.strategicmessaging.com/further-notes-on-ethics-and-analyst-research/2010/08/02/">more productive discussions</a> elsewhere.</li>
<li>Jeff Hammerbacher has made <a href="http://www.dbms2.com/2010/08/11/big-data-is-watching-you/#comment-180256">various</a> <a href="http://www.dbms2.com/2010/07/31/nested-data-structures-keep-coming-up-especially-for-log-files/#comment-178699">comments</a> to the effect &#8220;Yes indeedy! Hadoop does that too!&#8221; (My wording, not his. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> )</li>
<li>Alin Dobra reported on some tests suggesting <a href="http://www.dbms2.com/2010/07/07/analytic-database-storage-aware/#comment-176038">sequential reads remain far faster than random reads even on Flash SSDs</a>.</li>
<li><a href="http://www.dbms2.com/2009/11/07/calponts-infinidb/">Calpont</a> has an ever-slicker website and <a href="http://www.calpont.com/about/news">yet another new marketing VP</a>, but no customers that are easy to detect.</li>
<li>My <a href="http://www.dbms2.com/2010/07/04/fair-data-use/">July 4 privacy post</a> engendered thoughtful discussion from three of the smartest guys who comment here &#8212; Chris Bird, Michael McIntire, and Dan Weinreb.</li>
<li>IBM and Netezza both added crunchy details to my post about their <a href="http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/">data compression strategies</a>.</li>
<li>And for those of you who don&#8217;t read my other blogs &#8212; last night&#8217;s post was <a href="http://www.texttechnologies.com/2010/09/26/how-to-preserve-investigative-reporting-in-the-new-media-era/">a long and optimistic rumination on the future of investigative reporting</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/09/27/further-thoughts-on-previous-posts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

