<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Aster Data</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/aster-data-warehouse/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 02 Sep 2010 09:06:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Big Data is Watching You!</title>
		<link>http://www.dbms2.com/2010/08/11/big-data-is-watching-you/</link>
		<comments>http://www.dbms2.com/2010/08/11/big-data-is-watching-you/#comments</comments>
		<pubDate>Wed, 11 Aug 2010 05:30:22 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2760</guid>
		<description><![CDATA[There&#8217;s a boom in large-scale analytics. The subjects of this analysis may be categorized as:

People
Financial trades
Electronic networks
Everything else

The most varied, interesting, and valuable of those four categories is the first one.

That may change some day, with the growing importance of machine-generated data, and of big-data science in particular. But I think it&#8217;s a fair assessment [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">There&#8217;s a boom in large-scale analytics. The subjects of this analysis may be categorized as:</p>
<ul>
<li>People</li>
<li>Financial trades</li>
<li>Electronic networks</li>
<li>Everything else</li>
</ul>
<p style="margin-bottom: 0in;">The most varied, interesting, and valuable of those four categories is the first one.</p>
<p><span id="more-2760"></span></p>
<p style="margin-bottom: 0in;"><em>That may change some day, with the growing importance of <a href="http://www.dbms2.com/2010/04/08/machine-generated-data-example/">machine-generated data</a>, and of <a href="http://www.dbms2.com/2009/10/03/issues-in-scientific-data-management/">big-data science</a> in particular. But I think it&#8217;s a fair assessment at present, and for at least the next few years.</em></p>
<p style="margin-bottom: 0in;">Some of the most interesting use cases are concentrated in the areas of identifying individuals, groups of people, or behaviors of (groups of) people. For example:</p>
<ul>
<li>comScore works hard to <strong>identify individual web surfers</strong> – i.e., to <strong>deanonymize</strong> them – even though they may have given incomplete or false personal information.</li>
<li>Other companies at least try to figure out <strong>which information in a user&#8217;s profile is unreliable,</strong> so as to classify them better. (Yes, there are 62-year-old video-game-obsessed Lady Gaga fans, but that&#8217;s generally not the way to bet.)</li>
<li>Multiple telecom vendors try to identify who their <strong>most influential customers</strong> are (to a first approximation, they&#8217;re the ones most often called by the most people, but it surely gets more sophisticated than that). This information is then used to reduce churn, either by working hard to retain those users or – if they do churn – by moving very fast to retain the business from their friends.</li>
<li>Other kinds of companies do similar kinds of analysis, to the extent that they have enough of a social graph to do so. (This application is a case where the term “<a href="http://www.dbms2.com/2010/06/08/profile-of-revealed-preferences/">social graph</a>” is not a misnomer.)</li>
<li><strong>Turing detectives</strong> (I just coined that phrase) try to determine whether users are humans or bots.</li>
<li>Central to detecting <strong>insurance fraud</strong> is identifying suspiciously close connections between claimants, service providers, and so on.</li>
<li>Identifying groups of people is also important in flagging <strong>insider trading.</strong> Even more important are other kinds of analysis, along the lines of “is this normal innocent trading behavior?”</li>
<li>Intelligence agencies try to detect networks of <strong>terrorists</strong> and their sympathizers. They further try to identify unusual patterns of communication or meetings along those networks that might indicate terrorist acts are being planned. (Civilian law enforcement agencies can use similar techniques.)</li>
</ul>
<p style="margin-bottom: 0in; font-weight: normal;">In most cases, the analysis and/or run-time execution of the relevant models is done with the help of analytic DBMS. Other technologies that come into play include non-DBMS MapReduce (Hadoop), graph engines, and CEP (Complex Event Processing). The vendor most heavily represented in the examples above is probably Aster Data, because:</p>
<ul>
<li>Aster Data is focused on hard-core analytics.</li>
<li>I talk a lot with Aster Data, and in particular had a long, detailed use-cases discussion with them last week.</li>
<li>The comScore example happens to come from a speaker at <a href="http://www.dbms2.com/2010/05/07/implications-onew-analytic-technology/">an Aster event</a> I also participated in.</li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-weight: normal;">And by the way, all this only scratches the surface of what will be possible down the road. It&#8217;s based mainly on where you live, what you purchase, how you behave on websites, and who you communicate with. </span><span style="color: #000080;"><span lang="zxx"><span style="text-decoration: underline;"><a href="../2010/07/04/fair-data-use/"><span style="font-weight: normal;">Other kinds of data, which could be used to be yet more intrusive</span></a></span></span></span><span style="font-weight: normal;">, generally aren&#8217;t involved.</span></p>
<p style="margin-bottom: 0in;"><span style="font-weight: normal;">I actually have two points in drawing up this list. One is golly-gee-whiz about how a lot of analytically sophisticated applications are actually getting into production. The other is to highlight the privacy and liberty threats If This Goes On Unchecked (which is why I didn&#8217;t include some other less-people-focused examples). There&#8217;s also a related danger that, to the extent we don&#8217;t get some smart regulations to keep us safe(r), we&#8217;ll get a bunch of stupid regulations instead. </span></p>
<p style="margin-bottom: 0in;"><span style="font-weight: normal;">The Analytic Era has only just begun.<br />
</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/11/big-data-is-watching-you/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Links and observations</title>
		<link>http://www.dbms2.com/2010/08/09/links-and-observations/</link>
		<comments>http://www.dbms2.com/2010/08/09/links-and-observations/#comments</comments>
		<pubDate>Tue, 10 Aug 2010 02:37:51 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Calpont]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Northscale]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[XtremeData]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2743</guid>
		<description><![CDATA[I&#8217;m back from a trip to the SF Bay area, with a lot of writing ahead of me. I&#8217;ll dive in with some quick comments here, then write at greater length about some of these points when I can. From my trip:  

Aster Data showed me a lot of customer names and deal sizes, across [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m back from a trip to the SF Bay area, with a lot of writing ahead of me. I&#8217;ll dive in with some quick comments here, then write at greater length about some of these points when I can. From my trip:  <span id="more-2743"></span></p>
<ul>
<li>Aster Data showed me a lot of customer names and deal sizes, across a bunch of industries (mainly enterprise rather than web). Yes, Aster&#8217;s market success is for real. (But almost all those details are NDA.)</li>
<li>Sybase&#8217;s product plans for IQ are pretty impressive. (But the most interesting parts are, you guessed it, NDA.)</li>
<li>I&#8217;ve kissed and made up* with ParAccel, now that they&#8217;ve replaced their CEO, replaced their marketing chief, and stopped the worst of the <a href="http://www.dbms2.com/2010/01/15/there-sure-seem-to-be-a-lot-of-inaccuracies-on-paraccels-website/" >marketing</a> <a href="http://www.dbms2.com/2009/06/22/the-tpc-h-benchmark-is-a-blight-upon-the-industry/" >nonsense</a> I used to complain about. ParAccel has some interesting plans for ParAccel 3.0 which are, naturally, NDA.</li>
<li>The Peoplesoft guys are doing it over again at Workday. Only this time, their platform isn&#8217;t a relational DBMS. Rather, it&#8217;s an in-memory, completely object-oriented data model, with disk used only on a &#8220;Just in case the power ever goes out&#8221; basis. (Thankfully, nothing at all about our conversation was NDA.)</li>
<li>I&#8217;m finally feeling good about Northscale&#8217;s memcached-compatible persistent store Membase. The main reason is that they showed me a near-term path to interfaces that are richer than key-value. Also, Todd Hoff reassured me that even pure persistent memcached has a place.</li>
<li>Rumor says that even the one app for which Facebook was using Cassandra &#8212; in-box search &#8212; has been decommissioned. On the other hand, numerous other scale-out DBMS (SQL or otherwise) seem to have Facebook footholds. But details are &#8212; all together now! &#8212; NDA.</li>
</ul>
<p><em>*If you know ParAccel&#8217;s new marketing chief Michael Weir, you  surely guessed I mean that only in a figurative sense.</em></p>
<p>From elsewhere:</p>
<ul>
<li>Daniel Abadi offered <a href="http://dbmsmusings.blogspot.com/2010/08/thoughts-on-kickfires-apparent-demise.html" onclick="javascript:pageTracker._trackPageview('/dbmsmusings.blogspot.com');">his  analysis</a> of <a href="../2010/07/27/kickfire-unlikely-to-survive/">Kickfire&#8217;s  demise</a>. In general I agree, but Daniel neglected to mention one  hugely important factor &#8212; the chicken-egg negative effect of Kickfire&#8217;s  lack of market or marketing traction. Customers were extremely reluctant to buy from Kickfire  because they perceived, correctly, that Kickfire&#8217;s survivability was far  from assured.</li>
<li>While the <a href="http://infinidb.org/community/forums/11-general-infinidb/1000-strange-issue-with-drop-table" onclick="javascript:pageTracker._trackPageview('/infinidb.org');">InfiniDB forums</a> suggest that there are at least a couple of production users of Calpont&#8217;s free InfiniDB, Calpont seemingly has a long way to go to be even as successful as Kickfire. But Calpont does have a bit of money to spend on lead generation; maybe some day they&#8217;ll even have actual customers.</li>
<li>In a response to a question I messaged over, <a href="http://www.dbms2.com/2010/03/18/xtremedata-update/" >XtremeData</a> tells me they have actual customers now. Press releases to follow.</li>
<li>The <a href="http://news.cnet.com/8301-31021_3-20013111-260.html?part=rss&amp;subj=news&amp;tag=2547-1_3-0-20" onclick="javascript:pageTracker._trackPageview('/news.cnet.com');">admiration for the job Mark Hurd did at HP</a> is in my opinion overstated. Sure, the financial/operational management appeared to work, but HP did little on Hurd&#8217;s watch to strengthen its reputation or customers&#8217; loyalty. In particular:
<ul>
<li>HP&#8217;s analytics efforts have accomplished little.</li>
<li>HP&#8217;s data warehouse appliance efforts have failed pathetically.</li>
<li>From what I hear, HP&#8217;s execution in its Exadata partnership was not good.</li>
<li>HP&#8217;s server business in general is distinguished mainly by HP being a big company.</li>
<li>HP&#8217;s EDS acquisition has been rocky, not that EDS was sailing so smoothly on its own beforehand.</li>
<li>HP&#8217;s success in PCs amounts to &#8220;arguably, HP sucks a little less than the other guys&#8221;.</li>
<li>HP&#8217;s elite reputation is long gone (admittedly, for the most part that predates Hurd).</li>
</ul>
</li>
<li><a href="http://intelligent-enterprise.informationweek.com/blog/archives/2010/08/software_innova.html" onclick="javascript:pageTracker._trackPageview('/intelligent-enterprise.informationweek.com');">Doug Henschen</a> evidently favors really strong intellectual property protection for software, even forbidding plug-compatible reverse engineering. I agree with Doug up to the point that <a href="http://www.monashreport.com/2010/07/19/my-view-of-intellectual-property/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">it should be forbidden to copy proprietary software</a>, but I don&#8217;t see why he (or a court) would view such behavior as copying.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/09/links-and-observations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Breakthrough: Exadata now has as many reference accounts as Aster Data!</title>
		<link>http://www.dbms2.com/2010/07/14/exadata-reference-accounts/</link>
		<comments>http://www.dbms2.com/2010/07/14/exadata-reference-accounts/#comments</comments>
		<pubDate>Wed, 14 Jul 2010 13:21:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2572</guid>
		<description><![CDATA[According to Bob Evans of Information Week, there now are 15 disclosed Exadata reference accounts. Coincidentally, there are exactly 15 logos on Aster Data&#8217;s customer page. So on its own, that&#8217;s not a particularly impressive piece of information.
But other highlights of his column include:

Some of those accounts are rather big-name. However, I&#8217;m not at all [...]]]></description>
			<content:encoded><![CDATA[<p>According to Bob Evans of Information Week, there now are <a href="http://www.informationweek.com/news/global-cio/interviews/showArticle.jhtml?articleID=225800024&amp;cid=RSSfeed_IWK_ALL" onclick="javascript:pageTracker._trackPageview('/www.informationweek.com');">15 disclosed Exadata reference accounts</a>. Coincidentally, there are exactly 15 logos on <a href="http://www.asterdata.com/customers/index.php" onclick="javascript:pageTracker._trackPageview('/www.asterdata.com');">Aster Data&#8217;s customer page</a>. So on its own, that&#8217;s not a particularly impressive piece of information.</p>
<p>But other highlights of his column include:</p>
<ul>
<li><strong>Some of those accounts are rather big-name.</strong> However, I&#8217;m not at all sure whether they&#8217;re actual production references.</li>
<li>Andy Mendelsohn characterizes the sweet spot of Exadata&#8217;s market as <strong>&#8220;virtual private cloud.&#8221;</strong> That matches <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >what Juan Loaiza told me six months ago</a>.</li>
<li>Oracle claims <strong>numerous competitive wins for Exadata.</strong> Let me hasten to note that one vendor&#8217;s &#8220;competitive win&#8221; is another vendor&#8217;s &#8220;our salesman read the deal as an unfavorable one and chose not to compete,&#8221; or even sometimes &#8220;Huh? We never heard about that deal.&#8221; That said, what I&#8217;m hearing is that <a href="http://www.dbms2.com/2010/03/19/some-business-trends-in-the-data-warehouse-market/" >Exadata is indeed a much stronger competitor than it used to be</a>.</li>
<li>Oracle claims a <strong>near $1 billion sales run rate</strong> for Exadata. No doubt, a large majority of those are hardware upgrades for existing Oracle database customers, often from non-Sun/Oracle hardware. Even so, some of those are surely deals that would have migrated away from Oracle in the pre-Exadata past.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/07/14/exadata-reference-accounts/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Lots of Aster Data analytic packages</title>
		<link>http://www.dbms2.com/2010/06/27/lots-of-aster-data-analytic-packages/</link>
		<comments>http://www.dbms2.com/2010/06/27/lots-of-aster-data-analytic-packages/#comments</comments>
		<pubDate>Sun, 27 Jun 2010 11:35:28 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2406</guid>
		<description><![CDATA[A number of vendors had announcements last week, notably:

Netezza (user conference)
Aster Data (to steal some of Netezza&#8217;s thunder)
Infobright (so far as I can tell, just because it was time for a product release, and also to get ahead of the summer doldrums)
Northscale (ditto)

Time to play some catchup.
I&#8217;ll start with Aster Data, which added to the [...]]]></description>
			<content:encoded><![CDATA[<p>A number of vendors had announcements last week, notably:</p>
<ul>
<li><a href="http://www.dbms2.com/2010/06/21/netezza-database-software-technology-overview/" >Netezza</a> (user conference)</li>
<li>Aster Data (to steal some of Netezza&#8217;s thunder)</li>
<li><a href="http://www.dbms2.com/2010/06/27/infobright-release-3-4/" >Infobright</a> (so far as I can tell, just because it was time for a product release, and also to get ahead of the summer doldrums)</li>
<li>Northscale (ditto)</li>
</ul>
<p>Time to play some catchup.</p>
<p>I&#8217;ll start with Aster Data, which added to the list of <a href="http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/" >analytic packages</a> it previously announced, and kindly gave me permission to post a partial <a href="http://www.monash.com/uploads/Aster-Data-Analytics-Packages-June-2010.pptx" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">slide deck</a> from the briefing on same. Highlights of Aster&#8217;s analytic packages story include:  <span id="more-2406"></span></p>
<ul>
<li>Statistics, statistics, and more statistics, including:
<ul>
<li><a href="http://www.dbms2.com/2010/05/15/further-clarifying-in-database-mpp-sas/" >A deal with SAS</a></li>
<li>A lot of R packages</li>
<li>Deals with Fuzzy Logix and another partner</li>
</ul>
</li>
<li>Linear algebra/matrix manipulation, useful in:
<ul>
<li>You guessed it &#8212; statistics</li>
<li>Other machine learning</li>
<li>Optimization</li>
<li>and more</li>
</ul>
</li>
<li>Entity and pattern extraction, for example:
<ul>
<li>Market basket analysis, which is a great input for &#8212; I knew you&#8217;d keep up with this &#8212; statistics</li>
<li>Sessionization, ditto</li>
<li>Text tokenization (presumably pretty basic stuff)</li>
<li>The versatile <a href="http://www.dbms2.com/2009/02/10/aster-data-npath/" >nPath</a> package</li>
</ul>
</li>
<li>Data mining/predictive analytics beyond straight statistics</li>
<li>More stuff, some of it mentioned in my <a href="http://www.dbms2.com/2009/12/02/mapreduce-for-complex-analytics-webina/" >Aster-sponsored webinars</a> late last year</li>
</ul>
<p>Like Netezza, Aster is offering functions in two formats:</p>
<ul>
<li>Fully parallel &#8212; i.e., you can simply invoke them via SQL and they&#8217;ll execute in parallel</li>
<li>Parallel-ready &#8212; i.e., you can invoke them on every node of the MPP cluster at once via Aster&#8217;s MapReduce framework</li>
</ul>
<p>There are 30+ of the former and 1000+ of the latter, grouped into 40+ packages.</p>
<p>I will not repeat Aster&#8217;s current, confusing terminology for these two categories &#8212; hopefully Aster will do some renaming &#8212; but you can find same on Slide 6 of the deck linked above.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/27/lots-of-aster-data-analytic-packages/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What kinds of data warehouse load latency are practical?</title>
		<link>http://www.dbms2.com/2010/06/21/data-warehouse-load-latency/</link>
		<comments>http://www.dbms2.com/2010/06/21/data-warehouse-load-latency/#comments</comments>
		<pubDate>Mon, 21 Jun 2010 12:15:17 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2319</guid>
		<description><![CDATA[I took advantage of my recent conversations with Netezza and IBM to discuss what kinds of data warehouse load latency were practical. In both cases I got the impression:

Subsecond load latency is substantially impossible. Doing that amounts to OLTP.
5 seconds or so is doable with aggressive investment and tuning.
Several-minute load latency is pretty easy.
10-15 [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I took advantage of my recent conversations with <a href="http://www.dbms2.com/2010/06/21/netezza-database-software-technology-overview/" >Netezza</a> and <a href="http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/" >IBM</a> to discuss what kinds of data warehouse load latency were practical. In both cases I got the impression:</p>
<ul>
<li>Subsecond load latency is substantially impossible. Doing that amounts to OLTP.</li>
<li>5 seconds or so is doable with aggressive investment and tuning.</li>
<li>Several-minute load latency is pretty easy.</li>
<li>10-15 minute latency or longer is now very routine.</li>
</ul>
<p style="margin-bottom: 0in;">There&#8217;s generally a throughput/latency tradeoff, so if you want very low latency with good throughput, you may have to throw a lot of hardware at the problem.</p>
<p style="margin-bottom: 0in;">I&#8217;d expect to hear similar things from any other vendor with reasonably mature analytic DBMS technology. Low-latency load is a problem for columnar systems, but both <a href="http://www.dbms2.com/2008/08/12/vertica-paraccel-exasol/" >Vertica <span style="font-style: normal;">and</span> ParAccel</a> designed in workarounds from the get-go. Aster Data probably didn&#8217;t meet these criteria until <a href="http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/" >Version 4.0</a>, its old “<a href="http://www.dbms2.com/2008/10/22/aster-data-systems-ncluster/" >frontline</a>” positioning notwithstanding, but I think it does now.</p>
<p style="margin-bottom: 0in;"><em><strong>Related link</strong></em></p>
<ul>
<li>
<p style="margin-bottom: 0in;"><a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >Just what is your need for speed</a> anyway?</p>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/21/data-warehouse-load-latency/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Further clarifying in-database MPP SAS</title>
		<link>http://www.dbms2.com/2010/05/15/further-clarifying-in-database-mpp-sas/</link>
		<comments>http://www.dbms2.com/2010/05/15/further-clarifying-in-database-mpp-sas/#comments</comments>
		<pubDate>Sat, 15 May 2010 04:14:46 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[SAS Institute]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2136</guid>
		<description><![CDATA[My recent post about SAS&#8217; MPP/in-database efforts was based on a discussion in a shared ride to the airport, and was correspondingly rough. SAS&#8217; Shannon Heath was kind enough to write in with clarifications, and to allow me to post same. With permission, I&#8217;ve also made trivial grammar edits.


Regarding Netezza, SAS Scoring Accelerator for Netezza [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.dbms2.com/2010/05/07/in-database-sas-teradata-netezza-aster/" >My recent post about SAS&#8217; MPP/in-database efforts</a> was based on a discussion in a shared ride to the airport, and was correspondingly rough. SAS&#8217; Shannon Heath was kind enough to write in with clarifications, and to allow me to post same. <span id="more-2136"></span>With permission, I&#8217;ve also made trivial grammar edits.</p>
<blockquote>
<ul>
<li>Regarding Netezza, <a href="http://www.sas.com/technologies/analytics/datamining/scoring_acceleration/index.html#section=1" onclick="javascript:pageTracker._trackPageview('/www.sas.com');">SAS Scoring Accelerator</a> for Netezza currently supports Netezza Performance Data Server 4.6.  Support for Netezza TwinFin is slated for July 2010.</li>
<li>Regarding the AsterData nCluster, I can understand your confusion on the distinction between Limited Availability (LA) and General Availability (GA). To help clarify, Limited Availability means that the technology is available for pre-qualified customers who are also active SAS Enterprise Miner users. In those cases, the product has been through QA and is available for purchase by those limited pre-qualified customers, and support for a limited number of customer sites is available. When the product becomes Generally Available, all qualifying customers are able to take advantage.</li>
<li>Regarding your question of general parallelism/in-database capability, SAS currently is taking advantage of two technologies to provide scalability and performance. The first, <a href="http://www.sas.com/technologies/architecture/in-databaseprocessing/index.html" onclick="javascript:pageTracker._trackPageview('/www.sas.com');">SAS In-Database</a>, takes the processing to where the data resides and parallelizes the analytical computations by leveraging the MPP architecture. The second technology, <a href="http://www.sas.com/technologies/architecture/grid/" onclick="javascript:pageTracker._trackPageview('/www.sas.com');">SAS Grid Manager</a>, parallelizes computational steps or subparts of a process across different nodes and brings the results back together.</li>
<li>To further expand upon Michelle’s answer to your question “What’s the big deal about in-database data mining scoring anyway?”, hopefully these additional bullets will help:
<ul>
<li>Manually converting the model scoring logic into SQL is difficult and time consuming in several instances. It would result in long hours and higher costs and would require testing and revalidation of the code. It will also restrict modelers to basic linear regression models. Automated model scoring (using SAS Scoring Accelerator) will allow modelers to use complex models like decision trees, neural networks, etc. (i.e. leverage full capabilities of SAS Enterprise Miner). <em>(Note: Since I was comparing specialized in-database scoring to the in-database alternative of writing it all in SQL, this is actually the only one of the three bullets that addresses my question. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  )</em></li>
<li>Scoring is typically done on a periodic basis – either daily, weekly, monthly or on an event driven basis and many of our customers are scoring large numbers of models against enormous tables. Conventional scoring of SAS models requires connecting to the database server to extract rows to SAS for scoring. The scores are commonly bulk loaded back to the database. As the number of rows in the table grows over time, network latency grows because the amount of data that is fetched from the database to the SAS scoring process increases. SAS Scoring Accelerator reduces the unnecessary data movement and replication for analytic (i.e. model deployment) processing.</li>
<li>SAS Scoring Accelerator allows the scoring process to be linearly scalable, completely leveraging the power of the parallel shared-nothing architecture of the database or data warehouse in question.</li>
</ul>
</li>
</ul>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/15/further-clarifying-in-database-mpp-sas/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Notes and cautions about new analytic technology</title>
		<link>http://www.dbms2.com/2010/05/07/implications-onew-analytic-technology/</link>
		<comments>http://www.dbms2.com/2010/05/07/implications-onew-analytic-technology/#comments</comments>
		<pubDate>Sat, 08 May 2010 03:05:25 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Presentations]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2070</guid>
		<description><![CDATA[As previously noted, I headlined Aster&#8217;s Big Data Summit in Washington, DC last Thursday. More than others, that talk did reuse material I&#8217;d presented before.  I promised the audience that when I got back I&#8217;d put up a blog post linking to supporting material for the talk.
Part of the time, I talked about things I&#8217;ve [...]]]></description>
			<content:encoded><![CDATA[<p>As <a href="http://www.dbms2.com/2010/04/18/washington-dc-may-2010-big-data-summi/" >previously noted</a>, I headlined Aster&#8217;s Big Data Summit in Washington, DC last Thursday. More than others, that talk did reuse material I&#8217;d presented before.  I promised the audience that when I got back I&#8217;d put up a blog post linking to supporting material for the talk.</p>
<p>Part of the time, I talked about things I&#8217;ve written about before. For example:<span id="more-2070"></span></p>
<ul>
<li><a href="http://www.dbms2.com/2010/04/04/privacy-liberty-continued/" >Liberty and privacy</a>. That&#8217;s a link to my most recent overview post on <strong>the liberty and privacy implications of modern analytic technology. </strong>The notes I spoke from were actually posted previously, after I spoke from them at the <a href="http://www.dbms2.com/2010/01/31/data-based-snooping-threat-libert/" >New England Database Summit</a> at MIT in January. I&#8217;m gratified that, at both events, I got very positive feedback on liberty and privacy issues.</li>
<li><a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >Pick the right latency</a>. That&#8217;s a link to a post (also based on a previous talk, in this case the one I traveled around the world giving for Netezza last September) in which I laid out<strong> the different levels of speed and latency</strong> an analytic application might require. I counted 9 orders of magnitude between the slowest and fastest, which is pretty much the difference between the speed of a turtle (at least a small, slow one) and the speed of light.</li>
<li>On the more general point of <strong>operationalizing analytics,</strong> my best or at least most detailed writing to date may be in <a href="http://www.monash.com/3GABP.pdf" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">a 2004 whitepaper on analytic business processes</a> which, sadly, is still fairly futuristic today.</li>
<li>I offered a few ways to think about <strong>the different kinds of data that go into data warehouses. </strong>
<ul>
<li>Some of those were outlined in a post last January about <a href="../2010/01/17/three-broad-categories-of-data/">three broad categories of data</a>, distinguishing among<strong> human/tabular, human/non-tabular,</strong> and <strong>machine-generated</strong> data.</li>
<li>That was a kind of sequel to a post last December about <a href="http://www.dbms2.com/2009/12/07/data-warehouse-volume-growth/" >three broad categories of data warehouse growth drivers</a>, namely <strong>more of the same</strong> vs. <strong>more detail</strong> vs. <strong>wholly new kinds of data.</strong></li>
<li>I gave some examples of <a href="http://www.monashreport.com/2006/10/04/data-mining-requires-data/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">creating new data to analyze</a> back in 2005 and 2006.</li>
</ul>
</li>
<li>Comments I made at various points were foreshadowed in a post on <a href="http://www.dbms2.com/2009/05/30/reinventing-business-intelligence/" >reinventing business intelligence</a>.</li>
</ul>
<p>I also raised a few points that I&#8217;m not finding good links for. I&#8217;ll try to cover those in future blog posts.</p>
<p><em><strong>Related link</strong></em></p>
<ul>
<li>Notes for my <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >Boston Big Data Summit</a> (no relation to Aster Data&#8217;s Big Data Summit series) talk in October, 2009</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/07/implications-onew-analytic-technology/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Clarifying the state of MPP in-database SAS</title>
		<link>http://www.dbms2.com/2010/05/07/in-database-sas-teradata-netezza-aster/</link>
		<comments>http://www.dbms2.com/2010/05/07/in-database-sas-teradata-netezza-aster/#comments</comments>
		<pubDate>Fri, 07 May 2010 06:23:49 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[SAS Institute]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2061</guid>
		<description><![CDATA[I routinely am briefed way in advance of products&#8217; introductions. For that reason and others, it can be hard for me to keep straight what&#8217;s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than the [...]]]></description>
			<content:encoded><![CDATA[<p>I routinely am briefed way in advance of products&#8217; introductions. For that reason and others, it can be hard for me to keep straight what&#8217;s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than the SAS Institute&#8217;s multi-year effort to get SAS integrated into various MPP DBMS, specifically <a href="http://www.dbms2.com/2009/08/02/teradata-13-focuses-on-advanced-analytic-performance/" >Teradata</a>, <a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/" >Netezza Twinfin(i)</a>, and <a href="http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/" >Aster Data nCluster</a>.</p>
<p>However, I chatted briefly Thursday with Michelle Wilkie, who is the SAS product manager overseeing all this (and also some other stuff, like SAS running on grids without being integrated into a DBMS). As best I understood, the story is:<span id="more-2061"></span></p>
<ul>
<li>On <strong>Teradata,</strong> SAS is shipping in-database scoring today. SAS also is shipping a limited amount of in-database modeling on Teradata, the count recently having gone up from 4 &#8220;procs&#8221; to 10.</li>
<li>On <strong>Netezza Twinfin(i),</strong> SAS is shipping in-database scoring, and this was recently announced. I can&#8217;t actually find much evidence of this announcement by searching the Web or the SAS website, but Michelle was pretty clear on the point even so.  Further confusing matters, <a href="http://www.sas.com/technologies/analytics/datamining/scoring_acceleration/" onclick="javascript:pageTracker._trackPageview('/www.sas.com');">SAS&#8217; website</a> seems to say in-database scoring is supported on Netezza&#8217;s old generation of products but not its latest one, even though SAS CTO Keith Collins told me <a href="http://www.dbms2.com/2009/09/03/sas-on-netezza-and-other-netezza-extensibility/" >exactly the opposite</a> would be true.</li>
<li>On <strong>Aster Data nCluster,</strong> SAS will ship in-database scoring by the end of 2010. If I understood correctly, this will be for &#8220;limited&#8221; rather than &#8220;general&#8221; availability, but Michelle framed that as a distinction without a difference. I.e., if you want to buy in-database SAS scoring on Aster nCluster, you&#8217;ll be able to.</li>
<li>(More) in-database SAS modeling is expected on all of Teradata, Netezza Twinfin(i), and Aster Data nCluster in the vague future. (The concept of 2011/2012 came into play.)</li>
<li>SAS/Teradata integration, developed first, involved more hand-coding. SAS has subsequently developed some kind of a more general parallelism/in-database capability, akin to what it has in the DBMS-less grid, that either is or isn&#8217;t a good match for DBMS vendors&#8217; native way of supporting parallel processing. (Obviously, I&#8217;m still pretty unclear on this part.)</li>
<li>SAS technology is a good fit for Aster Data&#8217;s MapReduce-centric way of doing parallelism.</li>
</ul>
<p>I also took the opportunity to ask Michelle a question I&#8217;ve had a heck of a time getting answered: <strong>What&#8217;s the big deal about in-database data mining scoring anyway?</strong> After all, the most common form of in-database data mining scoring is just to take a weighted sum of specific fields in a row, where the weights are the regression coefficients. You can do that in generic SQL, with performance that superficially should be at least as good as that for any alternative strategy. Michelle&#8217;s answers seemed to be twofold:</p>
<ul>
<li><strong>There are other kinds of scoring too</strong> &#8212; neural networks, etc.</li>
<li><strong>Coding the scoring in SQL isn&#8217;t that easy. </strong>Michelle gave the example of a specific user (default Netezza reference account, with initials resembling mine) that spent 400 hours writing and testing something you now get for free with SAS/Netezza integration.</li>
</ul>
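<p>To make the &#8220;weighted sum&#8221; point concrete, here is a minimal sketch in Python, not anything SAS or Netezza ships. The column names, the coefficients, and the <code>customers</code> table are all hypothetical, chosen only for illustration. It scores one row as an intercept plus a weighted sum of fields, and builds what should be the equivalent generic SQL expression.</p>

```python
# Hypothetical regression coefficients and column names, for illustration only.
INTERCEPT = 0.5
WEIGHTS = {"age": 0.02, "income": 0.001, "num_purchases": 0.3}


def score_row(row, weights=WEIGHTS, intercept=INTERCEPT):
    """Linear-regression score: intercept plus a weighted sum of the row's fields."""
    return intercept + sum(w * row[col] for col, w in weights.items())


def scoring_sql(table, weights=WEIGHTS, intercept=INTERCEPT):
    """Build a generic SQL SELECT that computes the same weighted sum per row.

    Assumes the (hypothetical) table has an id column plus one column per weight.
    """
    terms = " + ".join(f"{w!r} * {col}" for col, w in weights.items())
    return f"SELECT id, {intercept!r} + {terms} AS score FROM {table}"


if __name__ == "__main__":
    example = {"age": 40, "income": 1000, "num_purchases": 2}
    print(score_row(example))
    print(scoring_sql("customers"))
```

<p>This only covers the plain linear regression case. Going by Michelle&#8217;s answers, the pain of hand-written SQL shows up with the other kinds of models and with testing, neither of which a sketch like this addresses.</p>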
<p><em>Edit: In response to this post, SAS wrote in with <a href="http://www.dbms2.com/2010/05/15/further-clarifying-in-database-mpp-sas/" >further clarification about in-database and/or MPP SAS</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/07/in-database-sas-teradata-netezza-aster/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>I&#8217;ll be speaking in Washington, DC on May 6</title>
		<link>http://www.dbms2.com/2010/04/18/washington-dc-may-2010-big-data-summi/</link>
		<comments>http://www.dbms2.com/2010/04/18/washington-dc-may-2010-big-data-summi/#comments</comments>
		<pubDate>Sun, 18 Apr 2010 21:48:15 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Archiving and information preservation]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Liberty and privacy]]></category>
		<category><![CDATA[Presentations]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1937</guid>
		<description><![CDATA[My clients at Aster Data are putting on a sequence of conferences called &#8220;Big Data Summit(s)&#8221;, and wanted me to keynote one. I agreed to the one in Washington, DC, on May 6, on the condition that I would be allowed to start with the same liberty and privacy themes I started my New England [...]]]></description>
			<content:encoded><![CDATA[<p>My clients at Aster Data are putting on a sequence of conferences called &#8220;Big Data Summit(s)&#8221;, and wanted me to keynote one. I agreed to the one <a href="http://bigdatasummit.com/2010/dc/" onclick="javascript:pageTracker._trackPageview('/bigdatasummit.com');">in Washington, DC, on May 6</a>, on the condition that I would be allowed to start with the same liberty and privacy themes I started my <a href="http://www.dbms2.com/2010/01/31/data-based-snooping-threat-libert/" >New England Database Summit keynote</a> with. Since I already knew Aster to be one of the multiple companies in this industry that is responsibly concerned about the liberty and privacy threats we&#8217;re all helping cause, I expected them to agree to that condition immediately, and indeed they did.</p>
<p>On a rough-draft basis, my talk concept is:</p>
<p style="margin-bottom: 0in;"><strong>Implications of New Analytic Technology in four areas:</strong></p>
<ul>
<li><strong>Liberty &amp; privacy</strong></li>
<li><strong>Data acquisition &amp; retention</strong></li>
<li><strong>Data exploration</strong></li>
<li><strong>Operationalized analytics</strong></li>
</ul>
<p>I haven&#8217;t done any work yet on the talk besides coming up with that snippet, and probably won&#8217;t until the week before I give it. Suggestions are welcome.</p>
<p>If anybody actually has a link to a clear discussion of legislative and regulatory data retention requirements, that would be cool. I know they&#8217;ve exploded, but I don&#8217;t  have the details.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/04/18/washington-dc-may-2010-big-data-summi/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Aster Data&#8217;s mapreduce.org site</title>
		<link>http://www.dbms2.com/2010/04/18/aster-mapreduce-or/</link>
		<comments>http://www.dbms2.com/2010/04/18/aster-mapreduce-or/#comments</comments>
		<pubDate>Sun, 18 Apr 2010 20:56:10 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[MapReduce]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1918</guid>
		<description><![CDATA[Aster Data has started a site mapreduce.org, which purports to compile &#8220;the best information about MapReduce.&#8221; At the moment, mapreduce.org highlights include:

A feed of MapReduce-related posts from several blogs, including this one.
A calendar of MapReduce-related events, not necessarily Aster-specific, integrated with a feed combining &#8230;

&#8230; Aster MapReduce-related press releases and also &#8230;
&#8230; not necessarily Aster-specific [...]]]></description>
			<content:encoded><![CDATA[<p>Aster Data has started a site <a href="http://www.mapreduce.org/" onclick="javascript:pageTracker._trackPageview('/www.mapreduce.org');">mapreduce.org</a>, which purports to compile &#8220;the best information about MapReduce.&#8221; At the moment, mapreduce.org highlights include:</p>
<ul>
<li>A feed of MapReduce-related posts from several blogs, including this one.</li>
<li>A calendar of MapReduce-related events, not necessarily Aster-specific, integrated with a feed combining &#8230;
<ul>
<li>&#8230; Aster MapReduce-related press releases and also &#8230;</li>
<li>&#8230; not necessarily Aster-specific MapReduce-related press articles.</li>
</ul>
</li>
<li>Links to a lot of Aster Data MapReduce-related collateral. Some of that stuff is quite good.*</li>
<li>A sycophantic introduction from Colin White praising the value of the mapreduce.org &#8220;independent forum.&#8221;</li>
</ul>
<p><em>*I did a couple of <a href="http://www.dbms2.com/2009/10/15/mapreduce-webinar-slides/" >MapReduce-related webinars</a> for Aster late last year. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  But seriously &#8212; Aster does a good job of writing clear and informative collateral.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/04/18/aster-mapreduce-or/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
