<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Fox and MySpace</title>
	<atom:link href="http://www.dbms2.com/category/users/fox-interactive-media-myspace/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:21:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Stakeholder-facing analytics</title>
		<link>http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/</link>
		<comments>http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/#comments</comments>
		<pubDate>Sat, 15 May 2010 07:58:05 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Fox and MySpace]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2149</guid>
		<description><![CDATA[There&#8217;s a point I keep making in speeches, and used to keep making in white papers, yet have almost never spelled out in this blog. Let me now (somewhat) correct the oversight. Analytic technology isn&#8217;t only for you. It&#8217;s also for your customers, citizens, and other stakeholders. I am not referring here to what is [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a point I keep making in speeches, and used to keep making in white papers, yet have almost never spelled out in this blog. Let me now (somewhat) correct the oversight.</p>
<p><strong>Analytic technology isn&#8217;t only for you. It&#8217;s also for your customers, citizens, and other stakeholders.</strong></p>
<p>I am <strong>not</strong> referring here to what is well understood to be an important, fast-growing activity &#8212; providing data and its analysis to customers as your primary or only business &#8212; nor to the related business of taking people&#8217;s data, crunching it for them, and giving them results. That combined sector &#8212; which I am pretty much alone in aggregating into one and calling <a href="http://www.dbms2.com/category/analytics-technologies/data-mart-warehouse-outsourcing/">data mart outsourcing</a> &#8212; is one of the top several vertical markets for a lot of the analytic DBMS vendors I write about. Rather, I&#8217;m talking about enterprises that gather data for some primary purpose, and have discovered that a good <strong>secondary</strong> use of the data is to reflect it back to stakeholders, often the same ones who provided or created it in the first place.</p>
<p>For now I&#8217;ll call this category <strong>stakeholder-facing analytics,</strong> as the shorter phrase &#8220;stakeholder analytics&#8221; would be ambiguous.* I first picked up the idea early this decade from Information Builders, for whom it had become something of a specialty. I&#8217;ve been asking analytics vendors for examples of stakeholder-facing analytics ever since, and a number have been able to comply. But the whole thing is in its early days even so; almost any sufficiently large enterprise should be more active in stakeholder-facing analytics than it currently is.<br />
<span id="more-2149"></span><br />
<em>*Comments as to what the category</em> should<em> be called are welcome below.</em></p>
<p>Examples of stakeholder-facing analytics include:</p>
<ul>
<li>Enterprises report back on the business customers do with them. For example:
<ul>
<li>Credit card companies provide reports on spending back to their credit card holders, especially small businesses.</li>
<li>So do office supply retailers.</li>
<li>Brokerage firms provide reporting back to their small-institution customers.</li>
</ul>
</li>
<li>Governments expose information to their citizens online.
<ul>
<li>In an early example, New York City restaurant ratings were put online.</li>
<li><a href="http://sec.gov/edgar/searchedgar/companysearch.html">Putting SEC filings online</a> has been a huge success.</li>
<li>The Obama Administration has committed to putting <a href="http://www.data.gov/catalog">large amounts of information</a> online.</li>
</ul>
</li>
<li>Regulated companies (such as utilities) could be required to put data online directly, without even using the government as an intermediary.</li>
<li>Some part of Fox &#8212; perhaps MySpace Music? &#8212; offers free access to a PostgreSQL extract from <a href="http://www.dbms2.com/2009/03/05/fox-interactive-medias-multi-hundred-terabyte-database-running-on-greenplum/">its Greenplum database</a> to each of its largest advertisers.</li>
<li>Google Analytics offers some basic BI for free to website owners everywhere.</li>
<li>Anybody from web hosting companies to public utilities could open their kimonos and allow their customers to track adherence to actual or implied SLAs (Service Level Agreements) in areas such as uptime, length of outage, responsiveness, and the like.</li>
</ul>
<p>So what cool examples do you have of stakeholder-facing analytics?*</p>
<p><em>*Yes, this is an invitation to drop links to case studies into the comment thread below. </em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Quick news, links, comments, etc.</title>
		<link>http://www.dbms2.com/2010/03/27/quick-news-links-comments-etc/</link>
		<comments>http://www.dbms2.com/2010/03/27/quick-news-links-comments-etc/#comments</comments>
		<pubDate>Sat, 27 Mar 2010 04:59:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Fox and MySpace]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[Groovy Corporation]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1775</guid>
		<description><![CDATA[Some notes based on what I&#8217;ve been reading recently: Tom Foremski outlined the dire (at least in theory) privacy risks of geolocation services, going into a lot more detail on that point than I ever have. However, he topped that off with the odd claim that people pay tolls manually (rather than using an electronic service) [...]]]></description>
			<content:encoded><![CDATA[<p>Some notes based on what I&#8217;ve been reading recently:<span id="more-1775"></span></p>
<ul>
<li>Tom Foremski outlined the dire (at least in theory) <a href="http://www.siliconvalleywatcher.com/mt/archives/2010/03/geo_loco_and_pr.php">privacy risks of geolocation services</a>, going into a lot more detail on that point than <a href="http://www.dbms2.com/2010/01/31/data-based-snooping-threat-libert/">I ever have</a>. However, he topped that off with the odd claim that people pay tolls manually (rather than using an electronic service) to cross the Bay Bridge because they fear being tracked, rather than for reasons of time or money.</li>
<li>Oracle had an earnings conference call. <a href="http://blogs.zdnet.com/BTL/?p=32389">Larry Dignan</a> did a good job of covering the highlights; the gory details are on the <a href="http://seekingalpha.com/article/195696-oracle-f3q10-qtr-end-02-28-10-earnings-call-transcript?page=1">Seeking Alpha</a> transcript, especially pp. 3-5.  Oracle now claims to be getting lots of multi-system deals for Exadata. (But I still haven&#8217;t seen much in the way of production customers named.) ULAs, which I presume are Unlimited License Agreements, are important on the software side. Besides picking on IBM and SAP, Oracle even touted a competitive win vs. EMC, which not coincidentally seems to be working on partnering with almost every Oracle competitor it can find.</li>
<li>Brian Prentice of Gartner basically <a href="http://blogs.gartner.com/brian_prentice/2010/03/23/open-sources-reality-distortion-field/">accused open source</a> of being Dotcom 2.0, in terms of dubious business models and the hype associated with same. I agree with many of his particulars, and indeed often steer vendor clients away from open source strategies. For marketing purposes, I do feel that sometimes <a href="http://www.dbms2.com/2009/10/19/greenplum-free-single-node-edition/">free can be a real cool price</a>; but open source is not the only way to be free.</li>
<li><a href="http://www.dbms2.com/2010/03/22/akibanakiba/">Akiban</a>, which I wrote about a couple of days ago, seems to be building out its <a href="http://akiban.com/">website</a>. As of this writing the website is still pretty raw, with bewildering messaging, carelessly repeated paragraphs, and a notable lack of clues as to who&#8217;s in company leadership. Even so, unless I missed some of the current stuff before, the site has come a long way in a few days, so maybe there&#8217;s hope.</li>
<li>Groovy Corporation, which introduced the <a href="../2009/07/28/the-groovy-sql-switch/">Groovy SQL Switch</a> just last summer, seems to be doing something different now. It&#8217;s merged into a company called uCirrus (where the u is really a mu), but uCirrus doesn&#8217;t have a meaningful website yet, whereas <a href="http://www.groovycorp.com/index.php">Groovy</a> does. There&#8217;s stuff there about a &#8220;push data cloud,&#8221; stressing the importance of not being a DBMS, under the name Cortex, whatever that all means. Groovy seems to have an online gaming deal for Cortex with MySpace, or maybe Cortex is just the name of a specific Groovy/MySpace project.</li>
<li>Mike Mooney offered a long rant on <a href="http://mooneyblog.mmdbsolutions.com/">the problems with database (design) version control</a>. He did concede that the most recent Microsoft Visual Studio might help, for those who are bought into (and can afford) the Microsoft stack. Frankly, I think that&#8217;s what views are for, updatable or otherwise. In many cases, they&#8217;ll let you build what you need, quickly and without breaking anything, and you can leave it to the DBAs to sort out database performance later.</li>
<li>I just discovered <a href="http://www.chadpluspl.us/">Chad Stewart&#8217;s programming blog</a>. While he&#8217;s evidently a game programmer, a lot of his comments have broader applicability.</li>
<li>Chip Hazard offered a VC&#8217;s perspectives on <a href="http://hazard.typepad.com/hazard-lights/2010/02/quick-reminder-of-the-challenges-and-opportunities-in-enterprise-it.html">the difficulties facing enterprise IT startups</a>. (Hat tip to Miriam Tuerk for turning me on to him.) Although he didn&#8217;t phrase it this way, his bottom line (at least the part I agree with) is that the startup&#8217;s products have to be amazingly superior to the alternatives (big vendors or in-house).</li>
</ul>
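<p>On the point above that views can absorb database design changes, here is a rough sketch in Python, using only the standard-library <code>sqlite3</code> module. The table names, the <code>customer</code> view, and the sample row are all invented for illustration; none of it comes from Mike Mooney&#8217;s post or from any particular product, and a read-only view is assumed.</p>
<pre>
```python
import sqlite3

# Hypothetical application query; it only ever names the view.
APP_QUERY = "SELECT name FROM customer ORDER BY id"


def run_demo():
    """Run the same query before and after a physical schema change."""
    conn = sqlite3.connect(":memory:")

    # Version 1: the view sits directly on top of the original table.
    conn.execute("CREATE TABLE customer_v1 (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customer_v1 VALUES (1, 'Ada Lovelace')")
    conn.execute("CREATE VIEW customer AS SELECT id, name FROM customer_v1")
    before = conn.execute(APP_QUERY).fetchall()

    # Version 2: the design changes (name is split into two columns).
    conn.execute(
        "CREATE TABLE customer_v2 "
        "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )
    conn.execute("INSERT INTO customer_v2 VALUES (1, 'Ada', 'Lovelace')")

    # Only the view is redefined; APP_QUERY is left untouched.
    conn.execute("DROP VIEW customer")
    conn.execute(
        "CREATE VIEW customer AS "
        "SELECT id, first_name || ' ' || last_name AS name FROM customer_v2"
    )
    after = conn.execute(APP_QUERY).fetchall()

    conn.close()
    return before, after
```
</pre>
<p>In this toy case the application query returns the same rows before and after the change, because only the view definition moved. Whether that holds up for a real schema, and at what performance cost, is a separate question.</p>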
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/27/quick-news-links-comments-etc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>General introduction to Splunk</title>
		<link>http://www.dbms2.com/2009/10/18/general-introduction-to-splunk/</link>
		<comments>http://www.dbms2.com/2009/10/18/general-introduction-to-splunk/#comments</comments>
		<pubDate>Sun, 18 Oct 2009 15:59:56 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Fox and MySpace]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Text]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1119</guid>
		<description><![CDATA[I dropped by log analysis software vendor Splunk a few weeks ago for a chat with Marketing VP Steve Sommer (whom some of you may know from Cognos and/or Informix), Product Management VP Christina Noren, and above all co-founder/CTO Erik Swan. Splunk turns out to be a pretty interesting company, from both business and technical standpoints. [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I dropped by log analysis software vendor Splunk a few weeks ago for a chat with Marketing VP Steve Sommer (whom some of you may know from Cognos and/or Informix), Product Management VP Christina Noren, and above all co-founder/CTO Erik Swan. Splunk turns out to be a pretty interesting company, from both business and technical standpoints. For one thing, Splunk seems highly regarded by most people I mention it to.</p>
<p style="margin-bottom: 0in;">Splunk&#8217;s technical stories include:</p>
<ul>
<li>Text search over log files.</li>
<li>Business intelligence over text search. (That part sounds a lot like <a href="http://www.texttechnologies.com/2007/12/12/attivio-tries-to-do-it-all/">Attivio</a>.)</li>
<li>MapReduce with schema flexibility and smart multi-stage execution plans. (That part sounds a lot like Aster Data.)</li>
</ul>
<p style="margin-bottom: 0in;">More on those in <a href="http://www.dbms2.com/2009/10/18/technical-introduction-to-splunk/">a separate post</a>.</p>
<p style="margin-bottom: 0in;">Less technical Splunk highlights include:<span id="more-1119"></span></p>
<ul>
<li>Splunk has ~1200 paying customers, and is adding a couple hundred more per quarter.</li>
<li>Splunk has ~160 people.</li>
<li>~80% of Splunk sales are in North America.</li>
<li>Typical Splunk sales prices are in the $10-50K range, with an average around $25K, or maybe that average is a bit over $30K. Some Splunk deals are six- or even seven-figure.</li>
<li>Splunk is “quite profitable.”</li>
<li>Splunk&#8217;s eponymous product is priced according to how much data is indexed per day. If you index half a gigabyte of logs per day or less, Splunk is completely free. So, while Splunk is closed-source, there&#8217;s something of an open-source-like Splunk adoption model.</li>
<li>Splunk has been selling product for a couple of years. I gather Splunk 4 was recently released.</li>
<li>Splunk&#8217;s biggest industry segments are, not too surprisingly,
<ul>
<li>Telco</li>
<li>Financial services</li>
<li>Government</li>
<li>“Online”</li>
</ul>
</li>
<li>Splunk&#8217;s paying customers seem to use it mainly for:
<ul>
<li>Web logs and associated network event logs (this seems to be the biggest area)</li>
<li>Security and perhaps other general IT log analysis</li>
<li>Physical security logs (mainly in the government)</li>
<li>Anti-fraud (I&#8217;m not sure how that works)</li>
</ul>
</li>
<li>One would think Splunk would be used to manage a lot of intelligence telemetry, but that wasn&#8217;t particularly hinted at.</li>
<li>In general, the core problem Splunk is used for is log analysis for trouble-shooting purposes.</li>
<li>Splunk&#8217;s nonpaying users are more diverse; examples mentioned included windmill operations and protein research.</li>
<li>Splunk&#8217;s customers include Aster Data flagship accounts MySpace and LinkedIn. I bet many other top web companies are Splunk customers as well.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/10/18/general-introduction-to-splunk/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>NoSQL?</title>
		<link>http://www.dbms2.com/2009/07/01/nosql-sql-alternative/</link>
		<comments>http://www.dbms2.com/2009/07/01/nosql-sql-alternative/#comments</comments>
		<pubDate>Wed, 01 Jul 2009 07:33:25 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Fox and MySpace]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=827</guid>
		<description><![CDATA[Eric Lai emailed today to ask what I thought about the NoSQL folks, and especially whether I thought their ideas were useful for enterprises in general, as opposed to just Web 2.0 companies. That was the first I heard of NoSQL, which seems to be a community discussing SQL alternatives popular among the cloud/big-web-company set, [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Eric Lai emailed today to ask what I thought about the <a href="http://blog.oskarsson.nu/2009/06/nosql-debrief.html">NoSQL</a> folks, and especially whether I thought their ideas were useful for enterprises in general, as opposed to just Web 2.0 companies. That was the first I heard of NoSQL, which seems to be a community discussing SQL alternatives popular among the cloud/big-web-company set, such as BigTable, Hadoop, Cassandra and so on. My short answers are:</p>
<ul>
<li>In most cases, no.</li>
<li>Most of these technologies are designed for simple, high-volume OLTP (OnLine Transaction Processing). Most large enterprises have an established way of doing OLTP, probably via relational database management systems. Why change?</li>
<li>MapReduce is an exception, in that it&#8217;s designed for analytics. MapReduce may be useful for enterprises. But where it is, it probably should be <a href="http://www.dbms2.com/2008/08/26/why-mapreduce-matters-to-sql-data-warehousing/">integrated into an analytic DBMS</a>.</li>
<li>There&#8217;s one big countervailing factor to all these generalities &#8212; <em>schema flexibility.</em></li>
</ul>
<p style="margin-bottom: 0in;">As for the longer form, let me start by noting that there are two main kinds of reason for not liking SQL.  <span id="more-827"></span>First, you might be fine with the idea of a (somewhat) nonprocedural, schema-aware DML/DDL (Data Manipulation/Definition Language), but just think another kind is better, or more suited to your use case.  If your reason is like that, you might favor alternatives such as:</p>
<ul>
<li>OLAP-based languages such as MDX.</li>
<li>XML-oriented languages.</li>
<li>&#8220;True&#8221; relational languages, because SQL deviated from the path of relational virtue under the corrupt influence of IBM &#8212; aka &#8220;Blue Babylon&#8221; &#8212; and the IT world has been languishing in sin ever since.</li>
</ul>
<p style="margin-bottom: 0in;">The second class of reason for avoiding SQL is that you don&#8217;t like the idea of a separate schema-aware DML at all.  Possible reasons for this orientation include:</p>
<ul>
<li>You just like to program, and want to manipulate stored data the same way you do anything else. Thus, you are bothered by an &#8220;impedance mismatch&#8221; between SQL and your favorite programming languages.  This is real. It also has been overcome by many, many enterprises around the world.</li>
<li>You believe that more procedural alternatives are a better fit for cloud computing and extreme scale-out on failure-prone commodity hardware. <a href="http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/">Facebook</a> made that case to me.  However, I have trouble thinking of very many enterprise scenarios where it applies, especially when one considers electricity costs and the like.</li>
<li>Your schemas change more quickly than your data architects can reasonably be expected to keep up with.  Facebook made that case to me too. Enterprise examples might include marketing campaigns and M&amp;A.  I&#8217;ve long thought this to be a legitimate, looming concern. But I don&#8217;t know that stripped-down DBMS are the way to address it.</li>
<li>You believe that SQL has severe processing overhead.  In most enterprise use cases, that would just be bogus.</li>
<li>You lack familiarity with SQL.</li>
</ul>
<p style="margin-bottom: 0in;">That last point is not a joke. One of the weirder database architectures I know of is <a href="http://www.dbms2.com/2007/06/09/the-database-technology-of-guild-wars/">the one underlying Guild Wars</a>.  Its developer &#8212; a brilliantly impressive guy &#8212; told me flat-out that he learned in college how to build a DBMS, but he didn&#8217;t learn how to develop for a conventional one.  This was instrumental in his decision to build an unconventional data management architecture that uses SQL Server as little more than a smart file manager.</p>
<p style="margin-bottom: 0in;">The questions of SQL performance and &#8212; often-unspecified &#8212; &#8220;overhead&#8221; are interesting to view through the lens of the <a href="http://www.dbms2.com/2009/06/22/h-store-horizontica-voltdb/">H-Store/VoltDB</a> project. Mike Stonebraker et al.:</p>
<ul>
<li>Are building a scale-out-oriented OLTP DBMS that is meant to run in RAM, preserving data through replication to other servers&#8217; RAM more than through output to disk.</li>
<li>Believe that 95% of what a typical SQL DBMS does to manage OLTP is wasteful overhead.</li>
<li>Originally planned to not use SQL, but wound up going with SQL because alternatives were insufficiently performant.</li>
</ul>
<p style="margin-bottom: 0in;">Mike himself, of course, has been all over the spectrum on SQL-like languages. First he favored QUEL vigorously over SQL for mainstream relational DBMS.  Then he led the charge to extend SQL in PostgreSQL, Illustra, et al. Then he actually staked out a contrarian position in the area of complex event/stream processing <span style="font-style: normal;">by favoring a SQL-like language in an area where other alternatives were better established &#8212; but that was at what turned into StreamBase, <a href="http://www.dbms2.com/2009/05/21/notes-on-cep-application-development/">which now emphasizes visual programming over any kind of coding language</a>.</span></p>
<p style="margin-bottom: 0in;">I need to write much more about schema flexibility, but tonight &#8212; which will be my third straight of &lt;&lt;8 hours sleep &#8212; is not the time for that.</p>
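<p style="margin-bottom: 0in;">For readers who haven&#8217;t run into the &#8220;impedance mismatch&#8221; mentioned above, here is a rough sketch of what it looks like in practice, in Python with the standard-library <code>sqlite3</code> module. The <code>Order</code> class, both tables, and the helper functions are hypothetical, made up purely for illustration.</p>
<pre>
```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Order:
    """Hypothetical application object with a nested list."""
    order_id: int
    customer: str
    items: list = field(default_factory=list)  # no single SQL column holds this


def save(conn, order):
    # One in-memory object is split into rows in two tables.
    conn.execute(
        "INSERT INTO orders VALUES (?, ?)", (order.order_id, order.customer)
    )
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?)",
        [(order.order_id, item) for item in order.items],
    )


def load(conn, order_id):
    # Rebuilding the object takes two queries plus hand-written glue code.
    customer = conn.execute(
        "SELECT customer FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT item FROM order_items WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    )
    return Order(order_id, customer, [row[0] for row in rows])


def round_trip(order):
    """Store an order in a fresh in-memory database and read it back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute("CREATE TABLE order_items (order_id INTEGER, item TEXT)")
    save(conn, order)
    loaded = load(conn, order.order_id)
    conn.close()
    return loaded
```
</pre>
<p style="margin-bottom: 0in;">The object survives the round trip, but only because <code>save</code> and <code>load</code> translate between the two shapes by hand. That translation layer is the mismatch; as noted above, plenty of enterprises live with it.</p>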
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/07/01/nosql-sql-alternative/feed/</wfw:commentRss>
		<slash:comments>30</slash:comments>
		</item>
		<item>
		<title>Aster Data sticks by its SQL/MapReduce guns</title>
		<link>http://www.dbms2.com/2009/06/09/aster-data-nclustersql-mapreduce/</link>
		<comments>http://www.dbms2.com/2009/06/09/aster-data-nclustersql-mapreduce/#comments</comments>
		<pubDate>Tue, 09 Jun 2009 15:56:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Fox and MySpace]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Parallelization]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=806</guid>
		<description><![CDATA[Aster Data continues to think that MapReduce, integrated with SQL, is an important technology. For example: Aster announced today that it&#8217;s providing .NET support for SQL/MapReduce. Perhaps not coincidentally, Aster&#8217;s biggest customer is MySpace, which is apparently a big Microsoft shop.  (And MySpace parent Fox Interactive Media is a SQL/MapReduce fan, albeit running on Greenplum.) [...]]]></description>
			<content:encoded><![CDATA[<p>Aster Data continues to think that MapReduce, integrated with SQL, is an important technology. For example:</p>
<ul>
<li>Aster announced today that it&#8217;s providing .NET support for SQL/MapReduce. Perhaps not coincidentally, <a href="http://www.dbms2.com/2009/03/05/myspaces-multi-hundred-terabyte-database-running-on-aster-data/">Aster&#8217;s biggest customer is MySpace</a>, which is apparently a big Microsoft shop.  (And MySpace parent <a href="http://www.dbms2.com/2009/03/07/three-greenplum-customers-applications-of-mapreduce/">Fox Interactive Media is a SQL/MapReduce fan</a>, albeit running on Greenplum.)</li>
<li>Aster generally puts more emphasis on MapReduce than <a href="http://www.dbms2.com/2008/08/25/mapreduce-sound-bites/">SQL/MapReduce rival Greenplum</a>.  That&#8217;s a non-trivial comparison, because <a href="http://www.dbms2.com/2009/06/05/greenplum-update-release-3-3/">Greenplum is making progress in SQL/MapReduce</a> itself.</li>
<li>When talking with Aster folks, I <span style="text-decoration: line-through;">can&#8217;t get them to shut up</span> hear a lot about SQL/MapReduce.</li>
</ul>
<p>I was <a href="http://www.dbms2.com/2008/08/26/why-mapreduce-matters-to-sql-data-warehousing/">a big fan of SQL/MapReduce</a> when it was first announced last August. Notwithstanding persuasive examples favoring <a href="http://www.dbms2.com/2009/04/14/ebay-thinks-mpp-dbms-clobber-mapreduce/">pure DBMS</a> or <a href="http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/">pure MapReduce</a> over DBMS/MapReduce integration, I continue to think the SQL/MapReduce idea has great potential.  But I do wish more successful production examples would become visible &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/06/09/aster-data-nclustersql-mapreduce/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>More on Fox Interactive Media&#8217;s use of Greenplum</title>
		<link>http://www.dbms2.com/2009/06/08/more-on-fox-interactive-medias-use-of-greenplum/</link>
		<comments>http://www.dbms2.com/2009/06/08/more-on-fox-interactive-medias-use-of-greenplum/#comments</comments>
		<pubDate>Mon, 08 Jun 2009 04:47:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Fox and MySpace]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=803</guid>
		<description><![CDATA[Greenplum&#8217;s most important reference is probably its energetic advocate Fox Interactive Media, even ahead of much larger Greenplum user eBay, and notwithstanding Aster Data&#8217;s large presence in Fox subsidiary MySpace. I just ran across a &#8220;review&#8221; of Greenplum by FIM&#8217;s Brian Dolan, neatly summarizing his views about Greenplum&#8217;s strengths, weaknesses, and uses inside Fox.  [...]]]></description>
			<content:encoded><![CDATA[<p>Greenplum&#8217;s most important reference is probably its energetic advocate <a href="http://www.dbms2.com/2009/03/05/fox-interactive-medias-multi-hundred-terabyte-database-running-on-greenplum/">Fox Interactive Media</a>, even ahead of <a href="http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/">much larger Greenplum user eBay</a>, and notwithstanding <a href="http://www.dbms2.com/2009/03/05/myspaces-multi-hundred-terabyte-database-running-on-aster-data/">Aster Data&#8217;s large presence in Fox subsidiary MySpace</a>. I just ran across a <a href="http://www.information-management.com/issues/2007_58/greenplum_database-10015307-1.html">&#8220;review&#8221; of Greenplum by FIM&#8217;s Brian Dolan</a>, neatly summarizing his views about Greenplum&#8217;s strengths, weaknesses, and uses inside Fox.  Highlights include:<span id="more-803"></span></p>
<blockquote><p><strong>DELIVERABLES:</strong> Our research analytics team uses Greenplum Database to conduct tens of thousands of real-time [statistical] tests against millions of users every day, analyzing each visitor’s reaction to ads against more than 2,000 variables. This analysis is turned into reports for the BI, database administrators, systems engineering and product teams.</p></blockquote>
<blockquote><p><strong>VENDOR SUPPORT:</strong> We’ve had nothing but positive interactions with the Greenplum team, from first sales call through to implementation and support. The technical team has been able to answer all of our questions, and we have the utmost respect for the engineering minds behind Greenplum Database.</p></blockquote>
<blockquote><p><strong>WEAKNESSES:</strong> The ability to prioritize and balance queries at run time is a weakness. This would allow more effective sharing of resources between power users and reporting tasks. Second, the query optimizer is still immature.</p></blockquote>
<p>Also, when Greenplum briefed me on its <a href="http://www.dbms2.com/2009/06/08/the-future-of-data-marts/">Enterprise Data Cloud</a> vision, it gave me a couple of details about what Fox was doing in that area.  For starters, Brian Dolan has a team of analysts, single-digit in number, each with his/her private Greenplum sandbox, just like the vision says they should have.  Second, it so happens that Fox has some customer/partner-facing data marts fed by the Greenplum database, but those are run on PostgreSQL rather than Greenplum.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/06/08/more-on-fox-interactive-medias-use-of-greenplum/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>More on Greenplum, Fox/MySpace, and load speeds</title>
		<link>http://www.dbms2.com/2009/03/20/more-on-greenplum-foxmyspace-and-load-speeds/</link>
		<comments>http://www.dbms2.com/2009/03/20/more-on-greenplum-foxmyspace-and-load-speeds/#comments</comments>
		<pubDate>Fri, 20 Mar 2009 16:40:40 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Fox and MySpace]]></category>
		<category><![CDATA[Greenplum]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=726</guid>
		<description><![CDATA[Eric Lai offers more facts, figures, explanation, and competitive insight than I did on Greenplum&#8217;s loading of the Fox/MySpace database, including that Greenplum is being loaded with data at the 4 TB/hour rate only for half an hour at a time. Also, Eric cites the Greenplum Fox Interactive Media database as being only 200 TB [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.networkworld.com/news/2009/031909-upstarts-speed-past-bi-vendors.html?page=1">Eric Lai</a> offers more facts, figures, explanation, and competitive insight than <a href="http://www.dbms2.com/2009/03/20/greenplum-claims-very-fast-load-speeds-and-fox-still-throws-away-most-of-its-myspace-data/">I did</a> on Greenplum&#8217;s loading of the Fox/MySpace database, including that Greenplum is being loaded with data at the 4 TB/hour rate only for half an hour at a time.</p>
<p>Also, Eric cites the Greenplum Fox Interactive Media database as being only 200 TB in size.  Surely there is some confusion somewhere, since Greenplum described it as being <a href="http://www.dbms2.com/2008/08/25/greenplums-single-biggest-customer/">400 TB</a> back in August.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/03/20/more-on-greenplum-foxmyspace-and-load-speeds/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Greenplum claims very fast load speeds, and Fox still throws away most of its MySpace data</title>
		<link>http://www.dbms2.com/2009/03/20/greenplum-claims-very-fast-load-speeds-and-fox-still-throws-away-most-of-its-myspace-data/</link>
		<comments>http://www.dbms2.com/2009/03/20/greenplum-claims-very-fast-load-speeds-and-fox-still-throws-away-most-of-its-myspace-data/#comments</comments>
		<pubDate>Fri, 20 Mar 2009 09:10:47 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Fox and MySpace]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=723</guid>
		<description><![CDATA[Data warehouse load speeds are a contentious issue.  Vertica contrived a benchmark with a 5 1/2 terabyte/hour load rate.  Oracle has gotten dinged for very low load speeds, which then are hotly debated.  I was told recently of a Greenplum partner&#8217;s salesman steering a prospect who needed rapid load speeds away from Greenplum, which seemed [...]]]></description>
			<content:encoded><![CDATA[<p>Data warehouse load speeds are a contentious issue.  Vertica contrived a benchmark with a <a href="http://www.dbms2.com/2008/12/02/data-warehouse-load-speeds-in-the-spotlight/">5 1/2 terabyte/hour load rate</a>.  Oracle has gotten dinged for <a href="http://www.dbms2.com/2008/09/17/more-mysteries-regarding-oracle-cdr-load-speed/">very low load speeds</a>, which then are hotly debated.  I was told recently of a Greenplum partner&#8217;s salesman steering a prospect who needed rapid load speeds away from Greenplum, which seemed odd to me.</p>
<p>Now Greenplum has come out swinging, claiming <a href="http://www.greenplum.com/news/181/231/Product-Perspective-Greenplum-Reinvents-Data-Loading/d,blog/">&#8220;consistent&#8221; load speeds of <strong>4 terabytes/hour</strong></a> at <a href="http://www.dbms2.com/2009/03/05/fox-interactive-medias-multi-hundred-terabyte-database-running-on-greenplum/">its Fox Interactive Media account</a>, and armed with a customer quote saying just that.  Note, however, that load speeds tend to be proportional to the number of disks, and there are a LOT of disks at that installation.</p>
<p>One way to think about load speeds is &#8212; how long would it take to load the entire database? It seems as if the Fox database could be loaded, perhaps not in one week, but certainly in less than two. Flipping that around, <strong>the Fox site only has enough capacity to hold less than 2 weeks of detailed data.</strong> (This is not uncommon in network event kinds of databases.) And a corollary of that is &#8212; <strong>worldwide storage sales are still constrained by cost, not by absolute limits on the amounts of data enterprises would like to store.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/03/20/greenplum-claims-very-fast-load-speeds-and-fox-still-throws-away-most-of-its-myspace-data/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Three Greenplum customers&#8217; applications of MapReduce</title>
		<link>http://www.dbms2.com/2009/03/07/three-greenplum-customers-applications-of-mapreduce/</link>
		<comments>http://www.dbms2.com/2009/03/07/three-greenplum-customers-applications-of-mapreduce/#comments</comments>
		<pubDate>Sat, 07 Mar 2009 07:54:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Fox and MySpace]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=718</guid>
		<description><![CDATA[Greenplum (and Truviso) advisor Joseph Hellerstein offers a few examples of MapReduce applications (specifically Greenplum MapReduce), namely: The big aha moment occured for me during our panel discussion, which included Luke Lonergan from Greenplum, Roger Magoulas from O’Reilly, and Brian Dolan from Fox Interactive Media (which runs MySpace among other web properties). Roger talked about [...]]]></description>
			<content:encoded><![CDATA[<p>Greenplum (and Truviso) advisor Joseph Hellerstein offers <a href="http://databeta.wordpress.com/2009/02/06/inflection-point/">a few examples of MapReduce applications</a> (specifically Greenplum MapReduce), namely:</p>
<blockquote><p>The big aha moment occured for me during our panel discussion, which included Luke Lonergan from Greenplum, Roger Magoulas from O’Reilly, and Brian Dolan from Fox Interactive Media (which runs MySpace among other web properties).</p>
<p>Roger talked about using MapReduce to extract structured entities from text for doing tech trend analyses from billions of rows of online job postings.  Brian (who is a mathematician by training) was talking about implementing <a title="Conjugate Gradiant on Wikipedia" href="http://en.wikipedia.org/wiki/Conjugate_gradient_method">conjugate gradiant</a> and <a title="SVMs on Wikipedia" href="http://en.wikipedia.org/wiki/Support_vector_machine">Support Vector Machines</a> in parallel SQL to support “hypertargeting” for advertisers.  I mentioned how Jonathan Goldman at LinkedIn was using SQL and MapReduce to do graph algorithms for social network analysis.</p></blockquote>
<p>Incidentally: While it&#8217;s been some months since I asked, my sense is that the O&#8217;Reilly text extraction is home-grown, and primitive compared to what one could do via <a href="http://www.texttechnologies.com/category/text-mining/">commercial products</a>. That said, if the specific application is examining job postings, I&#8217;m not sure how much value more sophisticated products would add. After all, tech job listings are generally written in a style explicitly designed to ensure that most or all of their meaning is conveyed simply by a bag of keywords. And by the way, this effort has been underway for <a href="http://www.texttechnologies.com/2006/08/12/text-mining-into-big-data-warehouses/">quite some time</a>.</p>
<p><em><strong>Related link</strong></em></p>
<ul>
<li>Greenplum has a <a href="http://www.greenplum.com/customers/o-reilly/">page</a> on the O&#8217;Reilly relationship.  However, the part that isn&#8217;t behind a registration barrier is trivial &#8212; and I wouldn&#8217;t know one way or the other about the registration-required part.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/03/07/three-greenplum-customers-applications-of-mapreduce/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Fox Interactive Media&#8217;s multi-hundred terabyte database running on Greenplum</title>
		<link>http://www.dbms2.com/2009/03/05/fox-interactive-medias-multi-hundred-terabyte-database-running-on-greenplum/</link>
		<comments>http://www.dbms2.com/2009/03/05/fox-interactive-medias-multi-hundred-terabyte-database-running-on-greenplum/#comments</comments>
		<pubDate>Thu, 05 Mar 2009 13:05:08 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Fox and MySpace]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=714</guid>
		<description><![CDATA[Greenplum&#8217;s largest named account is Fox Interactive Media &#8212; the parent organization of MySpace &#8212; which has a multi-hundred terabyte database that it uses for hardcore data mining/analytics. Greenplum has been engaging in regrettable business practices, claiming that it is in the process of supplanting Aster Data at Fox/MySpace. In fact, MySpace&#8217;s use of Aster [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Greenplum&#8217;s largest named account is Fox Interactive Media &#8212; the parent organization of MySpace &#8212; which has a multi-hundred terabyte database that it uses for hardcore data mining/analytics.  Greenplum has been engaging in regrettable business practices, claiming that it is in the process of supplanting Aster Data at Fox/MySpace. In fact, <a href="http://www.dbms2.com/2009/03/05/myspaces-multi-hundred-terabyte-database-running-on-aster-data/">MySpace&#8217;s use of Aster</a> is more mission-critical than Fox&#8217;s use of Greenplum, and is increasing significantly.</p>
<p style="margin-bottom: 0in;">Still, as <a href="http://www.greenplum.com/customers/fox-interactive-media/">Greenplum&#8217;s gushing customer video</a> with Fox Interactive Media* illustrates, the Fox/Greenplum database is impressive on its own merits. <span id="more-714"></span>Indeed, the Fox/Greenplum database seems to be larger than the one MySpace runs on Aster, even though it seems to be used less intensively.  Perhaps data is thrown out from the Aster database but never from the Greenplum one, or perhaps the Greenplum database just ingests more data in the first place.</p>
<p style="margin-bottom: 0in;"><em>*The Greenplum video features completely different customer personnel than Aster&#8217;s.  I&#8217;m sure of this because all the Fox people on the Greenplum video are male, while all the MySpace techies on Aster&#8217;s video are female.</em></p>
<p style="margin-bottom: 0in;">In particular, Fox Interactive uses a database architectural idea I&#8217;m hearing ever more about.  Within the data warehouse, it creates data mart &#8220;sandboxes&#8221; that analysts can use however they wish. They can even create and delete tables without screwing up the underlying data. eBay famously uses a similar approach in its Teradata installation, and I believe Dell&#8217;s DATAllegro data warehouse is set up that way as well.</p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2009/03/05/myspaces-multi-hundred-terabyte-database-running-on-aster-data/">MySpace&#8217;s multi-hundred terabyte database running on Aster Data</a></li>
<li><a href="http://www.dbms2.com/2009/03/02/named-customer-silliness/">Greenplum&#8217;s biggest customers</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/03/05/fox-interactive-medias-multi-hundred-terabyte-database-running-on-greenplum/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

