<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS2 -- DataBase Management System Services &#187; RDF and graphs</title>
	<atom:link href="http://www.dbms2.com/category/datatype/rdf-graph-database/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Tue, 09 Mar 2010 18:18:02 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Aster Data nCluster 4.5</title>
		<link>http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/</link>
		<comments>http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 08:20:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[SAS Institute]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1617</guid>
		<description><![CDATA[Like Vertica, Netezza, and Teradata, Aster is using this week to pre-announce a forthcoming product release, Aster Data nCluster 4.5. Aster is really hanging its identity on “Big Data Analytics” or some variant of that concept, and so the two major named parts of Aster nCluster 4.5 are:

Aster Data Analytic Foundation, a set of analytic [...]]]></description>
			<content:encoded><![CDATA[<p>Like <a href="http://www.dbms2.com/2010/02/22/vertica-4/" >Vertica</a>, <a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/" >Netezza</a>, and Teradata, Aster is using this week to pre-announce a forthcoming product release, Aster Data nCluster 4.5. Aster is really hanging its identity on “Big Data Analytics” or some variant of that concept, and so the two major named parts of Aster nCluster 4.5 are:</p>
<ul>
<li><strong>Aster Data Analytic Foundation,</strong> a set of analytic packages prebuilt in <a href="../2009/06/09/aster-data-nclustersql-mapreduce/">Aster&#8217;s SQL-MapReduce</a></li>
<li><strong>Aster Data Developer Express,</strong> an Eclipse-based IDE (Integrated Development Environment) for developing and testing applications built on Aster nCluster, Aster SQL-MapReduce, and Aster Data Analytic Foundation</li>
</ul>
<p>And in other Aster news:</p>
<ul>
<li>Along with the development GUI in Aster nCluster 4.5, there is also a new administrative GUI.</li>
<li>Aster has certified that nCluster works with Fusion I/O boards, because at least one retail industry prospect cares. However, that in no way means that arm&#8217;s-length Fusion I/O certification is Aster&#8217;s ultimate <a href="../2010/01/31/flash-pcmsolid-state-memory-disk/">solid-state memory</a> strategy.</li>
<li>I had the wrong impression about how far Aster/SAS integration has gotten. So far, it&#8217;s just at the connector level.</li>
</ul>
<p>Aster Data Developer Express evidently does some cool stuff, like providing some sort of parallelism testing right on your desktop. It also generates lots of stub code, saving humans from the tedium of doing that. Useful, obviously.</p>
<p>But mainly, I want to write about the analytic packages.<span id="more-1617"></span> I&#8217;m not convinced that they&#8217;re a big deal in themselves yet, or that a whole lot of person-months have gone into their combined development. Still, I think they provide a great indication of one direction in which analytic functionality is going. And by the way, Aster promises to release a lot more of that kind of thing over the next 12 months.</p>
<p>Aster&#8217;s flagship analytic package is <a href="../2009/02/10/aster-data-npath/">nPath</a>, which is like a <strong>regular expression matcher,</strong> but <strong>for (time) series of data</strong> rather than for character strings. The main use for nPath is in pulling specific kinds of event sequences out of web or network event logs. However, one could imagine uses in other sectors that focus on temporal or sequential data (e.g., trading, intelligence, other sensor analysis), should existing SQL- and/or CEP-based technologies not prove sufficiently flexible. Aster 4.5 adds some new aggregation capabilities around nPath.</p>
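<p>One rough way to picture the regular-expression analogy, as a Python sketch and not Aster&#8217;s actual nPath syntax: map each event in a time-ordered series to a one-character symbol, then run an ordinary regular expression over the resulting string. The event names, the symbol mapping, and the function <code>find_sequences</code> are all hypothetical.</p>

```python
import re

# Hypothetical mapping from event type to a one-character symbol.
SYMBOLS = {"home": "H", "search": "S", "product": "P", "checkout": "C"}

def find_sequences(events, pattern):
    """Return (start, end) index pairs of event runs matching the pattern.

    events is a list of (timestamp, event_type) tuples; it is sorted by
    timestamp here so the pattern is applied in time order.
    """
    ordered = sorted(events)
    encoded = "".join(SYMBOLS[event_type] for _, event_type in ordered)
    return [(m.start(), m.end()) for m in re.finditer(pattern, encoded)]

clicks = [
    (1, "home"), (2, "search"), (3, "search"), (4, "product"),
    (5, "checkout"), (6, "home"), (7, "product"),
]
# One or more searches, then a product view, then a checkout.
matches = find_sequences(clicks, r"S+PC")
```

<p>Here the only match covers events 1 through 4 of the ordered series (two searches, a product view, a checkout). A real event log would need a richer encoding than one character per event, which is presumably part of what a dedicated function buys you.</p>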
<p>Other not-wholly-new packages in the Aster Data Analytic Foundation announcement are for <strong>sessionization</strong> (of clickstream data and the like) and <strong>tokenization </strong>(of text/character string data). While sessionization can be done in SQL, Aster thinks its MapReduce-based version is faster, since it doesn&#8217;t require self-joins. Makes sense. Aster&#8217;s tokenization sounds lame, however – text analytics in MapReduce tends to reinvent simplistic wheels for no clear reason, and Aster doesn&#8217;t seem to be an exception. (Aster would argue, however, that anything it does in SQL-MapReduce is more flexible than pure SQL or pure MapReduce alternatives.)</p>
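<p>To illustrate why sessionization need not involve self-joins, here is a minimal Python sketch of the single-ordered-pass idea. It is not Aster&#8217;s implementation; the 30-minute timeout, the row layout, and the name <code>sessionize</code> are assumptions made for illustration.</p>

```python
from itertools import groupby
from operator import itemgetter

def sessionize(clicks, timeout=1800):
    """Assign a session number to each (user, timestamp) click.

    A new session starts whenever the gap since the same user's previous
    click exceeds `timeout` seconds (an assumed 30-minute default).
    """
    result = []
    ordered = sorted(clicks, key=itemgetter(0, 1))
    for user, user_clicks in groupby(ordered, key=itemgetter(0)):
        session = 0
        previous = None
        for _, ts in user_clicks:
            if previous is not None and ts - previous > timeout:
                session += 1
            result.append((user, ts, session))
            previous = ts
    return result

clicks = [("a", 0), ("a", 100), ("a", 5000), ("b", 50), ("a", 5100)]
sessions = sessionize(clicks)
```

<p>Each user&#8217;s clicks are visited once, in time order, and each click is compared only with the one before it. Partitioning by user is also what makes this kind of logic easy to spread across nodes.</p>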
<p>Another example of better-living-without-self-joins is Aster&#8217;s new <strong>market basket</strong> package. This lets you look at a set of point-of-sale data, pick a small integer N, and pull out all the sets of N things that were bought by the same person at the same time. I haven&#8217;t probed the claim in detail, but Aster implies there&#8217;s less combinatorial explosion in its approach than there is in the self-join alternative.</p>
<p><em>Note: Gartner highlighted self joins as a performance challenge in its recent </em><a href="../2010/02/10/gartner-magic-quadrant-data-warehouse-2009-2010/">Data Warehouse Magic Quadrant</a><em>.</em></p>
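<p>For concreteness, the market basket computation can be sketched in Python by grouping rows by basket first and only then enumerating N-item sets within each basket. This is an illustration of the general idea, not Aster&#8217;s code; the row layout and the name <code>item_sets</code> are assumed.</p>

```python
from collections import Counter, defaultdict
from itertools import combinations

def item_sets(sales, n):
    """Count every set of n distinct items bought in the same basket.

    sales is an iterable of (basket_id, item) rows, as in point-of-sale data.
    """
    baskets = defaultdict(set)
    for basket_id, item in sales:
        baskets[basket_id].add(item)
    counts = Counter()
    for items in baskets.values():
        # Sorting gives each set a single canonical tuple form.
        counts.update(combinations(sorted(items), n))
    return counts

sales = [
    (1, "bread"), (1, "milk"), (1, "eggs"),
    (2, "bread"), (2, "milk"),
    (3, "milk"),
]
pairs = item_sets(sales, 2)
```

<p>Because combinations are generated only inside a single basket, and each set is produced once rather than once per ordering, the intermediate result stays much smaller than an N-way self-join of the sales table on basket ID.</p>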
<p>Aster is also releasing a few <strong>statistical and general analytic functions</strong> &#8212; specifically (and I quote a slide):</p>
<ul>
<li>exponential moving average</li>
<li>weighted moving average</li>
<li>simple moving average</li>
<li>volume-weighted average price</li>
<li>correlation</li>
<li>linear regression</li>
<li>logistic regression</li>
<li>approximate_percentile</li>
<li>approximate_count_distinct</li>
</ul>
<p>The point of the last two items on the list is that if you set a non-zero tolerance for error, you can count things or order them into bins very efficiently, especially in terms of RAM, while being guaranteed not to exceed your error tolerance.</p>
<p><em>Note: One obvious inference from this list &#8212; which Aster gladly confirms &#8212; is that Aster has high hopes of selling to the financial services industry. </em></p>
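<p>For readers who don&#8217;t live in financial services, the moving averages and volume-weighted average price on that list are simple to state. The Python below is a sketch using the textbook definitions; the function names are mine, not Aster&#8217;s, and seeding the exponential average with the first value is just one common convention.</p>

```python
def simple_moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def exponential_moving_average(values, alpha):
    """EMA seeded with the first value; alpha is the smoothing factor in (0, 1]."""
    ema = [values[0]]
    for value in values[1:]:
        ema.append(alpha * value + (1 - alpha) * ema[-1])
    return ema

def vwap(trades):
    """Volume-weighted average price over (price, volume) pairs."""
    total_volume = sum(volume for _, volume in trades)
    return sum(price * volume for price, volume in trades) / total_volume

prices = [10.0, 12.0, 14.0, 16.0]
sma = simple_moving_average(prices, 2)
ema = exponential_moving_average(prices, 0.5)
price = vwap([(10.0, 100), (20.0, 300)])
```

<p>All three are ordered, running computations over a series, which is awkward to express in plain SQL without window functions.</p>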
<p>Finally, Aster is releasing its first pure <strong>graph-analytic</strong> function, for finding the shortest path between a given pair of nodes.</p>
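<p>I don&#8217;t know which algorithm Aster uses. For an unweighted graph, the standard approach is breadth-first search, sketched below in Python with hypothetical names and an in-memory edge list treated as undirected.</p>

```python
from collections import deque

def shortest_path(edges, source, target):
    """Return one shortest path (fewest edges) as a list of nodes, or None."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)  # treat edges as undirected
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("e", "d")]
path = shortest_path(edges, "a", "d")
```

<p>Weighted edges would call for something like Dijkstra&#8217;s algorithm instead, and a graph spread across many nodes makes the frontier expansion considerably harder than this single-machine version suggests.</p>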
<p>While I had the Aster folks on the phone anyway, I also took the opportunity to ask about the Aster nCluster 4.0 capability to create fairly persistent non-relational in-memory data structures. Specifically, I asked whether different users could access the same in-memory structure, and was told that this is a little klugey but not too horrendous. That suggests Aster&#8217;s capability may be a strict superset of UDF-based (User-Defined Function) approaches to meeting the same need, at least from a functionality standpoint. However, ease of creating those in-memory structures may still be better in the more SQL/UDF-centric approach favored by Teradata.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/22/aster-data-ncluster-4-5/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Open issues in database and analytic technology</title>
		<link>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/</link>
		<comments>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 22:04:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1507</guid>
		<description><![CDATA[The last part of my New England Database Summit talk was on open issues in database and analytic technology. This was closely intertwined with the previous section, and also relied on a lot that I&#8217;ve posted here. So I&#8217;ll just put up a few notes on that part, with lots of linkage to prior discussion [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">The last part of my <a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >New England Database Summit</a> talk was on open issues in database and analytic technology. This was closely intertwined with the <a href="http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/" >previous section</a>, and also relied on a lot that I&#8217;ve posted here. So I&#8217;ll just put up a few notes on that part, with lots of linkage to prior discussion of the same points.<span id="more-1507"></span></p>
<ul>
<li>The most important issue in 	database and analytic technology, in my opinion, isn&#8217;t technological 	at all – rather, it&#8217;s the legal and political steps needed to <a href="http://www.dbms2.com/2010/01/31/data-based-snooping-threat-libert/" > preserve liberty</a> in the face of advancing, intrusive 	technology.</li>
<li>Another important issue for 	society – and this one does involve a lot of technology – is 	scientific number crunching. In particular, <a href="http://www.dbms2.com/2009/10/03/issues-in-scientific-data-management/" >database technology for 	scientific computing</a> needs to be developed much further. I&#8217;ll have 	more to say on all this soon.</li>
<li>More generally, technology needs 	to keep advancing for parallel analytics. Fortunately, it is. Watch 	this space over the next few weeks.</li>
<li>Oracle has said, in effect, that <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" > its most important technological challenge of the decade</a> is getting 	<a href="http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/" >solid-state memory</a> right. I agree.</li>
<li>Data volumes will keep going up, 	up, up. Technology needs to keep evolving accordingly. Much of what 	I write is on that subject.</li>
<li>Data needs to be processed and analyzed at <a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >very 	different latencies</a>. And there&#8217;s much further to go in integrating 	disparate latencies.</li>
<li>Analytic database management in 	the cloud hasn&#8217;t been solved yet, especially for Big Data. Among the 	reasons are the difficulty of moving data into the cloud (unless it 	originated there), the slowness of moving it from node to node in 	shared-nothing architectures (which reduces the elasticity benefit), 	and above all the long and unpredictable latencies of interprocessor 	communication while queries are running (a key subject of discussion 	at the <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >Boston Big Data Summit</a>).</li>
<li>Better business intelligence user interfaces are increasingly available. I&#8217;m thinking particularly of approaches with buzzwords like <a href="http://www.dbms2.com/2008/08/04/qliktech-qlikview-update/" >visualization/interactive exploration</a> or <a href="http://www.texttechnologies.com/2007/08/03/the-case-for-inxight-awareness-server/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">faceted</a>. But they aren&#8217;t well integrated into the overall analytic stack, as big BI vendors are trailing the smaller ones in this regard. (Part of the problem relates to my previous point.)</li>
<li>Application development over text 	search isn&#8217;t in the same league as application development over 	relational DBMS. The choices are mainly XML (e.g., <a href="http://www.texttechnologies.com/2008/04/29/mark-logic-viewed-as-a-different-kind-of-text-search-technology-vendor/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">MarkLogic</a>), SQL 	for text integrated into RDBMS (limited by the weakness of those 	integrations), and something like <a href="http://www.texttechnologies.com/2008/09/20/attivio-update/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">Attivio&#8217;s Java SDK</a>. There&#8217;s a 	major conceptual barrier in building those apps, namely the 	unpredictability of query results. Still, it should be possible to 	do better.</li>
<li>Similarly, text analytics and 	conventional analytics exist well side by side. They can even be in 	the same database and/or dashboard, although in practice that is 	limited by the strong <a href="http://www.texttechnologies.com/2008/10/24/attensity-update-2/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">SaaS focus of text mining vendors and users</a>. But analytic 	integration of them is really hard. Linguistic imprecision is, in my 	opinion, only the #2 reason for this difficulty. The #1 reason is 	that trends detected by text analytics are much less precise than 	trends on tabular data – e.g., a 50% increase in a certain kind of 	complaint may be no more significant than a 5% change in a revenue 	variable.</li>
<li>I&#8217;m increasingly persuaded that <a href="http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/" > graph analytics</a> can be handled without a graph-centric data model. 	But right now, it isn&#8217;t being handled well at all. Lots more needs 	to be done – although when it is, it will just exacerbate the 	privacy/liberty dangers that so concern me.</li>
</ul>
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Flash, other solid-state memory, and disk" href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, other solid-state memory, and disk</a></li>
<li><a title="Interesting trends in database and analytic technology" href="../2010/01/31/trends-database-aanalytic-technology/">Interesting trends in database and analytic technology</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Webinar on MapReduce for complex analytics (Thursday, December 3, 10 am and 2 pm Eastern)</title>
		<link>http://www.dbms2.com/2009/12/02/mapreduce-for-complex-analytics-webina/</link>
		<comments>http://www.dbms2.com/2009/12/02/mapreduce-for-complex-analytics-webina/#comments</comments>
		<pubDate>Wed, 02 Dec 2009 20:57:50 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1267</guid>
		<description><![CDATA[The second in my two-webinar series for Aster Data will occur tomorrow, twice (both live), at 10 am and 2 pm Eastern time. The other presenters will be Jonathan Goldman, who was Principal Scientist at LinkedIn but now has joined Aster himself, and Steve Wooledge of Aster (playing host). Key links are:

Registration for tomorrow&#8217;s webinars
Replay [...]]]></description>
			<content:encoded><![CDATA[<p>The second in my two-webinar series for Aster Data will occur tomorrow, twice (both live), at 10 am and 2 pm Eastern time. The other presenters will be Jonathan Goldman, who was Principal Scientist at LinkedIn but now has joined Aster himself, and Steve Wooledge of Aster (playing host). Key links are:</p>
<ul>
<li>Registration for <a href="http://www.asterdata.com/wc_091203_masteringmapreduce/" onclick="javascript:pageTracker._trackPageview('/www.asterdata.com');">tomorrow&#8217;s webinars</a></li>
<li>Replay of the <a href="http://www.asterdata.com/masteringmapreduce2/" onclick="javascript:pageTracker._trackPageview('/www.asterdata.com');"> first webinar</a></li>
<li>My slides from the <a href="http://www.dbms2.com/2009/10/15/mapreduce-webinar-slides/" >first webinar</a></li>
</ul>
<p>The main subjects of the webinar will be:</p>
<ul>
<li>Some review of material from the first webinar (all three presenters)</li>
<li>Discussion of how MapReduce can help with three kinds of analytics:
<ul>
<li>Pattern matching (Jonathan will give detail)</li>
<li>Number-crunching (I&#8217;ll cover that, and it will be short)</li>
<li>Graph analytics (I haven&#8217;t written the slides yet, but my starting point will be some of the <a href="http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/" >relationship analytics</a> ideas we discussed in August)</li>
</ul>
</li>
</ul>
<p>Arguably, aspects of data transformation fit into each of those three categories, which may help explain why data transformation has been so prominent among the early applications of MapReduce.</p>
<p>As you can see from Aster&#8217;s title for the webinar (which they picked while I was on vacation), at least their portion will be focused on customer analytics, e.g. web analytics.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/12/02/mapreduce-for-complex-analytics-webina/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Social network analysis, aka relationship analytics</title>
		<link>http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/</link>
		<comments>http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/#comments</comments>
		<pubDate>Fri, 21 Aug 2009 11:10:46 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cogito and 7 Degrees]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=867</guid>
		<description><![CDATA[A number of applications lend themselves to graph-oriented analytics, including:

Finding bad guys (national 	intelligence)
Finding bad guys (anti-fraud)
Data mining the social graph 	(e.g., for advertising optimization on social networks, or to 	identify influencers)

There are plenty more graph-oriented applications, of course, such as the identification of biochemical pathways. But I want to focus for now on ones [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">A number of applications lend themselves to graph-oriented analytics, including:</p>
<ul>
<li>Finding bad guys (national 	intelligence)</li>
<li>Finding bad guys (anti-fraud)</li>
<li>Data mining the social graph 	(e.g., for advertising optimization on social networks, or to 	identify influencers)</li>
</ul>
<p style="margin-bottom: 0in;">There are plenty more graph-oriented applications, of course, such as the identification of biochemical pathways. But I want to focus for now on ones like those on my list.  My key points are:</p>
<ul>
<li><strong>There are Big Data problems that 	lend themselves to graphical data models.</strong></li>
<li>So far as I can tell,<strong> the database 	management community isn&#8217;t doing enough to address them.</strong> (If I&#8217;m 	wrong about that, please tell me. I plan to arrive in Lyon for 	VLDB/XLDB Wednesday of next week, and of course I can always be 	reached by email.)</li>
</ul>
<p style="margin-bottom: 0in;">Here&#8217;s what I mean.<span id="more-867"></span></p>
<p style="margin-bottom: 0in;">Applications that analyze relationship graphs are commonly grouped under the name <em>social network analysis. </em><span style="font-style: normal;">As <a href="http://www.strategicmessaging.com/monashs-first-law-of-commercial-semantics-explained/2009/01/09/" onclick="javascript:pageTracker._trackPageview('/www.strategicmessaging.com');">I frequently point out</a>, category names and definitions tend to be imperfect, and that one is no exception. In particular &#8212; and the Wikipedia article on <a href="http://en.wikipedia.org/wiki/Social_network" onclick="javascript:pageTracker._trackPageview('/en.wikipedia.org');">social networks and social network analysis</a> is an excellent example of this – the term tends to be construed to cover the linkages between people or organizations, but not between, say, physical addresses, email addresses, and all the other stuff those intelligence applications actually track.  I tried to introduce the term <a href="http://www.monash.com/CogitoBulletin.pdf" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">relationship analytics</a> a while back, but it unfortunately didn&#8217;t stick. </span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">I only ever got familiar with one company that tried to do a true graph-oriented database management system, suitable for social network analysis/relationship analytics.  It was called Cogito, and had some <a href="../2006/05/22/introduction-to-cogito/">interesting ideas about graphical data structures</a>. Unfortunately, Cogito didn&#8217;t stick either.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">As per the “Metrics” section of the Wikipedia article linked above, there are a number of well-established metrics about the relationships of pairs or groups of nodes to each other.  The usual way to calculate these metrics is to load the graph into memory and get to work.  (Indeed, such uses seem to be driving a lot of <a href="../2009/04/15/cloudera-presents-the-mapreduce-bull-case/">the national intelligence adoption of Hadoop</a>.) And while I&#8217;m perfectly willing to believe that <a href="../2007/06/15/fast-rdf-in-specialty-relational-databases/">relational database management systems can do a fine job of managing generic RDF</a>, it&#8217;s less obvious that they&#8217;re well-suited to support standard graph-analysis metric computations.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">The reason, in a nutshell, is that the relational approaches usually boil down to maintaining a table with a row for every node-edge-node triple, and then doing a lot of fast self-joins to identify paths.  That can work if connectivity is low and paths are sparse. But for higher degrees of connectivity, such strategies can lead – BOOM! &#8212; to serious combinatorial explosion.  And that&#8217;s not good, because a lot of this analysis focuses on finding exactly the parts of the graph where the connections run thickest.</span></p>
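<p style="margin-bottom: 0in;">A toy Python sketch makes the explosion concrete. It mimics the edge-table self-join by repeatedly extending every known path with every edge that starts where the path ends, then compares a sparse chain with a densely connected graph. The names and the tiny graphs are made up for illustration, and like a plain self-join it counts paths that revisit nodes.</p>

```python
def extend_paths(paths, edges):
    """One self-join step: append every edge that starts where a path ends."""
    by_source = {}
    for src, dst in edges:
        by_source.setdefault(src, []).append(dst)
    return [
        path + (dst,)
        for path in paths
        for dst in by_source.get(path[-1], [])
    ]

def count_paths(edges, hops):
    """Number of rows after joining the edge table with itself hops - 1 times."""
    paths = [tuple(edge) for edge in edges]
    for _ in range(hops - 1):
        paths = extend_paths(paths, edges)
    return len(paths)

# Sparse graph: a chain of 6 nodes (5 directed edges).
chain = [(i, i + 1) for i in range(5)]
# Dense graph: each of 6 nodes points at every other node (30 directed edges).
clique = [(i, j) for i in range(6) for j in range(6) if i != j]

sparse_rows = count_paths(chain, 3)
dense_rows = count_paths(clique, 3)
```

<p style="margin-bottom: 0in;">With only six nodes, three hops yield 3 rows for the chain and 750 for the fully connected graph; each extra hop multiplies the dense case by another factor of five.</p>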
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Cogito&#8217;s idea was to say “What if, for every node, you could retrieve in only a few blocks all the paths leading from it, at least up to pathlength N?” Unfortunately, Cogito&#8217;s approach to creating this effect had too little to do with optimizer development or selectively redundant data storage, and too much to do with wishful thinking; not coincidentally, <a href="http://www.cogitoinc.com/index.html" onclick="javascript:pageTracker._trackPageview('/www.cogitoinc.com');">Cogito</a> is no longer around. (I haven&#8217;t kept in touch with Cogito&#8217;s successor <a href="http://www.7-degrees.com/index.html" onclick="javascript:pageTracker._trackPageview('/www.7-degrees.com');">7 Degrees</a>, and the reason hasn&#8217;t been lack of effort or interest on my part.)</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">But suppose the idea had worked.  Then – unlike today – it might be realistic to do on-the-fly analytics on Very Large Graphs, just as we do operational business intelligence of a more relational or MOLAP nature. That would be cool.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">How cool would it be? Well, that&#8217;s a bit hard to say. Look again at the list of applications I put up top. Those are NOT ones people generally talk a lot about. Spooks and fraud-fighters are two very secretive kinds of folks. And, for a variety of reasons, the owners of the largest websites also are reluctant to publicize details of how they do or don&#8217;t profile individual users in vivid detail. And then there&#8217;s also the question of whether we even want to help improve technology whose main use is to improve the precision with which computers track individuals – but I don&#8217;t think that&#8217;s the front on which the privacy wars are best fought.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">But if I were a computer science researcher right now, graph databases – optimized to support graph-analytic metrics &#8212; are one of the areas I&#8217;d look at to see if I could make an impact.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Cloudera presents the MapReduce bull case</title>
		<link>http://www.dbms2.com/2009/04/15/cloudera-presents-the-mapreduce-bull-case/</link>
		<comments>http://www.dbms2.com/2009/04/15/cloudera-presents-the-mapreduce-bull-case/#comments</comments>
		<pubDate>Wed, 15 Apr 2009 06:22:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Facebook and Cassandra]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=751</guid>
		<description><![CDATA[Monday was fire-drill day regarding MapReduce vs. MPP relational DBMS.  The upshot was that I was quoted in Computerworld and paraphrased in GigaOm as being a little more negative on MapReduce than I really am, in line with my comment

Frankly, my views on MapReduce are more balanced than [my] weary negativity would seem to [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Monday was <a href="../2009/04/14/there-always-seems-to-be-a-fire-drill-around-mapreduce-news/">fire-drill</a> day regarding MapReduce vs. MPP relational DBMS.  The upshot was that I was quoted in <a href="http://www.computerworld.com/action/article.do?command=viewArticleBasic&amp;articleId=9131526" onclick="javascript:pageTracker._trackPageview('/www.computerworld.com');"><em>Computerworld</em></a> and paraphrased in <a href="http://gigaom.com/2009/04/14/mapreduce-vs-sql-its-not-one-or-the-other/" onclick="javascript:pageTracker._trackPageview('/gigaom.com');"><em>GigaOm</em></a> as being a little more negative on MapReduce than I really am, in line with my comment</p>
<blockquote>
<p style="margin-bottom: 0in;">Frankly, my views on MapReduce are more balanced than [my] weary negativity would seem to imply.</p>
</blockquote>
<p style="margin-bottom: 0in;">Tuesday afternoon the dial turned a couple notches more positive yet, when I talked with Michael Olson and Jeff Hammerbacher of Cloudera. Cloudera is a new company, built around the open source MapReduce implementation Hadoop. So far Cloudera gives away its Hadoop distribution, without charging for any sort of maintenance or subscription, and just gets revenue from professional services.  Presumably, Cloudera plans for this business model to change down the road.</p>
<p style="margin-bottom: 0in;">Much of our discussion revolved around Facebook, where Jeff directed a huge and diverse Hadoop effort.  Apparently, Hadoop played much of the role of an enterprise data warehouse at Facebook &#8212; at least for clickstream/network data &#8212; including:</p>
<ul>
<li>2 1/2 petabytes of data managed 	via Hadoop</li>
<li><span style="text-decoration: line-through;">10 terabytes/day of data ingested 	via Hadoop </span><em>(Edit: Some of these metrics have been updated in <a href="http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/" >a subsequent post about Facebook</a>.)</em><span style="text-decoration: line-through;"><br />
</span></li>
<li><span style="text-decoration: line-through;">Ad targeting queries run every 15 	minutes in Hadoop</span></li>
<li><span style="text-decoration: line-through;">Dashboard roll-up queries run 	every hour in Hadoop</span></li>
<li>Ad-hoc research/analytic Hadoop 	queries run whenever</li>
<li>Anti-fraud analysis done in Hadoop</li>
<li>Text mining (e.g., of things 	written on people&#8217;s &#8220;walls&#8221;) done in Hadoop</li>
<li>100s or 1000s of simultaneous 	Hadoop queries</li>
<li>JSON-based social network analysis 	in Hadoop</li>
</ul>
<p style="margin-bottom: 0in;">Some Facebook data, however, was put into an Oracle RAC cluster for business intelligence.  And Jeff does concede that query execution is slower in Hadoop than in a relational DBMS.  Hadoop was also used to build the index for Facebook&#8217;s custom text search engine.</p>
<p style="margin-bottom: 0in;">Jeff&#8217;s reasons for liking Hadoop over relational DBMS at Facebook included:<span id="more-751"></span></p>
<ul>
<li><strong>Price.</strong> Hadoop is free.  	MPP relational DBMS generally aren&#8217;t.</li>
<li><strong>Re-purposed data transformation 	logic.</strong> Facebook has lots of code sitting around in, e.g., 	Python to massage various specific kinds of data on its site. This 	code is re-used in Hadoop for ETL/ELT/ELTL/whatever.</li>
<li><strong>Resource management.</strong> Amazingly, Jeff found it easier to build a custom Hadoop resource 	manager to deal with the 100s or 1000s of concurrent queries than to 	rely on the native capabilities of a DBMS.</li>
<li><strong>Schema flexibility.</strong> This is a subject I&#8217;ve been preaching about for years. When people interact with web sites, the best schema to store data from their interactions changes just as quickly as the nature of their possible interactions does. Of course, when you add new features to a website, you can capture anything you like on a glorified entity-attribute-value basis.  (Actually, I guess it would be more like EventDescriptor-SessionIdentifierClue-Timestamp.)  But evolving a relational schema rapidly enough to keep up is hard. Facebook found it easier to evolve its Hadoop-based data massagers instead. (I&#8217;ve usually suggested running with XML or an XML-like approach, but notwithstanding the case of MarkLogic<a href="http://www.marklogic.com/news-and-events/press-releases/mark-logic-and-open-connect-announce-oem-agreement.html" onclick="javascript:pageTracker._trackPageview('/www.marklogic.com');">/OpenConnect</a>, that&#8217;s not usually the way network analytics implementers choose to go.)</li>
</ul>
<p style="margin-bottom: 0in;">More generally, Jeff argues there are tasks better programmed in Hadoop than SQL.  He generally leans that way when data is complex, or when the programmers are high-performance computing types who aren&#8217;t experienced DBMS users anyway. One specific example is graph construction and traversal; there seems to be considerable adoption of Hadoop for graph analysis in the national intelligence sector.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/04/15/cloudera-presents-the-mapreduce-bull-case/feed/</wfw:commentRss>
		<slash:comments>27</slash:comments>
		</item>
		<item>
		<title>Oracle spotlights its datatype support</title>
		<link>http://www.dbms2.com/2008/09/23/oracle-spotlights-its-datatype-support/</link>
		<comments>http://www.dbms2.com/2008/09/23/oracle-spotlights-its-datatype-support/#comments</comments>
		<pubDate>Tue, 23 Sep 2008 06:09:34 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data types]]></category>
		<category><![CDATA[GIS and geospatial]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[RDF and graphs]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=564</guid>
		<description><![CDATA[Oracle put out a flurry of press releases today in conjunction with Oracle OpenWorld.  One, which was simply positioned as a report on some &#8220;mission-critical&#8221; customer apps, caught my eye because all four detailed examples involved nonstandard datatypes:

Two Oracle Spatial
One &#8220;semantic,&#8221; which in Oracle lingo seems to mean &#8212; you guessed it &#8212; RDF
One [...]]]></description>
			<content:encoded><![CDATA[<p>Oracle put out a flurry of press releases today in conjunction with Oracle OpenWorld.  One, which was simply positioned as a report on some <a href="http://www.oracle.com/us/corporate/press/017486_EN?rssid=rss_ocom_pr" onclick="javascript:pageTracker._trackPageview('/www.oracle.com');">&#8220;mission-critical&#8221; customer apps</a>, caught my eye because all four detailed examples involved nonstandard datatypes:</p>
<ul>
<li>Two Oracle Spatial</li>
<li>One &#8220;semantic,&#8221; which in <a href="http://www.oracle.com/technology/tech/semantic_technologies/htdocs/what_oracle_brings.html" onclick="javascript:pageTracker._trackPageview('/www.oracle.com');">Oracle lingo</a> seems to mean &#8212; you guessed it &#8212; RDF</li>
<li>One <a href="http://www.oracle.com/technology/products/intermedia/pdf/11g_collateral/dicom11g_twp.pdf" onclick="javascript:pageTracker._trackPageview('/www.oracle.com');">DICOM</a>, which seems to be <a href="http://medical.nema.org/" onclick="javascript:pageTracker._trackPageview('/medical.nema.org');">a medical imaging datatype</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/09/23/oracle-spotlights-its-datatype-support/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Known applications of MapReduce</title>
		<link>http://www.dbms2.com/2008/08/26/known-applications-of-mapreduce/</link>
		<comments>http://www.dbms2.com/2008/08/26/known-applications-of-mapreduce/#comments</comments>
		<pubDate>Tue, 26 Aug 2008 04:54:17 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Text]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=500</guid>
		<description><![CDATA[Most of the actual MapReduce applications I&#8217;ve heard of fall into a few areas:

Text tokenization, indexing, and 	search
Creation of other kinds of data 	structures (e.g., graphs)
Data mining and machine learning

That covers all MapReduce apps I recall hearing about via commercial companies and users, and also includes most of what&#8217;s in the two big sources I [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Most of the actual MapReduce applications I&#8217;ve heard of fall into a few areas:</p>
<ul>
<li><strong>Text tokenization, indexing, and 	search</strong></li>
<li><strong>Creation of other kinds of data 	structures (e.g., graphs)</strong></li>
<li><strong>Data mining and machine learning</strong></li>
</ul>
<p style="margin-bottom: 0in;">That covers all MapReduce apps I recall hearing about via commercial companies and users, and also includes most of what&#8217;s in the two big sources I found online.  <span id="more-500"></span>To wit:</p>
<p style="margin-bottom: 0in;">1.  In a <a href="http://labs.google.com/papers/mapreduce-osdi04-slides/index-auto-0005.html" onclick="javascript:pageTracker._trackPageview('/labs.google.com');">slide presentation</a>, Google offers the following applications of MapReduce:</p>
<ul>
<li>distributed grep</li>
<li>distributed sort</li>
<li>web link-graph reversal</li>
<li>term-vector per host</li>
<li>web access log stats</li>
<li>inverted index construction</li>
<li>document clustering</li>
<li>machine learning</li>
<li>statistical machine translation</li>
</ul>
<p style="margin-bottom: 0in;">2.  The <a href="http://wiki.apache.org/hadoop/PoweredBy" onclick="javascript:pageTracker._trackPageview('/wiki.apache.org');">Hadoop applications page</a> offers a rich trove of applications.  Excerpts include:</p>
<ul>
<li>Aggregate, store, and analyze data related to in-stream 	viewing behavior of Internet video audiences.</li>
<li>Analytics</li>
<li>Analyze and index textual information</li>
<li>Analyzing similarities of user&#8217;s behavior.</li>
<li>Build scalable machine learning algorithms like canopy 	clustering, k-means and many more to come (naive bayes classifiers, 	others)</li>
<li>Charts calculation and web log analysis</li>
<li>Crawl Blog posts and later process them.</li>
<li>Crawling, processing, serving and log analysis</li>
<li>Data mining and blog crawling</li>
<li>Facial similarity and recognition across large datasets.</li>
<li>Filter and index our listings, removing exact duplicates and 	grouping similar ones.</li>
<li>Filtering and indexing listing, processing log analysis, and 	for recommendation data.</li>
<li>Flexible web search engine software</li>
<li>Gathering world wide DNS data in order to discover content 	distribution networks and configuration issues</li>
<li>Generating web graphs</li>
<li>Image based video copyright protection.</li>
<li>Image content based advertising and auto-tagging for social 	media.</li>
<li>Image processing environment for image-based product 	recommendation system</li>
<li>Image retrieval engine</li>
<li>Large scale image conversions</li>
<li>Latent Semantic Analysis, Collaborative Filtering</li>
<li>Log analysis, data mining and machine learning</li>
<li>Natural Language Search</li>
<li>Open source social search tools.</li>
<li>Parses and indexes mail logs for search</li>
<li>Plot the entire internet</li>
<li>Process apache log, analyzing user&#8217;s action and click flow 	and the links click with any specified page in site and more.</li>
<li>Process clickstream and demographic data in order to create 	web analytic reports.</li>
<li>Process data relating to people on the web</li>
<li>Process documents from a continuous web crawl and distributed 	training of support vector machines</li>
<li>Process whole price data user input with map/reduce.</li>
<li>Produce statistics.</li>
<li>Product search indices</li>
<li>Recommender system for behavioral targeting, plus other 	clickstream analytics</li>
<li>Reduce usage data for internal metrics, for search indexing 	and for recommendation data.</li>
<li>Research for Ad Systems and Web Search</li>
<li>Retrieving and Analyzing Biomedical Knowledge</li>
<li>Run Naive Bayes classifiers in parallel over crawl data to 	discover event information</li>
<li>Search engine for chiropractic information, local 	chiropractors, products and schools</li>
<li>Serve large Lucene indexes</li>
<li>Session analysis and report generation</li>
<li>Source code search engine</li>
<li>Statistical analysis and modeling at scale.</li>
<li>Storage, log analysis, and pattern discovery/analysis.</li>
<li>Store copies of internal log and dimension data sources and 	use it as a source for reporting/analytics and machine learning.</li>
<li>Teaching and general research activities on natural language 	processing and machine learning.</li>
<li>Vertical search engine for trustworthy wine information</li>
</ul>
<p>There also were some research apps and some general processing speed-up apps I found harder to excerpt.</p>
<p><strong><em>Some of our recent links about MapReduce</em></strong></p>
<ul>
<li><a href="http://www.dbms2.com/2008/08/26/why-mapreduce-matters-to-sql-data-warehousing/" >The integration of MapReduce with SQL data warehousing</a></li>
<li><a href="http://www.dbms2.com/2008/08/26/known-applications-of-mapreduce/" >Three major applications of MapReduce</a></li>
<li><a href="http://www.dbms2.com/2008/08/26/three-approaches-to-parallelizing-data-transformation/" >Another application of MapReduce</a></li>
<li><a href="http://www.dbms2.com/2008/08/25/mapreduce-sound-bites/" >Sound bites about MapReduce</a></li>
<li><a href="http://www.dbms2.com/2008/08/25/mapreduce-links/" >Other links about MapReduce</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/08/26/known-applications-of-mapreduce/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Mike Stonebraker&#8217;s DBMS taxonomy</title>
		<link>http://www.dbms2.com/2008/02/16/stonebraker-database-taxonomy/</link>
		<comments>http://www.dbms2.com/2008/02/16/stonebraker-database-taxonomy/#comments</comments>
		<pubDate>Sat, 16 Feb 2008 23:21:26 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data types]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[Mid-range]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Relational database management systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/2008/02/16/stonebraker-database-taxonomy/</guid>
		<description><![CDATA[In a response to my recent five-part series on DBMS diversity, Mike Stonebraker has proposed his own taxonomy of data management technologies over on Vertica&#8217;s Database Column blog.


 OLTP DBMSs focused on fast, reliable transaction processing
Analytic/Data Warehouse DBMSs focused on efficient load and ad-hoc query performance
 Science DBMSs &#8212; after all MatLab does not scale [...]]]></description>
			<content:encoded><![CDATA[<p>In a response to my recent five-part series on DBMS diversity, Mike Stonebraker has proposed his own <a href="http://www.databasecolumn.com/2008/02/responding-to-monash-1.html" onclick="javascript:pageTracker._trackPageview('/www.databasecolumn.com');">taxonomy of data management technologies</a> over on Vertica&#8217;s <em><a href="http://www.databasecolumn.com" onclick="javascript:pageTracker._trackPageview('/www.databasecolumn.com');">Database Column</a></em> blog.</p>
<blockquote>
<ol>
<li> OLTP DBMSs focused on fast, reliable transaction processing</li>
<li>Analytic/Data Warehouse DBMSs focused on efficient load and ad-hoc query performance</li>
<li> Science DBMSs &#8212; after all MatLab does not scale to disk-sized arrays</li>
<li>RDF stores focused on efficiently storing semi-structured data in this format</li>
<li> XML stores focused on semi-structured data in this format</li>
<li>Search engines &#8212; the big players all use proprietary engines in this area</li>
<li>Stream Processing Engines focused on real-time StreamSQL</li>
<li>&#8220;Lean and Mean,&#8221; less-than-a-database engines focused on doing a small number of things very well (embedded databases are probably in this category)</li>
<li> MapReduce and Hadoop &#8212; after all Google has enough &#8220;throw weight&#8221; to define a category</li>
</ol>
</blockquote>
<p>He goes on to say that each will be architected differently, except that &#8212; as he already convinced me back in July &#8212; <a href="http://www.dbms2.com/2007/06/15/fast-rdf-in-specialty-relational-databases/" >RDF will be well-managed by specialty data warehouse DBMS</a>.<span id="more-359"></span></p>
<p>I must confess that I didn&#8217;t explicitly mention array-based data stores, whether scientific ones or the remaining native MOLAP (Multi-Dimensional OnLine Analytic Processing) engines, nor the <em>sui generis</em> <a href="http://www.dbms2.com/2006/10/04/sas-intelligence-storage/" >SAS Intelligence Storage</a> relational data warehouse product.  So great catch there.  On the not-so-great side, I think Mike&#8217;s definitions of categories #8 and #9 are a bit fuzzy (embedded DBMS tend to be full DBMS, but MapReduce is less than a DBMS).  And of course any finite list like his will make over-general assumptions (e.g., it&#8217;s not obvious the StreamSQL-based CEP vendors will blow away rule-oriented Apama) and omit edge cases.</p>
<p>But there&#8217;s really only one point on which we have meaningful disagreement &#8212; Mike dumps all OLTP and general-purpose relational DBMS into a single bucket.  Considering that such products currently represent a large majority of the multi-billion dollar DBMS market, I think some finer distinctions are in order.  At a minimum, let&#8217;s break them into two categories &#8212; <a href="http://www.dbms2.com/2008/02/15/relational-database-management-categories/" >high-end</a> vs. <a href="http://www.dbms2.com/2008/02/15/mid-range-relational-database-management/" >mid-range</a>.  High-end systems have maximum robustness, whether because there&#8217;s a real application need or because it just makes their owners feel good.  Mid-range systems do everything high-end systems did in the 1990s, and are a cheaper/better alternative for ever more database management tasks.</p>
<p><strong><em>The series on database diversity (more links at the bottom of Part 1):</em><br />
</strong></p>
<ul>
<li> Part 1: <a href="http://www.dbms2.com/2008/02/15/database-management-system-choices-overview/" >Database management system choices – overview</a></li>
<li> Part 2: <a href="http://www.dbms2.com/2008/02/15/relational-database-management-categories/" >Database management system choices – 4 categories of relational</a></li>
<li> Part 3: <a href="http://www.dbms2.com/2008/02/15/specialty-data-warehouse-database-management/" >Database management system choices – relational data warehouse</a></li>
<li> Part 4: <a href="http://www.dbms2.com/2008/02/15/mid-range-relational-database-management/" >Database management system choices – mid-range-relational</a></li>
<li> Part 5: <a href="http://www.dbms2.com/2008/02/15/non-relational-database-management/" >Database management system choices – beyond relational</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2008/02/16/stonebraker-database-taxonomy/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Vertica update – HP appliance deal, customer information, and more</title>
		<link>http://www.dbms2.com/2007/11/07/vertica-hp-appliance-and-customers/</link>
		<comments>http://www.dbms2.com/2007/11/07/vertica-hp-appliance-and-customers/#comments</comments>
		<pubDate>Wed, 07 Nov 2007 07:15:26 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business Objects]]></category>
		<category><![CDATA[DATAllegro]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Relational database management systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/2007/11/07/vertica-hp-appliance-and-customers/</guid>
		<description><![CDATA[Vertica quietly announced an appliance bundling deal with HP and Red Hat today.  That got me quickly onto the phone with Vertica&#8217;s Andy Ellicott, to discuss a few different subjects.  Most interesting was the part about Vertica&#8217;s customer base, highlights of which included:

Vertica&#8217;s claim to have “50” customers includes a bunch of unpaid [...]]]></description>
			<content:encoded><![CDATA[<p>Vertica quietly announced an appliance bundling deal with HP and Red Hat today.  That got me quickly onto the phone with Vertica&#8217;s Andy Ellicott, to discuss a few different subjects.  Most interesting was the part about Vertica&#8217;s customer base, highlights of which included:</p>
<ul>
<li>Vertica&#8217;s <a href="http://www.dbms2.com/2007/10/23/vertica-update/" >claim</a> to have “50” customers includes a bunch of unpaid licenses, many of them in academia.</li>
<li>Vertica has about 15 paying customers.</li>
<li>Based on conversations with mutual prospects, Vertica believes that&#8217;s more customers than DATAllegro has.  (Of course, each DATAllegro sale is bigger than one of Vertica&#8217;s.  Even so, I hope Vertica is wrong in its estimate, since DATAllegro told me its customer count was “double digit” quite a while ago.)</li>
<li>Most Vertica customers manage over 1 terabyte of user data.  A couple have bought licenses showing they intend to manage 20 terabytes or so.</li>
<li>Vertica&#8217;s biggest customer/application category – existing customers and sales pipelines alike – is call detail records for telecommunications companies.  (Other data warehouse specialists also have activity in the CDR area.)  Major applications are billing assurance (getting the inter-carrier charges right) and marketing analysis.  Call center uses are still in the future.</li>
<li>Vertica&#8217;s other big market to date is investment research/tick history.  Surely not coincidentally, this is a big area of focus for Mike Stonebraker, evidently at both companies for which he&#8217;s CTO.  (The other, of course, is StreamBase.)</li>
<li> Runners-up in market activity are clickstream analysis and general consumer analytics.  These seem to be present in Vertica&#8217;s pipeline more than in the actual customer base.</li>
</ul>
<p><span id="more-279"></span></p>
<ul>
<li>Fraud detection comes up as a specific application in multiple customer segments.</li>
<li><a href="http://www.dbms2.com/2007/06/15/fast-rdf-in-specialty-relational-databases/" >RDF</a> isn&#8217;t a big deal for Vertica yet.  However, Vertica does have some RDF pilot projects in the biological research area.</li>
<li>A lot of Vertica customers use Business Objects and/or Informatica.  And as part of QA, Vertica&#8217;s product is tested against other major business intelligence tools as well.</li>
</ul>
<p>As for the HP/Vertica appliance deal:</p>
<ul>
<li>Here&#8217;s the link to <a href="http://www.vertica.com/appliance" onclick="javascript:pageTracker._trackPageview('/www.vertica.com');">Vertica&#8217;s database appliance product page</a>. Note that it mentions 10 terabytes of user data as a representative case.</li>
<li>Vertica reports that a significant minority of its customers/prospects wanted an appliance alternative.</li>
<li>HP now has what it surely perceives as a high-end/low-end pair of offerings – Neoview and Vertica.  Sun likewise has what it perceives as a comparable pair – Greenplum and ParAccel.  Of course, neither Vertica nor ParAccel would wholly endorse that “low-end” positioning, but they&#8217;re glad to have the big-company partnerships even so.</li>
</ul>
<p>Edit:  For more on the data warehouse appliance market overall, please see this December, 2007 post on <a href="http://www.dbms2.com/2007/12/03/data-warehouse-appliances-%e2%80%93-fact-and-fiction/" >data warehouse appliance fact and fiction</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2007/11/07/vertica-hp-appliance-and-customers/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Nonstandard data management software &#8212; beyond the Bowling Alley?</title>
		<link>http://www.dbms2.com/2007/07/13/nonstandard-data-management-markets/</link>
		<comments>http://www.dbms2.com/2007/07/13/nonstandard-data-management-markets/#comments</comments>
		<pubDate>Fri, 13 Jul 2007 11:41:46 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Native XML]]></category>
		<category><![CDATA[RDF and graphs]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/2007/07/13/nonstandard-data-management-markets/</guid>
		<description><![CDATA[I just finished a short Monash Letter on markets for nonstandard data management software.  Of course, the whole thing is available only to Monash Advantage members, but here are some salient points:

When new kinds of data are managed, new kinds of data management are used. More precisely, the old ways are tried first &#8212; [...]]]></description>
			<content:encoded><![CDATA[<p>I just finished a short <em>Monash Letter</em> on markets for nonstandard data management software.  Of course, the whole thing is available only to <em><a href="http://www.monash.com/advantage.html" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">Monash Advantage</a></em> members, but here are some salient points:</p>
<ul>
<li><em>When new kinds of data are managed, new kinds of data management are used.</em> More precisely, the old ways are tried first &#8212; but once they fail, new technologies are tried out.</li>
<li>Up through the &#8220;Bowling Alley,&#8221; markets for nonstandard data management technology commonly follow the classic Geoffrey Moore pattern.  However, <em>they rarely experience a &#8220;Tornado&#8221; or mass adoption.</em></li>
<li><em>I think this is apt to change.</em>  My three strongest candidates are native XML, RDF, and memory-centric event/stream processing used for data reduction (as opposed to sub-millisecond latency, which I do think will continue to be a niche requirement).</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2007/07/13/nonstandard-data-management-markets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
