<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Cogito and 7 Degrees</title>
	<atom:link href="http://www.dbms2.com/category/products-and-vendors/7-degrees-cogito/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:21:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Social network analysis, aka relationship analytics</title>
		<link>http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/</link>
		<comments>http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/#comments</comments>
		<pubDate>Fri, 21 Aug 2009 11:10:46 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Cogito and 7 Degrees]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=867</guid>
		<description><![CDATA[A number of applications lend themselves to graph-oriented analytics, including: Finding bad guys (national intelligence) Finding bad guys (anti-fraud) Data mining the social graph (e.g., for advertising optimization on social networks, or to identify influencers) There are plenty more graph-oriented applications, of course, such as the identification of biochemical pathways. But I want to focus [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">A number of applications lend themselves to graph-oriented analytics, including:</p>
<ul>
<li>Finding bad guys (national intelligence)</li>
<li>Finding bad guys (anti-fraud)</li>
<li>Data mining the social graph (e.g., for advertising optimization on social networks, or to identify influencers)</li>
</ul>
<p style="margin-bottom: 0in;">There are plenty more graph-oriented applications, of course, such as the identification of biochemical pathways. But I want to focus for now on ones like those on my list.  My key points are:</p>
<ul>
<li><strong>There are Big Data problems that lend themselves to graphical data models.</strong></li>
<li>So far as I can tell, <strong>the database management community isn&#8217;t doing enough to address them.</strong> (If I&#8217;m wrong about that, please tell me. I plan to arrive in Lyon for VLDB/XLDB on Wednesday of next week, and of course I can always be reached by email.)</li>
</ul>
<p style="margin-bottom: 0in;">Here&#8217;s what I mean.<span id="more-867"></span></p>
<p style="margin-bottom: 0in;">Applications that analyze relationship graphs are commonly grouped under the name <em>social network analysis. </em><span style="font-style: normal;">As <a href="http://www.strategicmessaging.com/monashs-first-law-of-commercial-semantics-explained/2009/01/09/">I frequently point out</a>, category names and definitions tend to be imperfect, and that one is no exception. In particular &#8212; and the Wikipedia article on <a href="http://en.wikipedia.org/wiki/Social_network">social networks and social network analysis</a> is an excellent example of this – the term tends to be construed to cover the linkages between people or organizations, but not between, say, physical addresses, email addresses, and all the other stuff those intelligence applications actually track.  I tried to introduce the term <a href="http://www.monash.com/CogitoBulletin.pdf">relationship analytics</a> a while back, but it unfortunately didn&#8217;t stick. </span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">I only ever got familiar with one company that tried to do a true graph-oriented database management system, suitable for social network analysis/relationship analytics.  It was called Cogito, and had some <a href="../2006/05/22/introduction-to-cogito/">interesting ideas about graphical data structures</a>. Unfortunately, Cogito didn&#8217;t stick either.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">As per the “Metrics” section of the Wikipedia article linked above, there are a number of well-established metrics about the relationships of pairs or groups of nodes to each other.  The usual way to calculate these metrics is to load the graph into memory and get to work.  (Indeed, such uses seem to be driving a lot of <a href="../2009/04/15/cloudera-presents-the-mapreduce-bull-case/">the national intelligence adoption of Hadoop</a>.) And while I&#8217;m perfectly willing to believe that <a href="../2007/06/15/fast-rdf-in-specialty-relational-databases/">relational database management systems can do a fine job of managing generic RDF</a>, it&#8217;s less obvious that they&#8217;re well-suited to support standard graph-analysis metric computations.</span></p>
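<p style="margin-bottom: 0in;"><span style="font-style: normal;">To make "load the graph into memory and get to work" concrete, here is a minimal sketch of two textbook metrics, degree centrality and the local clustering coefficient, computed over an in-memory adjacency map. The edge list and all function names are hypothetical illustrations, not taken from any product discussed here.</span></p>

```python
from collections import defaultdict

# Hypothetical toy edge list; the names are illustrative only.
EDGES = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]


def build_adjacency(edges):
    """Load an undirected graph into memory as node -> set of neighbors."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degree_centrality(adjacency):
    """Degree of each node divided by the largest possible degree (n - 1)."""
    denominator = max(len(adjacency) - 1, 1)  # guard the single-node case
    return {node: len(nbrs) / denominator for node, nbrs in adjacency.items()}


def local_clustering(adjacency, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = sorted(adjacency[node])
    k = len(nbrs)
    if k in (0, 1):
        return 0.0  # no neighbor pairs to compare
    linked = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adjacency[nbrs[i]]
    )
    return linked / (k * (k - 1) / 2)


graph = build_adjacency(EDGES)
centrality = degree_centrality(graph)
```

<p style="margin-bottom: 0in;"><span style="font-style: normal;">On this toy graph, "cat" touches every other node, so its degree centrality is 1.0, while only one of its three neighbor pairs is connected, so its clustering coefficient is one third. Both computations assume the whole adjacency map fits in memory, which is exactly the assumption that gets strained on Very Large Graphs.</span></p>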
<p style="margin-bottom: 0in;"><span style="font-style: normal;">The reason, in a nutshell, is that the relational approaches usually boil down to maintaining a table with a row for every node-edge-node triple, and then doing a lot of fast self-joins to identify paths.  That can work if connectivity is low and paths are sparse. But for higher degrees of connectivity, such strategies can lead – BOOM! &#8212; to serious combinatorial explosion.  And that&#8217;s not good, because a lot of this analysis focuses on finding exactly the parts of the graph where the connections run thickest.</span></p>
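<p style="margin-bottom: 0in;"><span style="font-style: normal;">A rough sketch of that effect, using Python's built-in sqlite3 module: one table of node-edge-node triples, self-joined once per extra hop. The table name, column names, the 'knows' edge label, and the two toy graphs are all assumptions made up for illustration. Note that a plain self-join counts walks, i.e. it happily revisits nodes.</span></p>

```python
import sqlite3


def count_walks(edges, hops):
    """Count walks of the given hop length via self-joins on a triple table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triple (src TEXT, edge TEXT, dst TEXT)")
    conn.executemany("INSERT INTO triple VALUES (?, 'knows', ?)", edges)
    # One extra self-join per additional hop:
    # FROM triple t0 JOIN triple t1 ON t0.dst = t1.src JOIN triple t2 ...
    joins = " ".join(
        f"JOIN triple t{i} ON t{i - 1}.dst = t{i}.src" for i in range(1, hops)
    )
    (count,) = conn.execute(f"SELECT COUNT(*) FROM triple t0 {joins}").fetchone()
    conn.close()
    return count


# Sparse case: a simple chain 0 -- 1 -- 2 -- 3 -- 4 -- 5 (directed, 5 edges).
chain = [(str(i), str(i + 1)) for i in range(5)]

# Dense case: every one of 6 nodes points at every other node (30 edges).
dense = [(str(a), str(b)) for a in range(6) for b in range(6) if a != b]

chain_counts = [count_walks(chain, hops) for hops in (1, 2, 3)]
dense_counts = [count_walks(dense, hops) for hops in (1, 2, 3)]
```

<p style="margin-bottom: 0in;"><span style="font-style: normal;">For the chain, the counts shrink with each hop (5, 4, 3). For the fully connected six-node graph, they multiply by five per hop (30, 150, 750), and six nodes is nobody's idea of Big Data. The intermediate join results grow the same way.</span></p>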
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Cogito&#8217;s idea was to say “What if, for every node, you could retrieve in only a few blocks all the paths leading from it, at least up to pathlength N?” Unfortunately, Cogito&#8217;s approach to creating this effect had too little to do with optimizer development or selectively redundant data storage, and too much to do with wishful thinking; not coincidentally, <a href="http://www.cogitoinc.com/index.html">Cogito</a> is no longer around. (I haven&#8217;t kept in touch with Cogito&#8217;s successor <a href="http://www.7-degrees.com/index.html">7 Degrees</a>, and the reason hasn&#8217;t been lack of effort or interest on my part.)</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">But suppose the idea had worked.  Then – unlike today – it might be realistic to do on-the-fly analytics on Very Large Graphs, just as we do operational business intelligence of a more relational or MOLAP nature. That would be cool.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">How cool would it be? Well, that&#8217;s a bit hard to say. Look again at the list of applications I put up top. Those are NOT ones people generally talk a lot about. Spooks and fraud-fighters are two very secretive kinds of folks. And, for a variety of reasons, the owners of the largest websites also are reluctant to publicize details of how they do or don&#8217;t profile individual users in vivid detail. And then there&#8217;s also the question of whether we even want to help improve technology whose main use is to improve the precision with which computers track individuals – but I don&#8217;t think that&#8217;s the front on which the privacy wars are best fought.</span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">But if I were a computer science researcher right now, graph databases – optimized to support graph-analytic metrics &#8212; are one of the areas I&#8217;d look at to see if I could make an impact.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>Bulletin on Cogito</title>
		<link>http://www.dbms2.com/2006/12/27/bulletin-on-cogito/</link>
		<comments>http://www.dbms2.com/2006/12/27/bulletin-on-cogito/#comments</comments>
		<pubDate>Wed, 27 Dec 2006 21:47:12 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cogito and 7 Degrees]]></category>
		<category><![CDATA[RDF and graphs]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/2006/12/27/bulletin-on-cogito/</guid>
		<description><![CDATA[My Bulletin on Cogito &#8212; i.e., a short-short white paper &#8212; is now available for download. Thankfully, it turned out to be pretty consistent with what I previously wrote on the company and its technology. The conclusion to the paper bears quoting here: In deciding between conventional DBMS and specialty graph-oriented tools such as Cogito’s, [...]]]></description>
			<content:encoded><![CDATA[<p>My Bulletin on Cogito &#8212; i.e., a short-short white paper &#8212; is now <a href="http://www.monash.com/CogitoBulletin.pdf">available for download</a>.  Thankfully, it turned out to be pretty consistent with what I <a href="http://www.dbms2.com/2006/05/22/introduction-to-cogito/">previously wrote</a> on the company and its technology.  <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />    The conclusion to the paper bears quoting here:</p>
<blockquote><p>In deciding between conventional DBMS and specialty graph-oriented tools such as Cogito’s, there’s one key criterion:  <em>Path length.</em> If path lengths are short and predictable, there’s a good chance that <a href="http://www.dbms2.com/2006/07/03/oracle-graphical-data-models-and-rdf/">relational DBMS and their forthcoming extensions</a> can do the job.  In complex graphs with longer paths, however, relational approaches may not scale well.  In such cases, specialty technologies warrant serious consideration.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2006/12/27/bulletin-on-cogito/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introduction to Cogito</title>
		<link>http://www.dbms2.com/2006/05/22/introduction-to-cogito/</link>
		<comments>http://www.dbms2.com/2006/05/22/introduction-to-cogito/#comments</comments>
		<pubDate>Mon, 22 May 2006 14:16:11 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cogito and 7 Degrees]]></category>
		<category><![CDATA[RDF and graphs]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/2006/05/22/introduction-to-cogito/</guid>
		<description><![CDATA[In my Computerworld column appearing today, I promised to post here about Cogito. Let me start with a disclosure and a confession: Disclosure: I have a business relationship with Cogito. Specifically, I will write a sponsored white paper about their technology. I also am informally (i.e., for no incremental pay at this time) advising them [...]]]></description>
			<content:encoded><![CDATA[<p>In my <em>Computerworld</em> column appearing today, I promised to post here about Cogito.  Let me start with a disclosure and a confession:<span id="more-79"></span></p>
<p><em>Disclosure:</em> I have a business relationship with Cogito.  Specifically, I will write a sponsored white paper about their technology.  I also am informally (i.e., for no incremental pay <em>at this time</em>) advising them on their strategy.  Indeed, the term &#8220;relationship analytics&#8221; is something I coined in my first briefing with them, and they immediately adopted it as their tagline.  That briefing evidently had a lot to do with why they decided to pay my rates for a business relationship. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><em>Confession:</em> I haven&#8217;t yet finished the research for the aforementioned white paper, so I&#8217;m going to tapdance around some details in this post.</p>
<p>The basic idea behind Cogito&#8217;s storage mechanism and data architecture is that everything is an arc or node of a graph, or an attribute of same.  Thus, there&#8217;s always a simple relational/tabular model that&#8217;s <strong>logically</strong> equivalent to a Cogito model, with one table for each type of node or arc.</p>
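<p>As a small illustration of that logical equivalence, here is a sketch that maps a typed graph onto one table per node type and one per arc type. The sample nodes, arcs, attribute names, and the <code>to_tables</code> helper are all hypothetical; this shows the tabular view only, not how Cogito physically stores anything.</p>

```python
from collections import defaultdict

# Hypothetical typed graph: (id, type, attributes) for nodes,
# (source, type, destination, attributes) for arcs.
NODES = [
    ("p1", "person", {"name": "Ann"}),
    ("p2", "person", {"name": "Bob"}),
    ("a1", "address", {"city": "Lyon"}),
]
ARCS = [
    ("p1", "knows", "p2", {"since": 2004}),
    ("p1", "lives_at", "a1", {}),
    ("p2", "lives_at", "a1", {}),
]


def to_tables(nodes, arcs):
    """Return {table_name: [row, ...]} with one table per node or arc type."""
    tables = defaultdict(list)
    for node_id, node_type, attrs in nodes:
        tables[node_type].append({"id": node_id, **attrs})
    for src, arc_type, dst, attrs in arcs:
        tables[arc_type].append({"src": src, "dst": dst, **attrs})
    return dict(tables)


tables = to_tables(NODES, ARCS)
```

<p>The result has four tables (person, address, knows, lives_at). The information is all there, but anything that follows a path now has to hop between tables, which is where the trouble described next comes from.</p>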
<p>But suppose your application involves tracking graph paths of nontrivial or indeterminate length.  Returning a result via SQL (or some other tabular/relational query language) would require an ugly exponential explosion in the amount of work.   Renowned SQL expert Joe Celko, who&#8217;s literally written the book(s) on tree management in SQL, has documented this point on behalf of the company.  (I invite Cogito &#8212; hi, WD! &#8212; to post a comment to this blog, with the relevant links of their choice.)</p>
<p>Cogito&#8217;s storage, however, is optimized very differently from a tabular system&#8217;s.  For any given node, it stores ALL the arcs leading from it clustered together, even if those are of different types and would wind up in different SQL tables.  The term I&#8217;ve coined for this is &#8220;starburst&#8221;.  What&#8217;s more &#8212; for reasons I haven&#8217;t yet fully understood &#8212; it turns out that they can get a high hit rate for the following desirable outcome:</p>
<p><strong>For a node referenced in a starburst, there&#8217;s a high likelihood that the node&#8217;s own starburst will be stored in the same memory block, or else in a contiguous one.</strong></p>
<p>Thus, <strong>not only are all the arcs leading from a node clustered together, but most or all of the short-length graph paths are clustered together as well.</strong></p>
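<p>To show what that clustering buys you, here is a toy sketch. I don't yet know how Cogito actually achieves its hit rate, so the breadth-first placement below is purely my own assumed heuristic, and the block size, sample arcs, and function names are hypothetical as well.</p>

```python
from collections import defaultdict, deque

BLOCK_SIZE = 4  # assumed number of starbursts that fit in one storage block

# Hypothetical arcs of several types: (source, arc type, destination).
ARCS = [
    ("ann", "knows", "bob"), ("ann", "lives_at", "addr1"),
    ("ann", "emails", "mail1"), ("bob", "knows", "cat"),
    ("bob", "lives_at", "addr1"), ("cat", "knows", "dan"),
    ("dan", "lives_at", "addr2"),
]


def build_starbursts(arcs):
    """Group every outgoing arc under its source node, whatever the arc type."""
    starbursts = defaultdict(list)
    for src, arc_type, dst in arcs:
        starbursts[src].append((arc_type, dst))
        starbursts.setdefault(dst, [])  # nodes with no outgoing arcs still exist
    return dict(starbursts)


def breadth_first_layout(starbursts, root):
    """Pack starbursts into blocks in breadth-first order from root (assumed)."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for _, dst in starbursts[node]:
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return {node: position // BLOCK_SIZE for position, node in enumerate(order)}


def blocks_touched(starbursts, layout, start, max_hops):
    """Return the set of blocks read while following paths up to max_hops arcs."""
    frontier, blocks = {start}, {layout[start]}
    for _ in range(max_hops):
        frontier = {dst for node in frontier for _, dst in starbursts[node]}
        blocks.update(layout[node] for node in frontier)
    return blocks


starbursts = build_starbursts(ARCS)
layout = breadth_first_layout(starbursts, "ann")
```

<p>With this layout, every one-hop path from "ann" is answered out of block 0 alone, and everything reachable from "ann" sits in blocks 0 and 1, even though the arcs involved are of three different types that would land in three different SQL tables.</p>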
<p>So what is this stuff good for?  First of all, there are some apps that are hard to describe beyond saying they retrieve and present information in the form of a relationship graph (e.g., a genealogy website).  Beyond that, Cogito has gotten good traction in law enforcement.  Here the idea is that you&#8217;re looking for the needles of a few true relationships in an enormous haystack of apparent six-degrees-of-separation-style connections.  The same &#8220;find the bad guy&#8221; kind of applications exist in principle in antifraud, epidemiology, and other bioinformatics areas, but I&#8217;m not aware of Cogito getting a lot of traction yet in those markets.</p>
<p>As for apps beyond &#8220;show the graph&#8221; and &#8220;find the bad guy&#8221; &#8212; well, that&#8217;s an area of research for me.  Stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2006/05/22/introduction-to-cogito/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

