In my Computerworld column appearing today, I promised to post here about Cogito. Let me start with a disclosure and a confession:
Disclosure: I have a business relationship with Cogito. Specifically, I will write a sponsored white paper about their technology. I also am informally (i.e., for no incremental pay at this time) advising them on their strategy. Indeed, the term “relationship analytics” is something I coined in my first briefing with them, and they immediately adopted it as their tagline. That briefing evidently had a lot to do with why they decided to pay my rates for a business relationship.
Confession: I haven’t yet finished the research for the aforementioned white paper, so I’m going to tapdance around some details in this post.
The basic idea behind Cogito’s storage mechanism and data architecture is that everything is an arc or node of a graph, or an attribute of same. Thus, there’s always a simple relational/tabular model that’s logically equivalent to a Cogito model, with one table for each type of node or arc.
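To make that equivalence concrete, here is a minimal sketch of how a typed graph maps onto one table per node type and one per arc type. The sample data, the field names, and the `to_tables` helper are all my own illustration, not anything from Cogito's product.

```python
from collections import defaultdict

# Hypothetical toy graph: typed nodes and typed arcs, each carrying attributes.
nodes = [
    {"id": 1, "type": "person", "name": "Alice"},
    {"id": 2, "type": "person", "name": "Bob"},
    {"id": 3, "type": "company", "name": "Acme"},
]
arcs = [
    {"src": 1, "dst": 2, "type": "knows", "since": 2001},
    {"src": 1, "dst": 3, "type": "works_for", "role": "engineer"},
    {"src": 2, "dst": 3, "type": "works_for", "role": "analyst"},
]

def to_tables(records):
    """Group records by type: one 'table' (a list of rows) per node or arc type."""
    tables = defaultdict(list)
    for rec in records:
        # The type becomes the table name, so it is dropped from the row itself.
        row = {key: value for key, value in rec.items() if key != "type"}
        tables[rec["type"]].append(row)
    return dict(tables)

node_tables = to_tables(nodes)  # tables: "person", "company"
arc_tables = to_tables(arcs)    # tables: "knows", "works_for"
```

Each arc table is just a pair of foreign keys plus whatever attributes that arc type carries, which is why the two models are logically interchangeable.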
But suppose your application involves tracking graph paths of nontrivial or indeterminate length. Returning a result via SQL (or some other tabular/relational query language) would require an ugly exponential explosion in the amount of work. Renowned SQL expert Celko, who’s literally written the book(s) on tree management in SQL, has documented this point on behalf of the company. (I invite Cogito — hi, WD! — to post a comment to this blog, with the relevant links of their choice.)
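A rough way to see the problem: in a plain join-based formulation, every extra hop in the path costs another self-join, and the path length has to be fixed when the query is written. The sketch below uses Python's built-in `sqlite3` module with a single hypothetical `arc` table; the table, the data, and the `k_hop_sql` helper are assumptions of mine for illustration, and a real schema with several arc types would need even more joins or unions.

```python
import sqlite3

def k_hop_sql(k):
    """Build a query for nodes exactly k arcs away from a start node.

    Each hop beyond the first adds one more self-join on the arc table.
    """
    joins = " ".join(
        f"JOIN arc a{i} ON a{i}.src = a{i - 1}.dst" for i in range(2, k + 1)
    )
    return f"SELECT DISTINCT a{k}.dst FROM arc a1 {joins} WHERE a1.src = ?"

# Hypothetical data: 1 -> 2 -> 3 -> 4, plus a second route 1 -> 5 -> 4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arc (src INTEGER, dst INTEGER)")
conn.executemany(
    "INSERT INTO arc VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 4), (1, 5), (5, 4)],
)

def reachable_in(k, start):
    """Run the k-hop query and return the sorted destination node ids."""
    return sorted(row[0] for row in conn.execute(k_hop_sql(k), (start,)))

two_hops = reachable_in(2, 1)    # nodes two arcs away from node 1
three_hops = reachable_in(3, 1)  # nodes three arcs away from node 1
```

Note that `k_hop_sql` has to generate a different query for every path length, which is the awkward part when the length is not known in advance.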
Cogito’s storage, however, is optimized very differently from a tabular system’s. For any given node, it stores ALL the arcs leading from it clustered together, even if those are of different types and would wind up in different SQL tables. The term I’ve coined for this is “starburst”. What’s more — for reasons I haven’t yet fully understood — it turns out that they can get a high hit rate for the following desirable outcome:
For a node referenced in a starburst, there’s a high likelihood that the node’s own starburst will be stored in the same memory block, or else in a contiguous one.
Thus, not only are all the arcs leading from a node clustered together, but most or all of the short-length graph paths are clustered together as well.
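Here is a minimal sketch of the logical idea, with the caveat that it says nothing about Cogito's actual on-disk or in-memory format, which I haven't fully researched. The `starbursts` dictionary, the sample names, and the `paths_up_to` function are hypothetical. The point is only that when each node's outgoing arcs of every type sit together, walking short paths is a matter of reading one starburst per step rather than joining across tables.

```python
from collections import deque

# Hypothetical "starburst" layout: all outgoing arcs of a node, of any type,
# are kept together under that node.
starbursts = {
    "alice": [("knows", "bob"), ("works_for", "acme")],
    "bob": [("knows", "carol")],
    "acme": [("owned_by", "carol")],
    "carol": [],
}

def paths_up_to(start, max_hops):
    """Breadth-first walk returning every path of 1..max_hops arcs from start."""
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) - 1 == max_hops:
            continue  # this path is already at the hop limit
        # One lookup yields every arc leaving the current node, whatever its type.
        for _arc_type, target in starbursts[path[-1]]:
            if target in path:
                continue  # avoid walking in circles
            new_path = path + [target]
            found.append(new_path)
            queue.append(new_path)
    return found

short_paths = paths_up_to("alice", 2)
```

If a referenced node's starburst tends to sit in the same or an adjacent memory block, as described above, then each of those lookups is likely to be cheap as well.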
So what is this stuff good for? First of all, there are some apps that are hard to describe beyond saying they retrieve and present information in the form of a relationship graph (e.g., a genealogy website). Beyond that, Cogito has gotten good traction in law enforcement. Here the idea is that you’re looking for the needles of a few true relationships in an enormous haystack of apparent six-degrees-of-separation-style connections. The same “find the bad guy” kind of applications exist in principle in antifraud, epidemiology, and other bioinformatics areas, but I’m not aware of Cogito getting a lot of traction yet in those markets.
As for apps beyond “show the graph” and “find the bad guy” — well, that’s an area of research for me. Stay tuned.