May 22, 2006

Introduction to Cogito

In my Computerworld column appearing today, I promised to post here about Cogito. Let me start with a disclosure and a confession:

Disclosure: I have a business relationship with Cogito. Specifically, I will write a sponsored white paper about their technology. I also am informally (i.e., for no incremental pay at this time) advising them on their strategy. Indeed, the term “relationship analytics” is something I coined in my first briefing with them, and they immediately adopted it as their tagline. That briefing evidently had a lot to do with why they decided to pay my rates for a business relationship. :)

Confession: I haven’t yet finished the research for the aforementioned white paper, so I’m going to tapdance around some details in this post.

The basic idea behind Cogito’s storage mechanism and data architecture is that everything is an arc or node of a graph, or an attribute of same. Thus, there’s always a simple relational/tabular model that’s logically equivalent to a Cogito model, with one table for each type of node or arc.
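
To make that equivalence concrete, here is a minimal Python sketch. It is not Cogito's actual format; the node and arc types (`person`, `company`, `knows`, `works_for`) and their attributes are invented for illustration. It flattens a small typed graph into one logical table per node type and one per arc type.

```python
from collections import defaultdict

# Hypothetical typed graph: every node and arc carries a type plus attributes.
nodes = [
    {"id": 1, "type": "person", "name": "Alice"},
    {"id": 2, "type": "person", "name": "Bob"},
    {"id": 3, "type": "company", "name": "Acme"},
]
arcs = [
    {"src": 1, "dst": 2, "type": "knows", "since": 2001},
    {"src": 1, "dst": 3, "type": "works_for", "role": "engineer"},
    {"src": 2, "dst": 3, "type": "works_for", "role": "analyst"},
]

def to_tables(nodes, arcs):
    """Group nodes and arcs by type: one logical table per type."""
    tables = defaultdict(list)
    for n in nodes:
        row = {k: v for k, v in n.items() if k != "type"}
        tables["node_" + n["type"]].append(row)
    for a in arcs:
        row = {k: v for k, v in a.items() if k != "type"}
        tables["arc_" + a["type"]].append(row)
    return dict(tables)

tables = to_tables(nodes, arcs)
for name, rows in sorted(tables.items()):
    print(name, rows)
```

Each arc table is just a pair of foreign keys plus the arc's own attributes, which is why the tabular model is logically equivalent even though it is laid out very differently.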

But suppose your application involves tracking graph paths of nontrivial or indeterminate length. Returning a result via SQL (or some other tabular/relational query language) would require an ugly exponential explosion in the amount of work. Renowned SQL expert Celko, who’s literally written the book(s) on tree management in SQL, has documented this point on behalf of the company. (I invite Cogito — hi, WD! — to post a comment to this blog, with the relevant links of their choice.)
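
A rough way to see the problem, using Python's built-in `sqlite3` module: in a single arc table, each extra hop in a path costs another self-join, and the number of candidate paths multiplies with every hop. The `arc` table, the `k_hop_sql` helper, and the densely connected toy graph are assumptions made for this sketch; they are not taken from Celko's write-up or from Cogito's benchmark.

```python
import sqlite3

def k_hop_sql(k):
    """Build a SQL query returning all paths of exactly k arcs from a start node."""
    joins = "".join(
        f" JOIN arc a{i} ON a{i}.src = a{i - 1}.dst" for i in range(2, k + 1)
    )
    cols = ", ".join(["a1.src"] + [f"a{i}.dst" for i in range(1, k + 1)])
    return f"SELECT {cols} FROM arc a1{joins} WHERE a1.src = ?"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arc (src INTEGER, dst INTEGER)")
# Toy graph (hypothetical): each of the nodes 0..5 points at every other node.
conn.executemany(
    "INSERT INTO arc VALUES (?, ?)",
    [(s, d) for s in range(6) for d in range(6) if s != d],
)

for k in range(1, 5):
    rows = conn.execute(k_hop_sql(k), (0,)).fetchall()
    print(k, "hops:", k - 1, "self-joins,", len(rows), "candidate paths")
```

With five outgoing arcs per node, the candidate paths grow fivefold per hop. When the path length is not known in advance, there is also no single fixed query of this shape to write.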

Cogito’s storage, however, is optimized very differently from a tabular system’s. For any given node, it stores ALL the arcs leading from it clustered together, even if those are of different types and would wind up in different SQL tables. The term I’ve coined for this is “starburst”. What’s more — for reasons I haven’t yet fully understood — it turns out that they can get a high hit rate for the following desirable outcome:

For a node referenced in a starburst, there’s a high likelihood that the node’s own starburst will be stored in the same memory block, or else in a contiguous one.

Thus, not only are all the arcs leading from a node clustered together, but most or all of the short-length graph paths are clustered together as well.
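
The logical shape of a starburst can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the arc types and names are made up, and a Python dictionary says nothing about how Cogito actually arranges starbursts in memory blocks. What it does show is that once all of a node's outgoing arcs sit together, regardless of type, following a path is one lookup per hop rather than one join per arc table.

```python
from collections import defaultdict, deque

# Hypothetical arcs of several types; in a tabular model each type is its own table.
arcs = [
    ("alice", "knows", "bob"),
    ("alice", "works_for", "acme"),
    ("alice", "owns", "car42"),
    ("bob", "works_for", "acme"),
    ("bob", "phoned", "carol"),
    ("carol", "owns", "car42"),
]

def build_starbursts(arcs):
    """Cluster every outgoing arc of a node together, regardless of arc type."""
    starbursts = defaultdict(list)
    for src, arc_type, dst in arcs:
        starbursts[src].append((arc_type, dst))
    return starbursts

def shortest_path(starbursts, start, goal):
    """Breadth-first search; each step is a single starburst lookup."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _arc_type, nxt in starbursts.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

starbursts = build_starbursts(arcs)
print(starbursts["alice"])
print(shortest_path(starbursts, "alice", "carol"))
```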

So what is this stuff good for? First of all, there are some apps that are hard to describe beyond saying they retrieve and present information in the form of a relationship graph (e.g., a genealogy website). Beyond that, Cogito has gotten good traction in law enforcement. Here the idea is that you’re looking for the needles of a few true relationships in an enormous haystack of apparent six-degrees-of-separation-style connections. The same “find the bad guy” kind of applications exist in principle in antifraud, epidemiology, and other bioinformatics areas, but I’m not aware of Cogito getting a lot of traction yet in those markets.

As for apps beyond “show the graph” and “find the bad guy” — well, that’s an area of research for me. Stay tuned.


8 Responses to “Introduction to Cogito”

  1. William Donahoo on May 22nd, 2006 7:16 pm

    Thanks for the mention, Curt.

    We have a number of relevant white papers and other information on our site. I would recommend
    two specific ones that are useful:

    The first one is on Relationship Analytics and defines what our product Knowledge Center is all about:

    The second is a benchmark on how our solution compares to an RDBMS when trying to find the six degrees of
    Kevin Bacon using the IMDB data. It is a good comparison for a person who understands the limitations of an RDBMS
    and appreciates a faster way to find how one element might be related to another element.

    Find out more on our website or use our link there to request a private demo or more information.

  2. William Donahoo on May 23rd, 2006 11:40 am

    Note that my second link was incorrect.

    For the benchmark white paper, please use this link.

  3. rob finn on May 24th, 2006 3:50 pm

    Hi Curt, this is a very interesting company. I think a large application of this would be for collection agencies. Cogito customers might need to pull in external data as well, and perhaps Cogito could partner with the appropriate data providers to make it easier to discover the needed data repositories. I am curious whether Cogito should market itself as enterprise fuzzy search rather than with a visualization pitch. Karen Stephenson of Netform might provide some key insights into the visualization pitch. I would be interested to hear your comparison of Cogito versus other approaches, such as semantic or NLP approaches, Semagix being one example. Visible Path is another company, and its pitch seems more compatible with non-technical users. There also seems to be potential for partnerships with data abstraction companies such as Pantero and Composite, or metadata management companies like Revelytix.

  4. Curt Monash on May 24th, 2006 10:24 pm

    Hi Rob!

    To answer part of that, I’d say that NLP (Natural Language Processing) is more applicable to other kinds of things. I think it shines for command/control, in what I think has been a missed industry opportunity for 20 years (I loved Lotus HAL, but am obviously one of the few people who did).


  5. Text Technologies»Blog Archive » Relationship analytics — turbocharge for text mining? on June 25th, 2006 3:30 am

    [...] Relationship analytics, which is a new phrase meaning “data management and analysis tools optimized for handling complex relationships” Here a complex relationship is one that, if represented in a relationship graph, would have pathlength a lot more than 1 or 2. [...]

  6. DBMS2 — DataBase Management System Services»Blog Archive » Data warehouse and mart uses – a tentative taxonomy on September 24th, 2006 1:29 am

    [...] But actually – that wasn’t the final category. While we’ve pretty much covered relational and other tabular warehousing, there’s also the whole huge category of text and media search. An enterprise text index is, in its own way, a data warehouse. And then there are a variety of specialty categories, such as relationship analytics. [...]

  7. Patrick Herron on November 10th, 2006 10:32 pm

    I just finished writing my master’s thesis which is in part about how a text mining application without relationship mining is not really text mining but rather just information extraction. Machine learning applied to text, such as classification or clustering-type tasks, is precisely relationship mining.

    Cogito sounds very interesting. Their applications look like they do a good job of representing the relationships to users and recommending useful relationship types as well. Cogito as a company also looks to be competing for a role as a part of Big Brother’s brain. Opportunity knocks, I guess.

  8. Bulletin on Cogito | DBMS2 -- DataBase Management System Services on August 21st, 2009 3:21 am

    [...] is now available for download. Thankfully, it turned out to be pretty consistent with what I previously wrote on the company and its technology. The conclusion to the paper bears quoting [...]
