This post is part of a series on managing and analyzing graph data. Posts to date include:
- Graph data model basics
- Relationship analytics definition
- Relationship analytics applications (this post)
- Analysis of large graphs
It’s hard to say much about relationship analytics applications in detail, because:
- In adversarial domains such as national security, anti-fraud, or search engine ranking, it’s natural to keep algorithms secret.
- The big exception, influencer analytics (aka social network analysis), is obscured by a major hype/reality gap. (So, come to think of it, is a lot of other predictive modeling.)
Even so, it’s fairly safe to say:
- Much of relationship analytics is about subgraph pattern matching.
- Much of relationship analytics is about identifying subgraph patterns that are predictive of certain characteristics or outcomes.
- An important kind of relationship analytics challenge is to identify influential individuals.
Notes on that middle point include:
- Pattern identification could be done through trial-and-error visualization, through predictive modeling, or through any form of investigative analytics in between.
- I presume that, from a processing-performance standpoint, the hardest part would often be enumerating the subgraphs that match a particular candidate pattern.
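To make that enumeration cost concrete, here is a minimal Python sketch that uses the triangle (three mutually connected nodes) as a stand-in candidate pattern. The choice of pattern, the function names, and the sample edges are illustrative assumptions, not taken from any particular product. Even for this simplest pattern, each node’s work grows roughly with the square of its degree, because every pair of its neighbors has to be checked.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops are ignored."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def enumerate_triangles(adj):
    """Yield each triangle (a, b, c) exactly once, with a < b < c."""
    for a in sorted(adj):
        # Only consider higher-ordered neighbors so each triangle is found once.
        higher = sorted(n for n in adj[a] if n > a)
        for b, c in combinations(higher, 2):
            # a-b and a-c exist by construction; check the closing edge b-c.
            if c in adj[b]:
                yield (a, b, c)


# Hypothetical toy data: one triangle plus a dangling chain.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("cat", "dan"), ("dan", "eve")]
print(list(enumerate_triangles(build_adjacency(edges))))  # [('ann', 'bob', 'cat')]
```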
So I’m tempted to say “it’s all about subgraphs.” But it might be more accurate yet to say “It’s about paths”. Arguably, that’s saying the same thing; paths are subgraphs, and subgraphs are made up of paths, so a way of finding one is also a way of finding the other. But referring to paths nods to such standard tasks as:
- Finding the shortest path between two nodes.
- Calculating centrality metrics.
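Both tasks come down to walking paths. The following is a minimal sketch, assuming an unweighted, undirected graph held as a plain adjacency dict: breadth-first search yields shortest paths, and closeness centrality (one of several centrality metrics, picked here for illustration) follows from summing BFS distances to the nodes a given node can reach. The function names and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def shortest_path(adj, source, target):
    """One shortest path from source to target as a node list, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent pointers back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in adj.get(node, ()):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None


def closeness_centrality(adj, node):
    """(reachable nodes - 1) / sum of hop distances; 0.0 if nothing is reachable."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


# Small made-up undirected graph as adjacency lists.
adj = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
print(shortest_path(adj, "a", "e"))  # ['a', 'b', 'd', 'e']
print(max(adj, key=lambda n: closeness_centrality(adj, n)))  # d
```

Here node `d` comes out most central because it is at most two hops from every other node. With weighted edges (call durations, say), Dijkstra’s algorithm would take the place of BFS.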
Paths are also simpler than subgraphs, and hence also simpler to think about.
Let’s drill down a bit more on the cases of influencer analysis and centrality. Telecom service providers around the world compete with relatively few of their peers (because they’re so geographically bound), and hence are pretty good about sharing technical ideas with each other. One application that has spread like wildfire is influencer analysis for churn control. The idea is to identify influential subscribers who, if they left your service, would be particularly likely to take other people with them, so that you can make extra efforts to retain them. The key data is CDRs (call detail records).
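As a rough illustration of the data shape, the sketch below collapses CDRs into a directed call graph and counts, per subscriber, distinct callers and distinct callees. The `CDR` record layout, the function names, and the ranking rule are hypothetical simplifications; real churn models use far richer features than raw contact counts.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class CDR(NamedTuple):
    """Hypothetical, heavily simplified call detail record."""
    caller: str
    callee: str
    seconds: int


def call_graph(cdrs: Iterable[CDR]):
    """Collapse CDRs into directed edges: (caller, callee) -> total seconds."""
    edges = defaultdict(int)
    for r in cdrs:
        if r.caller != r.callee:
            edges[(r.caller, r.callee)] += r.seconds
    return dict(edges)


def distinct_contacts(edges):
    """Per subscriber: (number of distinct callers, number of distinct callees)."""
    callers = defaultdict(set)
    callees = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)
        callees[caller].add(callee)
    subscribers = set(callers) | set(callees)
    return {s: (len(callers[s]), len(callees[s])) for s in subscribers}


def rank_by_incoming(edges, top_n=3):
    """Rank subscribers by distinct incoming callers; ties broken by name."""
    counts = distinct_contacts(edges)
    return sorted(counts, key=lambda s: (-counts[s][0], s))[:top_n]


# Made-up records for illustration only.
cdrs = [
    CDR("amy", "raj", 120), CDR("ben", "raj", 30), CDR("cho", "raj", 600),
    CDR("raj", "amy", 45), CDR("amy", "ben", 300), CDR("amy", "cho", 90),
    CDR("amy", "raj", 60),
]
edges = call_graph(cdrs)
print(rank_by_incoming(edges))  # ['raj', 'amy', 'ben']
```

In this toy data `amy` calls the most distinct people, but `raj` is called by the most distinct people, so rankings by incoming and by outgoing contacts can disagree.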
As with many things, it’s tough to separate influencer analysis adoption fact from fiction.
- The telecom case is surely real; I’ve heard of many examples.
- Social networking is a harder call. Top-down, the story sounds good; but bottom-up, I’m not so sure.*
- I’m quite dubious about attempts to use influencer analysis based on, say, credit card records; the detailed information about person-to-person connections isn’t there.
- National security clearly uses similar kinds of techniques, albeit for slightly different purposes.
Specific conclusions I’ve heard include:
- When predicting whether you’ll influence other cellular subscribers to churn along with you, who calls you is a better predictor than whom you call.
- Call length is an indicator of involvement or influence in terrorist networks (short calls suggest serious business is being done).
*For example, my Klout profile asserts I’m more influential about Airlines than about Databases or Software. A bit of manual intervention could surely change that, which just underscores my doubts about the effectiveness of automated social network analytics.
One more thing: relationship analytics on social networks rarely works unless you remove a few spurious highly-connected nodes. The paradigmatic example is the local pizza parlor, which receives many phone calls, but is neither a terrorist mastermind nor a major influence upon telecom service churn. More on that point when I write about the partitioning of large graphs.
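A minimal sketch of that pruning step, assuming an undirected adjacency-set graph: flag nodes whose degree far exceeds the typical degree, then drop them along with their edges. The `multiple * median` cutoff and all names here are hypothetical choices for illustration; in practice the threshold would need tuning against real data.

```python
from statistics import median


def undirected(edges):
    """Build undirected adjacency sets from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hub_threshold(adj, multiple=3.0):
    """Hypothetical heuristic: degree cutoff = multiple * median degree."""
    return multiple * median(len(nbrs) for nbrs in adj.values())


def remove_hubs(adj, max_degree):
    """Drop nodes with degree > max_degree and every edge touching them.

    Returns (pruned_adjacency, set_of_removed_nodes).
    """
    removed = {n for n, nbrs in adj.items() if len(nbrs) > max_degree}
    pruned = {
        n: {m for m in nbrs if m not in removed}
        for n, nbrs in adj.items()
        if n not in removed
    }
    return pruned, removed


# Made-up example: eight subscribers all call the pizza parlor,
# and only two pairs of them call each other.
edges = [(f"s{i}", "pizza") for i in range(1, 9)] + [("s1", "s2"), ("s3", "s4")]
adj = undirected(edges)
pruned, removed = remove_hubs(adj, hub_threshold(adj))
print(removed)        # {'pizza'}
print(pruned["s1"])   # {'s2'}
```

Once the hub is gone, the remaining edges reflect person-to-person ties rather than shared use of one popular business.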