“Social graph” is a highly misleading term, and so is “social network analysis.” By this I mean:
There’s something akin to “social graphs” and “social network analysis” that is more or less worthy of all the current hype – but graphs and network analysis are only a minor part of the whole story.
In particular, the most important parts of the Facebook “social graph” are neither social nor a graph. Rather, what’s really important is an aggregate Profile of Revealed Preferences, in which person-to-person connections – or anything else best modeled by a graph – play only a small part.
Let me hasten to note that – even when viewed narrowly – the ideas of “social graph” and “social network analysis” do have significance. Nontrivial use cases to date for big data social network analysis include:
- Intelligence agencies identify and analyze terrorist networks. Corporations and civilian law enforcement do the same for fraud networks.
- Telephone companies use calling data to figure out which of their customers are most likely to influence which other customers in the decision to keep or change service providers. (Frankly, I find that rather creepy.)
- Social networks figure out which other members you’re likely to know, and encourage you to connect with them.
Epidemiologists aspire to add to that list, based on their success to date using much more micro forms of social network analysis. But after that, I’m running out of examples. Sure, graph analytics is good for a bunch of other things (e.g., biology at the genetic or molecular level), but those have little or nothing to do with “social graphs” or social network analysis as they are commonly understood.
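The people-you-may-know use case above is one place where a graph genuinely earns its keep. A minimal sketch: rank non-friends by mutual-friend count over an adjacency list. All names and data here are invented for illustration.

```python
from collections import Counter

# Hypothetical friendship graph as an adjacency list (undirected).
friends = {
    "ann": {"bob", "cal"},
    "bob": {"ann", "dee"},
    "cal": {"ann", "dee"},
    "dee": {"bob", "cal", "eve"},
    "eve": {"dee"},
}

def suggest(person, graph):
    """Rank non-friends by how many mutual friends they share with person."""
    counts = Counter()
    for friend in graph[person]:
        for friend_of_friend in graph[friend]:
            if friend_of_friend != person and friend_of_friend not in graph[person]:
                counts[friend_of_friend] += 1
    return counts.most_common()

# ann's friends are bob and cal; both know dee, so dee is the top suggestion.
print(suggest("ann", friends))  # [('dee', 2)]
```

Real systems layer much more on top (shared employers, interaction frequency), but the core is this kind of triadic-closure counting.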
Note: Of course, it is also the case that everything can be modeled by entity-attribute-value triples, and those can always be modeled by graphs. But so what?
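To make that aside concrete: any set of entity-attribute-value triples is trivially a labeled, directed graph, which is exactly why “it can be modeled as a graph” says little by itself. A toy illustration with made-up data:

```python
# Any entity-attribute-value triples...
triples = [
    ("ann", "purchased", "sku-123"),
    ("ann", "lives_in", "Ohio"),
    ("sku-123", "category", "shoes"),
]

# ...are trivially a labeled, directed graph: entities and values become
# nodes, attributes become edge labels. That the encoding exists says
# nothing about whether graph analysis is the useful lens for the data.
nodes = {t[0] for t in triples} | {t[2] for t in triples}
edges = {(subject, obj): attribute for subject, attribute, obj in triples}

print(sorted(nodes))
print(edges[("ann", "Ohio")])  # 'lives_in'
```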
Let’s consider what, in a marketer’s ideal world, would go into your Profile of Revealed Preferences. Raw data might include:
- Personally identifying information. Duh. This is what makes everything else possible.
- Purchase transaction data. Lots of it. Like, everything on your credit card statements.
- Demographic and lifestyle information. Address, date of birth, educational history, race, household composition, and so on.
- Affiliations. Politics, religion, group membership of any kind. (OK, that’s partly social.)
- Explicitly stated opinions, preferences and desires, including:
- Any surveys you have filled out.
- Any recommendations you have made (e.g., through the Facebook Like feature).
- The text of anything you’ve written and posted – and, ideally, of your private emails as well.
- Any wish lists you’ve filled in.
- Attention information. What you clicked on, what you looked at, and all that stuff website owners track.
- Your movements, to the extent they are tracked. (E.g., via Foursquare and the like.)
- Your gaming activities and the like. (This is social mainly to the extent it overlaps with other categories I’ve already mentioned.)
- Your medical information.
- Who you communicate with, and what you communicate with them about. (Hey! There’s something else social!)
- Similar information about the people you communicate with.
My core privacy thoughts about that data include:
- Individuals deserve the right to control all that information. At a minimum, they deserve total control over how the data (raw or derived) is passed from the service – e.g., website – where it naturally resides (i.e., where it originated) to any other place.
- Given a chance, individuals would make fine-grained choices about what parts of their Profile of Revealed Preferences are available to which organizations. Reasons include:
- Individuals have rather complex trust relationships with different kinds of merchants and marketers.
- Consumers get different benefits from sharing information with different kinds of merchants and marketers. (Sometimes personalization is a large benefit. Sometimes it’s just creepy. And some companies actively bribe you to give them information they can use to sell to you.)
When one frames things this way, two rather difficult technological questions naturally arise.
1. Suppose, implausibly, that a single entity were allowed to control and use (for marketing) all of your Profile of Revealed Preferences information. How would they store and analyze it?
2. How does the answer to #1 change because control over the information will, in fact, be fragmented?
It’s tough enough to answer these questions for data about one person. Trying to include all but the simplest information about other people is, and will for years remain, quite infeasible. So, for the most part, this is not “social” information.
It’s also not naturally a “graph.” Similarly, it is not a good candidate for network analysis. To see why, let me outline why I used the name “Profile of Revealed Preferences”:
- The reason marketers want this data is, mainly, because they want to know what appeals to you, and how strongly you feel about it.
- The analytic process often entails taking explicit choices you have made, and inferring other preferences from them.
- The output of the analytic process is often one or more “scores” that then get plugged into various selection algorithms to determine what you should be shown or offered. At least implicitly, these algorithms are predicting what you will or won’t respond well to.
Not much graph-like there.
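The pipeline just described – explicit choices in, inferred preference scores out, scores fed into a selection rule – looks like this in miniature. The action-to-category weights and the threshold are invented for illustration; note that nothing in it is graph-shaped.

```python
# Hypothetical mapping from explicit actions to inferred category affinities.
ACTION_WEIGHTS = {
    ("liked", "running-shoes"): {"fitness": 0.8, "footwear": 0.6},
    ("purchased", "yoga-mat"): {"fitness": 1.0},
    ("clicked", "novel-ad"): {"books": 0.3},
}

def score_profile(actions):
    """Aggregate per-category preference scores from explicit actions."""
    scores = {}
    for action in actions:
        for category, weight in ACTION_WEIGHTS.get(action, {}).items():
            scores[category] = scores.get(category, 0.0) + weight
    return scores

def select_offer(scores, threshold=1.0):
    """Pick the highest-scoring category above a threshold, if any."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

profile = score_profile([("liked", "running-shoes"), ("purchased", "yoga-mat")])
print(profile)                # {'fitness': 1.8, 'footwear': 0.6}
print(select_offer(profile))  # 'fitness'
```

Everything here is flat per-person attributes and scalar scores; the data structures are dictionaries, not graphs.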
This post has gotten pretty long, so I’ll stop here without spelling anything else out. But questions I still hope to address down the road include:
- How should Profile of Revealed Preferences data be stored?
- Suppose we want to pass around derived results and not the raw data. How could we ever get to standards that would make such interchange realistic?
- If we only have raw data to pass around, what are the implications for privacy, liberty, and the structure of the online industries?