This post is the second of a series. The first one was an overview of privacy dangers, replete with specific examples of kinds of data that are stored for good reasons, but can also be repurposed for more questionable uses. More on this subject may be found in my August, 2010 post Big Data is Watching You!
There are two technology trends driving electronic privacy threats. Taken together, these trends raise scenarios such as the following:
- Your web surfing behavior indicates you’re a sports car buff, and also that you like to look at pictures of scantily-clad young women. A number of your Facebook friends are single women. As a result, you’re deemed at risk of having a mid-life crisis and divorcing your wife, which raises the interest rate you have to pay when refinancing your house.
- Your cell phone GPS indicates that you drive everywhere instead of walking. There is no evidence that you pursue any fitness activities, but your forum posts suggest you’re highly interested in several TV series. Your credit card bills show that your taste in restaurant food runs to the fatty. Your online photos make you look fairly obese, and a couple have ashtrays in them. As a result, you’re judged a high risk of heart attack, and your medical insurance rates are jacked up accordingly.
- You did actually have that mid-life crisis and get divorced. At the child-custody hearing, your ex-spouse’s lawyer quotes a study showing that football-loving upper-income Republicans are 27% more likely to beat their children than yoga-class-attending moderate Democrats, and the probability goes up another 8% if they ever bought a jersey featuring a defensive lineman. What’s more, several of the more influential people in your network of friends also fit angry-male patterns, taking the probability of abuse up another 13%. Because of the sound statistics behind such analyses, the judge listens.
Not all these stories are quite possible today, but they aren’t far off either.
One of the supporting trends, and a pretty obvious one, is that there is a lot more electronic information than there used to be. Indeed:
- Sufficient information exists to provide a very detailed picture of our activities.
- Much of it is recorded for very good and beneficial reasons. We wouldn’t want that part to stop.
- This information is inevitably available to government.
Here’s what I mean by the inevitability claim. Whether or not you think anti-terrorism concerns are overblown, as a practical matter your fellow voters* will allow a broad range of governmental information access. Besides, just the widely-available credit card and similar commercial data is enough to provide a fairly detailed picture of what you’re up to. In most countries, anti-pornography, anti-file-sharing, and/or general civilian law enforcement efforts serve to strengthen the point further.
*If you live in a country too unfree for voters to matter much, then it is surely also the case that there are few practical limits on governmental access to information.
Examples of information being tracked (more particulars were covered in the first post of this series):
- Almost everything we buy is recorded, via credit card transactions, point-of-sale data, and/or website transaction records. This data is summarized in files covering 100s of millions of individuals, with 1000s of fields per person. Those files can be used for a broad variety of business or law enforcement purposes.
- That data gives a great picture of what we eat, where we commute or travel, what we pay attention to, and so on.
- All our other financial information also passes through computer systems, such as at banks.
- Increasingly, our physical movements are tracked more directly, via cell phones (our own), police cameras, and the like.
- Other than face-to-face conversations, almost all our communications are electronic. Even social media non-adopters rely heavily on telephones, email, and the like.
- Increasingly, our reading and viewing entertainment choices are electronically recorded as well.
Most of that data is available to law enforcement departments. Much of it is available to commercial companies as well.
And these vast amounts of data will hardly go to waste. The second major technological trend in play is that the data can be analyzed much more effectively than before. New kinds of analytic profiling, and more effective versions of old ones, create whole new levels of exposure (using the word “exposure” in its most literal sense), in at least three ways:
- Relationship profiling.* Relationship analytics technology has been around for a while. When it’s used to find bad guys (terrorists, fraudsters, etc.), that’s one thing. But some of the marketing uses are spookier. Marketing-like uses applied back to governmental surveillance could be spookier yet.
- Propensity profiling.* A huge fraction of what happens in big data analytics is figuring out what you’re likely to buy, vote for, look at, click on, react to, or think. Marketers getting that right can be a bit creepy. So can marketers getting it wrong. Governments doing the same thing could be much creepier yet.
- De-anonymization.* You may think you can be anonymous online, but you really can’t. Also, it’s getting ever harder to keep your roles or activities online separate from each other.
* I just coined the terms “relationship profiling” and “propensity profiling.” “De-anonymization,” however, has been in use for a while.
Classical relationship profiling questions include assessing who has a close relationship with whom, who influences whom, who influences lots of people, etc. The most obvious data to infer this from is communication — who called whom, how long they talked, who they called next, what time of day this all happened, and so on. Anti-terrorist uses are obvious. A major marketing use is telcos — who of course have this data — deciding who to offer their best deals to, by trying to identify who influences the most other customers. These calculations of course involve comparing lots of data, mainly about people who are NOT targets of terrorist investigation or preferential telephone service pricing.
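To make the “who influences whom” idea a bit more concrete, here is a minimal sketch of one generic way to score influence from call records: a PageRank-style calculation over a call graph weighted by minutes talked. To be clear, this is an assumption for illustration, not a description of what any telco actually runs. The record format, the names, and the function `influence_scores` are all hypothetical, and treating “gets called a lot by well-connected people” as a proxy for influence is itself a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical call-detail records: (caller, callee, minutes talked).
CALLS = [
    ("alice", "bob", 12), ("alice", "carol", 30), ("bob", "carol", 5),
    ("dave", "alice", 20), ("erin", "alice", 8), ("carol", "alice", 15),
]

def influence_scores(calls, damping=0.85, iterations=50):
    """PageRank-style scores over a call graph weighted by minutes.

    A person scores highly when many others, themselves well connected,
    spend time calling them. Scores sum to 1.
    """
    people = sorted({p for caller, callee, _ in calls for p in (caller, callee)})
    out_minutes = defaultdict(float)  # total minutes each person spent calling out
    edges = defaultdict(float)        # (caller, callee) -> total minutes
    for caller, callee, minutes in calls:
        edges[(caller, callee)] += minutes
        out_minutes[caller] += minutes
    n = len(people)
    score = {p: 1.0 / n for p in people}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in people}
        # Pass each caller's score to callees in proportion to talk time.
        for (caller, callee), minutes in edges.items():
            new[callee] += damping * score[caller] * minutes / out_minutes[caller]
        # People who never place calls spread their score evenly to everyone.
        dangling = sum(score[p] for p in people if out_minutes[p] == 0)
        for p in people:
            new[p] += damping * dangling / n
        score = new
    return score

ranked = sorted(influence_scores(CALLS).items(), key=lambda kv: -kv[1])
for person, s in ranked:
    print(f"{person:6s} {s:.3f}")
```

Note that everybody in the toy data gets a score, not just the person of interest, which is exactly the point about these calculations involving lots of data on people who aren’t targets.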
Much of Facebook’s $50 billion valuation hinges on the assumption it can do similar things based on the “social graph” it infers from informal communication among friends. To date that assumption has been questionable, but we’re still in the very early days. Meanwhile, cruder methods of analyzing social influence are used. But the trend is clear — marketers want to use technology to identify social leaders, influence them however they can, and hope that the rest of us follow along baaing. Up to a point, that’s actually OK — learning things from our friends and acquaintances is an important and pleasant part of living in a society. And political campaigners have been doing it for generations, in the most low-tech of fashions. Still, it’s one thing for such targeting of leaders to be transparent; if done surreptitiously, it suddenly starts to feel a lot more sinister.
For years, propensity profiling has been an area of huge investment and technological progress. It’s the central application of big data analytics, and the heart of the business for many companies I write about, or that are my clients. Credit files, web logs, other marketing responses, census information, and other data are combined to infer:
- Your income, household composition, age, race, education, and other basic demographics.
- Your buying, voting, reading, viewing, and other consuming interests in minute psychographic detail.
- Your feelings about particular companies and brands, your propensity to become or stop being their customer, and what kinds of advertisements or offers it would take to influence you.
- Your status as a credit risk.
- The chance you are committing or will commit fraud.
This has been going on since at least the 1990s, especially in service industries with “loyalty card” kinds of programs, such as retail or travel/leisure. In the credit case it’s been going on longer than that. But new data sources, processed by new analytic technologies, have brought the practice to a vastly greater height.
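For readers who haven’t seen one, a propensity score is often just a probability produced by a statistical model. The sketch below assumes a logistic model, one common choice among many, with a hypothetical `Profile` of three made-up features and hand-set weights. In real life the weights would be fitted to historical outcomes (e.g., by logistic regression) over far more features; nothing here reflects any particular vendor’s model.

```python
import math
from dataclasses import dataclass

@dataclass
class Profile:
    # Hypothetical features a marketer might derive from credit files,
    # web logs, and responses to past campaigns.
    visits_to_car_sites_per_week: float
    responded_to_last_offer: bool
    months_as_customer: int

# Hypothetical hand-set weights; a real system would fit these
# (e.g. by logistic regression) on historical outcomes.
INTERCEPT = -3.0
WEIGHTS = {
    "visits_to_car_sites_per_week": 0.4,
    "responded_to_last_offer": 1.5,
    "months_as_customer": -0.02,
}

def propensity(p: Profile) -> float:
    """Estimated probability (0..1) that this person responds to an offer."""
    z = INTERCEPT
    z += WEIGHTS["visits_to_car_sites_per_week"] * p.visits_to_car_sites_per_week
    z += WEIGHTS["responded_to_last_offer"] * float(p.responded_to_last_offer)
    z += WEIGHTS["months_as_customer"] * p.months_as_customer
    return 1.0 / (1.0 + math.exp(-z))  # logistic function maps z to (0, 1)

enthusiast = Profile(6.0, True, 12)
bystander = Profile(0.5, False, 60)
print(f"enthusiast: {propensity(enthusiast):.2f}")
print(f"bystander:  {propensity(bystander):.2f}")
```

Swap “responds to an offer” for “defaults on a loan” or “commits fraud” and the same machinery yields the credit-risk and fraud scores in the list above.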
Finally, in case you care about being anonymous online: you’re running out of luck. De-anonymization analytics are getting too good. The Electronic Frontier Foundation’s de-anonymization overview in 2009 was one of many articles pointing out that it often was possible to attach a specific name to online activities that in theory don’t track personally identifiable information. Meanwhile, at a talk I attended in May, 2010, comScore spoke of its successful efforts to tie various anonymous online activities, such as visits to different websites, to each other. And after I entered “usinger.com” into my browser address bar, I started seeing ads for Usinger sausages at a variety of prominent websites.
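One well-known way “anonymous” data gets a name attached is a simple join on quasi-identifiers, i.e., fields like ZIP code, birth date, and gender that aren’t names but are often close to unique in combination. Here is a minimal sketch under that assumption. Both datasets and the function `reidentify` are hypothetical, and this is not a claim about how comScore or anyone else actually does it.

```python
from collections import defaultdict

# Hypothetical "anonymized" browsing log: no names, but it keeps
# ZIP code, birth date, and gender (so-called quasi-identifiers).
ANON_LOG = [
    {"zip": "12345", "born": "1961-04-02", "sex": "M", "visited": "example.com"},
    {"zip": "12345", "born": "1975-09-30", "sex": "F", "visited": "example.org"},
]

# Hypothetical named records, e.g. from a purchased marketing file.
NAMED = [
    {"name": "J. Doe", "zip": "12345", "born": "1961-04-02", "sex": "M"},
    {"name": "R. Roe", "zip": "12345", "born": "1975-09-30", "sex": "F"},
    {"name": "A. Poe", "zip": "12345", "born": "1975-09-30", "sex": "F"},
]

def quasi_key(rec):
    """The combination of fields used to link the two datasets."""
    return (rec["zip"], rec["born"], rec["sex"])

def reidentify(anon_log, named):
    """Attach a name to each anonymous record whose quasi-identifiers
    match exactly one named person; ambiguous matches are skipped."""
    by_key = defaultdict(list)
    for person in named:
        by_key[quasi_key(person)].append(person["name"])
    matches = []
    for rec in anon_log:
        candidates = by_key.get(quasi_key(rec), [])
        if len(candidates) == 1:
            matches.append((candidates[0], rec["visited"]))
    return matches

print(reidentify(ANON_LOG, NAMED))
```

The second log entry stays anonymous only because two named people happen to share its key; add one more field to the join and that protection would likely disappear too.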
I’m not sure how much of a privacy threat de-anonymization technology is in and of itself, but it certainly provides support to both relationship and propensity profiling.