Surveillance and privacy
Discussion of issues related to liberty and privacy, and especially how they are affected by and interrelated with data management and analytic technologies.
A couple of points that arise frequently in conversation, but that I don’t seem to have made clearly online.
“Metadata” is generally defined as “data about data”. That’s basically correct, but it’s easy to forget how many different kinds of metadata there are. My list of metadata kinds starts with:
- Data about data structure. This is the classical sense of the term. But please note:
  - In a relational database, structural metadata is rather separate from the data itself.
  - In a document database, each document might carry structure information with it.
- Other inputs to core data management functions. Two major examples are:
  - Column statistics that inform RDBMS optimizers.
  - Value ranges that inform partition pruning or, more generally, data skipping.
- Inputs to ancillary data management functions — for example, security privileges.
- Support for human decisions about data — for example, information about authorship or lineage.
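To make the "value ranges" kind of metadata concrete, here is a minimal sketch of data skipping. It is an illustration under stated assumptions, not how any particular product works: the `Partition` class, the partition names, and the date-like integer keys are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    min_value: int  # metadata: smallest key stored in this partition
    max_value: int  # metadata: largest key stored in this partition
    rows: list      # the data itself (hypothetical date-like integer keys)


def partitions_to_scan(partitions, low, high):
    """Keep only partitions whose [min, max] range overlaps [low, high]."""
    return [p for p in partitions if p.max_value >= low and p.min_value <= high]


partitions = [
    Partition("p2011", 20110101, 20111231, rows=[20110315, 20110920]),
    Partition("p2012", 20120101, 20121231, rows=[20120704]),
    Partition("p2013", 20130101, 20131231, rows=[20130102, 20130515]),
]

# The range metadata alone rules out p2011; its rows are never read.
candidates = partitions_to_scan(partitions, 20120601, 20130301)
scanned = [p.name for p in candidates]
matches = [r for p in candidates for r in p.rows if 20120601 <= r <= 20130301]

print(scanned)
print(matches)
```

The point of the sketch is that the min/max values describe the data well enough to skip work, yet they say nothing about its structure, which is why they belong in a different bucket from schema metadata.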
What’s worse, the past year’s most famous example of “metadata”, telephone call metadata, is misnamed. This so-called metadata, much loved by the NSA (National Security Agency), is just data, e.g. in the format of a CDR (Call Detail Record). Calling it metadata implies that it describes other data — the actual contents of the phone calls — that the NSA strenuously asserts don’t actually exist.
And finally, the first bullet point above has a counter-intuitive consequence — all common terminology notwithstanding, relational data is less structured than document data. Reasons include:
- Relational databases usually just hold strings — or maybe numbers — with structural information being held elsewhere.
- Some document databases store structural metadata right with the document data itself.
- Some document databases store data in the form of (name, value) pairs. In some cases additional structure is imposed by naming conventions.
- Actual text documents carry the structure imposed by grammar and syntax.
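A small sketch may make the contrast easier to see. It assumes a toy "catalog" dictionary standing in for relational structural metadata and JSON standing in for a document store; the table name, field names, and phone numbers are made up for illustration.

```python
import json

# Relational style: rows are bare values; the structure is held elsewhere.
catalog = {"calls": ["caller", "callee", "seconds"]}  # hypothetical catalog
rows = [("555-0100", "555-0199", 42)]

# Document style: each document carries its field names along with the data.
documents = [json.dumps({"caller": "555-0100", "callee": "555-0199", "seconds": 42})]


def describe_row(row, table):
    """A row is only interpretable with help from the separate catalog."""
    return dict(zip(catalog[table], row))


def describe_document(doc):
    """A document can be interpreted on its own."""
    return json.loads(doc)


row_view = describe_row(rows[0], "calls")
doc_view = describe_document(documents[0])

print(row_view == doc_view)
```

Both views end up with the same content, but only the document needed nothing outside itself to get there.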
- A lengthy survey of metadata kinds, biased to Hadoop (August, 2012)
- Metadata as derived data (May, 2011)
- Dataset management (May, 2013)
- Structured/unstructured … multi-structured/poly-structured (May, 2011)
|Categories: Data models and architecture, Hadoop, Structured documents, Surveillance and privacy, Telecommunications||4 Comments|
1. Censorship worries me, a lot. A classic example is Vietnam, which basically has outlawed online political discussion.
And such laws can have teeth. It’s hard to conceal your internet usage from an inquisitive government.
2. Software and software-related patents are back in the news. Google, which said it was paying $5.5 billion or so for a bunch of Motorola patents, turns out to really have paid $7 billion or more. Twitter and IBM did a patent deal as well. Big numbers, and good for certain shareholders. But how does all this benefit the wider world?
The purpose of legal intellectual property protections, simply put, is to help make it a good decision to create something. …
Why does “securing … exclusive Right[s]” to the creators of things that are patented, copyrighted, or trademarked help make it a good decision for them to create stuff? Because it averts competition from copiers, thus making the creator a monopolist in what s/he has created, allowing her to at least somewhat value-price her creation.
I.e., the core point of intellectual property rights is to prevent copying-based competition. By way of contrast, any other kind of intellectual property “right” should be viewed with great suspicion.
That Constitutionally-based principle makes as much sense to me now as it did then. By way of contrast, “Let’s give more intellectual property rights to big corporations to protect middle-managers’ jobs” is — well, it’s an argument I view with great suspicion.
But I find it extremely hard to think of a technology industry example in which development was stimulated by the possibility of patent protection. Yes, the situation may be different in pharmaceuticals, or for gadgeteering home inventors, but I can think of no case in which technology has been better, or faster to come to market, because of the possibility of a patent-law monopoly. So if software and business-method patents were abolished entirely – even the ones that I think could be realistically adjudicated — I’d be pleased.
3. In November, 2008 I offered IT policy suggestions for the incoming Obama Administration, especially:
|Categories: Buying processes, Google, IBM and DB2, Public policy, Surveillance and privacy||1 Comment|
In response to the uproar created by the Edward Snowden revelations, the White House commissioned five dignitaries to produce a 300-page report, released last December 12. (Official name: Report and Recommendations of The President’s Review Group on Intelligence and Communications Technologies.) I read or skimmed a large minority of it, and I found enough substance to be worthy of a blog post.
Many of the report’s details fall in the buckets of bureaucratic administrivia,* internal information security, or general pabulum. But the commission started with four general principles that I think have great merit.
I think that most sufficiently large enterprise SaaS vendors should offer an appliance option, as an alternative to the core multi-tenant service. In particular:
- SaaS appliances address customer fears about security, privacy, compliance, performance isolation, and lock-in.
- Some of these benefits occur even if the appliance runs in the same data centers that host the vendor’s standard multi-tenant SaaS. Most of the rest occur if the customer can choose a co-location facility in which to place the appliance.
- Whether many customers should or will use the SaaS appliance option is somewhat secondary; it’s a check-mark item. I.e., many customers and prospects will be pleased that the option at least exists.
How I reached them
Core reasons for selling or using SaaS (Software as a Service) as opposed to licensed software start:
- The SaaS vendor handles all software upgrades, and makes them promptly. In principle, this benefit could also be achieved on a dedicated system on customer premises (or at the customer’s choice of co-location facility).
- In addition, the SaaS vendor handles all the platform and operational stuff — hardware, operating system, computer room, etc. This benefit is antithetical to direct customer control.
- The SaaS vendor only has to develop for and operate on a tightly restricted platform stack that it knows very well. This benefit is also enjoyed in the case of customer-premises appliances.
Conceptually, then, customer-premises SaaS is not impossible, even though one of the standard Big Three SaaS benefits is lost. Indeed:
- Microsoft Windows and many other client software packages already offer to let their updates be automagically handled by the vendor.
- In that vein, consumer devices such as game consoles already are a kind of SaaS appliance.
- Complex devices of any kind, including computers, will see ever more in the way of “phone-home” features or optional services, often including routine maintenance and upgrades.
But from an enterprise standpoint, that’s all (relatively) simple stuff. So we’re left with a more challenging question — does customer-premises SaaS make sense in the case of enterprise applications or other server software?
|Categories: Data warehouse appliances, HP and Neoview, salesforce.com, Software as a Service (SaaS), Surveillance and privacy||5 Comments|
I’ve posted a lot about surveillance and privacy intrusion. Even so, I have a few more things to say.
1. Surveillance and privacy intrusion do, of course, have real benefits. That’s a big part of why I advocate a nuanced approach to privacy regulation. Several of those benefits are mentioned below.
2. Nobody’s opinion about privacy rules should be based on the exact state of surveillance today, for at least two reasons:
- The disclosures keep coming.
- Technology keeps changing.
In particular, people may not realize how comprehensive surveillance will get, due largely to the “internet of things”. The most profound reason (and this will take decades to fully play out) is that we’re headed toward a medical revolution in which our vital signs are more or less continually monitored as we go about our business. Such monitoring will, of course, provide a very detailed record of people’s activities and perhaps even states of mind. Further, vehicle movements will all be tracked and our mobile devices will keep noting our location, in each case for multiple reasons.
- Stores CDRs (Call Detail Records), many or all of which are collected via …
  - … some kind of back door into the AT&T switches that many carriers use. (See Slide 2.)
- Has also included “subscriber information” for AT&T phones since July, 2012.
- Contains “long distance and international” CDRs back to 1987.
- Currently adds 4 billion CDRs per day.
- Is administered by a Federal drug-related law enforcement agency but …
  - … is used to combat many non-drug-related crimes as well. (See Slides 21-26.)
Other notes include:
- The agencies specifically mentioned on Slide 16 as making numerous Hemisphere requests are the DEA (Drug Enforcement Administration) and DHS (Department of Homeland Security).
- “Roaming” data giving city/state is mentioned in the deck, but more precise geo-targeting is not.
I’ve never gotten a single consistent figure, but typical CDR size seems to be in the 100s of bytes range. So I conjecture that Project Hemisphere spawned one of the first petabyte-scale databases ever.
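The arithmetic behind that conjecture can be sketched as follows. Only the 4 billion CDRs per day comes from the deck; the per-record sizes are assumptions picked from the "100s of bytes" range, and the calculation further assumes a constant daily rate and raw, uncompressed storage, neither of which is known to hold.

```python
CDRS_PER_DAY = 4_000_000_000             # from the deck: 4 billion CDRs per day
ASSUMED_BYTES_PER_CDR = (100, 250, 500)  # assumption: "100s of bytes" per CDR
TERABYTE = 10**12
PETABYTE = 10**15


def days_to_petabyte(bytes_per_cdr, cdrs_per_day=CDRS_PER_DAY):
    """Days of intake needed to reach one petabyte at a constant daily rate."""
    return PETABYTE / (bytes_per_cdr * cdrs_per_day)


for size in ASSUMED_BYTES_PER_CDR:
    daily_tb = size * CDRS_PER_DAY / TERABYTE
    print(f"{size} bytes/CDR: {daily_tb:.1f} TB/day, "
          f"{days_to_petabyte(size):.0f} days to 1 PB")
```

At today's intake rate, any record size in that range reaches a petabyte within a few years. Earlier volumes were presumably lower, so this is only an order-of-magnitude check, not an estimate of when the database actually crossed that line.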
Hemisphere Project unknowns start:
|Categories: Data warehousing, GIS and geospatial, Petabyte-scale data management, Specific users, Surveillance and privacy, Telecommunications||Leave a Comment|
For years I’ve argued three points about privacy intrusions and surveillance:
- Privacy intrusions are a huge threat to liberty. Since the Snowden revelations started last June, this view has become more widely accepted.
- Much of the problem is the very chilling effects they can have upon the exercise of day-to-day freedoms. Fortunately, I’m not as alone in saying that as I once feared. For example, Christopher Slobogin made that point in a recent CNN article, and then pointed me to a paper* citing other people echoing it, including Sonia Sotomayor.
- Liberty can’t be effectively protected just by controls on the collection, storage, or dissemination of data; direct controls are needed on the use of data as well. Use-based data controls are much more robust in the face of technological uncertainty and change than possession-based ones are.
Since that last point is still very much a minority viewpoint,** I’ll argue it one more time below.
I made a remarkably rumpled video appearance yesterday with SiliconAngle honchos John Furrier and Dave Vellante. (Excuses include <3 hours sleep, and then a scrambling reaction to a schedule change.) Topics covered included, with approximate timechecks:
- 0:00 Introductory pabulum, and some technical difficulties
- 2:00 More introduction
- 3:00 Dynamic schemas and data model churn
- 6:00 Surveillance and privacy
- 13:00 Hadoop, especially the distro wars
- 22:00 BI innovation
- 23:30 More on dynamic schemas and data model churn
Edit: Some of my remarks were transcribed.
- I posted on dynamic schemas and data model churn a few days ago.
- I capped off a series on privacy and surveillance a few days ago.
- I commented on various Hadoop distributions in June.
|Categories: Business intelligence, ClearStory Data, Data warehousing, Hadoop, MapR, MapReduce, Surveillance and privacy||Leave a Comment|
I’ve been harping on the grave dangers of surveillance and privacy intrusion. Clearly, something must be done to rein them in. But what?
Well, let’s look at an older and better-understood subject — governmental use of force. Governments, by their very nature, possess tools for tyranny: armies, police forces, and so on. So how do we avoid tyranny? We limit what government is allowed to do with those tools, and we teach our citizens — especially those who serve in government — to obey and enforce the limits.
Those limits can be lumped into two categories:
- Direct — there are very strong controls as to when and how the government may use force.
- Indirect — there are also controls on how the government can even threaten the use of force. I.e., substantially all laws are ultimately backed up by the threat of governmental force — and there are limits as to which laws may or may not be enacted.
The story is similar for surveillance technology:
- As data gathering and analysis technologies skyrocket in power, they become ever more powerful tools for tyranny.
- Direct controls are called for: there is some surveillance the government is not, and should not be, allowed to do.
- Indirect controls are also necessary — even when it has information, there are ways in which the government should not be allowed to use it.
But there’s a big difference between the cases of physical force and surveillance.
- The direct controls on the use of force are strong; under ordinary circumstances, government is NOT allowed to just go out and shoot somebody.
- The direct controls on surveillance, however, are very weak; government has access to all kinds of information.
I’ve worried for years about a terrible and under-appreciated danger of privacy intrusion, which in a recent post I characterized as a chilling effect upon the exercise of ordinary freedoms. When government — or an organization such as your employer, your insurer, etc. — watches you closely, it can be dangerous to deviate from the norm. Even the slightest non-conformity could have serious consequences. I wish that were an exaggeration; let’s explore why it isn’t.
Possible difficulties — most of them a little bit futuristic — include:
- Being perceived as a potential terrorist or terrorist sympathizer. That’s a biggie, of course, at least in “free” countries. Even getting on the No-Fly List is enough to pretty much shut down your travel, and hence your options in many careers. If you want to avoid such problems, it might be prudent not to:
  - Visit certain websites.
  - Email, telephone, or otherwise communicate with certain people.
  - Use certain words or phrases in email or on the telephone.
- Being regarded as too vehement a political dissenter in general. Political dissent is deadly dangerous in too many countries around the world, and has costs even in “free” countries. (Jacob Appelbaum is one recent US example.) To avoid such problems, there are a whole lot of things you might think twice about writing, saying, or doing, and certain people it’s definitely risky to associate with or write nice things about.
- Not being regarded as a probable loyal, hard-working, accepting employee. Think about the difficulties “over-qualified” candidates have getting hired. Then consider what might happen if employers had (accurate or otherwise) psychographic profiles estimating who was most likely to stay at a job, to accept boring job tasks or long hours, or to tolerate sub-standard pay. Then consider how wise it might be to show interest in, for example:
  - Other careers.
  - Certain hobbies that might be construed as leading to other careers.
  - Living in other parts of the country.
- Being perceived as likely to engage in socially-unapproved sexual behavior. In the United States, certain sexual choices, even among consenting adults, could cause problems with discrimination, child custody, or divorce. Elsewhere, your choice of partner could lead to prison or even death. (I don’t know exactly which shopping choices could get one identified as a possible homosexual or philanderer … but just to be on the safe side, you might not want to download any Barbra Streisand songs.)
- Being regarded as a poor health or safety risk for employment, insurance, or more. Do you like fatty foods? Extreme sports? Night clubs? Recreational drugs? Tobacco? More than a little alcohol? Fast cars? Fast women? Evidence of any of those tastes could move you up the risk charts for heart attack, accident, marital dissolution or some other outcome that an employer or insurer wouldn’t like.