Surveillance and privacy

Discussion of issues related to liberty and privacy, and especially how they are affected by and interrelated with data management and analytic technologies. Related subjects include:

Petabyte-scale data management
Privacy, censorship, and freedom (in The Monash Report)

July 29, 2013

Very chilling effects

I’ve worried for years about a terrible and under-appreciated danger of privacy intrusion, which in a recent post I characterized as a chilling effect upon the exercise of ordinary freedoms. When government — or an organization such as your employer, your insurer, etc. — watches you closely, it can be dangerous to deviate from the norm. Even the slightest non-conformity could have serious consequences. I wish that were an exaggeration; let’s explore why it isn’t.

Possible difficulties — most of them a little bit futuristic — include:

July 8, 2013

Privacy and data use — the problem of chilling effects

This is the second of a two-part series on the theory of information privacy. In the first post, I review the theory to date, and outline what I regard as a huge and crucial gap. In the second post, I try to fill that chasm.

The first post in this two-part series:

Actually, it’s easy to name specific harms from privacy loss. A list might start:

I expect that few people in, say, the United States will suffer these harms in the near future, at least the more severe ones. However, the story gets worse, because we don’t know which disclosures will have which adverse effects. For example, …

July 8, 2013

Privacy and data use — a gap in the theory

This is the first of a two-part series on the theory of information privacy. In the first post, I review the theory to date, and outline what I regard as a huge and crucial gap. In the second post, I try to fill that chasm.

Discussion of information privacy has exploded, spurred by increasing awareness of data’s collection and use. Confusion reigns, however, for reasons such as:

Let’s address the last point.

July 2, 2013

Notes and comments, July 2, 2013

I’m not having a productive week, part of the reason being a hard drive crash that took out early drafts of what were to be last weekend’s blog posts. Now I’m operating from a laptop, rather than my preferred dual-monitor set-up. So please pardon me if I’m concise even by comparison to my usual standards.

*Basic and unavoidable ETL (Extract/Transform/Load) of course excepted.

**I could call that ABC (Always Be Comparing) or ABT (Always Be Testing), but they each sound like – well, like The Glove and the Lions.

June 13, 2013

How is the surveillance data used?

Over the past week, discussion has exploded about US government surveillance. After summarizing, as best I could, what data the government appears to collect, now I’d like to consider what they actually do with it. More precisely, I’d like to focus on the data’s use(s) in combating US-soil terrorism. In a nutshell:

Consider the example of Tamerlan Tsarnaev:

In response to this 2011 request, the FBI checked U.S. government databases and other information to look for such things as derogatory telephone communications, possible use of online sites associated with the promotion of radical activity, associations with other persons of interest, travel history and plans, and education history.

While that response was unsuccessful in preventing a dramatic act of terrorism, at least they tried.

As for actual success stories — well, that’s a bit tough. In general, there are few known examples of terrorist plots being disrupted by law enforcement in the United States, except for fake plots engineered to draw terrorist-leaning individuals into committing actual crimes. One of those examples, that of Najibullah Zazi, was indeed based on an intercepted email — but the email address itself was uncovered through more ordinary anti-terrorism efforts.

As for machine learning/data mining/predictive modeling, I’ve never seen much of a hint of it being used in anti-terrorism efforts, whether in the news or in my own discussions inside the tech industry. And I think there’s a great reason for that — what would they use for a training set? Here’s what I mean.

June 10, 2013

Where things stand in US government surveillance

Edit: Please see the comment thread below for updates. Please also see a follow-on post about how the surveillance data is actually used.

US government surveillance has exploded into public consciousness since last Thursday. With one major exception, the news has just confirmed what was already thought or known. So where do we stand?

My views about domestic data collection start:

*Recall that these comments are US-specific. Data retention legislation has been proposed or passed in multiple countries to require recording of, among other things, all URL requests, with the stated goal of fighting either digital piracy or child pornography.

As for foreign data: …

May 20, 2013

Some stuff I’m working on

1. I have some posts up on Strategic Messaging. The most recent are overviews of messaging, pricing, and positioning.

2. Numerous vendors are blending SQL and JSON management in their short-request DBMS. It will take some more work for me to have a strong opinion about the merits/demerits of various alternatives.

The default implementation — one example would be Clustrix’s — is to stick the JSON into something like a BLOB/CLOB field (Binary/Character Large Object), index on individual values, and treat those indexes just like any others for the purpose of SQL statements. Drawbacks include:

IBM DB2 is one recent arrival to the JSON party. Unfortunately, I forgot to ask whether IBM’s JSON implementation was based on IBM DB2 pureXML when I had the chance, and IBM hasn’t gotten around to answering my followup query.
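As a rough illustration of the default approach described in item 2 (JSON stored in a large text field, with indexes on individual values that ordinary SQL can use), here is a minimal sketch. It uses SQLite through Python's standard sqlite3 module purely as a stand-in; it is not Clustrix's or IBM's implementation, and it assumes a SQLite build that includes the JSON functions. The table name `docs`, the column `body`, the index name, and the sample records are all hypothetical.

```python
import json
import sqlite3

# Stand-in for a short-request SQL DBMS; NOT any vendor's actual implementation.
conn = sqlite3.connect(":memory:")

# The JSON document lives in a single large text field (hypothetical schema).
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

# Index on one individual value inside the JSON document.
conn.execute(
    "CREATE INDEX idx_docs_city ON docs (json_extract(body, '$.city'))"
)

# Made-up sample records, for illustration only.
records = [
    {"name": "Ann", "city": "Boston"},
    {"name": "Bob", "city": "Austin"},
    {"name": "Cy", "city": "Boston"},
]
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [(json.dumps(r),) for r in records],
)

# An ordinary SQL statement that filters on the indexed JSON value.
query = (
    "SELECT json_extract(body, '$.name') FROM docs "
    "WHERE json_extract(body, '$.city') = ? ORDER BY id"
)
names = [row[0] for row in conn.execute(query, ("Boston",))]
print(names)

# Show how the engine plans the query; the fourth column holds the description.
for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("Boston",)):
    print(row[3])
```

In this sketch only the values that were explicitly indexed are cheap to filter on; a predicate on any other JSON field would have to parse each stored document.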

3. Nor has IBM gotten around to answering my followup queries on the subject of BLU, an interesting-sounding columnar option for DB2.

4. Numerous clients have asked me whether they should be active in DBaaS (DataBase as a Service). After all, Amazon, Google, Microsoft, Rackspace and salesforce.com are all in that business in some form, and other big companies have dipped toes in as well.

December 12, 2012

Some trends that will continue in 2013

I’m usually annoyed by lists of year-end predictions. Still, a reporter asked me for some, and I found one kind I was comfortable making.

Trends that I think will continue in 2013 include:

Growing attention to machine-generated data. Human-generated data grows at the rate business activity does, plus 0-25%. Machine-generated data grows at the rate of Moore’s Law, also plus 0-25%, which is a much higher total. In particular, the use of remote machine-generated data is becoming increasingly real.

Hadoop adoption. Everybody has the big bit bucket use case, largely because of machine-generated data. Even today’s technology is plenty good enough for that purpose, and hence justifies initial Hadoop adoption. Development of further Hadoop technology, which I post about frequently, is rapid. And so the Hadoop trend is very real.

Application SaaS. The on-premises application software industry has hopeless problems with product complexity and rigidity. Any suite new enough to cut the Gordian Knot is or will be SaaS (Software as a Service).

Newer BI interfaces. Advanced visualization — e.g. Tableau or QlikView — and mobile BI are both hot. So, more speculatively, are “social” BI (Business Intelligence) interfaces.

Price discounts. If you buy software at 50% of list price, you’re probably doing it wrong. Even 25% can be too high.

MySQL alternatives. NoSQL and NewSQL products often are developed as MySQL alternatives. Oracle has actually done a good job on MySQL technology, but now its business practices are scaring companies away from MySQL commitments, and newer short-request SQL DBMS are ready for use.
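The growth comparison in the machine-generated data item above can be made concrete with a little compounding arithmetic. The sketch below is illustrative only: the 5% business growth rate, the two-year doubling period used for Moore's Law, the 10-point extra increment (one value inside the 0-25% range), and the reading of "plus" as additive percentage points are all assumptions, not figures from the post.

```python
def annual_rate_from_doubling(years_to_double: float) -> float:
    """Convert a doubling period into the equivalent annual growth rate."""
    return 2 ** (1 / years_to_double) - 1


def compound(rate: float, years: int) -> float:
    """Growth multiple after compounding `rate` annually for `years` years."""
    return (1 + rate) ** years


# Hypothetical inputs, chosen only to illustrate the comparison.
business_growth = 0.05                        # assumed business activity growth
moore_rate = annual_rate_from_doubling(2.0)   # assumed doubling every two years
extra = 0.10                                  # one point inside the 0-25% range

human_rate = business_growth + extra          # human-generated data
machine_rate = moore_rate + extra             # machine-generated data

for years in (1, 5, 10):
    print(
        f"{years:>2} years: human x{compound(human_rate, years):.1f}, "
        f"machine x{compound(machine_rate, years):.1f}"
    )
```

Under these assumed inputs the gap widens every year, which is the sense in which the machine-generated total is "much higher".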

September 7, 2012

Integrated internet system design

What are the central challenges in internet system design? We probably all have similar lists, comprising issues such as scale, scale-out, throughput, availability, security, programming ease, UI, or general cost-effectiveness. Screw those up, and you don’t have an internet business.

Much new technology addresses those challenges, with considerable success. But the success is usually one silo at a time — a short-request application here, an analytic database there. When it comes to integration, unsolved problems abound.

The top integration and integration-like challenges for me, from a practical standpoint, are:

Other concerns that get mentioned include:

Let’s skip those latter issues for now, focusing instead on the first four.

March 1, 2012

Where the privacy discussion needs to head

An Atlantic article suggests that the digital advertising industry is coalescing around the position “restrict data use if you must, but go easy on data collection and retention.”

There is a fascinating scrum over what “Do Not Track” tools should do and what orders websites will have to respect from users. The Digital Advertising Alliance (of which the NAI is a part), the Federal Trade Commission, W3C, the Internet Advertising Bureau (also part of the DAA), and privacy researchers at academic institutions are all involved. In November, the DAA put out a new set of principles that contain some good ideas like the prohibition of “collection, use or transfer of Internet surfing data across Websites for determination of a consumer’s eligibility for employment, credit standing, healthcare treatment and insurance.”

This week, the White House seemed to side with privacy advocates who want to limit collection, not just uses. Its Consumer Privacy Bill of Rights pushes companies to allow users to “exercise control over what personal data companies collect from them and how they use it.” The DAA heralded its own participation in the White House process, though even it noted this is the beginning of a long journey.

There has been a clear and real philosophical difference between the advertisers and regulators representing web users. On the one hand, as Stanford privacy researcher Jonathan Mayer put it, “Many stakeholders on online privacy, including U.S. and EU regulators, have repeatedly emphasized that effective consumer control necessitates restrictions on the collection of information, not just prohibitions on specific uses of information.” But advertisers want to keep collecting as much data as they can as long as they promise to not to use it to target advertising. That’s why the NAI opt-out program works like it does.

That’s a drum I’ve been beating for years, so to a first approximation I’m pleased. However:

So to sum up my views on consumer privacy:

That’s the good news. The bad news is on the side of government data collection and use. As I wrote last year …
