Surveillance and privacy
Discussion of issues related to liberty and privacy, and especially how they are affected by and interrelated with data management and analytic technologies. Related subjects include:
The United States and consequently much of the world are in political uproar. Much of that is about very general and vital issues such as war, peace or the treatment of women. But quite a lot of it is to some extent tech-industry-specific. The purpose of this post is outline how and why that is.
- There’s a worldwide backlash against “elites” — and tech industry folks are perceived as members of those elites.
- That perception contains a lot of truth, and not just in terms of culture/education/geography. Indeed, it may even be a bit understated, because trends commonly blamed on “trade” or “globalization” often have their roots in technological advances.
- There’s a worldwide trend towards authoritarianism. Surveillance/ privacy and censorship issues are strongly relevant to that trend.
- Social media companies are up to their neck in political considerations.
Because they involve grave threats to liberty, I see surveillance/privacy as the biggest technology-specific policy issues in the United States. (In other countries, technology-driven censorship might loom larger yet.) My views on privacy and surveillance have long been:
- Fixing the legal frameworks around information use is a difficult and necessary job. The tech community should be helping more than it is.
- Until those legal frameworks are indeed cleaned up, the only responsible alternative is to foot-drag on data collection, on data retention, and on the provision of data to governmental agencies.
Given the recent election of a US president with strong authoritarian tendencies, that foot-dragging is much more important than it was before.
Other important areas of technology/policy overlap include: Read more
The United States presidency was recently assumed by an Orwellian lunatic.* Sadly, this is not an exaggeration. The dangers — both of authoritarianism and of general mis-governance — are massive. Everybody needs in some way to respond.
*”Orwellian lunatic” is by no means an oxymoron. Indeed, many of the most successful tyrants in modern history have been delusional; notable examples include Hitler, Stalin, Mao and, more recently, Erdogan. (By way of contrast, I view most other Soviet/Russian leaders and most jumped-up-colonel coup leaders as having been basically sane.)
There are many candidates for what to focus on, including:
- Technology-specific issues — e.g. privacy/surveillance, network neutrality, etc.
- Issues in which technology plays a large role — e.g. economic changes that affect many people’s employment possibilities.
- Subjects that may not be tech-specific, but are certainly of great importance. The list of candidates here is almost endless, such as health care, denigration of women, maltreatment of immigrants, or the possible breakdown of the whole international order.
But please don’t just go on with your life and leave the politics to others. Those “others” you’d like to rely on haven’t been doing a very good job.
What I’ve chosen to do personally includes: Read more
1. The cloud is super-hot. Duh. And so, like any hot buzzword, “cloud” means different things to different marketers. Four of the biggest things that have been called “cloud” are:
- The Amazon cloud, Microsoft Azure, and their competitors, aka public cloud.
- Software as a service, aka SaaS.
- Co-location in off-premises data centers, aka colo.
- On-premises clusters (truly on-prem or colo as the case may be) designed to run a broad variety of applications, aka private cloud.
Further, there’s always the idea of hybrid cloud, in which a vendor peddles private cloud systems (usually appliances) running similar technology stacks to what they run in their proprietary public clouds. A number of vendors have backed away from such stories, but a few are still pushing it, including Oracle and Microsoft.
This is a good example of Monash’s Laws of Commercial Semantics.
2. Due to economies of scale, only a few companies should operate their own data centers, aka true on-prem(ises). The rest should use some combination of colo, SaaS, and public cloud.
This fact now seems to be widely understood.
Five years ago, in a taxonomy of analytic business benefits, I wrote:
A large fraction of all analytic efforts ultimately serve one or more of three purposes:
- Problem and anomaly detection and diagnosis
- Planning and optimization
That continues to be true today. Now let’s add a bit of spin.
1. A large fraction of analytics is adversarial. In particular: Read more
|Categories: Business intelligence, Investment research and trading, Log analysis, Predictive modeling and advanced analytics, RDF and graphs, Surveillance and privacy, Web analytics||3 Comments|
One of the most important issues in privacy and surveillance is also one of the least-discussed — the use of new surveillance technologies in ordinary law enforcement. Reasons for this neglect surely include:
- Governments, including in the US, lie about this subject a lot. Indeed, most of the reporting we do have is exposure of the lies.
- There’s no obvious technology industry ox being gored. What I wrote in another post about Apple, Microsoft et al. upholding their customers’ rights doesn’t have a close analogue here.
One major thread in the United States is: Read more
Numerous tussles fit the template:
- A government wants access to data contained in one or more devices (mobile/personal or server as the case may be).
- The computer’s manufacturer or operator doesn’t want to provide it, for reasons including:
- That’s what customers prefer.
- That’s what other governments require.
- Being pro-liberty is the right and moral choice. (Yes, right and wrong do sometimes actually come into play. )
As a general rule, what’s best for any kind of company is — pricing and so on aside — whatever is best or most pleasing for their customers or users. This would suggest that it is in tech companies’ best interest to favor privacy, but there are two important quasi-exceptions: Read more
|Categories: Amazon and its cloud, Google, Microsoft and SQL*Server, Surveillance and privacy, Web analytics||2 Comments|
This year, privacy and surveillance issues have been all over the news. The most important, in my opinion, deal with the tension among:
- Personal privacy.
- General law enforcement.
More precisely, I’d say that those are the most important in Western democracies. The biggest deal worldwide may be China’s movement towards an ever-more-Orwellian surveillance state.
The main examples on my mind — each covered in a companion post — are:
- The Apple/FBI conflict(s) about locked iPhones.
- The NSA’s propensity to share data with civilian law enforcement.
Legislators’ thinking about these issues, at least in the US, seems to be confused but relatively nonpartisan. Support for these assertions includes:
- The recent unanimous passage in the US House of Representatives of a law restricting police access to email.
- An absurd anti-encryption bill proposed in the US Senate.
- The infrequent mention of privacy/surveillance issues in the current election campaign.
I do think we are in for a spate of law- and rule-making, especially in the US. Bounds on the possible outcomes likely include: Read more
- I’ve suggested in the past that multi-data-center capabilities are important for “data sovereignty”/geo-compliance.
- The need for geo-compliance just got a lot stronger, with the abolition of the European Union’s Safe Harbour rule for the US. If you collect data in multiple countries, you should be at least thinking about geo-compliance.
- Cassandra is an established leader in multi-data-center operation.
But when I made that connection and checked in accordingly with my client Patrick McFadin at DataStax, I discovered that I’d been a little confused about how multi-data-center Cassandra works. The basic idea holds water, but the details are not quite what I was envisioning.
The story starts:
- Cassandra groups nodes into logical “data centers” (i.e. token rings).
- As a best practice, each physical data center can contain one or more logical data center, but not vice-versa.
- There are two levels of replication — within a single logical data center, and between logical data centers.
- Replication within a single data center is planned in the usual way, with the principal data center holding a database likely to have a replication factor of 3.
- However, copies of the database held elsewhere may have different replication factors …
- … and can indeed have different replication factors for different parts of the database.
In particular, a remote replication factor for Cassandra can = 0. When that happens, then you have data sitting in one geographical location that is absent from another geographical location; i.e., you can be in compliance with laws forbidding the export of certain data. To be clear (and this contradicts what I previously believed and hence also implied in this blog):
- General multi-data-center operation is not what gives you geo-compliance, because the default case is that the whole database is replicated to each data center.
- Instead, you get that effect by tweaking your specific replication settings.
|Categories: Cassandra, Clustering, DataStax, HBase, NoSQL, Open source, Specific users, Surveillance and privacy||3 Comments|
1. European Union data sovereignty laws have long had a “Safe Harbour” rule stating it was OK to ship data to the US. Per the case Maximilian Schrems v Data Protection Commissioner, this rule is now held to be invalid. Angst has ensued, and rightly so.
The core technical issues are roughly:
- Data is usually in one logical database. Data may be replicated locally, for availability and performance. It may be replicated remotely, for availability, disaster recovery, and performance. But it’s still usually logically in one database.
- Now remote geographic partitioning may be required by law. Some technologies (e.g. Cassandra) support that for a single logical database. Some don’t.
- Even under best circumstances, hosting and administrative costs are likely to be higher when a database is split across more geographies (especially when the count is increased from 1 to 2).
Facebook’s estimate of billions of dollars in added costs is not easy to refute.
My next set of technical thoughts starts: Read more
It is extremely difficult to succeed with SaaS (Software as a Service) and packaged software in the same company. There were a few vendors who seemed to pull it off in the 1970s and 1980s, generally industry-specific application suite vendors. But it’s hard to think of more recent examples — unless you have more confidence than I do in what behemoth software vendors say about their SaaS/”cloud” businesses.
Despite the cautionary evidence, I’m going to argue that SaaS and software can and often should be combined. The “should” part is pretty obvious, with reasons that start:
- Some customers are clearly better off with SaaS. (E.g., for simplicity.)
- Some customers are clearly better off with on-premises software. (E.g., to protect data privacy.)
- On-premises customers want to know they have a path to the cloud.
- Off-premises customers want the possibility of leaving their SaaS vendor’s servers.
- SaaS can be great for testing, learning or otherwise adopting software that will eventually be operated in-house.
- Marketing and sales efforts for SaaS and packaged versions can be synergistic.
- The basic value proposition, competitive differentiation, etc. should be the same, irrespective of delivery details.
- In some cases, SaaS can be the lower cost/lower commitment option, while packaged product can be the high end or upsell.
- An ideal sales force has both inside/low-end and bag-carrying/high-end components.
But the “how” of combining SaaS and traditional software is harder. Let’s review why. Read more