Liberty and privacy
Discussion of issues related to liberty and privacy, and especially how they are affected by and interrelated with data management and analytic technologies.
I think that most sufficiently large enterprise SaaS vendors should offer an appliance option, as an alternative to the core multi-tenant service. In particular:
- SaaS appliances address customer fears about security, privacy, compliance, performance isolation, and lock-in.
- Some of these benefits occur even if the appliance runs in the same data centers that host the vendor’s standard multi-tenant SaaS. Most of the rest occur if the customer can choose a co-location facility in which to place the appliance.
- Whether many customers should or will use the SaaS appliance option is somewhat secondary; it’s a check-mark item. I.e., many customers and prospects will be pleased that the option at least exists.
How I reached these conclusions
Core reasons for selling or using SaaS (Software as a Service) as opposed to licensed software start:
- The SaaS vendor handles all software upgrades, and makes them promptly. In principle, this benefit could also be achieved on a dedicated system on customer premises (or at the customer’s choice of co-location facility).
- In addition, the SaaS vendor handles all the platform and operational stuff — hardware, operating system, computer room, etc. This benefit is antithetical to direct customer control.
- The SaaS vendor only has to develop for and operate on a tightly restricted platform stack that it knows very well. This benefit is also enjoyed in the case of customer-premises appliances.
Conceptually, then, customer-premises SaaS is not impossible, even though one of the standard Big Three SaaS benefits is lost. Indeed:
- Microsoft Windows and many other client software packages already offer to let their updates be automagically handled by the vendor.
- In that vein, consumer devices such as game consoles already are a kind of SaaS appliance.
- Complex devices of any kind, including computers, will see ever more in the way of “phone-home” features or optional services, often including routine maintenance and upgrades.
But from an enterprise standpoint, that’s all (relatively) simple stuff. So we’re left with a more challenging question — does customer-premises SaaS make sense in the case of enterprise applications or other server software?
Categories: Data warehouse appliances, HP and Neoview, Liberty and privacy, salesforce.com, Software as a Service (SaaS)
I’ve posted a lot about surveillance and privacy intrusion. Even so, I have a few more things to say.
1. Surveillance and privacy intrusion do, of course, have real benefits. That’s a big part of why I advocate a nuanced approach to privacy regulation. Several of those benefits are mentioned below.
2. Nobody’s opinion about privacy rules should be based on the exact state of surveillance today, for at least two reasons:
- The disclosures keep coming.
- Technology keeps changing.
In particular, people may not realize how comprehensive surveillance will get, due largely to the “internet of things”. The most profound reason — and this will take decades to fully play out — is that we’re headed toward a medical revolution in which our vital signs are more or less continually monitored as we go about our business. Such monitoring will, of course, provide a very detailed record of people’s activities and perhaps even states of mind. Further, vehicle movements will all be tracked and our mobile devices will keep noting our location, in each case for multiple reasons.
Project Hemisphere, as described in the slide deck:
- Stores CDRs (Call Detail Records), many or all of which are collected via …
- … some kind of back door into the AT&T switches that many carriers use. (See Slide 2.)
- Has also included “subscriber information” for AT&T phones since July, 2012.
- Contains “long distance and international” CDRs back to 1987.
- Currently adds 4 billion CDRs per day.
- Is administered by a Federal drug-related law enforcement agency but …
- … is used to combat many non-drug-related crimes as well. (See Slides 21-26.)
Other notes include:
- The agencies specifically mentioned on Slide 16 as making numerous Hemisphere requests are the DEA (Drug Enforcement Administration) and DHS (Department of Homeland Security).
- “Roaming” data giving city/state is mentioned in the deck, but more precise geo-targeting is not.
I’ve never gotten a single consistent figure, but typical CDR size seems to be in the 100s of bytes range. So I conjecture that Project Hemisphere spawned one of the first petabyte-scale databases ever.
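The conjecture follows from simple arithmetic. The sketch below assumes, for illustration only, CDR sizes of 100, 300, and 900 bytes (the post says only "in the 100s of bytes range") and a constant ingest of 4 billion CDRs per day. It ignores compression, indexes, and replication, and it does not try to model ingest rates in earlier years.

```python
# Back-of-the-envelope sizing for Project Hemisphere's CDR store.
# Assumptions (illustrative, not from the slides): CDR sizes of 100, 300,
# and 900 bytes; a constant 4 billion CDRs/day; no compression, index,
# or replication overhead.

CDRS_PER_DAY = 4_000_000_000  # "Currently adds 4 billion CDRs per day"
PETABYTE = 10**15  # decimal petabyte, in bytes


def bytes_per_day(cdr_size_bytes: int) -> int:
    """Raw bytes ingested per day at the assumed CDR size."""
    return CDRS_PER_DAY * cdr_size_bytes


def days_to_petabyte(cdr_size_bytes: int) -> float:
    """Days of ingest at the current rate needed to accumulate one petabyte."""
    return PETABYTE / bytes_per_day(cdr_size_bytes)


if __name__ == "__main__":
    for size in (100, 300, 900):  # hypothetical CDR sizes in bytes
        tb_per_day = bytes_per_day(size) / 10**12
        days = days_to_petabyte(size)
        print(f"{size:>4} B/CDR: {tb_per_day:.1f} TB/day, "
              f"~{days:.0f} days ({days / 365:.1f} years) per petabyte")
```

Even at the low end of that range, the current rate alone would accumulate a petabyte of raw records in under seven years, which is consistent with the conjecture; the actual storage footprint would depend on record format, compression, and indexing that the post doesn't cover.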
Hemisphere Project unknowns start:
Categories: Data warehousing, GIS and geospatial, Liberty and privacy, Petabyte-scale data management, Specific users, Telecommunications
For years I’ve argued three points about privacy intrusions and surveillance:
- Privacy intrusions are a huge threat to liberty. Since the Snowden revelations started last June, this view has become more widely accepted.
- Much of the problem is the very chilling effects they can have upon the exercise of day-to-day freedoms. Fortunately, I’m not as alone in saying that as I once feared. For example, Christopher Slobogin made that point in a recent CNN article, and then pointed me to a paper* citing other people echoing it, including Sonia Sotomayor.
- Liberty can’t be effectively protected just by controls on the collection, storage, or dissemination of data; direct controls are needed on the use of data as well. Use-based data controls are much more robust in the face of technological uncertainty and change than possession-based ones are.
Since that last point is still very much a minority viewpoint,** I’ll argue it one more time below.
I made a remarkably rumpled video appearance yesterday with SiliconAngle honchos John Furrier and Dave Vellante. (Excuses include less than 3 hours of sleep, and then a scrambling reaction to a schedule change.) Topics covered included, with approximate timechecks:
- 0:00 Introductory pabulum, and some technical difficulties
- 2:00 More introduction
- 3:00 Dynamic schemas and data model churn
- 6:00 Surveillance and privacy
- 13:00 Hadoop, especially the distro wars
- 22:00 BI innovation
- 23:30 More on dynamic schemas and data model churn
Edit: Some of my remarks were transcribed.
- I posted on dynamic schemas and data model churn a few days ago.
- I capped off a series on privacy and surveillance a few days ago.
- I commented on various Hadoop distributions in June.
Categories: Business intelligence, ClearStory Data, Data warehousing, Hadoop, Liberty and privacy, MapR, MapReduce
I’ve been harping on the grave dangers of surveillance and privacy intrusion. Clearly, something must be done to rein them in. But what?
Well, let’s look at an older and better-understood subject — governmental use of force. Governments, by their very nature, possess tools for tyranny: armies, police forces, and so on. So how do we avoid tyranny? We limit what government is allowed to do with those tools, and we teach our citizens — especially those who serve in government — to obey and enforce the limits.
Those limits can be lumped into two categories:
- Direct — there are very strong controls as to when and how the government may use force.
- Indirect — there are also controls on how the government can even threaten the use of force. I.e., substantially all laws are ultimately backed up by the threat of governmental force — and there are limits as to which laws may or may not be enacted.
The story is similar for surveillance technology:
- As data gathering and analysis technologies skyrocket in power, they become ever more powerful tools for tyranny.
- Direct controls are called for — there is some surveillance the government is not, and should not be, allowed to do.
- Indirect controls are also necessary — even when it has information, there are ways in which the government should not be allowed to use it.
But there’s a big difference between the cases of physical force and surveillance.
- The direct controls on the use of force are strong; under ordinary circumstances, government is NOT allowed to just go out and shoot somebody.
- The direct controls on surveillance, however, are very weak; government has access to all kinds of information.
I’ve worried for years about a terrible and under-appreciated danger of privacy intrusion, which in a recent post I characterized as a chilling effect upon the exercise of ordinary freedoms. When government — or an organization such as your employer, your insurer, etc. — watches you closely, it can be dangerous to deviate from the norm. Even the slightest non-conformity could have serious consequences. I wish that were an exaggeration; let’s explore why it isn’t.
Possible difficulties — most of them a little bit futuristic — include:
- Being perceived as a potential terrorist or terrorist sympathizer. That’s a biggie, of course, at least in “free” countries. Even getting on the No-Fly List is enough to pretty much shut down your travel, and hence your options in many careers. If you want to avoid such problems, it might be prudent not to:
- Visit certain websites.
- Email, telephone, or otherwise communicate with certain people.
- Use certain words or phrases in email or on the telephone.
- Being regarded as too vehement a political dissenter in general. Political dissent is deadly dangerous in too many countries around the world, and has costs even in “free” countries. (Jacob Appelbaum is one recent US example.) To avoid such problems, there are a whole lot of things you might think twice about writing, saying, or doing, and certain people it’s definitely risky to associate with or write nice things about.
- Not being regarded as an employee who is likely to be loyal, hard-working, and accepting. Think about the difficulties “over-qualified” candidates have getting hired. Then consider what might happen if employers had (accurate or otherwise) psychographic profiles estimating who was most likely to stay at a job, to accept boring job tasks or long hours, or to tolerate sub-standard pay. Then consider how wise it might be to show interest in, for example:
- Other careers.
- Certain hobbies that might be construed as leading to other careers.
- Living in other parts of the country.
- Being perceived as likely to engage in socially-unapproved sexual behavior. In the United States, certain sexual choices — even among consenting adults — could cause problems with discrimination, child custody, or divorce. Elsewhere, your choice of partner could lead to prison or even death. (I don’t know exactly which shopping choices could get one identified as a possible homosexual or philanderer … but just to be on the safe side, you might not want to download any Barbra Streisand songs.)
- Being regarded as a poor health or safety risk for employment, insurance, or more. Do you like fatty foods? Extreme sports? Night clubs? Recreational drugs? Tobacco? More than a little alcohol? Fast cars? Fast women? Evidence of any of those tastes could move you up the risk charts for heart attack, accident, marital dissolution or some other outcome that an employer or insurer wouldn’t like.
This is the second of a two-part series on the theory of information privacy. In the first post, I review the theory to date, and outline what I regard as a huge and crucial gap. In the second post, I try to fill that chasm.
The first post in this two-part series:
- Reviewed the privacy theory of the past 123 years.
- Declared it inadequate to address today’s surveillance and information privacy issues.
- Suggested a reason for its failure — the harms of privacy violation are too rarely spelled out in concrete terms, making it impractical to do even implicit cost-benefit analyses.
Actually, it’s easy to name specific harms from privacy loss. A list might start:
- Being investigated (rightly or wrongly) for a crime, with all the hassle and legal risk that ensues.
- Being discriminated against for employment, credit, or insurance.
- Being embarrassed publicly, or discriminated against socially.
- Being bullied or stalked by deplorable private-citizen acquaintances.
- Being put on the no-fly list.
I expect that few people in, say, the United States will suffer these harms in the near future, at least the more severe ones. However, the story gets worse, because we don’t know which disclosures will have which adverse effects. For example,
This is the first of a two-part series on the theory of information privacy. In the first post, I review the theory to date, and outline what I regard as a huge and crucial gap. In the second post, I try to fill that chasm.
Discussion of information privacy has exploded, spurred by increasing awareness of data’s collection and use. Confusion reigns, however, for reasons such as:
- Data is often collected behind a veil of secrecy. That’s top-of-mind these days, in light of the Snowden/Greenwald revelations.
- Nobody understands all of the various technologies involved. Telecom experts don’t know a lot about data management and analysis, and vice-versa, while the political reporters don’t understand much about technology at all. I think numerous reporting errors have resulted.
- There’s no successful theory explaining when privacy should and shouldn’t be preserved. To put it quite colloquially:
- Big Brother is watching you …
- … and he’s scary.
- Privacy theory focuses on the “watching” part …
- … but the “scary” part is what really needs to be addressed.
Let’s address the last point.
I’m not having a productive week, part of the reason being a hard drive crash that took out early drafts of what were to be last weekend’s blog posts. Now I’m operating from a laptop, rather than my preferred dual-monitor set-up. So please pardon me if I’m concise even by comparison to my usual standards.
- My recent posts based on surveillance news have been partly superseded by – well, by more news. Some of that news, along with some good discussion, may be found in the comment threads.
- The same goes for my recent Hadoop posts.
- The replay for my recent webinar on real-time analytics is now available. My part ran <25 minutes.
- One of my numerous clients using or considering a “real-time analytics” positioning is Sqrrl, the company behind the NoSQL DBMS Accumulo. Last month, Derrick Harris reported on a remarkable Accumulo success story – multiple US intelligence instances managing 10s of petabytes each, and supporting a variety of analytic (I think mainly query/visualization) approaches.
- Several sources have told me that MemSQL’s Zynga sale is (in part) for Membase replacement. This is noteworthy because Zynga was the original pay-for-some-of-the-development Membase customer.
- More generally, the buzz out of Couchbase is distressing. Ex-employees berate the place; job-seekers check around and then decide not to go there; rivals tell me of resumes coming out in droves. Yes, there’s always some of that, even at obviously prospering companies, but this feels like more than the inevitable low-level buzz one hears anywhere.
- I think the predictive modeling state of the art has become:
- Cluster in some way.
- Model separately on each cluster.
- And if you still want to do something that looks like a regression – linear or otherwise – then you might want to use a tool that lets you shovel training data in WITHOUT a whole lot of preparation* and receive a model back out. Even if you don’t accept that as your final model, it can at least be a great guide to feature selection (in the statistical sense of the phrase) and the like.
- Champion/challenger model testing is also a good idea, at least if you’re in some kind of personalization/recommendation space, and have enough traffic to test like that.**
- Most companies have significant turnover after being acquired, perhaps after a “golden handcuff” period. Vertica is no longer an exception.
- Speaking of my clients at HP Vertica – they’ve done a questionable job of communicating that they’re willing to price their product quite reasonably. (But at least they allowed me to write about $2K/terabyte for hardware/software combined.)
- I’m hearing a little more Amazon Redshift buzz than I expected to. Just a little.
- StreamBase was bought by TIBCO. The rumor says $40 million.
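The cluster-then-model approach from the list above can be sketched in a few lines on toy one-dimensional data. Using k-means for the clustering step and ordinary least squares for the per-cluster models is an assumption for illustration; the post says only "cluster in some way" and "model separately on each cluster". All function names here are hypothetical.

```python
from statistics import mean


def kmeans_1d(xs, k, iters=20):
    """Plain k-means on scalar values; returns one cluster label per point."""
    # Seed centroids with evenly spaced values from the sorted data.
    ordered = sorted(xs)
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in xs]
        # Update step: move each centroid to the mean of its members
        # (an emptied cluster keeps its previous centroid).
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # no spread in x: fall back to a flat line
    return my - slope * mx, slope


def cluster_then_model(xs, ys, k):
    """Cluster on x, then fit a separate regression line within each cluster."""
    labels = kmeans_1d(xs, k)
    models = {}
    for c in sorted(set(labels)):
        cx = [x for x, lab in zip(xs, labels) if lab == c]
        cy = [y for y, lab in zip(ys, labels) if lab == c]
        models[c] = fit_line(cx, cy)
    return labels, models


if __name__ == "__main__":
    # Toy data: two segments that follow different linear relationships.
    xs = [1, 2, 3, 4, 11, 12, 13, 14]
    ys = [2, 4, 6, 8, 89, 88, 87, 86]
    labels, models = cluster_then_model(xs, ys, k=2)
    print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
    for c, (intercept, slope) in models.items():
        print(f"cluster {c}: y = {intercept:.1f} + {slope:.1f} * x")
```

A single global line would fit this toy data poorly, while each cluster gets an exact fit. Real use would cluster on many features and would likely lean on a statistics library rather than hand-rolled code.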
*Basic and unavoidable ETL (Extract/Transform/Load) of course excepted.
**I could call that ABC (Always Be Comparing) or ABT (Always Be Testing), but they each sound like – well, like The Glove and the Lions.
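Champion/challenger testing, mentioned above, amounts to a traffic split plus a comparison of outcomes. This minimal sketch assumes a 90/10 split, a binary success metric such as a click, and a one-sided pooled two-proportion z-test at the 5% level for deciding whether to promote the challenger. Those choices and all names are illustrative, not something the post specifies.

```python
import math
import random


def assign_arm(rng, challenger_share=0.1):
    """Route one request: most traffic to the champion, a slice to the challenger."""
    return "challenger" if rng.random() < challenger_share else "champion"


def z_statistic(succ_a, n_a, succ_b, n_b):
    """Pooled two-proportion z statistic for (rate_b - rate_a)."""
    p_a, p_b = succ_a / n_a, succ_b / n_b
    pooled = (succ_a + succ_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se if se > 0 else 0.0


def should_promote(succ_champ, n_champ, succ_chal, n_chal, alpha=0.05):
    """Promote the challenger only if it wins a one-sided test at level alpha."""
    z = z_statistic(succ_champ, n_champ, succ_chal, n_chal)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) under "no difference"
    return p_value < alpha


if __name__ == "__main__":
    # Simulated traffic with hypothetical true success rates for each model.
    rng = random.Random(42)
    true_rate = {"champion": 0.10, "challenger": 0.12}
    trials = {"champion": 0, "challenger": 0}
    wins = {"champion": 0, "challenger": 0}
    for _ in range(50_000):
        arm = assign_arm(rng)
        trials[arm] += 1
        if rng.random() < true_rate[arm]:
            wins[arm] += 1
    print(trials, wins)
    print("promote challenger:", should_promote(
        wins["champion"], trials["champion"], wins["challenger"], trials["challenger"]))
```

The "enough traffic" caveat shows up directly here: with a 10% slice, the challenger needs many requests before a modest lift becomes distinguishable from noise. Repeatedly checking the test as data trickles in also inflates false positives.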