August 4, 2013

Data model churn

Perhaps we should remind ourselves of the many ways data models can be caused to churn. Here are some examples that are top-of-mind for me. They do overlap a lot — and the whole discussion overlaps with my post about schema complexity last January, and more generally with what I’ve written about dynamic schemas for the past several years.

Just to confuse things further — some of these examples show the importance of RDBMS, while others highlight the relational model’s limitations.

The old standbys

Product and service changes. Simple changes to your product line may not require any changes to the databases recording their production and sale. More complex product changes, however, probably will, as sketched below.

A big help in MCI’s rise in the 1980s was its new Friends and Family service offering. AT&T couldn’t respond quickly, because it couldn’t get the programming done, where by “programming” I mainly mean database integration and design. If all that was before your time, this link seems like a fairly contemporaneous case study.
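To make that concrete, here is a minimal sketch in Python with SQLite of the difference between a product change that leaves the schema alone and one that forces new structure. The tables and the Friends-and-Family-style calling-circle plan are invented for illustration, not taken from any real billing system.

import sqlite3

# Toy schema: one table is enough to record a simple per-minute service.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price_cents INTEGER)")
cur.execute("INSERT INTO product (name, price_cents) VALUES ('Long distance, per minute', 25)")

# A simple product change -- a price cut -- touches data, not schema.
cur.execute("UPDATE product SET price_cents = 20 WHERE name = 'Long distance, per minute'")

# A Friends-and-Family-style plan, by contrast, makes discounts depend on
# relationships among customers, which the old schema cannot express;
# the data model itself has to churn.
cur.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE calling_circle (id INTEGER PRIMARY KEY, owner_id INTEGER REFERENCES customer(id));
    CREATE TABLE circle_member (
        circle_id INTEGER REFERENCES calling_circle(id),
        member_id INTEGER REFERENCES customer(id),
        discount_pct INTEGER
    );
""")
conn.commit()

The point is that the second change can't be absorbed by editing rows; it requires new tables and new joins, which is exactly the kind of database design and integration work AT&T reportedly couldn't turn around quickly.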

Organizational changes. A common source of hassle, especially around databases that support business intelligence or planning/budgeting, is organizational change. Kalido’s whole business was based on accommodating that, last I checked, as were a lot of BI consultants’. Read more

July 31, 2013

“Disruption” in the software industry

I lampoon the word “disruptive” for being badly overused. On the other hand, I often refer to the concept myself. Perhaps I should clarify. 🙂

You probably know that the modern concept of disruption comes from Clayton Christensen, specifically in The Innovator’s Dilemma and its sequel, The Innovator’s Solution. The basic ideas are:

In response (this is the Innovator’s Solution part):

But not all cleverness is “disruption”.

Here are some of the examples that make me think of the whole subject. Read more

July 29, 2013

What our legislators should do about privacy (and aren’t)

I’ve been harping on the grave dangers of surveillance and privacy intrusion. Clearly, something must be done to rein them in. But what?

Well, let’s look at an older and better-understood subject — governmental use of force. Governments, by their very nature, possess tools for tyranny: armies, police forces, and so on. So how do we avoid tyranny? We limit what government is allowed to do with those tools, and we teach our citizens — especially those who serve in government — to obey and enforce the limits.

Those limits can be lumped into two categories:

The story is similar for surveillance technology:

But there’s a big difference between the cases of physical force and surveillance.

Read more

July 29, 2013

Very chilling effects

I’ve worried for years about a terrible and under-appreciated danger of privacy intrusion, which in a recent post I characterized as a chilling effect upon the exercise of ordinary freedoms. When government — or an organization such as your employer, your insurer, etc. — watches you closely, it can be dangerous to deviate from the norm. Even the slightest non-conformity could have serious consequences. I wish that were an exaggeration; let’s explore why it isn’t.

Possible difficulties — most of them a little bit futuristic — include:

Read more

July 23, 2013

Investigative analytics and untrusted code — a quick note

This is probably a good time to disclose that I own a chunk of founders’ stock — no, I didn’t pay cash for it — in LiteStack, the start-up sponsoring ZeroVM.

Jordan Novet posted a survey of Hadoop security, and evidently Merv Adrian is making a big deal about the subject as well. But there’s one point I rarely see mentioned which, come to think of it, could apply to relational analytic platforms as well.

A big use of Hadoop and analytic platforms alike is investigative analytics, and specifically experimentation via hastily-written code. But untrusted code can, at least in theory, compromise the security of the servers it runs on. And when you run the code on the same servers that manage the data, that could compromise the security of your database as well.

Frankly, in most use cases I doubt this is a big deal. Process isolation would probably avert most “accidental attacks”, and a deliberate attack might be hard to pull off in a reliable manner. As for database corruption, also a theoretical danger via the same vector — that danger is much smaller than the risk of bad code being submitted by well-intentioned doofuses.
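For what it's worth, here is a minimal sketch of the kind of process isolation I have in mind, in Python on a POSIX system. It is a crude OS-level stand-in, not ZeroVM or anything like a real sandbox, and the resource limits and script name are invented for illustration.

import resource
import subprocess
import sys

def run_untrusted(script_path, timeout_s=60):
    """Run an analyst's hastily-written script in its own resource-limited process."""
    def limit_resources():
        # Applied in the child before exec: cap CPU seconds and address space
        # so a runaway or malicious script can't starve the host that also
        # manages the data. Coarse isolation only, not a substitute for a sandbox.
        resource.setrlimit(resource.RLIMIT_CPU, (timeout_s, timeout_s))
        resource.setrlimit(resource.RLIMIT_AS, (2 * 1024**3, 2 * 1024**3))  # 2 GB

    return subprocess.run(
        [sys.executable, script_path],
        preexec_fn=limit_resources,
        capture_output=True,
        timeout=timeout_s,
        text=True,
    )

result = run_untrusted("analyst_experiment.py")  # hypothetical script name
print(result.returncode)

A real deployment would also drop privileges or keep experimentation off the data-managing nodes entirely; the sketch just illustrates that hastily-written analytic code deserves some separation from the processes that own the data.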

Still, I’d like to see a forthright discussion of this threat.

July 20, 2013

The refactoring of everything

I’ll start with three observations:

As written, that’s probably pretty obvious. Even so, it’s easy to forget just how pervasive the refactoring is and is likely to be. Let’s survey some examples first, and then speculate about consequences. Read more

July 12, 2013

More notes on predictive modeling

My July 2 comments on predictive modeling were far from my best work. Let’s try again.

1. Predictive analytics has two very different aspects (a sketch follows these points).

Developing models, aka “modeling”:

More precisely, some modeling algorithms are straightforward to parallelize and/or integrate into RDBMS, but many are not.

Using models, most commonly:

2. Some people think that all a modeler needs are a few basic algorithms. (That’s why, for example, analytic RDBMS vendors are proud of integrating a few specific modeling routines.) Other people think that’s ridiculous. Depending on use case, either group can be right.

3. If adoption of DBMS-integrated modeling is high, I haven’t noticed.
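To make point 1's modeling-versus-scoring split concrete, here is a minimal sketch assuming scikit-learn and NumPy are available; the synthetic data and the choice of logistic regression are purely illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# "Modeling": fitting is the compute-heavy step, and the one that is often
# awkward to parallelize or push down into an RDBMS.
X_train = rng.normal(size=(10_000, 5))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

# "Scoring": applying the fitted model to new rows. Each row is scored
# independently, which is why scoring is comparatively easy to parallelize
# or to embed near the data as a simple function of a few coefficients.
X_new = rng.normal(size=(1_000, 5))
scores = model.predict_proba(X_new)[:, 1]
print(scores[:5])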

Read more

July 8, 2013

Privacy and data use — the problem of chilling effects

This is the second of a two-part series on the theory of information privacy. In the first post, I review the theory to date, and outline what I regard as a huge and crucial gap. In the second post, I try to fill that chasm.

The first post in this two-part series:

Actually, it’s easy to name specific harms from privacy loss. A list might start:

I expect that few people in, say, the United States will suffer these harms in the near future, at least the more severe ones. However, the story gets worse, because we don’t know which disclosures will have which adverse effects. For example, Read more

July 8, 2013

Privacy and data use — a gap in the theory

This is the first of a two-part series on the theory of information privacy. In the first post, I review the theory to date, and outline what I regard as a huge and crucial gap. In the second post, I try to fill that chasm.

Discussion of information privacy has exploded, spurred by increasing awareness of data’s collection and use. Confusion reigns, however, for reasons such as:

Let’s address the last point.  Read more

July 2, 2013

Notes and comments, July 2, 2013

I’m not having a productive week, part of the reason being a hard drive crash that took out early drafts of what were to be last weekend’s blog posts. Now I’m operating from a laptop, rather than my preferred dual-monitor set-up. So please pardon me if I’m concise even by comparison to my usual standards.

*Basic and unavoidable ETL (Extract/Transform/Load) of course excepted.

**I could call that ABC (Always Be Comparing) or ABT (Always Be Testing), but they each sound like – well, like The Glove and the Lions.
