# Notes on predictive modeling, October 10, 2014

As planned, I’m getting more active in predictive modeling. Anyhow …

1. I still believe most of what I said in a July 2013 predictive modeling catch-all post. However, I haven’t heard as much subsequently about **Ayasdi** as I had expected to.

2. The most controversial part of that post was probably the claim:

I think the predictive modeling state of the art has become:

- Cluster in some way.
- Model separately on each cluster.

In particular:

- Formally, it is always possible to go with a single model instead.
- A lot of people think accuracy, ease-of-use, or both are better served by a true single-model approach.
- Conversely, if you have a single model that’s pretty good, it’s natural to look at the subset of the data for which it works poorly and examine that first. Voila! You’ve just done a kind of clustering.
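The two-step recipe above can be sketched in a few lines. Everything here — the tiny 1-D k-means, the per-cluster least-squares fits, the synthetic two-regime data — is an illustrative toy under my own assumptions, not anyone's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two regimes with different slopes, separated in x.
x0 = rng.uniform(0.0, 4.0, 100)
x1 = rng.uniform(6.0, 10.0, 100)
x = np.concatenate([x0, x1])
y = np.concatenate([-2.0 * x0, 3.0 * x1]) + rng.normal(0.0, 0.1, 200)

# Step 1: cluster in some way (here, a minimal 1-D k-means with k=2).
centers = np.array([x.min(), x.max()])
for _ in range(20):
    labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([x[labels == k].mean() for k in range(2)])

# Step 2: model separately on each cluster (ordinary least squares per cluster).
coefs = {}
for k in range(2):
    xk, yk = x[labels == k], y[labels == k]
    A = np.column_stack([xk, np.ones_like(xk)])
    coefs[k], *_ = np.linalg.lstsq(A, yk, rcond=None)

def predict(x_new):
    """Route each new point through its nearest cluster's model."""
    x_new = np.asarray(x_new, dtype=float)
    lab = np.abs(x_new[:, None] - centers[None, :]).argmin(axis=1)
    return np.array([coefs[k][0] * xi + coefs[k][1] for k, xi in zip(lab, x_new)])
```

A single global line fit to this data would do badly on both regimes; examining where it fails would, per the last bullet, rediscover the two clusters.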

3. **Nutonian** is now a client. I just had my first meeting with them this week. To a first approximation, they’re somewhat like KXEN (sophisticated math, non-linear models, ease of modeling, quasi-automagic feature selection), but with differences that start:

- While KXEN was distinguished by how limited its choice of model templates was, Nutonian is distinguished by its remarkable breadth. Is the best model for your data a quadratic polynomial in which some of the terms are trigonometric functions? Nutonian is happy to find that for you.
- Nutonian is starting out as a SaaS (Software as a Service) vendor.
- A big part of Nutonian’s goal is to find a simple/parsimonious model, because — although this is my phrasing rather than theirs — the simpler the model, the more likely it is to have robust explanatory power.

With all those possibilities, what do Nutonian models actually wind up looking like? In internet/log analysis/whatever kinds of use cases, I gather that:

- The model is likely to be a polynomial — of multiple variables of course — of degree no more than 3 or 4.
- Variables can have time delays built into them (e.g., sales today depend on email sent 2 weeks ago). Indeed, some of Nutonian’s flashiest early modeling successes seem to be based around the ease with which they capture time-delayed causality.
- In each monomial, all variables except one are likely to be “control”/“capping”/“transition-point”/“on-off switch”/logical/conditional/whatever variables, i.e., variables whose range is {0,1} or perhaps [0,1].

Nutonian also serves real scientists, however, and their models can be all over the place.
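To make that model shape concrete, here is a toy version: a lagged email variable, a {0,1} switch variable gating the monomials, and a polynomial of low degree. All names, lags, and coefficients are invented; the point is only that once the monomial columns are built, fitting such a form reduces to ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 120

email = rng.uniform(0.0, 1.0, T)                   # email volume sent each day
promo_on = (np.arange(T) % 30 < 10).astype(float)  # a {0,1} "switch" variable

# A time-delayed feature: email sent 14 days ago.
LAG = 14
email_lag = np.concatenate([np.zeros(LAG), email[:-LAG]])

# Simulated sales with the described structure: a low-degree polynomial
# in which the {0,1} variable gates the lagged term's monomials.
true_a, true_b, true_c = 5.0, -2.0, 1.0
sales = (promo_on * (true_a * email_lag + true_b * email_lag**2)
         + true_c + rng.normal(0.0, 0.05, T))

# Linear in its coefficients once the monomial columns exist,
# so ordinary least squares recovers them.
A = np.column_stack([promo_on * email_lag, promo_on * email_lag**2, np.ones(T)])
coef, *_ = np.linalg.lstsq(A, sales, rcond=None)
```

The recovered `coef` should land near (5, -2, 1), up to noise.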

4. One set of predictive modeling complexities goes something like this:

- A modeling exercise may have 100s or 1000s of potential variables to work with. (For simplicity, think of a potential variable as a column or field in the input data.)
- The winning models are likely to use only a small fraction of these variables.
- Those may not be variables you’re thrilled about using.
- Fortunately, many variables have strong covariances with each other, so it’s often possible to exclude your disfavored variables and come out with a model almost as good.
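The covariance point can be illustrated with a toy: a disfavored variable that truly drives the outcome, and a strongly correlated proxy that recovers almost all of the fit without it. All numbers here are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# x_disfavored is a variable we'd rather not use; x_proxy covaries with it.
x_disfavored = rng.normal(0.0, 1.0, n)
x_proxy = x_disfavored + rng.normal(0.0, 0.2, n)   # correlation ~0.98
y = 2.0 * x_disfavored + rng.normal(0.0, 0.5, n)

def r_squared(x_col, y):
    """R^2 of a one-variable OLS fit with intercept."""
    A = np.column_stack([x_col, np.ones(n)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

r2_with = r_squared(x_disfavored, y)   # model using the disfavored variable
r2_without = r_squared(x_proxy, y)     # model using only the proxy
```

With correlation this strong, `r2_without` comes out only a few points below `r2_with` — a model "almost as good" despite excluding the disfavored variable.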

I pushed the Nutonian folks to brainstorm with me about why one would want to exclude variables, and quite a few kinds of reasons came up, including:

- (My top example.) Regulatory compliance may force you to exclude certain variables. E.g., credit scores in the US mustn’t be based on race.
- (Their top example.) Some data is just expensive to get. E.g., a life insurer would like to come up with a way to avoid using blood test results in their decision making, because they’d like to drop the expense of the blood tests.
- (Perhaps our joint other top example.) Clarity of explanation is an important goal. Some models are black boxes, and that’s that. Others are also supposed to uncover causality that helps humans make all kinds of better decisions. Regulators may also want clear models. Note: Model clarity can be affected by model structure and variable choice alike.
- Certain variables can simply be more or less trusted, in terms of the accuracy of the data.
- Certain variables can be more or less certain to be available in the future. However, I wonder how big a concern that is in a world where models are frequently retrained anyway.

5. I’m not actually seeing much support for the theory that Julia will replace R except perhaps from Revolution Analytics, the company most identified with R. Go figure.

6. And finally, I don’t think it’s wholly sunk in among predictive modeling folks that Spark both:

- Has great momentum.
- Was designed with machine learning in mind.


### Comments

**9 Responses to “Notes on predictive modeling, October 10, 2014”**


Curt,

While it is usually possible to build a single predictive model for a population, you are always better off segmenting first and building separate models for each segment. Since it takes more time and effort to do that, it’s a tradeoff between model accuracy and cost. Tooling that automates the process changes that tradeoff in favor of the segmented approach.

The people who favor a single-model approach tend to be academics. The “correct” way to model segment differences is by modeling interactions and cross-effects, but these are much harder for business users to interpret than a distinct model.

None of this is news. Segmented models have been standard practice in credit risk for twenty years.

Re point #4, you are off by an order of magnitude. Customers I’ve worked with recently routinely model with tens of thousands of variables; a health insurer models with a half-million. High-dimension data requires different methods, techniques and algorithms.

Regards,

Thomas

Thomas (or anybody else):

Suppose you segment, with a different model for each segment. Then formally that would be equivalent to a single model, which is a weighted sum of the segment models, in which

A. The weights sum to 1.

B. Each weight is either 0 or 1.

Question: Wouldn’t there sometimes be a way to relax Constraint B and thereby get a better model?
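One made-up sketch of what relaxing Constraint B might look like: two invented segment models and a logistic gate whose sharpness controls how close the weights come to {0,1}.

```python
import numpy as np

# Two hypothetical segment models (coefficients made up for illustration).
def f0(x): return -2.0 * x + 1.0
def f1(x): return 3.0 * x - 5.0

BOUNDARY = 5.0   # points with x > 5 belong to segment 1

def blended(x, sharpness):
    """Weighted sum of the segment models. The weights always sum to 1
    (Constraint A); as sharpness grows, they approach {0, 1} (Constraint B)."""
    w1 = 1.0 / (1.0 + np.exp(-sharpness * (x - BOUNDARY)))
    return (1.0 - w1) * f0(x) + w1 * f1(x)

x = np.array([2.0, 8.0])

# Hard 0/1 weights reproduce the segmented model exactly ...
hard = np.where(x > BOUNDARY, f1(x), f0(x))

# ... while a finite sharpness relaxes Constraint B into a smooth blend.
soft = blended(x, sharpness=1.0)
```

Away from the boundary, a very sharp gate matches the segmented model to many decimal places; a gentle one blends the two segment predictions.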

Curt,

Your question seems to be based on a misunderstanding. The formal equivalent of a segmented model (separate models for each segment) is a model in which there is a class variable (“segment”) and significant interaction effects between the class variable and at least one other main effect in the model.

Suppose we want to model the academic success of people who fall into two groups, nerds and jocks. We have a single predictor, hours of study, and we want to model the effect that jocks have to spend more time studying than nerds to get the same grades.

We can segment the population and build separate models for jocks and nerds. Alternatively, we can build a single model with one continuous main effect (“hours of study”) one class variable (“jock/nerd”) and an interaction effect between “jock/nerd” and “hours of study”.

Both approaches will produce similar predictions, and both will outperform a single model that includes “hours of study” alone.
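That equivalence is easy to check numerically. In this made-up version of the jocks-and-nerds example, the per-segment regressions and the single interaction model produce identical fitted values, because the interaction model is just a reparameterization of the two separate lines:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
hours = rng.uniform(0.0, 20.0, n)
jock = rng.integers(0, 2, n).astype(float)   # 1 = jock, 0 = nerd
# Jocks need more hours for the same grade: a lower grade-per-hour slope.
grade = (np.where(jock == 1.0, 2.0 * hours, 4.0 * hours)
         + 10.0 + rng.normal(0.0, 1.0, n))

def ols_fit_predict(A, y):
    """Ordinary least squares; return fitted values."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

# Segmented approach: a separate regression per group.
seg_pred = np.empty(n)
for g in (0.0, 1.0):
    m = jock == g
    A = np.column_stack([hours[m], np.ones(m.sum())])
    seg_pred[m] = ols_fit_predict(A, grade[m])

# Single model: main effects plus a jock-by-hours interaction.
A_full = np.column_stack([hours, jock, jock * hours, np.ones(n)])
full_pred = ols_fit_predict(A_full, grade)
```

The fitted values agree to floating-point precision.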

Of course, in a simple example like this the single model is fine. In a commercial application, the number of possible interactions among main effects is very large.

Also, try explaining “interaction effect” to a CFO.

There are other practical reasons to segment first that are out of scope for a single comment.

Regards,

Thomas
