
Notes on predictive modeling, October 10, 2014

As planned, I’m getting more active in predictive modeling. Anyhow …

1. I still believe most of what I said in a July, 2013 predictive modeling catch-all post. However, I’ve subsequently heard less about Ayasdi than I expected to.

2. The most controversial part of that post was probably the claim:

I think the predictive modeling state of the art has become:

  • Cluster in some way.
  • Model separately on each cluster.
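That two-step recipe can be sketched in a few lines. The following is illustrative only — scikit-learn, synthetic data, and KMeans plus per-cluster linear regression stand in for whatever clustering and per-segment models one would actually use:

```python
# Sketch of "cluster in some way, then model separately on each cluster".
# Assumes scikit-learn; the data and model choices are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Synthetic target whose slope differs by latent segment.
segment = (X[:, 0] > 0).astype(int)
y = np.where(segment == 1, 2.0 * X[:, 1], -1.0 * X[:, 1]) \
    + rng.normal(scale=0.1, size=500)

# Step 1: cluster in some way.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Step 2: fit a separate model on each cluster.
models = {}
for c in range(km.n_clusters):
    mask = km.labels_ == c
    models[c] = LinearRegression().fit(X[mask], y[mask])

def predict(X_new):
    """Route each new point to its cluster's model."""
    labels = km.predict(X_new)
    out = np.empty(len(X_new))
    for c, m in models.items():
        mask = labels == c
        if mask.any():
            out[mask] = m.predict(X_new[mask])
    return out
```

In practice the clustering step and the per-cluster models can each be arbitrarily fancy; the point is only the two-step structure.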

In particular:

3. Nutonian is now a client. I just had my first meeting with them this week. To a first approximation, they’re somewhat like KXEN (sophisticated math, non-linear models, ease of modeling, quasi-automagic feature selection), but with differences that start:

With all those possibilities, what do Nutonian models actually wind up looking like? In internet/log analysis/whatever kinds of use cases, I gather that:

Nutonian also serves real scientists, however, and their models can be all over the place.

4. One set of predictive modeling complexities goes something like this:

I pushed the Nutonian folks to brainstorm with me about why one would want to exclude variables, and quite a few kinds of reasons came up, including:
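Whatever the business reasons, one common mechanical way to exclude variables — a hypothetical illustration here, not anything Nutonian described — is L1 (lasso) regularization, which drives many coefficients to exactly zero:

```python
# Sketch: automatic variable exclusion via L1 (lasso) regularization.
# Illustrative only; assumes scikit-learn and synthetic data.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 50))          # 50 candidate variables
# Only the first two variables actually matter.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
kept = np.flatnonzero(lasso.coef_)      # variables the model retains
```

Of course, this only covers statistical reasons for exclusion; regulatory, cost, or data-availability reasons have to be handled outside the model.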

5. I’m not actually seeing much support for the theory that Julia will replace R except perhaps from Revolution Analytics, the company most identified with R. Go figure.

6. And finally, I don’t think it’s wholly sunk in among predictive modeling folks that Spark both:


9 Responses to “Notes on predictive modeling, October 10, 2014”

  1. Thomas W. Dinsmore on October 22nd, 2014 11:45 am


    While it is usually possible to build a single predictive model for a population, you are always better off segmenting first and building separate models for each segment. Since it takes more time and effort to do that, it’s a tradeoff between model accuracy and cost. Tooling that automates the process changes that tradeoff in favor of the segmented approach.

    The people who favor a single-model approach tend to be academics. The “correct” way to model segment differences is by modeling interactions and cross-effects, but these are much harder for business users to interpret than a distinct model.

    None of this is news. Segmented models have been standard practice in credit risk for twenty years.

    Re point #4, you are off by an order of magnitude. Customers I’ve worked with recently routinely model with tens of thousands of variables; a health insurer models with a half-million. High-dimension data requires different methods, techniques and algorithms.



  2. Curt Monash on October 23rd, 2014 11:56 pm

    Thomas (or anybody else):

    Suppose you segment, with a different model for each segment. Then formally that would be equivalent to a single model, which is a weighted sum of the segment models, in which

    A. The weights sum to 1.
    B. Each weight is either 0 or 1.

    Question: Wouldn’t there sometimes be a way to relax Constraint B and thereby get a better model?
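One concrete way to relax Constraint B, sketched here as an assumption rather than anybody’s shipping product: replace hard cluster assignment with soft membership probabilities (e.g. from a Gaussian mixture) and blend the segment models with those probabilities as weights, which still sum to 1 but need not be 0 or 1:

```python
# Sketch: relax the 0/1 segment weights into soft mixture weights.
# Assumes scikit-learn; data and model choices are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
y = np.where(X[:, 0] > 0, 3.0, -3.0) * X[:, 1] \
    + rng.normal(scale=0.1, size=400)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
resp = gmm.predict_proba(X)            # soft weights; each row sums to 1

# Fit each "segment" model with sample weights instead of a hard subset.
models = [LinearRegression().fit(X, y, sample_weight=resp[:, k])
          for k in range(2)]

def predict_soft(X_new):
    # Constraint A only: weights in [0, 1] summing to 1.
    w = gmm.predict_proba(X_new)
    preds = np.column_stack([m.predict(X_new) for m in models])
    return (w * preds).sum(axis=1)

def predict_hard(X_new):
    # Constraints A and B: each weight forced to 0 or 1 (argmax).
    k = gmm.predict(X_new)
    preds = np.column_stack([m.predict(X_new) for m in models])
    return preds[np.arange(len(X_new)), k]
```

The soft version is essentially a mixture-of-experts model; whether it beats the hard version presumably depends on how sharp the segment boundaries really are.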

  3. Thomas W. Dinsmore on October 25th, 2014 10:20 am


    Your question seems to be based on a misunderstanding. The formal equivalent of a segmented model (separate models for each segment) is a model in which there is a class variable (“segment”) and significant interaction effects between the class variable and at least one other main effect in the model.

    Suppose we want to model the academic success of people who fall into two groups, nerds and jocks. We have a single predictor, hours of study, and we want to capture the fact that jocks have to spend more time studying than nerds to get the same grades.

    We can segment the population and build separate models for jocks and nerds. Alternatively, we can build a single model with one continuous main effect (“hours of study”), one class variable (“jock/nerd”), and an interaction effect between “jock/nerd” and “hours of study”.

    Both approaches will produce similar predictions, and both will outperform a single model that includes “hours of study” alone.
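The nerds/jocks example can be checked directly. The sketch below (made-up data, scikit-learn assumed) fits both versions; with the class variable and the full interaction included, the single model reproduces the per-segment fits essentially exactly:

```python
# Segmented models vs. one model with a class variable + interaction.
# Made-up data; assumes scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 300
hours = rng.uniform(0, 20, size=n)
jock = rng.integers(0, 2, size=n)            # class variable: 1 = jock
# Jocks need more study hours per grade point: smaller slope.
grade = np.where(jock == 1, 2.0, 4.0) * hours \
    + rng.normal(scale=1.0, size=n)

# Approach 1: segment first, separate model per segment.
seg_models = {g: LinearRegression().fit(hours[jock == g].reshape(-1, 1),
                                        grade[jock == g])
              for g in (0, 1)}

# Approach 2: single model with main effects plus interaction.
X = np.column_stack([hours, jock, hours * jock])
single = LinearRegression().fit(X, grade)

# Predictions from the two approaches for a jock studying 10 hours.
h, g = 10.0, 1
p_seg = seg_models[g].predict([[h]])[0]
p_one = single.predict([[h, g, h * g]])[0]
```

With one binary class variable and the interaction fully specified, the two approaches are algebraically equivalent; the tradeoffs Dinsmore describes kick in when there are many segments and many candidate interactions.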

    Of course, in a simple example like this the single model is fine. In a commercial application, the number of possible interactions among main effects is very large.

    Also, try explaining “interaction effect” to a CFO.

    There are other practical reasons to segment first that are out of scope for a single comment.



  4. Notes on predictive modeling, November 2, 2014 | DBMS 2 : DataBase Management System Services on November 2nd, 2014 6:49 am

    […] up on my notes on predictive modeling post from three weeks ago, I’d like to tackle some areas of recurring […]

  5. Where the innovation is | DBMS 2 : DataBase Management System Services on January 19th, 2015 3:28 am

    […] (more) complex models that are at once more accurate and more easily arrived at than (nearly) linear […]

  6. Notes on machine-generated data, year-end 2014 | DBMS 2 : DataBase Management System Services on September 23rd, 2015 11:26 am

    […] Nutonian has some innovative ideas in non-linear modeling for pattern detection/root-cause analysis. […]

  7. Analyzing the right data | DBMS 2 : DataBase Management System Services on April 13th, 2017 8:05 am

    […] I wrote 2 1/2 years ago that sophisticated predictive modeling commonly fit the […]

