July 12, 2013

More notes on predictive modeling

My July 2 comments on predictive modeling were far from my best work. Let’s try again.

1. Predictive analytics has two very different aspects.

Developing models, aka “modeling”:

More precisely, some modeling algorithms are straightforward to parallelize and/or integrate into RDBMS, but many are not.

Using models, most commonly:

2. Some people think that all a modeler needs are a few basic algorithms. (That’s why, for example, analytic RDBMS vendors are proud of integrating a few specific modeling routines.) Other people think that’s ridiculous. Depending on use case, either group can be right.

3. If adoption of DBMS-integrated modeling is high, I haven’t noticed.

4. The term predictive analytics was invented or at least popularized by SPSS, some years before IBM bought the company. The industry eventually adopted the term. I prefer predictive modeling. It is fair to say that predictive modeling subsumes both statistical modeling and machine learning.

Nobody really knows exactly what data mining does or doesn’t include (the term is a poster child for Monash’s Third Law), but whatever it is, it seems central to the SAS and SPSS product lines. Simply using “data mining” as a synonym for “predictive modeling” won’t lead you too far astray.

5. In that July 2 post I wrote:

I think the predictive modeling state of the art has become:

  • Cluster in some way.
  • Model separately on each cluster.

“Cluster in some way” can actually mean several things, for example:

The one thing it doesn’t mean is “scale out”, and I apologize for the ambiguity to whoever read it the wrong way.

This is often called ensemble modeling, except that I think — what a shock! — different people use the term somewhat differently.
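
To make the “cluster in some way, then model separately on each cluster” idea concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration rather than anybody’s actual product or method: the clustering is a toy one-dimensional k-means, the per-cluster model is a simple least-squares line, and the function names (kmeans_1d, fit_line, fit_per_cluster, predict) and the data are made up.

```python
from statistics import mean


def kmeans_1d(values, centers, iterations=20):
    """Toy 1-D k-means (an illustrative choice of clustering method).

    Returns a cluster label for each value and the final centers.
    """
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # Move each center to the mean of its members (leave it if empty).
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    return labels, centers


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, y_bar - slope * x_bar


def fit_per_cluster(xs, ys, initial_centers):
    """Cluster on x, then fit a separate line inside each cluster."""
    labels, centers = kmeans_1d(xs, initial_centers)
    models = {}
    for k in range(len(centers)):
        cx = [x for x, lab in zip(xs, labels) if lab == k]
        cy = [y for y, lab in zip(ys, labels) if lab == k]
        if cx:
            models[k] = fit_line(cx, cy)
    return centers, models


def predict(x, centers, models):
    """Score a new x with the model of the nearest cluster that has one."""
    k = min(models, key=lambda j: abs(x - centers[j]))
    slope, intercept = models[k]
    return slope * x + intercept


# Hypothetical data: two groups that follow very different relationships.
xs = [1.0, 2.0, 3.0, 4.0, 11.0, 12.0, 13.0, 14.0]
ys = [2.0, 4.0, 6.0, 8.0, 39.0, 38.0, 37.0, 36.0]

centers, models = fit_per_cluster(xs, ys, initial_centers=[0.0, 10.0])
print(models)
print(predict(2.5, centers, models), predict(12.0, centers, models))
```

A single line fit to all eight points would describe neither group well; fitting one line per cluster recovers an upward trend in the first group and a downward one in the second. Real work would of course involve more variables, a more careful choice of clustering, and richer per-cluster models.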

6. Much of the difficulty and delay-to-value in predictive modeling comes from data preparation/feature selection — not so much the scripting of the ETL (Extract/Transform/Load), but rather choices about which variables to model on and, often, how to describe them. So it’s unsurprising that vendors sometimes tell me “Our tool is great because the data preparation is automagically handled”; I’ve heard that from companies as big as KXEN and as small as Simularity.

Typically, what’s going on is that they’ve come up with a particular approach to modeling that, among other virtues, has the short-time-to-value benefit. Well:

I think some KXEN users follow just such an approach.

7. I’ve spent a few hours talking with Ayasdi, and I’m still confused. But here are a few notes as best I understand things.

Company basics include:

Buzz says Ayasdi has a heavy component of professional service in what it does. Ayasdi disputes this. Buzz also says Ayasdi is hot. I doubt Ayasdi disputes that part. :)

There’s some serious math involved in Ayasdi, but I’m skeptical about that aspect, for several reasons:

So I’ll just summarize Ayasdi’s math, as best I understand it, this way:

8. I’m hearing a few more mentions of Mahout than I used to.

9. Skytree is accumulating some resources (money, people), but I haven’t talked with them.
