Last November, I wrote two posts on agile predictive analytics. It’s time to return to the subject. I’m used to KXEN talking about the ability to do predictive modeling very quickly, perhaps without professional statisticians; that’s the core of what KXEN does. But I was surprised when Revolution Analytics told me a similar story, based on a different approach, because ordinarily that’s not how R is used at all.
Ultimately, there seem to be three reasons why you’d want quick turnaround on your predictive modeling:
- You want to change your models quickly as your world changes (your products change, your competition does different things, your customers have different interests, etc.).
- You like the general benefits of agility (faster, cheaper, more responsive). Or in particular, …
- … you want to model differently for different segments of your relevant universe (sets of customers, sets of products, sets of financial securities, etc.), so any one model had better be easy to build.
A KXEN story along these lines might go:
- A retailer has 100s or 1000s of stores, each of which sells a lot of items.
- A single model that covered all the stores would be horrifically complex.
- Better to run a separate model for each store.
The point here is that KXEN automates some modeling steps that are manual with most other tools, allowing each individual model to be built more quickly.
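For illustration only, here’s a minimal sketch of the “one model per store” idea in Python with scikit-learn. This is not how KXEN works; the column names, features, and model choice are all assumptions of mine.

```python
# One small model per store, rather than one huge model spanning all stores.
# Column names ("store_id", "price", "promo_flag", "units_sold") are hypothetical.
import pandas as pd
from sklearn.linear_model import LinearRegression

def fit_per_store_models(sales: pd.DataFrame) -> dict:
    """Return a dict mapping each store_id to its own fitted model."""
    feature_cols = ["price", "promo_flag"]  # assumed predictors
    models = {}
    for store_id, store_df in sales.groupby("store_id"):
        model = LinearRegression()
        model.fit(store_df[feature_cols], store_df["units_sold"])
        models[store_id] = model
    return models
```

The design point is that each per-store model stays simple enough to be refit automatically whenever that store’s data changes.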
One production Revolution use case goes:
- A large stockbroker has 100s of equities traders.
- At any time, what a trader wants to model might be governed by a particular customer’s interests — what exactly their objective function is, which particular stocks they want to look at, etc.
- An app was built to let traders re-run the models fresh each time, with a convenient UI that allows parameterized inputs of ticker symbols and risk objectives.
R (in Revolution’s version or any other that I know of) doesn’t have KXEN’s general quick-modeling features, and perhaps not even those of SAS or SPSS. But building a specific parameterized app is obviously a workaround for that lack.
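To make that concrete, here’s a generic sketch in Python of what “re-run the model fresh each time, with parameterized inputs” might look like. The real app was built on Revolution’s R stack; the objective labels, lagged-return features, and model below are my own assumptions, not a description of that app.

```python
# Refit a fresh model on demand for whichever tickers and objective a trader
# specifies. Data source, objective labels, and model choice are hypothetical.
import pandas as pd
from sklearn.linear_model import Ridge

def rerun_models(tickers: list[str], risk_objective: str,
                 daily_returns: pd.DataFrame) -> dict:
    """daily_returns: one column of daily returns per ticker (assumed input).
    risk_objective: hypothetical label, e.g. "volatility" or "return"."""
    models = {}
    for t in tickers:
        r = daily_returns[t].dropna()
        # The prediction target depends on the trader's stated objective.
        if risk_objective == "volatility":
            y = r.rolling(20).std().shift(-1)   # predict near-term volatility
        else:
            y = r.shift(-1)                     # predict next-day return
        lags = pd.concat([r.shift(i) for i in range(1, 6)], axis=1)
        lags.columns = [f"lag_{i}" for i in range(1, 6)]
        frame = pd.concat([lags, y.rename("y")], axis=1).dropna()
        model = Ridge(alpha=1.0)
        model.fit(frame[lags.columns], frame["y"])
        models[t] = model
    return models
```

A thin UI layer on top of something like this (a form for ticker symbols and an objective dropdown) is what turns “re-run the model from scratch” from a statistician’s chore into a button a trader can press.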
That said, there are indeed a lot of cases where you need to re-run your models from scratch, whether through convenient technology or by throwing lots of bodies at the problem. Suppose, for example, you’re doing some kind of marketing campaign management for a telecom service provider. Potential changes to your data, or to its interpretation, include:
- Your service plan changes.
- Your competitors’ service plans change.
- You or your competitor embarks on a major new advertising campaign.
- New hardware comes out.
- New hardware doesn’t come out for a little while, and the market shifts away from early adopters.
- You figure out a better way of explaining things to a confused subset of your customers, happily changing their perceptions.
- You do some clever analysis, and subcategorize what you’d previously regarded as one homogeneous set of consumers.
- You change your website a little bit, and hence have new kinds of clickstream data.
- You improve efficiency in your call center, and hence have different kinds of interactions with callers.
- You run a new marketing program, and hence have new kinds of response data.
- You up your text analytics or social media game somewhat, and hence have new kinds of sentiment or affinity data.
Any of these changes (and that’s hardly a complete list) could invalidate your existing models, or otherwise make it advantageous for you to run new ones.
Of course, “from scratch” is not necessarily entirely from scratch; while each model may be new, the underlying database is likely to change more slowly. It’s hard to do quick-turnaround predictive modeling unless you start out with a database that’s in good shape — even when one of the reasons for the quick-turnaround need is that you keep adding new kinds of data.
One last note — little of this is in the vein of “BI has told us something interesting; now let’s start modeling.” The step from operational/monitoring business intelligence to drilldown/investigative BI happens all the time, but I’m not aware of many cases (yet) where there’s a follow-on step of quick-turnaround predictive modeling. Even when modeling is done quickly, it seems to be proactive much more than reactive — or if it is reactive, it’s reactive to big news (stock market crash, natural disaster, whatever) rather than to, say, a few surprising sales results.
The time may (and should) come when iterative investigative BI and iterative predictive analytics go hand in hand, but — presumably with a few exceptions I’m overlooking — that natural-seeming synergy doesn’t seem to be exploited much today.