I’m hearing a lot these days about agile predictive analytics, albeit rarely in those exact terms. The general idea is unassailable, in that it boils down to using data as quickly as reasonably possible. But discussing particulars is hard, for several reasons:
- Pundits tend to sketch castles in the air.
- Vendors tend to confuse part of the story — generally the part they happen to offer — with the whole.
- Different use cases give rise to different kinds of issues.
At least three of the generic arguments for agility apply to predictive analytics:
- Doing the correct thing soon is usually better than doing the same correct thing later.
- If something doesn’t take much time to do, hopefully it doesn’t take much expense (labor and so on) either.
- It’s hard to get new stuff completely right on the first try. Often, the best strategy is to come close fast, then fix what’s still not ideal.
But the reasons to want agile predictive analytics don’t stop there.
Not only is it hard to get stuff right on the first try for a given information set, but the available information can also change quickly. For example:
- If you’re a consumer marketer, consumer tastes can change quickly, due to news (of many different kinds), seasonal trends, and so on. The most recent data you have contain information unavailable in your historical data sets. Also …
- … if you change your offers, prices, ad placement, ad text, ad appearance, call center scripts, or anything else, you immediately gain new information that isn’t well-reflected in your previous models.
- If you’re in capital markets, and you figure something out, probably so will rival investors. So whatever you knew three weeks ago may already be partially obsolete.
What’s more, often you deliberately don’t want to test, model, or tune all your variables at once. First you determine whether the ad text should be “Would you be so kind as to allow us to supply you with our wares?” or “Buy it, dude!”; only afterwards do you decide whether the color scheme should rely on red or green.
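That one-variable-at-a-time staging can be sketched as a simple significance check: settle the ad text before touching the color scheme. Here is a toy, stdlib-only Python sketch using a two-proportion z-test; the conversion counts, variant names, and the 1.96 cutoff (roughly 95% confidence) are illustrative assumptions, and a real program would also worry about multiple-testing corrections.

```python
# Toy sketch: decide one variable (e.g., ad text) before tuning the next
# (e.g., color). Counts and the 1.96 threshold are illustrative.
from math import sqrt

def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-score; positive means variant A converts better."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)           # pooled conversion rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # pooled standard error
    return (p_a - p_b) / se

def pick_winner(variants, z_cutoff=1.96):
    """variants: {name: (conversions, impressions)} for exactly two variants.
    Returns the better variant if the gap is significant, else None."""
    (name_a, (c_a, n_a)), (name_b, (c_b, n_b)) = variants.items()
    z = z_score(c_a, n_a, c_b, n_b)
    if abs(z) < z_cutoff:
        return None  # keep testing this variable
    return name_a if z > 0 else name_b

# Stage 1: settle the ad text first ...
text_winner = pick_winner({"polite": (180, 12000), "dude": (260, 12000)})
# ... and only for the winning text would Stage 2 go on to test colors.
```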
With that as backdrop, how can you make your predictive analytics more agile? Let’s start by breaking predictive analytics into four pieces:
- Data mustering for the analysts.
- Actual analysis.
- Deployment of the resulting models.
- Data mustering for deployment.
Only the second of those has much excuse for being an agility bottleneck; the other three are well addressed by technology you can buy (or straightforwardly build) today.
The deployment part of the story can be pretty simple, at least technically: spit out some PMML (Predictive Model Markup Language), and if you’re deploying to a DBMS with good enough PMML support, you’re good to go. Any vendor who doesn’t offer that degree of simplicity had better be working toward it fast. That said, your applications that are infused with predictive analytics need to be modular enough to accommodate model changes; if not, some refactoring lies ahead. And the same can be said for the work processes that surround them.
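To make the “spit out some PMML” step concrete, here is a minimal stdlib-only Python sketch that wraps a fitted linear model’s coefficients in a PMML RegressionModel. The field names, coefficients, and PMML 4.2 version string are illustrative assumptions; in practice your modeling tool would export the PMML for you rather than you hand-building the XML.

```python
# Sketch: emitting a minimal PMML document for a linear model, using only
# the Python standard library. Field names and values are illustrative.
import xml.etree.ElementTree as ET

NS = "http://www.dmg.org/PMML-4_2"  # assumed PMML 4.2 namespace
ET.register_namespace("", NS)

def linear_model_to_pmml(target, intercept, coefficients):
    """Wrap a fitted linear model (intercept plus {field: coefficient})
    in a minimal PMML RegressionModel; return it as an XML string."""
    pmml = ET.Element(f"{{{NS}}}PMML", version="4.2")
    dd = ET.SubElement(pmml, f"{{{NS}}}DataDictionary",
                       numberOfFields=str(len(coefficients) + 1))
    for field in list(coefficients) + [target]:
        ET.SubElement(dd, f"{{{NS}}}DataField", name=field,
                      optype="continuous", dataType="double")
    model = ET.SubElement(pmml, f"{{{NS}}}RegressionModel",
                          functionName="regression")
    schema = ET.SubElement(model, f"{{{NS}}}MiningSchema")
    for field in coefficients:
        ET.SubElement(schema, f"{{{NS}}}MiningField", name=field)
    ET.SubElement(schema, f"{{{NS}}}MiningField",
                  name=target, usageType="target")
    table = ET.SubElement(model, f"{{{NS}}}RegressionTable",
                          intercept=str(intercept))
    for field, coef in coefficients.items():
        ET.SubElement(table, f"{{{NS}}}NumericPredictor",
                      name=field, coefficient=str(coef))
    return ET.tostring(pmml, encoding="unicode")

print(linear_model_to_pmml("spend", 1.5, {"age": 0.3, "visits": 0.02}))
```

A DBMS with PMML scoring support could then ingest that document directly, which is what makes this deployment path so attractive.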
The data mustering parts should be pretty straightforward too. Setting up a relational data mart tuned for investigative analytics isn’t all that hard or costly (unless, perhaps, your data volumes are enormous), and the same actually goes for a Hadoop cluster. Beyond that, if you can model and deploy from the same database, great; if not, you have an ETL (Extract/Transform/Load) need. I suppose you could have data quality/MDM (Master Data Management) issues as well, but offhand I don’t see why you wouldn’t push their solutions back to analysis time. And any decent analytic technology stack can deliver sub-hour latency; while that may not suffice for every purpose, it’s plenty fast for analysis-time agility.
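The ETL hop itself can be as small as one aggregation query. Here is a toy Python sketch using in-memory SQLite as a stand-in for both the operational system and the analytics mart; the table names, columns, and the “drop cancelled orders” transform are all illustrative assumptions.

```python
# Sketch: a minimal Extract/Transform/Load hop from an operational table
# to an analytics mart. In-memory SQLite stands in for both systems;
# schema and transform logic are illustrative.
import sqlite3

# Operational source system.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, status TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 20.0, "shipped"), (1, 5.0, "cancelled"), (2, 12.5, "shipped")])

# Analytics mart the modeling tools will read.
mart = sqlite3.connect(":memory:")
mart.execute("CREATE TABLE customer_features (customer_id INTEGER, total_spend REAL)")

# Extract + Transform: aggregate per customer, dropping cancelled orders.
rows = src.execute(
    "SELECT customer_id, SUM(amount) FROM orders "
    "WHERE status != 'cancelled' GROUP BY customer_id").fetchall()

# Load.
mart.executemany("INSERT INTO customer_features VALUES (?, ?)", rows)
```

Run on a schedule measured in minutes rather than days, even something this crude is consistent with the sub-hour latency mentioned above.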
With those preliminaries out of the way, now let’s turn to the heart of the agile predictive analytics challenge.