Business intelligence tools have been around for two decades.* In that time, many people have had the idea of integrating statistical analysis into classical BI. Yet I can’t think of a single example that was more than a small, niche success.
*Or four decades, if you count predecessor technologies.
The first challenge, I think, lies in the paradigm. Three choices that come to mind are:
- “I think I may have discovered a new trend or pattern. Tell me, Mr. BI — is the pattern real?”
- “I think I see a trend. Help me, Mr. BI, to launch a statistical analysis.”
- “If you want to do some statistical analysis, here’s a link to fire it up.”
But the first of those approaches requires too much intelligence from the software, while the third requires too much numeracy from the users. So only the second option has a reasonable chance to work, and even that one may be hard to pull off unless vendors focus on one vertical market at a time.
The challenges to full automation start with:
- The software may not be able to reliably deduce:
- Exactly which hypothesis to test …
- … against exactly which data set.
- The software may not be able to reliably adjust for differences over time in areas such as:
- Your marketing choices and product cycles.
- Your competitors’ marketing choices and product cycles.
- Exogenous economic factors.
Perhaps someday those problems will be solved in full generality … but not soon.
On the other hand, just dumping statistical software onto everybody’s desk won’t work either, even if the software is something like KXEN. Some people are numerate enough to make good use of such capabilities, but many are not.
What that leaves us with is semi-automation. The template I envision is:
- The software uses heuristics to come up with one or a few guesses for the objective variable or function. A human BI user makes the final decision.
- The software invokes a pre-identified choice of data set from which to pursue any particular objective. A human data architect or statistician sets this up in advance.
- The software invokes pre-identified time adjustments for any particular analysis, previously set up by a human data architect or statistician, subject to final approval by a human BI user.
- The software uses heuristics based on the actual BI query to guess at any relevant parameters identifying the specific data subset to consider. A human BI user makes the final decision.
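The template above can be sketched in code. This is purely illustrative, assuming nothing about any real product; the keyword heuristic, function names, and column names are all hypothetical stand-ins for whatever a vendor would actually build.

```python
# Hypothetical sketch of the semi-automation template: the software
# proposes candidate objective variables via simple heuristics, and a
# human makes the final call. All names here are illustrative.

def propose_objectives(columns):
    """Heuristic guess: flag columns whose names suggest an objective."""
    keywords = ("sales", "response", "defect", "return", "bad_debt")
    return [c for c in columns if any(k in c.lower() for k in keywords)]

def run_semi_automated_analysis(columns, confirm):
    """`confirm` is the human in the loop: it picks from the guesses."""
    guesses = propose_objectives(columns)
    objective = confirm(guesses)  # human BI user makes the final decision
    # The data set and time adjustments would be pre-identified by a
    # data architect or statistician; they are omitted from this sketch.
    return objective

# Example: the "human" simply accepts the first guess
cols = ["region", "campaign_response_rate", "order_count"]
picked = run_semi_automated_analysis(cols, lambda gs: gs[0])
```

The point of the sketch is the division of labor: the software narrows the field, the human decides.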
At least that much human intervention will long be necessary.
Application areas where it might be easy to guess an objective function include:
- Many kinds of CRM. You probably want to know about sales levels, response rates, or something like that.
- Quality. You probably want to know about something like a defect rate.
- Accounting. In some contexts, it’s clear that you want to know about the incidence of something unwelcome, like bad debts or product returns.
Even so, there are many cases where humans will need to at least tweak the objective function.
- Do you want to measure a simple count of new customers, or do you want to weight them by expected lifetime value?
- Which intermediate anomaly is most crucial to your defect tracking?
- How far in arrears does a debt have to be to be “bad”?
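The first of those tweaks is easy to make concrete. Here is a minimal, made-up example of how much the two objective functions can diverge; the customer records and LTV figures are invented for illustration.

```python
# Illustrative only: a simple count of new customers versus a count
# weighted by expected lifetime value (LTV). The data is made up.

new_customers = [
    {"id": 1, "expected_ltv": 500.0},
    {"id": 2, "expected_ltv": 120.0},
    {"id": 3, "expected_ltv": 2300.0},
]

simple_count = len(new_customers)
ltv_weighted = sum(c["expected_ltv"] for c in new_customers)
```

One big-ticket customer dominates the weighted measure, so an analysis optimized for one objective can look very different under the other.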
Choosing the data subset is also tricky. A good first approximation might be a query result set the human user recently looked at. But which one? Surely she’s bounced around and done some drilling down, so at least you need to give her a UI for rewinding a bit. She might also want to specify somewhat different dimensions — or ranges of dimensional values — for the predictive analysis than were used for the query that set the inquiry off.
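The "rewinding" UI amounts to keeping a short history of the user's recent result sets. A minimal sketch, assuming a simple bounded stack (class and method names are hypothetical):

```python
# Minimal sketch of the "rewind" idea: keep the user's recent query
# result sets so a statistical analysis can start from any of them,
# not just the most recent one. Names are hypothetical.

class QueryHistory:
    def __init__(self, limit=10):
        self.limit = limit
        self._stack = []

    def record(self, description):
        """Remember a result set the user just looked at."""
        self._stack.append(description)
        if len(self._stack) > self.limit:
            self._stack.pop(0)  # drop the oldest entry

    def rewind(self, steps=0):
        """Return the result-set description `steps` back from the latest."""
        return self._stack[-1 - steps]

h = QueryHistory()
for q in ["all regions", "drill: Northeast", "drill: Boston"]:
    h.record(q)
```

Letting the user also edit the dimensions or value ranges of the chosen result set would be a natural extension, but that is a UI problem beyond this sketch.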
And finally — how vertical should this functionality be? My first inclination is to say “Very”. Consider again the application scenarios I mentioned above. If we know that the fundamental issue is likely to be “Campaign response” or “Sales vs. comparables” or “Defect rate” or “Late payments”, in each case we can envision what the heuristics and user interfaces might be like. But otherwise, where do we even start?
Related link:
- My definition of investigative analytics, wherein I said it’s all about discovering unknown patterns, as opposed to monitoring for known ones.