July 31, 2012

Integrating statistical analysis into business intelligence

Business intelligence tools have been around for two decades.* In that time, many people have had the idea of integrating statistical analysis into classical BI. Yet I can’t think of a single example that was more than a small, niche success.

*Or four decades, if you count predecessor technologies.

The first challenge, I think, lies in the paradigm. Three choices that come to mind are:

- Full automation, in which the software figures out on its own which analyses to run.
- Semi-automation, in which the software proposes analyses and humans guide and adjust them.
- Simply putting statistical software into the hands of every business intelligence user.

But the first of those approaches requires too much intelligence from the software, while the third requires too much numeracy from the users. So only the second option has a reasonable chance to work, and even that one may be hard to pull off unless vendors focus on one vertical market at a time.

The challenges in full automation start with:

- Guessing an objective function for the analysis.
- Guessing which subset of the data to analyze.

Perhaps someday those problems will be solved in full generality … but not soon.

On the other hand, just dumping statistical software onto everybody’s desk won’t work either, even if the software is something like KXEN. Some people are numerate enough to make good use of such capabilities, but many are not.

What that leaves us with is semi-automation. The template I envision is:

- The software proposes an analysis, guessing an objective function and a data subset from what the user has recently been looking at.
- The human reviews those guesses, tweaking the objective function and adjusting the data subset as needed.
- The software then builds the model and presents the results back in the BI environment.

At least that much human intervention will long be necessary.

Application areas where it might be easy to guess an objective function include:

- Campaign response (marketing)
- Sales vs. comparables (retail)
- Defect rates (manufacturing)
- Late payments (credit and collections)

Even so, there are many cases where humans will need to at least tweak the objective function.
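
To make the semi-automation idea concrete, here is a minimal sketch of how per-vertical defaults plus a human "tweak" step might look. Everything in it is hypothetical: the scenario keys, column names, and the propose_objective helper are made up for illustration, not drawn from any actual product.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Objective:
    label: str      # human-readable name shown in the BI tool
    target: str     # column the model should try to predict
    direction: str  # "maximize" or "minimize"

# Per-vertical default guesses, keyed by application scenario.
DEFAULT_OBJECTIVES = {
    "campaign_response":    Objective("Campaign response rate", "responded", "maximize"),
    "sales_vs_comparables": Objective("Sales vs. comparables", "sales_delta", "maximize"),
    "defect_rate":          Objective("Defect rate", "defective", "minimize"),
    "late_payments":        Objective("Late payment probability", "paid_late", "minimize"),
}

def propose_objective(scenario: str, override_target: Optional[str] = None) -> Objective:
    """Guess an objective from the vertical scenario, then let the human tweak it."""
    obj = DEFAULT_OBJECTIVES[scenario]
    if override_target is not None:  # the human "tweak" step
        return Objective(obj.label, override_target, obj.direction)
    return obj
```

The point is only that the defaults come from the vertical, while the final say stays with the user.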

Choosing the data subset is also tricky. A good first approximation might be a query result set the human user recently looked at. But which one? Surely she’s bounced around and done some drilling down, so at least you need to give her a UI for rewinding a bit. She might also want to specify somewhat different dimensions — or ranges of dimensional values — for the predictive analysis than were used for the query that set the inquiry off.
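
A rough sketch of how that rewinding might be supported, assuming the BI tool already logs each query the user runs. The QueryHistory and QueryStep names, and the dimension-override keywords, are invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class QueryStep:
    sql: str                      # the query the user actually ran
    dimensions: Dict[str, Tuple]  # dimension -> values or range used

@dataclass
class QueryHistory:
    steps: List[QueryStep] = field(default_factory=list)

    def record(self, step: QueryStep) -> None:
        self.steps.append(step)

    def rewind(self, n: int = 0) -> QueryStep:
        """The n-th most recent result set (0 = the one she just looked at)."""
        return self.steps[-(n + 1)]

    def subset_for_model(self, n: int = 0, **dimension_overrides) -> QueryStep:
        """Start from a recent step, but let the user change dimensions or ranges."""
        step = self.rewind(n)
        dims = dict(step.dimensions)
        dims.update(dimension_overrides)
        return QueryStep(step.sql, dims)
```

So a user who drilled down three steps ago could, for instance, call history.subset_for_model(3, region=("EMEA",)) to base the predictive run on that earlier view with a widened region filter.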

And finally — how vertical should this functionality be? My first inclination is to say “Very”. Consider again the application scenarios I mentioned above. If we know that the fundamental issue is likely to be “Campaign response” or “Sales vs. comparables” or “Defect rate” or “Late payments”, in each case we can envision what the heuristics and user interfaces might be like. But otherwise, where do we even start?


Comments

13 Responses to “Integrating statistical analysis into business intelligence”

  1. Mark on July 31st, 2012 4:40 am

    Interesting post, Curt. I think you also need to consider the experimental design as part of the overall statistical analysis.

    It is often the correct choice of experimental design that enables the ‘learn’ in ‘test & learn’…

  2. Curt Monash on July 31st, 2012 7:41 am

    Mark,

    I was focusing on even more mass-market adoption than what you’re probably thinking of.

    Otherwise, that’s an excellent point!

  3. Larry Dooley on July 31st, 2012 3:36 pm

    Curt, one place this makes sense is in Financial Services – lots of stats and modeling. Just look at options – there’s a set model of how to estimate the correct price based on the underlying, etc.

    A number of the vertical software products will calculate this kind of stuff. Of course, where they fall down is on the basic reporting, believe it or not. They can calculate the Value at Risk for a security, but getting a reasonable report that includes it is not a lot of fun (of course, they roll their own reporting language).
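
    As a concrete instance of the "set model" Larry mentions, here is a minimal Black-Scholes sketch for a European call. The numbers in the usage comment are purely illustrative, and real desks would of course layer dividends, volatility skew, and their own reporting on top.

    ```python
    from math import log, sqrt, exp
    from statistics import NormalDist

    def black_scholes_call(spot, strike, t_years, rate, vol):
        """Fair value of a European call under Black-Scholes assumptions."""
        n = NormalDist()  # standard normal, for the cumulative distribution function
        d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (vol * sqrt(t_years))
        d2 = d1 - vol * sqrt(t_years)
        return spot * n.cdf(d1) - strike * exp(-rate * t_years) * n.cdf(d2)

    # e.g. black_scholes_call(100, 100, 0.5, 0.02, 0.25) is roughly 7.5
    ```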

  4. Alex Elkin on July 31st, 2012 6:10 pm

    While the stat analysis in a vertical solution is easier for all the reasons you’ve mentioned, the trouble is that the vertical app is limited to its own data. An analysis is only valuable if it tells you something you don’t already know, and the most interesting “discoveries” happen when you mix different datasets. For example, combining CRM data with Accounting and Census data.

    A second “trap” for laymen is confusing positive correlation with cause and effect. For example, married women earn less than divorced women (my intern discovered this today). Does one affect the other? Is there a third factor? If we want to give this power to regular users, the software should be able to identify, or help identify, the root cause.
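
    One hedged sketch of the kind of "third factor" check such software could run automatically: compare the raw gap between groups, then the gap within strata of a suspected confounder (age, say). The column names here are hypothetical.

    ```python
    import pandas as pd

    def gap_with_and_without_confounder(df: pd.DataFrame,
                                        group_col: str = "marital_status",
                                        value_col: str = "earnings",
                                        confounder: str = "age_band"):
        raw = df.groupby(group_col)[value_col].mean()
        # Within each age band, is the married-vs-divorced gap still there?
        stratified = df.groupby([confounder, group_col])[value_col].mean().unstack()
        return raw, stratified

    # If the gap shrinks or reverses inside every age band, the confounder (not
    # marital status itself) is the better candidate explanation.
    ```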

  5. Anupam Singh on July 31st, 2012 6:14 pm

    @Curt: Agreed with the conclusion that the functionality should be vertical. The model becomes valuable to a business user when it is customized for that particular business and vertical. The deep, granular model is used by the BI user to answer very pointed business questions. Each question might require further transformation of the model. For instance, the same marketing mix model could be deployed in an application to execute what-if scenarios like profit maximization and budget reduction. It could also be used to correctly attribute revenues for historical marketing spend. In our experience, the key automation challenge is not about generating a single, all-knowing model. Instead, the automation challenge is to build a modeling environment that applies business-specific heuristics to build, validate and deploy a highly granular and customized model.

  6. Thomas W Dinsmore on August 1st, 2012 8:21 am

    Practicing analysts can tell you that domain experience is essential to success — a top-notch SEO analyst can’t walk into the actuarial department and have instant credibility. The analytic methods used aren’t the same, and domain knowledge enables analysts to distinguish useful insights from bullhockey.

    Since the analytic methods differ from niche to niche, so does the analytic software. Geneticists like to use ASREML and actuaries like to use Emblem. Hey, they both do GLM, so we can save money if they share software, right? Wrong. The differences are small, but they mean a lot to the users, and good quants have more credibility with the business than anyone in IT.

    It makes more sense to add BI capabilities to statistical tools than the other way around. BI users rarely care who makes the software as long as it looks pretty. Analytics users, on the other hand, care a lot about how the math is done inside, and whether or not the vendor has street cred — because they know it’s a lot harder to build a statistical algorithm correctly than it is to present a pretty chart of sales by region.

    But why try to mosh the software together at all? The output of analytic software is data — predictions, patterns, relationships and trends — which you can push into anything you like for reporting and visualization.
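
    A short sketch of that decoupling, using scikit-learn purely as a stand-in for whatever modeling tool the quants actually prefer; the column names and output file are illustrative.

    ```python
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    def score_and_publish(train: pd.DataFrame, to_score: pd.DataFrame,
                          features: list, target: str,
                          out_path: str = "predictions_for_bi.csv") -> None:
        """Fit a model, then materialize its predictions as plain rows for any BI tool."""
        model = LogisticRegression(max_iter=1000).fit(train[features], train[target])
        out = to_score.copy()
        out["predicted_probability"] = model.predict_proba(to_score[features])[:, 1]
        out.to_csv(out_path, index=False)  # the BI layer never needs to know how it was made
    ```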

    TD

  7. Curt Monash on August 1st, 2012 11:34 am

    Thomas,

    Some enterprises have full-fledged departments for statistical analysis or predictive modeling, well-enough staffed to meet all such needs across all areas of the business. For them, I think your comment makes tons of sense.

    But many enterprises can’t afford that luxury. For them, I think the kinds of approaches I’m talking about could be a whole lot better than nothing.

  8. Brian Hoover on August 1st, 2012 3:38 pm

    An admirable goal. I think the place to start is with the recognition that the solution will be relative to the functions and ability of the user. By way of example – the check engine light in the car is a great example of exception notification for the average “user”. The diagnostics performed at the service station that advise the replacement of a part are a somewhat more sophisticated “intelligence” function, but don’t include any real investigative analytics. The results of the service station analysis, if provided in bulk to the warranty analyst, are fodder for lots of correlation studies, and it’s here that “canned” statistics provide some understanding of the impact to the business. The engineer who gets all the data related to failures, including environmental and usage data, is probably the best candidate for a BI overview / statistical drill-down / statistical modelling with AI overtones: capabilities to highlight areas for research and to suggest alternatives to consider in designing the next generation of the part.

    In most cases the inclusion of more sophisticated analysis will have to be tailored because the capability to do analysis with today’s computing power is unbounded and the time to get a job done in a day is bounded. Tailoring the solution will require professionals who are experts in analytics and experts in the subject matter.

    So – most BI users are best served by the check engine light. However, there is an underserved market of users who are capable of understanding and acting on statistical analysis and who have the requisite subject matter expertise. We are getting there 🙂

  9. Nitin Borwanar on August 2nd, 2012 12:47 am

    Hi Curt,

    A joint project between EMC/Greenplum, UC Berkeley, and other universities, called MADlib, embeds statistical and numerical methods in Postgres and Greenplum. This allows in-database statistical explorations to be done at small and large scale. A project worth tracking, IMHO.
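
    A hedged sketch of that in-database pattern: the regression runs inside the database via a SQL call into MADlib, and only a small model table comes back. This assumes a MADlib release that provides madlib.linregr_train (names and signatures vary across versions), and the table and column names are made up.

    ```python
    import psycopg2

    conn = psycopg2.connect("dbname=analytics")
    with conn, conn.cursor() as cur:
        # Train a linear regression entirely inside the database.
        cur.execute("""
            SELECT madlib.linregr_train(
                'sales_history',                  -- source table
                'sales_model',                    -- output (model) table
                'revenue',                        -- dependent variable
                'ARRAY[1, ad_spend, store_size]'  -- independent variables
            );
        """)
        cur.execute("SELECT coef, r2 FROM sales_model;")
        print(cur.fetchone())
    conn.close()
    ```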

    Nitin

  10. Curt Monash on August 2nd, 2012 5:46 pm

    Hi Nitin,

    I’ve written a great deal about statistics/DBMS integration here in the past. Look around! I’ve written somewhat less about MADlib, however, because I so hated the original MAD skills paper, because I haven’t written that much about Greenplum recently, and because MADlib was still something of a joke back when I was still talking with Greenplum.

  11. People’s facility with statistics — extremely difficult to predict | DBMS 2 : DataBase Management System Services on August 6th, 2012 1:11 am

    […] recent post on broadening the usefulness of statistics presupposed two things about the statistical sophistication of business intelligence tool […]

  12. What matters in investigative analytics? | DBMS 2 : DataBase Management System Services on October 7th, 2013 1:24 am

    […] an agility standpoint, the integration of predictive modeling into business intelligence would seem like pure goodness. Unfortunately, the most natural ways to do such integration would […]

  13. Notes and links, December 12, 2014 | DBMS 2 : DataBase Management System Services on December 12th, 2014 6:05 am

    […] A couple years ago I wrote skeptically about integrating predictive modeling and business intelligence. I’m less skeptical […]
