# People’s facility with statistics — extremely difficult to predict

My recent post on broadening the usefulness of statistics presupposed two things about the statistical sophistication of business intelligence tool users:

- It varies a lot.
- In many cases, it isn’t very high.

Let me now say a little more on the subject. My basic message is that **people’s facility with statistics is extremely difficult to predict.**

*If you DO have to make a point estimate, however, you could do worse than just putting quotation marks around the last four words of that sentence…*

Suppose we measure people’s statistical understanding on a 5-point scale:

- People who haven’t a clue what a p-value is.
- People who think a p-value of .05 signifies a 95% chance of truth.
- People who know better than that, but who still think that “statistically significant” is pretty close to the same as “true”.
- People who know better yet, but aren’t fluent in using statistical techniques correctly.
- People who are fluent in statistics.
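Why is the second level of that scale a misconception? A p-value is computed assuming there is no real effect; it does not say how likely the hypothesis is to be true. That probability also depends on how plausible the hypotheses being tested were in the first place. The Bayes'-rule sketch below illustrates this. The prior, the 80% power figure, and the function name are illustrative assumptions, not data from any study.

```python
def prob_true_given_significant(prior: float, power: float, alpha: float) -> float:
    """Bayes' rule: P(hypothesis is true | result is statistically significant).

    prior -- assumed fraction of tested hypotheses that are actually true
    power -- assumed P(significant result | real effect)
    alpha -- significance threshold, i.e. P(significant result | no effect)
    """
    true_positives = prior * power          # real effects that reach significance
    false_positives = (1 - prior) * alpha   # null effects that reach significance anyway
    return true_positives / (true_positives + false_positives)


# Illustrative, made-up inputs: alpha = .05 and 80% power throughout,
# varying only how plausible the hypotheses were to begin with.
for prior in (0.1, 0.5, 0.9):
    p = prob_true_given_significant(prior, power=0.8, alpha=0.05)
    print(f"prior={prior:.1f}  P(true | p < .05) = {p:.2f}")
```

Under these assumptions, a significant result corresponds to only about a 64% chance of truth when 1 in 10 tested hypotheses is real, and to about 99% when 9 in 10 are. The "95%" belief matches neither case.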

From somebody’s job description alone, can you confidently predict their ranking to within, say, ±1 point? I suggest you can’t. People differ wildly in general numeracy and in specific statistical knowledge.

Even our guesses about average knowledge may be off, not least because education is changing things. I got to graduate school without even knowing what a conditional probability was;\* now a whole generation of kids is growing up with the option of taking AP Statistics. On the flip side, a long list of recent studies suggests that research scientists, physicians, et al. are less clueful about statistics than we might have thought. A quick Google search for *statistical errors by scientists* turned up:

- Several stories about a paper uncovering a particular, frequent error in published neuroscience papers.
- A list of 20 common errors in biomedical papers.
- Another paper about common errors in medical research.
- An Elsevier guide to common errors reviewers might find in submitted papers.

No wonder a large fraction of medical research can’t be reproduced.\*\*

\* *Three years later I had taught a low-end college course in statistics and written a PhD thesis on game theory… but let’s not over-generalize from that part of the story. Anyhow, these days my ranking would be somewhere around 4.*

\*\* *Another reason might be HeLa cell contamination, but I digress.*


### Comments

**4 Responses to “People’s facility with statistics — extremely difficult to predict”**


I’d state it a bit differently. People’s facility with statistics is easy to predict: virtually nil. Even statisticians pay the math tax (gambling). Quantifying that facility, and whether it is applied correctly, is difficult (viz. all the papers you linked). But you can make a lot of money empirically predicting the behavior (actuaries for casinos and insurance), or at least claiming you can (every Facebook wannabe, online gambling, MMF and other scams).

I’m proud to say, my kid was the only sophomore in his AP statistics class, and he scored the highest. Unfortunately, secondary schools don’t know what to do with the high side of the tail of the bell curve. Fortunately, people that smart usually see the problem and work around it. Experience is still required to put it into a street-smart context.

As for creating products to serve BI markets, there’s always going to be a mismatch between user numeracy and product design. I think the real problem is that there is no deterministic way to feed this back into product design, and certainly no financial incentive to do so.

There are serious disincentives to the application of statistics in business and science alike. Statistical inference imposes rules and standards for knowledge that are independent of what we wish to be true. But an advertising agency wishes to be able to tell the client that all of the campaigns are a success, and a researcher wishes to tell the funding agency that the experiment produced results.

Early in my career I worked for a survey research firm, an industry that largely depends on the innumeracy of its clients.

On average, the level of statistical sophistication in an analytic field depends on the stakes. Credit risk analysts tend to be more rigorous than marketing analysts, because a bad credit risk model can kill the firm, while a bad marketing model means too much spam.

One would think the standards would be higher in science, but are they really? Much of what passes for scientific publication is little more than a show for academic career-building. Given the overall low standards and grade inflation in that world, it is hardly surprising that much of the work product is suspect.

I recommend the book “Thinking, Fast and Slow” by Daniel Kahneman, winner of the Nobel Prize in Economics (see http://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow).

Kahneman points out that statistical thinking requires significant effort, and he provides several amusing tests demonstrating that the reader, and then even statistically trained people, make poor decisions when confronted with simple statistical problems.

I come down on the side that says that, like the knowledge workers of the 1990s, the data scientists of today will not materialize in volume, and the market for “analytics” may be smaller than projected.

67% of all statistics are made up on the spot.

But no one I tell that to actually laughs…