TYWKIWDBI ("Tai-Wiki-Widbee"): A detailed discussion of p values

04 August 2017

An article in Vox will be of interest primarily to readers who have had a manuscript rejected (or have reviewed and rejected one) because a crucial p value was >0.05:

Most casual readers of scientific research know that for results to be declared “statistically significant,” they need to pass a simple test. The answer to this test is called a p-value. And if your p-value is less than .05 — bingo, you got yourself a statistically significant result.

Now a group of 72 prominent statisticians, psychologists, economists, sociologists, political scientists, biomedical researchers, and others want to disrupt the status quo. A forthcoming paper in the journal Nature Human Behavior argues that results should only be deemed “statistically significant” if they pass a higher threshold.

“We propose a change to P< 0.005,” the authors write. “This simple step would immediately improve the reproducibility of scientific research in many fields.”...

The proposal has critics. One of them is Daniel Lakens, a psychologist at Eindhoven University of Technology in the Netherlands who is currently organizing a rebuttal paper with dozens of authors. Mainly, he says the significance proposal might work to stifle scientific progress.

Addendum: see also this article in FiveThirtyEight: "Statisticians Found One Thing They Can Agree On: It’s Time To Stop Misusing P-Values."

How many statisticians does it take to ensure at least a 50 percent chance of a disagreement about p-values? According to a tongue-in-cheek assessment by statistician George Cobb of Mount Holyoke College, the answer is two … or one. So it’s no surprise that when the American Statistical Association gathered 26 experts to develop a consensus statement on statistical significance and p-values, the discussion quickly became heated.

It may sound crazy to get indignant over a scientific term that few lay people have even heard of, but the consequences matter. The misuse of the p-value can drive bad science (there was no disagreement over that), and the consensus project was spurred by a growing worry that in some scientific fields, p-values have become a litmus test for deciding which studies are worthy of publication. As a result, research that produces p-values that surpass an arbitrary threshold are more likely to be published, while studies with greater or equal scientific importance may remain in the file drawer, unseen by the scientific community.

The results can be devastating...

Continued at the link.

4 comments:

Anonymous, August 5, 2017 at 11:27 AM
This is a good article; thanks for sharing. As the article says, this change would not solve any of the fundamental problems with using p-values to evaluate scientific merit. The American Statistical Association recently released a statement on p-values (available at http://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108?scroll=top&needAccess=true#_i27). The statement makes several relevant points, including:
"Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold."

"A p-value, or statistical significance, does not measure the size of an effect or the importance of a result."

"By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis."

Changing the cutoff from 0.05 to 0.005 would only serve to emphasize the importance of obtaining a small p-value even more; we should move away from this mindset and towards estimates of effect size.
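
As a rough illustration of why a p-value does not measure the size of an effect, here is a minimal Python sketch. It assumes a simplified two-sample z-test with known unit variance and equal group sizes; the function names and the example effect size (a standardized difference of 0.2) are made up for illustration and do not come from either article. The effect is held fixed and only the sample size changes.

```python
import math

def two_sided_p(d, n_per_group):
    """Two-sided p-value of a two-sample z-test (assumed: known unit variance).

    d is the standardized mean difference; n_per_group is the size of each group.
    """
    z = d * math.sqrt(n_per_group / 2)
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

def verdicts(d, sizes):
    """Return (n, p, passes 0.05, passes 0.005) for each sample size."""
    rows = []
    for n in sizes:
        p = two_sided_p(d, n)
        rows.append((n, p, p < 0.05, p < 0.005))
    return rows

# Same hypothetical effect size every time; only the sample size differs.
for n, p, at_05, at_005 in verdicts(0.2, [50, 200, 800]):
    print(f"n per group = {n:4d}  p = {p:.5f}  p<0.05: {at_05}  p<0.005: {at_005}")
```

Under these assumptions the identical effect is not significant with 50 per group, passes 0.05 but not 0.005 with 200, and passes both with 800. The verdict follows the sample size, not the size of the effect.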
Lois Tverberg, August 5, 2017 at 9:08 PM
Back when I was doing research in grad school, my advisor had a saying: "If you have to use statistics to defend yourself, it's probably not significant."
Topher, August 12, 2017 at 7:06 PM
The other downside of this mindset is that it places a lot of social science outside the bounds of "legitimacy." Available samples for hard-to-reach populations are necessarily smaller than those for large survey samples, making p-values lower than .05 more difficult to obtain.

For example, I've been doing gang research. It'd be really difficult, and extremely expensive, to find a representative (i.e., non-convenience) sample of 1000 gang members just to meet a stricter p-value threshold. Much of social science falls into this trap, because we work with underrepresented populations.

Being held to that stricter p-value standard would undermine the legitimacy of our findings, on a premise that misunderstands what p-values mean.
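
To put a rough number on the sample-size concern, here is a minimal Python sketch using the standard normal-approximation formula for a two-sided, two-sample comparison of means. The 80% power target, the standardized effect size of 0.5, and the function name are assumptions chosen for illustration; real studies would need a power analysis suited to their own design.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha, power=0.80):
    """Approximate sample size per group for a two-sided two-sample z-test.

    Assumed formula: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized mean difference.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

n_05 = n_per_group(0.5, alpha=0.05)
n_005 = n_per_group(0.5, alpha=0.005)
print(f"alpha = 0.05:  about {n_05} per group")
print(f"alpha = 0.005: about {n_005} per group")
print(f"ratio: {n_005 / n_05:.2f}")
```

With these assumed inputs the stricter threshold needs roughly 1.7 times as many participants per group, which is a real cost when the population is hard to reach.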