An article in Vox will be of interest primarily to readers who have had a manuscript rejected (or have reviewed and rejected one) because a crucial p-value was > 0.05.
Most casual readers of scientific research know that for
results to be declared “statistically significant,” they need to pass a
simple test. The answer to this test is called a p-value. And if your
p-value is less than .05 — bingo, you got yourself a statistically
significant result.
Now a group of 72 prominent statisticians, psychologists,
economists, sociologists, political scientists, biomedical researchers,
and others want to disrupt the status quo. A forthcoming paper in the journal Nature Human Behaviour argues that results should only be deemed “statistically significant” if they pass a higher threshold.
“We propose a change to P < 0.005,” the authors write.
“This simple step would immediately improve the reproducibility of
scientific research in many fields.”...
The proposal has critics. One of them is Daniel Lakens, a
psychologist at Eindhoven University of Technology in the Netherlands
who is currently organizing a rebuttal paper with dozens of authors. Mainly, he says the significance proposal might work to stifle scientific progress.
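Whatever the merits of that debate, the arithmetic behind the proposal is easy to demonstrate. Here is a minimal simulation (my own sketch, not from either article): when there is no real effect, the significance cutoff directly sets the false-positive rate, so tightening it from .05 to .005 cuts that rate roughly tenfold.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_per_group = 10_000, 30

false_pos_05 = false_pos_005 = 0
for _ in range(n_experiments):
    # Both groups come from the SAME distribution: any "effect" is pure noise.
    a = rng.normal(0, 1, n_per_group)
    b = rng.normal(0, 1, n_per_group)
    p = stats.ttest_ind(a, b).pvalue
    false_pos_05 += p < 0.05
    false_pos_005 += p < 0.005

print(f"false positives at p < 0.05:  {false_pos_05 / n_experiments:.3f}")   # ~0.050
print(f"false positives at p < 0.005: {false_pos_005 / n_experiments:.3f}")  # ~0.005
```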
Addendum: see also this article in FiveThirtyEight: "Statisticians Found One Thing They Can Agree On: It’s Time To Stop Misusing P-Values."
How many statisticians does it take to ensure at least a 50 percent
chance of a disagreement about p-values? According to a tongue-in-cheek
assessment by statistician George Cobb of Mount Holyoke College,
the answer is two … or one. So it’s no surprise that when the American
Statistical Association gathered 26 experts to develop a consensus
statement on statistical significance and p-values, the discussion
quickly became heated.
It may sound crazy to get indignant over a scientific term that few
lay people have even heard of, but the consequences matter. The misuse
of the p-value can drive bad science (there was no disagreement over
that), and the consensus project was spurred by a growing worry that in
some scientific fields, p-values have become a litmus test for deciding
which studies are worthy of publication. As a result, research that
produces p-values that surpass an arbitrary threshold is more likely to
be published, while studies with greater or equal scientific importance
may remain in the file drawer, unseen by the scientific community.
The results can be devastating...
Continued at the link.
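The file-drawer effect the article describes can also be simulated directly. Here is a small sketch (my own, under a deliberately simple model of small, underpowered studies) showing that if only p < .05 results get published, the published literature systematically overstates the true effect:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
true_effect, n = 0.2, 25   # small true effect, small underpowered studies
published = []

for _ in range(5_000):
    a = rng.normal(0, 1, n)
    b = rng.normal(true_effect, 1, n)
    # The file drawer: only "significant" studies make it into print.
    if stats.ttest_ind(a, b).pvalue < 0.05:
        published.append(b.mean() - a.mean())

print(f"true effect:           {true_effect}")
print(f"mean published effect: {np.mean(published):.2f}")  # inflated, roughly 0.6
```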
This is a good article; thanks for sharing. As the article says, this change would not solve any of the fundamental problems with using p-values to evaluate scientific merit. The American Statistical Association recently released a statement on p-values (available at http://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108?scroll=top&needAccess=true#_i27). Here are a few of its relevant points:
ReplyDelete"Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold."
"A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
"By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis."
Changing the cutoff from 0.05 to 0.005 would only further emphasize the importance of obtaining a small p-value; we should move away from this mindset and toward estimates of effect size.
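To illustrate the point: with a large enough sample, even a negligible effect clears the stricter 0.005 cutoff, so the p-value alone says nothing about practical importance. A quick sketch (illustrative numbers, not real data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200_000                      # a very large sample
a = rng.normal(0.00, 1.0, n)
b = rng.normal(0.02, 1.0, n)     # true difference of 0.02 SD: trivially small

t, p = stats.ttest_ind(a, b)
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = (b.mean() - a.mean()) / pooled_sd   # standardized effect size

print(f"p-value:   {p:.1e}")          # comfortably below 0.005
print(f"Cohen's d: {cohens_d:.3f}")   # ~0.02, practically negligible
```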
I've just added another link to the post that I think you will find interesting.
Back when I was doing research in grad school, my advisor had a saying: "If you have to use statistics to defend yourself, it's probably not significant."
The other downside of this mindset is that it places a lot of social science outside the bounds of "legitimacy." Available samples for hard-to-reach populations are necessarily smaller than those for large survey samples, making p-values lower than .05 more difficult to obtain.
For example, I've been doing gang research. It'd be really difficult, and extremely expensive, to find a representative (i.e., non-convenience) sample of 1,000 gang members just to clear a stricter p-value threshold. Much of social science falls into this trap, because we work with underrepresented populations.
Being held to that stricter p-value standard would undermine the legitimacy of our findings on grounds that reflect a misunderstanding of what p-values actually mean.
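To put rough numbers on this concern, here is a back-of-the-envelope power calculation (my own sketch using the standard normal approximation; the 0.5 effect size and 80% power are illustrative assumptions), showing how much larger a sample the stricter cutoff demands:

```python
from scipy.stats import norm

def n_per_group(effect_size, alpha, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

for alpha in (0.05, 0.005):
    print(f"alpha = {alpha}: n per group ~ {n_per_group(0.5, alpha):.0f}")
# alpha = 0.05:  n per group ~ 63
# alpha = 0.005: n per group ~ 106
```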