Uncommon Ground

Against null hypothesis significance testing

Several months ago I pointed out that P-values from small, noisy experiments are likely to be misleading. Given our training, we think that if a result is significant with a small sample, it must be a really big effect. But unless we have good reason to believe that there is very little noise in the results (a reason other than the small amount of variation observed in our sample), we could easily be misled. Not only will we overestimate how big the effect is, but we are almost as likely to say that the effect is positive when it’s really negative as we are to get the sign right. Look back at this post from August to see for yourself (and download the R code if you want to explore further). As Gelman and Carlin point out,

There is a common misconception that if you happen to obtain statistical significance with low power, then you have achieved a particularly impressive feat, obtaining scientific success under difficult conditions.
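The sign and magnitude errors described above are easy to see in a quick simulation. The post links to the author's R code; here is a separate, minimal Python sketch (my own illustration, not the linked code) assuming a small true effect of 0.1 against a standard error of 1, with a two-sided z-test at the conventional 0.05 level:

```python
import numpy as np

rng = np.random.default_rng(1)
true_effect = 0.1   # hypothetical small true effect
se = 1.0            # standard error large relative to the effect (a noisy study)
n_sims = 100_000

# Simulate many replications of the estimated effect
est = rng.normal(true_effect, se, n_sims)

# "Significant" at the 0.05 level for a two-sided z-test
signif = np.abs(est) > 1.96 * se

power = signif.mean()
type_s = np.mean(est[signif] < 0)                    # wrong sign, given significance
type_m = np.mean(np.abs(est[signif])) / true_effect  # exaggeration ratio, given significance

print(f"Power: {power:.3f}")
print(f"Type S error rate: {type_s:.2f}")
print(f"Type M exaggeration: {type_m:.1f}x")
```

With these (assumed) numbers, power is barely above the 5% false-positive rate, a substantial fraction of the significant estimates have the wrong sign, and the significant estimates exaggerate the true effect many times over, which is exactly the pattern Gelman and Carlin describe.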

I bring this all up again because I recently learned of a new paper by Denes Szucs and John Ioannidis, When null hypothesis significance testing is unsuitable for research: a reassessment. They summarize their advice on null hypothesis significance testing (NHST) in the abstract:

Whenever researchers use NHST they should justify its use, and publish pre-study power calculations and effect sizes, including negative findings. Studies should optimally be pre-registered and raw data published.

They go on to point out that part of the problem is the way that scientists are trained:

[M]ost scientists…are still near exclusively educated in NHST, they tend to misunderstand and abuse NHST and the method is near fully dominant in scientific papers.

The whole paper is worth reading, and reading carefully. If you use statistics in your research, please read it and remember its lessons the next time you’re analyzing your data.


Gelman, A., and J. Carlin. 2014. Beyond power calculations: assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science 9:641-661. doi: http://dx.doi.org/10.1177/1745691614551642

Szucs, D., and J.P.A. Ioannidis. 2016. When null hypothesis significance testing is unsuitable for research: a reassessment. bioRxiv doi: https://doi.org/10.1101/095570
