One of the great debates I have learned about in grad school is the conflict between frequentist statistics and Bayesian statistics (nicely highlighted here by xkcd). Both have their uses, but the longer I have been here, the more criticism I hear of the famous P value and whether it has a place in science. Recently, a number of publications have come out discussing this debate, and I indulged myself over spring break by nerding out over a few papers.
I started this task (papers listed at the bottom of this post) with a definite bias in my outlook. In the last year I have learned about likelihood, AIC, and Bayesian statistics, and found myself embracing these techniques, none of which were taught in my undergraduate intro stats course. Notably, as I have learned more about Bayesian statistics, I have been impressed by the types of questions it can answer that we cannot approach with frequentist statistics. In particular I am interested in risk assessment – and with Bayesian statistics you can actually ask what the probability of a particular outcome might be – for example, what is the probability that, if we fish a population at rate X, the population will fall below a certain level? Coming from the world of the P value, where all you can do is test against a null hypothesis, I have found this an exciting ‘new’ method. Over the last year and a half, I have heard a number of people in SAFS (including myself) point out the limitations of the P value when one is interested in more complex questions (such as model selection), and as a result I expected to come out of these readings with stronger arguments against its use. I expected to scoff at the papers that supported the P value and embrace those that put it down. However, there is more complexity to the debate than my original thinking presumed.
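To make that concrete, here is a minimal sketch of the kind of calculation I mean, in Python with entirely made-up numbers: a toy logistic population model whose posterior parameter draws are faked with normal distributions (a real analysis would get them from an actual Bayesian fit). The harvest rate, threshold, and population numbers are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for posterior draws of a logistic model's growth rate r and
# carrying capacity K. In a real analysis these would come from the
# output of a fitted Bayesian model; here they are faked for illustration.
r_samples = rng.normal(loc=0.4, scale=0.1, size=5000)
K_samples = rng.normal(loc=1000, scale=100, size=5000)

harvest_rate = 0.25  # the 'X rate' in the question above (hypothetical)
threshold = 200      # population level we want to avoid falling below
years = 20
N0 = 800             # current population size (hypothetical)

below = 0
for r, K in zip(r_samples, K_samples):
    N = N0
    for _ in range(years):
        # Logistic growth minus proportional harvest, floored at zero.
        N = max(N + r * N * (1 - N / K) - harvest_rate * N, 0.0)
    if N < threshold:
        below += 1

# The fraction of posterior draws ending below the threshold is a direct
# estimate of P(population < threshold | data, harvest policy).
print(f"P(N < {threshold} after {years} yr) ~ {below / len(r_samples):.2f}")
```

No null hypothesis in sight: the output is the probability of the outcome we actually care about.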
Although I did not come out of this reading entirely scoffing at the P value, I definitely did not gain any renewed sense that the P value is a fantastic statistical tool. There are some strong arguments that the P value has some use, albeit in limited settings. Then there are also many strong arguments against it: that it has inherent flaws (Barber and Ogle 2014), that alternative methods such as AIC are superior (Burnham and Anderson 2014), and even that ‘P values are flawed and not acceptable as properly quantifying evidence’ (Burnham and Anderson 2014, citing Royall 1997). However, what I noticed more consistently were some of the subtleties behind the debate that do not necessarily have to do with whether or not scientists should be using the P value.
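For readers who have not met AIC, here is a rough sketch of what that alternative looks like in practice. This is my own toy example in Python (simulated data, and the standard least-squares form of AIC for Gaussian errors), not anything from the papers themselves:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: a weak linear trend plus noise (purely illustrative).
n = 50
x = np.linspace(0, 10, n)
y = 2.0 + 0.15 * x + rng.normal(scale=1.0, size=n)

def aic_gaussian(rss, n, k):
    # AIC for least-squares fits with Gaussian errors (up to a constant):
    # n * ln(RSS / n) + 2k, where k counts estimated parameters
    # (including the error variance).
    return n * np.log(rss / n) + 2 * k

# Model 1: intercept only (the 'no trend' model).
rss1 = np.sum((y - y.mean()) ** 2)

# Model 2: intercept + slope, fit by ordinary least squares.
X = np.column_stack([np.ones(n), x])
beta, rss2_arr, *_ = np.linalg.lstsq(X, y, rcond=None)
rss2 = rss2_arr[0]

aic1 = aic_gaussian(rss1, n, k=2)  # mean + variance
aic2 = aic_gaussian(rss2, n, k=3)  # intercept + slope + variance

print(f"AIC intercept-only: {aic1:.1f}")
print(f"AIC with slope:     {aic2:.1f}")
print(f"delta AIC:          {aic1 - aic2:.1f}")
```

Instead of a reject/do-not-reject verdict, the result is a relative ranking of candidate models (the lower AIC wins, and the difference says by how much), which is exactly the kind of model-selection question the P value handles awkwardly.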
It seemed to me that the more I read, the more the authors circled around the same type of argument: that scientists need to think critically about their results and the statistical technique they choose to employ. Repeatedly in the papers defending the P value, the authors address the issue of the binary cut-off value. Rather than relying on a single threshold, they stress the need for an analysis of what the study actually measured. Other authors have argued that a cut-off value may in fact be of use, but that it should be much more conservative than what we have been using thus far (e.g. smaller than 0.05). And even the papers arguing against the use of the P value in the first place argue in support of thinking through the type of statistical tool we use and better understanding the results we find (which, more often than previously thought, may have no significance).
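To see why the threshold itself is doing so much work, here is a small simulation (Python, assuming scipy is installed) of experiments where the null hypothesis is true by construction, so every ‘significant’ result is a false positive:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Simulate many experiments where the null hypothesis is TRUE:
# both groups are drawn from the same distribution, so any
# 'significant' result is a false positive by construction.
n_experiments = 10_000
pvals = np.empty(n_experiments)
for i in range(n_experiments):
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    pvals[i] = stats.ttest_ind(a, b).pvalue

# The cut-off directly sets the false-positive rate we tolerate.
for alpha in (0.05, 0.01, 0.005):
    print(f"alpha = {alpha}: false-positive rate ~ {np.mean(pvals < alpha):.3f}")
```

The cut-off simply is the false-positive rate we agree to tolerate, which is why some of these papers argue it should be much stricter than 0.05.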
There is a desire to approach statistical analysis with some perfect tool that allows the scientist to be entirely objective, when in reality no such tool exists. So my take-away from all of this is that we need to embrace the reality that subjectivity comes into play, and use the resources we have around us (other scientists) to think through the best approach to use.
And so where do I think that leaves us? In fact, I think this leaves us better off. Rather than being free to publish papers where someone blindly uses a statistical technique that shows their results are ‘important’ (and I think this happens more frequently than any of us would like to admit), scientists, or biologists I should say, are required to put consideration into both the statistical tools they are using and how they interpret the results.
Barber, J. J., and K. Ogle. 2014. To P or not to P? Ecology 95:621-626.
Burnham, K. P., and D. R. Anderson. 2014. P values are only an index to evidence: 20th- vs. 21st-century statistical science. Ecology 95:627-630.
de Valpine, P. 2014. The common sense of P values. Ecology 95:617-621.
Johnson, V. E. 2013. Revised standards for statistical evidence. Proceedings of the National Academy of Sciences.
Murtaugh, P. A. 2014. In defense of P values. Ecology 95:611-617.
Royall, R. M. 1997. Statistical evidence: a likelihood paradigm. Chapman and Hall, London, UK.