## Choosing a narrative in isolation: Fisher, Popper and p-values (Part I)

Most applied work in scientific exploration also involves a choice among narratives. An experiment is designed on the premise that a hypothesis is true, the experiment is performed, data are collected and a paper is written. Among other things, the paper subjects the data to a statistical significance test that yields a p-value. The investigators then use this p-value to reject the original hypothesis (if the p-value is small) or not reject it (if the p-value is large). Unfortunately, a large p-value usually has the undesirable consequence that the paper, and with it the experiment, is excluded from the scientific record, i.e. it is never accepted for publication.

This procedure seems to contradict the orderly Bayesian process of selecting among many narrative models of the world. In fact, in this approach one does not even appear to have a choice to begin with! So what is going on?

Readers familiar with the work of Karl Popper may notice a similarity between this procedure and his method of empirical falsification, i.e. the attempt to weed out scientific theories (“narratives”), one at a time, by noting which ones are not supported by empirical data. This approach seems to underlie statistical hypothesis testing via p-values, as introduced by Ronald Fisher:

1. Start with a single null hypothesis ($H_0$) and no alternatives
2. Design an experiment on the premise that the null hypothesis is true
3. Collect some data ($D$)
4. Transform the data into a statistic ($t(D)$) whose particular form depends on the experiment and the measurement process
5. Assuming that the null hypothesis is true, calculate the probability of obtaining a value of the test statistic at least as large as the one actually observed. This probability is the p-value, i.e. the probability, under $H_0$, of observing data as extreme as, or more extreme than, the data obtained:

[Figure: p-value diagram, from Wikimedia]

6. If the p-value is smaller than a significance cutoff, i.e. a small value such as 0.05 or 0.01, the null hypothesis is rejected. This cutoff is fixed in advance of the experiment, before the data are collected and processed. Although this critical-region approach is actually due to Neyman and Pearson, Fisher treated the actual value of the p-value as a semi-formal measure of the weight of evidence and advocated the specific cutoff of 0.05 as a compromise.
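To make steps 1 to 6 concrete, here is a minimal sketch for a hypothetical coin-flipping experiment (the example, the function names and the numbers are illustrative assumptions, not taken from any particular study). The null hypothesis is that the coin is fair, the test statistic $t(D)$ is the number of heads in $n$ flips, and the one-sided p-value is $P(X \geq t(D) \mid H_0)$ with $X \sim \text{Binomial}(n, 0.5)$. The cutoff `alpha` is set before the data are examined.

```python
from math import comb


def binomial_p_value(k: int, n: int, p0: float = 0.5) -> float:
    """One-sided p-value P(X >= k | H0), where X ~ Binomial(n, p0).

    k is the observed test statistic t(D): the number of heads in n flips.
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def fisher_test(k: int, n: int, alpha: float = 0.05, p0: float = 0.5):
    """Return (p_value, reject_H0) for the hypothetical coin experiment.

    alpha is the significance cutoff, fixed before looking at the data.
    reject_H0 == False means "H0 not rejected", not "H0 accepted".
    """
    p = binomial_p_value(k, n, p0)
    return p, p < alpha


if __name__ == "__main__":
    # Hypothetical outcomes of 100 flips of a coin assumed fair under H0
    for heads in (55, 60, 65):
        p, reject = fisher_test(heads, 100)
        print(f"{heads}/100 heads: p = {p:.4f}, reject H0: {reject}")
```

Note that the code never evaluates any alternative to $H_0$: the only probabilities it computes are under the null, which is exactly the one-hypothesis-at-a-time character of the procedure.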

In both Popper’s and Fisher’s approaches, rejection of the null hypothesis does not lead to the acceptance of an alternative one: the researcher is back to the drawing board, figuring out alternative hypotheses that are to be tested experimentally and rejected (or not) in a similar fashion.

Even though the Popper/Fisher and the Bayesian approaches appear to contradict each other, and are usually discussed as such, we should note that the former, considered as a mathematical procedure, may be viewed as a special case of the latter under certain conditions.

This is most easily seen in the case of a narrative or model under which certain data/observables are impossible (have zero likelihood). If such data are nevertheless obtained, then applying Bayes' theorem to the problem of choosing among alternatives yields:

$p(H_0|D) \propto p(D|H_0) \times p(H_0) = 0 \times p(H_0) = 0$

irrespective of whatever alternatives one wishes to consider, since these enter only through the proportionality constant (the normalizing term $p(D)$) in Bayes' theorem. Therefore, obtaining an impossible result is sufficient to render a scientific theory false: its posterior probability becomes zero no matter what the data say about the alternatives. On the other hand, obtaining a result that is certain according to the theory, $p(D|H_0) = 1$, can only raise (never lower) our confidence in it, because then $p(H_0|D) = p(H_0)/p(D) \geq p(H_0)$. Yet it never tells us whether another theory provides a more plausible explanation of the world.
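The argument above can be checked numerically with a minimal sketch of Bayes' theorem over a finite set of hypotheses. The hypotheses `H1` and `H2`, their priors and their likelihoods are purely hypothetical numbers chosen for illustration. The first case gives $H_0$ zero likelihood for the observed data; the second makes the observed data certain under $H_0$.

```python
from typing import Dict


def posterior(priors: Dict[str, float], likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Bayes' theorem over a finite set of hypotheses:
    p(H|D) = p(D|H) p(H) / p(D), with p(D) = sum over H of p(D|H) p(H).
    """
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(unnormalized.values())  # p(D): the proportionality constant
    if evidence == 0:
        raise ValueError("the data are impossible under every hypothesis considered")
    return {h: v / evidence for h, v in unnormalized.items()}


# Hypothetical priors for the null and two illustrative alternatives
priors = {"H0": 0.5, "H1": 0.3, "H2": 0.2}

# Case 1: the observed data are impossible under H0 -> posterior of H0 is exactly 0,
# whatever alternatives (and likelihoods for them) we choose to include.
falsified = posterior(priors, {"H0": 0.0, "H1": 0.4, "H2": 0.1})

# Case 2: the observed data are certain under H0 -> p(H0|D) = p(H0)/p(D) >= p(H0).
# Confidence in H0 rises, but this says nothing about untested alternatives.
confirmed = posterior(priors, {"H0": 1.0, "H1": 0.4, "H2": 0.1})

if __name__ == "__main__":
    print("impossible data:", falsified)
    print("certain data:   ", confirmed)
```

Changing the priors or likelihoods of `H1` and `H2` in the first case leaves the posterior of `H0` at zero (as long as some hypothesis assigns the data nonzero probability), which is the Bayesian reading of Popperian falsification.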

Although this line of reasoning provides a Bayesian (and probabilistically trivial) interpretation of Popper’s one-theory-at-a-time falsification approach, it does not provide a similar basis on which to interpret the p-value from the Fisherian perspective. This will be addressed in the second part of this post.