Even with non-informative priors, the Bayesian approach is preferable to frequentist linear regression

A unique aspect of the Bayesian approach is that it allows one to integrate previous knowledge (“prior beliefs”) with current data (“likelihood”). Yet even in those cases in which non-informative priors are used (as here and here), the Bayesian approach is preferable due to its conservative nature.

To see why, note that in both the Bayesian and the frequentist approaches the key quantities are the “t-statistics”, given by $\frac{\hat{\beta}_i}{s\sqrt{V_{\beta}[i,i]}}$, i.e. the regression estimate $\hat{\beta}_i$ divided by the residual standard error $s$ times the square root of the corresponding diagonal element of the (unscaled) covariance matrix $V_{\beta} = (X^T X)^{-1}$, with $X$ the design matrix. In frequentist statistics these quantities are assessed for significance using a Gaussian tail-area probability; from the Bayesian perspective one can analogously define a posterior tail-area probability. The latter informally quantifies the evidence in support of the a posteriori hypothesis that the coefficient does not display substantial directional behavior. In other words, this Bayesian tail-area probability measures whether the bulk of the posterior mass fails to concentrate on one side of zero.
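As a minimal sketch of these quantities, the snippet below fits a one-predictor regression with an intercept in plain Python, forms the t-statistics as $\hat{\beta}_i / (s\sqrt{V_{\beta}[i,i]})$, and computes both tail areas. It assumes what the comparison in this post implies: the frequentist tail area uses the standard normal, while the Bayesian one uses the Student t posterior with df degrees of freedom, location $\hat{\beta}_i$ and scale $s\sqrt{V_{\beta}[i,i]}$ that arises under the usual non-informative prior $p(\beta, \sigma^2) \propto \sigma^{-2}$. The function names and toy data are hypothetical, and the Student t tail is obtained by numerical integration rather than a library call.

```python
import math


def ols_t_statistics(x, y):
    """Fit y = b0 + b1*x by least squares; return (beta_hat, s, V_beta, t_stats, df)."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # V_beta = (X^T X)^{-1} for the design matrix X = [1, x]
    V = [[sxx / det, -sx / det], [-sx / det, n / det]]
    # beta_hat = V_beta X^T y, where X^T y = [sy, sxy]
    beta = (V[0][0] * sy + V[0][1] * sxy, V[1][0] * sy + V[1][1] * sxy)
    df = n - 2  # observations minus estimated coefficients
    rss = sum((yi - beta[0] - beta[1] * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / df)  # residual standard error
    t = [beta[i] / (s * math.sqrt(V[i][i])) for i in range(2)]
    return beta, s, V, t, df


def normal_two_sided(t):
    """Frequentist tail area as in the text: P(|Z| > |t|), Z ~ N(0, 1)."""
    return math.erfc(abs(t) / math.sqrt(2))


def student_t_two_sided(t, df, steps=2000):
    """Bayesian tail area: P(|T| > |t|), T ~ Student t(df), via Simpson's rule on [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        # Student t density with df degrees of freedom
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    # total * h / 3 approximates P(0 < T < |t|); the density is symmetric
    return max(0.0, 1.0 - 2.0 * total * h / 3)


# Hypothetical toy data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 1.5, 3.9, 3.0, 5.8, 4.2, 6.1, 7.5]
beta, s, V, t, df = ols_t_statistics(x, y)
print(f"slope t = {t[1]:.3f} on df = {df}")
print(f"frequentist (Gaussian) tail area: {normal_two_sided(t[1]):.2e}")
print(f"Bayesian (Student t) tail area:  {student_t_two_sided(t[1], df):.2e}")
```

For the same t-statistic the Student t tail is heavier than the Gaussian one, which is the source of the more conservative Bayesian reading discussed below.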

In any case, one can ask what value of the t-statistic would lead to the rejection of the null hypothesis (in the frequentist perspective this threshold is usually set at p=0.05), or would be associated with a small (e.g. p=0.05) posterior tail-area probability. This is shown in the figure below, which demonstrates that one needs a higher t-statistic to reach the same conclusion in the Bayesian vs. the frequentist perspective:
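The thresholds behind this figure can be approximated with the sketch below, which bisects for the t-statistic at which each two-sided tail area equals 0.05. It assumes the same setup as the text: a Gaussian reference for the frequentist test and a Student t posterior with df degrees of freedom for the Bayesian one. The helper names and the list of df values are illustrative choices, not from any library.

```python
import math


def normal_two_sided(t):
    # Frequentist tail area as in the text: P(|Z| > |t|), Z ~ N(0, 1)
    return math.erfc(abs(t) / math.sqrt(2))


def student_t_two_sided(t, df, steps=2000):
    # Bayesian tail area: P(|T| > |t|) for T ~ Student t(df), Simpson's rule on [0, |t|]
    a = abs(t)
    if a == 0:
        return 1.0
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    h = a / steps  # steps must be even
    total = 0.0
    for k in range(steps + 1):
        w = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += w * math.exp(log_c - (df + 1) / 2 * math.log1p((k * h) ** 2 / df))
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def critical_t(tail_fn, alpha=0.05, lo=0.0, hi=50.0, iters=40):
    # Bisection for the t at which the (decreasing) tail area equals alpha
    for _ in range(iters):
        mid = (lo + hi) / 2
        if tail_fn(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_crit = critical_t(normal_two_sided)  # about 1.96
bayes_crit = {d: critical_t(lambda t, d=d: student_t_two_sided(t, d))
              for d in (2, 5, 10, 30, 100, 150)}
for d, t_crit in bayes_crit.items():
    print(f"df = {d:3d}: Bayesian t* = {t_crit:.3f}, frequentist t* = {z_crit:.3f}")
```

The Bayesian threshold is always above the Gaussian one and approaches it only as df grows.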


To obtain a higher t-statistic, one needs either a stronger “signal” (larger numerator) or less “noise” (smaller denominator). The discrepancy between the Bayesian and the frequentist assessments is larger for smaller degrees of freedom (df = number of observations minus number of coefficients estimated in the regression model) and vanishes only asymptotically as df grows.

One can also flip the question and ask: what frequentist p-value corresponds to a Bayesian tail-area probability of p=0.05 when both are computed from the same information, i.e. the same value of the t-statistic?


This figure also demonstrates that a Bayesian tail-area probability of p=0.05 corresponds to a smaller frequentist p-value, especially for smaller sample sizes. These graphs reinforce the notion that a p-value should never be interpreted without taking into account the context of the regression (the dataset and the variables used to adjust the outcomes). Furthermore, the Bayesian tail-area probabilities are always more conservative than the frequentist p-values. This is a skeptical, yet healthy, approach to the evaluation of evidence: if it were to replace the frequentist assessments carried out today, it would let through only findings with stronger signals or lower noise than those we currently see in journals.
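The flipped question can be sketched numerically: for each df, find the t-statistic that gives a Bayesian tail area of 0.05, then evaluate the frequentist p-value at that same t. As in the text, this assumes a Gaussian reference for the frequentist test and a Student t posterior with df degrees of freedom for the Bayesian one; the helper names and df values are hypothetical.

```python
import math


def normal_two_sided(t):
    # Frequentist tail area as in the text: P(|Z| > |t|), Z ~ N(0, 1)
    return math.erfc(abs(t) / math.sqrt(2))


def student_t_two_sided(t, df, steps=2000):
    # Bayesian tail area: P(|T| > |t|) for T ~ Student t(df), Simpson's rule on [0, |t|]
    a = abs(t)
    if a == 0:
        return 1.0
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    h = a / steps  # steps must be even
    total = 0.0
    for k in range(steps + 1):
        w = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += w * math.exp(log_c - (df + 1) / 2 * math.log1p((k * h) ** 2 / df))
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def critical_t(tail_fn, alpha=0.05, lo=0.0, hi=50.0, iters=40):
    # Bisection for the t at which the (decreasing) tail area equals alpha
    for _ in range(iters):
        mid = (lo + hi) / 2
        if tail_fn(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Same t-statistic, two readings: the t giving a Bayesian tail area of 0.05,
# and the frequentist (Gaussian) p-value that this same t would produce.
matched_p = {}
for d in (3, 5, 10, 20, 50, 150):
    t_star = critical_t(lambda t, d=d: student_t_two_sided(t, d))
    matched_p[d] = normal_two_sided(t_star)
    print(f"df = {d:3d}: t = {t_star:.3f}, frequentist p = {matched_p[d]:.4f}")
```

Every matched frequentist p-value falls below 0.05, and the gap is widest at small df.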

To see this more clearly across a wide range of t-statistics and degrees of freedom, I generated a set of 3-D graphs showing the relationships between the t-statistic, df and the tail probability for the Bayesian:


and the frequentist approaches:


It can be seen that borderline significant findings in the frequentist approach (i.e. with p between 0.04 and 0.05) are not always flagged as “significant” in the Bayesian framework. Though df (a quantity dominated by the size of the dataset) matters when one evaluates evidence from either viewpoint, the Bayesian one “shrinks” the perceived significance of borderline findings in small datasets. When one has plenty of data (df>150), the two approaches largely agree with each other.
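A rough numerical version of the borderline comparison: the sketch converts frequentist p-values between 0.04 and 0.05 back into t-statistics through the Gaussian reference, then evaluates the Bayesian (Student t) tail area at several df. The df grid and helper names are hypothetical choices for illustration; this tabulates a slice of the surfaces described above rather than reproducing the author's graphs.

```python
import math


def normal_two_sided(t):
    # Frequentist tail area as in the text: P(|Z| > |t|), Z ~ N(0, 1)
    return math.erfc(abs(t) / math.sqrt(2))


def student_t_two_sided(t, df, steps=2000):
    # Bayesian tail area: P(|T| > |t|) for T ~ Student t(df), Simpson's rule on [0, |t|]
    a = abs(t)
    if a == 0:
        return 1.0
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    h = a / steps  # steps must be even
    total = 0.0
    for k in range(steps + 1):
        w = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += w * math.exp(log_c - (df + 1) / 2 * math.log1p((k * h) ** 2 / df))
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def critical_t(tail_fn, alpha=0.05, lo=0.0, hi=50.0, iters=40):
    # Bisection for the t at which the (decreasing) tail area equals alpha
    for _ in range(iters):
        mid = (lo + hi) / 2
        if tail_fn(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p_values = (0.040, 0.045, 0.050)
dfs = (5, 10, 20, 50, 150, 1000)
grid = {}
for p_freq in p_values:
    t_obs = critical_t(normal_two_sided, alpha=p_freq)  # t implied by the Gaussian p-value
    for d in dfs:
        grid[(p_freq, d)] = student_t_two_sided(t_obs, d)

for d in dfs:
    cells = "  ".join(f"{grid[(p, d)]:.4f}" for p in p_values)
    print(f"df = {d:4d}: Bayesian tail areas for p = 0.040/0.045/0.050 -> {cells}")
```

Every Bayesian entry exceeds its frequentist counterpart, and the gap narrows as df grows; whether a cell crosses 0.05 depends on both the p-value and df.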

What are the larger implications for research, should we (as a species) switch en masse to the Bayesian paradigm? First, we would be able to filter out borderline findings and skew our collection of “significant” scientific discoveries towards larger effects or less noisy measurements (either due to better instruments or larger sample sizes). These benefits would be realized even without adopting an informative-prior Bayesian analysis, which would additionally bring the context of previous investigations into the assessment of the current data.

