Bayesian methods are sure to get some publicity after Valen Johnson's PNAS paper on using Bayesian approaches to recalibrate p-value cutoffs from **0.05** to **0.005**. Though the paper itself is bound to get some heat (see the discussion on Andrew Gelman's blog and Matt Briggs's fun-to-read deconstruction), the controversy might stimulate people to **explore** Bayesianism and (hopefully!) to **move away** from frequentist analyses. Newcomers, though, will face some hurdles on this journey:

- philosophical (the need to adapt to an “alternative” inferential lifestyle)
- practical (gathering all the data that came before one’s definitive study, and processing them mathematically in order to define the priors)
- technical (learning the tools required to carry out Bayesian analyses and summarize results)

Though there are excellent resources out there covering the philosophy/theory (e.g. the books by Jaynes, Gelman, Robert and Lee) and the tools needed to implement Bayesian analyses (in R, JAGS, OpenBUGS, WinBUGS, Stan), my own (admittedly biased) perspective is that many people will be reluctant to change too many things in their scientific modus operandi at once. Call it intellectual laziness, human inertia or simply lack of time, but the bottom line is that one is more likely to embrace change in small steps and with as little disturbance to one's routine as possible. So how can one embark on the Bayesian journey by taking small steps towards the giant leap?

It would appear to me that the path of least resistance to Bayesianism runs through *non-informative* (uninformative/data-dominated) priors. These avoid the tedious search of previous evidence and the expert elicitation required to specify informative priors, while retaining the connection to one's frequentist past, in which the current data are the only things that matter (*hint:* they are not). Furthermore, one can even avoid learning some of the more elaborate software systems/libraries required to carry out bona fide Bayesian analyses by reusing the R output of a frequentist analysis.

Let’s see how it is possible to cater to the needs of the lazy, inert or horribly busy researcher. First we start with a toy linear regression example (straight from R’s lm help file):

```r
## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lmfit <- lm(weight ~ group)
```

The standard *non-informative* prior for the linear regression example (*Bayesian Data Analysis*, 2nd Ed, pp. 355-358) takes an improper (uniform) prior on the coefficients of the regression $\beta$ (the intercept and the effect of the “Trt” level) and on the logarithm of the residual variance $\sigma^2$, i.e. $p(\beta, \sigma^2) \propto \sigma^{-2}$. With these priors, the posterior distribution of $\beta$ conditional on $\sigma^2$ and the response variable $y$ is:

$$\beta \,|\, \sigma^2, y \sim N\!\left(\hat{\beta},\, V_\beta\,\sigma^2\right), \qquad V_\beta = (X^T X)^{-1}$$

where $\hat{\beta}$ is the usual least-squares estimate and $X$ the design matrix.

The marginal posterior distribution for $\sigma^2$ is a scaled inverse-$\chi^2$ distribution with scale $s^2$ and $n - k$ degrees of freedom:

$$\sigma^2 \,|\, y \sim \text{Inv-}\chi^2\!\left(n - k,\, s^2\right)$$

where $n$ is the number of data points and $k$ the number of predictor variables. In our example these assume the values $n = 20$ and $k = 2$, while $s^2$ is the standard frequentist estimate of the residual variance:

$$s^2 = \frac{1}{n-k}\,(y - X\hat{\beta})^T (y - X\hat{\beta})$$
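As a quick sanity check (not part of the original analysis, and using illustrative values for the degrees of freedom and scale rather than the fitted ones), one can draw from a scaled inverse-$\chi^2$ by dividing $\nu s^2$ by $\chi^2_\nu$ draws and verify that the sample mean approaches the analytic mean $\nu s^2/(\nu - 2)$:

```r
## sanity check: if X ~ chisq(nu), then nu * s2 / X ~ Inv-chisq(nu, s2)
set.seed(42)
nu <- 18     ## illustrative degrees of freedom (n - k)
s2 <- 0.5    ## illustrative scale
N  <- 1e5
sigma2 <- nu * s2 / rchisq(N, nu)
## analytic mean of Inv-chisq(nu, s2) is nu * s2 / (nu - 2)
mean(sigma2)   ## should be close to 18 * 0.5 / 16 = 0.5625
```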

The quantities $\hat{\beta}$, $n - k$ and $s^2$ are directly available from the information returned by R’s lm, while $V_\beta$ can be computed from the qr element of the lm object:

```r
QR <- lmfit$qr
df.residual <- lmfit$df.residual
R <- qr.R(QR)        ## R component of the QR decomposition
coef <- lmfit$coef
Vb <- chol2inv(R)    ## unscaled variance: (X'X)^{-1}
s2 <- (t(lmfit$residuals) %*% lmfit$residuals)
s2 <- s2[1,1] / df.residual
```

To compute the *marginal* posterior distribution of $\beta$ we can use a simple Monte Carlo algorithm: first draw $\sigma^2$ from its marginal posterior, and then draw $\beta$ from $N(\hat{\beta}, V_\beta\,\sigma^2)$. The following function will do that; it accepts as arguments an lm object and the desired number of Monte Carlo samples, and returns everything in a data frame for further processing:

```r
## function to compute the Bayesian analog of the lmfit
## using non-informative priors and a Monte Carlo scheme
## based on N samples
require(MASS)    ## provides mvrnorm
bayesfit <- function(lmfit, N) {
    QR <- lmfit$qr
    df.residual <- lmfit$df.residual
    R <- qr.R(QR)        ## R component
    coef <- lmfit$coef
    Vb <- chol2inv(R)    ## variance (unscaled)
    s2 <- (t(lmfit$residuals) %*% lmfit$residuals)
    s2 <- s2[1,1] / df.residual
    ## now sample the residual variance
    sigma <- df.residual * s2 / rchisq(N, df.residual)
    coef.sim <- sapply(sigma, function(x) mvrnorm(1, coef, Vb * x))
    ret <- data.frame(t(coef.sim))
    names(ret) <- names(lmfit$coef)
    ret$sigma <- sqrt(sigma)
    ret
}
```

A helper function can be used to summarize these Monte Carlo estimates, yielding the mean, standard deviation, t (the ratio of mean to standard deviation), median, and a 95% (symmetric) credible interval:

```r
Bayes.sum <- function(x) {
    c("mean"   = mean(x),
      "se"     = sd(x),
      "t"      = mean(x) / sd(x),
      "median" = median(x),
      "CrI"    = quantile(x, prob = 0.025),
      "CrI"    = quantile(x, prob = 0.975))
}
```

To use these functions and contrast the Bayesian and frequentist estimates, one simply needs to fit the regression model with lm, call the bayesfit function to run the Bayesian analysis and pass the results to Bayes.sum:

```r
> set.seed(1234)  ## reproducible sim
> lmfit <- lm(weight ~ group)
> bf <- bayesfit(lmfit, 10000)
> t(apply(bf, 2, Bayes.sum))
                  mean        se         t     median    Cr.2.5%  Cr.97.5%
(Intercept)  5.0332172 0.2336049 21.545857  5.0332222  4.5651327 5.4902380
groupTrt    -0.3720698 0.3335826 -1.115375 -0.3707408 -1.0385601 0.2895787
sigma        0.7262434 0.1275949  5.691789  0.7086832  0.5277051 1.0274083
> summary(lmfit)

Call:
lm(formula = weight ~ group)

Residuals:
    Min      1Q  Median      3Q     Max
-1.0710 -0.4938  0.0685  0.2462  1.3690

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.0320     0.2202  22.850 9.55e-15 ***
groupTrt     -0.3710     0.3114  -1.191    0.249
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared:  0.07308,   Adjusted R-squared:  0.02158
F-statistic: 1.419 on 1 and 18 DF,  p-value: 0.249
```

It can be seen that the Bayesian point estimates are almost identical to the frequentist ones (up to 2 significant digits, which is the limit of precision of a Monte Carlo run based on 10,000 samples), but the uncertainty around these estimates (the standard deviation) and around the residual variance is slightly larger. This conservatism is an inherent feature of Bayesian analysis, which guards against too many false positive hits.
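In fact, for this flat-prior model the agreement is exact in the limit: the marginal posterior of each coefficient is a shifted-and-scaled t distribution with $n - k$ degrees of freedom, so the 95% credible interval coincides analytically with the frequentist 95% confidence interval. The sketch below checks this by re-deriving the Monte Carlo draws directly (rather than calling the bayesfit function above) and comparing the resulting interval endpoints against confint:

```r
require(MASS)   ## mvrnorm, for drawing the coefficients

## data: the same toy example
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lmfit <- lm(weight ~ group)

## direct Monte Carlo draws from the flat-prior posterior
set.seed(1)
N  <- 20000
nu <- lmfit$df.residual
Vb <- chol2inv(qr.R(lmfit$qr))          ## (X'X)^{-1}
s2 <- sum(lmfit$residuals^2) / nu
sigma2   <- nu * s2 / rchisq(N, nu)     ## sigma^2 ~ Inv-chisq(nu, s2)
beta.sim <- t(sapply(sigma2, function(x) mvrnorm(1, coef(lmfit), Vb * x)))

## 95% credible intervals vs. frequentist confidence intervals
cri <- t(apply(beta.sim, 2, quantile, prob = c(0.025, 0.975)))
ci  <- confint(lmfit)
round(cri - ci, 2)   ## differences should vanish up to Monte Carlo error
```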


August 7, 2015 at 7:55 am

Are you kidding? This example is totally unintelligible – full of technical jargon and unexplained shortcuts. What amount of “tears” has actually been saved?

September 19, 2016 at 10:02 am

John is right, you lost me at “The marginal posterior distribution” paragraph… This is a common thing when experts try to explain bayesian analysis. Seems very difficult to express in layman’s terms.

September 19, 2016 at 1:46 pm

Hi Frank

Similar concepts are actually used in frequentist statistics when one calculates standard errors and p-values etc. In essence one integrates (“marginalizes”) over a large parameter space all the other random quantities to summarize the uncertainty in one variable at a time.

The major reason you do not see this in non-Bayesian statistics is that there everyone assumes one is dealing with Gaussian variables, for which the marginal is also a Gaussian (normal) variable.

November 25, 2016 at 10:43 am

Hi, I have a slight problem at the end step:

> bf<-bayesfit(lmfit,10000)

when I input that onto R, it gives me an error:

Error in FUN(X[[i]], …) : could not find function "mvrnorm"

thanks

December 15, 2016 at 7:04 pm

Hi Kay,

you need to install the MASS R-package.

Place on top of your R-markdown or R-script the following:

require(MASS)

rgds