Bayesian linear regression analysis without tears (R)

Bayesian methods are sure to get some publicity after Valen Johnson’s PNAS paper arguing for the use of Bayesian approaches to recalibrate p-value cutoffs from 0.05 to 0.005. Though the paper itself is bound to get some heat (see the discussion on Andrew Gelman’s blog and Matt Briggs’s fun-to-read deconstruction), the controversy might stimulate people to explore Bayesianism and (hopefully!) to move away from frequentist analyses. Newcomers, though, will face some hurdles on this journey:

  • philosophical (the need to adapt to an “alternative” inferential lifestyle)
  • practical (gathering all the data that came before one’s definitive study and processing them mathematically in order to define the priors)
  • technical (learning the tools required to carry out Bayesian analyses and summarize their results)

Though there are excellent resources out there covering the philosophy and theory (e.g. the books by Jaynes, Gelman, Robert and Lee) and the tools needed to implement Bayesian analyses (R, JAGS, OpenBUGS, WinBUGS, Stan), my own (admittedly biased) perspective is that many people will be reluctant to change too many things in their scientific modus operandi at once. One can call it intellectual laziness, human inertia or simply lack of time, but the bottom line is that one is more likely to embrace change in small steps and with as little disturbance to one’s routine as possible. So how can one embark on the Bayesian journey by taking small steps towards the giant leap?

It would appear to me that the path of least resistance to Bayesianism runs through non-informative (uninformative/data-dominated) priors. These avoid the tedious search through previous evidence and the expert elicitation required to specify informative priors, while retaining a connection to one’s frequentist past, in which the current data are the only thing that matters (hint: they are not). Furthermore, one can even avoid learning the more elaborate software systems/libraries required to carry out bona fide Bayesian analyses by reusing the R output of a frequentist analysis.

Let’s see how it is possible to cater to the needs of the lazy, inert or horribly busy researcher. First we start with a toy linear regression example (straight from R’s lm help file):

## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lmfit <- lm(weight ~ group)

The standard non-informative prior for this linear regression example (Bayesian Data Analysis, 2nd ed., pp. 355-358) is an improper (uniform) prior on the regression coefficients (\beta: the intercept and the effect of the “Trt” level) and on the logarithm of the residual variance \sigma^2. With these priors, the posterior distribution of \beta conditional on \sigma^2 and the response variable y is \beta|\sigma^2, y \sim N(\hat{\beta}, V_{\beta}\sigma^2), where \hat{\beta} is the usual least-squares estimate and V_{\beta} = (X^T X)^{-1} is the unscaled covariance matrix computed from the design matrix X.

The marginal posterior distribution of \sigma^2 is a scaled inverse \chi^2 distribution with scale s^2 and n-k degrees of freedom, where n is the number of data points and k the number of predictor variables (including the intercept). In our example n=20 and k=2, while s^2 is the standard frequentist estimate of the residual variance.
The quantities \hat{\beta} and s^2 are directly available from the information returned by R’s lm, while V_{\beta} can be computed from the qr element of the lm object:

    QR <- lmfit$qr                    ## QR decomposition of the design matrix
    df.residual <- lmfit$df.residual  ## n - k
    R <- qr.R(QR)                     ## R component of the QR decomposition
    coef <- lmfit$coef                ## the least-squares estimate beta-hat
    Vb <- chol2inv(R)                 ## unscaled covariance matrix: (X'X)^{-1}
    s2 <- sum(lmfit$residuals^2) / df.residual  ## residual variance estimate
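
As a quick sanity check (not needed for the analysis, but reassuring), the unscaled covariance matrix recovered from the QR decomposition should reproduce, once scaled by s^2, the covariance matrix that lm itself reports:

    ## Vb * s2 should match lm's own covariance matrix (attributes aside)
    all.equal(Vb * s2, vcov(lmfit), check.attributes = FALSE)  ## expect TRUE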

To compute the marginal distribution of \beta|y we can use a simple Monte Carlo algorithm: first draw \sigma^2 from its marginal posterior, and then draw \beta from \beta|\sigma^2, y. The following function does exactly that; it accepts an lm object and the desired number of Monte Carlo samples, and returns everything in a data frame for further processing (note that it uses mvrnorm from the MASS package for the multivariate normal draws):

## function to compute the Bayesian analog of the lmfit
## using non-informative priors and a Monte Carlo scheme
## based on N samples

library(MASS)  ## provides mvrnorm

bayesfit <- function(lmfit, N) {
    QR <- lmfit$qr
    df.residual <- lmfit$df.residual
    R <- qr.R(QR)       ## R component of the QR decomposition
    coef <- lmfit$coef  ## beta-hat
    Vb <- chol2inv(R)   ## unscaled covariance matrix: (X'X)^{-1}
    s2 <- sum(lmfit$residuals^2) / df.residual  ## residual variance estimate

    ## draw the residual variance from its scaled inverse chi-square marginal
    sigma2 <- df.residual * s2 / rchisq(N, df.residual)
    ## for each variance draw, draw beta from its conditional posterior
    coef.sim <- sapply(sigma2, function(x) mvrnorm(1, coef, Vb * x))
    ret <- data.frame(t(coef.sim))
    names(ret) <- names(lmfit$coef)
    ret$sigma <- sqrt(sigma2)  ## report the residual SD rather than the variance
    ret
}

A helper function can be used to summarize these Monte Carlo draws, yielding the mean, standard deviation, a t-like ratio (mean over standard deviation, for comparison with the frequentist output), the median and a 95% (equal-tailed) credible interval:

Bayes.sum <- function(x) {
    c("mean"   = mean(x),
      "se"     = sd(x),
      "t"      = mean(x) / sd(x),
      "median" = median(x),
      "CrI"    = quantile(x, prob = 0.025),  ## lower bound of the 95% CrI
      "CrI"    = quantile(x, prob = 0.975))  ## upper bound of the 95% CrI
}

To use these functions and contrast the Bayesian and frequentist estimates, one simply needs to fit the regression model with lm, call the bayesfit function to run the Bayesian analysis and pass the results to Bayes.sum:

> set.seed(1234)  ## reproducible sim
> lmfit <- lm(weight ~ group)
> bf<-bayesfit(lmfit,10000)
> t(apply(bf,2,Bayes.sum))
                  mean        se         t     median    Cr.2.5%  Cr.97.5%
(Intercept)  5.0332172 0.2336049 21.545857  5.0332222  4.5651327 5.4902380
groupTrt    -0.3720698 0.3335826 -1.115375 -0.3707408 -1.0385601 0.2895787
sigma        0.7262434 0.1275949  5.691789  0.7086832  0.5277051 1.0274083
> summary(lmfit)

Call:
lm(formula = weight ~ group)

Residuals:
    Min      1Q  Median      3Q     Max
-1.0710 -0.4938  0.0685  0.2462  1.3690

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.0320     0.2202  22.850 9.55e-15 ***
groupTrt     -0.3710     0.3114  -1.191    0.249
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared:  0.07308,	Adjusted R-squared:  0.02158
F-statistic: 1.419 on 1 and 18 DF,  p-value: 0.249

It can be seen that the Bayesian estimates are almost identical to the frequentist ones (up to 2 significant digits, which is about the limit of precision of a Monte Carlo run based on 10,000 samples), but the uncertainty around these estimates (the standard deviation) and around the residual variance is somewhat larger. This conservativeness is an inherent feature of Bayesian analysis, and it guards against too many false positive findings.
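
To make this concrete, one can line up the frequentist 95% confidence intervals with the Monte Carlo credible intervals (a quick comparison, not part of the analysis proper); the credible intervals come out slightly wider:

## frequentist 95% confidence intervals for the coefficients
confint(lmfit)
## Bayesian 95% equal-tailed credible intervals from the Monte Carlo draws
t(apply(bf[, 1:2], 2, quantile, prob = c(0.025, 0.975)))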

6 Responses to “Bayesian linear regression analysis without tears (R)”

  1. civkan Says:

    Thanks for sharing this useful analysis. It’s very detailed.

  2. John Says:

    Are you kidding? This example is totally unintelligible – full of technical jargon and unexplained shortcuts. What amount of “tears” has actually been saved?

  3. Frank Says:

    John is right, you lost me at “The marginal posterior distribution” paragraph… This is a common problem when experts try to explain Bayesian analysis. It seems very difficult to express in layman’s terms.

    • Christos Argyropoulos Says:

      Hi Frank,
      Similar concepts are actually used in frequentist statistics when one calculates standard errors, p-values etc. In essence one integrates (“marginalizes”) over all the other random quantities in a large parameter space in order to summarize the uncertainty in one variable at a time.
      The major reason you do not see this in non-Bayesian statistics is that there everyone assumes one is dealing with Gaussian variables, for which the marginals are also Gaussian (normal) variables.
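
      For instance (an illustrative aside using the bf data frame computed in the post), marginalizing with the Monte Carlo draws amounts to summarizing one column of the joint draws at a time:

      ## bf holds draws from the joint posterior of (intercept, groupTrt, sigma);
      ## summarizing a single column marginalizes over all the others
      quantile(bf$groupTrt, prob = c(0.025, 0.5, 0.975))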

  4. Kay Says:

    Hi, I have a slight problem at the last step:
    > bf<-bayesfit(lmfit,10000)
    When I type that into R, it gives me an error:
    Error in FUN(X[[i]], …) : could not find function "mvrnorm"

    thanks

  5. Emilio Says:

    Hi Kay,
    you need to install the MASS R package.
    Put the following at the top of your R Markdown file or R script:
    require(MASS)
    rgds
