## Survival Analysis With Generalized Additive Models : Part IV (the survival function)

May 3, 2015

The ability of PGAMs to estimate the log-baseline hazard rate, endows them with the capability to be used as smooth alternatives to the Kaplan Meier curve. If we assume for the shake of simplicity that there are no proportional co-variates in the PGAM regression, then the quantity modeled  corresponds to the log-hazard of the  survival function. Note that the only assumptions made by the PGAM is that the a) log-hazard is a smooth function, with b) a given maximum complexity (number of degrees of freedom) and c) continuous second derivatives. A PGAM provides estimates of the log-hazard constant, $\beta_{0}$, and the time-varying deviation, $\lambda(t_{i,j})$. These can be used to predict the value of the survival function, $S(t)$, by approximating the integral appearing in the definition of $S(t)$ by numerical quadrature.

$S(t_{i})=\exp\left(-\int_{0}^{t_{i}}h(t)\mathrm{\, d}t\right)\approx\exp\left(-\sum_{j=1}^{N_{i}}w_{i,j}\exp(\beta_{0}+\lambda(t_{i,j}))\right)$

From the above definition it is obvious that the value of the survival distribution at any given time point is a non-linear function of the PGAM estimate. Consequently, the predicted survival value, $S_{pred}(t)$, cannot be derived in closed form; as with all non-linear PGAM estimates, a simple Monte Carlo simulation algorithm may be used to derive both the expected value of $\hat{S}_{pred}(t)$ and its uncertainty. For the case of the survival function, the simulation steps are provided in Appendix (Section A3) of our paper. The following R function can be used to predict the survival function and an associated confidence interval at a grid of points. It accepts as arguments a) the vector of time points, b) a PGAM object for the fitted log-hazard function, c) a list with the nodes and weights of a Gauss-Lobatto rule for the integration of the predicted survival, d) the number of Monte Carlo samples to obtain and optionally e) the seed of the random number generation. Of note, the order of the quadrature used to predict the survival function is not necessarily the same as the order used to fit the log-hazard function.

## Calculate survival and confidence interval over a grid of points
## using a GAM
SurvGAM<-Vectorize(function(t,gm,gl.rule,CI=0.95,Nsim=1000,seed=0)
##         t : time at which to calculate relative risk
##        gm : gam model for the fit
##   gl.rule : GL rule (list of weights and nodes)
##        CI : CI to apply
##      Nsim : Number of replicates to draw
##      seed : RNG seed
{
q<-(1-CI)/2.0
## create the nonlinear contrast
pdfnc<-data.frame(stop=t,gam.dur=1)
L<-length(gl.rule$x) start<-0; ## only for right cens data ## map the weights from [-1,1] to [start,t] gl.rule$w<-gl.rule$w*0.5*(t-start) ## expand the dataset df<-Survdataset(gl.rule,pdfnc,fu=1) ## linear predictor at each node Xp <- predict(gm,newdata=df,type="lpmatrix") ## draw samples set.seed(seed) br <- rmvn(Nsim,coef(gm),gm$Vp)
res1<-rep(0,Nsim)
for(i in 1:Nsim){
## hazard function at the nodes
hz<-exp(Xp%*%br[i,])
## cumumative hazard
chz1<-gl.rule$w %*% hz[1:L,] ##survival res1[i]<-exp(-chz1) } ret<-data.frame(t=t,S=mean(res1), LCI=quantile(res1,prob=q), UCI=quantile(res1,prob=1-q)) ret },vectorize.args=c("t")) The function makes use of another function, Survdataset, that expands internally the vector of time points into a survival dataset. This dataset is used to obtain predictions of the log-hazard function by calling the predict function from the mgcv package. ## Function that expands a prediction dataset ## so that a GL rule may be applied ## Used in num integration when generating measures of effect Survdataset<-function(GL,data,fu) ## GL : Gauss Lobatto rule ## data: survival data ## fu: column number containing fu info { ## append artificial ID in the set data$id<-1:nrow(data)
Gllx<-data.frame(stop=rep(GL$x,length(data$id)),
t=rep(data[,fu],each=length(GL$x)), id=rep(data$id,each=length(GL$x)), start=0) ## Change the final indicator to what ## was observed, map node positions, ## weights from [-1,1] back to the ## study time Gllx<-transform(Gllx, stop=0.5*(stop*(t-start)+(t+start))) ## now merge the remaining covariate info Gllx<-merge(Gllx,data[,-c(fu)]) nm<-match(c("t","start","id"),colnames(Gllx)) Gllx[,-nm] } The ability to draw samples from the multivariate normal distribution corresponding to the model estimates and its covariance matrix is provided by another function, rmvn: ## function that draws multivariate normal random variates with ## a given mean vector and covariance matrix ## n : number of samples to draw ## mu : mean vector ## sig : covariance matrix rmvn <- function(n,mu,sig) { ## MVN random deviates L <- mroot(sig);m <- ncol(L); t(mu + L%*%matrix(rnorm(m*n),m,n)) } To illustrate the use of these functions we revisit the PBC example from the 2nd part of this blog series. Firstly, let’s obtain a few Gauss-Lobatto lists of weights/nodes for the integration of the survival function:  ## Obtain a few Gauss Lobatto rules to integrate the survival ## distribution GL5<-GaussLobatto(5); GL10<-GaussLobatto(10); GL20<-GaussLobatto(20); Subsequently, we fit the log-hazard rate to the coarsely (5 nodes) and more finely discretized (using a 10 point Gauss Lobatto rule) versions of the PBC dataset, created in Part 2. The third command obtains the Kaplan Meier estimate in the PBC dataset.  fSurv1<-gam(gam.ev~s(stop,bs="cr")+offset(log(gam.dur)), data=pbcGAM,family="poisson",scale=1,method="REML") fSurv2<-gam(gam.ev~s(stop,bs="cr")+offset(log(gam.dur)), data=pbcGAM2,family="poisson",scale=1,method="REML") KMSurv<-survfit(Surv(time,status)~1,data=pbc)  We obtained survival probability estimates for the 6 combinations of time discretization for fitting (either a 5 or 10th order Lobatto rule) and prediction (a 5th, 10th or 20th order rule): t<-seq(0,4500,100) s1<-SurvGAM(t,fSurv1,GL5) s2<-SurvGAM(t,fSurv1,GL10) s3<-SurvGAM(t,fSurv1,GL20) s4<-SurvGAM(t,fSurv2,GL5) s5<-SurvGAM(t,fSurv2,GL10) s6<-SurvGAM(t,fSurv2,GL20) In all cases 1000 Monte Carlo samples were obtained for the calculation of survival probability estimates and their pointwise 95% confidence intervals. We can plot these against the Kaplan Meier curve estimates: par(mfrow=c(2,3)) plot(KMSurv,xlab="Time (days)",ylab="Surv Prob",ylim=c(0.25,1),main="Fit(GL5)/Predict(GL5)") lines(s1[1,],s1[2,],col="blue",lwd=2) lines(s1[1,],s1[3,],col="blue",lwd=2,lty=2) lines(s1[1,],s1[4,],col="blue",lwd=2,lty=2) plot(KMSurv,xlab="Time (days)",ylab="Surv Prob",ylim=c(0.25,1),main="Fit(GL5)/Predict(GL10)") lines(s2[1,],s2[2,],col="blue",lwd=2) lines(s2[1,],s2[3,],col="blue",lwd=2,lty=2) lines(s2[1,],s2[4,],col="blue",lwd=2,lty=2) plot(KMSurv,xlab="Time (days)",ylab="Surv Prob",ylim=c(0.25,1),main="Fit(GL5)/Predict(GL20)") lines(s3[1,],s3[2,],col="blue",lwd=2) lines(s3[1,],s3[3,],col="blue",lwd=2,lty=2) lines(s3[1,],s3[4,],col="blue",lwd=2,lty=2) plot(KMSurv,xlab="Time (days)",ylab="Surv Prob",ylim=c(0.25,1),main="Fit(GL10)/Predict(GL5)") lines(s4[1,],s4[2,],col="blue",lwd=2) lines(s4[1,],s4[3,],col="blue",lwd=2,lty=2) lines(s4[1,],s4[4,],col="blue",lwd=2,lty=2) plot(KMSurv,xlab="Time (days)",ylab="Surv Prob",ylim=c(0.25,1),main="Fit(GL10)/Predict(GL10)") lines(s5[1,],s5[2,],col="blue",lwd=2) lines(s5[1,],s5[3,],col="blue",lwd=2,lty=2) lines(s5[1,],s5[4,],col="blue",lwd=2,lty=2) plot(KMSurv,xlab="Time (days)",ylab="Surv Prob",ylim=c(0.25,1),main="Fit(GL10)/Predict(GL20)") lines(s6[1,],s6[2,],col="blue",lwd=2) lines(s6[1,],s6[3,],col="blue",lwd=2,lty=2) lines(s6[1,],s6[4,],col="blue",lwd=2,lty=2)  Survival probability estimates: Kaplan Meier curve (black) v.s. the PGAM estimates for different orders of Gauss Lobatto (GL) quadrature Overall, there is a close agreement between the Kaplan Meier estimate and the PGAM estimates despite the different function spaces that the corresponding estimators “live”: the space of all piecewise constant functions (KM) v.s. that of the smooth functions with bounded, continuous second derivatives (PGAM). Furthermore, the 95% confidence interval of each estimator (dashed lines) contain the expected value of the other estimator. This suggests that there is no systematic difference between the KM and the PGAM survival estimators. This was confirmed in simulated datasets (see Fig 2 in our PLoS One paper). ## Survival Analysis With Generalized Additive Models : Part III (the baseline hazard) May 2, 2015 In the third part of the series on survival analysis with GAMs we will review the use of the baseline hazard estimates provided by this regression model. In contrast to the Cox mode, the log-baseline hazard is estimated along with other quantities (e.g. the log hazard ratios) by the Poisson GAM (PGAM) as: $log(h(t_{i,j})) = \beta_{0}+\lambda(t_{t,j})+\boldsymbol{x\beta} = \lambda_{I}(t_{i,j})+\boldsymbol{x\beta}$ In the aforementioned expression, the baseline hazard is equivalently modeled as a time-varying deviation ($\lambda(t)$) from a constant (the intercept $\beta_{0}$) , or as a time-varying function ($\lambda_{I}(t)$). In the latter case, the constant is absorbed into the smooth term. The choice between these equivalent forms is dictated by the application at hand; in particular, the intercept may be switched on or off by centering the smooth terms appearing in the call to the gam function. Hence, in the PGAM formulation the log-baseline hazard is yet another covariate that one estimates by a smooth function; other covariates may modify this hazard in a proportional fashion by additively shifting the log-baseline hazard ($\boldsymbol{x\beta}$). In the “standard” way of fitting a PGAM by mgcv, the log-baseline hazard is estimated in the constant+deviation form. Exponentiation may be used to derive the baseline hazard and its standard errors. Continuing the analysis of the Primary Biliary Cirrhosis example from the second part of the series, we may write: par(mfrow=c(2,2)) plot(fGAM,main="Gauss Lobatto (5)",ylab="log-basehaz") plot(fGAM2,main="Gauss Lobatto (10)",ylab="log-basehaz") plot(fGAM,main="Gauss Lobatto (5)",ylab="basehaz",trans=exp) plot(fGAM2,main="Gauss Lobatto (10)",ylab="basehaz",trans=exp) Log Baseline (top row) and Baseline (second row) hazard function in the PBC dataset for two different discretizations of the data. In all these cases, the baseline hazard (or its log) are given as time varying deviations from a constant (the value of the log-hazard where the confidence interval vanishes) There is no substantial difference in the estimated obtained by the coarse (Gauss Lobatto (5)) and finer (Gauss Lobatto (10)) discretization. Note that as a result of fitting the log-hazard as constant+ time-varying deviation, the standard error of the curve vanishes at ~1050: the value of the log-hazard at that instant in events per unit time is provided by the intercept term. Estimation of the log-baseline hazard allows the PGAM to function as a parametric, smooth alternative to the Kaplan Meier estimator. This will be examined in the fourth part of this series. ## Survival Analysis With Generalized Models: Part II (time discretization, hazard rate integration and calculation of hazard ratios) May 2, 2015 In the second part of the series we will consider the time discretization that makes the Poisson GAM approach to survival analysis possible. Consider a set of s$M$ individual observations at times $\mathcal{F}=\left\{ F_{i}\right\} _{i=1}^{M}$, with censoring indicators $\mathcal{D}=\left\{ \delta_{i}\right\} _{i=1}^{M}$ assuming the value of 0 if the corresponding observation was censored and 1 otherwise. Under the assumption of non-informative censoring, the likelihood of the sample is given by: $L=\prod_{i=1}^{M}f(F_{i})^{\delta_{i}}S(F_{i})^{1-\delta_{i}}= \prod_{i=1}^{M}h(F_{i})^{\delta_{i}}\exp\left(-\int_{0}^{F_{i}}h(t)\mathrm{\, d}t\right)$ where $h(t)$ is the hazard function. By using an interpolatory quadrature rule, one may substitute the integral with a weighted sum evaluated at a distinct number of nodes. $L=\prod_{i=1}^{M}\prod_{j=1}^{N_{i}}h(t_{i})^{d_{i,j}}\exp\left(-w_{j,j}h(t_{i,j})\right)$ where $t_{i,j}$, $w_{i,j}$ are the nodes, weights of the integration rule and $d_{i,j}$ is an indicator variable equal to 1 if the corresponding node corresponds to an event time and zero otherwise. By including additional “pseudo-observations” at the nodes of the quadrature rule, we converted the survival likelihood to the kernel of a Poisson regression with variable exposures (weights). Conditional on the adoption of an efficient quadrature rule, this is a highly accurate approximation: Bounds of the Gauss Lobatto (GL) approximation error for the integration of survival data (MST=Mean Survival Time). In order for the construct to work one has to ensure that the corresponding lifetimes are mapped to a node of the integration scheme. In our paper, this was accomplished by the adoption of the Gauss-Lobatto rule. The nodes and weights of the Gauss-Lobatto rule (which is defined in the interval $[-1,1]$ depend on the Legendre polynomials in a complex way. The following R function will calculate the weights and nodes for the N-th order Gauss Lobatto rule: GaussLobatto<-function(N) { N1<-N N<-N-1 x=matrix(cos(pi*(0:N)/N),ncol=1) x=cos(pi*(0:N)/N) P<-matrix(0,N1,N1) xold<-2 while (max(abs(x-xold))>2.22044604925031e-16) { xold<-x P[,1]<-1 P[,2]<-x for (k in 2:N) { P[,k+1]=( (2*k-1)*x*P[,k]-(k-1)*P[,k-1] )/k; } x<-xold-( x*P[,N1]-P[,N] )/( N1*P[,N1] ) } w<-2./(N*N1*P[,N1]^2); ret<-list(x=rev(x),w=w) attr(ret,"order")<-N ret } which can be called to return a list of the nodes and their weights: > GaussLobatto(5)$x
[1] -1.0000000 -0.6546537 0.0000000 0.6546537 1.0000000

$w [1] 0.1000000 0.5444444 0.7111111 0.5444444 0.1000000 attr(,"order") [1] 4 To prepare a survival dataset for GAM fitting, one needs to call this function to obtain a Gauss Lobatto rule of the required order. Once this has been obtained, the following R function will expand the (right-censored) dataset to include the pseudo-observations at the nodes of the quadrature rule:  GAMSurvdataset<-function(GL,data,fu,d) ## GL : Gauss Lobatto rule ## data: survival data ## fu: column number containing fu info ## d: column number with event indicator { ## append artificial ID in the set data$id<-1:nrow(data)
Gllx<-data.frame(stop=rep(GL$x,length(data$id)),
gam.dur=rep(GL$w,length(data$id)),
t=rep(data[,fu],each=length(GL$x)), ev=rep(data[,d],each=length(GL$x)),
id=rep(data$id,each=length(GL$x)),
gam.ev=0,start=0)
## Change the final indicator to what
## was observed, map node positions,
## weights from [-1,1] back to the
## study time
Gllx<-transform(Gllx,
gam.ev=as.numeric((gam.ev | ev)*I(stop==1)),
gam.dur=0.5*gam.dur*(t-start),
stop=0.5*(stop*(t-start)+(t+start)))
## now merge the remaining covariate info
Gllx<-merge(Gllx,data[,-c(fu,d)])
Gllx
}



We illustrate the use of these functions on the Primary Biliary Cirrhosis dataset that comes with R:

data(pbc)
> ## Change transplant to alive
> pbc$status[pbc$status==1]<-0
> ## Change event code of death(=2) to 1
> pbc$status[pbc$status==2]<-1
>
id time status trt age sex ascites hepato spiders edema bili chol albumin copper
1 1 400 1 1 58.76523 f 1 1 1 1.0 14.5 261 2.60 156
2 2 4500 0 1 56.44627 f 0 1 1 0.0 1.1 302 4.14 54
3 3 1012 1 1 70.07255 m 0 0 0 0.5 1.4 176 3.48 210
4 4 1925 1 1 54.74059 f 0 1 1 0.5 1.8 244 2.54 64
5 5 1504 0 2 38.10541 f 0 1 1 0.0 3.4 279 3.53 143
6 6 2503 1 2 66.25873 f 0 1 0 0.0 0.8 248 3.98 50
alk.phos ast trig platelet protime stage
1 1718.0 137.95 172 190 12.2 4
2 7394.8 113.52 88 221 10.6 3
3 516.0 96.10 55 151 12.0 4
4 6121.8 60.63 92 183 10.3 4
5 671.0 113.15 72 136 10.9 3
6 944.0 93.00 63 NA 11.0 3
>
> GL<-GaussLobatto(5)
> pbcGAM<-GAMSurvdataset(GL,pbc,2,3)
id stop gam.dur t ev gam.ev start trt age sex ascites hepato spiders
1 1 0.00000 20.0000 400 1 0 0 1 58.76523 f 1 1 1
2 1 69.06927 108.8889 400 1 0 0 1 58.76523 f 1 1 1
3 1 200.00000 142.2222 400 1 0 0 1 58.76523 f 1 1 1
4 1 330.93073 108.8889 400 1 0 0 1 58.76523 f 1 1 1
5 1 400.00000 20.0000 400 1 1 0 1 58.76523 f 1 1 1
6 2 0.00000 225.0000 4500 0 0 0 1 56.44627 f 0 1 1
edema bili chol albumin copper alk.phos ast trig platelet protime stage
1 1 14.5 261 2.60 156 1718.0 137.95 172 190 12.2 4
2 1 14.5 261 2.60 156 1718.0 137.95 172 190 12.2 4
3 1 14.5 261 2.60 156 1718.0 137.95 172 190 12.2 4
4 1 14.5 261 2.60 156 1718.0 137.95 172 190 12.2 4
5 1 14.5 261 2.60 156 1718.0 137.95 172 190 12.2 4
6 0 1.1 302 4.14 54 7394.8 113.52 88 221 10.6 3
>
> dim(pbc)
[1] 418 20
> dim(pbcGAM)
[1] 2090 24

The original (pbc) dataset has been expanded to include the pseudo-observations at the nodes of the Lobatto rule. There are multiple records (5 per individual in this particular case) as can be seen by examining the data for the first patient (id=1). The corresponding times are found in the variable stop, their associated weights in the variable gam.dur and the event indicators are in the column gam.ev. Note that nodes and weights are expressed on the scale of the survival dataset, not in the scale of the Lobatto rule ($[-1,1]$). To fit the survival dataset one needs to load the mgcv package and fit a Poisson GAM, using a flexible (penalized spline) for the log-hazard rate function.

The following code will obtain an adjusted (for age and sex) hazard ratio using the PGAM or the Cox model:


library(survival) ## for coxph
> library(mgcv) ## for mgcv
>
> ## Prop Hazards Modeling with PGAM
> fGAM<-gam(gam.ev~s(stop,bs="cr")+trt+age+sex+offset(log(gam.dur)),
+ data=pbcGAM,family="poisson",scale=1,method="REML")
>
> ## Your Cox Model here
> f<-coxph(Surv(time,status)~trt+age+sex,data=pbc)
>
>
> summary(fGAM)

Family: poisson

Formula:
gam.ev ~ s(stop, bs = "cr") + trt + age + sex + offset(log(gam.dur))

Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.345236 0.655176 -15.790 < 2e-16 ***
trt 0.069546 0.181779 0.383 0.702
age 0.038488 0.008968 4.292 1.77e-05 ***
sexf -0.370260 0.237726 -1.558 0.119
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(stop) 1.008 1.015 4.186 0.0417 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) = -0.249 Deviance explained = 2.25%
-REML = 693.66 Scale est. = 1 n = 1560
> f
Call:
coxph(formula = Surv(time, status) ~ trt + age + sex, data = pbc)

coef exp(coef) se(coef) z p
trt 0.0626 1.065 0.182 0.344 7.3e-01
age 0.0388 1.040 0.009 4.316 1.6e-05
sexf -0.3377 0.713 0.239 -1.414 1.6e-01

Likelihood ratio test=22.5 on 3 df, p=5.05e-05 n= 312, number of events= 125
(106 observations deleted due to missingness)



The estimates for log-hazard ratio of the three covariates (trt, age, and female gender) are numerically very close. Any numerical differences reflect the different assumptions made about the baseline hazard: flexible spline (PGAM) v.s. piecewise exponential (Cox).

Increasing the number of nodes of the Lobatto rule does not materially affect the estimates of the PGAM:


GL<-GaussLobatto(10)
> pbcGAM2<-GAMSurvdataset(GL,pbc,2,3)
> fGAM2<-gam(gam.ev~s(stop,bs="cr")+trt+age+sex+offset(log(gam.dur)),
+ data=pbcGAM2,family="poisson",scale=1,method="REML")
>
> summary(fGAM2)

Family: poisson

Formula:
gam.ev ~ s(stop, bs = "cr") + trt + age + sex + offset(log(gam.dur))

Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.345288 0.655177 -15.790 < 2e-16 ***
trt 0.069553 0.181780 0.383 0.702
age 0.038487 0.008968 4.292 1.77e-05 ***
sexf -0.370340 0.237723 -1.558 0.119
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(stop) 1.003 1.005 4.163 0.0416 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) = -0.124 Deviance explained = 1.7%
-REML = 881.67 Scale est. = 1 n = 3120



Nevertheless, the estimates of the “baseline log-hazard” become more accurate (decreased standard errors and significance of the smooth term) as the number of nodes increases.

In simulations (see Fig 3) we show that the estimates of the hazard ratio generated by the GAM are comparable in bias, variance and coverage to those obtained by the Cox model. Even though this is an importance benchmark for the proposed method, it does not provide a compelling reason for replacing  the Cox model with the PGAM. In fact, the advantages of the PGAM will only become apparent once we consider contexts which depend on the baseline hazard function, or problems in which the proportionality of hazards assumption is violated. So stay tuned.

## Survival Analysis With Generalized Additive Models : Part I (background and rationale)

May 1, 2015

After a really long break, I’d will resume my blogging activity. It is actually a full circle for me, since one of the first posts that kick started this blog, matured enough to be published in a peer-reviewed journal last week. In the next few posts I will use the R code included to demonstrate the survival fitting capabilities of Generalized Additive Models (GAMs) in real world datasets. The first post in this series will summarize the background, rationale and expected benefits to be realized by adopting GAMs from survival analysis.

In a nutshell, the basic ideas of the GAM approach to survival analysis are the following:

1. One approximates the integral defining the survival function as a discrete sum using a quadrature rule
2. One evaluates the likelihood at the nodes of the aforementioned quadrature rule
3. A regression model is postulated for the log-hazard rate function
4. As a result of 1-3 the survival regression problem is transformed into a Poisson regression one
5. If penalized regression is used to fit the regression model, then GAM fitting software may be used for survival analysis

Ideas along the lines 1-4 have been re-surfacing in the literature ever since the Proportional Hazards Model was described. The mathematical derivations justifying Steps 1-4 are straightforward to follow and are detailed in the PLoS paper. The corresponding derivations for the Cox model are also described in a previous post.

Developments such as 1-4 were important in the first 10 years of the Cox model, since there were no off-the-shelf implementations of the partial (profile) likelihood approach. This limited the practical scope of proportional hazards modeling and set off a parallel computational line of research in how one could use other statistical software libraries to fit the Cox model.  In fact, the first known to the author application of a proportional model for the analysis of a National Institute of Health (NIH) randomized controlled trial used a software implementing a Poisson regression to calculate the hazard ratio. The trial was the NCDS trial that examined adequacy indices for the hemodialysis prescription (the description of the software was published 6 months prior to the clinical paper).  Many of these efforts were computationally demanding and died off as the Cox model was implemented in the various statistical packages after the late 80s and semi-parametric theory took off and provide a post-hoc justification for many of the nuances implicit in the Cox model.  Nevertheless, one can argue that in the era of the modern computer, no one really needs the Cox model. This technical report and the author’s work on a real world, complex dataset provides the personal background for my research on GAM approaches for survival data.

The GAM (or Poisson GAM, PGAM as called in the paper) is an extension of these old ideas (see the literature survey here and here). In particular, PGAM models the quantities that are modeled semi-parametrically (e.g. the baseline hazard) in the Cox model with parametric, flexible functions that are estimated by penalized regressio. One of the first applications of penalized regression for survival analysis is the Fine and Gray spline model, which is however not a PGAM model.  There are specific benefits to be realized from penalizing the Poisson regression and adopting GAMs  in the context of survival analysis:

• Parsimony: degrees of freedom are not wasted as penalization will seek the most parsimonious representation (fewer degrees of freedom) among the many possible functions that may fit the data
• Reduction of the analyst-degrees-of freedom: the shape of the functional relationships between survival and outcome are learned from the data. This limits the potential of someone putting a specific form for this relationship (e.g. linear v.s. quadratic) and running away with the most convenient p-value
• Multiple time scale modelling: one can model more than one time scales in a dataset (i.e. not just the study time). This is useful when adjusting for secular trends in an observational dataset or even in a randomized trial. In particular cluster randomized trials at the community level may be immune to secular trends
• Non-proportional hazards modeling: when the hazards are not proportional, the Cox model is not applicable. Many oncology datasets will demonstrate a deviation from proportionality (in fact we re-analyzed such a trial in the PLoS paper) . For a particular dataset, one would like to know whether the proportionality assumption is violated, and if so one would like to “adjust” for it. Such an adjustment will take the form of a time-varying hazard ratio function and these may be estimated with the PGAMs. In such a case, one can even extract an “average” hazard ratio while still estimating a time-varying deviation around it using the PGAM. However non-proportionality should shift the analyst to:
• Alternative measures of treatment effect: These may include relative risks, absolute risk differences of even differences in the (Restricted) Mean Survival Time. Such measures are calculated from the time varying hazard ratios using statistical simulation techniques
• Handling of correlated outcomes: Correlated outcomes may arise from center effects, multiple events in the same individual or even cluster randomization designs. The analysis of such outcomes is facilitated by the interpretation of the PGAM as a generalized linear mixed model and the introduction of the corresponding random effects and their correlation structure into the regression
• Center effects: A variety of modeling options are available including stratified hazards, fixed or random effects
• Subgroup analyses
• Time varying external covariates
• Unconditional/Conditional/Population averaged effects: The unconditional estimate is obtained by indexing the individuals with the group they are assigned to (e.g. placebo or drug in an interventional trial). The conditional estimate is obtained by introducing covariates (e.g. gender, age) into the regression to calculate effects for individuals that share these characteristics. The population effect averages over the conditional effects over all the individuals in the trial. In epidemiology it is known as the corrected group prognosis method. This was introduced in a JAMA paper almost 15 years ago as a way to generate adjusted survival probability curves
• Handling of right censored/left truncated/uncensored data

These benefits follow directly from the  mixed model equivalence between semi-parametric, penalized regression and Generalized Mixed Linear Models. An excellent, survey may be found here, while Simon Wood’s book in the GAM implementation of the mgcv package in R contains a concise presentation of these ideas.

As it stands the method presented has no software  implementation similar to the survival package in R. Even though we provide R code to run the examples in the paper, the need for the various functions may not be entirely clear. Hence the next series of posts will go over the code and the steps required to fit the PGAM using the R programming language.

## The Hazards Of Hazard Ratios

May 18, 2014

Very nice overview of the hazard of using hazard ratios in observational studies and randomized trials.
Must read by all clinical epidemiologists.

http://thestatsgeek.com/2014/03/28/interpreting-changes-in-hazard-and-hazard-ratios/

Hernan’s paper Hazards Of Hazard Ratios makes for an interesting companion read

## Your astrological sign can maim you, kill you or make you insane (or not?)

April 24, 2014

Can day of birth (as in here, here too and there) or  astrological sign affect later outcomes?

These are essentially the same question – astrological sign is determined by  the day of one’s birth but our reaction to the question is different. Asked in the first way, this is a legit question about an Epidemiological signal known since antiquity and is officially known as “season of birth effect” (SBE). (Historical trivia: the first use of this term may be found in Ellsworth’s Huntington book: Season of Birth: Its Relation to Human Abilities published in 1938). The SBE appears to be relevant for many diseases, outcomes and human populations and countries. Since we westerners are more likely to bite the bullet from cardiovascular disease or cancer, rest assured that the association has been noted for the former and the latter (though not everyone agrees)

Asked in the second way, the same question can immediately bring ridicule and shame on the scientifically oriented person who dares to ask it. In fact there is a small cottage industry of papers that use astrological sign in order to caution/warn/ridicule/discredit subgroup analyses in clinical datasets. The most famous of this is a subgroup analysis of the ISIS-2 but there are certainly others.

So what makes us think that the Season of Birth Effect is real and worthy of study across disciplines, while the astrological sign association is bogus and should be discarded any time it is detected? The answer may be found in the framing effect, a form of cognitive bias in which our reaction and even answer to the same question critically depends on the way it is framed. Though the framing effect is particularly relevant in sociology, it is also operational in medicine particularly affecting the choices patients make depending on the format of presentation of the same information.

So if framing effects are also operational in scientific discourse (at least in the medical field) how do we deal with them? The first way is to acknowledge their potential presence so as to minimize their impact by appropriate use of neutral terminology and avoid sensationalism when describing associations and effects. As the season-of-birth effect example shows one can present the same information to evoke sharply different responses, so one should always be on the look for emotions when reading a story in either the scientific or the lay press.

So to come back to the initial question: does one’s day of birth or astrological sign affect later outcomes? The associations seem to be there and are relatively consistent and reproducible so some phenomenon must be at play. Whether this phenomenon is real or a statistical analysis artifact (e.g. could age-period-cohort effects and the troubled history of the 20th century be at play here) is something I don’t know. But I would certainly be willing to find more about it, irrespective of whether the information is presented by month of birth or astrological sign 🙂

## Where has evidence based medicine taken us in 20 years?

January 10, 2014

This one of the best appraisals of evidence based medicine(EBM):

http://www.kevinmd.com/blog/2013/06/evidence-based-medicine-20-years.html

It highlights one important pitfall of EBM, ie its failure to integrate scientific(biological) plausibility in the pyramid of evidence.

I think the best way to effect a synthesis between evidence from clinical data and biological plausibility is through a Bayesian argument:

– Decide on a standard EBM approach to synthesize the clinical data regarding a hypothesis from multiple sources eg a meta-analysis
– Encode the non-clinical, background knowledge into a prior. This prior encodes the scientific plausibility of the hypothesis under question

A Bayesian analysis of the clinical data under a scientific plausibility prior provides a transparent way to leverage the pyramid evidence of EBM while providing a basic science/disease understanding context for the interpretation of clinical data.

## Of means, standard deviations and blood pressure trials

December 26, 2013

Last week, the long awaited Joint National Commission (JNC) released their 8th revision of their hypertension management guidelines (more than 10 years after the 7th version!). JNC 8 will join the plethora of guidelines (ASH/ISH, ESH, KDIGO and the ACC/AC advisory that will be upgraded to full guideline set later on) that may be used (or not) to justify (or defend) treating (or not treating) patients with elevated blood pressure (BP) down to pre-approved, committee blessed targets. Though it is not my intention to go into the guidelines (since opinions are like GI tracts: everyone has one, whereas biases are like kidneys: everyone has at least one and usually two), it is interesting to seize this opportunity and revisit one of the largest blood pressure trials that is used to defend certain aspects of the guidelines recommendations.

The Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT) is one of the largest hypertension trials and was all the rage 12 years ago when failing to meet its primary endpoint i.e. the superiority of various first line antihypertensive agents, it reported differences in various secondary outcomes (including blood pressure control) favoring the oldest (chlorothalidone) v.s. the newest (lisinopril/amlodipine) medication. Since 2002 all agents used in ALLHAT have gone generic (and thus have the same direct costs), diluting the economic rationale for choosing one therapy v.s. the other. Nevertheless the literature reverbated for years (2003 , 2007 v.s. 2009) with intense debates about what ALLHAT showed (or didn’t show). In particular the question of the blood pressure differences observed among the treatment arms was invoked as one of the explanations for the diverging secondary outcomes in ALLHAT. In its simplest form this argument says that the differences of 0.8 (amlodipine)/2 (lisinopril) mmHg of systolic BP over chlorothalidone may account for some of the differences in outcomes.

I was exposed to this argument when I was an internal medicine resident (more than a decade ago) and it did not really make a lot of sense to me: could I be raising the risk for cardiovascular disease, stroke, and heart failure by 10-19% of the patient I was about to see in my next clinic appointment by not going the extra mile to bring his blood pressure by 2 mmHg? Was my medical license, like the 00 in 007’s designation,  a license to kill (rather than heal)? Even after others explained that the figures reported in the paper referred to the difference in the average blood pressure between the treatment arms, not the difference of blood pressure within each patient I could not get to my self to understand the argument. So more than a decade after the publication of ALLHAT I revisited the paper and tried to infer what the actual blood pressures may have had been (no access to individual blood pressure data) during the five years of the study. The relevant information may be found in Figure 3 (with a hidden bonus in Figure 1) of the JAMA paper which are reproduced in a slightly different form below (for copyright reasons):

The figure shows the Systolic (SBP) and Diastolic (DBP) pressures of the participants who continued to be followed up during the ALLHAT study; unsurprisingly for a randomized study there were almost no differences before the study (T=0). Differences did emerge during follow-up and in fact the mean/average (“dot”) in the graph was lower for chlorothalidone v.s. both amlodipine and lisinopril during the study. However, the standard deviation (the dispersion of the individuals receiving the study drugs around the mean: shown as the height of vertical line) was also larger (by about 17%) for lisinopril suggesting that there both more people with higher and lower blood pressures compared to chlorothalidone.

This pattern is also evident if one bothers to take a look at the flowchart of the study (Figure 1) which lists patterns/reasons for discontinuation during the 5th year. Among the many reasons, “Blood Pressure Elevation” and “Blood Pressure Too Low” are listed:

 Chlorothalidone Amlodipine Lisinopril N (total) 8083 4821 5004 BP High 84 38 131 BP Too Low 91 51 76 Other reasons for discontinuation 1698 963 1192

A garden variety logistic regression for the odds of discontinuation shows there is no difference between amlodipine and chlorothalidone due to high (p=0.16) or low (p=0.74) BP. On the other hand, the comparison betweeen lisinopril and chlorothalidone is more interesting:

• Odds of discontinuation due to low BP: 1.44 95%CI: 1.00-2.06 (p=0.044)
• Odds of discontinuation due to high BP: 3.38 95%CI: 2.35-4.87 (p<0.001)

So despite the higher mean, the higher standard deviation implies that there more patients with low (and too low) BP among the recipients of lisinopril. In other words, blood pressure control was more variable with lisinopril compared to the other two drugs: this variability can be quantified by using the reported means/standard deviations to look at the cumulative percentage of patients with BP below a given cutoff (for simplicity we base the calculations on the year 3 data):

So it appears that lisinopril (at least as used in ALLHAT) was able to control more patients to < 120/70 (which are low levels based on the current guidelines), but fewer patients at the higher end of the BP spectrum. The clinical translation of this observation is that there will be patients who will be ideal candidates for lisinopril (maybe to the point where dose reduction is necessary) and others who will fail to respond, so that individualization of therapy, rather than one size fits all is warranted. Such individualization may be achieved either on the basis of short shared physician/patient decision making, n of 1 trials, biomarker levels (e.g. home blood pressure measurements) or demographic profiling (as is done in JNC8 for African American patients).

Notwithstanding these comments, one is left scratching one’s head with the following questions:

• who were the patients with an exaggerated and dampened out response to lisinopril in ALLHAT
• could the variability in BP control provide a much better explanation for the variability in secondary outcomes in ALLHAT? (the investigators did apply what is known as a time-updated analysis using the observed BPs during the trial, but this is not the statistically proper way to analyze this effect in the presence of loss-to-follow up and possibly informative censoring)
• what are the clinical implications of lowering BP to a given level when this is done with different classes of agents? This question is related to the previous one and both are not unaswerable with time-updated models of endogenous (such as BP readings) variables

At a more abstract level, should be scrutinize paper tables for the means as well as the standard deviations of response variables looking for hidden patterns that may not evident at a first look? In the clinic one is impressed with the variability of the patient responses to interventions, yet this variability is passed over when analyzing, reporting and discussing trial results in which we only look at the means. To me this is seems a rather deep inconsistency between our behaviours and discourse with our Clinician v.s. our Evidence Basis Medicine hats on, which may even decrease the chances of finding efficacious and effective treatments. Last but certainly not least, how can be begin to acknowledge variability in trial design, execution, analysis and reporting so as to better reflect what actually happens in the physical world, rather than the ivory towers of our statistical simulators?

## Even with non-informative priors, the Bayesian approach is preferable to frequentist linear regression

December 2, 2013

A unique aspect of the Bayesian approach is that it allows one to integrate previous knowledge (“prior beliefs“) with current data (“likelihood“). Yet even in those cases in which non-informative priors are used (as in here and here) , the Bayesian approach is preferable due to its conservative nature. Read the rest of this entry »

## The little non-informative prior that could (be informative)

November 26, 2013

Christian Robert reviewed on line a paper that was critical of non-informative priors. Among the points that were discussed by him and other contributors (e.g. Keith O’Rourke), was the issue of induced priors, i.e. priors which arise from a transformation of original parameters, or of observables. I found this exchange interesting because I did something similar when revisiting an old research project that had been collecting digital dust in my hard disk. The specific problem had to do with analysis of a biomarker that was measured with a qualitative technique yielding a binary classification of measurements as present or absent, in two experimental conditions (call them A and B). Ignoring some technical aspects of the study design, the goal was to calculate the odds ratio of the biomarker being expressed in condition B v.s A (the reference state signifying absence of disease).

When writing the programs for the analysis, I defaulted to the N(0.0,1.0E-6) prior that epitomizes non-informativeness in BUGS. However, one of my co-authors asked the “What the @#\$%& does this prior mean?” question. And then we stopped … and reflected on what we were about to do. You see, before the experiment started we had absolutely no prior information about the behaviour of the biomarker in either experimental state so that we did not want to commit one way or another. In other words, Laplace’s original uniform (or Beta(1,1)) prior would have been reasonable if the expression data for  A and B were to be analyzed separately. However, we wanted to analyze the data with a logistic regression model, so was the ubiquitous N(0.0,1.0E-6) the prior we were after?

The answer is a loud NO! According to Wikipedia, the mother of all knowledge, the logistic transformation of a uniform variate is the logistic distribution with location of zero and scale of 1. Hence, the prior on the intercept of the logistic regression (interpretable as the odds of the biomarker being expressed in state A) had to be a Logistic(0,1).

Surprisingly the Odds Ratio of B v.s. A was found (after trial and error and method of moments considerations) to be very well approximated by a 1:1 mixture of a logistic and a Gaussian which clearly departs from the N(0.0,1.0-6) prior we (almost) used:

Bottom line: Even informative (in the BUGS sense!) priors can be pretty non-informative in some intuitively appropriate parameterization. Conversely, one could start with a non-informative prior in a parameterization that is easier to reason about and look for an induced prior (using analytic considerations or even simulations) to convert it to a parameterization that is more appropriate to the analytic plan at hand.

(R code for the plots and simulations is given below)

## approximating uniforms
logit<-function(x) log(x/(1-x))
set.seed(1234)
N<-10000000
s<-runif(N,0,1);
s2<-runif(N,0,1);
y<-logit(s)
y2<-logit(s2)
m<-mean(y)
s<-sd(y)
x<-seq(-10,10,.1)
## logistic is logit of a uniform
hist(y,prob=TRUE,breaks=50,main="intercept",
xlab="logit(A)")
lines(x,dnorm(x,m,s),col="red")
lines(x,dlogis(x,0,1),col="blue")
legend(-15,0.20,legend=c("Normal(0,1)",
"Logistic(0,1)"),lty=1,col=c("blue","red") )

## approximating the difference of two uniforms
hist(y-y2,prob=TRUE,ylim=c(0,.25),breaks=200,
xlim=c(-10,10),main="OR between two U(0,1)",
xlab="logit(B)-logit(A)")
## logistic approximation
lines(x,dlogis(x,0,sqrt(2)),col="blue",lwd=2)
## normal
lines(x,dnorm(x,0,(pi)*sqrt(2/3)),col="red",lwd=2)
## mixture of a logistic and a normal approximation
lines(x,0.5*(dlogis(x,0,sqrt(2))+
dnorm(x,0,(pi)*sqrt(2/3))),col="green",lwd=2)
## legends
NL<-expression(paste("Normal(0,",pi*sqrt(2/3),")"))
LL<-expression(paste("Logistic(0,",sqrt(2),")"))
ML<-expression(paste("0.5 Normal(0,",pi*sqrt(2/3),")+0.5 Logistic(0,",sqrt(2),")"))
legend(-6.5,0.25,legend=c(NL,LL,ML),
lty=1,col=c("blue","red","green") )

## does it extend to more general cases?
m1<--2;m2<-2;s1<-1;s2<-2.5;
l1<-rlogis(N,m1,s1)
l2<-rlogis(N,m2,s2)
d<-l1-l2
hist(d,prob=TRUE,ylim=c(0,0.25),breaks=200)
plot(density(d))
lines(x,dlogis(x,m1-m2,sqrt(s1^2+s2^2)),col="green",lwd=2)
lines(x,dnorm(x,m1-m2,pi*sqrt((s1^2+s2^2)/3)),col="red",lwd=2)
lines(x,0.5*(dnorm(x,m1-m2,pi*sqrt((s1^2+s2^2)/3))+
dlogis(x,m1-m2,sqrt(s1^2+s2^2))),col="blue",lwd=2)


Edit (29/11/2013):
Updated the first image due to an accidental reversal of the distribution labels