## Survival Analysis With Generalized Additive Models: Part V (stratified baseline hazards)

In the fifth part of this series we examine how Poisson GAMs (PGAMs) can stratify the baseline hazard in survival analysis. In a stratified Cox model, the baseline hazard is not shared by all individuals in the study. Rather, it is assumed to differ between groups, while remaining the same for all members of a given group.

Stratification is one way to address a violation of the proportional hazards assumption by a categorical covariate in the Cox model. The price is that no hazard ratio is estimated for the stratifying variable itself. The stratified Cox model expresses the hazard of an individual in stratum $g$ as:

$h_{g}(t,\boldsymbol{x}) = h_{0_{g}}(t)\exp(\boldsymbol{x\beta}),\quad g=1,2,\dotsc,K$

where $K$ is the number of strata. On the logarithmic scale this multiplicative model becomes an additive one, $\log h_{g}(t,\boldsymbol{x}) = \log h_{0_{g}}(t) + \boldsymbol{x\beta}$. In particular, specifying a different baseline hazard for each level of a factor amounts to specifying an interaction between the factor and the smooth baseline hazard in the PGAM.
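As a sketch of how this interaction looks in the PGAM for $K$ strata, assume the usual Poisson expansion in which the follow-up of individual $i$ is split into intervals $j$ that end at time $t_{ij}$, have length $\Delta_{ij}$ and carry an event indicator $d_{ij}$. The notation is ours, and the smooth functions $f_{g}$ stand in for $\log h_{0_{g}}$:

$$
d_{ij} \sim \mathrm{Poisson}(\mu_{ij}), \qquad
\log \mu_{ij} = \log \Delta_{ij} + \sum_{g=1}^{K} \mathbb{1}(g_{i} = g)\, f_{g}(t_{ij}) + \boldsymbol{x}_{i}\boldsymbol{\beta}
$$

Each indicator $\mathbb{1}(g_{i} = g)$ switches on the smooth baseline of the stratum that individual $i$ belongs to, so the sum is exactly the factor-by-smooth interaction described above. Because the indicators add up to one in every row, a separate intercept would be redundant.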

We turn to the PBC dataset to illustrate a stratified analysis with either the Cox model or the PGAM. In that dataset the covariate edema is a categorical variable taking the values 0 (no edema), 0.5 (untreated or successfully treated edema) and 1 (edema despite treatment). The test based on Schoenfeld residuals suggests that this covariate violates the proportionality assumption, at least for the 0.5 level (p = 0.02):

```
> f <- coxph(Surv(time, status) ~ trt + age + sex + factor(edema), data = pbc)
> Schoen <- cox.zph(f)
> Schoen
                       rho    chisq      p
trt              -0.089207 1.12e+00 0.2892
age              -0.000198 4.72e-06 0.9983
sexf             -0.075377 7.24e-01 0.3950
factor(edema)0.5 -0.202522 5.39e+00 0.0203
factor(edema)1   -0.132244 1.93e+00 0.1651
GLOBAL                  NA 8.31e+00 0.1400
```
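The same check can be scripted so that offending coefficients are flagged automatically. This is a minimal sketch, assuming death (status == 2) is the event of interest; the names zph, pvals and flagged are ours. In recent versions of survival, cox.zph reports one test per model term by default, so terms = FALSE is used to get one test per coefficient as in the output above (older releases lack this argument and report per-coefficient tests anyway).

```r
library(survival)

# Assumption: death (status == 2) is the event; transplant and censoring
# are both treated as censored observations
f <- coxph(Surv(time, status == 2) ~ trt + age + sex + factor(edema), data = pbc)

# terms = FALSE: one test per coefficient rather than one per model term
zph <- cox.zph(f, terms = FALSE)

# The p-value column is called "p" in both older and newer survival versions
pvals <- zph$table[, "p"]
flagged <- names(pvals)[pvals < 0.05 & names(pvals) != "GLOBAL"]
print(flagged)

# plot(zph[4]) would draw the smoothed scaled Schoenfeld residuals of the
# fourth coefficient against time; a clear trend suggests non-proportionality
```

The exact numbers may differ from the transcript above, since newer versions of cox.zph compute a score test rather than the older correlation-based statistic.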

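To fit a stratified PGAM, we first add to the Poisson-expanded dataset pbcGAM one 0/1 indicator variable for each level of the edema covariate; each indicator is then used as the numeric by variable of its own smooth of time. With a numeric by variable, mgcv does not center the smooth, so each smooth carries the full level of its stratum's log baseline hazard. Since the three indicators sum to one, the PGAM has to be fit without an intercept term, which also makes it directly comparable to the stratified Cox model. Without an intercept, however, R would code the first factor in the formula with one column per level, and those columns would again be aliased with the stratum-specific smooths. Any other categorical covariates we would like to adjust for must therefore be entered as explicit dummy variables. In this particular case, the only such covariate is sex, coded as an indicator for female: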

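```r
# One 0/1 indicator per edema stratum, plus an explicit dummy for female sex
pbcGAM <- transform(pbcGAM,
                    edema0  = as.numeric(edema == 0),
                    edema05 = as.numeric(edema == 0.5),
                    edema1  = as.numeric(edema == 1),
                    sexf    = as.numeric(sex == "f"))
```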
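Why the intercept has to go can be checked on a toy example. The data frame toy below is hypothetical and only mimics the three edema levels; the sketch shows that the indicators sum to a column of ones, so adding an intercept column does not increase the rank of the design.

```r
# Hypothetical toy data standing in for the three edema levels
toy <- data.frame(edema = c(0, 0.5, 1, 0, 1))
toy <- transform(toy,
                 edema0  = as.numeric(edema == 0),
                 edema05 = as.numeric(edema == 0.5),
                 edema1  = as.numeric(edema == 1))
dummies <- as.matrix(toy[, c("edema0", "edema05", "edema1")])

# Exactly one indicator is switched on in every row
rowSums(dummies)

# An intercept column is a linear combination of the indicators
qr(cbind(1, dummies))$rank   # 3, not 4

# model.matrix() without an intercept produces the same full dummy coding
mm <- model.matrix(~ factor(edema) - 1, data = toy)
all(mm == dummies)
```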

The stratified PGAM and Cox models are then fit as follows:


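```r
library(mgcv); library(survival)
# PGAM: one uncentered smooth of time per edema stratum, no intercept
fGAM <- gam(gam.ev ~ s(stop, bs = "cr", by = edema0) + s(stop, bs = "cr", by = edema05) +
              s(stop, bs = "cr", by = edema1) + trt + age + sexf + offset(log(gam.dur)) - 1,
            data = pbcGAM, family = "poisson", scale = 1, method = "REML")
# Stratified Cox model: a separate baseline hazard for each edema level
fs <- coxph(Surv(time, status) ~ trt + age + sex + strata(edema), data = pbc)
```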
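Because pbcGAM is not rebuilt in this post, here is a self-contained sketch of the same specification on simulated data. Everything in it is an assumption for illustration: the data set dat, the three hypothetical strata a, b and c with constant baseline hazards, the covariate x with a true log hazard ratio of 0.5, and the simple hand-rolled Poisson expansion at fixed cut points (not necessarily how pbcGAM was constructed). If the approach works, the stratified PGAM and the stratified Cox model should give similar estimates for x.

```r
library(mgcv)
library(survival)

set.seed(42)
n <- 300
# Hypothetical data: three strata with different constant baseline hazards
dat <- data.frame(grp = sample(c("a", "b", "c"), n, replace = TRUE), x = rnorm(n))
base_rate <- c(a = 0.5, b = 1, c = 2)[dat$grp]
t_event <- rexp(n, rate = base_rate * exp(0.5 * dat$x))
t_cens <- runif(n, 0, 3)
dat$time <- pmin(t_event, t_cens)
dat$status <- as.numeric(t_event <= t_cens)

# Poisson expansion: split each follow-up at fixed cut points; every row is
# one interval with its end time (stop), length (dur) and event indicator (ev)
cuts <- seq(0, 3, by = 0.25)
long <- do.call(rbind, lapply(seq_len(n), function(i) {
  brk <- c(cuts[cuts < dat$time[i]], dat$time[i])
  k <- length(brk) - 1
  data.frame(stop = brk[-1], dur = diff(brk),
             ev = c(rep(0, k - 1), dat$status[i]),
             grp = dat$grp[i], x = dat$x[i])
}))
long <- transform(long, ga = as.numeric(grp == "a"),
                  gb = as.numeric(grp == "b"), gc = as.numeric(grp == "c"))

# Stratified PGAM: one smooth log baseline hazard per stratum, no intercept
pgam <- gam(ev ~ s(stop, bs = "cr", by = ga) + s(stop, bs = "cr", by = gb) +
              s(stop, bs = "cr", by = gc) + x + offset(log(dur)) - 1,
            data = long, family = poisson, method = "REML")

# Stratified Cox model on the original, unsplit data
cox <- coxph(Surv(time, status) ~ x + strata(grp), data = dat)

c(PGAM = coef(pgam)[["x"]], Cox = coef(cox)[["x"]])
```

The width of the cut-point grid is a design choice: finer intervals make the Poisson likelihood track the continuous-time likelihood more closely, at the cost of a larger expanded dataset.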



The coefficient estimates of the stratified Cox model and the PGAM are generally similar, with the exception of trt (0.034 versus 0.002). However, the standard error of this coefficient is so large under either model (about 0.19) that neither estimate differs significantly from zero, despite their difference in magnitude:

```
> fs
Call:
coxph(formula = Surv(time, status) ~ trt + age + sex + strata(edema),
    data = pbc)

        coef exp(coef) se(coef)     z       p
trt   0.0336     1.034  0.18724  0.18 0.86000
age   0.0322     1.033  0.00923  3.49 0.00048
sexf -0.3067     0.736  0.24314 -1.26 0.21000

Likelihood ratio test=15.8  on 3 df, p=0.00126  n= 312, number of events= 125
   (106 observations deleted due to missingness)
> summary(fGAM)

Family: poisson

Formula:
gam.ev ~ s(stop, bs = "cr", by = edema0) + s(stop, bs = "cr",
    by = edema05) + s(stop, bs = "cr", by = edema1) + trt + age +
    sexf + offset(log(gam.dur)) - 1

Parametric coefficients:
      Estimate Std. Error z value Pr(>|z|)
trt   0.002396   0.187104   0.013 0.989782
age   0.033280   0.009170   3.629 0.000284 ***
sexf -0.297481   0.240578  -1.237 0.216262
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                  edf Ref.df Chi.sq p-value
s(stop):edema0  2.001  2.003  242.0  <2e-16 ***
s(stop):edema05 2.001  2.001  166.3  <2e-16 ***
s(stop):edema1  2.000  2.001  124.4  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  -0.146   Deviance explained = -78.4%
REML score = 843.96  Scale est. = 1         n = 3120
```
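To put the discrepancy in perspective, the gap between the two sets of estimates can be expressed in units of the Cox standard error. This sketch only reuses the numbers printed in the two summaries above; the data frame est is ours.

```r
# Estimates and Cox standard errors copied from the two summaries above
est <- data.frame(
  term   = c("trt", "age", "sexf"),
  cox    = c(0.0336, 0.0322, -0.3067),
  cox_se = c(0.18724, 0.00923, 0.24314),
  pgam   = c(0.002396, 0.033280, -0.297481)
)
# Difference between the two estimates in units of the Cox standard error
est$diff_in_se <- round((est$cox - est$pgam) / est$cox_se, 2)
est   # diff_in_se: trt 0.17, age -0.12, sexf -0.04
```

Even for trt, the two models disagree by less than a fifth of a standard error.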


### Responses to “Survival Analysis With Generalized Additive Models: Part V (stratified baseline hazards)”

1. Ricardo Pietrobon Says:

Very interesting post and article. Would you be concerned about situations with overdispersion?

• Christos Argyropoulos Says:

Actually, no. The Poissonisation of survival analysis models yields likelihoods whose scale is equal to one. Remember that this is just a computational trick to hijack the likelihood optimization capabilities of existing software.
If you are familiar with these areas, you will recognize that similar tricks are used in spline-based density estimators and to encode arbitrary densities within the Gibbs sampler (e.g. in WinBUGS).

2. Gregor Passolt Says:

This has been a really interesting series to read. How much additional work do you think it would take to adapt this to a competing risks framework?

• Christos Argyropoulos Says:

I have not thought about the subdistribution approach, but the cause-specific hazards competing risks model can already be estimated with the PGAM.
Such models tend to be fit as stratified Cox models anyway, with a separate baseline hazard for each cause of failure, and this is already possible with the PGAM.
Furthermore, dependence between the risks for the various failure times could be introduced through an appropriate random effects structure (after all, PGAMs are just Generalized Linear Mixed Models!).

Do you have a particular example in mind?

• Gregor Thomas Says:

I do! I’ve got data on children in foster care (competing risks of reunification, adoption, and guardianship). I’ve been using a stratified Cox model. I’ve got some time next week and I’m definitely interested in trying the PGAM approach!

• Christos Argyropoulos Says:

Feel free to contact me by email if you have any questions.