Who needs the Cox model anyway?

As provocative as this title may read, it is hardly my creation: due credit should go to Bendix Carstensen, who has given numerous talks under this title and provided his teaching material and R source code for people to explore the point.

The title neither implies that the Cox model is of no use at all, nor that one should abandon it, if only for the reason that the vast majority of the biomedical literature uses it in one form or another. Rather, it is meant to suggest that one can put other, more versatile regression models to the same uses and get more out of the same data.

As people reading this are well aware, the Cox model is a semi-parametric model that allows full flexibility in modelling the baseline hazard of a given survival function, while estimating differences from this baseline under a minimal parametric proportional-hazards assumption. A number of approaches can be used to justify Cox modelling mathematically, but it was not until the introduction of counting processes that the Cox and other semi-parametric approaches were put on a rigorous mathematical basis. From my reading of the 1970s literature, Cox himself did not offer a bona fide mathematical proof for the procedures in the Cox model, and the discussion accompanying his 1972 paper, read before the Royal Statistical Society, is a must-read for anyone interested in the model and in the rigor of the scientific process.

What Bendix Carstensen shows (see for example the following presentation, starting from slide 70) is that the Cox model is really a “life-table on steroids” procedure, as many could have guessed from the title of the 1972 paper: Regression Models and Life-Tables.
By making this connection with the life-table approach, one can accomplish the same goals as the Cox model (and possibly much more!) within the parametric modelling framework of Poisson regression models.
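The mechanical step behind this connection is splitting each subject's follow-up at a set of time cuts, so that every interval contributes a chunk of person-time plus an event indicator, exactly as a life-table does; the resulting records can then be fed to a Poisson regression with the interval as just another covariate. A minimal sketch of that splitting step (the function name and data are my own illustration, not Carstensen's code, which is in R):

```python
def split_followup(time, event, cuts):
    """Split one subject's follow-up time at the given cut points.

    Returns a list of (interval_start, person_time, events) records;
    the event indicator (0 or 1) is assigned to the interval in which
    follow-up ends, and all earlier intervals contribute zero events.
    """
    records = []
    start = 0.0
    for cut in list(cuts) + [float("inf")]:
        if time <= start:
            break
        end = min(time, cut)
        if end > start:
            # the event can only occur in the interval where follow-up stops
            d = event if end == time else 0
            records.append((start, end - start, d))
        start = cut
    return records


# A subject followed for 3.5 time units who then has the event,
# split at cuts 1, 2 and 5:
recs = split_followup(3.5, 1, [1.0, 2.0, 5.0])
# → [(0.0, 1.0, 0), (1.0, 1.0, 0), (2.0, 1.5, 1)]
```

Summing person-time and events over such records, within strata defined by the intervals and other covariates, reproduces the life-table; modelling the event counts with person-time as an offset is the Poisson-regression formulation the rest of this post refers to.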

There are a number of potential benefits of this adoption, many of which are pointed out by Carstensen:

• one is forced back to an Occam's-razor mode of thinking, in which catchy phrases and the illusion of minimal assumptions are replaced by sensible modelling in terms of flexible parametric models
• as observation time is shown to be just another covariate, one's thinking is liberated to consider multiple time scales and their interactions (the so-called Age-Period-Cohort problems) in epidemiological settings. In my limited personal experience, such problems do not seem to be handled very well within the Cox model, especially in large datasets, where one is forced to include what are termed “secular trend” terms in the linear predictor of the Cox model
• from a teaching perspective, one can (probably) forgo counting-process expositions when explaining the “mechanics” of survival-analysis regression approaches to audiences with minimal exposure to stochastic calculus (e.g. most MDs using statistical modelling in their own research). As both Poisson and logistic regression (another model commonly used in biomedical research) are Generalized Linear Models (GLMs), there exist some rather exciting opportunities for presenting both models from the same GLM framework to practitioners. Consider, for example, an introductory course in which students advance from ordinary linear regression to Generalized Linear Models to survival modelling without having to switch frameworks. To the best of my knowledge, such a Biostats course does not exist, but I have the gut feeling that it would do a much better job of distilling the needed concepts to its students.

In subsequent posts I will provide my own explorations of the Cox/life-table/survival/Poisson GLM connection, starting from what I hope is an accurate account of the early (1970s-1980s) work exploring the same connection, along with some mathematical and software considerations concerning this approach.