## Optimizing Survival Likelihoods With Poisson Models for Rate and Exposure

In a previous post I mentioned that Poisson models can be used to carry out survival analysis tasks, e.g. estimation of survival curves or even relative risk modelling. Yet I never showed how this can be done. So I will close the gap today and highlight how this works from a purely mathematical vantage point.

Let $f(t)$ be a continuous density defined over the interval $\left[0,\infty\right)$, generating lifetimes $T$. The corresponding distribution function $F(t)=P(T\leq t)=\int_{0}^{t}f(x)\mathrm{\, d}x$ and survivor function $S(t)=P(T\ge t)=1-F(t)$ are related to the hazard function $h(t)$ and the cumulative hazard function $H(t)$ by the following relationships:

$h(t) = -\frac{d}{dt}\log S(t)=\frac{f(t)}{S(t)}$

$H(t) = \int_{0}^{t}h(x)\, dx$

$S(t) = \exp\left(-H(t)\right)$
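As a quick sanity check of these three relationships, here is a minimal sketch that integrates a hazard numerically and compares $\exp(-H(t))$ with the closed-form survivor function. The Weibull distribution, its shape and scale values, and the function names are all my own illustrative assumptions; nothing in the derivation depends on them.

```python
import math

# Illustrative assumption: a Weibull lifetime with arbitrary shape and scale.
SHAPE, SCALE = 1.5, 2.0

def weibull_hazard(t):
    """h(t) = f(t) / S(t) for the Weibull distribution."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)

def weibull_survival(t):
    """Closed-form survivor function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / SCALE) ** SHAPE))

def cumulative_hazard(hazard, t, steps=20_000):
    """H(t) = integral of h from 0 to t, approximated by the midpoint rule."""
    width = t / steps
    return width * sum(hazard((k + 0.5) * width) for k in range(steps))

t = 3.0
H_numeric = cumulative_hazard(weibull_hazard, t)
S_from_hazard = math.exp(-H_numeric)   # S(t) = exp(-H(t))
S_closed_form = weibull_survival(t)

print(S_from_hazard, S_closed_form)
```

The two printed numbers should agree up to the error of the numerical integration.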

Consider a set of $M$ individual observations, sorted in increasing order, at times $\mathcal{F}=\left\{ F_{i}\right\} _{i=1}^{M}$, with censoring indicators $\mathcal{D}=\left\{ \delta_{i}\right\} _{i=1}^{M}$ assuming the value of 0 if the corresponding observation was censored and 1 otherwise. The set $\mathcal{F}$ is thus the union of the sets of failure times $\mathcal{T}=\left\{ F_{i}:\delta_{i}=1\right\}$ and censoring times $\mathcal{C}=\left\{ F_{i}:\delta_{i}=0\right\}$. Under the assumption of non-informative censoring, the likelihood of the sample is given by:

$L=\prod_{i=1}^{M}f(F_{i})^{\delta_{i}}S(F_{i})^{1-\delta_{i}}$

Substituting $f(t)=h(t)S(t)$ and $S(t)=\exp\left(-H(t)\right)$ into the likelihood equation yields:

$L=\prod_{i=1}^{M}h(F_{i})^{\delta_{i}}\exp\left(-\int_{0}^{F_{i}}h(t)\mathrm{\, d}t\right)$
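To make the hazard form of the likelihood concrete, the sketch below evaluates its logarithm, $\sum_{i}\left[\delta_{i}\log h(F_{i})-H(F_{i})\right]$, for a small censored sample. The data are made up, and the constant hazard (an exponential model) is only an assumption chosen because its maximum likelihood estimate has a simple closed form: events divided by total follow-up time.

```python
import math

def log_likelihood(times, events, hazard, cumulative_hazard):
    """log L = sum_i [ delta_i * log h(F_i) - H(F_i) ]."""
    return sum(
        d * math.log(hazard(t)) - cumulative_hazard(t)
        for t, d in zip(times, events)
    )

# Hypothetical sample: observation times F_i and indicators delta_i
# (1 = failure observed, 0 = censored).
times = [1.2, 2.5, 3.1, 4.8, 6.0]
events = [1, 0, 1, 1, 0]

def exponential_loglik(rate):
    """Constant hazard h(t) = rate, hence H(t) = rate * t."""
    return log_likelihood(times, events, lambda t: rate, lambda t: rate * t)

# Closed-form MLE for a constant hazard: events per unit of follow-up time.
rate_hat = sum(events) / sum(times)

for rate in (0.5 * rate_hat, rate_hat, 2.0 * rate_hat):
    print(rate, exponential_loglik(rate))
```

The log-likelihood printed for `rate_hat` should be the largest of the three.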

Standard calculus allows us to write each of the integrals appearing inside the exponential function as a discrete sum of integrals between successive elements of the set $\mathcal{F}$, i.e.

$\int_{0}^{F_{i}}h(t)\mathrm{\, d}t = \sum_{j=0}^{i-1}\int_{F_{j}}^{F_{j+1}}h(t)\mathrm{\, d}t$

where we have defined $F_{0}=0$. Assuming that the hazard function $h(t)$ is at least piecewise continuous, the mean value theorem for integrals allows us to write each of the integrals appearing inside the sum as:

$\int_{F_{j}}^{F_{j+1}}h(t)\mathrm{\, d}t = h(u_{j})(F_{j+1}-F_{j}) \qquad F_{j} < u_{j} < F_{j+1}$

Substituting back into the likelihood we obtain the following expression:

$L=\prod_{i=1}^{M}h(F_{i})^{\delta_{i}}\exp\left(-\sum_{j=0}^{i-1}h(u_{j})(F_{j+1}-F_{j})\right)$

Hence, evaluating the likelihood of the sample has led us to consider the value of the hazard function at the (known) observation times $\mathcal{F}$ as well as at the unknown points $\mathcal{U}=\left\{ u_{j}\right\} _{j=0}^{M-1}$.

Before further progress can be made, we need to reflect on the goal of this exercise, i.e. the derivation of the likelihood of a given set of lifetimes in a distribution-free framework. As there are no free lunches in the universe, one cannot seriously expect that the phrase "distribution-free" implies that we can proceed in an assumption-free manner. Rather, our distribution-free endeavour has led us to consider a problem in which we are dealing with a much larger number of quantities to be estimated, namely the set $\mathcal{U}$ and the value of the hazard function at the elements of both $\mathcal{U}$ and $\mathcal{F}$.

Nevertheless, we can definitely make some additional progress by considering the simplest class of piecewise continuous hazard functions i.e. the piecewise constant ones:

$h(u) = h_{i} \quad \forall u \in (F_{i-1},F_{i}]$

Substituting back into the likelihood function we obtain the following (can’t really make up my mind, whether this is the beauty or the beast!):

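$L=\prod_{i=1}^{M}h(F_{i})^{\delta_{i}}\exp\left(-\sum_{j=0}^{i-1}h(F_{j+1})(F_{j+1}-F_{j})\right)=$

$\quad \prod_{i=1}^{M}h(F_{i})^{\delta_{i}}\exp\left(-N_{i}h(F_{i})(F_{i}-F_{i-1})\right)$

where $N_{i}$ is the number of individuals at risk, i.e. alive and not censored just before time $F_{i}$. The second equality follows from collecting terms across individuals: the interval $(F_{i-1},F_{i}]$ contributes $h(F_{i})(F_{i}-F_{i-1})$ to the cumulative hazard of every individual whose observation time is at least $F_{i}$, and there are exactly $N_{i}$ of them.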
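Collapsing the per-individual sums into at-risk counts is easy to get wrong, so here is a small numerical check. It is only a sketch under assumptions of my own: the observation times and hazard values are made up, and the times are distinct and sorted, so that $N_{i}=M-i+1$.

```python
# Hypothetical sorted, distinct observation times F_1 < ... < F_M.
times = [1.2, 2.5, 3.1, 4.8, 6.0]
# Arbitrary piecewise constant hazard: hazards[i] applies on the interval
# ending at times[i] (Python lists are 0-based, the text is 1-based).
hazards = [0.30, 0.10, 0.25, 0.40, 0.15]

M = len(times)
grid = [0.0] + times                                  # prepend F_0 = 0
widths = [grid[i + 1] - grid[i] for i in range(M)]    # interval lengths

# Left-hand side: add up every individual's own cumulative hazard,
# i.e. the intervals from 0 up to that individual's observation time.
nested = sum(
    sum(hazards[j] * widths[j] for j in range(i + 1))
    for i in range(M)
)

# Right-hand side: each interval counted once, weighted by the number
# of individuals still at risk just before its right endpoint.
at_risk = [M - i for i in range(M)]
collapsed = sum(n * h * w for n, h, w in zip(at_risk, hazards, widths))

print(nested, collapsed)
```

Both totals are the negative of the exponent in the likelihood, so they should coincide up to floating point rounding.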

This likelihood can be recognized as the kernel of a Poisson model in which each indicator $\delta_{i}$ is treated as a count with rate $h(F_{i})$ and exposure $N_{i}(F_{i}-F_{i-1})$, the total time at risk accumulated over the $i$-th interval. The full Poisson likelihood differs from it only by factors that do not involve the hazard, so that optimization of the Poisson likelihood is equivalent to optimizing the survival likelihood.
Stated in other words, the maximum likelihood estimates of the parameters $h(F_{i})$ in the Poisson model are also the maximum likelihood estimates of the hazard function at the same time points.
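The equivalence can be checked numerically. The sketch below compares the piecewise constant survival log-likelihood with a Poisson log-likelihood in which $\delta_{i}$ is the count and $h_{i}E_{i}$ the mean, taking the exposure $E_{i}$ to be the total time at risk in the $i$-th interval. The data, the hazard values and the function names are hypothetical, and the times are assumed distinct and sorted. If the two log-likelihoods differ by a constant that does not depend on the hazards, they share the same maximizer.

```python
import math

# Hypothetical sorted, distinct observation times and event indicators.
times = [1.2, 2.5, 3.1, 4.8, 6.0]
events = [1, 0, 1, 1, 0]

M = len(times)
grid = [0.0] + times
# Exposure E_i = N_i * (F_i - F_{i-1}), with N_i = M - i + 1 (1-based).
exposures = [(M - i) * (grid[i + 1] - grid[i]) for i in range(M)]

def survival_loglik(hazards):
    """sum_i [ delta_i * log h_i - h_i * E_i ]."""
    return sum(
        d * math.log(h) - h * e
        for d, h, e in zip(events, hazards, exposures)
    )

def poisson_loglik(hazards):
    """Poisson log-likelihood for counts delta_i with means h_i * E_i."""
    return sum(
        d * math.log(h * e) - h * e - math.lgamma(d + 1)
        for d, h, e in zip(events, hazards, exposures)
    )

# Two arbitrary, strictly positive hazard vectors.
hazards_a = [0.30, 0.10, 0.25, 0.40, 0.15]
hazards_b = [0.05, 0.20, 0.60, 0.10, 0.35]

gap_a = poisson_loglik(hazards_a) - survival_loglik(hazards_a)
gap_b = poisson_loglik(hazards_b) - survival_loglik(hazards_b)
print(gap_a, gap_b)

# Setting the derivative with respect to h_i to zero gives the
# occurrence/exposure rate: events divided by time at risk.
hazard_hat = [d / e for d, e in zip(events, exposures)]
print(hazard_hat)
```

The gap equals $\sum_{i}\delta_{i}\log E_{i}$ for any choice of hazards. Note that the estimate is zero on intervals that end in a censoring time, which is one reason to group observation times into wider intervals in practice.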

I will stop my post here but ask the reader to consider the relation of the (parametric) Poisson based estimates of the hazard function to other, semi-parametric estimates found in the textbooks.