If you didn’t already know, one of my favorite models for modeling time to event data is the piecewise exponential model for hazard functions. These functions assume that there exists a partition of the time scale s_0=0<s_1<...<s_J<s_{J+1}=\max(Y)  and that on each interval (s_{j-1},s_{j}) the hazard is a constant value \lambda_j for $j=1,…,J+1$. The most famous use of this in Frequentist literature is the Cox Proportional Hazards model, which determines the locations and heights s_0=0<s_1<...<s_J<s_{J+1}=\max(Y) and \lambda_j by maximizing the partial likelihood. If you fix s_0=0<s_1<...<s_J<s_{J+1}=\max(Y), the maximum likelihood estimate of \lambda_j is similar to the maximum likelihood estimator of the exponential distribution with censoring. Namely, it is \hat{\lambda_j}=\frac{m}{\sum \limits_{Y_i>s_{j-1}} (Y_i-s_{j-1})} where Y_i are the failure or censoring times of each observation. However, as previously mentioned, this requires fixing \mathbf{s} to some reasonable partition that has enough data to approximate the hazard in each interval and adequately models the true shape of the hazard.

Another issue is how do we decide what value J should be? We could set J, \mathbf{s}, \mathbf{\lambda} to maximize the likelihood by trying out many different models. We could also allow J to vary using reversible jump MCMC as I have before. However, one issue with this approach is when there is little data or the data is not separated too much in time, the reversible jump tends to pick out one or two pieces in the hazard. When you look at actual fits to the data, these provide terrible approximations to hazards and just pick out the means. So for this reason, if we are trying to get a good fit for a hazard, we should fix the number of split points to some reasonable number and let the locations and heights on each interval to vary. This number has to be a compromise between adequately modeling the shape of the hazard and ensuring that there is enough data in each interval to get a proper fit.  My adviser always says that 3-5 pieces can accomplish this and in fact they do quite a good job. If we have at least say 100 observations, using the .2 quantiles of the data is a good starting point for these split points, which places about 20 observations in each interval to start.

There are many different Bayesian models on how to model the hazard heights, but typically the following prior is assumed for the split point locations:

\pi(\mathbf{s}|J) \propto \frac{(2J+1)! \prod \limits_{j=1}^{J+1} (s_j-s_{j-1})}{s_{J+1}^{2J+1}}

which gives a low probability of having an interval with no events. In my practice, this doesn’t always help and we will sometimes get intervals with a few events (say 2-3) so I always ensure that each interval has at least 10 observations in it to ensure that the hazard heights are estimated well. Now let’s take a look at one example of prior heights for the \mathbf{\lambda} that I find to work rather well.

\log(\lambda_1) \sim N(0,5)  \hspace{3mm} \log(\lambda_{j}) \sim N(\lambda_{j-1},\sigma)

for $j=2,..,J+1$. Where \sigma^2 \sim IG(a,b) or \sigma \sim U[0,A] which both are conjugate for the sampler. The prior standard deviation \lambda_1 was chosen to be $5$ to encourage values of the hazard function to not be too extreme (near 0 or large) because this does not usually happen in practice. Anyways, I’ve gotten too verbose with this description, so let’s take a look at some hazard fits. Below we can see a few different fits of hazards and the corresponding survival function fit for 10,000 iterations with a 90% burnin and about 100 patients.

Let’s take a look at a few lognormal distributions:

Here are fits for two different weibull distributions:


And Survival Distribution fits:


As you can see, the piecewise exponential model does a pretty good job approximating both the hazard and the survival curves for some of the standard time to event distributions. Let’s see how well it does for a mixture distribution. Below are the fits for a 0.4*Weib(3,1)+0.6*Lognormal(-1,2).


The piecewise exponential model can even pick out multiple modes! Let’s see how this hazard approximation translates to survival approximation.


It’s able to pick out the shape pretty well, with the bend happening around 2 months.

So as you can see, the piecewise exponential model, if used correctly can give a good approximation to many different time to event hazards/survival curves. Even better is it’s easily interpretable!

If you enjoyed this post, take a look at some of my other Adventures in Statistics.

-Andrew G Chapple



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s