I am experimenting with the piece-wise exponential model for survival data. What is the best way to fit the model? Is there an existing package?
3 Answers
You basically just need to transform the data to a suitable format. Then you can estimate the piece-wise constant baseline hazard using penalized splines. If you assume that the true, underlying hazard is smooth, you can improve the approximation by increasing the number of intervals/reducing the interval lengths (the cut argument below). More info (with R examples) on estimating the baseline hazard with PEMs/PAMMs can be found here: https://adibender.github.io/pammtools/articles/baseline.html
An example with R below:
## install the pammtools package for data transformation and example data
devtools::install_github("adibender/pammtools")
library(pammtools)
library(survival)
library(ggplot2)
theme_set(theme_bw())
library(mgcv)
# load example data
data(tumor)
## transform to piece-wise exponential data (PED)
ped_tumor <- as_ped(Surv(days, status) ~ ., data = tumor, cut = seq(0, 3000, by = 100))
## fit the piece-wise exponential additive model (PAM) instead of a PEM,
## with piece-wise constant hazards, term s(tend), and constant covariate effects:
pam <- mgcv::gam(ped_status ~ s(tend) + age + sex + complications,
  data = ped_tumor, family = poisson(), offset = offset)
summary(pam)
## plot baseline hazard:
ped_tumor %>%
  make_newdata(tend = unique(tend), age = c(0), sex = c("male"), complications = c("no")) %>%
  add_hazard(pam) %>%
  ggplot(aes(x = tstart, y = hazard)) +
  geom_stepribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = 0.2) +
  geom_step()
I used your code to perform a similar analysis. I obtained the hazard rate from add_hazard. How would you interpret hazard = 0.02 given that age = 20, sex = female, and tend = 100? I interpreted this as saying that the expected number of deaths a female cohort aged 20 would experience during the interval 0-100 days is 0.02, but that doesn't make much sense... Thank you. – Meo Sep 07 '18 at 21:55
@Meo add_hazard returns the hazard rate, which is the probability of having an event (given the covariate specification) in the interval 0-100, divided by the length of the interval. You usually interpret the hazard rate by comparing it with hazard rates for other covariate specifications (e.g. males vs. females, to see which specification has the higher hazard), for example by calculating the hazard ratio. – adibender Sep 09 '18 at 16:23
Hi adibender, I was trying to replicate the result of a paper where they plotted the hazard rates, but using a different dataset. The confidence interval in my code exceeded 1, which is why I am having a hard time explaining it in my report. Hence my question. Thank you so much for your reply and your R package - very helpful! – Meo Sep 10 '18 at 22:14
@Meo In general, the hazard rate can be > 1, thus an upper CI with values > 1 can also occur. Glad I could help. If you have other questions or want to collaborate on survival projects you can also drop me an email (see contact details on the package homepage). – adibender Sep 11 '18 at 06:26
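To make the comparison discussed in these comments concrete, here is a minimal sketch that reuses the pam model and the pammtools helpers from the answer above; the covariate values are purely illustrative:
## hazards for two covariate specifications at the same time points
haz_female <- ped_tumor %>%
  make_newdata(tend = unique(tend), age = c(20), sex = c("female"), complications = c("no")) %>%
  add_hazard(pam)
haz_male <- ped_tumor %>%
  make_newdata(tend = unique(tend), age = c(20), sex = c("male"), complications = c("no")) %>%
  add_hazard(pam)
## piece-wise constant hazard ratio, female vs. male
haz_female$hazard / haz_male$hazard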
In order to get a satisfying answer, you should provide some details about the context and problem for which you want to run a piecewise exponential model. There are several packages which might address your problem, each with its own peculiarities. You may want to look at the CRAN Task View on Survival Analysis, where you can find several references.
By far, the best-known R package for survival analysis is survival. This is a huge package which contains dozens of routines. As you pointed out in the comment, you can run a Cox proportional hazards model through the function coxph(). There are caveats when you do this, though. For instance, coxph() estimates a semi-parametric Cox proportional hazards model as given by the following equation:
$$
\lambda(t) = \lambda_0(t) \exp\left[\boldsymbol\beta^T \mathbf{z}(t)\right]
$$
where $\lambda_0(t)$ is the baseline hazard, about which you make no assumptions: it is left unspecified and handled non-parametrically. The term $\exp\left[\boldsymbol\beta^T \mathbf{z}(t)\right]$ is estimated by maximizing the partial likelihood, which gives you the effect of every covariate you put into the model. If you want to model your data this way, then coxph() is the way to go.
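For instance, a minimal sketch using the lung data shipped with survival (the covariates are chosen only for illustration):
library(survival)
## semi-parametric Cox PH fit: the baseline hazard is left unspecified
fit_cox <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(fit_cox)
## the cumulative baseline hazard can still be recovered non-parametrically
head(basehaz(fit_cox))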
survival also provides a function which estimates the same model, but under a fully parametric scheme. That is, you assume a parametric form for $\lambda_0(t)$. This can be achieved through survreg().
I suggest, though, using the more flexible package flexsurv, which extends what survival does in the context of parametric models applied to survival data.
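A sketch of both parametric routes on the same illustrative data:
library(survival)
library(flexsurv)
## fully parametric fits: an exponential model via survreg() and a
## Weibull model via flexsurvreg()
fit_exp <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "exponential")
fit_wei <- flexsurvreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")
summary(fit_exp)
fit_wei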
All these packages treat any time-dependent covariate as a piece-wise constant function. This means that the value of a given covariate is held fixed between the times at which it is observed, changing only at the start of a new record.
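The counting-process format behind this piece-wise constant treatment can be produced with survival::survSplit(); a sketch with arbitrary cut points:
library(survival)
## split follow-up at 100 and 300 days; each row then covers one interval,
## with tstart and time marking its boundaries
lung_split <- survSplit(Surv(time, status) ~ ., data = lung,
  cut = c(100, 300), episode = "interval")
head(lung_split[, c("tstart", "time", "status", "interval")])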
Hope this helped.
Franz
There is no package. I have a code snippet, but essentially this is just an R question. You need to split follow-up at the chosen breakpoints, censoring subjects at those breakpoints, and include a categorical effect for the cut each record belongs to. The approach is not unlike time-varying covariates in a Cox model.
An example here:
library(survival)

set.seed(123)

## simulate exponential survival times with one binary covariate
n <- 1000
x <- rbinom(n, 1, 0.5)
a <- -3
b <- .4
t <- rexp(n, rate = 1/exp(a + b * x))
i <- rep(1, n)  # event indicator: everyone has an event (no censoring)

## breakpoints at the quartiles of the event times
cuts <- c(0, quantile(t[i == 1])[2:4], Inf)

## split each subject into one record per interval entered
nr <- rowSums(outer(t, cuts, ">"))  # number of records per subject
xl <- unlist(mapply(rep, x, nr))    # covariate value, repeated per record
tl <- unlist(sapply(1:n, function(j) pmin(cuts[(1:nr[j]) + 1], t[j])))  # record end times
il <- unlist(sapply(1:n, function(j) c(rep(0, nr[j] - 1), i[j])))       # event only in the last record
cl <- unlist(sapply(nr, seq))       # interval (cut) membership per record

## exponential model with a categorical effect for the interval
f <- survreg(Surv(tl, il) ~ xl + factor(cl), dist = "exponential", robust = FALSE)
gives
Call:
survreg(formula = Surv(tl, il) ~ xl + factor(cl), dist = "exponential",
robust = FALSE)
Value Std. Error z p
(Intercept) -2.8614 0.0693 -41.31 < 2e-16
xl 0.2276 0.0635 3.58 0.00034
factor(cl)2 0.5816 0.0895 6.50 7.9e-11
factor(cl)3 0.8356 0.0895 9.34 < 2e-16
factor(cl)4 0.8431 0.0898 9.39 < 2e-16
Scale fixed at 1
Exponential distribution
Loglik(model)= 1184.2 Loglik(intercept only)= 1122.6
Chisq= 123.06 on 4 degrees of freedom, p= 1.2e-25
Number of Newton-Raphson Iterations: 5
n= 2500
where the xl coefficient is the estimator of the b parameter and the (Intercept) coefficient estimates the a parameter. The subsequent coefficients, I believe, are interpreted as a difference in baseline incidence conditional on having survived up to a particular point. My interpretation may be wrong there, but the point is that the results are effectively the same as those of a log-linear model: glm(il ~ xl + factor(cl) + offset(log(tl)), family = poisson).
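A quick check of that equivalence on the simulated data above (the sign flip comes from survreg's accelerated failure time parameterization):
## Poisson fit with log exposure as offset; coefficients should match
## the survreg fit up to a change of sign
g <- glm(il ~ xl + factor(cl) + offset(log(tl)), family = poisson)
cbind(survreg = coef(f), poisson_negated = -coef(g))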
The pch package on CRAN: https://cran.r-project.org/web/packages/pch/pch.pdf – boscovich Sep 22 '16 at 18:52
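A minimal sketch of that route, assuming pchreg() accepts a Surv formula and a breaks argument as described in the linked manual:
# install.packages("pch")
library(pch)
library(survival)
## piece-wise constant hazard fit over 5 intervals (illustrative data)
fit_pch <- pchreg(Surv(time, status) ~ age + sex, breaks = 5, data = lung)
summary(fit_pch)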