In the image and per the code at the bottom of this post, I plot survival curves for the lung dataset from the survival package using a fitted exponential distribution model (plot red line), using the K-M nonparametric model (plot blue lines), and run/show simulations using the exponential model (plot light blue lines) with the mean of the simulations shown as the black line. Exponential doesn't provide the best fit for lung data but I'm trying to better understand modeling with exponential and extreme value distributions.
The model fit takes the form $log(T)$∼$β_0 + W$ where $β_0$ is the fit$coef and $W$ represents a standard minimum extreme value distribution (exponential in this case). In the example presented herein I only model $W$; in related posts I address the modeling of $β_0$ such as in How to appropriately model the uncertainty of the exponential distribution model when running survival simulations?. The simulations herein are, for a change, from the perspective of individuals, modeled by generating random intercepts for the exponential distribution in the line of code simPaths <- sapply(1:simNbr,function(i) 1-pexp(time,rate=1/rexp(1,rate = 1/exp(fit$icoef)))).
My question is why does the mean of the simulations shown in the black line differ so widely differ from the fitted base exponential model shown in the red line? When I simulated only the $β_0$ uncertainty in related posts, the simulations formed a band around the fitted base distribution similar in width more or less to the 95% CI around the Kaplan-Meier curve (the dashed blue lines). What am I doing wrong?
Code:
library(survival)
simNbr <- 25
time <- seq(0, 1000, by = 1)
fit <- survreg(Surv(time, status) ~ 1, data = lung, dist = "exponential")
Compute exponential survival function for the base fitted model
survival <- 1 - pexp(time, rate = 1/exp(fit$coef))
Compute the survival curves for each simulation
simPaths <- sapply(1:simNbr,function(i) 1-pexp(time,rate=1/rexp(1,rate = 1/exp(fit$icoef))))
plot(time,survival,type="n",xlab="Time",ylab="Survival Probability",main="Lung Data Survival Plot")
Plot simulations
plotSims <- data.frame(time = time,
do.call(cbind,
lapply(1:simNbr,function(i) {
lines(time, simPaths[, i], col = "lightblue", lty = "solid", lwd = 0.25)
return(curve)
}
)
)
)
Add average of simulations
avgSurvival <- apply(simPaths, 1, mean)
lines(time, avgSurvival, col = "black", lwd = 3)
Add Kaplan-Meier survival curve for the lung data
lines(survfit(Surv(time, status) ~ 1, data = lung), col = "blue", lwd = 1)
Plot the base fitted survival curve using exponential
lines(cbind(time, survival), type = "l", col = "red", lwd = 3)
legend("topright",
legend = c("Fitted exponential model","K-M & confidence intervals","Simulations", "Simulation mean"),
col = c("red", "blue", "lightblue", "black"),lwd = c(3, 1, 0.25, 3),lty = c(1, 1, 1, 1),
bty = "n")


simPathscode line; it's not sampling from the normal distribution offit$coefso far as I can see. – EdM May 17 '23 at 12:44rexp(300, rate=1/exp(6.044474)). Or maybe I did not misinterpret; what do you think? In any case I will try SO in case this is a coding problem (they'll probably kick me right back here to CV). – Village.Idyot May 17 '23 at 14:50exp(b0+b1*x1+b2*x2+c*error)directly follows the formula notation. – Village.Idyot May 17 '23 at 15:11survreg(Surv(time, status==1)...dist="exponential")is used. Why is "exponential" used, andrexp()used in that post, when the question is about simulating a Weibull model? Further, would the same references to "exponential" andrexp()be used when simulating $W$ for not only Weibull but other distributions too such as gamma and lognormal? – Village.Idyot May 19 '23 at 12:32pgamma()function to sample directly from a gamma-distributed model. Study those notes carefully and you will come to understand this much better. – EdM May 19 '23 at 12:56