This is a follow-on to post Correctly simulating an extreme value distribution for survival analysis?, as I work towards adaptation of that code to the Weibull distribution. In the below code I generate random numbers for $W$ for the Weibull model that takes the form $logT = α + σW$ where $α$ is the linear predictor and $W$ represents a standard minimum extreme value distribution. I generate random numbers for the regression intercept, but I also need reasonable log scale values to go with the randomly generated intercepts to feed into the Weibull survival formula. I assume the variance-covariance matrix (vcov(fit)) from the survreg() function has the necessary information for providing reasonable log scale values. Is the below code a reasonable way to generate the corresponding log scale values?
It makes a very simplistic assumption of linearity. There must be a better way to do this.
I wonder if it wouldn't be much easier to simply use MASS:mvrnorm() to generate both the intercepts and the corresponding log scale values, but unless I am mistaken my code below should generate a desired wider dispersion of outcomes via its use of log(rexp(...)) in the simFx function.
Code:
library(survival)
fit <- survreg(Surv(time, status) ~ 1, data = lung, dist = "weibull")
fitCoef <- fit$icoef[1] # extract intercept value from fit
vcov_mat <- vcov(fit)
Function to generate W value from extreme value distribution
simFx <- function(){
W <- log(rexp(50))
fitW <- survreg(Surv(exp(W))~1,dist="exponential")
params <- fitCoef + fitW$icoef
return(params)
}
r.intercept <- simFX() # assign random value to object
Calculate the corresponding random log scale value
r.log_scale <- fit$icoef[2]+sqrt(vcov_mat[2, 2])*(r.intercept-fit$icoef[1])/sqrt(vcov_mat[1,1])
print(paste("Random Intercept:", r.intercept))
print(paste("Random Log Scale:", r.log_scale))
EDIT 1:
Below is my attempt to simulate the uncertainty of $W$-only (no simulation of $β_0$ I don't think) representing a standard minimum extreme value distribution, for a Weibull model. Repeatedly click the last line of code to add simulation lines. Plot image below shows 10 simulations. Trying to keep the code as simple as possible! Running more iterations shows more dispersion than in my other posts where I simulate $β_0$-only). Also note the plot image (10 simulations) and code further below that the Weibull code below was adapted from, for the exponential model which I believe is correct.
Code for Weibull model:
library(survival)
time <- seq(0, 1000, by = 1)
fit <- survreg(Surv(time, status) ~ 1, data = lung, dist = "weibull")
fitCoef <- fit$icoef
weibCurve <- function(time, survregCoefs) {
exp(-(time/exp(survregCoefs[1]))^exp(-survregCoefs[2]))
}
survival <- weibCurve(time, fitCoef)
Generate random distribution parameter estimates for simulations
simFX <- function(){
W <- log(rexp(100))
fitW <- survreg(Surv(exp(W))~1,dist="weibull")
params <- fitCoef + fitW$icoef
return(weibCurve(time, params))
}
plot(time,survival,type="n",xlab="Time",ylab="Survival Probability",
main="Lung Survival (Weibull) by sampling from W extreme-value distribution")
lines(survival, type = "l", col = "red", lwd = 3) # plot base fitted survival curve
Click on the below to add simulation lines
lines(simFX(), col = "blue", lty = 2)
Now for exponential model ---
Code for exponential model:
library(survival)
time <- seq(0, 1000, by = 1)
fit <- survreg(Surv(time, status) ~ 1, data = lung, dist = "exponential")
fitCoef <- fit$icoef
survival <- 1 - pexp(time, rate = 1/exp(fitCoef))
Generate random distribution parameter estimates for simulations
simFX <- function(){
W <- log(rexp(50))
fitW <- survreg(Surv(exp(W))~1,dist="exponential")
params <- fitCoef + fitW$icoef
return(1 - pexp(time, rate = 1 / exp(params)))
}
plot(time,survival,type="n",xlab="Time",ylab="Survival Probability",
main="Lung Survival (exponential) by sampling from W extreme-value distribution"
)
lines(survival, type = "l", col = "red", lwd = 3) # plot base fitted survival curve
Click on the below to add simulation lines
lines(simFX(), col = "blue", lty = 2)
EDIT 2:
Replace the simFX() functions in the above code examples for Weibull and exponential with the below in order to correctly reflect the parametric survival form $logT∼α+σW$ for Weibull and $log(T)∼α+W$ for exponential:
Weibull:
simFX <- function(){
W <- log(rexp(100)) # W = std min extreme value for parametric survival form logT∼α+σW
newTimes <- exp(fitCoef[1] + exp(fitCoef[2])* W)
fitNewTimes <- survreg(Surv(newTimes)~1,dist="weibull")
return(weibCurve(time,fitNewTimes$icoef))
}
Exponential:
simFX <- function(){
W <- log(rexp(100)) # W = std min extreme value for parametric survival form logT∼α+W
newTimes <- exp(fitCoef + W)
fitNewTimes <- survreg(Surv(newTimes)~1,dist="exponential")
return(1 - pexp(time, rate = 1 / exp( fitNewTimes$icoef)))
}


fitW <- survreg(Surv(exp(W))~1,dist="weibull")command fits a Weibull model to time points distributed as standard minimum extreme value. I think what you want to do is to fit multiple samples from the Weibull model that you fit to thelungdata set. For that, you could generate new samples of size 100 withnewTimes <- exp(fitCoef[1] + exp(fitCoef[2])* log(rexp(100))), fit the Weibull model to each sample of size 100 directly, and plot the corresponding modeled survival curves. That properly puts the sampling variability around the originallungmodel curves. – EdM May 20 '23 at 14:49