I'm studying simulation and in particular uncertainty in survival curve estimates, by drawing from a distribution of parameters (for the Weibull distribution), using the variance-covariance matrix from the MASS package. In response to post How to generate simulation paths for Weibull distribution in R?, I've come up with the following example. My question is, is this an adequate example for simulating the Weibull distribution and in deriving survival and hazard curve estimates?
I show the output plots below and the code at the bottom of this post.
Code:
library(survival)
library(MASS)
Simulate survival data
n <- 100
t <- rweibull(n, shape = 2, scale = 3) # observed survival times for each individual in the study
cens <- rbinom(n, size = 1, prob = 0.2) # 1 = survival time right-censored (still alive/lost; 0 = event-censored
x1 <- rnorm(n) # x1 & x2 are hypothetical covariates (predictors) for individuals in the study
x2 <- rnorm(n) # see above
data <- data.frame(t, cens, x1, x2)
Fit Cox PH model
model <- coxph(Surv(t, cens) ~ x1 + x2, data = data)
Generate random samples from multivariate normal distribution
beta <- coef(model)
vcov <- vcov(model)
samples <- mvrnorm(1000, mu = beta, Sigma = vcov)
Generate new survival times; new_time = simulated new survival times
data$new_time <- NA
for (i in 1:nrow(data)) {
shape <- exp(samples[i, "x1"])
scale <- exp(samples[i, "x2"])
data$new_time[i] <- qweibull(runif(1), shape = shape, scale = scale)
}
Generate survival curves
fit <- survfit(Surv(new_time, cens) ~ 1, data = data)
sfit <- summary(fit)
tseq <- seq(min(sfit$time), max(sfit$time), length = 100)
surv <- survfit(Surv(new_time, cens) ~ 1, data = data.frame(new_time = tseq))
haz <- -diff(log(surv$surv))/diff(surv$time)
Plot survival and hazard curves, and histogram of simulated survival times
par(mfrow=c(1,3))
plot(surv, xlab = "Time", ylab = "Survival Probability", main = "Survival Function")
plot(surv$time[-1], haz, type = "l",xlab = "Time", ylab = "Hazard Function", main = "Hazard Function")
hist(data$new_time, breaks = 20, xlab = "Survival Time", ylab = "Frequency", main = "Simulated Survival Times")
Calculate average hazard rate per time period
haz_by_time <- split(haz, factor(floor(surv$time[-1])))
mean_haz <- unlist(lapply(haz_by_time, function(x) mean(x, na.rm = TRUE)))
