2

I'm studying simulation and in particular uncertainty in survival curve estimates, by drawing from a distribution of parameters (for the Weibull distribution), using the variance-covariance matrix from the MASS package. In response to post How to generate simulation paths for Weibull distribution in R?, I've come up with the following example. My question is, is this an adequate example for simulating the Weibull distribution and in deriving survival and hazard curve estimates?

I show the output plots below and the code at the bottom of this post.

enter image description here

Code:

library(survival)
library(MASS)

Simulate survival data

n <- 100 t <- rweibull(n, shape = 2, scale = 3) # observed survival times for each individual in the study cens <- rbinom(n, size = 1, prob = 0.2) # 1 = survival time right-censored (still alive/lost; 0 = event-censored x1 <- rnorm(n) # x1 & x2 are hypothetical covariates (predictors) for individuals in the study x2 <- rnorm(n) # see above data <- data.frame(t, cens, x1, x2)

Fit Cox PH model

model <- coxph(Surv(t, cens) ~ x1 + x2, data = data)

Generate random samples from multivariate normal distribution

beta <- coef(model) vcov <- vcov(model) samples <- mvrnorm(1000, mu = beta, Sigma = vcov)

Generate new survival times; new_time = simulated new survival times

data$new_time <- NA for (i in 1:nrow(data)) { shape <- exp(samples[i, "x1"]) scale <- exp(samples[i, "x2"]) data$new_time[i] <- qweibull(runif(1), shape = shape, scale = scale) }

Generate survival curves

fit <- survfit(Surv(new_time, cens) ~ 1, data = data) sfit <- summary(fit) tseq <- seq(min(sfit$time), max(sfit$time), length = 100) surv <- survfit(Surv(new_time, cens) ~ 1, data = data.frame(new_time = tseq)) haz <- -diff(log(surv$surv))/diff(surv$time)

Plot survival and hazard curves, and histogram of simulated survival times

par(mfrow=c(1,3)) plot(surv, xlab = "Time", ylab = "Survival Probability", main = "Survival Function") plot(surv$time[-1], haz, type = "l",xlab = "Time", ylab = "Hazard Function", main = "Hazard Function") hist(data$new_time, breaks = 20, xlab = "Survival Time", ylab = "Frequency", main = "Simulated Survival Times")

Calculate average hazard rate per time period

haz_by_time <- split(haz, factor(floor(surv$time[-1]))) mean_haz <- unlist(lapply(haz_by_time, function(x) mean(x, na.rm = TRUE)))

2 Answers2

1

This seems unnecessarily convoluted. As I understand your code, you sample some simulated survival times based on a fixed Weibull model, put the sampled times together for a Cox model, then sample from the joint distribution of the Cox model coefficients to get "new" survival times from another Weibull model. (For that last step, it's not immediately clear that the Cox model coefficients from which you sample are related to the shape and scale of qweibull(),although I might be missing something there.)

If you want to generate simulations to illustrate the vagaries of fitting a parametric Weibull model, cut out the intermediate Cox model. Sample from the joint distribution of shape and scale values from a parametric Weibull fit and show the corresponding variability in Weibull-based survival curves.

If you want to illustrate the vagaries of fitting a Cox model, you would be better off with a resampling approach as the baseline hazard over time is fixed, estimated from the entire data sample. Sampling based on the variance-covariance matrix of Cox-model coefficients thus would not take into account the variability in estimating the baseline hazard. Repeating the Cox modeling on multiple bootstrapped samples of the data and showing the corresponding survival curves would take into account both the variability in coefficient estimates and the variability in the estimates of the baseline hazard.

EdM
  • 92,183
  • 10
  • 92
  • 267
  • I reflect the advice in the 2nd paragraph of your answer, stripping out the Cox model from my OP, in the answer I just posted below. – Village.Idyot Apr 24 '23 at 09:00
  • Hi EdM, in response to your comment about sampling from the joint distribution of shape and scale values, I attempted to do this with code in another post I just posted, https://stats.stackexchange.com/questions/614381/in-survival-curve-simulation-how-to-correctly-sample-from-the-joint-distributio – Village.Idyot Apr 28 '23 at 09:46
0

In the below code I reflect the advice in the 2nd paragraph of EdM's answer, "to generate simulations to illustrate the vagaries of fitting a parametric Weibull model" by removing the intermediate Cox model shown in the OP.

The code below takes into account the joint distribution of shape and scale values from the parametric Weibull fit by using the mvrnorm() function from the MASS package to simulate from a bivariate normal distribution with mean and covariance matrix equal to the estimates obtained from fitting the Weibull distribution. I use the vcov() function to extract the covariance matrix from the Weibull fit. The code generates new survival times using the simulated shape and scale parameters.

Code:

library(survival)
library(MASS)

Simulate survival data

n <- 100 cens <- rbinom(n, size = 1, prob = 0.2) x1 <- rnorm(n) x2 <- rnorm(n) data <- data.frame(cens, x1, x2)

Estimate parameters of Weibull distribution

t <- rexp(n, rate = 1/5) fit <- fitdistr(t, "weibull") shape <- fit$estimate[1] scale <- fit$estimate[2]

Generate new survival times based on estimated parameters

mean_est <- c(shape, scale) cov_est <- vcov(fit) new_params <- mvrnorm(n = n, mu = mean_est, Sigma = cov_est) new_t <- rweibull(n = n, shape = new_params[,1], scale = new_params[,2])

Fit survival curve

fit <- survfit(Surv(new_t, cens) ~ 1, data = data) sfit <- summary(fit) tseq <- seq(min(sfit$time), max(sfit$time), length = 100) surv <- survfit(Surv(new_t, cens) ~ 1, data = data.frame(new_t = tseq)) haz <- -diff(log(surv$surv))/diff(surv$time)

Plot survival and hazard curves, and histogram of simulated survival times

par(mfrow=c(1,3)) plot(surv, xlab = "Time", ylab = "Survival Probability", main = "Survival Function") plot(surv$time[-1], haz, type = "l",xlab = "Time", ylab = "Hazard Function", main = "Hazard Function") hist(new_t, breaks = 20, xlab = "Survival Time", ylab = "Frequency", main = "Simulated Survival Times")

  • 1
    It seems that you might be having some problems with Weibull parameterization. I'll post an answer to this question that should help with the several questions you've posted recently. – EdM Apr 28 '23 at 13:35
  • Yes, thank you. I am a student of survival analysis and have quite a ways to go – Village.Idyot Apr 28 '23 at 13:40
  • 1
    Actually I'll post my answer to this question. After that, to help maintain some efficiency on this site, please review your questions and delete those that aren't really at issue anymore. This site works best if there is just one question per page, but the relatively small differences among your recent questions don't seem so much like truly separate questions. – EdM Apr 28 '23 at 13:43