Monte Carlo Power calculation for Survival analysis in R

Question

for a project I have to analyze two survival analysis treatment groups using R. I want to calculate the power and type I error rate of a test with Monte Carlo simulation to set up a phase 3 medical study, but I am struggling with the exact calculations and hypotheses.

Firstly, I fit models on my data which includes the usual survival analyis categories: survival time, a right-censoring indicator and the treatment (either a new experimental medicine or the normal one). The treatment 1 Kaplan Meier curve is closest to following an exponential distribution and the treatment 2 KM curve is closest to following a loglogistic curve. I came to these conclusions by calculating the AIC for lognormal, Weibull, exponential and loglogistic fits and then checking the residuals.

To test whether treatment 2 performs better, I used a Fleming-Harrington test in my Monte Carlo analysis with a few different weights. After this however, is where I run into problems.

library(survival)
library(flexsurv)
set.seed(141752)
S = 10000
rate1 = 0.00806
dropout_rate = 0.05
follow_up = 122
n1 = 150 - dropout_rate * 150
n2 = 150 - dropout_rate * 150
shape2 = 1.218       
scale2 = 60.805
treatment_ratio = 
vec_sim.surv1 = c() 
vec_sim.surv2 = c()
sim.csurv1 = c() #censored data
sim.csurv2 = c()
cens_indic1 = c() #censoring indicators
cens_indic2 = c()
list_simdata1 = list()
list_simdata2 = list()
n_deaths1 = c()
n_deaths2 = c()
trt1_vec = rep(1, n1)
trt2_vec = rep(2, n2)
fh_0_1.result = c()
fh_1_1.result = c()
fh_.5_.5.result = c()
fh_.5_2.result = c()
for (i in 1:S){
  sim.surv1 = rexp(n1, rate1) #generate random exponential data for group 1 with the found parameters
  sim.csurv1 = sim.surv1
#censor treatment group 1 data at 122 days (4 months)
  for (j in 1:length(sim.csurv1)){
    if (sim.csurv1[j] > follow_up){
      cens_indic1[j] = 0
      sim.csurv1[j] = follow_up
    }
    else{
      cens_indic1[j] = 1
    }
  }
  sim.surv2 = rllogis(n2, shape2, scale2) #generate random log-logistic data for group 2 with the found parameters
  sim.csurv2 = sim.surv2
#censor treatment group 2 data at 122 days (4 months)
  for (q in 1:length(sim.csurv2)){
    if (sim.csurv2[q] > follow_up){
      cens_indic2[q] = 0
      sim.csurv2[q] = follow_up
    }
    else{
      cens_indic2[q] = 1
    }
  }
}
#Perform Fleming-Harrington test with different weights 
  fh_0_1 = logrank.test(time=df_simdata$time, event=df_simdata$censoring, group=df_simdata$treatment, alternative='greater', rho=0, gamma=1)
  fh_1_1 = logrank.test(time=df_simdata$time, event=df_simdata$censoring, group=df_simdata$treatment, alternative='greater', rho=1, gamma=1)
  fh_.5_.5 = logrank.test(time=df_simdata$time, event=df_simdata$censoring, group=df_simdata$treatment, alternative='greater', rho=0.5, gamma=0.5)
  fh_.5_2 = logrank.test(time=df_simdata$time, event=df_simdata$censoring, group=df_simdata$treatment, alternative='greater', rho=0.5, gamma=2)
#append the p-value for each test to a vector
  fh_0_1.result[i] = fh_0_1$test$p
  fh_1_1.result[i] = fh_1_1$test$p
  fh_.5_.5.result[i] = fh_.5_.5$test$p
  fh_.5_2.result[i] = fh_.5_2$test$p
}
fh_0_1.value = sum(fh_0_1.result<0.05)/S
fh_1_1.value = sum(fh_1_1.result<0.05)/S
fh_.5_.5.value = sum(fh_.5_.5.result<0.05)/S
fh_.5_2.value = sum(fh_.5_2.result<0.05)/S

To calculate the fh_x_x.values, I mimicked code from a course I followed, in which power was calculated for a t-test:

set.seed(3) 
S <- 1000 
n <- 15 
sigma <- sqrt(5/3)
mu0 <- 1 
mu1 <- 1.75
result <- vector(length = S)
for (i in 1:S)
{
out <- rnorm(n, mu1, sigma)
onesamplettest <- t.test(out, mu=mu0)
result[i] <- onesamplettest$p.value
}
power <- sum(result<0.05)/S           #we reject at a = 0.05
power
[1] 0.534

But I don't understand whether I'm generating data under the null hypothesis or the alternative hypothesis. I have stated my hypotheses as

H0: h1(t) <= h2(t)

HA: h1(t) > h2(t)

but I'm really not sure if these hypotheses are the correct choice.

My question therefore is, what exactly am I calculating in the fh_x_x.value? And are my hypotheses correct?

Please explain why you are performing this power calculation. It sounds like you already have the data and the model to differentiate the treatment groups. If that's the case, then you already know if your study was adequately powered and there's not much point in post-hoc power analysis. Or are you setting up plans for a new study, based on pilot data? Please provide that information by editing the question, as comments are easy to overlook and can be deleted. — EdM, Nov 04 '22 at 17:19
Also, I don't see that you provide values for rate1,scale2 or shape2, or seeds set before the random sampling, so your code isn't reproducible. Please also add that information to the question and code. — EdM, Nov 04 '22 at 17:24
@EdM I have edited the question to include the complete code used for the simulation and added my goal for the power analysis — Evert Bronstring, Nov 06 '22 at 11:27
Are the simulated data based on your current estimates for survival under the 2 treatments? Will the results be used for regulatory submission (e.g., FDA approval)? Please edit the question to provide that information. Also, look at this article for guidance on power for study design and the correct significance tests to use, depending on the hypothesis you want to test. Usually you use a null hypothesis of equal outcomes to document superiority. — EdM, Nov 10 '22 at 19:01
I fixed a couple of small items in the code you provided so that it would work as intended, but I'm not sure exactly how you put together the df_simdata data frame. Please add that to the code. — EdM, Nov 10 '22 at 19:34

Monte Carlo Power calculation for Survival analysis in R

0 Answers0