for a project I have to analyze two survival analysis treatment groups using R. I want to calculate the power and type I error rate of a test with Monte Carlo simulation to set up a phase 3 medical study, but I am struggling with the exact calculations and hypotheses.
Firstly, I fit models on my data which includes the usual survival analyis categories: survival time, a right-censoring indicator and the treatment (either a new experimental medicine or the normal one). The treatment 1 Kaplan Meier curve is closest to following an exponential distribution and the treatment 2 KM curve is closest to following a loglogistic curve. I came to these conclusions by calculating the AIC for lognormal, Weibull, exponential and loglogistic fits and then checking the residuals.
To test whether treatment 2 performs better, I used a Fleming-Harrington test in my Monte Carlo analysis with a few different weights. After this however, is where I run into problems.
library(survival)
library(flexsurv)
set.seed(141752)
S = 10000
rate1 = 0.00806
dropout_rate = 0.05
follow_up = 122
n1 = 150 - dropout_rate * 150
n2 = 150 - dropout_rate * 150
shape2 = 1.218
scale2 = 60.805
treatment_ratio =
vec_sim.surv1 = c()
vec_sim.surv2 = c()
sim.csurv1 = c() #censored data
sim.csurv2 = c()
cens_indic1 = c() #censoring indicators
cens_indic2 = c()
list_simdata1 = list()
list_simdata2 = list()
n_deaths1 = c()
n_deaths2 = c()
trt1_vec = rep(1, n1)
trt2_vec = rep(2, n2)
fh_0_1.result = c()
fh_1_1.result = c()
fh_.5_.5.result = c()
fh_.5_2.result = c()
for (i in 1:S){
sim.surv1 = rexp(n1, rate1) #generate random exponential data for group 1 with the found parameters
sim.csurv1 = sim.surv1
#censor treatment group 1 data at 122 days (4 months)
for (j in 1:length(sim.csurv1)){
if (sim.csurv1[j] > follow_up){
cens_indic1[j] = 0
sim.csurv1[j] = follow_up
}
else{
cens_indic1[j] = 1
}
}
sim.surv2 = rllogis(n2, shape2, scale2) #generate random log-logistic data for group 2 with the found parameters
sim.csurv2 = sim.surv2
#censor treatment group 2 data at 122 days (4 months)
for (q in 1:length(sim.csurv2)){
if (sim.csurv2[q] > follow_up){
cens_indic2[q] = 0
sim.csurv2[q] = follow_up
}
else{
cens_indic2[q] = 1
}
}
}
#Perform Fleming-Harrington test with different weights
fh_0_1 = logrank.test(time=df_simdata$time, event=df_simdata$censoring, group=df_simdata$treatment, alternative='greater', rho=0, gamma=1)
fh_1_1 = logrank.test(time=df_simdata$time, event=df_simdata$censoring, group=df_simdata$treatment, alternative='greater', rho=1, gamma=1)
fh_.5_.5 = logrank.test(time=df_simdata$time, event=df_simdata$censoring, group=df_simdata$treatment, alternative='greater', rho=0.5, gamma=0.5)
fh_.5_2 = logrank.test(time=df_simdata$time, event=df_simdata$censoring, group=df_simdata$treatment, alternative='greater', rho=0.5, gamma=2)
#append the p-value for each test to a vector
fh_0_1.result[i] = fh_0_1$test$p
fh_1_1.result[i] = fh_1_1$test$p
fh_.5_.5.result[i] = fh_.5_.5$test$p
fh_.5_2.result[i] = fh_.5_2$test$p
}
fh_0_1.value = sum(fh_0_1.result<0.05)/S
fh_1_1.value = sum(fh_1_1.result<0.05)/S
fh_.5_.5.value = sum(fh_.5_.5.result<0.05)/S
fh_.5_2.value = sum(fh_.5_2.result<0.05)/S
To calculate the fh_x_x.values, I mimicked code from a course I followed, in which power was calculated for a t-test:
set.seed(3)
S <- 1000
n <- 15
sigma <- sqrt(5/3)
mu0 <- 1
mu1 <- 1.75
result <- vector(length = S)
for (i in 1:S)
{
out <- rnorm(n, mu1, sigma)
onesamplettest <- t.test(out, mu=mu0)
result[i] <- onesamplettest$p.value
}
power <- sum(result<0.05)/S #we reject at a = 0.05
power
[1] 0.534
But I don't understand whether I'm generating data under the null hypothesis or the alternative hypothesis. I have stated my hypotheses as
H0: h1(t) <= h2(t)
HA: h1(t) > h2(t)
but I'm really not sure if these hypotheses are the correct choice.
My question therefore is, what exactly am I calculating in the fh_x_x.value? And are my hypotheses correct?
rate1,scale2orshape2, or seeds set before the random sampling, so your code isn't reproducible. Please also add that information to the question and code. – EdM Nov 04 '22 at 17:24df_simdatadata frame. Please add that to the code. – EdM Nov 10 '22 at 19:34