I have observations collected over a time period of 12 months. For each month, I recorded the total number of observations (n_per_month), the number of "success" events (n_success) and then computed the success rate as n_success/n_per_month (success_rate). Here it is an example of the data:
data_df = data.frame(month = 5:11,
n_per_month = c(17, 21, 15, 8, 14, 14, 1),
n_success = c(13, 13, 6, 6, 10, 14, 1),
success_rate = c(0.7647059, 0.6190476, 0.4, 0.75, 0.7142857,
1, 1))
Then I fitted a logistic regression using glm R function:
model_obs <- glm(success_rate ~ month, data = data_df,
family = binomial(), weights=n_per_month)
And add the corresponding fitted values in the original dataframe:
data_df$fitted = as.numeric(predict.glm(model_obs,
type = "response"))
Following a past posted question: Simulation of logistic regression power analysis - designed experiments, I modified that R code to perform post-hoc power analysis. That is, I wanted to know how reliable is the success rate I computed from the observed data, given the sample size of each month. For this example code, I repeat the analysis 10 times (repetitions) and record the results of the power analysis in significant matrix. In this post-hoc power analysis I assume that the rates estimated from the observed data are true.
repetitions = 10
significant = matrix(nrow=repetitions, ncol=3)
for(i in 1:repetitions){
final_rep = NULL
for(j in 1:nrow(data_df)){
resp_val = rbinom(n=data_df$n_per_month[j], size=1,
prob=data_df$fitted[j])
tmp_df = data.frame(month=rep(data_df$month[j],
length(resp_val)), events=resp_val)
final_rep = rbind(final_rep, tmp_df)
}
n_success_df = aggregate(events ~ month, data = final_rep, sum)
n_month_df = aggregate(events ~ month, data = final_rep, length)
rep_df = data.frame(month=n_success_df$month,
n_per_month=n_month_df$events,
n_success=n_success_df$events)
rep_df$success_rate = rep_df$n_success/rep_df$n_per_month
model <- glm(success_rate ~ month, data = rep_df,
family = binomial(), weights=n_per_month)
significant[i, 1] = (summary(model)$coefficients[2, 4]<.05)
significant[i, 2] = sum(significant[i, 1])
modelDev = model$null.deviance-model$deviance
significant[i, 3] = (1-pchisq(modelDev, 1))<.05
rm(temp_df, count_df, rep_df, model, modelDev)
}
sum(significant[, 2]==1)/repetitions
Would you agree with this procedure?