
I am trying to understand how much power we lose by selecting a categorical exposure versus a continuous exposure in a generalised linear model (GLM).

I didn't have problems calculating power using the effect size from the original dataset. We did that using the simr package.

For model 1 (continuous exposure):

model1b       <- glm(y ~ x, family=binomial(link='logit'), data=data)
power_model1b <- powerSim(model1b, nsim=1000, seed=30, test=fixed("x"))

For model 2 (categorical exposure):

model2b              <- glm(y ~ categorical_x, family=binomial, data=data)
power_model2b_low    <- powerSim(model2b, nsim=1000, seed=30, 
                                 test=fixed("categorical_xlow"))
power_model2b_medium <- powerSim(model2b, nsim=1000, seed=30, 
                                 test=fixed("categorical_xmedium"))

We saw that power decreases when we use the categorical measure. However, I have read that using the effect size estimated from the data already collected can bias the power calculation, and that a solution is to set a fixed effect size and then make the comparison.

I am confused about how to address this, because model 1 uses a continuous exposure and model 2 a categorical one. Would it be okay to simply change the coefficients with simr's coef option, as below, and then rerun the calculations to compare power between the models?

model_fixed1b <- model1b
coef(model_fixed1b)["x"] <- -0.22                    # fixed effect size for the continuous exposure
model_fixed2b <- model2b
coef(model_fixed2b)["categorical_xlow"]    <- -0.22  # fixed effect size for the "low" dummy
coef(model_fixed2b)["categorical_xmedium"] <- -0.22  # fixed effect size for the "medium" dummy
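
And then, if that is a valid approach, I would rerun the power simulations on the modified models to compare, something like:

power_fixed1b        <- powerSim(model_fixed1b, nsim=1000, seed=30, test=fixed("x"))
power_fixed2b_low    <- powerSim(model_fixed2b, nsim=1000, seed=30,
                                 test=fixed("categorical_xlow"))
power_fixed2b_medium <- powerSim(model_fixed2b, nsim=1000, seed=30,
                                 test=fixed("categorical_xmedium"))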
Andrea

1 Answer


Don't use a canned simulation function to conduct your power analysis; just write your own simulation from scratch. Your situation isn't that difficult. There are various resources on CV to help you simulate a logistic regression model (in particular, see: Simulation of logistic regression power analysis - designed experiments).

The background information in your question is rather sparse, but I gather you think there is a linear (in log odds) relationship between a continuous $X$ and a binary $Y$. You want to plan a subsequent study based on the same relationship found in your data, but wonder what would happen if you dichotomized $X$. So just simulate that, but in one case leave $X$ as is, and in the other case dichotomize it. Then compare the observed power. Here is a simple example (coded in R):

set.seed(7309)                     # this makes the example exactly reproducible
x  = runif(30, min=0, max=2)       # these are our original x-values
lo = 0 + 1*x                       # I simulate the true underlying log odds
p  = exp(lo)/(1+exp(lo))           # I convert them to probabilities  
y  = rbinom(30, size=1, prob=p)    # I generate the observed, binomial y-values
m  = glm(y~x, family=binomial)     # I'm pretending this is your original model
B0 = coef(m)[1]                    # here I extract the fitted coefficients  
B1 = coef(m)[2]
lo = B0 + B1*x                     # which I use to create a new set of log odds...
p  = exp(lo)/(1+exp(lo))           # ... and probabilities

x.cat = ifelse(x>median(x), 1, 0) # here I dichotomize your x-values

p.vect.cont = vector(length=1000)  # these will store the p-values
p.vect.cat  = vector(length=1000)
for(i in 1:1000){
  y      = rbinom(30, size=1, prob=p)            # I simulate new observed y's each time,
  m.cont = glm(y~x,     family=binomial)         # refit the models,
  m.cat  = glm(y~x.cat, family=binomial)
  p.vect.cont[i] = coef(summary(m.cont))[2,4]    # and get the p-values
  p.vect.cat[i]  = coef(summary(m.cat))[2,4]
}
mean(p.vect.cont<.05)  # [1] 0.389               # these are empirical power estimates
mean(p.vect.cat<.05)   # [1] 0.21
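
If you'd rather mirror the three-level exposure in your question instead of a median split, you can do the same thing with a cut into categories. The tercile cut points below are just an illustration (use your actual category boundaries), and since a multi-level factor has no single slope, I test the factor as a whole with a likelihood ratio test:

x.cat3 = cut(x, breaks=quantile(x, probs=c(0, 1/3, 2/3, 1)),  # cut points assumed; use your own
             labels=c("low","medium","high"), include.lowest=TRUE)
p.vect.cat3 = vector(length=1000)
for(i in 1:1000){
  y      = rbinom(30, size=1, prob=p)
  m.cat3 = glm(y~x.cat3, family=binomial)
  p.vect.cat3[i] = anova(m.cat3, test="LRT")[2, "Pr(>Chi)"]   # overall test of the factor
}
mean(p.vect.cat3<.05)              # empirical power for the 3-level version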