
I am trying to conduct a post-hoc power analysis based on the generalised linear model I used in my study. I'm trying to determine what the power is so that I can use it to compare against another study with a smaller sample size. I am using the pwr.f2.test() function in the pwr package, as follows:

pwr.f2.test(u = numerator, v = denominator, f2 = effect size, sig.level = 0.05, power = NULL)

However, I am struggling to work out whether I have used the correct numbers for the calculation: as I currently have it, I get a power of 1, which I understand is highly unlikely to be true.

Here is my model and output:

Call:
glm(formula = cbind(germination, total - germination) ~ scarification * 
    stratification, family = binomial, data = df)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-4.8183  -2.1860  -0.8318   1.1555   5.5181  

Coefficients:
                             Estimate Std. Error z value Pr(>|z|)    
(Intercept)                 -3.81830    0.33960 -11.244  < 2e-16 ***
scarificationEthanol         1.61499    0.39428   4.096  4.2e-05 ***
scarificationWater           0.27348    0.44337   0.617   0.5373    
scarificationScalpel         3.07329    0.37289   8.242  < 2e-16 ***
scarificationSandpaper       2.16413    0.38222   5.662  1.5e-08 ***
time                         0.47223    0.05267   8.965  < 2e-16 ***
scarificationEthanol:time   -0.05626    0.06656  -0.845   0.3980    
scarificationWater:time      0.15325    0.07470   2.051   0.0402 *  
scarificationScalpel:time   -0.11316    0.06727  -1.682   0.0925 .  
scarificationSandpaper:time -0.11959    0.06490  -1.843   0.0654 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1285.19  on 59  degrees of freedom
Residual deviance:  408.11  on 50  degrees of freedom
AIC: 599.94

Number of Fisher Scoring iterations: 5

Here are the numbers I used:

#pseudo-R2: 1 minus (residual deviance / null deviance) from the generalised linear model
r2 <- 1 - (408.11/1285.19)
f2 <- r2/(1 - r2)
#u = no. of independent variables minus 1 (5 scarification and 4 stratification minus 1 = 8)
#v = number of observations minus no. of independent variables (60 - 9 = 51); u + v = 59 (degrees of freedom of the model)
pwr.f2.test(u = 8, v = 51, f2 = f2, power = NULL)

And this is the output

Multiple regression power calculation 

          u = 8
          v = 51
         f2 = 2.149126
  sig.level = 0.05
      power = 1

I'm hoping that someone can help me work out where I might have gone wrong, or, if this is correct, how to interpret this very high power level.

Some context for the study in case it is helpful: I measured the proportion of seed germination across 20 treatment combinations: 5 levels of scarification × 4 levels of time. In the model I used time as a continuous variable. Each treatment combination used 40 seeds from each of 3 sources (120 seeds per treatment in total, giving 3 proportion values for the model).

Stacey
  • You have conducted a post-hoc power analysis (based on the observed effect size), which cannot be interpreted like the results of an a priori power analysis. See here. In a post-hoc power analysis, a low p-value will always yield very high power, and such analyses are generally not considered informative. – Sointu Jul 15 '23 at 06:42
  • @Sointu thank you for your response. I'm aware that it might not be terribly useful. Does my calculation look correct, though? Do you know of any other post-hoc test I could do instead to see what my power would have been, for instance if I had used an estimated effect size prior to my experiment? – Stacey Jul 16 '23 at 01:49
  • @Stacey The correctness of the computations is irrelevant. Even in the event they were correct, the approach is fundamentally flawed and provides no more evidence than the p value for the associated test would provide. See this answer of mine for more. – Demetri Pananos Jul 16 '23 at 14:07

1 Answer


To me it seems that the reasonable way to do this would be to compare the two studies based on their respective sample sizes and a reasonably chosen effect size estimate, not on any parameters from your actual model.

Let's imagine you had no data yet, were in the process of planning your study, and wanted to choose between two sample sizes, say 100 and 150, knowing the power for each. You would then choose the effect size to be tested: the smallest effect size that is meaningful to you. In the following, I use Cohen's "medium" effect size for f2 (0.15), which corresponds to an R-squared of about 0.13. Also, it seems you have 10 parameters in the model, so u would be 9 (the intercept counts as a "predictor" in this context).
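To see where that 0.13 comes from, you can invert the same f2 = r2/(1 - r2) conversion used in your own code and back out the R-squared implied by f2 = 0.15:

#R2 implied by a given f2: R2 = f2/(1 + f2)
f2 <- 0.15
f2/(1 + f2) #about 0.1304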

library(pwr)

pwr.f2.test(u=9, v=90, f2=0.15, sig.level = 0.05) #v=90 because n=100 and n=v+u+1

Multiple regression power calculation

          u = 9
          v = 90
         f2 = 0.15
  sig.level = 0.05
      power = 0.7318872

#power is below the conventional .80 level
#Now the n=150 calculation

pwr.f2.test(u=9, v=140, f2=0.15, sig.level = 0.05)

Multiple regression power calculation

          u = 9
          v = 140
         f2 = 0.15
  sig.level = 0.05
      power = 0.9217457

So, with n=150, you have good power to detect a medium-sized R2 deviation from zero with 10 parameters, and you can say that n=150 is a better choice of sample size than n=100 (a somewhat smaller sample than 150 would probably also have adequate power, as the sketch below suggests).
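To check that parenthetical claim, you can leave v unspecified and let pwr solve for the denominator degrees of freedom needed for 80% power (just a quick sketch; the minimal n is then v + u + 1):

pwr.f2.test(u = 9, f2 = 0.15, sig.level = 0.05, power = 0.80)
#the returned v is the denominator df needed for 80% power; minimal n = v + u + 1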

You could substitute the sample sizes of the studies you want to compare into the above calculations.
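For example, with placeholder sample sizes (I don't know the actual n of the other study, so these are purely illustrative), the substitution would look like this:

#hypothetical sample sizes for the two studies being compared - replace with the real ones
n1 <- 60
n2 <- 40
pwr.f2.test(u = 9, v = n1 - 10, f2 = 0.15, sig.level = 0.05) #v = n - u - 1
pwr.f2.test(u = 9, v = n2 - 10, f2 = 0.15, sig.level = 0.05)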

Of course, if you have an a priori reason (one you had before you saw your model's R2) to believe that your predictors would together have very high explanatory power, you could use a large effect size estimate in this type of calculation. However, you shouldn't put an observed effect size estimate into a power calculation, because such estimates are noisy and the results will be uninformative, and you shouldn't let the observed R2 affect your comparison calculations either.

Edited to add an answer to your original question: your calculations seem technically correct to me (except that I think u = 9, not 8). It's just that the effect size you give to pwr is absolutely massive, so you will get a power estimate of ~1 with almost any sample size (even with n = 20 you'd get power of about .90; see the check below), so even a priori power comparisons involving such an effect size would be meaningless in practice.
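If you want to verify that n = 20 figure yourself, a one-liner with your observed effect size reproduces it:

pwr.f2.test(u = 9, v = 20 - 10, f2 = 2.149126, sig.level = 0.05) #v = n - u - 1 = 10; power comes out around .90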

Sointu