
I'm assessing a paper whose authors state that they recruited 3000 patients because that was the number a power calculation suggested was necessary to detect a 5% difference at 85% power.

They actually found an effect size of only 2% (95% CI 0.2–5.6), but report that this was statistically significant (p = 0.045).

I'm wondering: if they had set out to detect a difference of 2%, they'd have needed far more than 3000 patients, which would make this study of 3000 patients underpowered for the effect actually observed (this may be a false assumption; please correct me). I appreciate that underpowered studies can still find statistically significant effects, but there is a greater risk of the effect size being overstated and the p-value being artificially low (could this have happened in my paper?). Or is this an invalid line of thinking because the 95% CI includes the predicted 5% difference?
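To make that worry concrete, here is a rough sample-size sketch. The paper doesn't give the baseline event rate, so the 20% control rate below is a hypothetical illustration of the arithmetic, not the authors' actual calculation:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Hypothetical 20% control-group event rate; the paper's actual rate isn't stated.
es_5pt = proportion_effectsize(0.25, 0.20)  # 5 percentage-point difference
es_2pt = proportion_effectsize(0.22, 0.20)  # 2 percentage-point difference

power = NormalIndPower()
n_5pt = power.solve_power(effect_size=es_5pt, power=0.85, alpha=0.05, ratio=1)
n_2pt = power.solve_power(effect_size=es_2pt, power=0.85, alpha=0.05, ratio=1)

# With these made-up rates: roughly 620 per arm for a 5-point difference
# versus roughly 3,200 per arm for a 2-point difference.
print(f"per-arm n to detect 5 points: {n_5pt:.0f}")
print(f"per-arm n to detect 2 points: {n_2pt:.0f}")
```

Whatever the true rates, the required sample grows roughly with the inverse square of the target difference, so a 2% target needs several times the patients a 5% target does.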

I know that power is calculated prior to a study recruiting patients, so doing a post-hoc power analysis would be invalid. I'm approaching this from a 'there is a risk this study is actually underpowered' stance, rather than a 'this study is underpowered and I can prove it' stance.

I'm also unsure whether this is a contributory issue, but the authors' use of robust standard errors (the Huber–White estimator) suggests to me that they had heteroskedastic data. I'm inferring from this that between-cluster variability was higher than within-cluster variability. If so, the intracluster correlation coefficient (ρ) would be high, which would reduce the effective sample size of the paper and further contribute to the qualitative assessment that this study risks being underpowered.
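For context, the effective-sample-size calculation I have in mind is the standard design-effect formula, n_eff = n / (1 + (m - 1)ρ). The cluster size and ICC below are made-up values, since the paper doesn't report them:

```python
# Effective sample size under clustering: n_eff = n / (1 + (m - 1) * rho)
# where n = total patients, m = average cluster size, rho = the ICC.
# The m and rho below are hypothetical; the paper doesn't report either.
n = 3000
m = 30       # assumed average cluster size
rho = 0.05   # assumed intracluster correlation coefficient

design_effect = 1 + (m - 1) * rho   # 2.45 with these values
n_effective = n / design_effect     # ~1224 effectively independent patients

print(f"design effect: {design_effect:.2f}, effective n: {n_effective:.0f}")
```

Even a modest ICC can roughly halve the effective sample size here, which is the mechanism behind my underpowering worry.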

Am I way off the mark? I'm a paramedic, not a statistician, so I would appreciate it if someone could explain in some detail but summarise in lay terms. Thanks.

  • See a similar recent question (that was closed). Very briefly, no: powering a priori means the study has $1-\beta$ probability of rejecting the null under the alternative, and probability $\alpha$ under the null - these frequentist properties hold. Observing a lower effect means you might've been 'lucky' (a type I assertion, with probability $\alpha$) or 'unlucky' (the alternative holds but the sample estimate turned out lower) - you can't tell from one observation. – PBulls Dec 02 '23 at 07:37
  • I think a better question would be 'is the effect really 5%?', which is more challenging to address and depends on your statistical philosophy. Given these data the answer is 'maybe not', but it's also quite probably not zero. It then becomes a clinical question whether 5% or 3% or 2% is worth making the switch - you can statistically quantify a difference down to 0.000..% given enough sample size, but at some point this loses touch with clinical reality. – PBulls Dec 02 '23 at 08:01
  • Thanks for your input, guys. I think I'm asking a different question to the one you linked to (or I just don't understand it). I'm asking whether we can infer that there is a risk the study is underpowered, whereas the question you linked to 'confirms' the study is underpowered and then asks whether that matters for a statistically significant result. – user356816 Dec 02 '23 at 11:38
