I'm assessing a paper whose authors state that they recruited 3000 patients because that was the number their power calculation suggested was necessary to detect a 5% difference at 85% power.
They actually found an effect size of only 2% (95% CI 0.2 to 5.6), but report that this was statistically significant (p=0.045).
I'm wondering: if they had set out to detect a difference of 2%, they'd have needed far more than 3000 patients, which would make this study of 3000 patients underpowered for the effect they actually found (this may be a false assumption, so please correct me). I appreciate that underpowered studies can still produce statistically significant results, but there is a greater risk of the effect size being overstated and the p value being artificially low (could this have happened in my paper?). Or is this an invalid line of thinking because the 95% CI includes the predicted 5% difference?
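To check the first part of that, I tried a rough sample-size calculation in Python. The paper's baseline event rate isn't something I have to hand, so the 40% control-arm rate below is a made-up assumption (as is treating the outcome as a simple two-proportion comparison). The absolute numbers will shift with those assumptions, and I wouldn't expect to reproduce the authors' exact 3000, but the roughly (5/2)² ≈ 6× ratio between the two scenarios is the point:

```python
# Rough sanity check, not the authors' actual calculation. The 40% baseline
# rate is a pure assumption; the ratio between the two results is the point.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p_control = 0.40  # assumed (hypothetical) control-arm event rate

for diff in (0.05, 0.02):  # 5-point vs 2-point difference
    h = proportion_effectsize(p_control + diff, p_control)  # Cohen's h
    n_per_arm = NormalIndPower().solve_power(effect_size=h, alpha=0.05,
                                             power=0.85,
                                             alternative='two-sided')
    print(f"{diff:.0%} difference: ~{2 * n_per_arm:,.0f} patients total")
```

Under these assumptions a 5-point difference needs a study in the low thousands, while a 2-point difference needs several times that, i.e. far more than 3000 patients.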
I know that power is calculated before a study recruits patients, so doing a post-hoc power analysis would be invalid. I'm approaching this from a 'there is a risk this study is actually underpowered' stance rather than a 'this study is underpowered and I can prove it' stance.
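To convince myself that the 'overstated effect' risk above is real rather than something I've imagined, I also ran a toy simulation. Everything in it is invented for illustration: a 40% baseline rate, a true 2-point effect, and 1500 patients per arm:

```python
# Toy simulation of the "winner's curse": if the true effect is only 2
# points, a 3000-patient study is underpowered, and the runs that happen to
# reach p < 0.05 overestimate the effect on average. All numbers are made up.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, p0, p1, sims = 1500, 0.40, 0.42, 20_000  # per-arm n; assumed true rates

x0 = rng.binomial(n, p0, sims) / n          # observed control-arm rates
x1 = rng.binomial(n, p1, sims) / n          # observed treatment-arm rates
diff = x1 - x0
se = np.sqrt(x0 * (1 - x0) / n + x1 * (1 - x1) / n)
pvals = 2 * norm.sf(np.abs(diff / se))      # two-sided z-test on the difference

sig = pvals < 0.05
print(f"simulated power:                  {sig.mean():.0%}")
print(f"mean estimate, all runs:          {diff.mean():.1%}")
print(f"mean estimate, significant runs:  {diff[sig].mean():.1%}")
```

In that setup only about a fifth of the simulated trials reach significance, and the ones that do overestimate the true 2% effect on average, which is exactly the pattern I'm worried about.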
I'm also unsure whether this is a contributory issue, but I think the authors' use of Robust Standard Errors (the Huber-White estimator) suggests that they had heteroskedastic data. I'm inferring from this that between-cluster variability was higher than within-cluster variability. If so, the intracluster correlation coefficient, ρ = between-cluster variance / (between-cluster variance + within-cluster variance), would be high, which would reduce the effective sample size of the study and further contribute to the qualitative assessment that it risks being underpowered.
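To see how much this could matter, here's a back-of-envelope design-effect calculation. The paper doesn't report an ICC or an average cluster size (at least not that I can find), so the values below are pure guesses to show the mechanics:

```python
# Effective sample size under clustering: the design effect is
# 1 + (m - 1) * rho, where m is the average cluster size and rho is the
# intracluster correlation coefficient. Both m and rho below are guesses.
n_total = 3000
for m, rho in [(30, 0.01), (30, 0.05), (100, 0.05)]:
    deff = 1 + (m - 1) * rho      # design effect
    n_eff = n_total / deff        # effective sample size
    print(f"cluster size {m:>3}, ICC {rho:.2f}: "
          f"design effect {deff:.2f} -> effective n ~ {n_eff:,.0f}")
```

If the data really are clustered, even a modest ICC can cut the effective sample size well below 3000 when clusters are large, which is why I wonder whether the headline n overstates the information in the data.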
Am I way off the mark? I'm a paramedic, not a statistician, so I'd appreciate it if someone could explain in some detail but summarise in lay terms. Thanks.