I've read that post-hoc power analysis is useless when the result of a test is statistically insignificant, which I understand (post-hoc power analysis is just circular logic in this scenario). But what if the result is significant?

Isn't a significant but underpowered result really dubious, which would justify the systematic use of post-hoc power analysis after finding a significant result?

Maybe not systematic, but I'm thinking, for example, of the case of secondary data analysis, where you can't possibly run an a priori power analysis (sample size calculation), or the case of exploratory or pilot studies.

In these situations at least, doesn't post-hoc power analysis help avoid possible misinterpretation or overinterpretation of significant results? Am I missing something?

I ask because many texts seem to make a strong point that post-hoc power analysis is useless, but unless I missed something, they always discuss the case of insignificant results. So I'm wondering whether it applies to the case of significant results too.

Subsidiary question: are there situations where it wouldn't be justified to run post-hoc power analysis after finding a significant result?

If, on top of an answer, you have any good additional references on this issue, I'm interested. Thanks.

Giulia
  • If your result is significant, it's just going to suggest that you did have sufficient power, but in reality post-hoc power analysis is too noisy to be useful--whether your results are significant or not. – num_39 Mar 21 '23 at 06:26
  • Num_39: there are situations where you can have a significant result yet you have insufficient power to detect the effect you observed. I'm talking about this scenario. – Giulia Mar 21 '23 at 06:28
  • Here is an example of a paper talking about the existence of significant but underpowered results https://pubmed.ncbi.nlm.nih.gov/22023341/, but unfortunately it doesn't seem to really answer my question (and anyway I don't have access to the paper) – Giulia Mar 21 '23 at 06:42
  • I like the "Sample Size Justification" chapter in the book Improving Your Statistical Inferences by Daniël Lakens, which is freely available online. As I understand it, power is a tool for planning experiments (i.e. choosing a sample size); once you've planned and executed the experiment, it's too late to be thinking of what the power might be. So don't do it. – dipetkov Mar 21 '23 at 08:03
  • dipetkov: thanks for the ref! What about secondary data analysis, where we did not get a chance to determine the sample size ourselves? Can't we use post-hoc power to say that the result is inconclusive for our effect size of interest, despite the result being significant, and use that to call for larger subsequent studies (if the goal is to determine the true effect size)? – Giulia Mar 21 '23 at 08:06
  • 2
    @Giulia There are other quantities you can look at (precision/standard error of the estimates) that tell you much more directly how much you learned from the data and how convincing your result is. Why not focus on those, rather than power? – dipetkov Mar 21 '23 at 08:09
  • 1
    That being said, here is an answer by @mkt arguing that -- if carefully done -- post-hoc power calculation can be helpful: https://stats.stackexchange.com/a/586479/237901. Note that he isn't advocating to plug in the estimate you got from the study into the power/sample size formula. That's an exercise in circular logic, as explained in the answer here by Stephan Kolassa and many other CV posts. – dipetkov Mar 21 '23 at 08:20
  • 2
    @dipetkov do you mean things like confidence intervals? Good point, I didn't think about that. I guess I focus on power, because it may serve as a guide for sample size calculations in future studies on the subject of interest. But anyway i think I have to re-read the interesting conversation you had with Stephan Kolassa in the comments to get a good grasp of the whole issue. – Giulia Mar 21 '23 at 08:23
  • @StephanKolassa thanks for the link. I didn't see this question, and while the answers are useful and directly relevant to my own question, I'm not sure the two questions are exactly the same (the other question is in a context with a large sample size). But sure, dipetkov's answer on the other question definitely answers mine. I'm not sure, however, whether this other question would elicit additional answers relevant to my own specific question (e.g. what if we're talking about small sample sizes?). I'll let the moderators or more experienced users decide! – Giulia Mar 21 '23 at 08:58
  • I think that dipetkov's answer in the other thread also sufficiently addresses the small-sample-size aspect (in fact, they wrote it before the OP there mentioned the large sample in their question), so I think it's quite appropriate as a duplicate - especially if you think it answers your particular question. Let's see how others vote. – Stephan Kolassa Mar 21 '23 at 09:01
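
To make dipetkov's point about precision concrete, here is a minimal sketch (Python; the group sizes, means, and standard deviations are invented purely for illustration) of reporting a standard error and confidence interval for a significant result instead of a post-hoc power figure:

```python
# Sketch: report the precision of the estimate (SE, confidence interval)
# rather than a post-hoc power number. All numbers are made up for illustration.
import numpy as np
from scipy import stats

n1, n2 = 30, 30                   # group sizes
mean_diff = 5.5                   # observed difference in means
sd1, sd2 = 9.0, 10.0              # group standard deviations

se_diff = np.sqrt(sd1**2 / n1 + sd2**2 / n2)   # standard error of the difference
df = n1 + n2 - 2                               # simple equal-variance approximation
t_crit = stats.t.ppf(0.975, df)                # two-sided 95% critical value

ci_low = mean_diff - t_crit * se_diff
ci_high = mean_diff + t_crit * se_diff
print(f"difference = {mean_diff}, SE = {se_diff:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

A significant result with a wide interval like this one announces its own imprecision directly; no power calculation is needed to conclude that a larger follow-up study would be informative.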

1 Answer

If you have observed a significant effect, then your study was by definition powerful enough to detect this effect. Whether your study was "underpowered" (as in "power was lower than some specific threshold") or not does not enter this argument.

As a result, "observed power" is nothing more than a reformulation of the p value of your observed effect, and therefore it adds no information beyond the p value.
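
To illustrate (a minimal sketch in Python; the two-sided z-test with known variance and alpha = 0.05 are my simplifying assumptions, not anything specific to your study), observed power can be recovered from the p value alone:

```python
# Sketch: for a two-sided z-test, "observed power" (power evaluated at the
# observed effect size) is a deterministic, decreasing function of the p value.
# Assumptions for illustration only: known-variance z-test, alpha = 0.05.
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)   # two-sided critical value (about 1.96)

def observed_power(p_value):
    """Power at the observed effect, recovered from the p value alone."""
    z_obs = norm.ppf(1 - p_value / 2)              # |z| implied by the p value
    # Probability that a replication of the same size would reject H0,
    # if the observed effect were the true effect.
    return norm.cdf(z_obs - z_crit) + norm.cdf(-z_obs - z_crit)

for p in [0.001, 0.01, 0.049, 0.05, 0.2, 0.5]:
    print(f"p = {p:5.3f}  ->  observed power = {observed_power(p):.3f}")
```

A p value of exactly 0.05 maps to an observed power of about 0.5, and smaller p values always map to higher observed power, so the observed power tells you nothing the p value did not already tell you. This is essentially the relationship plotted in Figure 1 of the reference below.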

The best exposition of this is in my opinion Hoenig & Heisey (2001, The American Statistician), "The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis". I particularly recommend their Figure 1.

Stephan Kolassa
  • I thought that "underpowered" means "not enough power to detect the true effect size", which is surely the goal of the study. Rather than "powered to detect the estimate that was observed." – dipetkov Mar 21 '23 at 07:16
  • 1
    @dipetkov: I did not define "underpowered" as "not powerful to detect the effect size that was observed", per my first sentence. In general, I would not say that "underpowered = not enough power to detect the true effect size" is useful, because we don't know the true effect size! We regularly get questions about what effect size to assume in sample size analysis here, where my recommendation always is to use an effect size "you would be sorry to miss" in the calculation. I would say that this is a more useful way to think about "under/overpowered". ... – Stephan Kolassa Mar 21 '23 at 07:24
  • 1
    ... and of course we could ex post compare our power not to the observed effect size, but to one "we would be sorry to miss", whether that effect was stronger or weaker than the effect we actually did observe. I would say that this is the only way post hoc power would make sense. – Stephan Kolassa Mar 21 '23 at 07:25
  • 1
    What if "an effect size you would be sorry to miss" is much bigger than the true effect size? I don't see how the a-priori power calculation would be of much help if we know absolutely nothing (or ignore what we know) -- either a reasonable estimate of the variance or a reasonable guess for the range of the true effect size (if not the true effect size exactly). – dipetkov Mar 21 '23 at 07:42
  • 2
    @dipetkov: you make two different and very good points. The second first: of course our decision to even consider a study should be informed by our knowledge of possible effect sizes (consider this Bayesian). Which means that if the true effect is much smaller than the one we are "sorry to miss", we were far off in our priors. – Stephan Kolassa Mar 21 '23 at 07:47
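
To make the "effect size you would be sorry to miss" idea concrete, here is a minimal sketch (Python; the two-sample t-test setting, n = 25 per group, and the smallest effect of interest d = 0.4 are made-up numbers, not recommendations) of evaluating power at the achieved sample size against a pre-specified effect rather than the observed estimate:

```python
# Sketch: "post hoc" power evaluated at a pre-specified smallest effect size
# of interest (SESOI), not at the estimate that happened to come out of the data.
# All numbers below are invented purely for illustration.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

n_per_group = 25   # the sample size the (secondary) data happened to have
sesoi = 0.4        # smallest standardized effect we would be sorry to miss

power_at_sesoi = analysis.power(effect_size=sesoi, nobs1=n_per_group,
                                alpha=0.05, ratio=1.0, alternative='two-sided')
print(f"Power to detect d = {sesoi} with n = {n_per_group} per group: {power_at_sesoi:.2f}")

# The same machinery answers the forward-looking question for the next study:
n_needed = analysis.solve_power(effect_size=sesoi, power=0.80, alpha=0.05,
                                ratio=1.0, alternative='two-sided')
print(f"n per group needed for 80% power at d = {sesoi}: {n_needed:.0f}")
```

Because the effect size here is fixed in advance rather than taken from the data, this calculation is not circular; it simply asks whether the study, as it turned out, could have been expected to detect an effect of the size you care about.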