Is there a reason to report classical ANOVA F values (Fisher's F), instead of Welch's F-tests?

I know Welch's F test is corrected, so that it's robust against heterogeneity of variance. This leaves me wondering: Is there any reason why studies don't regularly only report Welch's F?

foggy

3 Answers

Several points:

  1. For the significance-level calculations to be valid, the assumption that matters is the one about the situation when $H_0$ is true. The distributions could have the same (or at least very similar) spread under $H_0$ yet differ in spread as the means move apart. For example, the spread might increase with the mean, which is quite plausible with a non-negative response. If $H_0$ is true, the group factor might be expected to do nothing at all (a completely ineffectual treatment, perhaps), so that all groups would effectively be a single population.

    Which is to say, the incidental appearance of the sample data is not necessarily relevant to the assumption required to obtain correct or almost-correct significance levels and p-values. Those relate to the case where $H_0$ holds, which will usually not be the case in the observed data, even if you fail to reject the null.

  2. The significance level in one-way ANOVA is highly robust to the variances being different under $H_0$ in the case where the sample sizes are equal.

  3. Those things aside, there's usually very little lost when applying the Welch-Satterthwaite approximation even when the population variances are exactly equal. Which is to say, it is usually a perfectly reasonable default if you are not confident whether assuming equal variances under the null would make sense in your situation.
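
Points 2 and 3 can be checked with a small simulation. The following is only a sketch under stated assumptions: normal data, three groups of 10, and arbitrarily chosen standard deviations, none of which come from the discussion above. The helper names (`fisher_f`, `welch_f`, `f_sf`, `type1_rates`) are hypothetical, and the F tail probability is computed by hand with the standard library (in practice something like `scipy.stats.f.sf` would do the same job). All group means are equal, so every rejection is a type I error.

```python
import math
import random


def _betacf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for aa in (
            m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def f_sf(f, df1, df2):
    """Upper tail P(F > f) of an F(df1, df2) distribution (df may be fractional)."""
    if f <= 0:
        return 1.0
    a, b, x = df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f)
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def _summary(g):
    """Return (n, mean, unbiased variance) for one group."""
    n = len(g)
    m = math.fsum(g) / n
    return n, m, math.fsum((y - m) ** 2 for y in g) / (n - 1)


def fisher_f(groups):
    """Classical one-way ANOVA: returns (F, df1, df2) using the pooled variance."""
    stats = [_summary(g) for g in groups]
    k, n_total = len(stats), sum(n for n, _, _ in stats)
    grand = sum(n * m for n, m, _ in stats) / n_total
    ms_between = sum(n * (m - grand) ** 2 for n, m, _ in stats) / (k - 1)
    ms_within = sum((n - 1) * v for n, _, v in stats) / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k


def welch_f(groups):
    """Welch's one-way test: returns (F, df1, approximate df2)."""
    stats = [_summary(g) for g in groups]
    k = len(stats)
    w = [n / v for n, _, v in stats]          # precision weights n_i / s_i^2
    total = sum(w)
    wmean = sum(wi * m for wi, (_, m, _) in zip(w, stats)) / total
    lam = sum((1 - wi / total) ** 2 / (n - 1) for wi, (n, _, _) in zip(w, stats))
    num = sum(wi * (m - wmean) ** 2 for wi, (_, m, _) in zip(w, stats)) / (k - 1)
    den = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return num / den, k - 1, (k ** 2 - 1) / (3 * lam)


def type1_rates(sizes, sds, reps=4000, alpha=0.05, seed=0):
    """Rejection rates (Fisher, Welch) when all group means are equal."""
    rng = random.Random(seed)
    hits_fisher = hits_welch = 0
    for _ in range(reps):
        groups = [[rng.gauss(0.0, sd) for _ in range(n)]
                  for n, sd in zip(sizes, sds)]
        hits_fisher += f_sf(*fisher_f(groups)) < alpha
        hits_welch += f_sf(*welch_f(groups)) < alpha
    return hits_fisher / reps, hits_welch / reps


if __name__ == "__main__":
    # Illustrative settings only: equal group sizes, with and without equal SDs.
    for label, sds in [("equal n, unequal SDs", (1, 2, 3)),
                       ("equal n, equal SDs", (2, 2, 2))]:
        fisher, welch = type1_rates((10, 10, 10), sds)
        print(f"{label}: Fisher {fisher:.3f}, Welch {welch:.3f} (nominal 0.050)")
```

Both rejection rates should land near the nominal 0.05 in these two settings, which is what points 2 and 3 suggest. Changing `sizes` to unequal values is a quick way to see when the two tests start to behave differently.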

In summary: sometimes it can make sense to use the equal-variance version even when it might look like you should not (not that I am advocating choosing your test on the basis of what you happen to see in the data you want to use the test on). On the other hand, there's usually little to lose by using Welch-Satterthwaite ANOVA as a matter of course.

Glen_b
  • Thanks. However, if I get you right, in point 1) you're saying it's not often classical ANOVA is worse than Welch's F, which doesn't seem like an argument to use it to me. Point 3) sounds like precisely what I'm interested in - is there really any information lost, when we only report Welch's F? I.e. could theoretically Welch's F come out significant and Fisher's F non-significant, so that their contrast would indicate something meaningful? – foggy Jan 18 '24 at 13:45
  • I don't quite agree with your characterization of what I said in 1. Something that happens less often than many people think is not necessarily all that rare. It's a matter of understanding the circumstances where it works just fine, and thinking about whether that might apply. In some application areas it will be happening a lot and in others, perhaps not so much. 2. Be careful not to conflate performance across data sets (any discussion of significance level or power) with something specific to a single data set (one being significant, the other not). That can happen both ways. ... ctd – Glen_b Jan 18 '24 at 16:19
  • ... I did give both specific and general advice. One downside of the Welch that I didn't make explicit is that it's no longer exact even when the assumptions all hold exactly. Whether you think that's an issue is another matter. I generally don't, but people that get very antsy about not exceeding nominal significance levels (I see it being a very strong concern in specific circumstances and then almost completely ignored in others) might need to worry about it in this situation if they're going to behave consistently. I'll consider edits if I can see how to be both clear & fairly accurate – Glen_b Jan 18 '24 at 16:24
  • So if I get you right, you're saying that indeed in some cases Welch's F could come out significant and Fisher's F non-significant and it would be a reason to dismiss the hypothesis, because Welch's F is less accurate? – foggy Jan 20 '24 at 12:19
  • No, again I'm talking about properties across data sets (population-level considerations of the test properties like significance level) while you're focusing on properties (/happenstance) of one data set. – Glen_b Jan 21 '24 at 02:22