While we often associate the F-test with testing differences in means (e.g., ANOVA), it is actually a test of variances that methods like ANOVA use cleverly to investigate differences in means.
Therefore, the first thought might be to use an F-test of the two variances. This can be implemented in R software, for instance, using var.test.
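As a minimal sketch of what that call looks like (on made-up data, not your problem; the sample sizes and standard deviations here are just for illustration):

```r
set.seed(2024)                        # toy data for illustration only
x <- rnorm(100, mean = 0, sd = 1.0)   # first sample
y <- rnorm(100, mean = 0, sd = 1.5)   # second sample, larger spread

# F-test of H0: the two population variances are equal (ratio = 1)
var.test(x, y)
```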
Unfortunately, the F-test lacks robustness to deviations from normality. The JBStatistics channel on YouTube has a video demonstrating this, and it might be fun to come up with your own simulations to see it for yourself.
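A rough sketch of such a simulation, assuming a nominal 5% level and heavy-tailed data (t with 3 degrees of freedom, chosen just for illustration), checks how often the F-test rejects when the variances really are equal:

```r
set.seed(2024)
B <- 10000   # number of simulated pairs of samples

reject <- replicate(B, {
  # two samples with equal variances but heavy tails (t with 3 df)
  x <- rt(50, df = 3)
  y <- rt(50, df = 3)
  var.test(x, y)$p.value < 0.05
})

# Would be about 0.05 if the test held its nominal level;
# with heavy tails the rejection rate is typically well above that.
mean(reject)
```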
A more robust alternative is the Ansari-Bradley test, implemented in R through ansari.test. Technically, this is a test of scale rather than of variance per se, but it tends to do a good job and is worth a read.
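A sketch of how you might call it, again on made-up data where the two samples share a shape but differ in scale:

```r
set.seed(2024)
x <- rt(50, df = 3)       # heavy-tailed sample
y <- 2 * rt(50, df = 3)   # same shape, twice the scale

# Ansari-Bradley test of H0: equal scale parameters
ansari.test(x, y)
```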
If you want to get into a more general setting where you model the variance conditional on multiple covariates, this question of mine asks the same thing and has yet to receive the kind of resolution I had hoped for.
For quantifying the effect size, I find it natural to talk about the ratio of the two variances, rather than the difference. It makes sense to me to say that one distribution has twice or half the variance of another, and this ratio is part of what is calculated in the F-test.
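If you use var.test, that ratio (and a confidence interval for it) is reported directly; continuing with toy simulated data:

```r
set.seed(2024)
x <- rnorm(100, sd = 1.0)
y <- rnorm(100, sd = 1.5)

fit <- var.test(x, y)
fit$estimate   # sample ratio of variances, var(x) / var(y)
fit$conf.int   # confidence interval for the population variance ratio
```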
Finally, establishing causality is likely to run into the same kinds of bugaboos that arise when establishing causality in a regression that estimates conditional means. This is good news in the sense that people who do causal inference already have tools for it (e.g., instrumental variables). However, because the estimation target is different (a conditional variance instead of a conditional mean), the theoretical motivation in the causal-inference step may be more difficult, and the techniques may not be as well established or as readily available in software (e.g., the analogue of instrumental variables when conditional variances are being estimated).