
I've got data (value) of two types (cond true or false):

    > str(test.df)
    'data.frame': 3208 obs. of  2 variables:
     $ cond : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...
     $ value: num  31 25 21 29 18 41 15 7 33 6 ...

The mean and the variance (and also the number of cases) are different:

    > library(plyr)
    > stats <- ddply(test.df, .(cond), summarize, mean=mean(value), var=var(value), n=length(value))
    > stats
       cond     mean      var    n
    1 FALSE 17.33918 141.3199 3137
    2  TRUE 25.91549 177.4499   71

So the different variances mean I can't use the Wilcoxon test, right?

If I plot the densities I see the following:

[figure: density plots of value for the two groups]

The Shapiro-Wilk test says they are not normally distributed:

    > shapiro.test(test.df[test.df$cond==TRUE, 'value'])

      Shapiro-Wilk normality test

    data:  test.df[test.df$cond == TRUE, "value"]
    W = 0.9583, p-value = 0.01901

    > shapiro.test(test.df[test.df$cond==FALSE, 'value'])

      Shapiro-Wilk normality test

    data:  test.df[test.df$cond == FALSE, "value"]
    W = 0.9348, p-value < 2.2e-16

No normal distribution means I can't use Welch's t-test, right?

So how can I test whether the difference in means is significant?

JerryWho
  • To me, the variances don't seem too unequal. But if you want a test that is robust to violations of distributional assumptions: have you considered bootstrapping it? Also possibly related: http://stats.stackexchange.com/questions/88457/hypothesis-testing-wilcoxon-test-bootstrapping-or-something-else – jona May 03 '14 at 13:28

2 Answers


You can use an unpaired t test for the mean difference assuming unequal variances (i.e. Welch's t test), even with non-normal data: the sample mean is asymptotically normally distributed for any i.i.d. distribution with finite variance, and you have reasonably large sample sizes.
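
In R this is a one-liner (a minimal sketch, assuming test.df as in the question; t.test() applies the Welch correction by default):

    # Welch's unequal-variances t test: t.test() uses
    # var.equal = FALSE by default, so no extra argument is needed
    t.test(value ~ cond, data = test.df)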

Or you can also use a (Wilcoxon/Mann-Whitney/Mann-Whitney-Wilcoxon) rank sum test. The latter is most generally a test of stochastic dominance with $H_{0}\text{: P}(X_{\text{True}} > X_{\text{False}})=0.5$, i.e. the probability that a randomly observed value from the condition-true group is greater than a randomly observed value from the condition-false group equals one half; the alternative hypothesis is that one group is more likely to have a greater observed value than the other. Under the additional assumptions that the distribution of value (1) has the same shape in both groups and (2) differs only in central location (i.e. not in variability), the rank sum test can be interpreted as a test of median difference.
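
Again assuming test.df from the question, a minimal sketch in R:

    # Wilcoxon/Mann-Whitney rank sum test; for samples this large,
    # wilcox.test() uses the normal approximation rather than the
    # exact distribution
    wilcox.test(value ~ cond, data = test.df)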

Alexis

Consider what statistical significance means: the probability of observing an outcome of the magnitude seen or larger, given that the null hypothesis is true.

In your case, the null hypothesis is that there is no difference in means between the two groups, i.e. $E(y|x=1)=E(y|x=0)$. But even under this model, we would expect some random noise. A way to simulate that noise is to randomly assign cases to the two groups and then measure the difference in means. This is the reasoning behind so-called permutation tests.

An illustration, implemented in Stata-esque pseudo-code:

    use dataset.dta
    ttest Y, by(group)            // gives us the observed difference
    forvalues i = 1/1000 {
        gen u = uniform()
        sort u
        gen id = _n               // the two lines above put the observations in random order
        gen permgroup = (id > n0) // splits them into two random groups of sizes n0 and N - n0, matching the original group sizes
        ttest Y, by(permgroup)    // size of the difference under this permutation
        drop u id permgroup
    }

You then compare your observed difference with the set of differences obtained under random permutations. A 5% significance level corresponds to your observed difference lying at or beyond the 95th percentile of the permutation distribution (for a one-sided test; for a two-sided test, compare absolute differences).
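
A rough R translation of the same idea (a sketch, assuming test.df from the question; shuffling the group labels is equivalent to randomly reassigning the cases):

    # Observed difference in group means (TRUE minus FALSE)
    obs.diff <- with(test.df,
                     mean(value[cond == TRUE]) - mean(value[cond == FALSE]))

    set.seed(42)     # for reproducibility
    n.perm <- 10000
    perm.diff <- replicate(n.perm, {
        g <- sample(test.df$cond)  # shuffle labels, keeping group sizes
        mean(test.df$value[g == TRUE]) - mean(test.df$value[g == FALSE])
    })

    # Approximate two-sided p-value: share of permuted differences at
    # least as extreme as the observed one
    mean(abs(perm.diff) >= abs(obs.diff))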

The moral of the story is: there is only one test. t-tests, F-tests, etc. are fast ways to do it, and a permutation test for a regression model can be a horrible task. But in the end, generating data under the null model and comparing it to the observations is what all statistical tests do, in some way.

abaumann
  • Or you could just use Stata's permute command. ;) – Alexis May 03 '14 at 13:54
  • @Alexis, sure. I just thought writing it in pseudocode would illustrate the reasoning behind the test better than pointing to a specific command. – abaumann May 03 '14 at 13:58
  • Per the Wikipedia link, the exchangeability of observations is a requisite assumption of permutation tests, and different variances between groups violates this assumption. – Alexis May 03 '14 at 14:11