What to do when comparing the means of two populations and one is not normal?

Question

I have a dataset with two populations (unpaired). First one with 36 observations, second one 74. The first passes the Shapiro normality test with p-value = 0.1521, the second fails it with p-value = 0.01551.

I would like to run t.test(first, second). How should I proceed? If I apply a Box-Cox transformation to the second population, it no longer makes sense to compare it with the first.

Edit: Here are the summary statistics requested by @BruceET:

> lapply(g, function(x) c(mean=mean(x), var=var(x), sd=sd(x)))
$mark
     mean       var        sd 
21.986111  6.364087  2.522714
> lapply(p, function(x) c(mean=mean(x), var=var(x), sd=sd(x)))
$mark
     mean       var        sd 
25.378378  7.608293  2.758313

These tests of Normality are essentially irrelevant. It's unlikely you have a problem--but the best way to assess that begins by plotting the data distribution and not by reporting the p-value of a Normality test. — whuber, Jun 04 '22 at 13:13
See also: This CrossValidated link; the answer by Caracol for implementing a permutation test in R, and note the first comment by whuber under Caracol's answer regarding the applicability of a traditional t-test for the example. — Sal Mangiafico, Jun 05 '22 at 14:10

BruceET · Accepted Answer · 2022-06-05T19:10:41.477

I attempted to digitize your data (approximately), based on your histograms. Of course, my two samples do not exactly match yours, but they seem sufficiently similar to your data to use for illustrative purposes:

x1
 [1] 18 18 18 18 18 19 19 19 19 19 19 20 20 20 20 20 20 20 20 20 20 21 21 21 21
[26] 21 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 25 25 26 27
length(x1); mean(x1); sd(x1)
[1] 50
[1] 21.12
[1] 2.115444
x2
 [1] 19 19 19 19 21 21 21 21 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23
[26] 23 24 24 24 24 24 24 24 24 25 25 25 25 25 25 25 25 25 25 25 26 26 26 26 26
[51] 26 26 26 26 26 26 26 26 27 27 27 27 27 27 28 28 28 28 28 28 29 29 29 29
length(x2); mean(x2); sd(x2)
[1] 74
[1] 24.41892
[1] 2.668983

Normal quantile-quantile plots of both samples (x1 at left) are roughly linear, suggesting that neither sample is far from normally distributed.

Also, boxplots show no outliers or extreme skewness, so a pooled t test seems a reasonable choice. The notches in the sides of the boxes are nonparametric CIs for the medians, calibrated so that a lack of overlap suggests a difference in medians. Here the notches do not overlap, so we should not be surprised if the means are also significantly different.
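The two diagnostic plots described above can be drawn along the following lines. This is only a sketch: it rebuilds the digitized x1 and x2 with rep() from the value counts printed above (approximations read off the histograms, not the original data), and the panel layout and titles are my own choices rather than those used for the original figures.

```r
# Rebuild the digitized samples from their value counts (approximate data).
x1 <- rep(18:27, times = c(5, 6, 10, 11, 6, 6, 2, 2, 1, 1))
x2 <- rep(c(19, 21:29), times = c(4, 10, 6, 6, 8, 11, 13, 6, 6, 4))

# Normal Q-Q plots side by side; roughly linear points suggest near-normality.
par(mfrow = c(1, 2))
qqnorm(x1, main = "x1"); qqline(x1)
qqnorm(x2, main = "x2"); qqline(x2)
par(mfrow = c(1, 1))

# Notched boxplots; b$conf holds the notch limits (rows: lower, upper;
# columns: x1, x2). Non-overlapping notches suggest different medians.
b <- boxplot(list(x1 = x1, x2 = x2), notch = TRUE, horizontal = TRUE)
b$conf
```

Looking at b$conf alongside the plot makes the "lack of overlap" statement checkable numerically: the upper notch limit for x1 can be compared directly with the lower notch limit for x2.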

A pooled 2-sample t test finds a highly significant difference in the two sample means--with t statistic $T = -7.32$ and P-value very near $0.$

t.test(x1, x2, var.equal=TRUE)

    Two Sample t-test

data:  x1 and x2
t = -7.3204, df = 122, p-value = 2.896e-11
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.191024 -2.406814
sample estimates:
 mean of x mean of y 
  21.12000  24.41892

We get the same results in 'stacked format':

x = c(x1, x2);  g = rep(1:2, c(50,74))   # stacked values and group labels
t.test(x ~ g, var.equal=TRUE)$stat
        t 
-7.320372

As in the discussion in Comments, it seems to me that a t test is appropriate. However, if some qualms about using a t test remain in spite of the usual favorable indications, we could use the pooled t statistic as the metric in a permutation test.

Whether the t statistic has exactly Student's t distribution with 122 degrees of freedom or not, this statistic seems a reasonable way to express the difference between sample means, compared with the variability of the samples.
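To make "difference between sample means, compared with the variability of the samples" concrete, the pooled t statistic can be computed by hand. The sketch below assumes the digitized x1 and x2 from above (rebuilt here with rep() so that the block runs on its own); the names sp2, se, and t.by.hand are mine.

```r
# Rebuild the digitized samples from their value counts (approximate data).
x1 <- rep(18:27, times = c(5, 6, 10, 11, 6, 6, 2, 2, 1, 1))
x2 <- rep(c(19, 21:29), times = c(4, 10, 6, 6, 8, 11, 13, 6, 6, 4))
n1 <- length(x1); n2 <- length(x2)

# Pooled variance: the two sample variances weighted by their degrees of freedom.
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)

# Standard error of the difference in means under the equal-variance assumption.
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

# Difference in means measured in standard-error units.
t.by.hand <- (mean(x1) - mean(x2)) / se
t.by.hand   # about -7.32, matching the t.test output above

# Cross-check against t.test with equal variances.
all.equal(t.by.hand, unname(t.test(x1, x2, var.equal = TRUE)$statistic))
```

Nothing in this calculation depends on normality; normality only matters when the value is referred to Student's t distribution to get a P-value.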

Below we use R to approximate the permutation distribution of the pooled t statistic. We scramble the $n_1 + n_2 = 124$ observations between groups 1 and 2 repeatedly and find the t statistic for each permutation. The resulting values of $T$ form the permutation distribution of the t statistic. Here, the P-value of the approximate permutation test is essentially $0.$ [In fact, the approximate permutation distribution of the t statistic is approximately $\mathsf{T}(\nu=122),$ shown in the figure below.]

set.seed(2022)
t.perm = replicate(10^4, t.test(x~sample(g), var.equal=TRUE)$stat)
mean(abs(t.perm)>=7.32)
[1] 0        # P-value of approx permutation test.
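A simulated P-value of exactly 0 only says that none of the 10^4 random relabellings was as extreme as the observed statistic. One common convention, which is my addition and not part of the answer above, is to report (b + 1)/(B + 1), where b is the number of permuted statistics at least as extreme as the observed one and B is the number of permutations. The sketch below assumes the digitized x1 and x2 from above and uses a hypothetical helper pooled.t in place of t.test for speed.

```r
# Rebuild the digitized samples and put them in stacked format.
x1 <- rep(18:27, times = c(5, 6, 10, 11, 6, 6, 2, 2, 1, 1))
x2 <- rep(c(19, 21:29), times = c(4, 10, 6, 6, 8, 11, 13, 6, 6, 4))
x <- c(x1, x2)
g <- rep(1:2, times = c(length(x1), length(x2)))

# Hypothetical helper: pooled t statistic for values x with group labels g.
pooled.t <- function(x, g) {
  a <- x[g == 1]; b <- x[g == 2]
  na <- length(a); nb <- length(b)
  sp2 <- ((na - 1) * var(a) + (nb - 1) * var(b)) / (na + nb - 2)
  (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
}

set.seed(2022)
B <- 10^4
t.obs  <- pooled.t(x, g)                        # observed statistic
t.perm <- replicate(B, pooled.t(x, sample(g)))  # statistics under random relabelling

# Count permutations at least as extreme as the observed statistic; the +1
# terms count the observed labelling itself, so the result is never exactly 0.
n.extreme <- sum(abs(t.perm) >= abs(t.obs))
p.perm <- (n.extreme + 1) / (B + 1)
p.perm
```

With this convention the smallest reportable value is 1/(B + 1), so a result at that floor is best read as "P-value below about 1/B" and not as a precise estimate.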

Note: The implementation of the 2-sample Wilcoxon rank sum test in R also gives a P-value small enough to indicate a shift in location between samples x1 and x2. [There are many ties, but the sample sizes are large enough to get a reliable P-value.]

wilcox.test(x1, x2)

    Wilcoxon rank sum test with continuity correction

data:  x1 and x2
W = 642.5, p-value = 6.301e-10
alternative hypothesis: true location shift is not equal to 0

Thank you very much for this, I will dig deep into the argument! — Pier, Jun 12 '22 at 14:50

score 0 · Answer 2 · answered Jun 05 '22 at 20:05

If normality is heavily violated, you can consider non-parametric methods, which do not assume a particular form for the population distributions. There is usually a non-parametric counterpart to common procedures such as the two-sample t-test for independent samples. In this case, you can use the Mann-Whitney test (also known as the Wilcoxon rank-sum test), which is very general.
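In R the Mann-Whitney test is available as wilcox.test. The sketch below is illustrative only: since the question's raw data are not shown, it uses the approximate digitized samples x1 and x2 from the accepted answer, and the choice of exact = FALSE and conf.int = TRUE is mine.

```r
# Approximate digitized samples from the accepted answer (not the original data).
x1 <- rep(18:27, times = c(5, 6, 10, 11, 6, 6, 2, 2, 1, 1))
x2 <- rep(c(19, 21:29), times = c(4, 10, 6, 6, 8, 11, 13, 6, 6, 4))

# exact = FALSE asks for the normal approximation explicitly, because ties
# rule out an exact P-value; conf.int = TRUE adds an estimate of the shift.
w <- wilcox.test(x1, x2, exact = FALSE, conf.int = TRUE)
w$p.value    # P-value for the null hypothesis of no location shift
w$estimate   # estimated location shift of x1 relative to x2
w$conf.int   # confidence interval for that shift
```

Note that this test addresses a shift in location between the two distributions, which coincides with a difference in means only under additional assumptions such as the two distributions having the same shape.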
