
Say I have 15 or 20 observations from each of two populations. The data show evidence of strong skew.

I am considering a two-sample t-test of means, which relies on the assumption that the test statistic follows a t distribution.

My test uses the pooled sample variance and compares the means against a null hypothesis of no difference:

$t^*=\dfrac{\bar{x}_1-\bar{x}_2-0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$
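For reference, the pooled variance is $s_p^2 = \dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$, and under normality with equal variances $t^*$ has $n_1+n_2-2$ degrees of freedom. A minimal standard-library sketch of the statistic (the helper name `pooled_t` is hypothetical):

```python
import math
from statistics import mean, variance


def pooled_t(x1, x2):
    """Return (t*, df) for the pooled two-sample t statistic under H0: mu1 - mu2 = 0."""
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    # Pooled variance: df-weighted average of the two sample variances (n - 1 denominators)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    t_star = (mean(x1) - mean(x2) - 0) / se
    return t_star, df


t_star, df = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t_star, df)
```

In SciPy, `scipy.stats.ttest_ind(x1, x2, equal_var=True)` computes the same statistic and also returns a p-value.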

How do I know that I have enough degrees of freedom to make this assumption valid?

Of course, some cite a threshold of 30 observations, beyond which the t and normal distributions are treated as interchangeable.

But how do we know the t distribution is appropriate? Given modern computation, should we not just use a permutation test or the Mann-Whitney U test?
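For comparison, a Monte Carlo permutation test on the difference in means needs only the standard library. This is a minimal sketch; the function name `perm_test_mean_diff`, the default of 10,000 random relabelings, and the fixed seed are illustrative choices. Note that it tests exchangeability of the two groups (identical distributions under the null), which is a stronger null than equality of means.

```python
import random


def perm_test_mean_diff(x1, x2, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Returns (observed difference, p-value).
    """
    rng = random.Random(seed)
    n1 = len(x1)
    observed = sum(x1) / n1 - sum(x2) / len(x2)
    pooled = list(x1) + list(x2)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of observations under H0
        g1, g2 = pooled[:n1], pooled[n1:]
        diff = sum(g1) / len(g1) - sum(g2) / len(g2)
        # Small tolerance so floating-point ties with the observed value count as extreme
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add 1 to numerator and denominator so the observed labeling counts (p is never 0)
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


obs, p = perm_test_mean_diff([1.2, 0.4, 3.9, 0.8, 2.5], [0.3, 0.1, 0.9, 0.2, 1.1])
print(obs, p)
```

SciPy also provides `scipy.stats.permutation_test` and `scipy.stats.mannwhitneyu` for these two alternatives.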

What assumptions can I check before choosing a test?
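One descriptive check is the sample skewness of each group. A minimal sketch of the adjusted Fisher-Pearson coefficient $G_1$ (the function name is hypothetical). With 15 to 20 observations per group this estimate is itself highly variable, so it describes the sample rather than validating the t approximation.

```python
import math


def sample_skewness(x):
    """Adjusted Fisher-Pearson skewness coefficient G1."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n  # biased second central moment
    m3 = sum((v - m) ** 3 for v in x) / n  # biased third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    g1 = m3 / m2 ** 1.5
    # Small-sample adjustment applied to g1
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


print(sample_skewness([0.2, 0.3, 0.5, 0.6, 0.9, 1.4, 2.8, 6.1]))
```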

  • Usually skewness makes the test less powerful than you might think. To find out how much, you could emulate the study I presented at https://stats.stackexchange.com/a/69967/919. – whuber Oct 14 '23 at 23:39
  • I would assume no test is valid. But that is a conclusion I'm jumping to and I'm not sure how to validate it empirically. @whuber – Estimate the estimators Oct 14 '23 at 23:46
  • I suppose simulation is one option, but that assumes we know the moments of the sampling distribution, or at least an approximation. – Estimate the estimators Oct 14 '23 at 23:56
  • You state you're asking about a two-sample t-test (per your first and third sentences) but the formula for $T_A$ that you gave relates to a one-sample statistic. It matters which you're asking about (the null distribution for the one-sample test is more affected by skewness than the two-sample test). Please clarify. – Glen_b Oct 15 '23 at 02:02
  • @Glen_b I was too quick to copy that. I updated. Two samples, test of means. – Estimate the estimators Oct 15 '23 at 03:25
  • The n=30 “rule” is a joke when asymmetry is present. And the original question implies that there is something wrong with using the Wilcoxon-Mann-Whitney test if the data happen to be Gaussian. There isn’t. – Frank Harrell Oct 15 '23 at 12:39
  • @FrankHarrell That seems a bit harsh as a reply. I'm in no way critical of the technique. I'm just asking how to know if I should use it or the t-test given limited information about the data generating process. – Estimate the estimators Oct 15 '23 at 16:39
  • Fair enough. I would not rely on large sample theory to make the decision. When the variable is asymmetric the SD is not a good measure of dispersion and I have an example where N=50,000 is insufficient for achieving adequate confidence interval coverage. – Frank Harrell Oct 16 '23 at 11:53
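Following whuber's suggestion, a simulation can estimate the actual type I error rate of the pooled t-test for a skewed distribution you are willing to assume. As noted in the comments, this requires choosing a data-generating distribution. This sketch assumes, purely for illustration, that both groups are Exponential(1) with 15 observations each, and it hard-codes 2.0484, the 0.975 quantile of the t distribution with 28 degrees of freedom. The helper names are hypothetical.

```python
import math
import random


def t_statistic(x1, x2):
    """Pooled two-sample t statistic for H0: equal means."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((v - m1) ** 2 for v in x1)
    ss2 = sum((v - m2) ** 2 for v in x2)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def simulated_size(n1=15, n2=15, t_crit=2.0484, reps=5000, seed=1):
    """Fraction of null samples in which |t*| exceeds t_crit (nominal 5%).

    ASSUMPTION: both groups come from the same Exponential(1) distribution,
    so H0 is true. t_crit = 2.0484 is the 0.975 quantile of t with 28 df
    (n1 + n2 - 2 for the default sizes); change it if you change n1 or n2.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x1 = [rng.expovariate(1.0) for _ in range(n1)]
        x2 = [rng.expovariate(1.0) for _ in range(n2)]
        if abs(t_statistic(x1, x2)) > t_crit:
            rejections += 1
    return rejections / reps


rate = simulated_size()
print(rate)
```

An estimated rate near 0.05 means the nominal level is roughly respected for that assumed distribution only. Drawing the second group from a shifted distribution turns the same loop into a power study.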
