
ANOVA presupposes normally distributed data and equal variances. Kruskal–Wallis (non-parametric ANOVA) assumes that all population distributions are the same except for their parameters.

I'd like to know if there is an assumption-free test: an ANOVA-like test that assumes only a continuous distribution and independent and identically distributed data.

  • How does Kruskal-Wallis assume all distributions are the same? Do you mean under the null hypothesis? (That is a typical assumption under the null hypothesis.) – Dave Dec 01 '21 at 04:25
  • They actually are the same except for their parameters. For example, let $F_i$ be the cumulative distribution function of population $i$, parametrized by $\theta_i$; then Kruskal–Wallis assumes $F_i(\theta_i)=F_j(\theta_j)$ only if $\theta_i=\theta_j$, so all distributions belong to the same function class $\mathcal{F}$. – Davi Américo Dec 01 '21 at 09:56
  • What do you think about permutation ANOVA? Or aligned-ranks transformation ANOVA? – Sal Mangiafico Dec 01 '21 at 22:32
  • https://stats.stackexchange.com/questions/384742/wilcoxon-test-non-normality-non-equal-variances-sample-size-not-the-same/384926#384926 (FWIW, you will always need the assumption of independence even if you manage to get rid of the distributional assumptions ...) – Ben Bolker Dec 01 '21 at 22:53
  • I know; this assumption is so basic that I didn't even state it, but I've updated my post. – Davi Américo Dec 02 '21 at 01:45
  • "Kruskal–Wallis (non-parametric ANOVA) assumes that all population distributions are the same." False. – Alexis Dec 03 '21 at 16:06
  • There is no such thing as an assumption-free statistic, so there cannot be an assumption-free ANOVA. – Galen Dec 03 '21 at 17:34
  • @Alexis To be more formal: Kruskal–Wallis assumes that all distributions belong to the same family. – Davi Américo Dec 03 '21 at 22:48
  • @DaviAmérico Also false. The null of the K-W test is $\text{H}_0\text{: } P(X_i > X_j)=0.5$ for $i,j \in 1,\dots,k$ and $i \ne j$, with $\text{H}_\text{A}\text{: } P(X_i > X_j)\ne 0.5$, and this makes only the most general distributional assumptions (i.i.d., finite mean and variance). Folks propounding "same shape"/"same distribution" type assumptions for K-W (and for the rank-sum test) are, in my opinion, trying to bend the test away from its nonparametric beauty into a grotesquely unwieldy version of a parametric one-way ANOVA (or t test). – Alexis Dec 03 '21 at 23:48
  • So are there two Kruskal–Wallis tests? If so, please post that as an answer and I'll consider it. – Davi Américo Dec 03 '21 at 23:55
  • Please see the paper I learned the Kruskal–Wallis test from: https://www.researchgate.net/publication/289442433_Methodology_and_Application_of_the_Kruskal-Wallis_Test, page 116 – Davi Américo Dec 04 '21 at 00:06
  • @DaviAmérico No, just one. If you make the additional assumptions that (1) the distributions all have the same shape and (2) the distributions all have the same variance, you can treat the K-W as a test for location shift (i.e., omnibus mean difference, omnibus median difference), but again: I think that dispenses with the useful generality of the test, as per my previous comment. – Alexis Dec 04 '21 at 00:19
  • To add to @Alexis's description of the test: Conover, 1999, Practical Nonparametric Statistics, 3rd ed., adds the following assumption: either the k population distribution functions are identical, or else some of the populations tend to yield larger values than other populations do. I think the idea is that if you start with two populations with the same location and distribution shape, but with different variances, you can get an inflated type-I error rate. I may have checked this with simulation, but I don't quite remember. – Sal Mangiafico Dec 04 '21 at 17:02
  • @SalMangiafico and Davi Américo And to be super explicit: my point is not that one cannot use the K-W for a test of location shift by making those additional assumptions; my point is that the K-W test is fundamentally more general than that, and is still useful when making more general/fewer assumptions. – Alexis Dec 04 '21 at 17:06
  • @Alexis, right. Conover is not setting up the test as a location shift, but keeping it as a test of stochastic dominance. I think, though, that you need that weird additional assumption to keep the type-I error rate at the nominal rate in certain cases. Practically speaking, it's not something I would worry about. And I do think that thinking of this test as a test of stochastic dominance is more valuable than trying to contort it into a test of location shift. – Sal Mangiafico Dec 04 '21 at 17:15
  • @Alexis, but this additional assumption does make H0 and H1 a little weird (though probably statistically correct). I added an answer with this information from Conover. – Sal Mangiafico Dec 04 '21 at 17:23

2 Answers

1

If you are performing inference on the mean and would like to compare groups (even while adjusting for covariates), you can use a semi-parametric generalized estimating equations (GEE) model in which the variance is modeled separately from the mean (the mean model is still a least-squares model, like ANOVA). You can also include a non-linear link function between the mean and the linear predictor. For robust inference you can use asymptotic Wald tests and confidence intervals based on the empirical sandwich covariance estimator. All of this allows for inference on means without needing to specify the underlying data distribution. You can fit such a semi-parametric model using a generalized linear model package like glm in R or Proc Genmod in SAS.
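
A minimal sketch in R of this approach, using made-up Weibull data and the CRAN packages sandwich and lmtest for the robust covariance and Wald tests (the grouping structure and outcome distribution here are illustrative assumptions, not part of the original answer):

```r
set.seed(1)
y <- rweibull(150, shape = 1.1, scale = 3)     # right-skewed outcome
g <- factor(rep(c("A", "B", "C"), each = 50))  # three groups to compare

# Least-squares estimating equations for the mean (identity link)
fit <- glm(y ~ g, family = gaussian(link = "identity"))

library(sandwich)  # empirical sandwich covariance estimator
library(lmtest)    # Wald tests with a user-supplied covariance

summary(fit)$coefficients                         # model-based SEs (single common variance)
coeftest(fit, vcov. = vcovHC(fit, type = "HC0"))  # robust Wald tests via the sandwich estimator
```

The contrast between the last two lines mirrors the next paragraph: the model-based standard errors pool a single variance term, while the sandwich-based Wald tests do not.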

In contrast, a typical ANOVA uses a single common variance term to calculate all of the standard errors for the model parameters, and inference is performed using t-tests under the assumption of normally distributed data. Of course, the t-test is very similar to the Wald test and is robust to distributional misspecification so long as the mean estimator is approximately normally distributed and the variance estimator is consistent.

As an example, I simulated $10,000$ Monte Carlo samples of $n=50$ observations from a $\text{Weibull}(k=1.1,\lambda=3)$ distribution to investigate the coverage probability of the $95\%$ Wald confidence interval for the mean, $\mu=\lambda\Gamma(1+1/k)$, based on least-squares estimating equations and the sandwich covariance estimator. Using an identity link function, the $95\%$ Wald CI covered $93.1\%$ of the time. Using a log link function, the $95\%$ Wald CI covered $93.6\%$ of the time. With a sample size of $n=100$ these coverage probabilities become $93.6\%$ and $94.1\%$, respectively. These results are based on SAS Proc Genmod.
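
The results above come from SAS Proc Genmod; the sketch below is a hedged R analogue of the identity-link case (the Monte Carlo settings match the text, everything else is an assumed translation):

```r
library(sandwich)
set.seed(42)
k <- 1.1; lam <- 3
mu   <- lam * gamma(1 + 1 / k)  # true Weibull mean
n    <- 50
reps <- 10000
covered <- logical(reps)
for (r in seq_len(reps)) {
  y   <- rweibull(n, shape = k, scale = lam)
  fit <- glm(y ~ 1, family = gaussian(link = "identity"))
  est <- coef(fit)[1]
  se  <- sqrt(vcovHC(fit, type = "HC0")[1, 1])  # sandwich SE of the mean
  covered[r] <- (est - 1.96 * se <= mu) && (mu <= est + 1.96 * se)
}
mean(covered)  # should land near the ~93% coverage reported above
```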

To address Frank Harrell's concern, I simulated $1,000$ Monte Carlo samples of $n=50,000$ from $X\sim\text{Pareto}(x_m=1, \alpha=3)$ with $E[X]=\frac{\alpha x_m}{\alpha-1}=1.5$ and $\text{Var}[X]=\frac{x_m^2\alpha}{(\alpha-1)^2(\alpha-2)}=3/4$. The largest simulated value was over $900$. Both the Wald interval with an identity link and with a log link covered $E[X]$ $95.7\%$ of the time. I also simulated $1,000$ Monte Carlo samples of $n=50,000$ from a $\text{Pareto}(x_m=1, \alpha=2)$ distribution with $E[X]=\frac{\alpha x_m}{\alpha-1}=2$ and $\text{Var}[X]=\infty$. The Monte Carlo variance of the sample mean was $15.15$ and the largest simulated value was over $6,000$. The $95\%$ confidence intervals with an identity and a log link covered $93.2\%$ and $93.4\%$ of the time, respectively. Using a higher confidence level such as $96\%$ or $97\%$ should bring the true coverage rate closer to $95\%$.
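
For reproducing such draws in R (base R has no Pareto generator, so the helper below is an assumed addition), the inverse-CDF method gives $X = x_m U^{-1/\alpha}$ for $U\sim\text{Uniform}(0,1)$:

```r
# Inverse-CDF Pareto draws: F(x) = 1 - (xm / x)^alpha for x >= xm
rpareto <- function(n, xm, alpha) xm * runif(n)^(-1 / alpha)

set.seed(1)
x <- rpareto(50000, xm = 1, alpha = 3)
mean(x)  # should be near alpha * xm / (alpha - 1) = 1.5
max(x)   # largest value in one sample (the 900+ maximum above was across all 1,000 replicates)
```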

Of course with $n=50,000$ observations one might feel comfortable fitting a parametric Pareto model. Here is a thread on ResearchGate where I describe inverting the CDF of the maximum likelihood estimator while profiling nuisance parameters to construct confidence limits and confidence curves for the shape and scale parameters of a Pareto distribution. This approach could also be used for inference on the mean.
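
The confidence-curve construction itself is in that thread; as a small companion piece, the closed-form Pareto maximum likelihood estimates that such a procedure would profile are shown below (this sketch is my addition, not the original answer's code):

```r
rpareto <- function(n, xm, alpha) xm * runif(n)^(-1 / alpha)  # inverse-CDF draws

# Closed-form Pareto MLEs: xm_hat = min(x), alpha_hat = n / sum(log(x / xm_hat))
pareto_mle <- function(x) {
  xm_hat    <- min(x)
  alpha_hat <- length(x) / sum(log(x / xm_hat))
  c(xm = xm_hat, alpha = alpha_hat)
}

set.seed(2)
pareto_mle(rpareto(50000, xm = 1, alpha = 3))  # should be near (1, 3)
```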

@Frank Harrell, if there is a particular distribution you would like to suggest where $n=50,000$ is insufficient for reliable inference on the mean using semi-parametric generalized estimating equations, let me know.

  • How do you figure that ANOVA uses t-tests? – Dave Dec 02 '21 at 03:17
  • I'm thinking about the output I see from SAS packages like Proc GLM (General Linear Model), Proc Reg, and Proc Mixed. These procedures report t-tests. Of course one is free to construct other tests, and the t-test is quite robust to departures from the normality assumption so long as the distribution of the sample mean is well approximated by a normal distribution. – Geoffrey Johnson Dec 02 '21 at 03:21
  • The methods Geoffrey outlined make many more assumptions than they seem to, or rely on very large samples to be accurate. – Frank Harrell Dec 02 '21 at 22:13
  • The assumptions are i) the first two moments exist and are finite, ii) a correctly specified linear predictor, and iii) a modest sample size so that the central limit theorem can take effect when constructing p-values and confidence intervals. If the data-generating process is right-skewed, incorporating a log link function when constructing p-values and confidence intervals can ensure proper operating characteristics. – Geoffrey Johnson Dec 03 '21 at 02:51
  • No, the central limit theorem is only a limit theorem, and I have examples where $N=50,000$ is not sufficient for obtaining sufficient accuracy. The methods you proposed are also very sensitive to how you transform Y. And in practice the determination of data being "right skewed" is far from obvious. – Frank Harrell Dec 03 '21 at 15:40
  • With this modeling approach one does not transform the dependent variable. The link function is applied to the mean and its estimator. That sounds like a very interesting setting where 50,000 observations would not provide an accurate semi-parametric estimate of the mean. In such a setting one could investigate this through simulation and re-sampling to identify the confidence level that provides the desired coverage rate. Alternatively, one could use a parametric generalized linear model instead of semi-parametric generalized estimating equations. – Geoffrey Johnson Dec 03 '21 at 16:38
1

It may be helpful to have the assumptions of the Kruskal-Wallis test from Conover, 1999, Practical Nonparametric Statistics, 3rd ed., posted here:

  1. All samples are random samples from their respective populations.

  2. In addition to independence within each sample, there is mutual independence among the various samples.

  3. The measurement scale is at least ordinal.

  4. Either the k population distribution functions are identical, or else some of the populations tend to yield larger values than other populations do.

This yields the hypotheses:

H0: All of the k population distribution functions are identical.

H1: At least one of the populations tends to yield larger observations than at least one of the other populations.
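
These are the hypotheses that base R's kruskal.test evaluates; here is a minimal illustration with made-up exponential samples (the data are my assumption, chosen so some groups tend to yield larger values):

```r
set.seed(7)
y <- c(rexp(30, rate = 1), rexp(30, rate = 1/2), rexp(30, rate = 1/3))  # group means 1, 2, 3
g <- factor(rep(c("A", "B", "C"), each = 30))

kruskal.test(y ~ g)  # rejects H0 when some population tends to yield larger observations
```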

Sal Mangiafico
  • Sal, I think the second clause in assumption four is backward? Otherwise rejecting the null is evidence against (0th order) stochastic dominance. Right? $P(X_i > X_j) = 0.5$. – Alexis Dec 04 '21 at 17:25
  • Or am I misinterpreting that fourth assumption as an assumption specifically under the null? – Alexis Dec 04 '21 at 17:41
  • @Alexis, no, I don't think it's under the null. And yes, rejecting the null is evidence against stochastic equality. It just says the test works correctly when *either* the distributions are the same *or* one tends to have higher values than the others. As far as I can tell, this just rules out the case where the locations are the same, the shape of the distributions is the same, but the variance is different. But I might be missing something. – Sal Mangiafico Dec 05 '21 at 11:50
  • Got it! Thank you, as always. :) – Alexis Dec 05 '21 at 17:13