
In the case of multicollinearity, I wonder why:

  1. We typically describe the lack of it as an assumption (that is, we assume non-multicollinearity):

https://www.statology.org/multiple-linear-regression-assumptions/

https://www.linkedin.com/pulse/4-main-assumptions-multi-linear-regression-ritik-karir

  2. Sometimes, we even use the word "test" for the Variance Inflation Factor (VIF):

https://kandadata.com/non-multicollinearity-test-in-multiple-linear-regression/

https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/assumptions-of-multiple-linear-regression/

  1. If the "assumption" is merely the absence of perfect multicollinearity, it can easily be checked (and, if it fails, we are forced to remove at least one predictor, or the software does so for us). So I can't see it as an assumption, but as something we can verify (a minimal rank-check sketch of this appears after this list). If instead we mean the problem of correlated predictors leading to correlation between parameter estimates, I see it as something on a continuum, whereas an assumption describes a precise condition (for example, independence, equal variance or normality), any deviation from which makes the assumption fail (of course, there can be robustness, so that results are still reliable even under deviations from the assumption, but that's another story).

  2. In my view, tests refer to p-values (in the frequentist case) or to the probability of the null hypothesis (in the Bayesian case). Also, tests use a sample to make statements about an underlying population. The VIF performs neither a frequentist nor a Bayesian test: it simply describes the correlation structure among the predictors in your sample, without making any inference about the population. This is because the VIF flags an issue you may have when performing a regression in your sample, regardless of what would happen if you observed the whole population (a sketch computing the VIF directly from the sample design matrix follows this list).
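As a minimal sketch of point 1, assuming NumPy and synthetic data (none of this comes from the linked posts): perfect multicollinearity is a yes/no condition that can be read off the rank of the design matrix, and the redundant predictor must be dropped before the least-squares solution is unique.

```python
# Hypothetical illustration: exact linear dependence makes the design matrix
# rank-deficient, which is trivial to check and forces a predictor to be dropped.
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2.0 * x1 - x2                               # exact linear combination of x1 and x2
X = np.column_stack([np.ones(n), x1, x2, x3])    # design matrix with intercept

# Full column rank would be 4; the exact dependence drops it to 3.
print(np.linalg.matrix_rank(X), X.shape[1])      # -> 3 4: perfect multicollinearity

# Removing the redundant column restores full rank, and least squares is unique again.
X_reduced = X[:, :3]
print(np.linalg.matrix_rank(X_reduced), X_reduced.shape[1])   # -> 3 3
```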
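And as a sketch of point 2, again with NumPy and synthetic data: the VIF of predictor j is just 1/(1 - R^2_j), where R^2_j comes from regressing predictor j on the remaining predictors in the sample, so it is a purely descriptive summary of the design matrix, with no test statistic, p-value, or reference to a population.

```python
# Hypothetical illustration: computing VIFs by hand as 1 / (1 - R^2) from
# within-sample regressions of each predictor on the others.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.8 * x1 + 0.5 * x2 + rng.normal(scale=0.3, size=n)   # x3 strongly related to x1, x2
X = np.column_stack([x1, x2, x3])

def vif(X):
    """VIF of each column of X, computed purely from the sample design matrix."""
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]                                            # predictor j treated as "response"
        others = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        r2 = 1 - np.var(y - others @ beta) / np.var(y)         # R^2 of predictor j on the rest
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

print(vif(X))   # descriptive numbers only: no test statistic, no p-value
```

(statsmodels exposes the same quantity as `variance_inflation_factor` in `statsmodels.stats.outliers_influence`, and it likewise returns a plain number rather than a test result.)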

  • The wording "assumptions" is extraordinarily common, but my own suggestion is that we would be much better off talking about ideal conditions. In elementary logic, the failure of an assumption makes an argument invalid. In statistics, the failure to match an ideal condition can be anywhere between trivial and catastrophic, depending. Anscombe used this wording in 1961, and no doubt it wasn't new then. – Nick Cox Oct 06 '23 at 08:17
  • Agree, multicollinearity should not be listed as an assumption. Certainly, it is something to be aware of, but not an assumption. When you try to formulate it as a testable hypothesis, e.g. no pairwise correlation greater than .9, the silliness of it becomes obvious. – BigBendRegion Oct 06 '23 at 11:02
  • In the case of multiple regression (MR), I have always found it confusing that some people state "no multicollinearity" as an "assumption" of MR. After all, one of the strengths of multiple regression is that it can and does properly handle correlated predictors. Some of the very goals of MR are to identify spurious relationships, redundancies, effects of suppression, etc., all of which are related to collinearity (correlated predictors). – Christian Geiser Oct 06 '23 at 15:50
  • It's a shame that your references are poor ones. Here on CV you can find extensive discussion of these issues. See, for instance, https://stats.stackexchange.com/questions/16381. As far as your title question goes, if "no multicollinearity" were some kind of assumption required for regression, then the vast majority of successful applications of regression would be impossible. You would have to throw out the analyses of most observational datasets as well as excluding any designed experiment with missing data points. – whuber Oct 06 '23 at 20:35
