I have one IV (faculty) and multiple continuous DVs, and I wanted to run a one-way MANOVA. Since MANOVA requires normally distributed data, I plotted the DVs and saw that they are heavily long-tailed (keep in mind that the y-axis is already log-scaled): [image: plots of the DVs]

I have read a lot about possible solutions to this. These include

  • data transformation

I couldn't find any information though on how to transform long-tailed data such that an approximate normal distribution would result.
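For illustration, here is a minimal sketch of two transformations that are commonly tried on long-tailed data: a plain log transform and a Yeo-Johnson power transform. The data are simulated log-normal draws standing in for one DV (an assumption, not the real data), and the sample skewness is only a rough indicator of how symmetric the result is.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical long-tailed DV: log-normal draws stand in for the real data.
x = rng.lognormal(mean=0.0, sigma=1.5, size=500)

# Simple log transform; log1p(x) = log(1 + x) also copes with zeros.
x_log = np.log1p(x)

# Yeo-Johnson power transform; lambda is fitted by maximum likelihood.
x_yj, lam = stats.yeojohnson(x)

# Compare sample skewness before and after (0 means symmetric).
for name, values in [("raw", x), ("log1p", x_log), ("yeo-johnson", x_yj)]:
    print(f"{name:12s} skewness = {stats.skew(values):6.2f}")
```

Whether either transform is good enough would have to be checked on the real data, ideally on the residuals within each faculty rather than on the pooled values.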

  • Non-parametric MANOVA

The most common non-parametric test I found is the multivariate Kruskal-Wallis (MKW) test, but there seems to be no implementation of it in either Python or R. I have also seen some people apparently doing this manually, but I don't know whether that is the same as an MKW: https://stackoverflow.com/questions/70419691/kruskal-wallis-test-for-multiple-comparison-using-python

  • Semi-parametric MANOVA

I found this package in a discussion on ResearchGate, and it could be helpful: https://cran.r-project.org/web/packages/MANOVA.RM/MANOVA.RM.pdf

What would you suggest I do in this situation? I'm a bit overwhelmed by the possibilities and don't know which would be best.

beld
  • If I am understanding this correctly, you want to pool several outcome variables on quite different scales (some measured, some counted) and throw them into some single procedure. That seems quite wrong to me. What is the statistical question you are trying to answer? – Nick Cox Aug 24 '22 at 12:43
  • I want to see if there are different characteristics between these variables for the different categories (faculty). Characteristics refers to all the continuous variables. I don't really understand the remark about the different scales, sorry – beld Aug 24 '22 at 12:47
  • That goal requires separate analyses for each outcome. You can't lump together variables that aren't measured on the same scale. If you measured lifespan in years or size in GB you would get different numbers and neither would be comparable with anything counted. – Nick Cox Aug 24 '22 at 12:50
  • Thanks, it sounds like the best approach would then be to run a Kruskal-Wallis test for each dependent variable and analyze each result separately. It seems that I have misunderstood the purpose of a MANOVA. – beld Aug 24 '22 at 12:59
  • Sorry, can you link me a resource that states the assumption / prerequisite that MANOVA requires dependent variables on the same scale? I don't see this mentioned anywhere. In fact, this answer states the opposite: https://stats.stackexchange.com/questions/236184/manova-standardization – beld Aug 24 '22 at 14:18
  • No doubt your software may allow it. Good luck making sense of the results and justifying any inferences! – Nick Cox Aug 24 '22 at 16:00
  • I have never mentioned that the software won't allow this. I can also just run an ANOVA on non-normally distributed data which is obviously allowed by software but it still is an assumption that ANOVA makes. No resource I could find states this assumption regarding scale about MANOVA. I was asking politely whether you could send me some resources that explain the reasoning behind the need for same scale as I couldn't find any. I don't understand why you have to answer in such a petty way. – beld Aug 24 '22 at 16:12
  • In turn I don't know why you seek to interpret an attempt to be frank in other terms, but sorry if you take my tone as unhelpful. To back up, my first wording "quite wrong" was not best chosen. I'd say rather that it is not a best strategy to throw in a mix of quite different outcomes into an analysis, but evidently people do this and I can't predict in advance how helpful you'll find it. – Nick Cox Aug 24 '22 at 16:57
  • My stance here is strategic rather than tactical: people sometimes want to believe that a single method or test will sort out their data, but I usually find the opposite, that quite different outcomes may mean several linked analyses. – Nick Cox Aug 24 '22 at 16:58
  • Strictly, ANOVA assumes that errors are normal, not the data. – Nick Cox Aug 24 '22 at 16:59
  • You could be interested in https://stats.stackexchange.com/questions/190156/t-tests-manova-or-logistic-regression-how-to-compare-two-groups – kjetil b halvorsen Aug 27 '22 at 04:37
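
The per-variable approach discussed in the comments (one Kruskal-Wallis test per DV) could look roughly like the sketch below. The faculty labels, DV names, and simulated data are all hypothetical. The Holm adjustment is an addition not mentioned in the thread; it is included only as one common way to account for running several tests.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: three faculties, two long-tailed DVs (names are made up).
faculties = ["A", "B", "C"]
data = {
    "dv_shifted": {f: rng.lognormal(mean=m, sigma=1.0, size=60)
                   for f, m in zip(faculties, [0.0, 0.0, 1.5])},
    "dv_null": {f: rng.lognormal(mean=0.0, sigma=1.0, size=60)
                for f in faculties},
}

# One Kruskal-Wallis test per DV, comparing the faculty groups.
p_values = {dv: stats.kruskal(*groups.values()).pvalue
            for dv, groups in data.items()}


def holm(pvals):
    """Holm step-down adjustment; returns adjusted p-values keyed like the input."""
    ordered = sorted(pvals.items(), key=lambda kv: kv[1])
    m, running_max, adjusted = len(ordered), 0.0, {}
    for i, (key, p) in enumerate(ordered):
        # Multiply the i-th smallest p-value by (m - i), cap at 1,
        # and enforce monotonicity with a running maximum.
        running_max = max(running_max, min(1.0, (m - i) * p))
        adjusted[key] = running_max
    return adjusted


adjusted = holm(p_values)
for dv in data:
    print(f"{dv}: p = {p_values[dv]:.4g}, Holm-adjusted p = {adjusted[dv]:.4g}")
```

Each test answers a question about one DV only, so the results are read separately per variable rather than as a single multivariate verdict.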

1 Answer

I spent a lot of time on a very similar issue, and the relatively novel Nonparametric Comparison of Multivariate Samples (npmv package in R) seemed to check all the boxes for me.

It also copes with what the authors describe as "multivariate data which usually involve different, typically dependent characteristics measured in rather different units." See here.