I'm still a bit new to the world of statistics, so I apologize for the naivety of this question.

I have three populations, each with fifteen features, and I would like to determine whether Population C is more similar to Population A or to Population B. Is there a test that yields a single statistic summarizing their similarity? I have been using Welch's t-test to compare the populations one feature at a time, but the t-statistic was high for some features and low for others. I have an idea of the importance of each feature from running them through a random forest and extracting the feature importances, but I would like to approach this in as statistically robust a way as possible.

Further, which matters more if I'm only interested in the similarity between populations: the t-statistic or the p-value?

The three populations also have very different sample sizes: Population A has about 18,000 observations, Population B has about 300, and Population C has about 300.
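
For reference, here is a minimal sketch of the per-feature comparison described above, written with only the Python standard library. The data are synthetic placeholders for a single feature (the means and spreads are made up, and only the sample sizes mirror mine), and `welch_t` and `standardized_diff` are hypothetical helper names, not functions from any library. The standardized difference is included only as one possible size-independent summary to set beside the t-statistic, not as an established answer to the question.

```python
import math
import random
import statistics


def welch_t(x, y):
    """Return Welch's t-statistic and the Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)  # sample variances
    se2 = vx / nx + vy / ny  # squared standard error of the mean difference
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se2)
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df


def standardized_diff(x, y):
    """Mean difference divided by the root of the average of the two variances."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt((vx + vy) / 2)


# Synthetic placeholder data for ONE feature (assumed values, not real data).
rng = random.Random(0)
pop_a = [rng.gauss(0.0, 1.0) for _ in range(18000)]
pop_b = [rng.gauss(0.5, 1.0) for _ in range(300)]
pop_c = [rng.gauss(0.4, 1.0) for _ in range(300)]

# Compare Population C against A and against B on this one feature.
for name, other in (("A", pop_a), ("B", pop_b)):
    t, df = welch_t(pop_c, other)
    d = standardized_diff(pop_c, other)
    print(f"C vs {name}: t = {t:.2f}, df = {df:.1f}, standardized diff = {d:.2f}")
```

For a fixed difference in means, the t-statistic grows as the samples get larger, whereas the standardized difference does not, which is part of why the unequal sample sizes make the t-statistics hard for me to compare across the two pairings.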

Thank you for any help you can provide! Let me know if there is any more information I can provide that may be helpful in solving this problem!

Bre
  • For us to give a good answer, you have to say what, exactly, you mean by "similar". Similar mean? Spread? Median? Or what? – Peter Flom Mar 19 '24 at 16:36
  • One idea is multinomial logistic regression (see https://stats.stackexchange.com/questions/190156/t-tests-manova-or-logistic-regression-how-to-compare-two-groups ). Your sample sizes should suffice. – kjetil b halvorsen Mar 19 '24 at 17:17
  • Canonical correlation. – wjktrs Mar 20 '24 at 02:07
