I have a dataset with 200 individuals, 50 features and a binary outcome variable. Most of the features do not differ significantly between the two groups (t-test, Kolmogorov-Smirnov test). I want to argue, however, that even though the individual features show no differences, when all of them are considered together there might exist a region of the feature space in which the two groups are (clearly) separated.

I think this is fair to say and I'm looking at these questions for arguments:

Distinguishing between two groups in statistics and machine learning: hypothesis test vs. classification vs. clustering

Discriminatory model but no discriminatory features? (especially like the counterexample provided here by user Dave)

To finally cement this, I would love to include an academic reference, paper or book that discusses this or at least takes this into account. Does anything come to mind?

amr95
  • 13

1 Answer

The question is not entirely clear, and it would help if you gave more details. But try logistic regression: even if no individual predictor discriminates between the two groups, there might be multivariate information that does. Keep in mind, though, that 200 observations is not many for 50 features ...
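To make the multivariate point concrete, here is a minimal sketch on synthetic data (not your dataset). Everything in it is an assumption chosen for illustration: two features instead of 50, a shared component `z` that dominates both features and is reused across groups so the marginal comparison is stable, a small opposite shift `delta`, and Fisher's linear discriminant (plain NumPy) standing in for logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # individuals per group (200 in total, as in the question)

# Shared component that dominates the variance of both features (toy assumption).
z = rng.normal(0.0, 1.0, size=n)
delta = 0.05  # small group shift, in opposite directions on the two features

def make_group(sign):
    """Two strongly correlated features with a tiny, opposite mean shift."""
    x1 = z + sign * delta + rng.normal(0.0, 0.01, size=n)
    x2 = z - sign * delta + rng.normal(0.0, 0.01, size=n)
    return np.column_stack([x1, x2])

X0, X1 = make_group(-1.0), make_group(+1.0)

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (b.mean() - a.mean()) / se

# Univariate view: one t statistic per feature.
t_stats = [welch_t(X0[:, j], X1[:, j]) for j in range(2)]

# Multivariate view: Fisher's linear discriminant, fitted on the first half
# of each group and evaluated on the held-out second half.
half = n // 2
A0, A1 = X0[:half], X1[:half]
Sw = np.cov(A0, rowvar=False) + np.cov(A1, rowvar=False)  # within-group scatter
w = np.linalg.solve(Sw, A1.mean(axis=0) - A0.mean(axis=0))
threshold = w @ (A0.mean(axis=0) + A1.mean(axis=0)) / 2.0

pred0 = X0[half:] @ w > threshold  # True means "predicted group 1"
pred1 = X1[half:] @ w > threshold
accuracy = (np.sum(~pred0) + np.sum(pred1)) / (2 * (n - half))

print("per-feature t statistics:", np.round(t_stats, 2))
print("held-out accuracy of the linear discriminant:", accuracy)
```

In this toy setup each feature on its own is swamped by the shared variation, while the difference between the two features separates the groups. With 50 real features and only 200 observations the picture will be much noisier, so any such model should be assessed with cross-validation rather than in-sample fit.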

Also have a look at this question: T-tests, manova or logistic regression - how to compare two groups?