I have a dataset with a categorical target ($y$) and multiple categorical features ($x_1$, $x_2$, ..., $x_i$). I have successfully used a logistic regression model to calculate an odds ratio for each feature individually.

What metric is traditionally used to assess groups of categorical features jointly? The idea is that the $x$ features can be split into separate groups, e.g. Group A ($x_1$, $x_2$, $x_3$), Group B ($x_4$, $x_5$, $x_6$), etc.

My goal is to find a statistical tool for comparing the combined predictive power of the features within each group. Alternatively, my $y$ values are also available as a continuous variable, so I would be grateful for any suggestions given a continuous $y$, not just a categorical $y$. The odds ratio might be the wrong analogy here, and I might be looking for something completely different. Regression might not be the right tool either.
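To make the goal concrete, here is a minimal sketch of one candidate approach, a likelihood-ratio comparison of nested logistic regression models: fit the model with all features, refit it with one group removed, and look at how much the log-likelihood drops. Everything in it is an assumption for illustration: the data are synthetic, the features are binary (categorical features with more levels would need dummy coding first), and `fit_logit_loglik` and `groups` are hypothetical names, not part of any library.

```python
import numpy as np

def fit_logit_loglik(X, y, n_iter=50, ridge=1e-8):
    """Fit a logistic regression by Newton's method and return the
    maximized log-likelihood. Hypothetical helper, for illustration only."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        # Tiny ridge term only stabilizes the linear solve.
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Synthetic data (an assumption): six binary features, where only the
# first three (Group A) actually influence y.
rng = np.random.default_rng(0)
n = 2000
X = rng.integers(0, 2, size=(n, 6)).astype(float)
true_logit = -1.0 + 1.2 * X[:, 0] + 0.8 * X[:, 1] + 0.5 * X[:, 2]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

groups = {"A": [0, 1, 2], "B": [3, 4, 5]}  # column indices per group

ll_full = fit_logit_loglik(X, y)
lr_stats = {}
for name, cols in groups.items():
    keep = [j for j in range(X.shape[1]) if j not in cols]
    ll_reduced = fit_logit_loglik(X[:, keep], y)
    # Likelihood-ratio statistic for dropping the whole group at once.
    lr_stats[name] = 2.0 * (ll_full - ll_reduced)
    print(f"Group {name}: LR statistic = {lr_stats[name]:.2f}, df = {len(cols)}")
```

Under the usual asymptotic assumptions, each statistic would be compared against a chi-square distribution with degrees of freedom equal to the number of coefficients dropped. With a continuous $y$, the analogous comparison would be a partial F-test between nested linear models.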

User81646
  • 101
  • Hi! You should elaborate a bit more on what you mean exactly by grouping variables, and why you think you need to do that. You might be interested in modeling interactions in your regression model (see https://stats.stackexchange.com/questions/600207/what-are-the-indications-that-one-should-be-using-interaction-variables-in-their), but it's not entirely clear to me what you're after. You could also use some dimension reduction method like multiple correspondence analysis, and then include the resulting dimensions in your regression, but again more details would be useful to be sure. – J-J-J Nov 24 '23 at 21:48
