I have 30 binary variables and I want to find out whether any combination of these variables leads to a high value on a metric variable.

For example: I have the binary variables "loud", "hot", "new", "nice", "hard" and the dependent metric variable "liking" (1-100).

I want to find out which combination of "loud", "hot", "new", "nice" and "hard" leads to the highest ratings on "liking". If there were just 2 binary variables, I think a two-factor ANOVA would do the job? But I have 30 binary variables and no idea how to do this.

I could run 30 t-tests to find out which binary variables have an effect on their own. But if I do so, I won't find out how they interact with each other. I could run a regression with all 30 variables and have the same problem.

Is there any method to find the combination of binary ratings that gives the highest liking value?

And as a second, theoretical question: if there is a combination of binary variables which leads to a high rating, does that imply that all the binary variables in this combination are correlated? If yes, I could look for correlations.

Thanks a lot for all your help!

1 Answer

You can't check every possible combination. There are $2^{30} = 1{,}073{,}741{,}824$ of them, so unless you have a truly gigantic sample, you won't even be able to find a case for each of them, let alone model them.
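To make the scale concrete, here is a small sketch of the arithmetic. The sample size of 10,000 is a made-up figure for illustration, not something from the question.

```python
n_vars = 30
n_combinations = 2 ** n_vars  # every on/off pattern of 30 binary variables

# Hypothetical sample size, chosen only for illustration.
sample_size = 10_000

# Each respondent falls into exactly one combination, so at most
# `sample_size` distinct combinations can ever be observed.
max_fraction_observed = sample_size / n_combinations

print(f"{n_combinations:,} possible combinations")
print(f"at most {max_fraction_observed:.6%} of them can appear in the data")
```

Even under this generous assumption, nearly all combinations would have no observations at all, so their means cannot be estimated directly.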

A more tenable approach is to estimate the effect of each binary variable on its own and assume no interactions, or to allow only the two-way interactions (of which there are a mere ${30 \choose 2} = 435$). Whichever combinations you want to check, you can fit a linear regression model with the predictors you want, then look at which combination of covariates the model gives the greatest conditional mean.
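As a rough sketch of that idea, the following uses only the five example variables from the question, with simulated data. The effect sizes, noise level, and sample size are all invented for illustration; `design_matrix` is a hypothetical helper, and NumPy's `lstsq` stands in for whatever regression software you prefer. The model includes every main effect and every two-way interaction, and the fitted conditional mean is then evaluated for all $2^5 = 32$ combinations.

```python
import itertools
import numpy as np

names = ["loud", "hot", "new", "nice", "hard"]
pairs = list(itertools.combinations(range(len(names)), 2))

def design_matrix(X):
    """Columns: intercept, main effects, all two-way interaction products."""
    X = np.asarray(X, dtype=float)
    inter = [X[:, i] * X[:, j] for i, j in pairs]
    return np.column_stack([np.ones(len(X)), X] + inter)

# Simulated data: the "true" effects below are made up for this sketch.
rng = np.random.default_rng(0)
n = 2000
X = rng.integers(0, 2, size=(n, len(names)))
y = (50 + 10 * X[:, 0] - 5 * X[:, 1] + 8 * X[:, 2] + 3 * X[:, 3]
     - 4 * X[:, 4] + 12 * X[:, 1] * X[:, 3] + rng.normal(0, 2, size=n))

# Ordinary least squares fit of the main-effects-plus-interactions model.
beta, *_ = np.linalg.lstsq(design_matrix(X), y, rcond=None)

# Predict the conditional mean for every combination and take the largest.
grid = np.array(list(itertools.product([0, 1], repeat=len(names))))
pred = design_matrix(grid) @ beta
best = int(np.argmax(pred))
best_combo = tuple(int(v) for v in grid[best])
best_mean = float(pred[best])

print(dict(zip(names, best_combo)), round(best_mean, 1))
```

In this simulation "hot" lowers liking on its own but belongs in the best combination because of its interaction with "nice", which is exactly what separate t-tests would miss. Enumerating the grid only works for a handful of variables; with all 30 you would need some other way to search for the maximum of the fitted model.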

Kodiologist
  • 20,116