I am looking to test whether there is a significant relationship between two nominal variables, one with 100 levels and the other with 10 levels. I initially considered a $\chi^2$ test of independence, but unfortunately the expected counts in a number of cells are less than 5, which violates an assumption of the $\chi^2$ test. I also considered Fisher's exact test, but it is computationally very expensive for a 100x10 contingency table. Finally, I considered changing my approach: making the 10-level variable binary (i.e. an observation either falls into one particular category or it doesn't), encoding the 100-level independent variable as dummy variables, and fitting a (large, sparse) logistic regression model. What is the best practice for investigating relationships between two nominal variables with this many levels?
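To see how widespread the small-expected-count problem is: under independence the expected count in cell $(i, j)$ is $E_{ij} = r_i c_j / N$, where $r_i$ and $c_j$ are the row and column totals. A 100x10 table has 1,000 cells, so the average expected count is only $N/1000$, and unless $N$ is well above 5,000 many cells will fall below 5. Below is a minimal NumPy sketch of that check. The function names and the simulated data are hypothetical, used only for illustration.

```python
import numpy as np

def expected_counts(table):
    """Expected cell counts under independence: E_ij = row_i * col_j / N."""
    table = np.asarray(table, dtype=float)
    row_totals = table.sum(axis=1, keepdims=True)  # shape (r, 1)
    col_totals = table.sum(axis=0, keepdims=True)  # shape (1, c)
    return row_totals @ col_totals / table.sum()    # outer product, shape (r, c)

def fraction_small_expected(table, threshold=5.0):
    """Share of cells whose expected count is below `threshold`."""
    return float(np.mean(expected_counts(table) < threshold))

# Hypothetical data: 2,000 observations spread over a 100 x 10 table.
rng = np.random.default_rng(0)
x = rng.integers(0, 100, size=2000)  # 100-level variable, integer-coded
y = rng.integers(0, 10, size=2000)   # 10-level variable, integer-coded
table = np.zeros((100, 10))
np.add.at(table, (x, y), 1)          # cross-tabulate the paired observations
print(fraction_small_expected(table))
```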
kjetil b halvorsen
wellington
A test approach that sits in between the asymptotic test and the exact test is the Monte Carlo test. Instead of enumerating all possible permutations, it simulates a large number of random permutations. – ttnphns Jan 15 '15 at 18:51
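A minimal sketch of such a Monte Carlo (permutation) test in Python with NumPy, assuming each observation's two levels are stored as integer codes (0 to 99 and 0 to 9); the function names are hypothetical. Shuffling the 10-level labels keeps both sets of marginal totals fixed, which is the same conditioning Fisher's exact test uses, but only a random sample of permutations is evaluated. The p-value is estimated as $(1 + \#\{\text{permuted } \chi^2 \ge \text{observed } \chi^2\}) / (B + 1)$, and it does not rely on the large-sample $\chi^2$ approximation. In R, `chisq.test(tab, simulate.p.value = TRUE, B = 2000)` does something similar by simulating tables with the observed margins.

```python
import numpy as np

def chi2_statistic(x, y, n_x, n_y):
    """Pearson chi-square statistic for integer-coded labels x in [0, n_x), y in [0, n_y)."""
    observed = np.zeros((n_x, n_y))
    np.add.at(observed, (x, y), 1)  # cross-tabulate
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True) / observed.sum())
    mask = expected > 0  # empty rows/columns contribute nothing; avoid 0/0
    return float(np.sum((observed[mask] - expected[mask]) ** 2 / expected[mask]))

def monte_carlo_chi2_test(x, y, n_perm=2000, seed=None):
    """Permutation p-value for independence of two nominal variables."""
    x = np.asarray(x)
    y = np.asarray(y)
    n_x, n_y = int(x.max()) + 1, int(y.max()) + 1
    rng = np.random.default_rng(seed)
    observed_stat = chi2_statistic(x, y, n_x, n_y)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling y breaks any association while keeping both margins fixed.
        stat = chi2_statistic(x, rng.permutation(y), n_x, n_y)
        if stat >= observed_stat - 1e-9:  # small tolerance for floating-point ties
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)  # counts the observed table itself
    return observed_stat, p_value

# Hypothetical data: 1,000 observations, with y generated independently of x.
rng = np.random.default_rng(1)
x = rng.integers(0, 100, size=1000)  # 100-level variable
y = rng.integers(0, 10, size=1000)   # 10-level variable
stat, p = monte_carlo_chi2_test(x, y, n_perm=999, seed=2)
print(stat, p)
```

The precision of the estimated p-value depends on the number of permutations $B$, so $B$ can be increased when the result lies close to the chosen significance level.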
Maybe methods like those in http://people.unipmn.it/rapallo/papers/alessandria_talk_handout are more appropriate? – kjetil b halvorsen May 17 '17 at 11:42