My goal is to predict y, but my dependent variable y has more than 20 levels. I don't think a multinomial model would be a good choice. Any suggestions or pointers on which modeling methodology I should explore for this problem would be much appreciated. Thanks in advance.
bison2178
1 Answer
Predicting a discrete outcome with many levels is a hard problem. A common approach is one-vs-others (also called one-vs-rest): you build many models, one per level, and each model only has to detect its own level of the output.
Here is why: suppose you have a 100-sided die and you know its true distribution, where $P(S_1)=0.1$ and $P(S_2)=P(S_3)=\dots=P(S_{100})=0.9/99\approx 0.00909$. What does maximum a posteriori (MAP) estimation do here? It always guesses the first side $S_1$, since that side has the largest probability of all. However, that guess is wrong $90\%$ of the time!
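To make the die example concrete, here is a small Python sketch that computes the MAP guess and its error rate exactly, then checks it by simulation. The variable names, the random seed, and the number of rolls are my own choices for illustration, not part of the original argument.

```python
from fractions import Fraction
import random

# True distribution of the 100-sided die from the example:
# P(S_1) = 0.1, and the remaining 0.9 is split evenly over the other 99 sides.
probs = [Fraction(1, 10)] + [Fraction(9, 10) / 99] * 99
assert sum(probs) == 1

# MAP guess: the side with the largest probability (index 0 stands for S_1).
map_side = max(range(len(probs)), key=lambda i: probs[i])
map_accuracy = probs[map_side]   # probability that the guess is right
map_error = 1 - map_accuracy     # probability that the guess is wrong

print("MAP guess: side", map_side + 1)
print("exact accuracy:", float(map_accuracy))
print("exact error rate:", float(map_error))

# Monte Carlo check (seed and sample size are arbitrary assumptions).
rng = random.Random(0)
n_rolls = 100_000
rolls = rng.choices(range(100), weights=[float(p) for p in probs], k=n_rolls)
empirical_error = sum(r != map_side for r in rolls) / n_rolls
print("simulated error rate:", empirical_error)
```

The point is that even with the true distribution in hand, the single best guess is right only one time in ten, so low accuracy here is a property of the problem and not of the model.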
For details, please check my answers in this post
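The one-vs-others idea from the first paragraph can be sketched roughly as follows. This is only a minimal illustration under my own assumptions: I use a hand-written logistic regression as the binary model, a tiny made-up 2-D data set with three levels, and hypothetical helper names (`train_binary`, `train_one_vs_rest`, `predict`). With 20+ levels you would use the same loop, just with more models and most likely a library implementation.

```python
import math

def train_binary(X, t, epochs=200, lr=0.1):
    """Fit a logistic regression (weights and bias) by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, t):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))          # clamp to keep exp() well behaved
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - target                      # gradient of the log loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def train_one_vs_rest(X, y):
    """Train one binary model per level: 'this level' (1) versus 'all others' (0)."""
    models = {}
    for level in sorted(set(y)):
        t = [1.0 if yi == level else 0.0 for yi in y]
        models[level] = train_binary(X, t)
    return models

def predict(models, x):
    """Return the level whose binary model gives the highest score."""
    def score(level):
        w, b = models[level]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=score)

# Made-up toy data: three levels, each a small grid of points around a center.
centers = {"a": (0.0, 0.0), "b": (4.0, 0.0), "c": (0.0, 4.0)}
offsets = (-0.5, 0.0, 0.5)
X, y = [], []
for level, (cx, cy) in centers.items():
    for dx in offsets:
        for dy in offsets:
            X.append([cx + dx, cy + dy])
            y.append(level)

models = train_one_vs_rest(X, y)
accuracy = sum(predict(models, x) == yi for x, yi in zip(X, y)) / len(y)
print("number of binary models:", len(models))
print("training accuracy:", accuracy)
```

Each binary model answers an easier question ("is it this level or not?"), and the final prediction simply takes the level whose model is most confident.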
y that has so many levels. This may be due to my inexperience on this issue. – bison2178 Jun 03 '16 at 18:51