I have some output data (around 800 data points) that very nicely fits a negative binomial distribution. I checked using fitdistr() in R and it is a very good fit.
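For concreteness, the kind of check described above might look roughly like the sketch below. The counts are simulated stand-ins (not the real data), and the object names `y`, `fit`, `observed` and `expected` are made up for illustration.

```r
library(MASS)

set.seed(1)
# Simulated stand-in for the ~800 observed counts (assumption, not the real data)
y <- rnbinom(800, size = 2, mu = 10)

# Maximum-likelihood fit of a negative binomial distribution to the counts
fit <- fitdistr(y, "negative binomial")
print(fit)  # estimates of size and mu with their standard errors

# Rough check of the fit: observed vs. expected frequency of each count value
counts   <- 0:max(y)
observed <- as.vector(table(factor(y, levels = counts)))
expected <- length(y) * dnbinom(counts,
                                size = fit$estimate[["size"]],
                                mu   = fit$estimate[["mu"]])
head(cbind(count = counts, observed, expected = round(expected, 1)))
```

Note that this only describes the marginal distribution of the output, without reference to any predictors.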
Given this, my plan was to use negative binomial regression on the other variables to build a model that can predict the output data. I have around 65 candidate variables. I used glm.nb() in R to fit a model with all of them, and only about 3 of the variables were significant.
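A minimal sketch of that full-model fit, assuming a data frame with the response in a column `y` and every other column a predictor. The data here are simulated with 5 predictors instead of 65, and the names `d`, `x1` to `x5` and `full` are hypothetical.

```r
library(MASS)

set.seed(2)
n <- 800

# Simulated predictors (assumption: only x1 and x2 actually affect the response)
d <- as.data.frame(matrix(rnorm(n * 5), nrow = n, ncol = 5))
names(d) <- paste0("x", 1:5)
d$y <- rnbinom(n, size = 2, mu = exp(1 + 0.5 * d$x1 - 0.3 * d$x2))

# Negative binomial regression of y on all other columns
full <- glm.nb(y ~ ., data = d)

# Coefficient table: estimate, standard error, z value, p-value
print(summary(full)$coefficients)
```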
I want to drop the variables that aren't contributing much, and have been thinking about using PCA to identify the ones I can cut out. Is this a sensible way of doing it? Or can PCA not be used for a negative binomial model?
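To make the PCA idea concrete, here is a sketch of what it would produce on a simulated predictor matrix (the names `X` and `pca` are hypothetical). Two things are visible in the output: prcomp() is given only the predictors and never sees the response, and each component is a weighted combination of all the predictors rather than a subset of them.

```r
set.seed(3)
n <- 800

# Simulated predictors only; the response is not part of a PCA
X <- as.data.frame(matrix(rnorm(n * 5), nrow = n, ncol = 5))
names(X) <- paste0("x", 1:5)

# PCA on centred and scaled predictors
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of predictor variance explained by each component
print(summary(pca)$importance)

# Loadings: every component has a weight on every original variable
print(round(pca$rotation, 2))
```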
I have also tried using stepAIC() in R, but I couldn't get it to work: it stopped after a single iteration, whether I started from a simple one-variable model and worked up or worked backwards from the complete model. I have also read criticism of the stepAIC approach in general. What are the problems with using it?
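For reference, a sketch of both search directions on simulated data (the names `d`, `null`, `full`, `fwd` and `bwd` are hypothetical). One documented detail that may be relevant to a forward search stopping immediately: when the `scope` argument is omitted, stepAIC() uses the initial model as the upper model, so there are no terms it is allowed to add. Whether that explains the behaviour described above is only a guess.

```r
library(MASS)

set.seed(4)
n <- 800

# Simulated data (assumption: only x1 and x2 actually affect the response)
d <- as.data.frame(matrix(rnorm(n * 5), nrow = n, ncol = 5))
names(d) <- paste0("x", 1:5)
d$y <- rnbinom(n, size = 2, mu = exp(1 + 0.5 * d$x1 - 0.3 * d$x2))

null <- glm.nb(y ~ 1, data = d)  # intercept-only starting model
full <- glm.nb(y ~ ., data = d)  # model with every predictor

# Forward search: the upper scope must list the candidate terms explicitly
fwd <- stepAIC(null,
               scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3 + x4 + x5),
               direction = "forward", trace = FALSE)

# Backward search from the full model
bwd <- stepAIC(full, direction = "backward", trace = FALSE)

print(formula(fwd))
print(formula(bwd))
```

With 65 real predictors the upper formula could be built from the column names instead of being typed out, for example with reformulate().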
Any tips on both the points above (and negative binomial regression model selection) would be appreciated!