My aim is to fit a generalized linear model (glm) with 1 response variable and 13 explanatory variables. Unfortunately, 3 out of the 10 explanatory variables contain NA values (for these 3 variables, 2/3 of the data set is NA). I realized that the "step" function does not work with NA values (https://stackoverflow.com/questions/11819472/why-does-the-number-of-rows-change-during-aic-in-r-how-to-ensure-that-this-does). Therefore my question: how can I proceed to automatically improve my glm without eliminating my sites with NA values?
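A minimal sketch of why the row count changes, written in Python rather than R purely for illustration. It assumes the usual default in which any row with a missing value in a model variable is dropped before fitting; the data, the variable names (`y`, `x1`, `x2`, `x3`), and the helper `complete_cases` are all hypothetical. Because each candidate model uses a different set of variables, each one can end up fitted to a different number of rows, so their AIC values are not comparable.

```python
# Hypothetical data: None marks a missing value (NA in R).
rows = [
    {"y": 1.2, "x1": 0.5, "x2": None, "x3": 3.1},
    {"y": 0.7, "x1": 1.5, "x2": 2.0, "x3": 2.9},
    {"y": 2.1, "x1": 0.9, "x2": None, "x3": None},
    {"y": 1.8, "x1": 1.1, "x2": 1.4, "x3": 3.3},
    {"y": 0.4, "x1": 0.2, "x2": None, "x3": 2.5},
    {"y": 1.5, "x1": 1.8, "x2": None, "x3": 3.0},
]


def complete_cases(data, variables):
    """Return the rows with no missing value in any of the given variables."""
    return [r for r in data if all(r[v] is not None for v in variables)]


full_model = ["y", "x1", "x2", "x3"]
reduced_model = ["y", "x1", "x3"]  # candidate model after dropping x2

n_full = len(complete_cases(rows, full_model))
n_reduced = len(complete_cases(rows, reduced_model))

# The two candidate models would be fitted to different subsets of the data.
print(n_full, n_reduced)
```

Restricting every candidate model to the same complete-case rows would make the comparison valid, but at the cost of discarding the sites with missing values, which is exactly what the question wants to avoid.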

This question has been duplicated (I do not understand why; it seems as if somebody migrated it): R Linear model step NA values

kalakaru
  • First, note that stepwise regression is probably something you do not want to do; see, e.g., http://stats.stackexchange.com/questions/13686/what-are-modern-easily-used-alternatives-to-stepwise-regression/13698 and http://stats.stackexchange.com/questions/20836/algorithms-for-automatic-model-selection/20856#20856. As for the NAs, you are probably interested in imputing them: http://en.wikipedia.org/wiki/Imputation_%28statistics%29 or http://www.jclinepi.com/article/S0895-4356%2806%2900197-1/pdf – Tim Apr 11 '15 at 08:06
  • Harrell's book on regression has a chapter on the topic and he suggests imputing/interpolating as well (Similar to this post). Another option would be to use WinBUGS/OpenBUGS/JAGS and use a Bayesian approach to estimate your missing values. – Richard Erickson Apr 11 '15 at 13:02
  • Could you describe your data in greater detail? Do you know why those values are missing? The answer to your question depends on this. – Tim Apr 11 '15 at 14:34
  • Missing data handling (via multiple imputation or other methods) prior to analysis might be a good option. Check my related answer. – Aleksandr Blekh Apr 12 '15 at 07:48
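
The idea behind the imputation suggestions in the comments can be sketched roughly as follows, again in Python for illustration only. This is a deliberately simplified stand-in (random hot-deck draws from the observed values, repeated `m` times, with the resulting estimates averaged), not the chained-equations or Bayesian approaches the commenters refer to. It ignores relationships between variables and implicitly assumes the values are missing completely at random. The function names, the variable `x2`, and the data are hypothetical.

```python
import random
import statistics


def hot_deck_impute(values, rng):
    """Fill each missing entry (None) with a random draw from the observed entries."""
    observed = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(observed) for v in values]


def pooled_mean(values, m=5, seed=0):
    """Impute m times and average the m resulting estimates of the mean."""
    rng = random.Random(seed)
    estimates = [statistics.mean(hot_deck_impute(values, rng)) for _ in range(m)]
    return statistics.mean(estimates)


# Hypothetical explanatory variable with 2/3 of its values missing.
x2 = [None, 2.0, None, 1.4, None, None]

completed = hot_deck_impute(x2, random.Random(1))  # one completed data set
estimate = pooled_mean(x2)                         # estimate pooled over m data sets
print(completed, estimate)
```

In a real analysis the model of interest would be fitted to each completed data set and the coefficients pooled, rather than a simple mean. With two thirds of the values missing, the reason for the missingness matters a great deal, as one of the comments points out.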

0 Answers