I'm fitting a logistic regression in R on binary data (0s and 1s) with a sample size of around 300, predicting one target variable (varp).
If I use one independent variable (varx), it is significant (p = 0.03, AIC = 200): glm(formula = varp ~ varx, family = binomial, data = mydata)
Coefficients:
            Estimate Std. Error z value   Pr(>|z|)
(Intercept)  -1.0251     0.2215  -4.681 0.00000245 ***
indepvar1    -0.6551     0.3612  -2.118     0.0322 *
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 211.06 on 205 degrees of freedom
Residual deviance: 206.36 on 204 degrees of freedom
AIC: 200
But when I use several independent variables of interest, the AIC drops to 170: glm(formula = varp ~ varx+varb+vargg+varkkk...., family = binomial, data = mydata)
How do I select the model (the one with a single variable or the one with the group of variables) that best predicts varp?
- the model with one independent variable (varx), with AIC 200, or
- the model with a group of variables, with AIC 170, in which varx becomes non-significant and another variable is significant instead ...
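As a rough illustration of the comparison being asked about, here is a minimal base R sketch. The data are simulated (the real `mydata` is not shown), the effect sizes are made up, and only three of the predictor names from the question are used, so the numbers will not match the AIC values of 200 and 170 above. It compares the two models by AIC, where the usual reading is that the lower value is preferred, and by a likelihood-ratio test, which applies because the one-variable model is nested in the larger one.

```r
# Simulated stand-in for `mydata`; the real data set is not shown in the question.
set.seed(42)
n <- 300
mydata <- data.frame(
  varx  = rbinom(n, 1, 0.4),
  varb  = rbinom(n, 1, 0.5),
  vargg = rbinom(n, 1, 0.3)
)
# Assumed data-generating process, for illustration only.
eta <- -1 - 0.6 * mydata$varx + 0.9 * mydata$varb
mydata$varp <- rbinom(n, 1, plogis(eta))

fit_one  <- glm(varp ~ varx, family = binomial, data = mydata)
fit_many <- glm(varp ~ varx + varb + vargg, family = binomial, data = mydata)

# Lower AIC is preferred; both models must be fitted to the same rows.
aic_table <- AIC(fit_one, fit_many)
print(aic_table)

# Likelihood-ratio test: valid here because fit_one is nested in fit_many.
lrt <- anova(fit_one, fit_many, test = "Chisq")
print(lrt)
```

One thing worth checking on real data: if the extra variables contain missing values, `glm` drops those rows, so the two models may be fitted to different observations and their AIC values are then not directly comparable.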
[...] AIC is smaller and variables are included. I would really recommend reading more about discarding non-significant variables. It is seen as a bad but commonly adopted idea in statistics, here. And I still think a model with only one variable is not really a model, or at most a very dubious one. Just because 9 of the 10 variables are not significant does not mean they are useless. – Thomas Aug 20 '20 at 21:50
[...] glmnet package and the opportunities with ridge, lasso & elastic net models for binary outcomes here – Thomas Aug 20 '20 at 21:50
Do you use 'Targets' or another package that can deal with that situation in an automatic manner? Any examples using lasso / glmnet? – Den Aug 21 '20 at 09:21
[...] glmnet. I followed this. – Thomas Aug 21 '20 at 11:35
Hi Thomas, a new question arises: the intercept is sometimes not significant. I've read that this is not important and that it is the explanatory variables that matter. How do I select the model if one has a significant intercept (p < 0.05) and another has a non-significant one (p = 0.69)? (Same intercept, but in another model.)
I did not get your point that models with only one significant independent variable are not good. I found one model and selected the variable with a very small p-value; the AIC of that model is average, so there are models with a better AIC but a weaker p-value for the independent variable.
[...] AIC and BIC (individually, as described before). I would choose the model with the lowest AIC if positive, or highest if AIC is negative; similarly for BIC. – Thomas Aug 27 '20 at 11:43
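Since the comments ask for a lasso / glmnet example, here is a minimal sketch of what that might look like for a binary outcome. It is not the tutorial linked above (that link is not preserved here). The data are simulated, the coefficients are assumptions chosen for illustration, and the predictor names are borrowed from the question. It requires the glmnet package to be installed.

```r
library(glmnet)

set.seed(1)
n <- 300
# Simulated predictors standing in for varx, varb, vargg, varkkk (names from the question).
x <- matrix(rnorm(n * 4), nrow = n,
            dimnames = list(NULL, c("varx", "varb", "vargg", "varkkk")))
# Assumed relationship, for illustration only.
y <- rbinom(n, 1, plogis(-1 + 0.8 * x[, "varb"] - 0.5 * x[, "varx"]))

# alpha = 1 is the lasso, alpha = 0 is ridge, values in between give the elastic net.
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10)

# Coefficients at the lambda within one standard error of the CV minimum;
# predictors shrunk exactly to zero are effectively dropped.
lasso_coef <- coef(cv_fit, s = "lambda.1se")
print(lasso_coef)

# Predicted probabilities for the first five rows.
p_hat <- predict(cv_fit, newx = x[1:5, ], s = "lambda.1se", type = "response")
print(p_hat)
```

With this approach the penalty, chosen by cross-validation, decides which variables stay in the model, so no variable is kept or discarded based on its individual p-value.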