
I need to perform model selection using a standard p-value approach. Using logistic regression, I would like to compare the following models:

Y =       A + B + C + D
Y = A*B + A + B + C + D
Y = A*C + A + B + C + D
Y = A*D + A + B + C + D

I performed the following analysis in R:

first  <- glm(Y~G+N+E+C,     family="binomial")
second <- glm(Y~G*N+G+N+E+C, family="binomial")
third  <- glm(Y~G*E+G+N+E+C, family="binomial")
fourth <- glm(Y~G*C+G+N+E+C, family="binomial")

summary(first)
summary(second)
summary(third)
summary(fourth)

anova(first, second, third, fourth, test="Chisq")

However, I do not think this output is what I need for model selection based on p-values:

anova(first, second, third, fourth, test="Chisq")
# Analysis of Deviance Table
# Model 1: Y ~ G + N + E + C
# Model 2: Y ~ G * N + G + N + E + C
# Model 3: Y ~ G * E + G + N + E + C
# Model 4: Y ~ G * C + G + N + E + C
#   Resid. Df | Resid. Dev | Df | Deviance  | Pr(>Chi)
# 1       595 |     609.90 |    |           |
# 2       594 |     609.90 |  1 |  0.000169 |   0.9896
# 3       594 |     609.81 |  0 |  0.087775 |
# 4       594 |     609.90 |  0 | -0.085001 |

So how should I perform model selection here using a standard p-value approach?

Sophie

1 Answer


Could it be possible to formulate these models as a nested structure?

First you would have a model like

Y=A*B+A*C+A*D+A+B+C+D 

which is H1, and then

Y=A*B+A*C+A+B+C+D

which is H0.

Now you can do a likelihood-ratio test. The statistic is -2*logLik(H0)+2*logLik(H1), which under H0 is approximately distributed as $\chi^2$ with $(k+p)-k = p$ degrees of freedom, where $k$ is the number of parameters in H0 and $p$ is the number of extra parameters in H1. If the statistic is large (small p-value), H0 is rejected: the extra terms in H1 are needed to explain variation in the data, which shows up as the increase in deviance when the second largest model is used instead of the largest.
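As a minimal sketch of that test in R: the data below are simulated, and the variable names, sample size, and effect sizes are assumptions made only so that the example runs. It computes the statistic by hand and then checks it against `anova()`.

```r
# Simulated data: names and effect sizes are illustrative assumptions.
set.seed(1)
n <- 600
d <- data.frame(A = rnorm(n), B = rnorm(n), C = rnorm(n), D = rnorm(n))
d$Y <- rbinom(n, 1, plogis(0.5 * d$A - 0.3 * d$B + 0.2 * d$C))

# H1 contains all three interactions with A; H0 drops the A:D term.
h1 <- glm(Y ~ A + B + C + D + A:B + A:C + A:D, family = binomial, data = d)
h0 <- glm(Y ~ A + B + C + D + A:B + A:C,       family = binomial, data = d)

# Likelihood-ratio statistic: -2*logLik(H0) + 2*logLik(H1).
lr_stat <- -2 * as.numeric(logLik(h0)) + 2 * as.numeric(logLik(h1))

# Degrees of freedom: number of extra parameters in H1.
lr_df <- attr(logLik(h1), "df") - attr(logLik(h0), "df")

# Upper-tail chi-square probability gives the p-value.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# The same comparison via anova(): smaller model first, larger model second.
tab <- anova(h0, h1, test = "Chisq")
print(tab)
```

For a binomial GLM fitted to 0/1 responses, the deviance difference reported by `anova()` should coincide with the hand-computed statistic.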

This can be done in steps: first take the largest model as H1 and the second largest as H0; in the next step (if H0 is not rejected) take the second largest model as H1 and the third largest as H0. And so on...
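The stepwise procedure might look roughly like the sketch below. The simulated data, the order in which interaction terms are dropped, and the 0.05 threshold are all assumptions for illustration. The loop moves on to the next smaller model only while the smaller model (H0) is not rejected, and stops at the first rejection.

```r
# Simulated data: names and effect sizes are illustrative assumptions.
set.seed(2)
n <- 600
d <- data.frame(A = rnorm(n), B = rnorm(n), C = rnorm(n), D = rnorm(n))
d$Y <- rbinom(n, 1, plogis(0.5 * d$A - 0.3 * d$B + 0.8 * d$A * d$B))

# Nested sequence, largest model first (the drop order is an assumption).
formulas <- list(
  Y ~ A + B + C + D + A:B + A:C + A:D,
  Y ~ A + B + C + D + A:B + A:C,
  Y ~ A + B + C + D + A:B,
  Y ~ A + B + C + D
)
fits <- lapply(formulas, function(f) glm(f, family = binomial, data = d))

alpha <- 0.05      # significance threshold (assumed)
selected <- 1      # index of the model currently playing the role of H1
for (i in seq_len(length(fits) - 1)) {
  # Compare H0 (model i + 1) against H1 (model i) with a likelihood-ratio test.
  tab <- anova(fits[[i + 1]], fits[[i]], test = "Chisq")
  p <- tab[["Pr(>Chi)"]][2]
  cat(sprintf("step %d: p = %.4f\n", i, p))
  if (p < alpha) break   # H0 rejected: keep the larger model and stop
  selected <- i + 1      # H0 not rejected: the smaller model becomes the new H1
}
print(formulas[[selected]])
```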

Comp_Warrior
Analyst
  • Hi, I only need to compare the models as specified above, which means that I have non-nested models. I need to use a standard p-value approach, and the only one I could think of here was the chi-square test. That means I can only compare model 1 with each of the others, but not model 2 with model 3, for example. However, the statistics favor model 1, with p-values above .72. But then the question remains: is this the best way to perform model selection using a standard p-value approach? – Sophie Aug 28 '13 at 20:50
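
The pairwise comparisons described in this comment (model 1 against each interaction model in turn) could be sketched as follows. The data are simulated and the predictors are assumed to be numeric; the real G, N, E and C may well be coded differently. Each interaction model contains model 1 as a special case, so each pair is nested even though the three interaction models are not nested in one another.

```r
# Simulated stand-in for the real data: an assumption for illustration only.
set.seed(3)
n <- 600
d <- data.frame(G = rnorm(n), N = rnorm(n), E = rnorm(n), C = rnorm(n))
d$Y <- rbinom(n, 1, plogis(0.4 * d$G - 0.2 * d$N))

# Main-effects model (model 1).
first <- glm(Y ~ G + N + E + C, family = binomial, data = d)

# Each candidate adds exactly one interaction with G; G * N expands to G + N + G:N.
candidates <- list(
  second = glm(Y ~ G * N + E + C, family = binomial, data = d),
  third  = glm(Y ~ G * E + N + C, family = binomial, data = d),
  fourth = glm(Y ~ G * C + N + E, family = binomial, data = d)
)

# One likelihood-ratio test per candidate, always against model 1.
p_values <- sapply(candidates, function(m) {
  anova(first, m, test = "Chisq")[["Pr(>Chi)"]][2]
})
print(p_values)
```

Passing the models two at a time avoids the rows with 0 degrees of freedom that appear when all four models are given to `anova()` in a single call.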