
I fit a logistic regression model with 14 predictors. Here are the code and output:

veg.fit <- glm(veg~., family = binomial, data=df.c)
summary(veg.fit)
Call:
glm(formula = veg ~ ., family = binomial, data = df.c)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.26825  -0.50300  -0.22594  -0.08373   2.85784

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) 11.8685548  9.0614215   1.310   0.1903
gender      -0.4982546  1.1332216  -0.440   0.6602
age          0.0122210  0.0653607   0.187   0.8517
hsgpa       -1.6201620  1.4236274  -1.138   0.2551
cogpa       -1.6635211  1.7608781  -0.945   0.3448
dhome       -0.0003964  0.0004423  -0.896   0.3700
dres         0.1457214  0.1326084   1.099   0.2718
tv           0.0158743  0.0845977   0.188   0.8512
sport       -0.2994841  0.2173609  -1.378   0.1683
news         0.1128158  0.1988480   0.567   0.5705
aids         0.0811258  0.1685509   0.481   0.6303
affil       -0.4934158  0.6130881  -0.805   0.4209
ideol       -1.0391178  0.5932011  -1.752   0.0798 .
relig        0.9825565  0.7048663   1.394   0.1633
abor         0.1605618  1.8966853   0.085   0.9325


Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 50.725  on 59  degrees of freedom

Residual deviance: 36.613 on 45 degrees of freedom

AIC: 66.613

Number of Fisher Scoring iterations: 7

Then I ran a likelihood ratio test of the null hypothesis $\beta_{1}=\dots=\beta_{14}=0$:

1-pchisq(50.725-36.613,59-54)
0.01491341

The test is significant, but when I check the coefficients of the individual predictors, none of them is significant (all of their p-values are large). How could this happen?

  • Residual deviance has df=45, but you typed 54 in the pchisq function. – danbrown May 26 '21 at 22:14
  • While the answer below by @EdM is correct in general, if you use the right df you get a global test with 14 df, and pchisq(50.725-36.613,59-45,lower.tail=FALSE) is 0.44, so there's not even anything to explain in this specific case. – Thomas Lumley May 27 '21 at 00:37
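
To make the point in the two comments above concrete, here is a minimal sketch of the global likelihood ratio test using only the deviances and degrees of freedom printed in the summary output. The variable names (null_dev, lr_stat, p_global, and so on) are illustrative and not part of the original post.

```r
# Deviances and degrees of freedom copied from the summary() output in the question
null_dev  <- 50.725; null_df  <- 59
resid_dev <- 36.613; resid_df <- 45

lr_stat <- null_dev - resid_dev   # likelihood ratio statistic, 14.112
lr_df   <- null_df - resid_df     # 14 degrees of freedom, one per predictor

# Global test with the degrees of freedom taken from the output
p_global <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# The calculation as typed in the question (59 - 54 = 5 df), for comparison
p_typo <- pchisq(lr_stat, df = 59 - 54, lower.tail = FALSE)

round(c(global = p_global, typo = p_typo), 4)
```

With the fitted object from the question available, anova(update(veg.fit, . ~ 1), veg.fit, test = "Chisq") should carry out the same comparison without copying numbers by hand, assuming no rows were dropped for missing values.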

1 Answer


Your model includes too many predictors for the number of observations.

A rule of thumb for logistic regression is to have no more than 1 predictor for each 15 or so members of the minority class. With about 60 total cases (based on the null degrees of freedom), you have no more than 30 members of the minority class. So anything over about 2 or 3 predictors is likely to lead to an overfit model.
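
The arithmetic behind that rule of thumb can be sketched as follows. The value 15 is the "or so" threshold mentioned above, and the minority-class count is only an upper bound, since the question does not report how many cases fall in each class. All variable names are illustrative.

```r
# Sample size implied by the null deviance: 59 degrees of freedom + 1
n_total <- 59 + 1

# Upper bound on the minority class: it cannot exceed half of the cases.
# With the real data one would use min(table(df.c$veg)) instead (assumes veg is the 0/1 outcome).
max_minority <- floor(n_total / 2)

# Rule-of-thumb threshold: about 15 minority-class members per predictor
cases_per_predictor <- 15

max_predictors <- floor(max_minority / cases_per_predictor)
max_predictors
```

Because 30 is only an upper bound, the number of predictors the data can support may well be smaller than this.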

So you can seem to fit your data with a large number of predictors and get a "significant" overall p-value while no single predictor shows a significant association with outcome. The fitted coefficients have basically just been contorted to match this particular data set, so your model would probably not work well on another sample from the population.
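
A small simulation can illustrate this kind of overfitting. This is a sketch on made-up data, not the data from the question: the 14 predictors are pure noise, the outcome is a fair coin flip, and make_data and mean_deviance are hypothetical helper names.

```r
set.seed(2021)

# Hypothetical stand-in data: p noise predictors, outcome unrelated to any of them
make_data <- function(n, p = 14) {
  d <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
  d$y <- rbinom(n, size = 1, prob = 0.5)
  d
}

train <- make_data(60)     # same number of cases as in the question
fresh <- make_data(5000)   # a new sample from the same population

fit <- glm(y ~ ., family = binomial, data = train)

# Mean binomial deviance per observation (lower means a better fit)
mean_deviance <- function(model, d) {
  pr <- predict(model, newdata = d, type = "response")
  pr <- pmin(pmax(pr, 1e-12), 1 - 1e-12)   # guard against log(0)
  -2 * mean(d$y * log(pr) + (1 - d$y) * log(1 - pr))
}

dev_train <- mean_deviance(fit, train)
dev_fresh <- mean_deviance(fit, fresh)
c(train = dev_train, fresh = dev_fresh)
```

The model is expected to look better on the 60 cases it was fitted to than on fresh data, even though no predictor carries any real information about the outcome.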

You should use your knowledge of the subject matter to select (without looking at outcomes) a subset of predictors or to combine multiple related predictors into single combined predictors. Alternatively, look into penalized regression methods like ridge regression or LASSO, which can help with this type of situation with too many predictors for too few data points.
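
As a rough sketch of the penalized approach, the glmnet package fits ridge (alpha = 0) and LASSO (alpha = 1) logistic regressions and picks the penalty by cross-validation. The data below are simulated stand-ins with the same dimensions as in the question. With the real data, one would presumably build the predictor matrix with model.matrix(veg ~ ., data = df.c)[, -1] and take y from df.c$veg.

```r
library(glmnet)

set.seed(42)

# Simulated stand-in for the real data: 60 cases, 14 predictors
n <- 60; p <- 14
x <- matrix(rnorm(n * p), nrow = n, ncol = p,
            dimnames = list(NULL, paste0("x", 1:p)))
y <- rbinom(n, size = 1, prob = plogis(x[, 1]))   # only x1 is related to the outcome

# Cross-validated penalized logistic regression
cv_ridge <- cv.glmnet(x, y, family = "binomial", alpha = 0, nfolds = 5)
cv_lasso <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 5)

# Coefficients at the penalty that minimizes cross-validated deviance
ridge_coef <- coef(cv_ridge, s = "lambda.min")
lasso_coef <- coef(cv_lasso, s = "lambda.min")
lasso_coef
```

Ridge shrinks all coefficients toward zero but keeps every predictor, while LASSO can set some coefficients exactly to zero. With only 60 cases the cross-validated choice of penalty will itself be noisy, so the selected predictors should be read with caution.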

EdM