I'm running a logistic regression model in which, anecdotally, I expected age to be a very large factor. The charts I made in Excel before running the model in R show how support lines up by age:

[Image: Excel chart of support by age]

Looks pretty significant.

But when I run the model, as shown below, age is the only predictor that is not significant, which was very surprising:

> attach(mydata) 
> 
> # Define variables 
> 
> Y <- cbind(support)
> X <- cbind(sex, region, age, supportscore1, supportscore2, county)
>
> # Logit model coefficients 
> 
> logit <- glm(Y ~ X, family=binomial (link = "logit"), na.action = na.exclude) 
> 
> summary(logit) 

Call:
glm(formula = Y ~ X, family = binomial(link = "logit"), na.action = na.exclude)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1019  -0.7609   0.5231   0.7101   2.3965  

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)            4.013446   0.440962   9.102  < 2e-16 ***
Xsex                  -0.229256   0.104859  -2.186 0.028792 *  
Xregion               -1.103308   0.091497 -12.058  < 2e-16 ***
Xage                   0.004569   0.003209   1.424 0.154512    
Xsupportscore1        -0.019262   0.005732  -3.360 0.000778 ***
Xsupportscore2         0.019810   0.005264   3.764 0.000168 ***
Xcounty               -0.047581   0.011161  -4.263 2.02e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2871.5  on 2072  degrees of freedom
Residual deviance: 2245.5  on 2066  degrees of freedom
  (66 observations deleted due to missingness)
AIC: 2259.5

Number of Fisher Scoring iterations: 4

My only guess on this is that the previous support scores (both 0-100 numerical values) I'm using may have already taken age into account, and the model doesn't want to count it twice. Though, to compare, region and county are just two different ways of cutting up the geography -- and those both seem significant.
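One way I could check that guess is to fit the model with and without a support score and see what happens to the age coefficient. Here is a minimal sketch on simulated data, not my real data: the data frame `sim`, the single `score` column, and every coefficient are made up for illustration, and the score is deliberately built to depend on age.

```r
# Simulated data only: `sim`, `score`, and all coefficients are hypothetical.
set.seed(1)
n <- 2000
age <- round(runif(n, 18, 90))
# A prior score that already encodes age, plus noise
score <- 20 + 0.8 * age + rnorm(n, sd = 5)
# Support depends on age only through the score
support <- rbinom(n, 1, plogis(-3 + 0.05 * score))
sim <- data.frame(support, age, score)

# Age alone: picks up the marginal relationship a chart would show
fit_age <- glm(support ~ age, family = binomial, data = sim)
# Age adjusted for the score: the effect of age at a fixed score
fit_both <- glm(support ~ age + score, family = binomial, data = sim)

# Compare the age row (estimate, SE, z, p) in the two fits
print(coef(summary(fit_age))["age", ])
print(coef(summary(fit_both))["age", ])

# Likelihood-ratio test: does age add anything once the score is in the model?
fit_score <- glm(support ~ score, family = binomial, data = sim)
lrt <- anova(fit_score, fit_both, test = "Chisq")
print(lrt)
```

If my real data behaves like this, the z-test for age in the full model is answering "does age matter at a given support score?", which is a different question from what the chart shows.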

Can somebody let me know what you would think if your model told you that age wasn't significant when it clearly is? I'm trying to figure out whether there's a way of thinking about it that I'm missing or whether something in my code is wrong.

Thanks!

-- EDIT

Pairs plot added to show correlation (despite some factors being categorical):

pairs(~sex + region +  age + supportscore1 + supportscore2 + county, data=mydata)

[Image: pairs plot of sex, region, age, supportscore1, supportscore2, and county]
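To put numbers on the plot, a correlation matrix and a variance inflation factor for age might be easier to read. This is a rough sketch on simulated data: the data frame `demo` and its values are made up, and only the column names mirror mine. The VIF is computed by hand as 1 / (1 - R^2) so that no extra package is needed.

```r
# Simulated stand-in for mydata; `demo` and all values are hypothetical.
set.seed(2)
n <- 500
age <- round(runif(n, 18, 90))
supportscore1 <- 20 + 0.8 * age + rnorm(n, sd = 5)  # built to track age
supportscore2 <- 50 + rnorm(n, sd = 10)             # unrelated to age
demo <- data.frame(age, supportscore1, supportscore2)

# Pairwise correlations among the numeric predictors
cors <- cor(demo, use = "pairwise.complete.obs")
print(round(cors, 2))

# Variance inflation factor for age: regress age on the other predictors
# and take 1 / (1 - R^2). Large values mean age is nearly redundant.
r2 <- summary(lm(age ~ supportscore1 + supportscore2, data = demo))$r.squared
vif_age <- 1 / (1 - r2)
print(vif_age)
```

On my real data I would replace `demo` with `mydata` and put the remaining predictors on the right-hand side of the `lm()` formula.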

Ryan