I'm a bit stuck on how to choose between models. My goal is to establish the direction of the regression slope (a negative slope shows improvement in a metric, a positive slope shows decline).
Models 4 and 1b have the lowest AICc scores, but their slopes aren't significant in the summary output.
The remaining models (2a, 3, 1a, 2b) have higher AICc scores but do have significant slopes in the summary output.
library(AICcmodavg)  # provides aictab()
aictab(cand.set = list(wa_glmm_1a, wa_glmm_1b, wa_glmm_2a, wa_glmm_2b, wa_glmm_3, wa_glmm_4),
       modnames = c("wa_glmm_1a", "wa_glmm_1b", "wa_glmm_2a", "wa_glmm_2b", "wa_glmm_3", "wa_glmm_4"), nobs = nrow(facs_3mth))
Model selection based on AICc:
            K     AICc Delta_AICc AICcWt Cum.Wt        LL
wa_glmm_4  11 32407.88       0.00      1      1 -16192.89
wa_glmm_1b  5 32473.03      65.15      0      1 -16231.50
wa_glmm_2a  5 40310.60    7902.72      0      1 -20150.29
wa_glmm_3   8 40316.30    7908.42      0      1 -20150.13
wa_glmm_1a  3 40386.08    7978.20      0      1 -20190.04
wa_glmm_2b  6 40386.75    7978.87      0      1 -20187.36
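For reference, the AICcWt column can be reproduced by hand from Delta_AICc. This is a sketch, assuming aictab uses the standard Akaike-weight formula; the symbols $\Delta_i$ (the Delta_AICc of model $i$) and $w_i$ (its weight) are introduced here and do not appear in the output above:

```latex
w_i = \frac{\exp(-\Delta_i/2)}{\sum_{j}\exp(-\Delta_j/2)}, \qquad
\exp(-65.15/2) = \exp(-32.575) \approx 7.1\times10^{-15}
\;\Rightarrow\;
w_{\mathrm{4}} \approx 1, \quad w_{\mathrm{1b}} \approx 7.1\times10^{-15}
```

The remaining models have Delta_AICc near 7900, so their weights are smaller still, which is why the table rounds every weight except that of wa_glmm_4 to 0.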
Here's the summary output from the model with the lowest AICc score (4):
summary(wa_glmm_4) # 3 Fix + 1 random intercept + 2 random (1 intercept, 1 slope) | ~ month_id + CareHomeSize + Ratings + (1|FacilityKey) + (1+month_id|FacilityKey)
Generalized linear mixed model fit by maximum likelihood (Adaptive Gauss-Hermite Quadrature, nAGQ = 0) ['glmerMod']
Family: binomial ( logit )
Formula: cbind(Wasted_N, TotalAdministrations) ~ month_id + factor(CareHomeSize) + factor(Ratings) + (1 | FacilityKey) + (1 + month_id | FacilityKey)
Data: facs_3mth
     AIC      BIC   logLik deviance df.resid
 32407.8  32470.1 -16192.9  32385.8     2120
Scaled residuals:
    Min      1Q  Median      3Q     Max
-9.2439 -1.6898 -0.3326  1.4233 15.2294
Random effects:
 Groups        Name        Variance Std.Dev. Corr
 FacilityKey   (Intercept) 1.32136  1.1495
 FacilityKey.1 (Intercept) 0.85391  0.9241
               month_id    0.01802  0.1342   -1.00
Number of obs: 2131, groups: FacilityKey, 294
Fixed effects:
                       Estimate Std. Error z value Pr(>|z|)
(Intercept)           -7.108831   0.677533 -10.492  < 2e-16 ***
month_id              -0.003654   0.008601  -0.425    0.671
factor(CareHomeSize)2  1.116707   0.172013   6.492 8.47e-11 ***
factor(CareHomeSize)3  1.671183   0.194661   8.585  < 2e-16 ***
factor(Ratings)2       0.459942   0.705246   0.652    0.514
factor(Ratings)3       0.428870   0.690388   0.621    0.534
factor(Ratings)4       0.501436   0.719224   0.697    0.486
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
            (Intr) mnth_d f(CHS)2 f(CHS)3 fc(R)2 fc(R)3
month_id    -0.089
fctr(CrHS)2  0.000  0.002
fctr(CrHS)3  0.000  0.004   0.600
fctr(Rtng)2 -0.953  0.003  -0.163  -0.179
fctr(Rtng)3 -0.974  0.003  -0.168  -0.140  0.968
fctr(Rtng)4 -0.935  0.002  -0.177  -0.162  0.934  0.950
And here's the summary output of the lowest-AICc model that has a significant slope (2a):
summary(wa_glmm_2a) # 2 Fix + 1 random intercept | ~ month_id + factor(CareHomeSize) + (1|FacilityKey)
Generalized linear mixed model fit by maximum likelihood (Adaptive Gauss-Hermite Quadrature, nAGQ = 0) ['glmerMod']
Family: binomial ( logit )
Formula: cbind(Wasted_N, TotalAdministrations) ~ month_id + factor(CareHomeSize) + (1 | FacilityKey)
Data: facs_3mth
     AIC      BIC   logLik deviance df.resid
 40310.6  40338.9 -20150.3  40300.6     2126
Scaled residuals:
     Min       1Q   Median       3Q      Max
-12.5595  -2.1237  -0.4018   1.6615  19.9675
Random effects:
 Groups      Name        Variance Std.Dev.
 FacilityKey (Intercept) 1.293    1.137
Number of obs: 2131, groups: FacilityKey, 294
Fixed effects:
                        Estimate Std. Error z value Pr(>|z|)
(Intercept)           -6.7085120  0.1365901 -49.114  < 2e-16 ***
month_id               0.0036121  0.0008711   4.147 3.37e-05 ***
factor(CareHomeSize)2  1.1396969  0.1668958   6.829 8.56e-12 ***
factor(CareHomeSize)3  1.7190618  0.1867524   9.205  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
            (Intr) mnth_d f(CHS)2
month_id    -0.046
fctr(CrHS)2 -0.817 -0.001
fctr(CrHS)3 -0.730  0.000   0.597
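As a check on what "significant" means in the two summaries, the z values and approximate 95% Wald intervals for month_id follow directly from the printed estimates and standard errors. This is a sketch assuming the usual normal approximation; the 1.96 multiplier is an assumption and is not part of the output:

```latex
\begin{aligned}
\text{wa\_glmm\_4:}\quad & z = \frac{-0.003654}{0.008601} \approx -0.425, &
-0.003654 \pm 1.96 \times 0.008601 &\approx (-0.0205,\ 0.0132) \\
\text{wa\_glmm\_2a:}\quad & z = \frac{0.0036121}{0.0008711} \approx 4.147, &
0.0036121 \pm 1.96 \times 0.0008711 &\approx (0.0019,\ 0.0053)
\end{aligned}
```

The two point estimates are almost the same size with opposite signs. What differs is the standard error, which is about ten times larger in wa_glmm_4, so its interval spans zero.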
So do I choose the model with the lowest AICc score regardless of the significance (i.e. choose 4), or do I choose the one with the lowest AICc score that also has a statistically significant result in the summary output (i.e. choose 2a)?
Or am I thinking about this completely wrong? What confuses me is that the AICc ranking and the significance of the slope point to different models.
– Jul 05 '19 at 16:42
... month as my main explanatory variable, and then I have another two explanatory variables (CareHomeSize and Ratings) that I want to control for. – B_Real Jul 08 '19 at 09:23
... wa_glmm_3. This model basically includes everything I need to control for. For this particular metric it just so happens that it's not statistically significant, which I guess is still useful to know (it kind of means that using the system has no effect on the metric, given the information I have available). – B_Real Jul 09 '19 at 10:15