Following the same m3 model approach proposed by Gavin Simpson on my data, I compared an ordered factor model (m3_of) to a non-ordered factor model (m3). The global shapes trend to be similar; however the m3_of plot displays a marked wigglyness for both groups.
Question 1: How to explain this difference of wigglyness between both models?
Question 2:
Comparing both models using itsadug::compareML(m3, m3_of), the best model is m3 (lower AIC); however, m3_of has a slightly better deviance explained (9.67% vs 9.63%), knowing that such model advantageously allow a more interpretable comparison (groupof=1 compared to the reference groupof=0).
Hence, what is the more appropriate model?
Thanks for any help, advice, reference.
Data:
n = 180,000
bmk: outcome, continuous, positive
delay: predictor, continuous, positive
group: factor (n=2)
groupof: ordered factor (n=2, groupof=0 being the reference groupof)
medu: random effect (n=91 different medical units)
Models:
m3 <- bam(bmk ~
group +
s(delay, k=20, by = group) +
s(delay, k=20, medu, bs = "fs"),
data = dat, method = 'fREML', family = inverse.gaussian(link="identity"), discrete = TRUE)
m3_of <- bam(bmk ~
groupof +
s(delay, k=20) +
s(delay, k=20, by = groupof) +
s(delay, k=20, medu, bs = "fs"),
data = dat, method = 'fREML', family = inverse.gaussian(link="identity"), discrete = TRUE)
Plots:
par(mfrow = c(1,2), cex = 1.1)
plot_smooth(m3, view="delay", plot_all="group", rm.ranef=FALSE, n.grid = 50, col=c("blue","red"),
xlim=c(0,90), ylim=c(11.5,14.5), main = "m3")
plot_smooth(m3_of, view="delay", plot_all="groupof", rm.ranef=FALSE, n.grid = 50, col=c("blue","red"),
xlim=c(0,90), ylim=c(11.5,14.5), main = "m3_of")
gratia::appraise(m3)
Summary(m3):
> summary(m3)
Family: inverse.gaussian
Link function: identity
Formula:
bmk ~ group + s(delay, k = 20, by = group) + s(delay, k = 20,
medu, bs = "fs")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.76764 0.19840 59.31 <2e-16 ***
group1 0.32901 0.02879 11.43 <2e-16 ***
Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(delay):group0 1.001 1.002 2.903 0.0882 .
s(delay):group1 1.002 1.003 17.792 2.46e-05 ***
s(delay,medu) 145.751 1626.000 10.319 < 2e-16 ***
Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.0792 Deviance explained = 9.63%
fREML = -1.8252e+05 Scale est. = 0.0089033 n = 179659
> gam.check(m3)
k' edf k-index p-value
s(delay):group0 19 1 0.98 0.54
s(delay):group1 19 1 0.98 0.48
s(delay,medu) 1700 146 0.98 0.48
Summary(m3_of):
> summary(m3_of)
Family: inverse.gaussian
Link function: identity
Formula:
bmk ~ groupof + s(delay, k = 20) + s(delay, k = 20, by = groupof) +
s(delay, k = 20, medu, bs = "fs")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.70146 0.14033 83.39 <2e-16 ***
groupof1 0.33219 0.02974 11.17 <2e-16 ***
Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(delay) 9.370 11.457 2.188 0.01100 *
s(delay):groupof1 1.004 1.007 10.200 0.00134 **
s(delay,medu) 176.807 1626.000 10.313 < 2e-16 ***
Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.0792 Deviance explained = 9.67%
fREML = -1.8246e+05 Scale est. = 0.0089023 n = 179659
> gam.check(m3_of)
k' edf k-index p-value
s(delay) 19.00 9.37 0.98 0.69
s(delay):groupof1 19.00 1.00 0.98 0.68
s(delay,medu) 1700.00 176.81 0.98 0.68
Compare models:
itsadug::compareML(m3, m3_of)
Model m3 preferred: lower fREML score (56.679), and equal df (0.000).
-----
Model Score Edf Difference Df
1 m3_of -182464.7 9
2 m3 -182521.3 9 56.679 0.000
AIC difference: -60.87, model m3 has lower AIC.


