I am wondering whether anyone has any insight on comparing the performance of two GAMMs?
Specifically, I want to compare two models:
- A "nested model", one with a smoothed age term
s(age)and a sex termsexspecified as separate response variables. - A "complex model" with a
s(age,by=sex)term.
My understanding is that the former is essentially a nested version of the latter, as the latter will fit two smoothed/spline terms--one for male, one for female--while the former will just fit a single, global smoothed/spline term (if this is the correct terminology).
I want to determine whether there is significant sexual dimorphism in ageing trajectories. In a standard multiple linear regression (MLR) model, I would be able to include an age^n*sex interaction and check the p-value, but the addition of the smoothed term complicates matters (the models also contain a random effect, hence the need for a GAMM). The summary(complex_model) printout specifies a separate p-value for the smoothed fit for each sex, making the interpretation of sexual dimorphism more complicated. I don't think that it is an interaction term in the same sense as in the MLR model. However, it seems clear to me that a significant improvement in overall model performance in a complex model vs a nested model would nevertheless serve as evidence of sexual dimorphism.
Since both models have log likelihoods, I could see an argument for using a Likelihood Ratio Test (LRT). However, I understand that GAMMs have properties that makes their analysis less straightforward than some other types of models, so I am a bit wary of using the LRT uncritically. I also know that I can compare the AICs of both models, but it is not clear to me how to test for "a significant difference" between two AIC values, or if this is even possible. I am thus wondering whether there is another robust way of comparing the performance of these two models, particularly one that provides a p-value.
Here is a little more information, in case it is helpful:
- The models are being generated in R using
mgcv::gamm() - The
summary()I refer to is the "lme" option, since the "gam" option seems to disregard the random effect specified in the model. - The printout of the "lme" summary reads "Linear mixed-effects model fit by maximum likelihood", so it might not technically be a GAMM, despite being produced with the
mgcv:gamm()command(?)
Edit 30 Nov 2023 the exact model specification, as per Shawn Hemelstrand's comment, is:
complex_model <- mgcv::gamm(strength ~ sex + s(age, by=sex) + ethnicity, random=list(strength_measurement_device = ~1+ usage_date_of_strength_measurement_device), data=df)
The "device*date" random effect is necessary because the dataset includes a large number of measurement devices, many of which I know to have been miscalibrated, or otherwise become uncalibrated over time; however, I do not have good enough calibration records to manually correct all of the data, and unsurprisingly the measurement 'drift' is not identical across all devices. So, I think that the random effect is required, and can't be substituted with a simpler assumption, thus necessitating the use of a GAMM rather than a GAM (my understanding is that a GAMM is required in order to accommodate fixed effects, random effects, and smoothing terms within the same model, but please correct me if I am wrong).
The summary(complex_model$lme) print-out reports the random effects (although I am not really sure how to interpret them); however, I can find no mention of the random effect or its constituent predictor variables in the summary(complex_model$gam) print-out, and the formula is explicitly specified as strength ~ sex + s(age, by=sex) + ethnicity in the print-out, hence my inference that the "gam" model 'disregards'--i.e. does not incorporate--the random effect. This was also in keeping with my understanding that a GAM would not contain a random effect (see above).
Hopefully this clarifies things somewhat.

gamimplementation. Can you please show the exact model specification? I might be able to help with that depending on what the problem is. For now, I just provide a general answer about model comparison. – Shawn Hemelstrand Nov 29 '23 at 23:06gamm().gam()is perfectly content fitting simple random effects vias(f, bs = "re"). The$gamcomponent of the model you fitted is conditional upon the random effects, it's just that it doesn't report info about them. But good luck interpreting the output of the$lmecomponent if you want to look at the smooths. There's a duality between penalized smooths and random effects; they are two views on the same thing, & we can represent GAMs as mixed models & vice versa. – Gavin Simpson Dec 01 '23 at 10:19summary(complex_model$gam)does account for the random effects, but doesn't reference them in the summary, so the reported AIC has taken the random effects into account? I am not so interested in interpreting the smooths themselves, so the$gamprint-out should be fine for my purposes, if it is actually a GAMM in essence. Thank you again Shawn and Gavin. – PhelsumaFL Dec 04 '23 at 09:51