The following question builds on the discussion found on this page.
Given a response variable y, a continuous explanatory variable x and a factor fac, it is possible to define a Generalized Additive Model (GAM) with an interaction between x and fac using the by= argument. According to the help file ?gam.models in the R package mgcv, this can be accomplished as follows:
gam1 <- gam(y ~ fac + s(x, by = fac), ...)
@GavinSimpson here suggests a different approach:
gam2 <- gam(y ~ fac + s(x) + s(x, by = fac, m=1), ...)
I have been playing around with a third model:
gam3 <- gam(y ~ s(x, by = fac), ...)
My main questions are: are some of these models just wrong, or are they simply different? In the latter case, what are their differences? Based on the example I discuss below, I think I understand some of their differences, but I am still missing something.
As an example I am going to use a dataset with color spectra for flowers of two different plant species measured at different locations.
rm(list=ls())
# install.packages("RCurl")
library(RCurl) # allows accessing data from URL
df <- read.delim(text=getURL("https://raw.githubusercontent.com/marcoplebani85/datasets/master/flower_color_spectra.txt"))
library(mgcv)
These are the mean color spectra at the locality level for the two species (rolling means were used):
Each color refers to a different species. Each line refers to a different locality.
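The figure itself is not reproduced here; the sketch below shows one way such a plot could be built from df. The column names (wl, density, Taxon, Locality) are those used later in the post, while the window width, the colours and the locality-level averaging are guesses of mine rather than the original plotting code.

# Sketch only, not the original plotting code: rolling means of density
# against wl, one line per Locality, coloured by Taxon.
df$Taxon <- factor(df$Taxon)  # ensure a factor (it may already be one)
roll_mean <- function(x, k = 21) {
  # simple centred moving average; k = 21 is an arbitrary window width
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}
cols <- setNames(c("red", "blue"), levels(df$Taxon))  # arbitrary colours
plot(density ~ wl, data = df, type = "n",
     xlab = "wl", ylab = "density (% reflectance)")
for (loc in unique(df$Locality)) {
  for (sp in levels(df$Taxon)) {
    d <- df[df$Locality == loc & df$Taxon == sp, ]
    if (nrow(d) == 0) next
    d <- aggregate(density ~ wl, data = d, FUN = mean)  # locality-level mean
    d <- d[order(d$wl), ]
    lines(d$wl, roll_mean(d$density), col = cols[sp])
  }
}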
My final goal is to model the (potentially interactive) effect of Taxon and wavelength wl on % reflectance (referred to as density in the code and dataset) while accounting for Locality as a random effect in a mixed-effect GAM. For the moment I won't add the mixed effect part to my plate, which is already full enough with trying to understand how to model interactions.
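Only as a placeholder for that later step: in mgcv, a Locality random intercept can be added with a random-effect smooth, s(Locality, bs = "re"). The interaction part shown below is just the gam1-style formula; this is a sketch, not a recommendation.

# Sketch of the eventual mixed-effect version (gam1-style interaction used
# only as a placeholder). Locality must be a factor for bs = "re" to work.
df$Locality <- factor(df$Locality)
gam.mixed <- gam(density ~ Taxon + s(wl, by = Taxon) + s(Locality, bs = "re"),
                 data = df, method = "REML")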
I'll start with the simplest of the three interactive GAMs:
gam.interaction0 <- gam(density ~ s(wl, by = Taxon), data = df)
# common intercept, different smooths for each species
plot(gam.interaction0, pages=1)
summary(gam.interaction0)
Produces:
Family: gaussian
Link function: identity
Formula:
density ~ s(wl, by = Taxon)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.3490 0.1693 167.4 <2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(wl):TaxonSpeciesA 8.938 8.999 884.3 <2e-16 ***
s(wl):TaxonSpeciesB 8.838 8.992 325.5 <2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.523 Deviance explained = 52.4%
GCV = 284.96 Scale est. = 284.42 n = 9918
The parametric part is the same for both species, but a separate spline is fitted for each species. It is a bit confusing to have a parametric part in the summary of a GAM, given that GAMs are often described as non-parametric (strictly, semi-parametric). @IsabellaGhement explains:
If you look at the plots of the estimated smooth effects (smooths) corresponding to your first model, you will notice that they are centered about zero. So you need to 'shift' those smooths up (if the estimated intercept is positive) or down (if the estimated intercept is negative) to obtain the smooth functions you thought you were estimating. In other words, you need to add the estimated intercept to the smooths to get at what you really want. For your first model, the 'shift' is assumed to be the same for both smooths.
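That shift can be applied directly when plotting, because plot.gam() has a shift argument (and seWithMean = TRUE widens the confidence bands to include the uncertainty in the overall mean):

# Re-plot the smooths of the first model shifted by the model intercept, so
# they are on the scale of the fitted % reflectance rather than centred on zero.
plot(gam.interaction0, pages = 1,
     shift = coef(gam.interaction0)["(Intercept)"],
     seWithMean = TRUE)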
Moving on:
gam.interaction1 <- gam(density ~ Taxon + s(wl, by = Taxon, m=1), data = df)
plot(gam.interaction1, pages=1)
summary(gam.interaction1)
Gives:
Family: gaussian
Link function: identity
Formula:
density ~ Taxon + s(wl, by = Taxon, m = 1)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.3132 0.1482 272.0 <2e-16 ***
TaxonSpeciesB -26.0221 0.2186 -119.1 <2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(wl):TaxonSpeciesA 7.978 8 2390 <2e-16 ***
s(wl):TaxonSpeciesB 7.965 8 879 <2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.803 Deviance explained = 80.3%
GCV = 117.89 Scale est. = 117.68 n = 9918
Now each species also has its own parametric estimate.
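To see how the parametric and smooth parts of gam.interaction1 combine, the two species curves can be reconstructed on the response scale with predict() over a wavelength grid (a quick sketch, nothing from the original analysis): each curve is the intercept plus the Taxon effect plus the species-specific smooth.

# Fitted curves on the response scale, one per Taxon level.
newd <- expand.grid(wl = seq(min(df$wl), max(df$wl), length.out = 200),
                    Taxon = levels(factor(df$Taxon)))
newd$fit <- predict(gam.interaction1, newdata = newd)
plot(fit ~ wl, data = newd, type = "n",
     xlab = "wl", ylab = "fitted density (% reflectance)")
for (sp in levels(newd$Taxon)) {
  sub <- newd[newd$Taxon == sp, ]
  lines(sub$wl, sub$fit)
}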
The next model is the one that I have trouble understanding:
gam.interaction2 <- gam(density ~ Taxon + s(wl) + s(wl, by = Taxon, m=1), data = df)
plot(gam.interaction2, pages=1)
I have no clear idea of what these graphs represent.
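One thing that can help with interpreting these plots (a quick sketch, not from the original post): ask predict() for the term-wise contributions. The three smooth columns show how the shared s(wl) and the two species-specific difference smooths combine, on top of the parametric part, to give the fitted values.

# Per-observation contribution of each model term (linear predictor scale);
# the intercept is excluded from this matrix.
terms2 <- predict(gam.interaction2, type = "terms")
colnames(terms2)
head(terms2)
# Sanity check: intercept + row sums of the term contributions should
# reproduce the fitted values (up to numerical error).
all.equal(unname(rowSums(terms2)) + unname(coef(gam.interaction2)["(Intercept)"]),
          unname(fitted(gam.interaction2)))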
summary(gam.interaction2)
Gives:
Family: gaussian
Link function: identity
Formula:
density ~ Taxon + s(wl) + s(wl, by = Taxon, m = 1)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.3132 0.1463 275.6 <2e-16 ***
TaxonSpeciesB -26.0221 0.2157 -120.6 <2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(wl) 8.940 8.994 30.06 <2e-16 ***
s(wl):TaxonSpeciesA 8.001 8.000 11.61 <2e-16 ***
s(wl):TaxonSpeciesB 8.001 8.000 19.59 <2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.808 Deviance explained = 80.8%
GCV = 114.96 Scale est. = 114.65 n = 9918
The parametric part of gam.interaction2 is about the same as for gam.interaction1, but now there are three estimates for smooth terms, which I cannot interpret.
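As a rough check (it does not settle the interpretation question), the two parameterisations can be compared directly; given the similar deviance explained (80.3% vs 80.8%) they should describe very similar fits.

# Rough comparison of the two parameterisations: information criteria and
# the fitted values themselves.
AIC(gam.interaction1, gam.interaction2)
plot(fitted(gam.interaction1), fitted(gam.interaction2),
     xlab = "fitted values, gam.interaction1",
     ylab = "fitted values, gam.interaction2")
abline(0, 1)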
Thanks in advance to anyone who will take the time to help me understand the differences in the three models.

Comments:

... gam3, then that is wrong (the by smooths are centred, so you need the parametric terms). If you meant the third way the OP shows in the extended answer, then I agree with you to some extent, although we now have to deal with identifiability issues; the multiple smooths of wl do cause problems in many cases, which means we need to add some extra shrinkage. The gam1 approach is also fine. In this instance, I would suggest using the gam1 approach, increasing k if needed, and handling the SampleID issue as I describe in my answer below. – Gavin Simpson Apr 24 '19 at 16:40

... gam1 plus something for the SampleID effect, plus you need to do something about the non-constant variance problem; these data don't seem to be conditionally distributed Gaussian because of the lower bound. – Gavin Simpson Apr 24 '19 at 16:44

density is negative at small wavelengths for some observations. I also noted that you mention the data are % reflectance. Are those negative density values real? If so, how do they arise? Is it via some normalization? If these are real, that excludes the Tweedie family and the Gamma. If this is a true % (the question remains why there are some negative values) and assuming the negative values can be excluded (or something?), then a beta regression, family = betar(), might be more appropriate. – Gavin Simpson Nov 10 '22 at 09:41

... family = betar() probably being a better candidate than Tweedie/Gamma. Well spotted, thanks. – Marco Plebani Nov 10 '22 at 17:32
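Following up on the betar() suggestion in the comments, a purely illustrative version might look like the sketch below. Rescaling density to a proportion and dropping values outside (0, 100) are assumptions, not something established in the thread.

# Illustrative only: beta regression needs a response strictly in (0, 1),
# so density (a percentage) is rescaled and values outside (0, 100) dropped.
# Whether dropping those values is legitimate is exactly the open question
# raised in the comments above.
df_beta <- subset(df, density > 0 & density < 100)
df_beta$prop <- df_beta$density / 100
gam.beta <- gam(prop ~ Taxon + s(wl, by = Taxon),
                data = df_beta, family = betar(), method = "REML")
summary(gam.beta)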