Spatio-temporal predictions with GAMs

Question

I am using R package mgcv to model spatio-temporal variation in an ecological variable (y). I have data for "y" from across 8 years, and from >2000 locations across the United Kingdom. I added uncorrelated predictors to explain the spatio-temporal variation in "y" using the following model (where Tmax, Rain, Wind and Frost are environmental variables recorded for the x,y locations, and the rest are percentage land cover of the respective category at that location; ALAN is the intensity of light pollution):

gam7 <- bam(y ~ s(Year, bs = "tp") +
              s(x, y) +
              ti(x, y, Year,  d = c(2,1)) +
              s(Tmax) + s(Rain) + s(Wind) + s(Frost) + s(Arable) + 
              s(Mountain..heath.and.bog) + s(Saltwater) + s(Freshwater) + 
              s(Coastal) + s(Built.up.gardens) + s(ALAN) + s(Elevation) + 
              s(Woodland) + s(Grassland),
            data = df, family = "gaussian", discrete = TRUE, select = TRUE,  
            nthreads = 6)

In the image below you can see the x,y locations that are used to build the above model shown as black points on an elevation map

You will notice that some locations are off-the coast, and the "Coastal" covariate is assigned 100 in those categories. The model diagnostics :

and this is the model output:


Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)

(Intercept)   6.8946     0.2164   31.85   <2e-16 ***

Signif. codes:  0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
                               edf Ref.df      F p-value

s(Year)                      4.915      6 28.670  <2e-16 ***
s(x,y)                     629.085    699 67.410  <2e-16 ***
ti(x,y,Year)                36.837     49 23.424  <2e-16 ***
s(Tmax)                     20.328     24 44.886  <2e-16 ***
s(Rain)                      8.855     49  5.112  <2e-16 ***
s(Wind)                      5.843     99  0.712  <2e-16 ***
s(Frost)                    14.902     34  4.684  <2e-16 ***
s(Arable)                   29.199     99  2.862  <2e-16 ***
s(Mountain..heath.and.bog)  11.376     99  1.572  <2e-16 ***
s(Saltwater)                 6.667     34  1.311  <2e-16 ***
s(Freshwater)                7.047     34  1.412  <2e-16 ***
s(Coastal)                   4.800     24  4.657  <2e-16 ***
s(Built.up.gardens)         10.795     99  0.455  <2e-16 ***
s(ALAN)                      4.945     24  1.370  <2e-16 ***
s(Elevation)                87.439     99 13.808  <2e-16 ***
s(Woodland)                 27.048     99  1.634  <2e-16 ***
s(Grassland)                41.934     99  2.566  <2e-16 ***

Signif. codes:  0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) =  0.469   Deviance explained = 47.3%
fREML = 1.5914e+05  Scale est. = 0.6031    n = 134598

I then stacked the rasters for all the covariates included in the model (for all years of the original df). I converted the raster stack into a dataframe using raster::as.data.frame(stacked_df, xy = TRUE). And finally, I used the mgcv::predict function to predict "y" for these data:

predictions <- predict(gam7, new_df, type = "response", se.fit = TRUE)

I then plotted the resulting spatio-temporal maps of "y" and they look like this:

This is what I get:

The maps look suspicious, simply because all locations which were used to build the original model (gam7), look peculiarly dark (low "y"), and all the surrounding regions are lighter (higher "y").

I am trying to understand what could cause something like this? What are the things I can check in my model/data to make sure this is not spurious in any way? (And I think that it is spurious and not ecologically meaningful, because when I removed a particular cluster of locations from the south from gam7, and then re-did the predictions - the darker regions from around those clusters vanished, and the region turned yellow on the viridis scale)

I think adding the s(x,y) takes care of the spatial autocorrelation as explained here

I think this is more of a conceptual issue in my understanding, rather than a 'coding' issue so I have not included the data and/or reproducible example. But if people feel that it might help in anyway, please let me know - I will add examples for a smaller subset.

Spatio-temporal predictions with GAMs

0 Answers0