I have some questions about concurvity in a GAM using time series weather and mortality data (for details and model output see below). I'd be really appreciative of some help as most questions about concurvity in GAMs seem to remain unanswered on CV.
- I have run the
concurvityfunction which tells me there is high concurvity, but how do I examine the impact of it on the model results? I have read aboutsp.vcovandgam.vcompin R/ mgcv but don't understand how to interpret the results of these functions. - In case there is nothing I can do about the concurvity, how should I interpret the model results?
The GAM has high concurvity between some variables, as assessed via concurvity(m1, full=FALSE) in R/mgcv. I came across lecture slides by Simon Wood stating that "under ML or REML smoothness selection sp.vcov and gam.vcomp can help diagnose this problem". Whilst I can run these functions (and after having read this unanswered CV post), I don't quite understand what it is I am looking at.
The GAM models the relationship between daily number of deaths and some heat-related weather variables using time series data:
m1 <- gam(deaths~s(time, k = 700, bs='tp') + heap +
te(temp, lag, k=c(19,6), bs='tp') +
te(sh, lag, k=c(19,6), bs='tp')+
te(solar, lag, k=c(10,6), bs='tp')+
te(wind2m, lag, k=c(7,6), bs='tp')+
te(precip_hourly, lag, k=c(10,6), bs='tp'),
data = dat, family = nb, method = 'REML', select = TRUE)
The variables of interest are temperature in Celsius (temp), specific humidity (sh), solar irradiance (solar) and wind speed (wind2m). Hourly precipitation (precip_hourly) is included in the model as it's deemed to be a confounder. A lag term (6 days) is included to account for the fact that deaths may occur not just on the same day but also over a period of time following particularly high exposure. Time is a counter of days over the 14-year regular daily time series. Weather variables were measured during one specific hour during the day, the choice of which was based on some criterion. This hour can vary somewhat from day to day.
Concurvity is very high between time and each of the weather variables. Modelling the passing of time as an interaction between year and day of year (e.g. te(year, doy)) or increasing k for time to 1000 do not change this. There is also very high concurvity (over 0.8) between temperature and specific humidity.Taking the daily mean sh rather than at a specific hour did not help either. Taking the daily mean temperature is not an option (at least not in this particular model) for theoretical reasons. Concurvity is also very high between hourly precipitation and temperature. Including total daily precipitation instead of hourly precipitation reduced concurvity somewhat from 0.96 to 0.87 but this is still very high:
$worst
para s(time) te(temp,lag) te(sh,lag) te(solar,lag) te(wind2m,lag) te(precip_hourly,lag)
para 1.000000e+00 9.993814e-12 0.9981945 0.6439723 6.633887e-24 0.67659813 0.9562051
s(time) 9.993820e-12 1.000000e+00 0.9920124 0.9896377 9.049675e-01 0.91273377 0.9736969
te(temp,lag) 9.979807e-01 9.910896e-01 1.0000000 0.8763731 4.494137e-01 0.71124247 0.9649570
te(sh,lag) 6.439723e-01 9.896377e-01 0.8763861 1.0000000 3.007846e-01 0.70633535 0.7630440
te(solar,lag) 1.574437e-23 9.049675e-01 0.4497540 0.3007846 1.000000e+00 0.09675508 0.3044992
te(wind2m,lag) 6.765981e-01 9.127338e-01 0.7110435 0.7063353 9.675508e-02 1.00000000 0.6780678
te(precip_hourly,lag) 9.562051e-01 9.736969e-01 0.9652421 0.7630440 3.044992e-01 0.67806780 1.0000000
"Observed" and "estimate" concurvity are much lower, but all advice I've seen states that one should look at "worst" concurvity. (I've read the help page on concurvity but still don't really understand the difference between the three).
There are very good reasons for keeping both humidity and temperature in the model as theory indicates humidity should have an effect on deaths in addition to the effect of temperature. Both variables are the focus of the study and can therefore not be dropped from the model.
Below is the model output. Temperature in a model without other weather covariates is normally highly significant, so I'm thinking that its relatively high p-value is a consequence of concurvity. The problem is that there are next to no studies on the effect of these 4 variables together on deaths, so it's difficult to know what to expect. (It's why I'm doing this study in the first place). Theory doesn't provide much of a clue either, other than that these 4 variables should all have some impact on deaths (and the relationship between these - based on heat transfer theory- seems highly complex).
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(time) 2.217e+01 699 167.514 < 2e-16 ***
te(temp,lag) 1.068e+00 106 3.816 0.01517 *
te(sh,lag) 1.177e+01 108 46.793 < 2e-16 ***
te(solar,lag) 3.296e-04 54 0.000 0.87718
te(wind2m,lag) 2.087e+00 36 9.847 0.00188 **
te(precip_hourly,lag) 1.442e+00 44 3.094 0.09729 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.841 Deviance explained = 54.5%
-REML = 8899.9 Scale est. = 1 n = 5107
Method: REML Optimizer: outer newton
full convergence after 28 iterations.
Gradient range [-0.001334817,0.001263843]
(score 8899.909 & scale 1).
eigenvalue range [-2.200311e-07,21.04783].
Model rank = 1233 / 1233
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(time) 6.99e+02 2.22e+01 0.94 0.015 *
te(temp,lag) 1.13e+02 1.07e+00 NA NA
te(sh,lag) 1.08e+02 1.18e+01 NA NA
te(solar,lag) 5.40e+01 3.30e-04 NA NA
te(wind2m,lag) 3.60e+01 2.09e+00 NA NA
te(precip_hourly,lag) 5.40e+01 1.44e+00 NA NA
Below is the output from sp.vcov(m1). The help page states that this is "a matrix corresponding to the estimated covariance matrix of the log smoothing parameter estimators". But I don't know how to interpret this (or even where the 18 rows and columns are coming from):
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.0492970614 -0.046872422 0.0005593624 -1.240256e-02 1.757094e-02 -0.002762492 0.0009913406 -0.004443343 8.375059e-03 -1.062816e-03
[2,] -0.0468724223 1.320985417 -0.0304809087 5.286347e-01 1.125271e-01 -0.093732748 -0.1278341644 0.058484430 -9.308127e-02 6.969281e-02
[3,] 0.0005593624 -0.030480909 1.1216968385 -6.782943e-03 -3.400436e-02 0.012593477 0.0130862520 -0.000712981 6.743050e-03 -6.476500e-03
[4,] -0.0124025565 0.528634738 -0.0067829434 1.391503e+04 -3.739014e+03 -0.222061529 -0.2360128067 0.043654929 -2.806442e+01 -7.930296e+00
[5,] 0.0175709389 0.112527070 -0.0340043596 -3.739014e+03 7.407950e+04 -0.568400656 0.0146994665 -0.047839740 -1.970345e+02 -5.477528e+01
[6,] -0.0027624923 -0.093732748 0.0125934770 -2.220615e-01 -5.684007e-01 3.218813838 0.1289822367 -0.194476230 4.673213e-01 4.208698e-01
[7,] 0.0009913406 -0.127834164 0.0130862520 -2.360128e-01 1.469947e-02 0.128982237 0.5066881798 -0.267576903 5.240905e-02 6.167484e-03
[8,] -0.0044433430 0.058484430 -0.0007129810 4.365493e-02 -4.783974e-02 -0.194476230 -0.2675769034 2.212469239 7.204000e-02 -3.999280e-02
[9,] 0.0083750588 -0.093081267 0.0067430502 -2.806442e+01 -1.970345e+02 0.467321344 0.0524090542 0.072040003 1.422914e+04 -1.076931e+01
[10,] -0.0010628157 0.069692815 -0.0064764999 -7.930296e+00 -5.477528e+01 0.420869825 0.0061674840 -0.039992800 -1.076931e+01 1.240723e+04
[11,] 0.0199194792 0.530874329 -0.0755754114 -2.873036e+02 -1.983094e+03 -1.550245896 -0.1499697906 0.175714409 -3.961080e+02 -8.768036e+03
[12,] -0.7427614131 -16.280587832 2.4201239797 9.088338e+03 6.273147e+04 34.838374455 5.5079324764 -6.835428393 1.252538e+04 3.703824e+03
[13,] 0.0023648729 -0.009930889 0.0031287714 -1.461549e+00 -1.039105e+01 0.038186207 0.0166473280 -0.004483276 -1.802650e+00 -5.482065e-01
[14,] -0.0001524556 -0.010839426 0.0041596232 1.847032e-02 -9.998109e-02 0.043909074 -0.0101252862 0.047102893 6.418191e-02 -7.710783e-03
[15,] -0.0028204255 0.029157503 0.0004189777 -1.569117e+00 -1.102083e+01 -0.021023812 -0.0510267950 0.040018847 -2.206038e+00 -6.431959e-01
[16,] 0.0296914966 -0.877334276 0.0712757702 -4.759815e+01 -3.366279e+02 1.358193403 -0.0349262813 0.232467055 -6.671997e+01 -1.852782e+01
[17,] -0.0066875595 0.010464352 -0.0015964890 -5.525566e-01 -3.465985e+00 0.002101187 0.0353476070 -0.127664153 -7.219733e-01 -2.183352e-01
[18,] 0.6910303545 14.906284087 -3.4976545516 -1.888799e+04 -1.304611e+05 -37.868991152 -5.4042532957 4.639101459 -2.604892e+04 -7.249276e+03
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,] 1.991948e-02 -7.427614e-01 2.364873e-03 -0.0001524556 -2.820426e-03 2.969150e-02 -6.687560e-03 6.910304e-01
[2,] 5.308743e-01 -1.628059e+01 -9.930889e-03 -0.0108394260 2.915750e-02 -8.773343e-01 1.046435e-02 1.490628e+01
[3,] -7.557541e-02 2.420124e+00 3.128771e-03 0.0041596232 4.189777e-04 7.127577e-02 -1.596489e-03 -3.497655e+00
[4,] -2.873036e+02 9.088338e+03 -1.461549e+00 0.0184703175 -1.569117e+00 -4.759815e+01 -5.525566e-01 -1.888799e+04
[5,] -1.983094e+03 6.273147e+04 -1.039105e+01 -0.0999810898 -1.102083e+01 -3.366279e+02 -3.465985e+00 -1.304611e+05
[6,] -1.550246e+00 3.483837e+01 3.818621e-02 0.0439090744 -2.102381e-02 1.358193e+00 2.101187e-03 -3.786899e+01
[7,] -1.499698e-01 5.507932e+00 1.664733e-02 -0.0101252862 -5.102679e-02 -3.492628e-02 3.534761e-02 -5.404253e+00
[8,] 1.757144e-01 -6.835428e+00 -4.483276e-03 0.0471028931 4.001885e-02 2.324671e-01 -1.276642e-01 4.639101e+00
[9,] -3.961080e+02 1.252538e+04 -1.802650e+00 0.0641819078 -2.206038e+00 -6.671997e+01 -7.219733e-01 -2.604892e+04
[10,] -8.768036e+03 3.703824e+03 -5.482065e-01 -0.0077107830 -6.431959e-01 -1.852782e+01 -2.183352e-01 -7.249276e+03
[11,] 1.500686e+05 1.244574e+05 -2.087028e+01 -0.0778830618 -2.198611e+01 -6.756914e+02 -7.029805e+00 -2.623370e+05
[12,] 1.244574e+05 -8.535900e+06 6.596107e+02 1.7880868639 6.954293e+02 2.138854e+04 2.326675e+02 8.299519e+06
[13,] -2.087028e+01 6.596107e+02 7.502855e+02 0.1000755748 -2.877533e-01 -3.552331e+00 -1.263276e-02 -1.370971e+03
[14,] -7.788306e-02 1.788087e+00 1.000756e-01 1.0330010236 -6.779120e-02 3.512008e-03 4.415048e-02 -2.871725e+00
[15,] -2.198611e+01 6.954293e+02 -2.877533e-01 -0.0677912011 7.913028e+02 -3.707017e+00 4.090653e-03 -1.445090e+03
[16,] -6.756914e+02 2.138854e+04 -3.552331e+00 0.0035120079 -3.707017e+00 2.428175e+04 -2.865450e+00 -4.451108e+04
[17,] -7.029805e+00 2.326675e+02 -1.263276e-02 0.0441504825 4.090653e-03 -2.865450e+00 3.876790e+00 -2.215861e+02
[18,] -2.623370e+05 8.299519e+06 -1.370971e+03 -2.8717247325 -1.445090e+03 -4.451108e+04 -2.215861e+02 -7.805131e+06
Below is the output from gam.vcomp(m1). Again, I'm not sure how to interpret this in relation to concurvity. How can I see whether concurvity has much impact on the model output? Clearly, values like 3.943563e+112 are wild but how can I judge whether a value like 4.629084e-01 is appropriate? And why are there 2 or 3 listings for each term in the model?
Standard deviations and 0.95 confidence intervals:
std.dev lower upper
s(time)1 1.269244e-04 4.194625e-05 3.840581e-04
s(time)2 1.639614e-01 5.807483e-02 4.629084e-01
te(temp,lag)1 1.532360e-06 8.776815e-57 2.675375e+44
te(temp,lag)2 9.759467e-05 2.415257e-121 3.943563e+112
te(temp,lag)3 8.125570e-02 1.400437e-02 4.714590e-01
te(sh,lag)1 1.535021e-03 7.641288e-04 3.083627e-03
te(sh,lag)2 1.670459e-01 3.888990e-02 7.175212e-01
te(sh,lag)3 4.108676e-04 5.962470e-55 2.831245e+47
te(solar,lag)1 1.761396e-08 6.812702e-56 4.554016e+39
te(solar,lag)2 1.990545e-05 1.779246e-172 2.226938e+162
te(solar,lag)3 8.721333e-06 7.563626e-06 1.005624e-05
te(wind2m,lag)1 5.007184e-05 1.098936e-16 2.281470e+07
te(wind2m,lag)2 1.904373e-02 7.033728e-03 5.156068e-02
te(wind2m,lag)3 3.156832e+00 3.358466e-12 2.967303e+12
te(precip_hourly,lag)1 8.669031e-08 2.897283e-74 2.593882e+59
te(precip_hourly,lag)2 1.468014e-01 2.124938e-02 1.014178e+00
te(precip_hourly,lag)3 1.362012e-04 0.000000e+00 Inf
Rank: 16/17