Say I'm constructing a GAM for a response variable R in terms of predictors A, B, C, and D. Something like this (in quasi-R-code):
R ~ s(A) + s(B) + s(C) + s(D)
Before I construct this model, I check for collinearity by calculating Pearson's correlation coefficients. This shows that A is slightly correlated with each of the other predictors (values of around ±0.3). Since none of the individual correlation coefficients is very high, I'm okay with proceeding. The best model according to AIC is R ~ s(A).
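For concreteness, the pairwise screening step can be done in a single call. This is a minimal sketch with simulated stand-in data, since the real data aren't shown; the coefficient 0.3 is chosen so the pairwise correlations land near the ±0.3 mentioned above:

```r
# Simulated stand-in data: A is mildly correlated with B, C, and D
set.seed(1)
n <- 200
B <- rnorm(n); C <- rnorm(n); D <- rnorm(n)
A <- 0.3 * (B + C + D) + rnorm(n)
dat <- data.frame(A, B, C, D)

# Pairwise Pearson correlations among the predictors
round(cor(dat), 2)
```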
Now, I'm interpreting the model results and they don't make a lot of sense physically (i.e., the shape of the estimated relationship between A and R). What I'm concerned about is that the top model was selected because A acts as a composite of all the predictors, and therefore serves as a proxy for them without incurring the penalty for including all of these extra terms.
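One way to put a number on this "composite variable" concern (an illustration, not something from the original post) is to regress A on the remaining predictors: a high multiple R-squared means A is close to a linear combination of B, C, and D even when every pairwise correlation looks modest. A sketch with simulated stand-in data:

```r
# Simulated stand-in data (the real data aren't shown in the question)
set.seed(1)
n <- 200
B <- rnorm(n); C <- rnorm(n); D <- rnorm(n)
A <- 0.3 * (B + C + D) + rnorm(n)
dat <- data.frame(A, B, C, D)

# Regress A on the other predictors jointly; a high R-squared here can
# coexist with small pairwise correlations.
aux <- lm(A ~ B + C + D, data = dat)
summary(aux)$r.squared  # 1 / (1 - R^2) is the variance inflation factor
```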
My question: is there a test/method that accounts for cumulative collinearity across multiple predictors, like this? I'm sure that my terminology is incorrect, so if someone could tell me what I need to Google, that would be a great help.
… select = TRUE if I had a small set of candidate variables I was interested in but wasn't sure all of them were needed. If I were comparing a model with a covariate of interest against a baseline model that accounted for controlling variables, I might use AIC or similar to decide if the covariate of interest adds anything over the baseline model. – Gavin Simpson Aug 29 '17 at 14:22
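The select = TRUE suggestion in the comment maps onto mgcv like this. This is a sketch assuming the GAMs are fit with mgcv (which the s() syntax suggests); the data are simulated stand-ins, since the real data aren't shown:

```r
library(mgcv)

# Simulated stand-in data mimicking the setup in the question
set.seed(1)
n <- 200
B <- rnorm(n); C <- rnorm(n); D <- rnorm(n)
A <- 0.3 * (B + C + D) + rnorm(n)
R <- sin(B) + 0.5 * C + rnorm(n)
dat <- data.frame(R, A, B, C, D)

# select = TRUE adds an extra penalty that can shrink whole smooths to
# zero, so unneeded terms drop out of the full model during fitting
# rather than via all-subsets AIC comparisons.
m <- gam(R ~ s(A) + s(B) + s(C) + s(D), data = dat, select = TRUE)
summary(m)

# Comparing a covariate of interest (A) against a baseline model that
# holds the controlling variables, as the comment describes:
m0 <- gam(R ~ s(B) + s(C) + s(D), data = dat)
m1 <- gam(R ~ s(A) + s(B) + s(C) + s(D), data = dat)
AIC(m0, m1)
```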