Say I'm constructing a GAM for a response variable R in terms of predictors A, B, C, and D. Something like this (in quasi-R-code):
R ~ s(A) + s(B) + s(C) + s(D)
Before I construct this model, I check for collinearity by calculating Pearson's correlation coefficients. This shows that A is slightly correlated with each of the other predictors (values of around ±0.3). Since none of the individual correlation coefficients is very high, I'm okay with proceeding. The best model according to AIC is R ~ s(A).
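For concreteness, the pairwise screening step can be done in a single call. This is a minimal sketch with simulated stand-in data, since the real data aren't shown; the coefficient 0.3 is chosen so the pairwise correlations land near the ±0.3 mentioned above:

```r
# Simulated stand-in data: A is mildly correlated with B, C, and D
set.seed(1)
n <- 200
B <- rnorm(n); C <- rnorm(n); D <- rnorm(n)
A <- 0.3 * (B + C + D) + rnorm(n)
dat <- data.frame(A, B, C, D)

# Pairwise Pearson correlations among the predictors
round(cor(dat), 2)
```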
Now, I'm interpreting the model results and they don't make a lot of sense physically (i.e., the shape of the estimated relationship between A and R). What I'm concerned about is that the top model was selected because A acts as a composite of all the predictors, and therefore serves as a proxy for them without incurring the penalty for including all of these extra terms.
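One way to put a number on this "composite variable" concern (an illustration, not something from the original post) is to regress A on the remaining predictors: a high multiple R-squared means A is close to a linear combination of B, C, and D even when every pairwise correlation looks modest. A sketch with simulated stand-in data:

```r
# Simulated stand-in data (the real data aren't shown in the question)
set.seed(1)
n <- 200
B <- rnorm(n); C <- rnorm(n); D <- rnorm(n)
A <- 0.3 * (B + C + D) + rnorm(n)
dat <- data.frame(A, B, C, D)

# Regress A on the other predictors jointly; a high R-squared here can
# coexist with small pairwise correlations.
aux <- lm(A ~ B + C + D, data = dat)
summary(aux)$r.squared  # 1 / (1 - R^2) is the variance inflation factor
```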
My question: is there a test/method that accounts for cumulative collinearity across multiple predictors, like this? I'm sure that my terminology is incorrect, so if someone could tell me what I need to Google, that would be a great help.
… select = TRUE if I had a small set of candidate variables I was interested in but wasn't sure all of them were needed. If I were comparing a model with a covariate of interest against a baseline model that accounted for controlling variables, I might use AIC or similar to decide if the covariate of interest adds anything over the baseline model. – Gavin Simpson Aug 29 '17 at 14:22
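The select = TRUE suggestion in the comment maps onto mgcv like this. This is a sketch assuming the GAMs are fit with mgcv (which the s() syntax suggests); the data are simulated stand-ins, since the real data aren't shown:

```r
library(mgcv)

# Simulated stand-in data mimicking the setup in the question
set.seed(1)
n <- 200
B <- rnorm(n); C <- rnorm(n); D <- rnorm(n)
A <- 0.3 * (B + C + D) + rnorm(n)
R <- sin(B) + 0.5 * C + rnorm(n)
dat <- data.frame(R, A, B, C, D)

# select = TRUE adds an extra penalty that can shrink whole smooths to
# zero, so unneeded terms drop out of the full model during fitting
# rather than via all-subsets AIC comparisons.
m <- gam(R ~ s(A) + s(B) + s(C) + s(D), data = dat, select = TRUE)
summary(m)

# Comparing a covariate of interest (A) against a baseline model that
# holds the controlling variables, as the comment describes:
m0 <- gam(R ~ s(B) + s(C) + s(D), data = dat)
m1 <- gam(R ~ s(A) + s(B) + s(C) + s(D), data = dat)
AIC(m0, m1)
```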