For concreteness:
library( mgcv )
set.seed( 1 )
RawData <- data.frame( y = rbinom( 1000, 1, 0.5 ), x1 = rnorm( 1000 ),
                       x2 = as.factor( rbinom( 1000, 1, 0.5 ) ), x3 = rnorm( 1000 ),
                       x4 = as.factor( rbinom( 1000, 1, 0.5 ) ) )
fit <- gam( y ~ s( x1 ) + x2 + s( x3, by = x2 ) + x4, data = RawData,
            family = nb( link = log ) )
How can I measure the importance of these four variables?
I understand that "variable importance" is not a well-defined concept, so I am looking for the most straightforward way, such as an explained variance approach.
The ANOVA table seems to be a natural choice. However, as explained in this answer, it does not work here: for the smooth terms of a GAM, its entries do not have an explained-variance interpretation.
What would be a sound approach, then?
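To make the "explained variance" idea concrete, here is a minimal sketch of one possible comparison: refit the model with every term involving a given variable removed, and record how much deviance explained is lost. This is only an illustration, not an established importance measure. The simulated count outcome, the effect sizes, and the names `dat`, `full`, `reduced`, `dev_expl` and `importance` are all assumptions made for the example. Note also that the smoothing parameters and the negative binomial theta are re-estimated in each refit, so the differences are only roughly comparable.

```r
library(mgcv)

set.seed(1)
n <- 1000

# Hypothetical data (an assumption for illustration): same predictors as above,
# but with a count outcome that actually depends on x1 and x4.
dat <- data.frame(
  x1 = rnorm(n),
  x2 = as.factor(rbinom(n, 1, 0.5)),
  x3 = rnorm(n),
  x4 = as.factor(rbinom(n, 1, 0.5))
)
dat$y <- rnbinom(n, mu = exp(0.3 * dat$x1 + 0.5 * (dat$x4 == "1")), size = 2)

# Full model, same structure as the model in the question
full <- gam(y ~ s(x1) + x2 + s(x3, by = x2) + x4, data = dat,
            family = nb(link = "log"), method = "REML")

# Reduced models: each formula omits every term that involves one variable
reduced <- list(
  x1 = y ~ x2 + s(x3, by = x2) + x4,
  x2 = y ~ s(x1) + s(x3) + x4,
  x3 = y ~ s(x1) + x2 + x4,
  x4 = y ~ s(x1) + x2 + s(x3, by = x2)
)

# Proportion of null deviance explained, as reported by summary.gam
dev_expl <- function(m) summary(m)$dev.expl

importance <- do.call(rbind, lapply(names(reduced), function(v) {
  m <- gam(reduced[[v]], data = dat,
           family = nb(link = "log"), method = "REML")
  data.frame(variable      = v,
             dev_expl_lost = dev_expl(full) - dev_expl(m),
             delta_AIC     = AIC(m) - AIC(full))
}))

# Variables ordered by the loss in deviance explained when they are dropped
importance[order(-importance$dev_expl_lost), ]
```

A larger `dev_expl_lost` (or `delta_AIC`) would suggest that the variable contributes more to the fit. Because x2 enters both as a main effect and through `by = x2`, its reduced model replaces the sex-specific smooths with a single `s(x3)`, which is a modelling choice rather than the only option.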
Let x3 be the age of a patient, x2 be the sex, x1 be his/her blood pressure, x4 be whether he/she received a certain drug, and y be the number of times a certain event happened to him/her. The question the doctors ask: "OK, I understand blood pressure is significant, but we have a very high sample size, so it doesn't mean a lot, I'd be more interested to see how it compares to the other predictors in explaining y". – Tamas Ferenci Nov 18 '17 at 14:59