
Following is a plot from the relaimpo package in R showing the relative importance of predictor variables for a regression of mpg on all other variables in the mtcars dataset. The relative importance is computed by several different methods, and the results differ considerably; see especially the bars for the wt and disp variables. The LMG method is recommended by the package authors, but are there situations in which other methods may be preferable? Which method would you recommend for general use?

[Plot: relative importance of the mtcars predictors of mpg, computed by several relaimpo methods]

– rnso
1 Answer


I prefer to compute the proportion of explainable log-likelihood that is explained by each variable. For OLS models the rms package makes this easy:

require(rms)    # provides ols(), pol(), rcs(), and the anova/plot methods
f <- ols(y ~ x1 + x2 + pol(x3, 2) + rcs(x4, 5) + ...)   # pol: polynomial; rcs: restricted cubic spline
plot(anova(f), what='proportion chisq')
# also try what='proportion R2'
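As a concrete illustration, here is a minimal sketch applying this to the mtcars regression from the question (the predictor list and the datadist setup are my assumptions, not part of the original answer):

require(rms)
dd <- datadist(mtcars); options(datadist = 'dd')   # customary rms setup (assumed)
g <- ols(mpg ~ cyl + disp + hp + drat + wt + qsec + am, data = mtcars)
plot(anova(g), what = 'proportion chisq')   # each predictor's share of the explainable Wald chi-square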

The default for plot(anova()) is to display the Wald $\chi^2$ statistic minus its degrees of freedom for assessing the partial effect of each variable. Even though this is not scaled to $[0,1]$, it is probably the best method in general because it penalizes a variable that requires a large number of parameters to achieve its $\chi^2$. For example, a categorical predictor with 5 levels will have 4 d.f., and a continuous predictor modeled as a restricted cubic spline function with 5 knots will also have 4 d.f.
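With the mtcars sketch above, that default display needs no what argument:

plot(anova(g))   # default what='chisqminusdf': Wald chi-square minus its d.f. per predictor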

If a predictor interacts with any other predictor(s), the $\chi^2$ and partial $R^2$ measures combine the appropriate interaction effects with the main effects. For example, if the model were y ~ pol(age,2) * sex, the statistic for sex would combine the main effect of sex with the effect modification that sex provides for the age effect. This is an assessment of whether there is a difference between the sexes at any age.
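A hedged sketch of that interaction case (y, age, sex, and the data frame d are hypothetical placeholders, not from the original answer):

require(rms)
h <- ols(y ~ pol(age, 2) * sex, data = d)   # quadratic in age, interacting with sex (illustrative names)
anova(h)   # the sex line pools its main effect with the age x sex interaction terms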

Methods such as random forests, which do not favor additive effects, are not likelihood-based, and use multiple trees, require a different notion of variable importance.
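For contrast, here is a minimal sketch of one such alternative, the permutation importance from the randomForest package (raised in the comments below); this illustration is mine, not part of the original answer:

library(randomForest)
set.seed(1)    # for reproducible permutation importances
rf <- randomForest(mpg ~ ., data = mtcars, importance = TRUE)
importance(rf)   # %IncMSE (permutation importance) and IncNodePurity per predictor
varImpPlot(rf)   # dot chart of both measures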

– Frank Harrell
  • Thanks for your reply. Would you say that this is a very good method for general use? Is it usually better than, or as good as, other methods like the ones mentioned above, or the importance() function of randomForest? Would it be similar to comparing coefficients if the regression is done with all variables (including dependent and factor variables) converted to numeric, scaled, and standardized (mean 0, SD 1)? – rnso Jun 03 '15 at 03:20
  • Which is the better what parameter for most presentations: 'proportion R2' for ols and 'proportion chisq' for lrm? – rnso Jun 03 '15 at 05:11
  • No standardization is needed. These measures are independent of the coding of variables as long as R knows which variables are categorical, so that appropriate indicator variables are created. – Frank Harrell Jun 03 '15 at 12:08
  • The rms package is impressive. I am currently reading your 'Regression Modeling Strategies' pdf. I cannot find guidance on when to add the rcs() function to predictor variables; it is used often in the examples, including your example in the answer above. Also, what is the default what parameter for ols and lrm? – rnso Jun 03 '15 at 12:23
  • Are you looking at the RMS course notes? They go into some details about rcs and the other functions. The default what for plot(anova()) is "chisqminusdf" for all models. – Frank Harrell Jun 03 '15 at 12:32
  • How do you extract the values that get plotted with plot(anova(f), what='proportion chisq')? – Phil Aug 28 '18 at 21:32
  • w <- plot(anova(f), what=..., pl=FALSE), with w being an object you can manipulate in R (see the sketch after these comments). – Frank Harrell Aug 29 '18 at 20:55
  • By description, this sounds like the same approach as the 'last' metric from relaimpo. Or are they different in some way? – carbocation Mar 04 '19 at 23:58
  • So to get the various metrics, one would have to plot each of them, save to an object, and then grab the values? Not very practical. However, one could extract the relevant code from the plot function and make an intermediate function. It's in this file: https://github.com/harrelfe/rms/blob/4d525730430a735f626a46e4e6eecd5668737941/R/anova.rms.s – CoderGuy123 May 10 '19 at 12:18
  • pl=FALSE means to not actually plot, so all you get is a matrix. If you looked at the whole anova result instead, you'd see a lot of sub-tests, such as tests of linearity, and not just overall association tests. – Frank Harrell May 10 '19 at 15:55
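Pulling the extraction recipe from these comments into one place (a sketch; f is the ols fit from the answer, and the exact structure of the returned object may vary by rms version):

w <- plot(anova(f), what = 'proportion chisq', pl = FALSE)   # pl=FALSE: return the values, skip the plot
w    # one value per predictor, ready for further manipulation in R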