
I've been looking at leverage plots, but it seems to me that they are always discussed in the context of linear regression models.

For instance, this explanation of the hat matrix assumes a linear regression model: "Hat matrix and leverages in classical multiple regression".
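For reference, this is the usual definition in that setting, as far as I understand it. The notation is my own and not necessarily that of the linked post: $X$ is an $n \times p$ design matrix assumed to have full column rank, $x_i^\top$ is its $i$-th row, and $y$ is the response vector.

$$
\hat{y} = H y, \qquad H = X (X^\top X)^{-1} X^\top, \qquad h_{ii} = x_i^\top (X^\top X)^{-1} x_i = \frac{\partial \hat{y}_i}{\partial y_i}
$$

The leverage of observation $i$ is the diagonal entry $h_{ii}$. It depends only on $X$, which is why it is tied so closely to the linear model.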

Does it make sense to calculate leverages when using, say, a random forest regressor? Or are there more appropriate, model-independent measures for identifying X-outliers, such as those described at https://scikit-learn.org/stable/modules/outlier_detection.html?
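To make the second option concrete, here is a minimal sketch of the kind of model-independent check I have in mind. It uses two estimators from that scikit-learn page, `IsolationForest` and `LocalOutlierFactor`. The data is synthetic and the planted outlier is purely illustrative; the choice of `n_neighbors=20` is an assumption, not a recommendation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic feature matrix (hypothetical data): 200 points in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0] = [8.0, 8.0, 8.0]  # planted X-outlier, far from the bulk of the data

# Isolation Forest: lower score_samples values mean "more abnormal".
iso = IsolationForest(random_state=0).fit(X)
iso_scores = iso.score_samples(X)

# Local Outlier Factor: fit_predict labels outliers as -1, and
# negative_outlier_factor_ is more negative for more abnormal points.
lof = LocalOutlierFactor(n_neighbors=20)
lof_labels = lof.fit_predict(X)
lof_scores = lof.negative_outlier_factor_

print("most abnormal row (IsolationForest):", int(np.argmin(iso_scores)))
print("most abnormal row (LOF):", int(np.argmin(lof_scores)))
```

Neither score looks at the response or at the fitted model, so they flag unusual rows of X but say nothing about how much those rows influence the random forest's fit.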

Rafael L

1 Answer


The concept of leverage applies to any model: how much influence does a single observation have on the overall fit? The formula for that quantity, however, will differ from model to model.
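One way to make that idea concrete without a closed-form hat matrix is to measure how much the fitted value at observation i moves when its own response is nudged, i.e. a finite-difference estimate of the derivative of the i-th fitted value with respect to the i-th response. The sketch below is my own illustration under that assumption, not a formula taken from any particular reference; `perturbation_leverage`, `make_model`, and `eps` are hypothetical names. As a sanity check it is run on ordinary least squares, where it should reproduce the hat-matrix diagonal.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def perturbation_leverage(make_model, X, y, eps=1e-3):
    """Finite-difference estimate of d(yhat_i)/d(y_i) for every observation.

    make_model is any zero-argument callable returning a fresh, unfitted
    estimator with fit/predict methods (an assumption of this sketch).
    The model is refit once per observation, so the cost is n + 1 fits.
    """
    base = make_model().fit(X, y).predict(X)
    lev = np.empty(len(y))
    for i in range(len(y)):
        y_pert = y.copy()
        y_pert[i] += eps  # nudge only the i-th response
        pred_i = make_model().fit(X, y_pert).predict(X)[i]
        lev[i] = (pred_i - base[i]) / eps
    return lev

# Synthetic data (hypothetical): linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=50)

lev = perturbation_leverage(LinearRegression, X, y)

# Classical leverages: diagonal of H = Xd @ pinv(Xd), with an intercept column.
Xd = np.column_stack([np.ones(len(X)), X])
hat_diag = np.einsum("ij,ji->i", Xd, np.linalg.pinv(Xd))

print("max abs difference:", np.max(np.abs(lev - hat_diag)))
```

The same function accepts other estimators, for example `lambda: RandomForestRegressor(random_state=0)`, but I have not validated how meaningful the result is there: tree-based fits are not smooth in the response, so the estimate can depend on `eps` and on the random seed, and should be treated as exploratory.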

There is a 1992 paper on various formulations of leverage in nonlinear regression, which might help point you in the right direction:

Roy T. St. Laurent and R. Dennis Cook, "Leverage and Superleverage in Nonlinear Regression".

I'm not familiar with any literature in this area on Random Forests in particular.

bdeonovic
Link for the referenced paper: https://www.researchgate.net/profile/Roy-St-Laurent/publication/254287282_Leverage_and_Superleverage_in_Nonlinear_Regression/links/57ba3b4d08ae14f440bd8f5c/Leverage-and-Superleverage-in-Nonlinear-Regression.pdf – kjetil b halvorsen, Mar 07 '24 at 17:31