In R, we can use the built-in function lm() for linear regression. However, we can also use the lqs() function from the MASS package and the rq() function from the quantreg package. According to the R documentation, rq() seems to calculate quantiles, but what exactly are these functions calculating?
1 Answer
These represent different estimation techniques, arguably different models.
For lm, this is the classic OLS linear regression that minimizes the sum of squared residuals to estimate the regression parameters.
$$ \hat y_i = \hat\beta_0 + \hat\beta_1x_{i1} + \dots + \hat\beta_px_{i,p} \\ \hat\beta = \underset{\hat\beta}{\arg\min}\left\{ \overset{N}{\underset{i=1}{\sum}}\left( y_i - \hat y_i \right)^2 \right\} $$
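As a minimal sketch of the criterion above, the following base R snippet fits a small made-up dataset with lm and checks that the coefficients match the ones obtained by solving the normal equations directly. The data values and the helper name `rss` are assumptions chosen for illustration only.

```r
# Toy data (made up for illustration)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# OLS fit with lm()
fit_ols <- lm(y ~ x)

# Same estimate from the normal equations: (X'X) beta = X'y
X <- cbind(1, x)
beta_ne <- as.vector(solve(crossprod(X), crossprod(X, y)))

# Hypothetical helper: residual sum of squares for a candidate beta
rss <- function(beta) sum((y - X %*% beta)^2)

print(coef(fit_ols))
print(beta_ne)
print(rss(coef(fit_ols)))
```

Moving the coefficients away from the lm estimate in any direction should only increase the value returned by `rss`, which is what it means for lm to minimize the sum of squared residuals.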
For quantreg::rq, this estimates quantile regression models. Explicitly, quantile models estimate conditional quantiles instead of conditional means. They do this by calculating parameter estimates that minimize a different criterion than the sum of squared residuals. For a chosen quantile level $\tau \in (0, 1)$ (the tau argument of rq, which defaults to 0.5, i.e. median regression), define the following loss for an individual observation and its prediction.
$$ l_{\tau}(y_i, \hat y_i) = \begin{cases} \tau\vert y_i - \hat y_i\vert, & y_i - \hat y_i \ge 0 \\ (1 - \tau)\vert y_i - \hat y_i\vert, & y_i - \hat y_i < 0 \end{cases} $$
Use this to define the optimization.
$$ \hat y_i = \hat\beta_0 + \hat\beta_1x_{i1} + \dots + \hat\beta_px_{i,p} \\ \hat\beta = \underset{\hat\beta}{\arg\min}\left\{ \sum_{i=1}^Nl_{\tau}(y_i, \hat y_i) \right\} $$
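To make the loss concrete, here is a small base R sketch for the simplest case, an intercept-only model where every prediction is the same constant. The function names `check_loss` and `best_constant` and the data are assumptions for illustration and are not part of quantreg. Because the total loss is piecewise linear and convex in the constant, a minimizer can always be found among the observed values, so the sketch simply evaluates the loss at each data point.

```r
# Loss l_tau from the definition above, vectorized over observations
check_loss <- function(y, y_hat, tau) {
  r <- y - y_hat
  ifelse(r >= 0, tau * abs(r), (1 - tau) * abs(r))
}

# Hypothetical helper: best constant prediction for a given tau,
# found by trying each observed value as the candidate constant
best_constant <- function(y, tau) {
  total_loss <- sapply(y, function(q) sum(check_loss(y, q, tau)))
  y[which.min(total_loss)]
}

# Toy data with one large outlier (made up for illustration)
y <- c(1, 2, 4, 7, 100)

print(best_constant(y, tau = 0.5))  # 4, the sample median
print(mean(y))                      # the mean is pulled toward the outlier
print(best_constant(y, tau = 0.9))
```

With tau = 0.5 the two branches of the loss are weighted equally, so the criterion reduces to half the sum of absolute residuals and the minimizer is the median. If quantreg is installed, `quantreg::rq(y ~ 1, tau = 0.5)` should fit the same intercept-only model, and adding predictors to the formula gives conditional quantiles.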
Finally, for MASS::lqs, the various methods represent different ways of estimating the regression coefficients. The documentation gets into more detail and gives references for learning more about robust regression. Briefly, the estimation techniques in this function are supposed to fit the model to just the "good" points (in the words of the authors).
> Fit a regression to the good points in the dataset, thereby achieving a regression estimator with a high breakdown point.
The various methods that can be passed to the method argument ("lts", "lqs", "lms", and "S") represent different ways of determining the "good" points and how to do the estimation with them.
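A rough sketch of what a high breakdown point buys in practice: the snippet below builds made-up data that follow a line with slope 3, corrupts the last five responses, and compares lm with MASS::lqs using method = "lts" (least trimmed squares). The data and object names are assumptions for illustration, and the sketch assumes MASS is installed, which is normally the case since it ships with R as a recommended package.

```r
# Made-up data: y is roughly 2 + 3x, with a small deterministic wiggle
x <- 1:30
y <- 2 + 3 * x + sin(x)

# Corrupt the last five responses so they act as gross outliers
y[26:30] <- 0

# OLS uses every point, so the outliers drag the slope down
fit_ols <- lm(y ~ x)

# lqs may use random subsampling on larger problems; fix the seed to be safe
set.seed(1)
fit_lts <- MASS::lqs(y ~ x, method = "lts")

print(coef(fit_ols))
print(coef(fit_lts))
```

The least trimmed squares slope should land much closer to 3 than the OLS slope, because the criterion is evaluated only on the subset of points with the smallest squared residuals and the corrupted points are effectively ignored.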
lqs()? what kind of regression is the function doing? I can't find it in the R documentation. :( – GarlicSTAT Jan 26 '20 at 22:40