Differences between the lm(), lqs(), and rq() functions

Question

In R, we can use the built-in function lm() for linear regression. However, we can also use the lqs() function from the MASS package and the rq() function from the quantreg package. According to the R documentation, rq() seems to calculate quantiles, but what exactly are these functions calculating?

Does this answer your question https://stats.stackexchange.com/questions/160354/how-does-quantile-regression-work ? — Tim, Jan 26 '20 at 22:36
I don’t know the MASS function, but quantile regression predicts a conditional quantile (such as the median), as opposed to OLS, which estimates a conditional mean. If this business about a conditional mean is news to you, do say so. That’s a critical point about regression that often gets skipped or forgotten. — Dave, Jan 26 '20 at 22:37
@Tim what about the lqs() ? what kind of regression is the function doing? I can't find it in the R documentation. :( — GarlicSTAT, Jan 26 '20 at 22:40
@Dave so is quantile regression better than linear regression? — GarlicSTAT, Jan 26 '20 at 22:41
You’ve just opened up a major can of worms with that question. Like pretty much everything in statistics, blindly calling a method “better” is a bit too simple. Quantile regression has some nice robustness, such as when you have an outlier in your data that drags the trend line away from where it looks like it should be. On the other hand, it is not as efficient, meaning that you need more data. Also, that outlier might not be something you really want to dismiss. It will depend on what you want to do and what assumptions you’re making. — Dave, Jan 26 '20 at 22:47
Also, I found lqs documentation here: https://www.rdocumentation.org/packages/MASS/versions/7.3-51.5/topics/lqs. — Dave, Jan 26 '20 at 22:50

score 0 · Answer 1 · answered May 02 '23 at 23:56

These represent different estimation techniques, arguably different models.

For lm, this is the classic OLS linear regression that minimizes the sum of squared residuals to estimate the regression parameters.

$$ \begin{aligned} \hat y_i &= \hat\beta_0 + \hat\beta_1x_{i1} + \dots + \hat\beta_px_{ip} \\ \hat\beta &= \underset{\hat\beta}{\arg\min}\left\{ \sum_{i=1}^N\left( y_i - \hat y_i \right)^2 \right\} \end{aligned} $$
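As a rough check of that definition, the sketch below compares lm() against a direct numerical minimization of the sum of squared residuals. The simulated data, the variable names, and the use of optim() are assumptions made for illustration only; they are not how lm() computes its estimates internally.

```r
# Simulated data; the "true" coefficients 2 and 3 are made up for illustration.
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

# OLS fit with lm()
fit_lm <- lm(y ~ x)

# Sum of squared residuals as a function of (intercept, slope)
ssr <- function(beta) sum((y - (beta[1] + beta[2] * x))^2)

# Minimize it with a general-purpose optimizer
fit_opt <- optim(c(0, 0), ssr, method = "BFGS")

coef(fit_lm)   # estimates from lm()
fit_opt$par    # estimates from the direct minimization
```

The two sets of estimates should agree up to the optimizer's tolerance.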

For quantreg::rq, this estimates quantile regression models. Explicitly, quantile models estimate conditional quantiles instead of conditional means. They do this by choosing the parameter estimates that minimize a different criterion than the sum of squared residuals. For a quantile level $\tau \in (0, 1)$, define the following loss for an individual observation and its prediction.

$$ l_{\tau}(y_i, \hat y_i) = \begin{cases} \tau\vert y_i - \hat y_i\vert, & y_i - \hat y_i \ge 0 \\ (1 - \tau)\vert y_i - \hat y_i\vert, & y_i - \hat y_i < 0 \end{cases} $$

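Use this to define the optimization. With $\tau = 0.5$, the loss is proportional to the absolute error, so the fit estimates the conditional median.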

$$ \begin{aligned} \hat y_i &= \hat\beta_0 + \hat\beta_1x_{i1} + \dots + \hat\beta_px_{ip} \\ \hat\beta &= \underset{\hat\beta}{\arg\min}\left\{ \sum_{i=1}^Nl_{\tau}(y_i, \hat y_i) \right\} \end{aligned} $$
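To see why minimizing this loss targets a quantile, here is a minimal base R sketch for the simplest case, an intercept-only model. The helper name check_loss, the simulated sample, and the choice of $\tau = 0.9$ are assumptions for illustration. The call to quantreg::rq at the end only runs if that package happens to be installed.

```r
# Quantile ("check") loss summed over all observations, following the
# piecewise definition of l_tau
check_loss <- function(y, y_hat, tau) {
  r <- y - y_hat
  sum(ifelse(r >= 0, tau * abs(r), (1 - tau) * abs(r)))
}

set.seed(1)
y <- rexp(201)   # a skewed sample, chosen arbitrarily
tau <- 0.9

# Intercept-only model: find the constant prediction that minimizes the loss
opt <- optimize(function(b) check_loss(y, b, tau), interval = range(y))

opt$minimum               # minimizer of the check loss
quantile(y, probs = tau)  # sample 0.9 quantile, for comparison

# Optional: the same intercept-only fit through quantreg, if available
if (requireNamespace("quantreg", quietly = TRUE)) {
  print(coef(quantreg::rq(y ~ 1, tau = tau)))
}
```

The minimizer should land very close to the sample quantile, whereas the same experiment with squared error would return the sample mean.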

Finally, for MASS::lqs, the various methods represent different ways of estimating the regression coefficients. The documentation gets into more detail and gives references for learning more about robust regression. Briefly, the estimation techniques in this function are supposed to fit the model to just the "good" points. In the words of the authors:

"Fit a regression to the good points in the dataset, thereby achieving a regression estimator with a high breakdown point."

The various methods that can be passed to the method argument represent different ways of determining the "good" points and how to do the estimation with them.
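A small sketch of what fitting to the "good" points can look like in practice, assuming the MASS package is installed. The contaminated dataset is invented for illustration (80 points near the line y = 1 + 2x plus 20 deliberately bad points), and method = "lts" is just one of the available options.

```r
library(MASS)

set.seed(1)
n <- 100
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

# Contaminate 20 points: large x values paired with very low y values
bad <- 1:20
x[bad] <- runif(20, 9, 10)
y[bad] <- -20 + rnorm(20)

fit_lm  <- lm(y ~ x)                    # ordinary least squares
fit_lts <- lqs(y ~ x, method = "lts")   # least trimmed squares via lqs()

coef(fit_lm)    # pulled toward the contaminated points
coef(fit_lts)   # should stay near the line through the uncontaminated points
```

In a setup like this, the lm() slope is dragged away from 2 by the contaminated points, while the lqs() fit largely ignores them.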
