
I'm just becoming acquainted with the concept of quantile regression. It seems rather useful, but I'm not sure I completely understand it yet.

Does a quantile regression essentially set slope-interaction dummies for the .25, .5, .75, .1, or whatever other chosen percentiles?

A common demonstration I see is:

[figure: scatterplot with an OLS line and several quantile regression lines (red, green, and light blue)]

and it appears that the same effect could be achieved through slope interaction (and possibly intercept interaction) dummies. Am I way off base here?

  • What is 'slope interaction'? – Sextus Empiricus Jan 30 '20 at 16:51
  • Quantile regression is often performed by optimizing a non-symmetric L1-norm (least squares regression optimizes the L2-norm). So the effect is obtained by changing how much the positive and negative errors are being weighted in the loss function that is being optimised. – Sextus Empiricus Jan 30 '20 at 16:51
  • Slope interaction meaning adding an interaction variable, which would essentially be a dummy that turns on and off depending on whether the observation is in the specified quantile. Dave outlines a possible equation of that below. – Benjamin Parsons Jan 30 '20 at 22:49
  • And interesting about the L1 norm! I wasn't aware of that difference. – Benjamin Parsons Jan 30 '20 at 22:58
  • That might be an interesting concept (although you do not really know what to do with that dummy variable: which points would you assign it to from the start?). What you potentially could do is add an effect that scales with the size of the residuals (so not a dummy variable), which might effectively be like that asymmetric L1-norm, and perform some iterative algorithm to readjust the solution (based on recalculated residuals) and hope it converges to a stable solution. I'll try to see if that works. – Sextus Empiricus Jan 30 '20 at 23:54

2 Answers


The red, green, and light blue lines on your plot each could exist on its own. Linear regression tries to find a line through the middle of the data (loosely speaking). A quantile regression at quantile 0.9 tries to find a line that hugs the top 10% of the data (again, loosely speaking). There is nothing going on with an interaction term. I think you're thinking of a regression like $y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3 x_1x_2$ for a continuous $x_1$ and a categorical $x_2$, which is a separate idea. In fact, you can apply quantile regression to such predictors just like you can apply OLS!

I see what you mean about how the regression lines get twisted to a different slope for different groups, which happens in the linear regression equation I gave. However, quantile regression is a separate idea that's all but unrelated.
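
For instance, here is a minimal sketch with made-up data and variable names, showing that rq() from the quantreg package accepts the same interaction formula as lm():

library(quantreg)

# made-up data: continuous x1 and a two-level categorical x2
set.seed(42)
n  <- 200
x1 <- runif(n)
x2 <- factor(sample(c("A", "B"), n, replace = TRUE))
y  <- 1 + 2 * x1 + 0.5 * (x2 == "B") * x1 + rnorm(n)

# OLS with an interaction term
coef(lm(y ~ x1 * x2))

# 0.9-quantile regression with the same interaction term
coef(rq(y ~ x1 * x2, tau = 0.9))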

– Dave
  • I guess I understand that they're different methods, but the appearance is that they give similar results. As far as I'm understanding, quantile regression is a method to deal with differently distributed observations in different regions, but wouldn't the effect of both methods essentially be the same? – Benjamin Parsons Jan 30 '20 at 22:50
  • @BenjaminParsons Look at your OLS regression line and your 0.9 quantile regression line. You get totally different results. Even regression at the median can be quite different if you have some weird outliers. – Dave Jan 30 '20 at 23:58

You can sort of get the same result with a dummy variable, but it is not very straightforward (it involves a parameter which needs to be adjusted and is not known beforehand).

Regular quantile regression by optimizing an asymmetric L1-norm

Note that quantile regression is normally done by optimizing an asymmetric L1-norm of the residuals $r_i = y_i-\hat{y}_i$. For the quantile at $q\%$ you minimize

$$\text{Loss function} = \sum_{\text{all points } i} c_i \, |r_i| $$

where $c_i = q$ for positive residuals (points above the line) and $c_i = 100-q$ for negative residuals (points below the line), which makes the L1-norm asymmetric.
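
As a minimal sketch, writing the weights as fractions $q$ and $1-q$ (which only rescales the loss and does not change the minimizer), this loss can be computed directly in R, for example on the simulated data used further below:

library(quantreg)

# asymmetric L1 (check) loss: weight tau for positive residuals (points above
# the line) and 1 - tau for negative residuals (points below the line)
asym_l1 <- function(r, tau) sum(ifelse(r > 0, tau, 1 - tau) * abs(r))

# the same simulated data as in the code further below
set.seed(1)
x <- 31:430
y <- rnorm(length(x), 0.03 * x, 0.01 * x)

# the rq() fit at tau = 0.9 should give a smaller asymmetric loss than the OLS line
asym_l1(y - predict(rq(y ~ x, tau = 0.9)), 0.9)
asym_l1(y - predict(lm(y ~ x)), 0.9)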

This asymmetry has the following effect for the minimum of the loss function:

  • If q% of the points are below the regression line and 100-q% are above it, then moving the line up will reduce the residuals of the 100-q% of points above the line (each with weight $c_i=q$) and increase the residuals of the q% of points below the line (each with weight $c_i=100-q$). Those two effects cancel each other, so you can't improve the norm: you have found a minimum, or at least a point where the slope of the change of the norm is zero;

    thus the effect is that optimizing this L1-norm is finding the quantile $q$. See also: https://math.stackexchange.com/questions/699685/proof-that-a-median-minimizes-1-norm

  • In this image you can see it intuitively. Say you look for the 30% quantile of these 11 points (I took rnorm(11,0,1) with set.seed(1)). Then the line is at the 4th point. Notice that shifting the line changes the residuals of the 3 points below and of the 7 points above the line (or left/right in the image) in opposite directions, and these effects cancel when the residuals below and above the line have weights in the ratio 7:3 (i.e. 100-q : q = 70 : 30). A small numeric check follows after this list.

    [figure: the 11 points with the 30% quantile line at the 4th point]
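
Here is that numeric check, a sketch using the same asym_l1 helper as above: minimizing the asymmetric loss over a constant lands at (approximately, on a grid) the 4th smallest of the 11 points.

# same helper as above: weight tau for positive residuals, 1 - tau for negative ones
asym_l1 <- function(r, tau) sum(ifelse(r > 0, tau, 1 - tau) * abs(r))

set.seed(1)
pts <- rnorm(11, 0, 1)

# evaluate the asymmetric loss on a grid of candidate constants
candidates <- seq(min(pts), max(pts), length.out = 1000)
losses <- sapply(candidates, function(m) asym_l1(pts - m, tau = 0.3))

# the minimizer should be (close to) the 4th smallest point
candidates[which.min(losses)]
sort(pts)[4]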

The alternative with a dummy variable that interacts with slope and offset.

You can sort of obtain the same effect by introducing an asymmetric dummy variable. Say the regular method optimizes the L1-norm $\sum |r_i|$ for:

$$y_i = a + bx_i + r_i$$

Now we do the same for

$$y_i = a + bx_i + c z_i + d z_i x_i + r_i$$

where $z_i$ is a dummy variable with asymmetric values (by this I mean that it does not have the values 0 and 1, but some other values), chosen so that the influence of the dummy variable differs for positive and negative residuals. This is done in an iterative scheme (adjusting the values of $z_i$ according to the recomputed $r_i$). Then you get something that looks very much like quantile regression:

See the simulation image below. The points are simulated data, the three black curves are the 10%, 50% and 90% quantiles, and the two red curves are created with the dummy variables.

[figure: simulated data with the three black quantile curves (10%, 50%, 90%) and the two dummy-variable curves]

The reason that this works is that you get a similar effect as with the asymmetric L1-norm. The red curve is obviously not optimizing the plain L1-norm (the least absolute deviations); however, the $z_i$ terms correct for this. The true predicted values are the red curve plus a term $z_i(c + d x_i)$, which will be more or less strong (and of different sign) depending on the asymmetric value of $z_i$, i.e. on whether the residuals are positive or negative. Eventually, the red curve will lie in a position where shifting it up or down can be corrected by the interaction terms, but not equally for the negative and positive residuals.

Therefore you get a somewhat similar effect. However, the asymmetry value in the regular approach is straightforward and independent of the data, whereas for this method you need to find it by trial and error. In the example code below (which generates the above image) you can see that I needed to select the values 0.56 and 0.6.

library(L1pack)
library(quantreg)

# some simulated data
set.seed(1)
x <- 31:430
y <- rnorm(length(x),0.03*x,0.01*x )

# scatterplot of data
plot(x,y, pch=21, col=1,bg=1,cex=0.3)

# add lines for .1 .5 and .9 quantiles
# the rq function optimizes an asymmetric L1-norm
mod1 <- rq(y ~ x, tau = .1)
lines(x,predict(mod1))
mod5 <- rq(y ~ x, tau = .5)
lines(x,predict(mod5))
mod9 <- rq(y ~ x, tau = .9)
lines(x,predict(mod9))

#
# add lines according to some (asymmetric) interaction variable
# this is done in a loop to repeatedly recalculate the residuals according to the new line;
# eventually this should stabilize (although I have no proof of that)
#
modl <- lad(y ~ x) # initial regression for the median (lad() from L1pack minimizes the symmetric L1-norm)
for (i in 1:300) {
  # compute z according to the sign of the residual
  # (asymmetric values: 1.56 for positive residuals, -0.44 for negative ones)
  z <- sign(y-modl$coefficients[1]-modl$coefficients[2]*x)+0.56
  # perform the regression with the interaction
  modl <- lad(y ~ x+z*x)
}
modl
# plot the line
# (this is without the interaction part, which comes on top of this line
#  and corrects for the larger L1-norm that the residuals of this line
#  alone would have)
lines(x,modl$coefficients[1]+modl$coefficients[2]*x,col=3)


# the same iterative scheme with asymmetry value -0.6 (z equals 0.4 or -1.6), for the other quantile line
modl <- lad(y ~ x)
for (i in 1:300) {
  z <- sign(y-modl$coefficients[1]-modl$coefficients[2]*x)-0.6
  modl <- lad(y ~ x+z*x)
}
modl
lines(x,modl$coefficients[1]+modl$coefficients[2]*x,col=2)

It is interesting to note that you can also do this iterative procedure with the least-squares method, and find a minimum of the (asymmetric) L1-norm by adjusting the weights.

q <- 0.5
modlm <- lm(y ~ x)
for (i in 1:100) {
  # residuals (positive means the point lies above the current line)
  r <- y - predict(modlm)
  # weight q for points above the line and 1-q for points below, divided by |r|
  # so that w * r^2 approximates the asymmetric L1 loss
  # (the small floor avoids infinite weights when a residual is ~0)
  w <- ifelse(r > 0, q, 1 - q) / pmax(abs(r), 1e-8)
  modlm <- lm(y ~ x, weights = w)
}
modlm
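
As a quick sanity check, the coefficients from this reweighted least-squares loop can be compared with the rq() fit at the same quantile (quantreg was loaded above); they should come out close:

# compare the iteratively reweighted least-squares fit with quantile regression
coef(modlm)
coef(rq(y ~ x, tau = q))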

But to be honest, my expertise ends here and I do not know whether this method is typically used for performing quantile regression. What I do know is that this iteratively reweighted least squares approach is used to fit GLMs; also, while looking for a reference, I found that there is a link with minimizing other norms: https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares