Is my intuition behind the weight matrix correct for quantile regression?

Question

Motivating Question

I was asked by a colleague today why one would run a quantile regression on quantiles that are "extreme" (such as $.10$, $.90$, etc.) if there are too few observations in that quantile (say the total is $n=200$ and the quantile contains only some fraction of that number). This perplexed me: it is the second time I've been asked this question, and my way of explaining it may be limited. As it is my understanding, quantile regressions estimate conditional quantiles based on a weight matrix that observes all of the values in a distribution, and consequently the location of the quantile shifts the "fulcrum" of the weighting rather than restricting the estimate to that specific quantile's data points.

Indeed, when we run an OLS regression, the naive (intercept-only) estimate is the mean of $y$, denoted $\bar{y}$, and with a predictor we estimate the conditional mean of $y$ given $x$, or $E(y|x)$. This is estimated not just from the values around the mean but from the entire distribution of values. Similarly, in the quantile regression framework we instead target $Q(y|x)$, where $Q$ is the conditional quantile of $y$. Because the estimation method uses absolute residuals rather than squared residuals, my best guess at how to visualize this is to plot residual lines sized according to their weighting for a given quantile.
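
For reference, the standard objective behind this idea is the check (or pinball) loss. The notation below ($\rho_\tau$, $u$, $x_i^\top\beta$, $\hat\beta(\tau)$) is assumed for illustration and is not taken from the text above:

$$
\hat\beta(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\left(y_i - x_i^\top \beta\right),
\qquad
\rho_\tau(u) = u\,\bigl(\tau - \mathbb{1}\{u < 0\}\bigr)
= \begin{cases} \tau\,|u|, & u \ge 0 \\ (1-\tau)\,|u|, & u < 0 \end{cases}
$$

Every observation enters the sum. The choice of $\tau$ only changes the relative weight of positive and negative residuals: with $\tau = .25$, for example, a point below the line counts three times as much as a point the same distance above it.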

Solution

Here I have run some R code for a quantile regression with $\tau = .25$, i.e. the conditional 25th percentile. I have varied the width of the residual lines so that the residuals under the fitted line are drawn thicker, to reflect that the residuals are given proportionate weights in the fitting.

#### Load Libraries ####
library(quantreg)
library(tidyverse)
#### Sim Data ####
set.seed(123)
x <- rnorm(200)
y <- (.40 * x) + rnorm(200)
plot(x,y)
#### Fit Q25 Regression ####
qu <- .25
fit <- rq(
  y ~ x,
  tau = qu
)
summary(fit)
#### Plot ####
broom::augment(fit) %>% 
  mutate(weight = ifelse(.resid > 0, "Higher","Lower")) %>% 
  ggplot(aes(x=x,y=y))+
  geom_point(
    size = 3,
    color = "steelblue"
  )+
  stat_quantile(
    quantiles = .25
  )+
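  geom_segment(
    aes(xend=x,
        yend=.fitted,
        # widths are mapped through the linewidth scale, not used literally
        linewidth = ifelse(weight == "Higher",
                           .25, .75)),
    alpha = .3  # constant transparency, set outside aes()
  )+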
  theme_classic()+
  theme(legend.position = "none")+
  labs(x="Simulated X",
       y="Simulated Y",
       title="Weighted Observations for Q25")

Shown below:

Is this a correct way to visualize this? Is my understanding incorrect?

Edit

Whuber's advice provides a useful test case for my assumptions. Running the same regression with an extreme quantile of $\tau = .001$ gives us this:

#### Load Libraries ####
library(quantreg)
library(tidyverse)
#### Sim Data ####
set.seed(123)
x <- rnorm(200)
y <- (.40 * x) + rnorm(200)
#### Fit Q001 Regression ####
qu <- .001
fit <- rq(
  y ~ x,
  tau = qu
)
summary(fit)
#### Plot ####
broom::augment(fit) %>% 
  mutate(weight = ifelse(.resid > 0, "Higher","Lower")) %>% 
  ggplot(aes(x=x,y=y))+
  geom_point(
    size = 3,
    color = "steelblue"
  )+
  stat_quantile(
    quantiles = .001
  )+
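  geom_segment(
    aes(xend=x,
        yend=.fitted,
        # widths are mapped through the linewidth scale, not used literally
        linewidth = ifelse(weight == "Higher",
                           .001, .999)),
    alpha = .3  # constant transparency, set outside aes()
  )+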
  theme_classic()+
  theme(legend.position = "none")+
  labs(x="Simulated X",
       y="Simulated Y",
       title="Weighted Observations for Q001")

Which gives us this plot:

But this seems wrong on two fronts: 1) the quantile here is so extreme that there are literally no points below the fitted line where you can find residuals, which isn't the reality of the case I am considering (fitting to $\tau = .01$ gives similar results); 2) the weighting of the lines now looks bad (perhaps because of poor coding), so this doesn't tell me anything further about what is wrong or right here.
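
One way to check what the plot is showing is to count the residual signs directly instead of reading them off the figure. This is a rough sketch that assumes the same simulated data as above; `count_signs` and its `tol` argument are hypothetical helpers introduced here, not part of `quantreg`:

```r
library(quantreg)

# Same simulated data as in the question
set.seed(123)
x <- rnorm(200)
y <- (.40 * x) + rnorm(200)

# Hypothetical helper: count residuals below, on, and above the fitted line
count_signs <- function(tau, tol = 1e-8) {
  r <- residuals(rq(y ~ x, tau = tau))
  c(below = sum(r < -tol), on_line = sum(abs(r) <= tol), above = sum(r > tol))
}

# One column per tau
signs <- sapply(c(.25, .01, .001), count_signs)
colnames(signs) <- c("tau=.25", "tau=.01", "tau=.001")
signs
```

With $n = 200$, $n\tau$ is 50, 2 and 0.2 for these three fits. For a model with an intercept, the number of points strictly below an optimal line cannot exceed $n\tau$, so at $\tau = .001$ the `below` count should be 0. That would explain why no residual segments appear under the line in the second plot.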

Do you mean if the sum of the areas below equals the sum of the areas above? — Dave, Nov 08 '23 at 13:12
Yes, I assume that because we use absolute residuals, we are weighting their magnitude based on where the quantile is fit, thus if there are a lower number of data points below $Q25$, then this is weighted appropriately to properly estimate the range of values that makes the residuals sum to zero in some sense. — Shawn Hemelstrand, Nov 08 '23 at 13:16
What would be the equation that you expect to sum to zero? Do you just mean the usual quantile loss function but without the absolute values? $$\sum_i \begin{cases} \tau( y_i - \hat y_i), & y_i - \hat y_i \ge 0 \\ (1 - \tau)( y_i - \hat y_i), & y_i - \hat y_i < 0 \end{cases}$$ — Dave, Nov 08 '23 at 14:06
Maybe "sum to zero" isn't the best way to explain that, since I think this is an OLS term. What I mean to convey through visualization is how the regression fit weights each data point based on its location under/over the conditional quantile. In the OLS/Q50 sense we expect 50% of the values to lie above and below the regression line. With a Q25 regression, we expect 25% to fall below the line, where we weight this in some way OLS doesn't. So while I know how pinball loss generally works from previous explanations here (loss attributed to the sign of the residual), visualizing this weighting is unclear. — Shawn Hemelstrand, Nov 08 '23 at 14:17
I believe that experimenting with a more extreme case will help you understand your colleague's point. Try fitting, say, the 0.001 conditional percentile. — whuber, Nov 08 '23 at 14:22

Alecos Papadopoulos · Accepted Answer · 2023-11-10T16:47:43.187

The OP wrote

As it is my understanding, quantile regressions estimate conditional quantiles based on a weight matrix that observes all of the values in a distribution,(...)

Two pieces of information on the matter

In the paper that introduced quantile regression, Koenker, R., & Bassett Jr, G. (1978). Regression quantiles. Econometrica, Theorem 3.1 tells us that the coefficient vector for a given quantile is equal to a linear function of a subset of the sample. Let $\{y,X\}$ denote the sample (dependent variable, regressor matrix), and let $h$ be a set of observation indices with cardinality equal to the number of regressors (columns of $X$). So $\{y(h), X(h)\}$ represents a subset of the sample and $X(h)$ is a square matrix. Then Theorem 3.1 states that, for quantile probability $\tau$ and some such $h$, the solution/optimizer vector is $$\beta(\tau)^* = X(h)^{-1}y(h).$$ This means, for example, that if your regressor matrix has two columns (say, one constant and one variable), then the coefficient vector (for each $\tau$) will be determined by the above function of just two observations.
In Koenker's book "Quantile Regression" (2005), the author writes (p. 11)

One occasionally encounters the view that quantile regression estimators must "ignore sample information" since they are inherently determined by a small subset of the observations. This view neglects the obvious fact that all the observations participate in which "basic" observations are selected as basic.

He refers to how the members of $h$ are chosen.

These properties of quantile regression come from the fact that it can be formulated as a linear programming problem.
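
To make the two points above concrete, here is a minimal base-R sketch that assumes the simulated data from the question and a two-column design (constant plus `x`). It does not use a linear programming solver. Instead it brute-forces every two-observation subset $h$, computes $X(h)^{-1}y(h)$, and keeps the candidate with the smallest check loss. The names `check_loss` and `fit_by_pairs` are hypothetical helpers introduced here for illustration:

```r
# Same simulated data as in the question
set.seed(123)
x <- rnorm(200)
y <- (.40 * x) + rnorm(200)

# Check (pinball) loss of residuals u at quantile tau
check_loss <- function(u, tau) sum(u * (tau - (u < 0)))

# Hypothetical helper: search all 2-observation subsets h
fit_by_pairs <- function(x, y, tau) {
  X <- cbind(1, x)                      # constant + one regressor
  pairs <- combn(length(y), 2)          # all candidate index sets h
  best <- list(loss = Inf)
  for (k in seq_len(ncol(pairs))) {
    h <- pairs[, k]
    beta <- solve(X[h, ], y[h])         # X(h)^{-1} y(h): line through 2 points
    loss <- check_loss(y - X %*% beta, tau)  # loss uses ALL observations
    if (loss < best$loss) best <- list(h = h, beta = beta, loss = loss)
  }
  best
}

res <- fit_by_pairs(x, y, tau = .25)
res$h      # indices of the two "basic" observations
res$beta   # intercept and slope determined by those two points
```

The fitted line passes exactly through the two observations in `res$h`, yet all 200 observations enter the loss that selects them, which is the point of the Koenker quotation. If `quantreg` is available, `coef(rq(y ~ x, tau = .25))` should agree with `res$beta` up to numerical tolerance whenever the solution is unique.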
