Motivating Question
A colleague asked me today why one would run a quantile regression at "extreme" quantiles (such as $.10$ or $.90$) when few observations fall in that part of the distribution (say the total is $n=200$ and the quantile in question covers only a small fraction of that number). This perplexed me: it is the second time I've been asked this question, and my way of explaining it may be limited. As I understand it, quantile regression estimates a conditional quantile using all of the observations, weighting positive and negative residuals asymmetrically, so changing the quantile shifts the "fulcrum" of the weighting rather than fitting only to the data points near that quantile.
Indeed, when we run an OLS regression, the naive estimate is the mean of $y$, denoted $\bar{y}$, and the model estimates the conditional mean of $y$ given $x$, or $E(y|x)$. This is estimated not from just the values around the mean but from the entire distribution of values. Similarly, in the quantile regression framework we model $Q(y|x)$, the conditional quantile of $y$ given $x$, in place of the conditional mean. Because estimation minimizes asymmetrically weighted absolute residuals rather than squared residuals, my best guess at how to visualize this is to plot the residual lines with widths that reflect their weights.
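To make the weighting idea concrete, here is a minimal base-R sketch for the intercept-only case. The helper names `check_loss` and `total_loss` are made up for illustration and are not part of any package. It assumes the usual check (pinball) loss, in which a positive residual is weighted by $\tau$ and a negative one by $1-\tau$, and it shows that every one of the 200 observations enters the objective even when $\tau = .10$.

```r
# Check (pinball) loss: positive residuals weighted by tau,
# negative residuals weighted by (1 - tau)
check_loss <- function(u, tau) u * (tau - (u < 0))

set.seed(123)
y   <- rnorm(200)
tau <- 0.10

# Total loss for a candidate constant q, summed over ALL 200 observations
total_loss <- function(q) sum(check_loss(y - q, tau))

# A minimizer of this piecewise-linear loss is attained at a data point,
# so it is enough to search over the observed values
q_hat <- y[which.min(sapply(y, total_loss))]

q_hat
quantile(y, tau, type = 1)  # inverse-ECDF sample quantile, for comparison
```

Under these assumptions the minimizer should sit at the lower-tail order statistics, even though no observation was dropped from the sum; only the relative weights on the two sides changed.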
Solution
Here I have run some R code for a quantile regression with $\tau = .25$, i.e. the conditional $.25$ quantile (25th percentile). I have varied the width of the residual lines so that residuals above and below the fitted line are drawn in proportion to the weights they receive in the fit.
#### Load Libraries ####
library(quantreg)
library(tidyverse)
#### Sim Data ####
set.seed(123)
x <- rnorm(200)
y <- (.40 * x) + rnorm(200)
plot(x,y)
#### Fit Q25 Regression ####
qu <- .25
fit <- rq(
  y ~ x,
  tau = qu
)
summary(fit)
#### Plot ####
broom::augment(fit) %>%
  mutate(weight = ifelse(.resid > 0, "Higher", "Lower")) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point(
    size = 3,
    color = "steelblue"
  ) +
  stat_quantile(
    quantiles = .25
  ) +
  geom_segment(
    aes(xend = x,
        yend = .fitted,
        linewidth = ifelse(weight == "Higher",
                           .25, .75),
        alpha = .3)
  ) +
  theme_classic() +
  theme(legend.position = "none") +
  labs(x = "Simulated X",
       y = "Simulated Y",
       title = "Weighted Observations for Q25")
Shown below:
Is this a correct way to visualize this? Is my understanding incorrect?
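One detail of the plotting code that may matter here: an aesthetic set inside `aes()` is passed through a scale, so the `.25` and `.75` values are rescaled rather than drawn as literal widths. Below is a minimal sketch of a variant that draws them literally. It assumes ggplot2 3.4.0 or later (for the `linewidth` aesthetic and `scale_linewidth_identity()`); the data frame `d` and the column `w` are illustrative names, and `geom_abline()` is swapped in for `stat_quantile()` so that the line reuses the coefficients of the fitted model.

```r
library(quantreg)
library(ggplot2)

set.seed(123)
x <- rnorm(200)
y <- (.40 * x) + rnorm(200)

qu  <- .25
fit <- rq(y ~ x, tau = qu)

# Check-loss weight per observation: tau above the line, 1 - tau below it
d <- data.frame(x = x, y = y, fitted = fitted(fit), resid = resid(fit))
d$w <- ifelse(d$resid > 0, qu, 1 - qu)

p <- ggplot(d, aes(x = x, y = y)) +
  # alpha is a constant, so it goes outside aes()
  geom_segment(aes(xend = x, yend = fitted, linewidth = w), alpha = .3) +
  geom_point(size = 3, color = "steelblue") +
  geom_abline(intercept = coef(fit)[[1]], slope = coef(fit)[[2]]) +
  scale_linewidth_identity() +  # use w as the literal line width
  theme_classic() +
  labs(x = "Simulated X",
       y = "Simulated Y",
       title = "Weighted Observations for Q25")
p
```

With the identity scale, the segment widths keep the $\tau : (1-\tau)$ ratio, which is the point the picture is meant to convey.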
Edit
Whuber's advice provides a useful test case for my assumptions. Running the same regression with an extreme quantile of $\tau = .001$ gives us this:
#### Load Libraries ####
library(quantreg)
library(tidyverse)
#### Sim Data ####
set.seed(123)
x <- rnorm(200)
y <- (.40 * x) + rnorm(200)
#### Fit Q001 Regression ####
qu <- .001
fit <- rq(
  y ~ x,
  tau = qu
)
summary(fit)
#### Plot ####
broom::augment(fit) %>%
  mutate(weight = ifelse(.resid > 0, "Higher", "Lower")) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point(
    size = 3,
    color = "steelblue"
  ) +
  stat_quantile(
    quantiles = .001
  ) +
  geom_segment(
    aes(xend = x,
        yend = .fitted,
        linewidth = ifelse(weight == "Higher",
                           .001, .999),
        alpha = .3)
  ) +
  theme_classic() +
  theme(legend.position = "none") +
  labs(x = "Simulated X",
       y = "Simulated Y",
       title = "Weighted Observations for Q001")
Which gives us this plot:
But this seems wrong on two fronts: 1) the quantile here is so extreme that there are literally no points where you can find residuals, which isn't realistic for the case I am considering (fitting with $\tau = .01$ gives similar results); 2) the weighting of the lines now looks bad (perhaps because of poor coding), so this doesn't tell me anything further about what is right or wrong here.
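One way to probe the first point numerically is to count how many observations fall below, on, and above the fitted line for several values of $\tau$. The sketch below is an assumption-laden check rather than a definitive diagnosis: the helper `side_counts` and the tolerance `eps` are my own choices. It relies on the standard property that, for a model with an intercept, the number of negative residuals is at most $n\tau$. With $n = 200$ and $\tau = .001$ that bound is $0.2$, so no point can lie strictly below the line.

```r
library(quantreg)

set.seed(123)
x <- rnorm(200)
y <- (.40 * x) + rnorm(200)

# Count observations below, on, and above the fitted line for one tau.
# eps is an arbitrary tolerance for treating a residual as zero.
side_counts <- function(tau, eps = 1e-8) {
  r <- resid(rq(y ~ x, tau = tau))
  c(tau     = tau,
    below   = sum(r < -eps),
    on_line = sum(abs(r) <= eps),
    above   = sum(r > eps))
}

# One row per tau: .25 for reference, then the two extreme cases
counts <- t(sapply(c(.25, .01, .001), side_counts))
counts
```

If this reading is right, the empty region under the $\tau = .001$ line is a consequence of $n\tau < 1$ rather than a plotting bug, and a larger simulated $n$ would be needed before any points could fall below such an extreme fit.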

