
It is easy to show using matrix algebra when least squares will produce bias.

\begin{equation} \begin{split} \text{E}[B]& = \text{E}[(X'X)^{-1}]\times\text{E}[X'Y] \\ & = \text{E}[(X'X)^{-1}]\times\text{E}[X'(XB +\epsilon)] \\ & = \text{E}[(X'X)^{-1}]\times\text{E}[X'XB] + \text{E}[(X'X)^{-1}]\times\text{E}[X'\epsilon] \\ & = B + \text{E}[(X'X)^{-1}]\times\text{E}[X'\epsilon] \\ \end{split} \end{equation}

Bias is introduced by the last term when there is correlation between $X$ and $\epsilon$.
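This bias term is easy to check numerically. The sketch below is my own illustrative construction (the DGP, sample sizes, and correlation strength are not from the question): it Monte-Carlo-averages the last term, $(X'X)^{-1}X'\epsilon$, with and without correlation between $x$ and $\epsilon$.

```python
import numpy as np

# Illustrative DGP (my choice): one regressor plus intercept. In the
# endogenous case eps = 0.5*x + noise, so Cov(x, eps) = 0.5 and Var(x) = 1.
rng = np.random.default_rng(0)
n, reps = 2_000, 500

def avg_bias_term(correlated):
    """Average the last term of the derivation, (X'X)^{-1} X' eps, over reps."""
    terms = []
    for _ in range(reps):
        x = rng.standard_normal(n)
        noise = rng.standard_normal(n)
        eps = 0.5 * x + noise if correlated else noise
        X = np.column_stack([np.ones(n), x])
        terms.append(np.linalg.solve(X.T @ X, X.T @ eps))
    return np.mean(terms, axis=0)

print(avg_bias_term(False))  # both components near 0: no bias
print(avg_bias_term(True))   # slope component near Cov(x,eps)/Var(x) = 0.5
```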

My question is: how do we know when the LAD estimator will be biased? Does bias arise when $X$ and $\epsilon$ are correlated, as in least squares, or is quantile regression robust to such correlation? I'm guessing it can't be demonstrated using matrix algebra, because the LAD estimator is the following:

\begin{align} B(\tau) = \operatorname{argmin}_B \text{E}[\rho_\tau(Y_i - X_i'B)] \end{align}

and the LAD estimator is not computed using linear algebra (?). If that's right, then how can we demonstrate when quantile regression will produce bias? With a Monte Carlo simulation?
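A Monte Carlo simulation is indeed a natural way to check this. The sketch below is my own illustrative setup, not from the question: a shared factor $z$ drives both $x$ and $\epsilon$, so the conditional median of $\epsilon$ given $x$ is $0.5x$, and both OLS and median (LAD) regression should converge to a slope near $2.5$ rather than the true $2$. The IRLS routine is only a rough smooth approximation to the $L_1$ objective, not the linear-programming solvers real packages use.

```python
import numpy as np

def lad_fit(X, y, iters=100, delta=1e-8):
    """Median (LAD) regression via iteratively reweighted least squares.
    A sketch-quality smooth approximation to the L1 objective."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(y - X @ beta), delta)
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta

rng = np.random.default_rng(1)
n = 20_000
z = rng.standard_normal(n)            # common factor creating endogeneity
x = z + rng.standard_normal(n)
eps = z + rng.standard_normal(n)      # Cov(x, eps) = 1, Var(x) = 2
y = 1.0 + 2.0 * x + eps               # true slope is 2
X = np.column_stack([np.ones(n), x])

ols = np.linalg.solve(X.T @ X, X.T @ y)
lad = lad_fit(X, y)
print(ols[1], lad[1])  # both near 2.5, not 2: LAD is not robust to endogeneity
```

In this design the conditional median of $\epsilon$ given $x$ is not zero, so median regression inherits essentially the same slope distortion as OLS.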

Richard Hardy
Devon
    Please clarify. If we condition on $X$, $X$ is non-stochastic and doesn't correlate with a random variable in the usual sense. – Frank Harrell Jul 30 '13 at 20:06
    What are your assumptions? It is almost always the case that $\mathbb{E}[e]=0$, whence your bias term reduces to zero in the usual application where $X$ is fixed. But if you're viewing $X$ as random, your expression for the expectation doesn't simplify in the way you write. – whuber Jul 30 '13 at 20:07
  • You should have a look at Chernozhukov and Hansen, "Quantile Models with Endogeneity" (2012), forthcoming in the Annual Review of Economics: http://www.mit.edu/~vchern/papers/IVQRReview5.pdf – PAC Jul 30 '13 at 21:55

1 Answer


If you have a model $$ Y_i = X_i'B(\tau) + \epsilon_i(\tau) $$ then a sufficient condition for $\tau$-quantile regression to give an unbiased estimate of $B(\tau)$ is that the $\tau$-th quantile of $\epsilon(\tau)$ conditional on $X$ is zero. This follows from the fact that (i) the sample quantile regression objective function, $\mathbb{E}_n[ \rho_\tau(Y-X'\beta)]$, converges uniformly to $\mathrm{E}[\rho_\tau(Y-X'\beta)]$, and (ii) $\mathrm{E}[\rho_\tau(Y-X'\beta)]$ is "uniquely" (actually something slightly stronger is needed) minimized at $B(\tau)$. (i) holds under standard regularity conditions. (ii) holds because $\mathrm{E}[\rho_\tau(Y-X'\beta)]$ is convex as a function of $\beta$ and its first-order condition can be written $$ \tau \mathrm{E}[ P(Y - X'\beta>0|X) X] = (1-\tau) \mathrm{E}[ P(Y - X'\beta<0|X) X] $$ which is satisfied at $\beta = B(\tau)$ if $Q_\tau(\epsilon(\tau)|X) = 0$. You can also see from this that $Q_\tau(\epsilon(\tau)|X) = 0$ is stronger than needed, but there doesn't seem to be an easy-to-interpret weaker condition.
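The sufficient condition can be checked numerically. The construction below is mine, not from the answer: I build $\epsilon(\tau)$ whose $\tau$-quantile conditional on $X$ is zero by shifting an independent normal error, then fit the $\tau = 0.75$ quantile regression with a simple IRLS approximation to the check-function objective (a sketch, not a production solver); the estimates should land close to the true $B(\tau)$.

```python
import numpy as np
from statistics import NormalDist

def quantile_fit(X, y, tau, iters=200, delta=1e-6):
    """Quantile regression via IRLS on the check function rho_tau.
    A rough sketch; real packages use linear-programming solvers."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ beta
        # Weights chosen so the weighted-LS fixed point satisfies the
        # quantile regression first-order condition.
        w = np.where(r > 0, tau, 1.0 - tau) / np.maximum(np.abs(r), delta)
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta

tau = 0.75
rng = np.random.default_rng(2)
n = 50_000
x = rng.standard_normal(n)
# eps(tau): independent N(0,1) shifted so its tau-quantile given x is zero
eps_tau = rng.standard_normal(n) - NormalDist().inv_cdf(tau)
B_tau = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), x])
y = X @ B_tau + eps_tau

est = quantile_fit(X, y, tau)
print(est)  # close to the true B(tau) = (1, 2)
```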

None of the above tells you what the bias in $\hat{B}(\tau)$ would be if $Q_\tau(\epsilon(\tau)|X) \neq 0$. I don't know of a general expression for the bias like you can get for OLS, but you can get some nice results in a few cases. For example, Angrist, Chernozhukov, and Fernandez-Val (2006) give an omitted variables bias formula for quantile regression. If your model satisfies the conditions above with $X = (X_1, X_2)$, and you estimate a quantile regression leaving out $X_2$, then the expectation of your estimated coefficient on $X_1$ is $$ \beta_1(\tau) + \mathrm{E}[w_\tau(X) X_1'X_1]^{-1} \mathrm{E}[w_\tau(X)X_1' (X_2 \beta_2(\tau))] $$ where $w_\tau(X)$ are weights that depend on $X$, $\tau$, and the distribution of $\epsilon$.
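The qualitative message of that formula is easy to see in a simulation (my own illustrative design, not from the paper): with symmetric errors, an omitted $x_2 = 0.5x_1 + \text{noise}$, and true coefficients $(2, 1)$, the conditional median of $y$ given $x_1$ has slope $2 + 0.5 \cdot 1 = 2.5$, so the short median regression should drift there.

```python
import numpy as np

def lad_fit(X, y, iters=100, delta=1e-8):
    """Median (LAD) regression via a sketch-quality IRLS loop."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(y - X @ beta), delta)
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta

rng = np.random.default_rng(4)
n = 20_000
x1 = rng.standard_normal(n)
x2 = 0.5 * x1 + rng.standard_normal(n)  # omitted regressor, correlated with x1
eps = rng.standard_normal(n)            # symmetric, median-zero given (x1, x2)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + eps     # true coefficient on x1 is 2

# Short regression: leave x2 out and fit the median (tau = 0.5) regression.
short = lad_fit(np.column_stack([np.ones(n), x1]), y)
print(short[1])  # near 2.5, the omitted-variable-shifted slope
```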

paul
  • If we have $\epsilon_i\overset{iid}{\sim} N(0,1)$ and fit a quantile regression at $\tau=0.75$, then the $0.75$ quantile of $\epsilon_i$ is not 0. Does this mean that the $0.75$ quantile regression is biased? – Dave May 21 '20 at 19:57
  • Not necessarily. We need to be more specific about what we're assuming about the data generating process and what we want to estimate. Suppose $y_i = \beta_0 + \epsilon_i$ with $\epsilon \sim^{iid} N(0,1)$. We can rewrite this as $y_i = B_0(\tau) + \epsilon_i(\tau)$ with $B_0(\tau) = \Phi^{-1}(\tau) + \beta_0$ and $\epsilon_i(\tau) = \epsilon_i - \Phi^{-1}(\tau)$. Then $\epsilon_i(\tau)$ satisfies the conditions above, and quantile regression consistently estimates this $B_0(\tau)$. – paul May 22 '20 at 20:19
  • What, then, keeps the requirement to get unbiased parameter estimates from being that $\mathbb{E}[\epsilon_i]=0?$ – Dave May 22 '20 at 20:26
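paul's reparameterization in the comments is easy to verify in the intercept-only case, where $\tau$-quantile regression reduces to the sample $\tau$-quantile of $y$. A minimal check (the seed and sample size are my own illustrative choices):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)
beta0, tau, n = 1.0, 0.75, 200_000
y = beta0 + rng.standard_normal(n)  # y_i = beta_0 + eps_i with eps ~ N(0,1)

# Intercept-only quantile regression reduces to the sample quantile of y.
B0_hat = np.quantile(y, tau)
B0_tau = beta0 + NormalDist().inv_cdf(tau)  # the target B_0(tau) from the comment
print(B0_hat, B0_tau)  # the estimate tracks B_0(tau), not beta_0
```

So the estimator consistently targets $B_0(\tau) = \beta_0 + \Phi^{-1}(\tau)$, which is exactly why the relevant centering condition is on $\epsilon_i(\tau)$, not $\mathbb{E}[\epsilon_i] = 0$.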