
I am searching for a theorem that gives upper bounds for the bias of the Lasso estimator of Tibshirani [1]. Does anybody know of such a theorem?

[1] Tibshirani, R. (1996).
“Regression Shrinkage and Selection via the Lasso”,
Journal of the Royal Statistical Society, Series B (Methodological), 58(1), 267–288.

  • I doubt very much there is any such theorem, because with a high enough regularization parameter the estimates will be driven to zero, and the selection of the regularization parameter, e.g. through cross-validation, is ad hoc (which makes it hard to analyze). – jbowman Feb 05 '16 at 23:08
  • @jbowman: yes, but if you could get an estimate of the bias conditional on a given value of the penalty (which seems plausible), then that seems like it could answer the question. – Cliff AB Feb 06 '16 at 03:59
  • If the underlying ordinary least squares regression problem is rank deficient, then the bias can be arbitrarily large (that is, there is no bound) for any fixed value of the regularization parameter. Even if it is just very badly conditioned, the bias can be huge. – Brian Borchers Feb 06 '16 at 04:13
  • Lasso is used for correlated predictors, so it seems contradictory that the bias is huge when the problem is badly conditioned. @Brian: How can you see that the bias is huge if the problem is badly conditioned? – Markus Feb 06 '16 at 04:35
  • @Brian: Could you cite a paper or a book which contains a mathematical proof? Thanks a lot! – Markus Feb 06 '16 at 04:43
  • @Markus I believe that the Lasso performs poorly for very correlated predictors. If I remember correctly, this is discussed in the new book by Tibshirani & co., "Statistical Learning with Sparsity". http://web.stanford.edu/~hastie/StatLearnSparsity/index.html – Vladislavs Dovgalecs Feb 06 '16 at 06:38

1 Answer


Suppose that we have a rank-deficient least squares problem

$\min \| X\beta - y \|_{2}$

and let $\alpha$ be a nonzero vector in the null space of $X$, so that $X\alpha = 0$.
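
To make this concrete, here is a small numerical sketch (my own toy construction, not part of the original answer) of such a rank-deficient design, assuming numpy is available; the particular matrix and the vector $\alpha = (1, 1, -1)^{T}$ are illustrative choices:

```python
import numpy as np

# Toy rank-deficient design: the third column equals the sum of the first
# two, so X has 3 columns but only rank 2.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2))
X = np.column_stack([Z, Z[:, 0] + Z[:, 1]])

# alpha = (1, 1, -1) is a nonzero vector in the null space of X: X @ alpha = 0.
alpha = np.array([1.0, 1.0, -1.0])
print(np.allclose(X @ alpha, 0.0))   # True
```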

The Lasso estimator can be formulated either as a constrained least squares problem or as an unconstrained problem whose objective adds a one-norm penalty to the least squares term. I'll use the constrained formulation:

$\min \| X \beta - y \|_{2}^{2} $

subject to

$\| \beta \|_{1} \leq t $

where $t$ is a fixed regularization parameter. Let $\beta_{L}$ be the Lasso estimator obtained by solving this constrained optimization problem. Let $\beta_{2}$ be the minimum 2-norm solution to the least squares problem and suppose that $\| \beta_{2} \|_{1} \leq t$.

Now consider what happens if the true $\beta$ that we're trying to estimate is of the form

$\beta=\beta_{2}+ s \alpha$,

where $s$ is a very large scalar. Since the $s \alpha$ term has no effect on the least squares objective but greatly increases $\| \beta \|_{1}$, this true $\beta$ fits the data exactly as well as $\beta_{2}$ while lying arbitrarily far outside the constraint region $\| \beta \|_{1} \leq t$. The Lasso estimator $\beta_{L}$, which must satisfy $\| \beta_{L} \|_{1} \leq t$, therefore cannot follow it, so by taking $s$ large enough the bias can be made arbitrarily large for any fixed $t$.
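
Here is a rough numerical illustration of that argument (my own sketch, not from the original answer; it assumes numpy and scikit-learn, and the design matrix, noise level, and penalty value 0.05 are arbitrary choices). Note that scikit-learn's `Lasso` solves the equivalent penalized form $\min \frac{1}{2n}\| y - X\beta \|_{2}^{2} + \lambda \| \beta \|_{1}$ rather than the constrained form, which corresponds to some value of $t$:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Rank-deficient toy design as above: third column = sum of the first two,
# so null_vec = (1, 1, -1) satisfies X @ null_vec = 0.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2))
X = np.column_stack([Z, Z[:, 0] + Z[:, 1]])
null_vec = np.array([1.0, 1.0, -1.0])

# Noisy response from an arbitrary coefficient vector.
y = X @ np.array([1.0, 2.0, 0.0]) + 0.1 * rng.normal(size=50)

# Minimum 2-norm least squares solution (beta_2 in the notation above).
beta_2 = np.linalg.pinv(X) @ y

# Lasso estimate via the penalized form; alpha here is sklearn's penalty
# weight, unrelated to the null-space vector in the text.
beta_L = Lasso(alpha=0.05, fit_intercept=False).fit(X, y).coef_

# "True" coefficients beta_2 + s * null_vec fit exactly as well as beta_2,
# yet move arbitrarily far from the Lasso estimate as s grows.
for s in [1.0, 1e3, 1e6]:
    beta_true = beta_2 + s * null_vec
    same_fit = np.allclose(X @ beta_true, X @ beta_2)
    dist = np.linalg.norm(beta_true - beta_L)
    print(f"s={s:g}  same fitted values: {same_fit}  ||beta_true - beta_L|| = {dist:.3g}")
```

The fitted values are unchanged (up to numerical precision) for every $s$, while the distance between the true coefficients and the Lasso estimate grows without bound, which is the sense in which the bias cannot be bounded for a fixed $t$.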

If you use a Bayesian approach, you can avoid this issue by specifying a prior that effectively rules out such solutions.
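
For example (a sketch of my own, under the additional assumption of a Laplace prior $p(\beta) \propto \exp(-\lambda \| \beta \|_{1})$ with $\lambda > 0$), the prior weight of the shifted solution relative to $\beta_{2}$ is

$\frac{p(\beta_{2} + s\alpha)}{p(\beta_{2})} = \exp\left(-\lambda \left( \| \beta_{2} + s\alpha \|_{1} - \| \beta_{2} \|_{1} \right)\right) \rightarrow 0 \quad \text{as } s \rightarrow \infty,$

since $\| \beta_{2} + s\alpha \|_{1} \geq s \| \alpha \|_{1} - \| \beta_{2} \|_{1}$. Solutions far out along the null-space direction therefore receive vanishing prior, and hence posterior, weight.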