
The bridge regression coefficient estimates $\hat{\beta}^{br}$ are the values that minimize \begin{equation} \text{RSS} + \lambda \sum_{j=1}^p|\beta_j|^q , \end{equation} where $q \in \mathbb{R}$ and $q > 0$.
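For concreteness, here is a minimal sketch of this objective in Python (the names `X`, `y`, `beta`, `lam`, and `q` are my own illustration, not from either reference):

```python
import numpy as np

def bridge_objective(beta, X, y, lam, q):
    """Bridge regression objective: RSS plus an L_q penalty on the coefficients."""
    rss = np.sum((y - X @ beta) ** 2)          # residual sum of squares
    penalty = lam * np.sum(np.abs(beta) ** q)  # lambda * sum_j |beta_j|^q
    return rss + penalty
```

With $q=1$ this is the lasso objective, and with $q=2$ it is ridge regression.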

My question is: why is this kind of regression called BRIDGE regression?

I know that in 1993 Frank and Friedman proposed this in (1). However, that paper contains no term like "bridge" or "bridge regression". Confusingly, just three years later, in 1996, Robert Tibshirani cited paper (1) in paper (2) using the term "bridge", viz., in Section 11:

Frank and Friedman (1993) discuss a generalization of ridge regression and subset selection, through the addition of a penalty of the form $\lambda \sum_{j=1}^p|\beta_j|^q$ to the residual sum of squares. This is equivalent to a constraint of the form $\sum_{j}|\beta_j|^q \le t$; they called this the 'bridge'.

Emmm... "they called"? When the word "bridge" does not even occur in (1)?

I searched Google Scholar and found no paper before (2) that cites (1), so where does the word "bridge" come from? Am I missing something important?

I think my question might be related to "Why is ridge regression called 'ridge', why is it needed, and what happens when $\lambda$ goes to infinity?"


References:

  1. Frank, I. E., and Friedman, J. H. (1993). "A Statistical View of Some Chemometrics Regression Tools." Technometrics, 35(2), 109–135.
  2. Tibshirani, R. (1996). "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society: Series B, 58(1), 267–288.

1 Answer


The word "bridge" does not occur in the particular reference. But in other references it does occur. For instance equation 33 in Friedman, Jerome H. "An overview of predictive learning and function approximation." From statistics to neural networks (1994).

Another approach is to approximate the discontinuous penalty (30) by a close continuous one, thereby enabling the use of numerical optimization. This is motivated by the observation that both (28) and (29) (30) can be viewed as two points on a continuum of penalties, such as $$\eta_q(\theta_1,\dots,\theta_p) = \sum_{j=1}^p |\theta_j|^q \quad\text{("bridge")} \tag{33} $$ (Frank and Friedman, 1993), or $$\eta_q(\theta_1,\dots,\theta_p) = \sum_{j=1}^p \frac{(\theta_j/w)^2}{1+(\theta_j/w)^2} \quad\text{("weight decay")} \tag{34}$$ (Weigend, Huberman and Rumelhart, 1991). With the "bridge" penalty (33), $q=2$ yields the ridge penalty (28), whereas subset selection (29) (30) is approached in the limit as $q \to 0$.

Therefore if "bridge" is meant to be the figurative bridge between two points as Kjetil mentioned as possiblity in the comments, then it is a bridge between subset selection and ridge and not between Lasso and ridge. Lasso didn't exist yet when this "bridge" penalty was conceptualized.