
Consider the elastic net problem

$$\min_x f(x) + \lambda_1 \Vert x \Vert_1 + \lambda_2 \Vert x \Vert_2^2$$

Is there a shrinkage operator for this objective function, similar to the soft thresholding operator for L1 regularization (which in this case would be $\text{sgn}(x) (\vert x \vert - \lambda_1)_+$)?

To elaborate:

In glmnet, for linear regression, given a design matrix $X \in \mathbb{R}^{n \times p}$ and a vector of observations $y \in \mathbb{R}^n$, the elastic net problem is of the form

$$\min_{\beta \in \mathbb{R}^p} \frac{1}{2n}\Vert y - X \beta \Vert_2^2 + \alpha \lambda \Vert \beta \Vert_1 + (1-\alpha)\frac{\lambda}{2}\Vert \beta \Vert_2^2$$

where $\alpha \in [0,1]$ is the elastic net parameter. When $\alpha = 1$ this is clearly equivalent to lasso linear regression, in which case the proximal operator for L1 regularization is soft thresholding, i.e.

$$\text{prox}_{\lambda \Vert \cdot \Vert_1}(v) = \text{sgn}(v)(\vert v \vert - \lambda)_+$$
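For reference, a minimal numpy sketch of this operator (the function name is my own):

```python
import numpy as np

def soft_threshold(v, lam):
    """Elementwise soft thresholding: the prox of lam * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
```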

My question is: When $\alpha \in [0,1)$, what is the form of $\text{prox}_{\alpha\lambda\Vert\cdot\Vert_1 + (1-\alpha)\frac{\lambda}{2}\Vert \cdot\Vert_2^2}$ ?

  • I think the question and notation are not clear to me. Are you saying sign instead of sgn? Also, the parentheses are not in pairs ... – Haitao Du Sep 25 '16 at 01:37
  • @hxd1011 "sgn" is widely used (see e.g. https://en.wikipedia.org/wiki/Sign_function) and the parentheses appear to be in pairs to me. – mark999 Sep 25 '16 at 01:49
  • Yes sgn = sign – user3294195 Sep 25 '16 at 01:49
  • Sorry, my bad ..., but I still do not understand: what is $\text{sgn}(x) (\vert x \vert - \lambda_1)_+$? – Haitao Du Sep 25 '16 at 01:50
  • That's the soft-thresholding operator for L1 regularization. Since $\Vert x \Vert_1$ is not differentiable with respect to $x$, we can first solve $\min_x f(x)$ and then use the soft thresholding operator to set the elements in $x$ that have absolute value less than $\lambda$ to 0 (in an iterative procedure). My question is: Is there an equivalent shrinkage operator for elastic net regularization? – user3294195 Sep 25 '16 at 01:54
  • I do not understand "first solve $\min f(x)$, then use soft thresholding"; we can solve it directly, see this post. We can directly write the derivative of $|x|$ as $\text{sign}(x)$; most solvers work with that. – Haitao Du Sep 25 '16 at 04:08
  • @hxd1011 I am not certain, but I believe the OP is effectively referring to the combination of this and this. – GeoMatt22 Sep 25 '16 at 05:06
  • @hxd The $(\cdot)_+$ operator simply returns its argument if it's positive and is 0 otherwise. – Glen_b Sep 25 '16 at 06:51
  • @GeoMatt22 my question is related to the second link in your comment. I've added more detail, in case the original question was unclear. – user3294195 Sep 25 '16 at 17:42
  • You'll likely be interested in this monograph. In particular, see page 189 (as paginated on the page; it's page 71 of the pdf), where it states $$\mathrm{prox}_{\lambda f}(v) = \left(\frac{1}{1+\gamma \lambda} \right)\mathrm{prox}_{\lambda \Vert\cdot\Vert_1}(v)$$ where $\gamma$ is a slightly different parameterization than the one you're considering. – cardinal Sep 25 '16 at 17:58
  • Thanks - I'm familiar with this paper, but for some reason the proximal operator for elastic net defined in this way does not replicate the results of glmnet linear regression. It does match the function I've defined in this question, though: http://stats.stackexchange.com/questions/236866/replicating-results-for-glmnet-linear-regression-using-a-generic-optimizer – user3294195 Sep 25 '16 at 22:29

1 Answer


The two tutorials, Ryu et al. (2016) and Parikh et al. (2013), give a nice overview of operator-splitting methods built from the gradient operator and the proximal operator. The proximal operator is a special case of the resolvent operator.

The resolvent of a relation $A$ on $\mathbb{R}^n$ is defined as $$ R_{\lambda A} = (I + \lambda A)^{-1} $$

where $I$ is the identity function.

The proximal operator of a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is defined as $$ \text{prox}_{\lambda f}(x) = \arg\min_z \left(\lambda f(z) + \tfrac{1}{2}\Vert z - x\Vert_2^2\right) $$

Here we consider only convex $f$. We can identify $\text{prox}_{\lambda f}(x)$ with $R_{\lambda \partial f}(x)$: let $z = \text{prox}_{\lambda f}(x)$; then \begin{align} z &= \arg \min_z \lambda f(z)+ \tfrac{1}{2} \|z-x\|_2^2 \\ 0 &\in \lambda\partial f(z) + z - x \\ x &\in \lambda\partial f(z) + z \\ z &= (I + \lambda \partial f)^{-1}(x) \end{align}
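As a quick sanity check on this definition (a sketch using scipy's generic scalar minimizer, not part of the original derivation), the prox of $\lambda\vert\cdot\vert$ computed by direct minimization matches the closed-form soft threshold:

```python
import numpy as np
from scipy.optimize import minimize_scalar

lam, x = 0.7, 2.3

# prox_{lam f}(x) for f = |.|, by directly minimizing lam*|z| + 0.5*(z - x)^2
numeric = minimize_scalar(lambda z: lam * abs(z) + 0.5 * (z - x) ** 2).x

# closed-form soft threshold sgn(x)(|x| - lam)_+
closed = np.sign(x) * max(abs(x) - lam, 0.0)

print(numeric, closed)  # both approximately 1.6
```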

A nice property of the proximal operator is that the proximal point iteration $x^{k+1} = \text{prox}_{\lambda f}(x^k)$ converges linearly when $f$ is strongly convex. For $\alpha \neq 1$, the elastic net objective is $(1-\alpha)\lambda$-strongly convex.

Applying the same trick to the elastic net objective, let $z = \text{prox}_{\text{elastic}}(\beta)$: \begin{align} z &= \arg \min_z \frac{1}{2n}\|Y - Xz\|_2^2 + \alpha \lambda\|z\|_1 + \frac{(1-\alpha)\lambda}{2} \|z\|_2^2 + \frac{1}{2}\|z -\beta\|_2^2 \\ 0 &\in \frac{X^TXz - X^TY}{n} + \alpha\lambda\, \text{sgn}(z) + (1-\alpha)\lambda z+ z - \beta \end{align}

There is no closed form for $z$ unless we impose an orthogonal design, $X^TX = I$. Then the first-order condition separates across coordinates: \begin{align} 0 &\in z_i/n - (X^TY)_i/n + \alpha \lambda\, \text{sgn}(z_i) + (1-\alpha)\lambda z_i + z_i -\beta_i \\ \beta_i + (X^TY)_i/n - \left((1-\alpha)\lambda + 1+ 1/n\right)z_i &\in \alpha\lambda\, \text{sgn}(z_i) \end{align} Let $t_i = \frac{\beta_i + (X^TY)_i/n }{(1-\alpha)\lambda + 1+ 1/n}$; then $$ z_i = \text{sgn}(t_i)\left(|t_i| - \frac{\alpha \lambda}{(1-\alpha)\lambda + 1+ 1/n}\right)_+ $$
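In code, this coordinatewise update looks as follows (a sketch valid only under the orthogonal design assumption $X^TX = I$; the function and argument names are my own):

```python
import numpy as np

def prox_elastic_orthogonal(beta, Xty, n, lam, alpha):
    """Prox of the full elastic net objective, computed coordinatewise.
    Valid only when X^T X = I. Xty is the vector X^T y."""
    c = (1 - alpha) * lam + 1 + 1 / n      # denominator from the first-order condition
    t = (beta + Xty / n) / c
    return np.sign(t) * np.maximum(np.abs(t) - alpha * lam / c, 0.0)
```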

Iteratively applying this proximal operator to $\beta_{\text{init}}$ yields $\beta^*$ with $\beta^* = \text{prox}_{\text{elastic}}(\beta^*)$, which implies $0 \in \partial f (\beta^*)$, i.e. $\beta^*$ minimizes the elastic net objective.
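A usage sketch of this fixed-point iteration on a synthetic orthogonal design, reusing prox_elastic_orthogonal (and the numpy import) from the sketch above:

```python
rng = np.random.default_rng(0)
n, p = 50, 5
X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal columns, so X^T X = I
y = rng.standard_normal(n)
lam, alpha = 0.1, 0.5

beta = np.zeros(p)                                  # beta_init
for _ in range(200):                                # iterate to an approximate fixed point
    beta = prox_elastic_orthogonal(beta, X.T @ y, n, lam, alpha)
```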

$\text{prox}_{\alpha\lambda\|\cdot\|_1 + \frac{(1-\alpha)\lambda}{2} \| \cdot\|_2^2}$ is even easier to compute: \begin{align*} \beta_i - \left((1-\alpha)\lambda +1 \right)z_i \in \alpha \lambda\, \text{sgn}(z_i) \end{align*} Again, let $t_i = \frac{\beta_i}{(1-\alpha)\lambda +1}$; then $z_i = \text{sgn}(t_i)\left(|t_i| -\frac{\alpha\lambda}{(1-\alpha)\lambda + 1}\right)_+$. This is exactly the form in cardinal's comment above: soft-threshold at $\alpha\lambda$, then scale by $\frac{1}{(1-\alpha)\lambda + 1}$.
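This penalty-only prox, which answers the original question, is a few lines in numpy (a sketch; the function name is my own):

```python
import numpy as np

def prox_elastic_net_penalty(v, lam, alpha):
    """Elementwise prox of alpha*lam*||.||_1 + (1-alpha)*(lam/2)*||.||_2^2.
    Equivalent to soft thresholding at alpha*lam followed by scaling by 1/c."""
    c = (1 - alpha) * lam + 1
    t = v / c
    return np.sign(t) * np.maximum(np.abs(t) - alpha * lam / c, 0.0)
```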

Please let me know if I made any numerical mistakes. Good luck!

Yang Guo
  • How would you solve this similar problem - https://math.stackexchange.com/questions/2595199? – Royi Mar 13 '18 at 15:59