4

In the second answer (https://stats.stackexchange.com/a/368426/287815) to the question "Why will ridge regression not shrink some coefficients to zero like lasso?", the author derives the one-variable ridge solution

$$\beta = \frac{xy}{x^2 + \lambda}$$

My question:

Can't $\beta$ be $0$ if either $x$ or $y$ is the zero vector?
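For concreteness, here is a minimal numeric sketch of that formula with toy numbers of my own (not from the linked answer):

```python
import numpy as np

# Toy data illustrating beta = (x . y) / (x . x + lambda)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
lam = 0.5

beta_ridge = x @ y / (x @ x + lam)  # 28 / 14.5, about 1.93
beta_ols = x @ y / (x @ x)          # 28 / 14 = 2.0 (the lambda = 0 case)
print(beta_ols, beta_ridge)

# With y the zero vector, the numerator is 0, so beta is exactly 0 for every lambda:
print(x @ np.zeros(3) / (x @ x + lam))  # 0.0
```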

  • 1
    The $x$ and $y$ are vectors (or at least collections of numbers), so it will be important to define what it means to multiply two vectors, square a vector, and add a scalar $\lambda$ to a squared vector. Since they are vectors, what does it mean for $x$ or $y$ to be zero? – Dave Aug 28 '22 at 19:45
  • @Dave, just edited my question: can't either $x$ or $y$ be a zero vector? – Deepak Tatyaji Ahire Aug 28 '22 at 19:47
  • $xy$ is really $\mathbf x \cdot \mathbf y$, which, if zero, means linear regression is not really going to work – Henry Aug 28 '22 at 20:01
  • @Henry I may be wrong, or I may not fully understand what you are trying to say, but $y$ can be a zero vector, correct? And if it is, then our regression line is nothing but the x-axis. Correct me if I am wrong. – Deepak Tatyaji Ahire Aug 28 '22 at 20:09
  • The point is that it is not shrunk to zero: it is zero for all values of $\lambda$ (including for plain OLS). The original question was about shrinking coefficients to zero for $\lambda$ large enough. – seanv507 Aug 28 '22 at 20:18
  • 2
    If all your observations of the "dependent variable" are $0$ then you may decide to always predict $0$. This is not really linear regression - you have not used the "independent variable" – Henry Aug 28 '22 at 20:18

2 Answers

7

What the referenced answer says is that in ridge regression the regularization does not shrink the parameters to exact zeros. It does not say that the parameters cannot be zero, just that making them zero is not what the regularization does. Your example with zero vectors is a pathological case: even without regularization the parameter would be zero, so it has nothing to do with regularization.
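To illustrate (a minimal sketch with made-up numbers): when $y$ is the zero vector, plain OLS already gives a zero coefficient, and the ridge penalty changes nothing:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.zeros(3)                      # the zero-vector case from the question

beta_ols = x @ y / (x @ x)           # plain OLS: already exactly 0
beta_ridge = x @ y / (x @ x + 10.0)  # ridge with any lambda: still exactly 0
print(beta_ols, beta_ridge)          # 0.0 0.0
```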

Tim
  • Hey @Tim, can you please tell me what the difference between "shrinking to exact 0" and "parameters can be 0" is? – Deepak Tatyaji Ahire Aug 28 '22 at 20:40
  • @DeepakTatyajiAhire I mean that regression not using regularization would find the parameter equal to zero, so regularization would play no role here. – Tim Aug 29 '22 at 05:06
  • I like the answer but would say something like "non-generic" or "special" rather than "pathological" ... – Ben Bolker Sep 01 '22 at 19:45
1

what is the difference between "shrinking to exact 0" and "parameters can be 0"

If the OLS solution is non-zero (which means that $y$ is non-zero), then the ridge regression regularisation will not be able to shrink the parameters to exact 0.

As you found out, the parameters of ridge regression can be zero when $y=0$ ($x=0$ makes no sense*). But that is only the case when $y=0$, and in that case the OLS solution is also zero (so it is not non-zero and there is nothing to shrink). In this case it is not shrinking that makes the ridge regression parameters zero, but the OLS solution already being zero.

By the way, when you observe a continuous variable, the case of a zero $y$ vector, and with it a zero coefficient, has probability zero.
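A quick sketch of both cases with toy numbers of my own: a non-zero OLS solution is shrunk toward zero but never reaches it for any finite $\lambda$, while a zero $y$ gives an exactly zero coefficient for every $\lambda$:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 2.0])          # non-zero y, so the OLS solution is non-zero

for lam in [0.0, 1.0, 100.0, 1e6]:
    print(lam, x @ y / (x @ x + lam))  # shrinks toward 0, never exactly 0

y0 = np.zeros(3)                       # zero y: coefficient is exactly 0 for all lambda
print(x @ y0 / (x @ x + 1.0))          # 0.0
```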


* The vector $x$ won't be zero, because that would give the useless model $y = \beta \cdot 0$.