
For weighted OLS, the objective function can be written as

$$ \arg \min_{\beta} ||W^{0.5}(y - X\beta)||^2 $$

This is quite similar to the objective function for plain OLS, which simply lacks the $W$ term:

$$ \arg \min_{\beta} ||(y - X\beta)||^2 $$
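For concreteness, here is a minimal numpy sketch (synthetic data, no intercept) checking that the weighted objective above is minimised by running plain OLS on the rescaled data $W^{0.5}X$, $W^{0.5}y$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)
w = rng.uniform(0.5, 2.0, size=100)  # observation weights: the diagonal of W

# Weighted OLS via the normal equations: (X' W X) beta = X' W y
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# The same fit as plain OLS on the rescaled data W^{0.5} X, W^{0.5} y
sw = np.sqrt(w)
beta_ols, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)

assert np.allclose(beta_wls, beta_ols)
```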

Now my question is how do we write the analogous weighted forms for weighted lasso and weighted ridge regression?

For unweighted ridge, the objective function is:

$$ \arg \min_{\beta} ||(y - X\beta)||^2 + ||\lambda \beta||^2 $$

and for lasso: $$ \arg \min_{\beta} ||(y - X\beta)||^2 + ||\lambda \beta||_1 $$

It's not clear to me what the weighted form should look like. I imagine the weights $W$ will be applied to the first norm, as in OLS, but what about the regularizer term?

24n8
  • You can separately weight the regularization term and the OLS part. I discuss this possibility at https://stats.stackexchange.com/a/164546/919 in the context of Ridge regression, but similar considerations apply to Lasso. – whuber Nov 21 '23 at 15:39

1 Answer


It's not applied to the regulariser term.
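That is, keeping the weights on the fit term only, the weighted ridge objective would be

$$ \arg \min_{\beta} ||W^{0.5}(y - X\beta)||^2 + \lambda ||\beta||^2 $$

and the weighted lasso

$$ \arg \min_{\beta} ||W^{0.5}(y - X\beta)||^2 + \lambda ||\beta||_1 $$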

Consider, e.g., a data set with repeated rows: you can rewrite it with a weight on each distinct row $(X, y)$ equal to its count. To get the same result as the non-aggregated form, the weights must go on the error term but not on the regularisation term.
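As a quick numerical check of that equivalence for ridge, here is a minimal numpy sketch (synthetic data, no intercept) comparing the closed-form solution on the expanded data with the count-weighted solution $(X^\top W X + \lambda I)^{-1} X^\top W y$, where $W$ holds the row counts:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 3.0  # ridge penalty, left unscaled in both forms

# Distinct rows and how many times each one is repeated
X = rng.normal(size=(5, 2))
y = rng.normal(size=5)
counts = np.array([1, 3, 2, 5, 1])

# Non-aggregated form: expand the repeats, run ordinary ridge
X_full = np.repeat(X, counts, axis=0)
y_full = np.repeat(y, counts)
beta_full = np.linalg.solve(X_full.T @ X_full + lam * np.eye(2),
                            X_full.T @ y_full)

# Aggregated form: counts weight the error term only;
# the penalty term stays unweighted
beta_w = np.linalg.solve(X.T @ (counts[:, None] * X) + lam * np.eye(2),
                         X.T @ (counts * y))

assert np.allclose(beta_full, beta_w)
```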

The whole idea is that you create a trade-off: the fitting error vs. the cost of increasing the complexity of the model. The weights affect the fitting error (e.g. as counts, or as heteroskedasticity of the data points); the regularisation term acts on the model coefficient(s). The sum of the fitting-error term and the regularisation term creates the trade-off: one unit of fitting error is 'worth' $\lambda$ coefficient units. [Note that you have written the objective function 'wrong': it is typically written $\lambda ||\beta||^2$ rather than $||\lambda \beta||^2$.]

This doesn't mean you can't have weights on the regularisation term, just that they mean different things (e.g. they might be used to weight different coefficients differently, or to impose smoothness between neighbouring coefficients); see the Tikhonov matrix in https://en.wikipedia.org/wiki/Ridge_regression.
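For example, the generalised Tikhonov form on that page replaces $\lambda ||\beta||^2$ with $||\Gamma \beta||^2$:

$$ \arg \min_{\beta} ||(y - X\beta)||^2 + ||\Gamma \beta||^2 $$

Taking $\Gamma = \sqrt{\lambda}\, I$ recovers ordinary ridge, a diagonal $\Gamma$ penalises each coefficient individually, and a first-difference matrix $\Gamma$ penalises $(\beta_{j+1} - \beta_j)^2$, i.e. imposes smoothness between neighbouring coefficients.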

seanv507