1

Let us consider the linear regression model in finite dimensions given by $Y = X \beta + \epsilon$, where $Y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times m}$, $\beta \in \mathbb{R}^m$, and $\epsilon \in \mathbb{R}^n$ is Gaussian noise. I know that the loss function is typically built from the $\ell^2$ (or more generally $\ell^p$) norm on the finite-dimensional space to measure the misfit.

I am wondering whether other norms from functional analysis can be used for linear regression, such as Sobolev norms or negative Sobolev norms adapted to the finite-dimensional setting.
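To make this concrete, here is a minimal sketch of the kind of misfit I have in mind, using a first-difference matrix on the sample index as a stand-in for the derivative (this discretization is just my own illustration, not a standard construction):

```python
import numpy as np

def discrete_sobolev_misfit(y, X, beta):
    """Discrete H^1-type misfit: ||r||_2^2 + ||D r||_2^2 on the residual r."""
    r = y - X @ beta
    n = len(r)
    # forward differences of the residual, treating the sample index as a grid
    D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)
    return np.sum(r**2) + np.sum((D @ r)**2)
```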

Is there any literature on this topic? Would it be overkill to use other types of norms instead of the $\ell^2$ norm for the misfit?

Comments appreciated!

  • This question strikes me as needing some additional information or assumptions. The Sobolev norms involve derivatives. What do you hope the analog of a derivative would be in a finite dimensional space? – whuber Dec 14 '20 at 17:49
  • Off the top of my head, the derivatives can be discretized using finite differences. Does this help? I'm primarily seeking references and intuition. – secondrate Dec 14 '20 at 19:55
  • Such discretization would appear to introduce nothing new. You seem to be in pursuit of a question rather than having any particular question to ask. – whuber Dec 14 '20 at 20:18
  • This question came about because I was curious how various norms on the misfit are sensitive to the noise $\epsilon$ for linear regression. Do you have any references on this? I have not encountered this while reading elements of statistical learning. – secondrate Dec 14 '20 at 20:24
  • Perhaps the mention of Sobolev norms has distracted from the main point of using norms that aren't $l^p$ norms (perhaps even metrics that don't come from norms). – Dave Dec 14 '20 at 20:26
  • Birkes and Dodge have a book titled Alternative Methods of Regression which should have some leads. Also, Lugosi, Mean estimation and regression under heavy-tailed distributions—a survey uses alternative methods as well. https://arxiv.org/abs/1906.04280 – user78229 Dec 14 '20 at 20:43
  • The $L^2$ norm is canonical because minimizing your objective with respect to this norm represents an orthogonal projection into the subspace spanned by $X$. – Yashaswi Mohanty Mar 14 '23 at 19:14
  • @YashaswiMohanty Yet minimizing other norms can lead to useful estimators. – Dave Mar 14 '23 at 19:17
  • @Dave of course: the conditional expectation is not the only useful function of the joint distribution. It has some nice properties, though, as Hilbert spaces are generally nicer than Banach spaces. – Yashaswi Mohanty Mar 14 '23 at 19:22

1 Answer

0

Minimizing a norm other than $\ell^2$ is, in some sense, just another extremum estimator. The OLS estimator is an extremum estimator because the estimated parameters are those that minimize $\vert\vert y - \hat y\vert\vert_2$. If you want the objective function of your extremum estimator to be some other norm $\vert\vert y - \hat y\vert\vert_{\text{other}}$, go for it.

$$ \hat\beta_{\text{ols}} = \underset{\beta}{\arg\min}\{ \vert\vert y - X\beta \vert\vert_2 \}\\ \hat\beta_{\text{other}} = \underset{\beta}{\arg\min}\{ \vert\vert y - X\beta \vert\vert_{\text{other}} \} $$
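Here is a rough numerical sketch of such an estimator (the data-generating setup and the choice of $\ell^1$ as the "other" norm are just illustrative assumptions, not part of the question):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, m = 200, 3
X = rng.normal(size=(n, m))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_t(df=2, size=n)  # heavy-tailed noise

def fit(norm_ord):
    """Extremum estimator: argmin over beta of ||y - X beta||_{norm_ord}."""
    objective = lambda b: np.linalg.norm(y - X @ b, ord=norm_ord)
    return minimize(objective, x0=np.zeros(m), method="Nelder-Mead").x

beta_l2 = fit(2)  # numerically matches the OLS solution
beta_l1 = fit(1)  # least absolute deviations
```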

For instance, minimizing the $\ell^1$ norm leads to the median. Depending on the situation (e.g., a symmetric but heavy-tailed error distribution), this can give a better estimate of the mean than minimizing the $\ell^2$ norm does, so more than just $\ell^2$ minimization is useful. Getting away from $\ell^p$ norms, minimizing a weighted norm corresponds to weighted least squares, and a similar idea applies to generalized least squares, so it is not just $\ell^p$ norms whose minimization finds use in statistics.
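As a quick check of the $\ell^1$/median point (the toy numbers are mine): for an intercept-only model, the $\ell^2$ minimizer is the sample mean and the $\ell^1$ minimizer is the sample median.

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one gross outlier

l2_fit = minimize_scalar(lambda b: np.sum((y - b) ** 2),
                         bounds=(y.min(), y.max()), method="bounded").x
l1_fit = minimize_scalar(lambda b: np.sum(np.abs(y - b)),
                         bounds=(y.min(), y.max()), method="bounded").x

print(l2_fit, np.mean(y))    # both ~22: pulled toward the outlier
print(l1_fit, np.median(y))  # both ~3: robust to the outlier
```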

Regarding Sobolev norms in particular, I do not see a way for that to make sense, since a Sobolev norm involves derivatives of the function in the function space.

Dave
  • 62,186