
From my question here, it is evident that estimation approaches to linear regression other than ordinary least squares can result in the predictions and residuals lacking orthogonality, despite the model being linear.

What approaches, if any, to estimating the $\beta$ of $y=X\beta+\epsilon$ are not equivalent to ordinary least squares (i.e., yield a different answer than $\hat\beta=(X^TX)^{-1}X^Ty$) yet still yield this orthogonality?

Let’s rule out $\hat\beta=\vec 0$. If it happens that $X\hat\beta=X\hat\beta_{ols}$, so be it, but that is not a requirement.
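
To make the orthogonality in question concrete, here is a minimal numerical sketch (simulated data; ridge regression is used as an arbitrary stand-in for a non-OLS linear estimator): the OLS fitted values are orthogonal to their residuals up to floating-point error, while the ridge fitted values generally are not.

```python
import numpy as np

# Minimal sketch: compare prediction/residual orthogonality for OLS and for a
# ridge estimator (the penalty value is arbitrary).  Data are simulated, so
# the specific numbers are illustrative only.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # includes intercept
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

# OLS: beta_hat = (X'X)^{-1} X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
yhat_ols = X @ beta_ols
print(yhat_ols @ (y - yhat_ols))      # ~0 up to floating-point error

# Ridge: beta_hat = (X'X + lam*I)^{-1} X'y  (intercept penalised too, purely
# to keep the sketch short)
lam = 5.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
yhat_ridge = X @ beta_ridge
print(yhat_ridge @ (y - yhat_ridge))  # generally nonzero
```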

Dave
  • I think OLS is the unique estimator which minimizes any weighted sum of squares which has this property (due to uniqueness of the orthogonal projection operator). – John Madden Oct 03 '22 at 23:22
  • You can always adjust the predictions of any other estimator by adding a constant. In most cases the constant can be chosen to make the orthogonality condition hold. The size of this constant is $O(n^{-1})$ (a numerical sketch of such an adjustment appears after these comments). – whuber Oct 03 '22 at 23:33
  • @whuber That’s interesting, but it seems to contradict what Ben posted. Perhaps you could expand on that in an answer, please. – Dave Oct 04 '22 at 00:21
  • It's not a contradiction, because the prediction does not comprise the entire column space of the design matrix. – whuber Oct 04 '22 at 13:38
  • Could you clarify what you mean by "lacking orthogonality"? After all, you could (artificially and completely arbitrarily) construct a different design matrix $Z$, use it to compute another estimator $\tilde y$ using OLS, and $\tilde y$ would satisfy the equations on your linked question. This $\tilde y$ would usually not be the least squares estimator associated with $X,$ of course! – whuber Oct 04 '22 at 22:28
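
The following is a rough numerical sketch of the constant-adjustment idea from whuber's comment, under the assumption that "the orthogonality condition" means the shifted predictions are orthogonal to the corresponding residuals; the required constant then solves a quadratic equation. The data, the choice of ridge as the non-OLS estimator, and the penalty value are all arbitrary.

```python
import numpy as np

# Rough sketch of the constant-adjustment idea: start from a non-OLS fit,
# then shift its predictions by a scalar c so that the shifted predictions
# are orthogonal to the corresponding residuals.  With p = yhat + c and
# r = y - p, the condition p.r = 0 is the quadratic
#   n*c^2 - (sum(y) - 2*sum(yhat))*c - yhat.(y - yhat) = 0.
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

# A deliberately non-OLS fit: ridge with a sizeable (arbitrary) penalty.
lam = 10.0
yhat = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(yhat @ (y - yhat))        # nonzero: orthogonality fails

# Solve the quadratic for c (both roots are real for this ridge fit).
coeffs = [n, -(y.sum() - 2 * yhat.sum()), -(yhat @ (y - yhat))]
c = min(np.roots(coeffs).real, key=abs)   # take the smaller shift
p_adj = yhat + c
print(p_adj @ (y - p_adj))      # ~0: orthogonality restored by the shift
```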

1 Answer


For completeness, I'll note that you have ruled out the case $\hat{\boldsymbol{\beta}}=\mathbf{0}$, which gives the zero vector as the predicted response. Since the zero vector is orthogonal to every vector, this estimator also gives a predicted response vector that is (trivially) orthogonal to the residual vector.

Setting aside this special case (and mixtures with it), the OLS hat matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ is the unique matrix giving an orthogonal projection onto the column space of the design matrix. This projection uniquely defines OLS estimation (equivalently, MLE under the Gaussian linear model), so any estimation method that uses this orthogonal projection is equivalent to OLS. Consequently, the only way to get a guarantee of (non-trivial) orthogonality is to use OLS estimation.
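
A quick numerical check of this projection argument (a sketch with simulated data, not part of the original argument): the hat matrix is symmetric and idempotent, hence an orthogonal projector, and the fitted values and residuals it produces are orthogonal for every response vector.

```python
import numpy as np

# Quick check of the projection argument: H = X (X'X)^{-1} X' is symmetric and
# idempotent, i.e. an orthogonal projector onto col(X), so the fitted values
# Hy and the residuals (I - H)y are orthogonal for every y.
rng = np.random.default_rng(2)
n, p = 30, 4
X = rng.normal(size=(n, p))
H = X @ np.linalg.solve(X.T @ X, X.T)        # OLS hat matrix

print(np.allclose(H, H.T))                   # True: symmetric
print(np.allclose(H @ H, H))                 # True: idempotent
y = rng.normal(size=n)
print((H @ y) @ ((np.eye(n) - H) @ y))       # ~0 for any response vector y
```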

Ben
  • I’d like to think this is true, but what about $\hat\beta=\vec 0?$ Is that just the “it’s orthogonal to everything” zero vector solution? Then what about the comment by @whuber related to the $O(n^{-1})$ constant? – Dave Oct 04 '22 at 00:21
  • @Dave: The estimator you give is ruled out in the text of the question. Re the comment by whuber, I take the requirement to "guarantee" orthogonality to mean that orthogonality must occur under every possible dataset. – Ben Oct 04 '22 at 01:21
  • The uniqueness of orthogonal projectors is only directly relevant to linear estimators of $\hat{\beta}$, right? – John Madden Oct 04 '22 at 01:32
  • I guess what I wonder is if the zero vector is another option that results in orthogonality (in addition to the OLS solution), what keeps there from being a third? What is the proof that the only $\hat\beta$ options there are for achieving this orthogonality are the OLS solution and the zero vector? // I do not follow what you mean by occurring under every possible dataset. Then again, I don’t really follow Whuber’s comment and hope he will post a full answer that expands on it. – Dave Oct 04 '22 at 01:35
  • @Dave: While it is outside the scope of the question, if you use the estimator $\hat{\beta}=0$ then the predicted response vector is also the zero vector, which is trivially orthogonal to any other vector (including the residual vector). Re proof of the remaining case, it can be constructed by noting that a guarantee of orthogonality requires the use of an orthogonal projection onto the column-space of the design matrix (see e.g., this related question) and this projection is uniquely defined. – Ben Oct 04 '22 at 02:07
  • Mixtures with the special case of the zero vector? What do you mean? – Dave Oct 04 '22 at 06:00
  • @Dave: I mean that you could create an estimator that is the OLS estimator with some arbitrary probability $\phi(\mathbf{x},\mathbf{y})$ and the zero vector with probability $1-\phi(\mathbf{x},\mathbf{y})$ and that would still guarantee orthogonality (again, relying on trivial orthogonality in the second case). – Ben Oct 04 '22 at 08:20
  • If we run a ridge or LASSO regression with such a small penalty that the OLS solution has a norm less than the constraint, then the ridge/LASSO solution would coincide with the OLS solution, right? Then would it be fair to say that such a penalized estimate also gives the orthogonality? Would that fit with your answer because such a solution also is the OLS solution? – Dave Dec 05 '22 at 13:48
  • @Dave: Yes, I think that would be right, at least approximately. – Ben Dec 05 '22 at 20:51
  • @Ben Why only approximately? I’m curious what your reservation is. – Dave Dec 05 '22 at 20:57
  • @Dave: I mean "approximately" because your stipulated condition that the OLS solution has a norm less than the constraint is a condition on the data, so it cannot be guaranteed to hold under the regression model. What you could reasonably do is set the constraint so softly that, for random data under the model, the OLS solution has a norm less than the constraint most of the time. If you do that, you will still find that there is some non-zero probability of data that breach this requirement, so the LASSO estimator would not be identical to the OLS estimator (see the sketch after these comments). – Ben Dec 05 '22 at 21:08
  • 1
    @Ben Interesting, I hadn’t thought of it like that. – Dave Dec 05 '22 at 21:14
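
To illustrate this closing exchange, here is a small sketch that uses ridge regression in its penalised (Lagrangian) form as a stand-in for the constrained estimators discussed above: as the penalty shrinks toward zero, the ridge coefficients approach the OLS coefficients and the prediction/residual inner product approaches zero, but for any strictly positive penalty it is not exactly zero.

```python
import numpy as np

# Sketch of the closing exchange, using ridge in its penalised (Lagrangian)
# form as a stand-in for the constrained estimators discussed above: as the
# penalty lam shrinks, the ridge coefficients approach the OLS coefficients
# and the prediction/residual inner product approaches zero, but for any
# lam > 0 it is not exactly zero.  The penalty grid is arbitrary.
rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
for lam in (10.0, 1e-2, 1e-6, 0.0):
    beta_r = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    yhat = X @ beta_r
    # distance from the OLS coefficients, and the inner product yhat.(y - yhat)
    print(lam, np.linalg.norm(beta_r - beta_ols), yhat @ (y - yhat))
```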