
In ordinary least squares (OLS), the best-fit solution for the input matrix $X$ (of size $N \times p$ -- $N$ samples and $p$ features) and the output vector $y$ (of length $N$) is

$\hat \beta = (X^T X)^{-1} X^T y$.

Assume now that I exclude row $i$ from the matrix $X$ and the corresponding entry from $y$, i.e., I exclude one sample from the observations. Call the new quantities $X_{-i}$ and $y_{-i}$, solve the OLS problem again, and name the new solution $\hat \beta _{-i}$.

My question is whether there is any relationship between the two solutions. Can I get from $\hat \beta $ to $\hat \beta _{-i}$ by some formula?

Thank you.

arash
  • There's certainly a relationship between them. Without loss of generality, consider that the omitted row was the last one. You can now partition the original X and y into two parts, the retained and the omitted parts. Does that simplify things sufficiently for you to make progress? There are algorithms (and libraries) that allow for adding or removing observations which will recompute Cholesky or QR decompositions from the smaller (or larger) set to the next one. However, observation downdating is not especially stable, so its use in a situation where you might have many downdates is risky. – Glen_b Dec 26 '23 at 13:36
  • LINPACK, for example, includes functions for downdating observations. See also the book by Golub and Van Loan. The R package rollRegres has functions for windowed regression that in turn calls such observation-downdate functions. It's not completely clear whether you're seeking something algebraic or computational. – Glen_b Dec 26 '23 at 13:40
  • @Glen_b Thanks for the hint and resources. I'll check the book and try to work it out myself, and then update/answer my question. The ideal case would be having an algebraic relation, but as I'm going to program it, a computational one is likely to be good enough. – arash Dec 26 '23 at 14:19
  • Generally speaking, don't use the algebraic relationship in code. – Glen_b Dec 27 '23 at 00:35
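
The algebraic relationship the comments allude to is the standard leave-one-out downdate, which follows from the Sherman–Morrison identity. Writing $x_i$ for the deleted row of $X$ (as a column vector) and $h_{ii} = x_i^T (X^T X)^{-1} x_i$ for its leverage,

$\hat \beta _{-i} = \hat \beta - \frac{(X^T X)^{-1} x_i (y_i - x_i^T \hat \beta)}{1 - h_{ii}}$.

Below is a minimal NumPy sketch that checks this identity against a direct refit on simulated data (the sizes and variable names are illustrative). Per the comments above, this is for verification only: it forms an explicit inverse, and repeatedly applying the algebraic downdate in production code is not numerically stable.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, i = 50, 3, 7                    # N samples, p features, drop row i

X = rng.normal(size=(N, p))
y = rng.normal(size=N)

# Full-data OLS: beta_hat = (X^T X)^{-1} X^T y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)

# Leave-one-out downdate via Sherman-Morrison
x_i = X[i]
h_ii = x_i @ XtX_inv @ x_i            # leverage of observation i
resid_i = y[i] - x_i @ beta_hat       # residual of observation i
beta_loo = beta_hat - XtX_inv @ x_i * resid_i / (1.0 - h_ii)

# Direct refit with row i removed, for comparison
beta_refit, *_ = np.linalg.lstsq(np.delete(X, i, axis=0),
                                 np.delete(y, i), rcond=None)

print(np.allclose(beta_loo, beta_refit))   # True
```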

0 Answers