
If there are four variables, each with five observations:

X1  X2  X3  X4
 5   3   0   2
 0   9  -9   0
 3   1   0   2
 7   3   0   4
 5   2   0   3

Why can't you regress a dependent variable Y on X1, X2, X3, and X4? I have a feeling it is because of collinearity and that it violates the full-rank assumption, but could someone explain why in greater detail, please?
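For concreteness, here is a minimal sketch (assuming NumPy; only the data come from the table above) that checks whether these four columns are linearly independent:

```python
import numpy as np

# The five observations of X1, X2, X3, X4 from the table above.
X = np.array([
    [5, 3,  0, 2],
    [0, 9, -9, 0],
    [3, 1,  0, 2],
    [7, 3,  0, 4],
    [5, 2,  0, 3],
], dtype=float)

# Full column rank would be 4; the rank here is only 3,
# because the columns satisfy X1 = X2 + X3 + X4 exactly.
print(np.linalg.matrix_rank(X))                            # 3
print(np.allclose(X[:, 0], X[:, 1] + X[:, 2] + X[:, 3]))   # True

# Consequently X'X is singular, so the normal equations
# do not have a unique solution.
print(np.linalg.matrix_rank(X.T @ X))                      # 3
```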

Jack
  • You can perform this regression. Why not? When there's collinearity, that only means you have redundant variables. Removing them won't change the fit. What you cannot accomplish in the presence of collinearity is estimate every coefficient: some, maybe all of them, will not have definite values. – whuber Dec 01 '22 at 20:47
  • Mathematically, you have to invert the matrix $X^TX$. Failing that, you can use a pseudoinverse, but the solution is not unique (see the sketch after these comments). – AdamO Dec 01 '22 at 20:52
  • 1
    @AdamO One does not need to invert that matrix. That presupposes applying a particular algorithm. The Normal Equations only require you to find a solution to $X^\prime X\beta = X^\prime y$ -- and, mathematically, that does not necessitate inverting the matrix. Moreover, the best numerical algorithms avoid inverting $X^\prime X$ anyway: it's numerically less stable and inefficient. – whuber Dec 01 '22 at 21:01
  • 1
    @whuber this is true. We have better solutions (computationally) yet subject to the same constraints - "constraints" only in the sense that you might require uniqueness of solutions. There's only so many ways to explain it short of a proper full linear algebra course. – AdamO Dec 01 '22 at 21:16
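To put the last two comments in numbers, here is a minimal sketch (assuming NumPy and a made-up response vector `y`, since the question supplies none). The pseudoinverse and `np.linalg.lstsq` both return the minimum-norm least-squares solution, but it is only one of infinitely many: shifting the coefficients along the dependence X1 = X2 + X3 + X4 changes them without changing the fitted values.

```python
import numpy as np

X = np.array([
    [5, 3,  0, 2],
    [0, 9, -9, 0],
    [3, 1,  0, 2],
    [7, 3,  0, 4],
    [5, 2,  0, 3],
], dtype=float)

# Hypothetical response values, purely for illustration; the question gives no y.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Minimum-norm solution of the normal equations X'X b = X'y via the pseudoinverse.
b_pinv = np.linalg.pinv(X) @ y

# lstsq works on X directly (it never forms or inverts X'X) and, for a
# rank-deficient X, also returns the minimum-norm solution.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b_pinv, b_lstsq))           # True

# Because X1 = X2 + X3 + X4, adding t*(1, -1, -1, -1) to any solution
# leaves the fitted values X @ b unchanged, so the coefficients are not unique.
t = 7.0
b_other = b_pinv + t * np.array([1.0, -1.0, -1.0, -1.0])
print(np.allclose(X @ b_pinv, X @ b_other))   # True: identical fit
print(np.allclose(b_pinv, b_other))           # False: different coefficients
```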
