
Imagine for the sake of simplicity that I am regressing $Y$ on $X$ with the model

$Y = \beta_0 + \beta_1X + \epsilon$

Now imagine that my observations of $X$ are constant, e.g. take $Y = \{2, 7, 9\}$ and $X = \{3, 3, 3\}$. In this case, I am regressing $Y$ against a constant predictor. What is the statistical implication of this? Specifically, which of the assumptions of OLS am I breaking, and why does the lm function in R produce NA for the estimate of the coefficient $\beta_1$?

1 Answer


The OLS estimate of the slope, $\hat{\beta}_1$ (which is what lm computes), is calculated as follows:

$$ \hat{\beta}_1 = \frac{\sum_{i = 1}^n (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i = 1}^n (x_i-\bar{x})^2}$$

For constant $x$, the denominator is zero, so the fraction is undefined and R returns NA for the slope estimate. Equivalently, a constant predictor is perfectly collinear with the intercept column, so the model matrix is not of full rank.
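You can see this directly with a minimal sketch in R, using the data from the question:

```r
# Data from the question: constant predictor
y <- c(2, 7, 9)
x <- c(3, 3, 3)

# The denominator of the slope formula is zero for a constant x
sum((x - mean(x))^2)
#> [1] 0

# lm() detects the rank deficiency, still estimates the intercept
# (here it equals mean(y) = 6), and reports NA for the slope
coef(lm(y ~ x))
#> (Intercept)           x
#>           6          NA
```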

kajsam
  • This is a little misleading, because R does estimate $\beta_0.$ One problem is that this standard formula applies only for full-rank model matrices--but that does not mean the problem has no solution. In fact, it means the opposite: there is an entire manifold of solutions and R selects one (somewhat arbitrarily) from that set (see the sketch after these comments). – whuber Mar 23 '21 at 18:38
  • Yes, I agree that it is not actually the 0 denominator but the rank that is the problem. But I didn't mean to say that there is no solution. – kajsam Mar 23 '21 at 19:02
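Following up on whuber's point about the solution set: with the data above, any pair $(\beta_0, \beta_1)$ satisfying $\beta_0 + 3\beta_1 = \bar{y} = 6$ gives the same fitted values and hence the same minimal residual sum of squares, so the least-squares problem has an entire line of solutions. A minimal sketch:

```r
y <- c(2, 7, 9)
x <- c(3, 3, 3)

# Residual sum of squares for a candidate (b0, b1)
rss <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Every point on the line b0 + 3*b1 = 6 attains the same minimal RSS
rss(6, 0)  # the solution R reports (slope dropped)
#> [1] 26
rss(0, 2)  # another solution
#> [1] 26
rss(3, 1)  # and another
#> [1] 26
```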