
If we have a linear regression model $y=X\beta + e$, then $E(y)=E(X\beta)+E(e)=E(X)\beta + 0$. Therefore $$E(X)\beta = E(y).$$

Doesn't this pinpoint the value of $\beta$, assuming that the sample size $n$ is larger than the number of explanatory variables $k$ in $X$?

Hence $\beta$ is always identifiable (every probability distribution yields a specific value of the parameter), regardless of whether the regressors are correlated with the error term.

This doesn't seem right. Where am I going wrong?

user56834
  • Assuming you're conditioning on $X$, $E(X)=X$, but how are you inverting $X$ (or $E(X)$) there? – Glen_b Nov 29 '17 at 09:41
  • @Glen_b, sorry, see my edit. – user56834 Nov 29 '17 at 09:42
  • 4
    consider $X = (1,1,1)^T$ and $Y=(2,3,7)^T$. What's $\beta$? If you look at the first observation, it's $2$, if you look at the second observation, it's $3$, if you look at the third observation it's $7$.... – Glen_b Nov 29 '17 at 09:56
  • More to the point, $y$ is random by virtue of $\epsilon$, but $X$ is not random. So while it is true that $\mathbb{E}y=X\beta$, the expectation on your $X$ doesn't buy you anything. So now you're left inverting an $n\times d$ matrix (typically with a pseudo-inverse), which is what you do in OLS. – David Kozak Nov 29 '17 at 15:15
  • The basic problem for identifiability is that $E[X]$ (or $X$ itself for those who don't view it as a random variable) might not have full rank. In the example by @Glen_b, $\beta$ is identifiable (and OLS estimates it as $(2+3+7)/3=4$). But suppose that $$X=\pmatrix{1&1&0\\1&1&0\\1&0&1\\1&0&1}$$ for $y=(2,3,7,16)^\prime$. Now $\beta$ cannot be identified because the parameter vectors $$\beta(t)=(\beta_1+t,\beta_2-t,\beta_3-t)^\prime$$ give the same values of $X\beta(t)$ for all real numbers $t$ and there is no basis to select among those possibilities. (Both cases are checked numerically in the sketch after these comments.) – whuber Nov 29 '17 at 16:53
  • @whuber in the usual iid setting, $E[y] = E[X]\beta$ defines just one equation and $E[X]$ is just a matrix with $n$ identical rows. – Carlos Cinelli Jan 09 '18 at 21:07
  • @Glen_b your case is precisely the case where the condition is enough for identification --- you have only one covariate in $X$. Then $\beta$ is simply the mean of $y$ divided by the mean of $x$, as pointed out by whuber. – Carlos Cinelli Jan 09 '18 at 21:20
  • Try to understand the precise definition of identifiability before making any claim. You may also want to look at this answer. – Zhanxiong Jan 09 '18 at 22:51
  • @Carlos The point of my question in comments was to get the OP to recognize a problem with the way that the question was set up. The issue of identifiability was being confused by introducing a different issue of understanding. I hoped to get an improved question. – Glen_b Jan 09 '18 at 23:04
  • 1
    @Carlos Cinelli Sorry for any confusion, but I made this suggestion to OP, not to you :) – Zhanxiong Jan 09 '18 at 23:22
  • @Zhanxiong ok! :) – Carlos Cinelli Jan 09 '18 at 23:26
  • @Zhanxiong, identifiability essentially means that there is a bijection between the parameter space and the space of probability distributions over observables that is consistent with the model, correct? That is at least how I've been thinking about it. – user56834 Jan 10 '18 at 04:25
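A quick numerical check of the two examples discussed above (a minimal sketch in Python/NumPy; the matrices come from the comments by Glen_b and whuber, everything else is illustrative):

```python
import numpy as np

# Glen_b's example: a single covariate (a column of ones), beta is identified.
X1 = np.ones((3, 1))
y1 = np.array([2.0, 3.0, 7.0])
b1, *_ = np.linalg.lstsq(X1, y1, rcond=None)
print(b1)  # [4.] -- OLS gives (2 + 3 + 7) / 3

# whuber's example: X is rank-deficient, so beta is not identified.
X2 = np.array([[1, 1, 0],
               [1, 1, 0],
               [1, 0, 1],
               [1, 0, 1]], dtype=float)
beta = np.array([1.0, 2.0, 3.0])
for t in (0.0, 1.5, -2.0):
    beta_t = beta + t * np.array([1.0, -1.0, -1.0])  # whuber's beta(t)
    print(X2 @ beta_t)  # the same fitted values for every t
print(np.linalg.matrix_rank(X2))  # 2 < 3 columns: no unique beta
```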

1 Answer


The condition $E[Y] = E[X]\beta$ defines just one equation. Since we are dealing with population quantities here (the expectation), the number of observations is irrelevant; but just to make ideas clear, imagine that if you had $n$ observations, then $E[X]$ would be a matrix with $n$ identical rows and $p$ columns.

To illustrate this in a simple example, imagine $E[y] = 10$ and that you have three covariates with $E[x_1] =1$, $E[x_2]=2$, and $E[x_3]=3$. Your condition means $1\beta_1 + 2\beta_2 + 3\beta_3 = 10$. That is, you have only one equation for three parameters, rendering the vector $\beta$ not identifiable, because several vectors $\beta = (\beta_1, \beta_2, \beta_3)$ are consistent with the restriction $1\beta_1 + 2\beta_2 + 3\beta_3 = 10$.
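To see the non-identification concretely, here is a minimal sketch (Python/NumPy, using the hypothetical moments $E[y]=10$ and $E[x]=(1,2,3)$ from the paragraph above) that exhibits several parameter vectors all satisfying the single moment condition:

```python
import numpy as np

Ex = np.array([1.0, 2.0, 3.0])  # E[x_1], E[x_2], E[x_3]
Ey = 10.0                       # E[y]

# Two of the infinitely many vectors consistent with E[x]'beta = E[y]:
for b in (np.array([10.0, 0.0, 0.0]), np.array([1.0, 3.0, 1.0])):
    print(b, Ex @ b == Ey)  # both satisfy the restriction
```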

So to sum up, several different values of $\beta_j$ satisfy the sum $\sum_{j =1}^p\beta_j E[x_j] = E[y]$ when $p>1$, and that's why $\beta$ is not identifiable when imposing only the assumption $E[\epsilon] = 0$. Some comments mentioned that people usually assume $X$ to be fixed. In econometrics one usually doesn't, but if you do assume $X$ is fixed, then $E[X]$ has no meaning, and assuming $E[\epsilon] = 0$ with a fixed $X$ essentially means assuming $E[\epsilon|X] = 0$, which does render $\beta$ identified, as explained in this other question.
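The following simulation sketch (my own illustration, not part of the original answer) makes the point: the error below satisfies $E[\epsilon] = 0$ but not $E[\epsilon|X] = 0$, and OLS converges to something other than the structural $\beta$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 1_000_000, 2.0

x = rng.normal(size=n)              # E[x] = 0
eps = 0.5 * x + rng.normal(size=n)  # E[eps] = 0 holds, but E[eps | x] = 0.5 x
y = beta * x + eps                  # structural model

b_ols = (x @ y) / (x @ x)           # OLS slope (no intercept)
print(b_ols)                        # ~2.5: beta + 0.5, not the structural beta = 2
```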

Finally, it's worth making a distinction between observational and structural quantities. Provided you have enough data, the linear projection $\beta^{OLS} = (X'X)^{-1}X'y$ is always estimable (assuming you don't have variables that are linear combinations of the others, so you can invert $X'X$). This is an observational quantity and you can always get it from the data, so it doesn't make sense to talk about its identifiability, as again explained in this other question.
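As a closing illustration that $\beta^{OLS}$ is an observational quantity, here is a minimal sketch computing $(X'X)^{-1}X'y$ directly from data; the data are arbitrary, since the projection exists whether or not any structural model holds:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 3
X = rng.normal(size=(n, p))  # full column rank with probability 1
y = rng.normal(size=n)       # no model assumed

b_ols = np.linalg.solve(X.T @ X, X.T @ y)      # (X'X)^{-1} X'y via the normal equations
b_chk, *_ = np.linalg.lstsq(X, y, rcond=None)  # the same projection via lstsq
print(np.allclose(b_ols, b_chk))               # True
```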