
I have been confused by the concept of parameter identification for a while now. I am asking this question mainly to test my understanding.

Here is the linear regression model: $$y=X\beta + \epsilon$$

Let's say that $\beta$ refers to any value of the parameter vector, and that $\beta_0$ refers to the "correct" value.

And consider these moment equations: $$ \begin{align} E(X_i\epsilon_i)\equiv E\big(X_i(y_i-X_i^T\beta)\big)&=0 \quad\quad &(1)\\ E(y-X\beta)&=0 \quad\quad &(2)\\ E(y-X\beta\mid X)&=0 \quad\quad &(3) \end{align} $$

I have the following questions:

  1. Do these "moment equations" (if that is the right term for them) each uniquely identify a $\beta$ (though this $\beta$ does not necessarily equal $\beta_0$)? My intuition suggests that $(3)$ identifies the linear projection $\beta$.

  2. Do they always identify the same $\beta$?

  3. More generally, I am not entirely sure what assumptions we need for these moment equations to identify $\beta$. Will they identify a unique $\beta$ regardless of the joint distribution of $y$ and $X$? For example, suppose $E(X_i\epsilon_i)\neq 0$, or $E(\epsilon\mid X)\neq 0$: do the equations still identify some $\beta$ (although perhaps not $\beta_0$)?

user56834
  • Does $x$ refers to $X$ in your equation $(1)$? If not, to what, some arbitrary weights? – jbowman Jan 08 '18 at 14:46
  • Sorry that was a mistake, it is supposed to capture the orthogonality condition – user56834 Jan 08 '18 at 15:42
  • In standard statistics language, we do not say a "parameter" is "identified", instead, we say a "parameter" is "estimable", or a "model" is "identifiable". – Zhanxiong Jan 10 '18 at 04:17
  • By the way, @Programmer2134, you might want to check this paper by Pearl, it might help shed some light on identification http://ftp.cs.ucla.edu/pub/stat_ser/R207.pdf – Carlos Cinelli Jan 10 '18 at 05:06

1 Answer


One first clarification we need here is that identification refers to structural parameters: it means the ability to recover those parameters from observed quantities. If several parameter values are observationally equivalent, the parameters are not identified. The initial discussion in this paper by Pearl (http://ftp.cs.ucla.edu/pub/stat_ser/R207.pdf) might help with some of the ideas. For now, let's assume $n = \infty$ to set aside sampling issues, which are not the main concern here.

The linear projection of $y$ on $X$ is always "identified," since it is a property of the joint distribution of the data. You don't need any assumptions about the error term to "identify" the linear projection; it's the opposite: the linear projection defines its own error term. Hence $\beta^{OLS} = E[X'X]^{-1}E[X'y]$ is what it is regardless of what you assume about the structural process. You can also write $y = X\beta^{OLS} + \epsilon^{OLS}$ with $E[X'\epsilon^{OLS}] = 0$ by construction. So you can always obtain $\beta^{OLS}$ (provided $E[X'X]$ is invertible), but that does not mean it represents anything structurally meaningful.
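A minimal simulation can illustrate this point (the numbers below are illustrative, not from the post): even when the structural error is correlated with $X$, the projection coefficient is a perfectly well-defined population quantity; it just differs from the structural $\beta$.

```python
import numpy as np

# Sketch: the linear projection coefficient is a property of the data.
# Simulate a structural model with an endogenous regressor; OLS still
# converges to a well-defined projection coefficient, just not to the
# structural beta.
rng = np.random.default_rng(0)
n = 1_000_000                      # stand-in for n = infinity
beta = 2.0                         # structural coefficient
u = rng.normal(size=n)
x = rng.normal(size=n) + u         # x is correlated with the error
eps = u                            # structural error: E[x*eps] = 1 != 0
y = beta * x + eps

# Population analogue of (X'X)^{-1} X'y in one dimension:
beta_ols = np.mean(x * y) / np.mean(x ** 2)
# Here E[x^2] = 2 and E[x*y] = 2*beta + 1 = 5, so beta_ols -> 2.5,
# a well-defined quantity that is not the structural beta = 2.
```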

Likewise, the conditional expectation $E[y|X]$ is always "identified," since it too is a property of the data. You may misspecify the conditional expectation and estimate it incorrectly (for example, assuming it is linear when it is not), but that is a specification problem, not an identification problem. For instance, given enough data, universal approximation algorithms can in principle estimate this quantity (on a bounded domain).
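To make the misspecification-versus-identification distinction concrete, here is a small sketch (hypothetical numbers) in which the true $E[y|x]$ is nonlinear: a crude nonparametric bin average recovers it, while a linear fit gets it badly wrong — yet nothing about $E[y|x]$ is unidentified.

```python
import numpy as np

# Sketch: E[y|x] is identified from the joint distribution even when a
# linear model misspecifies it. The true conditional mean here is x^2;
# a simple bin average recovers it, while the best linear slope is 0.
rng = np.random.default_rng(1)
n = 500_000
x = rng.uniform(-1, 1, size=n)
y = x ** 2 + rng.normal(scale=0.1, size=n)   # E[y|x] = x^2, nonlinear

# Nonparametric (binned) estimate of E[y|x] at x = 0.5:
in_bin = np.abs(x - 0.5) < 0.02
cond_mean_hat = y[in_bin].mean()             # close to 0.25 = 0.5^2

# The best linear projection slope is E[xy]/E[x^2] = E[x^3]/E[x^2] = 0
# by symmetry: a badly misspecified model of E[y|x], but that is a
# specification problem, not an identification problem.
slope = np.mean(x * y) / np.mean(x ** 2)
```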

Thus, identification is neither about the best linear approximation nor about the "true" conditional expectation. It is about structural quantities. The question here is: assuming $y = X\beta + \epsilon$ is structural, what must we assume about $\epsilon$ to identify $\beta$?

In your case, both assumptions (1) and (3) deliver identification. For (1), multiply the structural equation by $X'$ and take expectations:

$$ E[X'y] = E[X'X]\beta + E[X'\epsilon] = E[X'X]\beta $$

Then just solve for $\beta = E[X'X]^{-1}E[X'y]$, provided $E[X'X]$ is invertible. The case of (3) is also straightforward. Taking the expectation conditional on $X$,

$$ E[y|X] = X\beta + E[\epsilon|X] = X\beta $$

Thus, we see that the structural conditional expectation equals the observed conditional expectation. Furthermore, $E[X'\epsilon] = E[E[X'\epsilon|X]] = E[X'E[\epsilon|X]] = 0$, so you can solve for $\beta$ as before. Assumption (2) is not enough, since it only imposes the single restriction $E[y] = E[X]\beta$: if $X$ has more than one dimension, many different parameter vectors give the same expected value, which is why the structural parameters are not identified.
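The three cases above can be checked numerically. Here is a minimal sketch (illustrative numbers, assuming exogenous regressors) showing that solving the moment condition (1) recovers the structural $\beta$, while the single mean restriction in (2) is satisfied by many different parameter vectors.

```python
import numpy as np

# Sketch: under exogeneity, solving E[X'y] = E[X'X] beta recovers the
# structural beta; the mean restriction E[y - X beta] = 0 alone does
# not pin beta down when X has more than one dimension.
rng = np.random.default_rng(2)
n = 1_000_000                              # stand-in for n = infinity
beta = np.array([1.0, -2.0])               # structural parameter
X = rng.normal(size=(n, 2))
eps = rng.normal(size=n)                   # independent of X: E[X'eps] = 0
y = X @ beta + eps

# Condition (1): beta = E[X'X]^{-1} E[X'y]
ExX = X.T @ X / n
Exy = X.T @ y / n
beta_hat = np.linalg.solve(ExX, Exy)       # recovers (1, -2) up to noise

# Condition (2): E[y] = E[X] beta is ONE scalar equation in TWO
# unknowns. Here E[X] is (near) zero, so essentially any vector b
# gives E[X] @ b close to E[y] -- no unique solution.
Ex, Ey = X.mean(axis=0), y.mean()
b_alt = beta + np.array([1.0, 1.0])        # a different parameter vector
# both beta_hat and b_alt satisfy the mean restriction (up to noise)
```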

  • Thank you, this is clarifying. I just have one main question: you said that identifiability is a property of structural parameters. But it seems to me that there is nothing special about structural parameters when it comes to the question of identification: "is there a unique probability distribution over observables for each parameter value?" We can ask this question for any parameter, correct? Whether it is structural or not. As you say, we can ask this question about the linear projection. (I think the "parameter" we would be identifying is $\beta + E(x_ix_i^T)^{-1}E(x_i\epsilon_i)$ then?) – user56834 Jan 10 '18 at 05:09
  • @Programmer2134 for observational quantities, it makes more sense to talk about estimability, not identification, because observed quantities are always "identified". That is, $\beta^{OLS}$, provided it is estimable, can always be obtained from the data. However, a causal effect, such as $\frac{\partial E[Y|do(X)]}{\partial X} = \beta$ in a linear causal model, is not always identifiable: sometimes even with infinite data you can't estimate it. – Carlos Cinelli Jan 10 '18 at 05:12
  • cntd... since that is the limiting value of $\hat \beta^{OLS}$. – user56834 Jan 10 '18 at 05:13
  • "That is, $\beta^{OLS}$, provided it's estimable, is always possible to be obtained from the data". But just to be clear, $\beta^{OLS}$ is not a parameter, but a "sample statistic" or whatever you want to call it. So it would not make sense to talk about the identifiability of $\beta^{OLS}$, since identifiability is a property of "population" quantities (parameters). But can't we talk about the identifiability of an arbitrary non-structural (population) parameter, such as $\beta + E(x_ix_i^T)^{-1}E(x_i\epsilon_i)$? We could even define $\beta_{pleh} = (1 ... 1)^T$ and ask whether it is identifiable? – user56834 Jan 10 '18 at 05:18
  • Meaning, we would have $y=X\beta_{pleh}+\epsilon$. where $\epsilon$ is defined such that this is correct. Then we could ask, "can we identify $\beta_{pleh}$?". Obviously, this would be ridiculous, but I am just testing whether it is correct that "identification" is not necessarily structural. – user56834 Jan 10 '18 at 05:20
  • @Programmer2134 $\beta^{OLS}$ is a population quantity, it's the population value of the coefficient of the linear projection. For any finite sample $n$, $\hat{\beta}^{OLS}$ is a sample estimate of $\beta^{OLS}$. Regarding the structural vs not structural part, see if the second paragraph of this paper by Pearl addresses your concerns: http://ftp.cs.ucla.edu/pub/stat_ser/R207.pdf – Carlos Cinelli Jan 10 '18 at 05:23
  • I would summarize it like this: whenever you are talking about a parameter of a model, and that parameter equals some function of the observed joint distribution of the data only under certain circumstances, the parameter is "structural". It can have different meanings and need not be causal; for instance, you could be estimating properties of latent factors. What makes it structural is that it is a property of the model, not of the data, and it requires structural assumptions to "identify" that part of the model with some function of the data. – Carlos Cinelli Jan 10 '18 at 05:29
  • Ah, I see, I thought by $\beta^{OLS}$ you meant $\hat \beta^{OLS}$. You are really helping me with this. I will read Pearl's paper before I continue. – user56834 Jan 10 '18 at 05:36