
I'm reading notes on the NRCM (Neyman–Rubin causal model) approach to analyzing causal relationships, which treats the causal inference problem as a missing data problem (where the missing data are the counterfactual outcomes). Suppose we have a binary treatment $D_{i} \in \{0,1\}$ and an observed outcome $Y_{i}=Y_{i}(1)D_{i}+Y_{i}(0)(1-D_{i})$, we define the causal effect of treatment to be $\tau_{i} = Y_{i}(1)-Y_{i}(0)$, and we assume independence and positivity:

$$ \big( Y_{i}(0),Y_{i}(1) \big) \perp D_{i}$$

$$ 0 < P(D_{i}=0) < 1$$

Then we get

$$ E[\tau_{i}] = E[Y_{i}(1)] - E[Y_{i}(0)] = E[Y_{i}|D_{i}=1] - E[Y_{i}|D_{i}=0]$$

and I think I understand the logic here: because of the independence assumption, conditioning on $D_{i}=0$ or $D_{i}=1$ leaves the expectation of $Y_{i}(0)$ unchanged, and likewise for $Y_{i}(1)$.
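Spelling out the derivation as I understand it:

$$ E[\tau_{i}] = E[Y_{i}(1)] - E[Y_{i}(0)] = E[Y_{i}(1)\mid D_{i}=1] - E[Y_{i}(0)\mid D_{i}=0] = E[Y_{i}\mid D_{i}=1] - E[Y_{i}\mid D_{i}=0],$$

where the second equality is where I take the independence assumption to be doing the work.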

However, later in the notes it claims that, if we don't have the independence assumption, we could instead derive

$$E[Y_{i}|D_{i}=1] - E[Y_{i}|D_{i}=0] = E[Y_{i}(0)+\tau_{i}|D_{i}=1]-E[Y_{i}(0)|D_{i}=0]$$

$$ = E[\tau_{i}|D_{i}=1]+\big( E[Y_{i}(0)|D_{i}=1] - E[Y_{i}(0)|D_{i}=0] \big)$$

which is interpreted as the average treatment effect on the treated plus a selection bias term.
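To check this decomposition numerically, here is a small simulation I put together (my own sketch, not from the notes; the data-generating process is made up) in which assignment depends on $Y_{i}(0)$, so independence fails:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes: Y(0) drives selection, so independence fails.
y0 = rng.normal(0.0, 1.0, n)
y1 = y0 + 2.0                                        # constant unit-level effect tau_i = 2
d = (rng.normal(0.0, 1.0, n) + y0 > 0).astype(int)   # treatment more likely when Y(0) is high

y = d * y1 + (1 - d) * y0                            # observed outcome

naive = y[d == 1].mean() - y[d == 0].mean()          # E[Y|D=1] - E[Y|D=0]
att = (y1 - y0)[d == 1].mean()                       # E[tau|D=1] (here exactly 2)
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()

print(naive, att + selection_bias)
```

The two printed values agree (the decomposition is an identity), while the naive difference is larger than the true effect $\tau_{i}=2$; the gap is exactly the selection bias term.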


My questions: Why is it that $E[Y_{i}|D_{i}=0]=E[Y_{i}(0)|D_{i}=0]$? Didn't we need the independence assumption to get this? Or is this just because, conditional on $D_{i}=0$, we get $Y_{i} = Y_{i}(0)$?

Also, whatever the answer to that question is, why does it not also imply that $E[Y_{i}|D_{i}=1]=E[Y_{i}(1)|D_{i}=1]$? My impression is that, if independence fails, we should not get the result $E[\tau_i]=E[Y_i|D_i=1]-E[Y_i|D_i=0]$. Otherwise you could run an experiment with a non-independent treatment assignment and still just average the outcomes of the treated and untreated units and take the difference, which intuitively shouldn't be correct. Since $E[\tau_i]=E[Y_i(1)-Y_i(0)]=E[Y_i(1)]-E[Y_i(0)]$ by linearity of expectation, we shouldn't always be able to equate this with $E[Y_i|D_i=1]-E[Y_i|D_i=0]$. So I'm trying to figure out exactly where the failure of independence interrupts the usual proof, and I was assuming it would be at the step involving $E[Y_i|D_i=1]$.

Addem

1 Answer


Your explanation 'because conditional on $D_i=0$ we get $Y_i = Y_i(0)$' is correct.

To further convince ourselves, we can also obtain this by directly substituting $Y_i = D_i\,Y_i(1) + (1-D_i)\,Y_i(0)$ and applying linearity of expectation: \begin{align} \mathbb{E}(Y_i \mid D_i=0) &= \mathbb{E}\big(D_i\,Y_i(1) + (1-D_i)\,Y_i(0) \mid D_i=0\big) \\ &= \mathbb{E}(D_i\,Y_i(1) \mid D_i=0) + \mathbb{E}((1-D_i)\,Y_i(0) \mid D_i=0), \end{align} and now, since $D_i$ and $1-D_i$ are determined by the conditioning event, they can be treated as constants: \begin{align} &= 0\cdot\mathbb{E}(Y_i(1)\mid D_i = 0 ) + (1-0)\,\mathbb{E}(Y_i(0) \mid D_i = 0) \\ &= \mathbb{E}(Y_i(0) \mid D_i = 0). \end{align}

Why the proof fails in the non-independence case

The same logic also gives $\mathbb{E}(Y_i \mid D_i = 1) = \mathbb{E}(Y_i(1) \mid D_i = 1)$, and therefore \begin{equation} \mathbb{E}(Y_i \mid D_i = 1) - \mathbb{E}(Y_i \mid D_i = 0) = \mathbb{E}(Y_i(1) \mid D_i = 1) - \mathbb{E}(Y_i(0) \mid D_i = 0). \end{equation} Furthermore, you correctly show that \begin{equation} \mathbb{E}(\tau_i) = \mathbb{E}(Y_i(1)) - \mathbb{E}(Y_i(0)). \end{equation} However, this does not prove the desired equality: note that the right-hand sides of the two equations above are different. When independence of $Y_i(1)$ and $D_i$ does not hold, $\mathbb{E}(Y_i(1))$ need not equal $\mathbb{E}(Y_i(1) \mid D_i = 1)$. Intuitively, this is because $D_i$ carries information about the value of $Y_i(1)$.
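As a toy illustration (my own, not from the notes): let $Y_i(1)$ be $0$ or $1$ with probability $1/2$ each, and suppose units take treatment exactly when $Y_i(1)=1$, i.e. $D_i = Y_i(1)$. Then $\mathbb{E}(Y_i(1)) = 1/2$, while $\mathbb{E}(Y_i(1) \mid D_i = 1) = 1$, so conditioning on $D_i$ changes the expectation.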

Juho Kokkala