
I'm reading notes on the NRCM (Neyman–Rubin causal model) approach to analyzing causal relationships, which treats the causal inference problem as a missing data problem (where the missing data are the counterfactual outcomes). Suppose we have a binary treatment $D_{i} \in \{0,1\}$ and an observed outcome $Y_{i}=Y_{i}(1)D_{i}+Y_{i}(0)(1-D_{i})$, we define the causal effect of treatment to be $\tau_{i} = Y_{i}(1)-Y_{i}(0)$, and we assume independence and positivity:

$$ \big( Y_{i}(0),Y_{i}(1) \big) \perp D_{i}$$

$$ 0 < P(D_{i}=0) < 1$$

Then we get

$$ E[\tau_{i}] = E[Y_{i}(1)] - E[Y_{i}(0)] = E[Y_{i}|D_{i}=1] - E[Y_{i}|D_{i}=0]$$

and I think I understand the logic here: because of the independence assumption, conditioning on $D_{i}=0$ or $D_{i}=1$ leaves the expectation of $Y_{i}(0)$ unchanged, and likewise for $Y_{i}(1)$.
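Spelling out the derivation as I understand it:

$$ E[\tau_{i}] = E[Y_{i}(1)] - E[Y_{i}(0)] = E[Y_{i}(1)\mid D_{i}=1] - E[Y_{i}(0)\mid D_{i}=0] = E[Y_{i}\mid D_{i}=1] - E[Y_{i}\mid D_{i}=0],$$

where the second equality is where I take the independence assumption to be doing the work.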

However, later in the notes it claims that, if we don't have the independence assumption, we could instead derive

$$E[Y_{i}|D_{i}=1] - E[Y_{i}|D_{i}=0] = E[Y_{i}(0)+\tau_{i}|D_{i}=1]-E[Y_{i}(0)|D_{i}=0]$$

$$ = E[\tau_{i}|D_{i}=1]+\big( E[Y_{i}(0)|D_{i}=1] - E[Y_{i}(0)|D_{i}=0] \big)$$

which is interpreted as the average treatment effect on the treated plus a selection bias term.
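To check this decomposition numerically, here is a small simulation I put together (my own sketch, not from the notes; the data-generating process is made up) in which assignment depends on $Y_{i}(0)$, so independence fails:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes: Y(0) drives selection, so independence fails.
y0 = rng.normal(0.0, 1.0, n)
y1 = y0 + 2.0                                        # constant unit-level effect tau_i = 2
d = (rng.normal(0.0, 1.0, n) + y0 > 0).astype(int)   # treatment more likely when Y(0) is high

y = d * y1 + (1 - d) * y0                            # observed outcome

naive = y[d == 1].mean() - y[d == 0].mean()          # E[Y|D=1] - E[Y|D=0]
att = (y1 - y0)[d == 1].mean()                       # E[tau|D=1] (here exactly 2)
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()

print(naive, att + selection_bias)
```

The two printed values agree (the decomposition is an identity), while the naive difference is larger than the true effect $\tau_{i}=2$; the gap is exactly the selection bias term.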


My questions: Why is it that $E[Y_{i}|D_{i}=0]=E[Y_{i}(0)|D_{i}=0]$? Didn't we need the independence assumption to get this? Or is this just because, conditional on $D_{i}=0$, we get $Y_{i} = Y_{i}(0)$?

Also, whatever the answer to that question is, why does it not also imply that $E[Y_{i}|D_{i}=1]=E[Y_{i}(1)|D_{i}=1]$? My impression is that, if independence fails, we should not get the result $E[\tau_i]=E[Y_i|D_i=1]-E[Y_i|D_i=0]$. Otherwise you could run an experiment with a non-independent treatment assignment and still just average the outcomes of the treated and untreated units and take the difference, which intuitively shouldn't be correct. Since $E[\tau_i]=E[Y_i(1)-Y_i(0)]=E[Y_i(1)]-E[Y_i(0)]$ by linearity of expectation, we shouldn't always be able to equate this with $E[Y_i|D_i=1]-E[Y_i|D_i=0]$. So I'm trying to figure out exactly where the failure of independence interrupts the usual proof, and I was assuming it would be at the step involving $E[Y_i|D_i=1]$.

Addem

1 Answer


Your explanation 'because conditional on $D_i=0$ we get $Y_i = Y_i(0)$' is correct.

To further convince ourselves, we can also obtain this by directly substituting $Y_i = D_i\,Y_i(1) + (1-D_i)\,Y_i(0)$ and applying linearity of expectation: \begin{align} \mathbb{E}(Y_i \mid D_i=0) &= \mathbb{E}\big(D_i\,Y_i(1) + (1-D_i)\,Y_i(0) \mid D_i=0\big) \\ &= \mathbb{E}(D_i\,Y_i(1) \mid D_i=0) + \mathbb{E}((1-D_i)\,Y_i(0) \mid D_i=0), \end{align} and now, since $D_i$ and $1-D_i$ are determined by the conditioning event, they can be treated as constants: \begin{align} &= 0\cdot\mathbb{E}(Y_i(1)\mid D_i = 0 ) + (1-0)\,\mathbb{E}(Y_i(0) \mid D_i = 0) \\ &= \mathbb{E}(Y_i(0) \mid D_i = 0). \end{align}

Why the proof fails in the non-independence case

The same logic also gives $\mathbb{E}(Y_i \mid D_i = 1) = \mathbb{E}(Y_i(1) \mid D_i = 1)$, and therefore \begin{equation} \mathbb{E}(Y_i \mid D_i = 1) - \mathbb{E}(Y_i \mid D_i = 0) = \mathbb{E}(Y_i(1) \mid D_i = 1) - \mathbb{E}(Y_i(0) \mid D_i = 0). \end{equation} Furthermore, you correctly show that \begin{equation} \mathbb{E}(\tau_i) = \mathbb{E}(Y_i(1)) - \mathbb{E}(Y_i(0)). \end{equation} However, this does not prove the desired equality: note that the right-hand sides of the two equations above are different. When independence of $Y_i(1)$ and $D_i$ does not hold, $\mathbb{E}(Y_i(1))$ need not equal $\mathbb{E}(Y_i(1) \mid D_i = 1)$. Intuitively, this is because $D_i$ carries information about the value of $Y_i(1)$.
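As a toy illustration (my own, not from the notes): let $Y_i(1)$ be $0$ or $1$ with probability $1/2$ each, and suppose units take treatment exactly when $Y_i(1)=1$, i.e. $D_i = Y_i(1)$. Then $\mathbb{E}(Y_i(1)) = 1/2$, while $\mathbb{E}(Y_i(1) \mid D_i = 1) = 1$, so conditioning on $D_i$ changes the expectation.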

Juho Kokkala