I am trying to understand how DAGs and potential outcomes fit together. I came across these excellent posts (here and here), but I am trying to understand how this looks in a real-world example. There are not a lot of DAG examples in my field (International Relations), so I'll use a study on military experience and international conflict initiation as an example.
The study finds that ex-military leaders (T) are more likely to start conflicts (Y). So the treatment is a veteran leader, or rather the selection of a veteran leader. Of course, the reasons for T=1 may differ from those for T=0, so the authors control for a wide range of relevant confounders (C*). However, these are all measured with some error and are essentially just proxies for C (international political factors). Also, domestic politics (D) is probably an unobservable confounder that causes states to select veterans and to initiate conflict. And there is also an unobservable mediator between X and T: the % of veterans in the population (V). So far, the true causal effect of T is unidentifiable, but that's expected given the nature of the data. That is not a problem, since we are still interested in the effect of T and are just removing as much bias as we possibly can by controlling for C*.
So here is the DAG I've got so far. I think it's right at this point.
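As a rough plain-text check on the structure described above, the arrows can be written as an edge list and the backdoor paths from T to Y enumerated. This is only a sketch under assumptions: it uses just the arrows stated in the text (C and D each cause T and Y, C causes its proxies C*, V points into T), it leaves out X because its place in the graph is not spelled out, and the helper names (`backdoor_paths`, `is_open`) are made up for illustration.

```python
# Hypothetical edge list read off the description above: (cause, effect).
EDGES = [
    ("C", "T"), ("C", "Y"), ("C", "C*"),  # C confounds T and Y; C* is a proxy of C
    ("D", "T"), ("D", "Y"),               # unobserved domestic politics
    ("V", "T"),                           # share of veterans in the population
    ("T", "Y"),                           # effect of interest
]

def neighbours(node):
    """Nodes adjacent to `node`, ignoring arrow direction."""
    return [b if a == node else a for a, b in EDGES if node in (a, b)]

def descendants(node):
    """All nodes reachable from `node` by following arrows forward."""
    out, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for a, b in EDGES:
            if a == cur and b not in out:
                out.add(b)
                stack.append(b)
    return out

def backdoor_paths(treatment, outcome):
    """Simple paths from treatment to outcome that start with an arrow INTO treatment."""
    found = []

    def walk(path):
        if path[-1] == outcome:
            found.append(path)
            return
        for nxt in neighbours(path[-1]):
            if nxt not in path:
                walk(path + [nxt])

    for parent in [a for a, b in EDGES if b == treatment]:
        walk([treatment, parent])
    return found

def is_open(path, adjusted):
    """d-separation rule applied to a single path, given the adjustment set."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in EDGES and (nxt, mid) in EDGES
        if collider:
            # A collider blocks unless it or one of its descendants is adjusted for.
            if not ({mid} | descendants(mid)) & adjusted:
                return False
        elif mid in adjusted:
            # An adjusted non-collider blocks the path.
            return False
    return True

for path in backdoor_paths("T", "Y"):
    status = "open" if is_open(path, {"C*"}) else "blocked"
    print(" - ".join(path), status)
```

Under these assumed edges, adjusting for C* alone leaves both backdoor paths (through C and through D) open, which matches the point that the effect of T is not identified here.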
The potential outcomes part is where I am having difficulty: I am not sure how to incorporate it into a DAG. For assignment of T, there are four potential outcomes:
- Could have selected a veteran leader but did not (Pr[T = 1] > 0)
- Could have selected a non-veteran leader but did not (Pr[T = 0] > 0)
- Could never have selected a non-veteran leader (Pr[T = 0] = 0)
- Could never have selected a veteran leader (Pr[T = 1] = 0)
For example, consider elections in State A and State B. If, say, 5/5 of the candidates were veterans in State A and 0/5 in State B, then obviously T=0 for State A and T=1 for State B could never have happened. For the sake of argument, assume that D and V do not make veterans more likely to be chosen over non-veterans where, say, 3/5 of the candidates are veterans. So what this means is that D and V affect whether treatment assignment is even possible, not necessarily treatment assignment itself. And if, say, State A were historically more conflict-prone, then this could entirely explain why veterans appeared more likely to initiate conflict.
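To make the State A / State B scenario concrete, here is a minimal numerical sketch. All numbers are invented for illustration: three hypothetical kinds of election (all-veteran pool, no-veteran pool, mixed 3/5 pool), each with an assumed share of elections, an assumed Pr[T = 1], and an assumed conflict probability. The true effect of T is set to zero by construction, because within each kind of election Y does not depend on T.

```python
from fractions import Fraction as F

# Hypothetical strata: (share of elections, Pr[T=1 | stratum], Pr[Y=1 | stratum]).
# Within a stratum Y is independent of T, so the true effect of T is zero.
STRATA = {
    "all_veteran_pool": (F(1, 4), F(1), F(3, 5)),     # like State A, conflict-prone
    "no_veteran_pool":  (F(1, 4), F(0), F(1, 5)),     # like State B
    "mixed_pool":       (F(1, 2), F(3, 5), F(3, 10)),  # 3/5 of candidates are veterans
}

def conflict_rate(strata, t):
    """Pr[Y=1 | T=t], pooling over the given strata."""
    num = den = F(0)
    for share, p_t, p_y in strata.values():
        p_arm = p_t if t == 1 else 1 - p_t  # chance of ending up in arm t
        num += share * p_arm * p_y
        den += share * p_arm
    return num / den

def naive_difference(strata):
    """Unadjusted difference in conflict rates, veterans minus non-veterans."""
    return conflict_rate(strata, 1) - conflict_rate(strata, 0)

# Keep only elections where both T=1 and T=0 had positive probability.
overlap = {name: s for name, s in STRATA.items() if 0 < s[1] < 1}

naive = naive_difference(STRATA)
restricted = naive_difference(overlap)
print("naive difference:", naive)            # 19/99, roughly 0.19
print("restricted difference:", restricted)  # 0
```

With these made-up numbers the pooled comparison shows veterans about 19 percentage points more conflict-prone even though T does nothing, and the gap vanishes once the elections with Pr[T = 1] equal to 0 or 1 are dropped. The gap vanishes here only because the remaining mixed-pool elections are homogeneous by assumption; the sketch says nothing about confounding by D within that group, or about whether the restriction amounts to conditioning on a collider.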
So, if we were to look only at elections where both T=1 and T=0 were possible, is this perhaps a better way to reduce bias and close the backdoor paths from T to Y? How does this look in a DAG? Since selection is a common effect of D and V, it seems like this is somehow conditioning on a collider. But theoretically it makes sense to me.
