I am trying to wrap my mind around causal inference, mostly because I think it will be useful for my work. I read The Book of Why, I read "Causal Inference in Statistics: A Primer", and I took an online course on the topic. As far as the maths in the Primer goes, I can follow it without great effort (but maybe I am fooling myself).
My problem is bridging the gap between theory with simplistic examples on the one hand and actual data on the other. For simple topics like DAGs, d-separation, backdoor paths, and colliders, I could immediately see parallels to my research and find uses for them. When it comes to the do() operator, the adjustment formula, and g-methods, I think I see how all that is useful, but I am still a bit insecure about how to apply it to real data. Which is fine, I will figure that out.
However, when it comes to counterfactuals, I am completely lost. They just don't make sense to me. To wit, in the "Primer", the authors write:
From this example, the reader may get the impression that counterfactuals are no different than ordinary interventions, captured by the do-operator.
Yes, that is the impression that I am getting. The authors helpfully explain:
Note, however, that, in this example we computed not merely the probability or expected value of $Y$ under one intervention or another, but the actual value of $Y$ under the hypothesized new condition $X = x$.
I don't really see that. How is that different from calculating the predicted $y$ for a particular value of $x$ in a simple regression model?
The authors define a counterfactual like this:
$$Y_x(u) = Y_{M_x}(u)$$
where $M_x$ is a "modified version" of the model $M$, with the equation for $X$ replaced by $X = x$. They do not give a strict definition of the $\text{do}()$ operator, but on p. 55 of the "Primer" they write:
In notation, we distinguish between cases where a variable $X$ takes a value $x$ naturally and cases where we fix $X = x$ by denoting the latter $\text{do}(X = x)$.
(Whatever "fix" is supposed to mean.) This is where I need to ask for your help! How are counterfactuals useful, and how should I understand them? How do they differ from calculating $y$ for a given value of $x$, or from the $\text{do}()$ operator?
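To make the question concrete, here is how I currently read the two notions, as a minimal Python sketch. Everything in it is my own assumption for illustration and not taken from the Primer: the toy linear model $X = U_X$, $Y = bX + U_Y$, the coefficient `B = 0.5`, and the function names `model`, `expected_y_do`, and `counterfactual_y`. The first function averages $Y$ over the whole population of $u$ with $X$ set to $x$, which is how I understand $\text{do}(X = x)$; the second recovers the $u$ of one observed unit and then evaluates $Y_{M_x}(u)$ for that unit only.

```python
import random

B = 0.5  # assumed structural coefficient in Y = B * X + U_Y (hypothetical)


def model(u, do_x=None):
    """Evaluate the toy model M for background variables u.

    If do_x is given, the equation for X is replaced by X = do_x,
    which is my reading of the modified model M_x.
    """
    x = u["u_x"] if do_x is None else do_x
    y = B * x + u["u_y"]
    return x, y


def expected_y_do(x, n=100_000, seed=0):
    """Monte Carlo estimate of E[Y | do(X = x)]: average over all units u."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = {"u_x": rng.gauss(0, 1), "u_y": rng.gauss(0, 1)}
        total += model(u, do_x=x)[1]
    return total / n


def counterfactual_y(x_obs, y_obs, x_new):
    """Y_x(u) for the single unit that produced (x_obs, y_obs)."""
    # Recover this unit's background variables from the observation.
    u = {"u_x": x_obs, "u_y": y_obs - B * x_obs}
    # Evaluate the modified model M_x on the same u.
    return model(u, do_x=x_new)[1]


avg = expected_y_do(2.0)  # population-level quantity
cf = counterfactual_y(x_obs=1.0, y_obs=1.5, x_new=2.0)  # unit-level quantity
print(avg, cf)
```

If this reading is right, the two numbers differ only because the second one reuses the observed unit's own $U_Y$ instead of averaging over it, and a regression prediction would coincide with the first number, not the second. Is that the whole distinction?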
