
I am trying to wrap my mind around causal inference, mostly because I think it will be useful for my work. I read The Book of Why, I read "Causal Inference in Statistics: A Primer", and I took an online course on the topic. As far as the maths in the Primer goes, I can follow it without great effort (but maybe I am fooling myself).

My problem is bridging the gap between theory with its simplistic examples and actual data. For simple topics like DAGs, d-separation, backdoor paths, and colliders, I could immediately see parallels to my research and find uses for them. When it comes to the do() operator, the adjustment formula, and g-methods, I think I see how all of that is useful, but I am still a bit insecure about how to apply it to real data. Which is fine, I will figure that out.

However, when it comes to counterfactuals, I am completely lost. They just don't make sense to me. To wit, in the "Primer", the authors write:

From this example, the reader may get the impression that counterfactuals are no different than ordinary interventions, captured by the do-operator.

Yes, that is the impression that I am getting. The authors helpfully explain:

Note, however, that, in this example we computed not merely the probability or expected value of $Y$ under one intervention or another, but the actual value of $Y$ under the hypothesized new condition $X = x$.

I don't really see that. How is that different from calculating the predicted $y$ for a particular value of $x$ in a simple regression model?

The authors define a counterfactual like this:

$$Y_x(u) = Y_{M_x}(u)$$

where $M_x$ is a "modified version" of the model $M$, with the equation for $X$ replaced by $X = x$. They do not give a strict definition of the $\text{do}()$ operator, but on p. 55 of the "Primer" they write:

In notation, we distinguish between cases where a variable $X$ takes a value $x$ naturally and cases where we fix $X = x$ by denoting the latter $\text{do}(X = x)$.

(Whatever "fix" is supposed to mean). This is where I need to ask for your help! How are counterfactuals useful and how should I understand them? How do they differ from calculating $y$ for a given value of $x$, or from the $\text{do}()$ operator?

January
    Chapter 4 answers these in detail. – patagonicus Oct 17 '22 at 13:04
    Totally agree with patagonicus. The questions you're asking are great questions, but the answers are book-long. They do define the do operator. $\operatorname{do}(X=x)$ means that you force the random variable $X$ to take on the specific value $x.$ See https://stats.stackexchange.com/questions/529899/how-to-interpret-pearls-do-notation/529901#529901. Counterfactuals are of the essence of causality, because in the end, you define "A causes B" by saying that, had A not happened, B would not have happened - a counterfactual. – Adrian Keister Oct 17 '22 at 13:29
  • "Counterfactuals are of the essence of causality, because in the end, you define "A causes B" by saying that" No, I don't. But we need not dwell on this, since my question is not about philosophy: as stated above, it is about practical applications. I read chapter 4 and I am still dumb. Maybe I should leave it at that ;-) – January Oct 17 '22 at 13:34
  • @patagonicus It appears that, quite possibly, I lack the necessary brains to understand chapter 4, since it must be quite obvious from my question that reading chapter 4 (which I even quote!) is what prompted me to ask the question in the first place. – January Oct 17 '22 at 13:36
    I found Chapter 4 pretty tough slogging, as well. Some points stood out to me: 1. The 3-step process for computing a Counterfactual (abbr. CF) is more general than interpolation or extrapolation from a regression model, since the functional form of the SCMs are allowed to be about anything. 2. CFs are about individuals, usually, as opposed to stats in general which is primarily about groups. 3. The 3-step CF process finally did make sense to me after a while. Lifeline is the example in Primer Section 4.2.3. – Adrian Keister Oct 17 '22 at 14:18
  • @AdrianKeister ad (2) I do not understand this. I have seen it mentioned several times, but I don't get it. Sounds not very informative. It is like saying "calculating a regression model is about groups and using it to calculate $y$ based on a particular $x$ is about individuals". Sort of true, but rather meaningless, or am I missing some subtlety? – January Oct 17 '22 at 15:02

1 Answer


How are counterfactuals useful and how should I understand them?

Counterfactuals are useful when you have observed a particular situation and want to reason about alternative scenarios that contradict what actually happened. They are typically used to study individual cases, whereas the do-operator is used to study average effects: you set $X$ to $x$ while leaving the structural equations of all the other variables in the network unchanged. I'll leave some references at the end of the answer that explain this in more detail.

I'll try to be more specific about your case by answering your other question, in which you referred to a regression model.

How do they differ from calculating $y$ for a given value of $x$, or from the $do()$ operator?

So, let us suppose you have a regression model that predicts values of $y$ given an observed input $x$.

Let me first point out that such a model, like other machine learning models, captures only correlations, not causation: you train it on a dataset of observed features and the related target values, so unobserved confounders may lead it to predict spurious correlations.

But let us suppose that your model is somehow able to learn interventional probabilities $p(y|do(X=x))$ instead of observational probabilities $p(y|x)$$^1$. Let us also suppose that we are in the confounded scenario of the following image.

[Figure: a causal DAG in which an unobserved confounder $U$ points into both $X$ and $Y$]

Suppose we observe $X=x_1$. By performing an intervention with your model, you can get the average value of $y$ after imposing $do(X=x_2)$, that is, $p(y|do(X=x_2))$. This may differ from the counterfactual value of $y$ had $X$ been $x_2$, that is, $p(y|X=x_1,do(X=x_2))$, because in the second case you exploit the extra information carried by your specific observation to learn something about the value of the unobserved variable $U$ as well. In particular, computing a counterfactual requires three steps:

  1. Abduction: use the evidence $e$ (here, the observation $X=x_1$) to update the probability of the unobserved factors, replacing $P(u)$ with $P(u|e)$.
  2. Action: perform the intervention in the model, that is, $do(X=x_2)$.
  3. Prediction: compute the value of $Y$ in the modified model. Note that this step uses the updated probabilities from step 1, which is what distinguishes it from performing the intervention alone.
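The three steps can be made concrete with a toy linear SCM of my own invention (the equations $X := U$ and $Y := 2X + 3U$ are illustrative assumptions, not taken from the Primer). Because the equations are deterministic given $U$, abduction pins down $U$ exactly from the observation, and the counterfactual answer differs from the interventional average:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SCM (illustrative assumption, not from the Primer):
#   U ~ Normal(0, 1)   unobserved confounder
#   X := U             structural equation for X
#   Y := 2*X + 3*U     structural equation for Y

n = 1_000_000
U = rng.normal(0.0, 1.0, n)

# --- Intervention: E[Y | do(X = 2)] ---
# Model surgery: replace X's equation by X = 2, keep P(U) unchanged.
Y_do = 2 * 2 + 3 * U
print(Y_do.mean())   # ~ 2*2 + 3*E[U] = 4

# --- Counterfactual: Y_{X=2}(u), having observed X = 1 ---
# 1. Abduction: since X := U, observing X = 1 implies U = 1 exactly.
u_star = 1.0
# 2. Action: set X = 2 in the modified model.
# 3. Prediction: evaluate Y using the updated U.
y_cf = 2 * 2 + 3 * u_star
print(y_cf)          # 7, not 4
```

The interventional average ignores the observation and averages over all of $P(U)$, giving about 4; the counterfactual keeps the abducted $U = 1$ and gives 7. That gap is exactly the "extra information" the observation carries about the unobserved confounder.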

In my opinion this was a great question; I had to dig into several resources to answer it, so I'll leave my references here, as they may complement my answer.


$^1$ For the sake of completeness, there are models in the literature that can capture interventional probabilities when provided with interventional data.

DaSim