
Which theoretical approaches to causality should I know as an applied statistician/econometrician?

I know (a very little bit) the following: Granger causality, the Neyman–Rubin potential outcomes framework, and Pearl's causal graphs / structural equation models.

Which concepts am I missing, or which should I be aware of?

Related: Which theories are foundations for causality in machine learning?

I have read these interesting questions and their answers (1, 2, 3), but I think this is a different question. I was also very surprised to see that "causality", for example, is not mentioned in The Elements of Statistical Learning.

1 Answer


Strictly speaking, "Granger causality" is not at all about causality. It's about predictive ability/time precedence, you want to check whether one time series is useful to predict another time series---it's suited for claims like "usually A happens before B happens" or "knowing A helps me predict B will happen, but not the other way around" (even after considering all past information about $B$). The choice of this name was very unfortunate, and it's a cause of several misconceptions.

While it's almost uncontroversial that a cause has to precede its effect in time, to draw causal conclusions from time precedence you still need to rule out confounding, among other sources of spurious association.

Now regarding the Potential Outcomes (Neyman-Rubin) versus Causal Graphs/Structural Equation Modeling (Pearl), I would say this is a false dilemma and you should learn both.

First, it's important to notice that these are not opposite views about causality. As Pearl puts it, there's a hierarchy regarding (causal) inference tasks:

  1. Observational prediction
  2. Prediction under intervention
  3. Counterfactuals

For the first task, you only need to know the joint distribution of observed variables. For the second task, you need to know the joint distribution and the causal structure. For the last task, of counterfactuals, you will further need some information about the functional forms of your structural equation model.
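A small simulation makes the gap between tasks 1 and 2 concrete (the linear model and its coefficients are invented for illustration): in a confounded model, the observational slope of $Y$ on $X$ differs from the effect of actually intervening on $X$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical structural model: U confounds X and Y;
# the true causal effect of X on Y is 2.
U = rng.normal(size=n)
X = U + rng.normal(size=n)
Y = 2 * X + 3 * U + rng.normal(size=n)

# Task 1 (observational prediction): the regression slope of Y on X,
# recoverable from the joint distribution alone.
obs_slope = np.cov(X, Y)[0, 1] / np.var(X)

# Task 2 (intervention): simulate do(X = x) by setting X independently of U,
# which requires knowing the causal structure, not just the joint distribution.
X_do = rng.normal(size=n)                   # X no longer "listens" to U
Y_do = 2 * X_do + 3 * U + rng.normal(size=n)
do_slope = np.cov(X_do, Y_do)[0, 1] / np.var(X_do)

print(round(obs_slope, 1), round(do_slope, 1))  # roughly 3.5 vs 2.0
```

The observational slope (about 3.5 here) is the right answer for prediction, but it overstates the interventional effect (2) because it also picks up the confounding path through $U$.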

So, when talking about counterfactuals, there's a formal equivalence between the two perspectives. The difference is that potential outcomes take counterfactual statements as primitives, while in DAGs counterfactuals are derived from the structural equations. However, you might ask, if they are "equivalent", why bother learning both? Because they differ in how easy it is to express and derive certain things.
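To see why counterfactuals (task 3) require functional-form information, here is a toy sketch of Pearl's abduction–action–prediction recipe, with an invented structural equation:

```python
# Toy structural equation (assumed, for illustration): Y = 2*X + U.
# We observe one unit with X = 1, Y = 5 and ask:
# what would Y have been for this unit, had X been 0?

x_obs, y_obs = 1.0, 5.0

# 1. Abduction: infer the unit's background factor U from what we observed.
u = y_obs - 2 * x_obs          # U = 3

# 2. Action: modify the model, setting X to the counterfactual value.
x_cf = 0.0

# 3. Prediction: evaluate the modified model with the inferred U.
y_cf = 2 * x_cf + u
print(y_cf)                    # 3.0
```

The abduction step only works because we assumed the functional form $Y = 2X + U$; the joint distribution and the causal structure alone would not pin down this unit-level "what if".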

For example, try to express the concept of M-Bias using only potential outcomes --- I've never seen a good formulation. In fact, my experience so far is that researchers who never studied graphs aren't even aware of it. Also, casting the substantive assumptions of your model in graphical language makes it computationally easier to derive its empirically testable implications and to answer questions of identifiability. On the other hand, sometimes people find it easier to first think directly about the counterfactuals themselves, and to combine this with parametric assumptions to answer very specific queries.
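For intuition, M-bias is easy to demonstrate by simulation (variable names and coefficients invented for illustration): $X$ and $Y$ are causally unrelated, but adjusting for $M$ --- a collider, i.e. a common child of two unobserved causes $U_1$ and $U_2$ --- induces a spurious association between them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# M-structure: U1 -> X, U1 -> M, U2 -> M, U2 -> Y; no arrow between X and Y.
U1 = rng.normal(size=n)
U2 = rng.normal(size=n)
X = U1 + rng.normal(size=n)
M = U1 + U2 + rng.normal(size=n)        # collider
Y = U2 + rng.normal(size=n)

def ols(y, *cols):
    """OLS slope coefficients (intercept included, then dropped)."""
    Z = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta[1:]

coef_unadjusted = ols(Y, X)[0]    # ~ 0: X and Y are marginally independent
coef_adjusted = ols(Y, X, M)[0]   # ~ -0.2: conditioning on M opens the path
print(round(coef_unadjusted, 2), round(coef_adjusted, 2))
```

The graph makes it immediate that $M$ is a collider on the path $X \leftarrow U_1 \rightarrow M \leftarrow U_2 \rightarrow Y$, so one should *not* adjust for it; that warning is hard to even state in pure potential-outcomes notation.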

There's much more one could say, but the point here is that you should learn how to "speak both languages". For references, you can check out how to get started here.

  • Could you provide an example of something that is easy to express in terms of POs, but not in DAGs? – Guilherme Duarte Jan 03 '18 at 11:47
  • @GuilhermeDuarte mediation quantities involving nested counterfactuals for instance – Carlos Cinelli Jan 03 '18 at 14:11
  • I think your view of Granger causality (at least as laid out here) is a bit imprecise. Granger causality $A\xrightarrow{Granger} B$ means that $A$ has added predictive power when the history of $B$ itself is used for predicting $B$. Thus roosters do not Granger-cause sunrise since they do not improve upon the prediction of sunrise based on historical data on sunrises. – Richard Hardy Nov 06 '18 at 07:42
  • I think Granger causality does not suggest using inferior predictive models with only the history of $B$ to justify the need for an additional variable $A$ and hence Granger causality. Rather, one would ideally aim for as good a model as possible using $B$'s own history and then see if adding $A$ (in some form) helps predict $B$. And of course, "a perfect rooster" is a rather utopian concept. Given this, I think editing the answer to reflect this might be a good idea. – Richard Hardy Nov 06 '18 at 17:39
  • Your first paragraph with the two example claims still reflects the original idea with the rooster. It seems to me the qualifier "beyond the history of $B$ itself" is missing there. – Richard Hardy Nov 06 '18 at 18:15
  • +1. I have found the potential outcomes framework easier to present to people with no science background; the "what-if" notion is easy to understand even if you never went to school. On the other hand, talking with more experienced colleagues, DAGs/causal graphs give a nice way to formalise and visualise assumptions and to investigate the dynamics of the overall system. – usεr11852 May 10 '20 at 12:34