
Consider the standard example of variables that are pairwise independent but jointly dependent.

$$ (X,Y,Z)= \begin{cases} (0,0,0) & \text{with probability } 1/4 \\ (1,1,0) & \text{with probability } 1/4 \\ (1,0,1) & \text{with probability } 1/4 \\ (0,1,1) & \text{with probability } 1/4 \end{cases} $$
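(A minimal numeric check of both properties, using nothing beyond the table above: every pairwise marginal factorizes, but the full joint does not.)

```python
from itertools import product

# The distribution from the question: probability 1/4 on each of four triples.
joint = {(0,0,0): 0.25, (1,1,0): 0.25, (1,0,1): 0.25, (0,1,1): 0.25}

def marginal(indices):
    """Marginal distribution over the variables at the given positions."""
    m = {}
    for triple, p in joint.items():
        key = tuple(triple[i] for i in indices)
        m[key] = m.get(key, 0.0) + p
    return m

# Pairwise independence: P(A, B) = P(A) P(B) for every pair.
for i, j in [(0, 1), (0, 2), (1, 2)]:
    pij, pi, pj = marginal([i, j]), marginal([i]), marginal([j])
    for a, b in product([0, 1], repeat=2):
        assert abs(pij.get((a, b), 0.0) - pi[(a,)] * pj[(b,)]) < 1e-12

# Joint dependence: P(X, Y, Z) differs from P(X) P(Y) P(Z).
px, py, pz = marginal([0]), marginal([1]), marginal([2])
mismatch = [t for t in product([0, 1], repeat=3)
            if abs(joint.get(t, 0.0) - px[(t[0],)] * py[(t[1],)] * pz[(t[2],)]) > 1e-12]
assert mismatch  # all eight triples differ by 0.125 here
```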


I'm trying to represent $X, Y, Z$ in a causal diagram.

Certainly, such a distribution, if represented causally, would have latent confounders -- but I'm not sure what these would look like.

E.g. if one applies the Fast Causal Inference algorithm (Sec. 3.2 of this paper) to these variables, all edges are immediately removed, because the variables are all pairwise independent.

What's going on?

  • Cross-posted: https://math.stackexchange.com/q/4456135/14578, https://stats.stackexchange.com/q/576192/2921. Please do not post the same question on multiple sites. – D.W. Jul 11 '22 at 17:00
  • Please don't make trivial edits to posts to bump them. See https://meta.stackexchange.com/q/7046/160917, https://meta.stackexchange.com/q/125965/160917, https://meta.stackexchange.com/a/4409/160917, https://meta.stackexchange.com/help/privileges/edit. – D.W. Jul 11 '22 at 17:07
  • There are multiple causal diagrams possible here. For example, let X,Y be independent Bernoulli variables and let Z= XOR(X,Y) thus there are two arrows, one from X to Z and one from Y to Z. How did you apply those algorithms? Are these algorithms removing edges in case of pairwise independence? – Sextus Empiricus Jul 12 '22 at 11:49
  • In what sense are your variables pairwise independent but jointly dependent? It seems to me that if all pairs are independent, the variables are completely independent; no kind of dependency can be found. – markowitz Jul 12 '22 at 16:56
  • @SextusEmpiricus I mean, that isn't causally faithful, right? – Abhimanyu Pallavi Sudhir Jul 13 '22 at 18:07

1 Answer


Causal discovery for pairwise independent, jointly dependent variables

It is well known that "association does not imply causation". However, in general, we can say that "causation implies association" (*). The problem is to show what kinds of associations are implied by a given set of causal assumptions. Any causal theory under uncertainty tries to answer this question, and it seems to me that Pearl's theory gives the most reliable answers (related: Criticism of Pearl's theory of causality).

The paper you cited is based on this theory, and all the causal discovery algorithms it presents rely on the concept of d-separation (or d-connection) or related ones. Indeed, any associational implication of a set of causal assumptions comes from this concept.

E.g. if one applies the Fast Causal Inference algorithm (Sec 3.2 of this paper) to these variables, all edges are immediately removed because they are all pairwise independent.

What's going on?

d-connection deals with conditional or unconditional dependencies, but in your example no unconditional (marginal) dependencies exist. Note also that even unobserved confounders produce (spurious) dependencies among the observed variables; that is not your case either.

In most cases, dependencies (and causal effects) already show up in the marginal distributions (direct causal effects), but in your case no marginal dependencies exist. For this reason all edges disappear and the FCI algorithm fails.
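To see concretely what the edge-removal step misses: a constraint-based algorithm deletes the edge between two variables as soon as it finds any separating set, and here the empty set already separates every pair. Yet a conditional dependence does exist, e.g. between $X$ and $Y$ given $Z$. A minimal check on the distribution from the question:

```python
# The distribution from the question, as a dictionary over (x, y, z) triples.
joint = {(0,0,0): 0.25, (1,1,0): 0.25, (1,0,1): 0.25, (0,1,1): 0.25}

def prob(pred):
    """Probability of the event described by a predicate on (x, y, z)."""
    return sum(p for t, p in joint.items() if pred(t))

# Given Z alone, X is still a fair coin ...
p_x1_given_z0 = prob(lambda t: t[0] == 1 and t[2] == 0) / prob(lambda t: t[2] == 0)
# ... but conditioning on Y as well pins X down completely.
p_x1_given_y1_z0 = (prob(lambda t: t[0] == 1 and t[1] == 1 and t[2] == 0)
                    / prob(lambda t: t[1] == 1 and t[2] == 0))

assert p_x1_given_z0 == 0.5
assert p_x1_given_y1_z0 == 1.0
```

This is exactly a faithfulness violation: the conditional dependence is real, but it never restores the edges once the marginal independencies have removed them.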

(*) This claim can be questioned at the logical level, and some special counterexamples can be constructed. However, at most only some, not all, of the expected associations disappear, and the practical relevance of those examples seems minor. See here for a discussion: Does causation imply correlation?

I realized that your case matches precisely the particular case I had in mind: Carlos Cinelli's example given here: https://stats.stackexchange.com/q/301823. That case is quite special, and I suspect that most common causal discovery algorithms cannot help in such situations. The difference is that you do not have the DAG; you are looking for it, but in your case any of the three variables can be placed at the collider node. Observational data cannot help you.

Let me add something

I'm trying to represent X,Y,Z in a causal diagram.

Certainly, such a distribution, if represented causally, ... .

Take care: causal discovery algorithms can help you bridge the gap, but remember that a causal structure, usually represented with DAGs or SEMs, is not a translation of the joint distribution of the observed variables. Treating it as one violates the demarcation line suggested by Pearl, and is a serious mistake (read the previous link and these: Under which assumptions can a regression be interpreted causally?; How would econometricians answer the objections and recommendations raised by Chen and Pearl (2013)?). What is true is that for a given causal structure, together with a specified joint distribution for the structural errors, we can deduce a joint distribution for the endogenous variables involved. However, in general, more than one distribution of the endogenous variables is compatible with a given causal structure; this can depend on variations in the distribution of the errors. More importantly, several causal structures can share the same set of dependencies (via d-connection/separation). From this fact emerges the concept of an "equivalence class", which proves mathematically, for the first time, the mantra "association does not imply causation".

Indeed, your example is compatible with three DAGs with independent exogenous errors; in particular, you can put any of the three variables at the collider node. These DAGs form a Markov equivalence class.
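A small enumeration illustrates this: whichever of the three variables is taken as the XOR of the other two (with independent fair-coin parents), the induced joint is exactly the distribution in the question, so the three collider DAGs are observationally indistinguishable.

```python
from itertools import product

def joint_of(collider):
    """Exact joint of an SCM with two independent fair coins and `collider`
    (one of 'X', 'Y', 'Z') equal to the XOR of the other two variables."""
    dist = {}
    for a, b in product([0, 1], repeat=2):
        c = a ^ b
        t = {'Z': (a, b, c), 'Y': (a, c, b), 'X': (c, a, b)}[collider]
        dist[t] = dist.get(t, 0.0) + 0.25
    return dist

# All three collider DAGs induce exactly the distribution from the question.
target = {(0,0,0): 0.25, (1,1,0): 0.25, (1,0,1): 0.25, (0,1,1): 0.25}
assert all(joint_of(node) == target for node in 'XYZ')
```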

markowitz
  • What? By jointly dependent, I mean that the joint distribution P(X, Y, Z) cannot be factored as P(X)P(Y)P(Z), even though each pairwise marginal factors as P(X, Y) = P(X)P(Y). It's true that multiple causal models correspond to the same joint distribution, but conversely, a given causal model can have only one joint distribution, given by the Markov factorization. – Abhimanyu Pallavi Sudhir Jul 13 '22 at 18:17
  • No, the same SCM is compatible with many joint distributions. Indeed, a precise specification of the exogenous variables is not mandatory. – markowitz Jul 13 '22 at 21:47
  • More precisely, in general a given causal structure is compatible with many joint distributions (distributions of the endogenous variables: $X$, $Y$ and $Z$ in your case). Moreover, in this theory the Markov condition is tied to the causal structure and poses restrictions on it. It is not a purely statistical assumption; if you do not impose any causal structure, you cannot invoke it. – markowitz Jul 14 '22 at 10:55
  • Moreover, I may be wrong, but it seems to me that if a condition like $P(X,Y)=P(X)P(Y)$ holds for every pair, then the factorization $P(X,Y,Z)=P(X)P(Y)P(Z)$ must hold as well. – markowitz Jul 14 '22 at 10:55
  • @markowitz I mean, the example of $X,Y,Z$ in my question is a counter-example to that (each pair $P(X,Y)$ is factorable but $P(X,Y,Z)$ is not). I'm confused as to what you mean by the same SCM being compatible with many joint distributions -- Markov factorization allows you to write the joint distribution as the product of all conditionals on parents. – Abhimanyu Pallavi Sudhir Jul 14 '22 at 22:26
  • The Markov condition imposes some restrictions on the conditional independencies among the declared variables, but it does not impose any form on the distribution of the structural errors, so no form for the distribution of the declared variables is imposed. More importantly, even under Markovianity the same joint distribution (of the declared variables) can be compatible with several DAGs; the concept of a Markov equivalence class comes from that. In general there is no one-to-one correspondence between causal structure and joint distribution. – markowitz Jul 15 '22 at 07:26
  • The same causal structure can induce several joint distributions, and any joint distribution can be induced by several causal structures; this remains true even under Markovianity. Moreover, it is important to remark that Markovianity is a causal assumption: it concerns the causal model, not the observed joint distribution, which is the only thing you describe in the question. – markowitz Jul 15 '22 at 07:27
  • I know that the same joint distribution can be compatible with many DAGs -- I'm saying given a DAG and the conditionals of each variable on its parents, you can calculate the joint distribution by Markov factorization. By Markov factorization I'm not talking about the Markov condition, I mean the rule that $P(X_1,\dots X_n)=\prod_{i=1}^n{P(X_i\mid\mathrm{Parents}(X_i))}$ (see e.g. for this terminology). – Abhimanyu Pallavi Sudhir Jul 16 '22 at 09:37
  • It does not seem to me that what I said contradicts this result. More importantly, you do not give any DAG in the question, so I do not see the relevance of this point for your question. – markowitz Jul 17 '22 at 08:42
  • I'm saying that if all variables are pairwise independent, then the DAG must have all edges removed, but even the DAG with all edges removed gives rise to a different joint distribution, namely the product of the marginals $P(X)P(Y)P(Z)$ which is not equal to our joint distribution because the variables are not joint independent. – Abhimanyu Pallavi Sudhir Jul 18 '22 at 07:55
  • In particular, what dependencies do you see in the joint distribution above? – markowitz Jul 18 '22 at 08:46
  • I edited my answer; I get the point, I suppose. – markowitz Jul 18 '22 at 14:02
  • I don't think Carlos Cinelli's example and mine are identical. His example is that of a standard collider, wherein there actually are correlations between $X$ and $Y$, whereas in mine there is no marginal dependence. $X\rightarrow Y\leftarrow Z$ is not a causally faithful model for our purposes. – Abhimanyu Pallavi Sudhir Jul 19 '22 at 09:12
  • I doubt you read Cinelli's example and my explanation carefully. It is a collider example, but far from a standard one; it is so precisely because all marginal dependencies are absent. This case matches the distribution you showed. The relevant differences between that example and your case are explained in my answer. – markowitz Jul 19 '22 at 10:11
  • Ok you're right, it is identical -- just transformed. This surprised me, though -- there really isn't a causally faithful model for this, huh? – Abhimanyu Pallavi Sudhir Jul 19 '22 at 19:47
  • As I said above, your distribution is compatible with three DAGs. With data alone you cannot say more. If you have some idea about the substantive meaning of the variables, maybe you can determine which variable sits at the collider node. If you consider my answer exhaustive, you can accept it. – markowitz Jul 20 '22 at 13:02
  • I'm saying it's not faithful -- in the sense that there are events that are independent but not implied to be so by the model. Correct? – Abhimanyu Pallavi Sudhir Jul 22 '22 at 09:31
  • This interpretation seems correct to me. Indeed, the main takeaway of this example seems to be grasping the difference between causal nexus and statistical association: two different things, even if usually related. – markowitz Jul 22 '22 at 11:15