
I have a dataset with binary variables for mitigation measures (0 = a measure is not implemented, 1 = a measure is implemented).

I now want to know how often a certain measure is put in place together with another measure. For this I used a table of Pearson correlation coefficients.

How do I interpret the coefficients? For example, the 0.18 correlation between expensive_contents and electricals_above: it cannot simply mean that the two show up together 18% of the time, because in the data they co-occur only around 9% of the time.

tookja
  • This is a common misconception about the correlation coefficient. For binary variables, though, the correlation is at least related to the proportion of the time the two variables are the same (as well as to the proportion of the time they are both present, which is perhaps what "show up together" means). A formula is given at https://stats.stackexchange.com/questions/284996, where $\rho$ is the correlation, $p$ and $q$ are the individual chances of being $1,$ and $a+p+q-1$ is the proportion of time both variables equal $1.$ – whuber Jun 15 '22 at 17:22

2 Answers


For binary data, the correlation coefficient is:

$$r = \frac{p_{11}-p_{1 \bullet} p_{\bullet 1}}{\sqrt{p_{1 \bullet} p_{\bullet 1} (1-p_{1 \bullet})(1-p_{\bullet 1})}},$$

where $p_{1 \bullet}$ and $p_{\bullet 1}$ are the proportions of occurrences for each individual variable and $p_{11}$ is the proportion of mutual occurrence in both variables taken together (the latter is the roughly 9% joint occurrence in your data, not the 0.18 correlation). As you can see from the formula, it is not generally the case that $r=p_{11}$: the formula also takes account of the proportion of occurrences of each variable individually.
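
A quick R sketch (simulated data with made-up probabilities, not taken from the question) confirming that the Pearson correlation of two binary vectors reduces to this formula:

# Hypothetical check: the Pearson correlation of two 0/1 vectors equals the
# formula above applied to the three sample proportions.
set.seed(1)
x <- rbinom(1000, 1, 0.3)                # binary variable X
y <- rbinom(1000, 1, 0.5)                # binary variable Y
p.x  <- mean(x)                          # proportion of 1s in x  (p_{1.})
p.y  <- mean(y)                          # proportion of 1s in y  (p_{.1})
p.xy <- mean(x == 1 & y == 1)            # proportion of joint 1s (p_{11})
r <- (p.xy - p.x * p.y) / sqrt(p.x * p.y * (1 - p.x) * (1 - p.y))
all.equal(r, cor(x, y))                  # TRUE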

Ben

There are several possible interpretations. They come down to understanding the correlation between two binary variables.

By definition, the correlation of a joint random variable $(X,Y)$ is the expectation of the product of the standardized versions of these variables. This leads to several useful formulas commonly encountered, such as

$$\rho(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}.$$

The distribution of any binary $(0,1)$ variable is determined by the chance it equals $1.$ Let $p=\Pr(X=1)$ and $q=\Pr(Y=1)$ be those chances. (To avoid discussing the trivial cases where either of these is 100% or 0%, let's assume $0\lt p \lt 1$ and $0\lt q \lt 1.$)

When, in addition, $b=\Pr((X,Y)=(1,1))$ is the chance both variables are simultaneously $1,$ the axioms of probability give full information about the joint distribution, summarized in this table:

$$\begin{array}{cc|l} X & Y & \Pr(X,Y)\\ \hline 0 & 0 & 1 + b - p - q\\ 0 & 1 & q-b\\ 1 & 0 & p-b\\ 1 & 1 & b\\ \hline \end{array}$$
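
As a quick numeric check in R (with arbitrary values of $p,$ $q,$ and $b$ chosen only for illustration), the four entries form a proper distribution with the stated marginals:

# Check that the table's probabilities sum to 1 and reproduce p, q, and b.
p <- 0.4; q <- 0.6; b <- 0.3                   # arbitrary values satisfying the constraints
probs <- c("00" = 1 + b - p - q, "01" = q - b, "10" = p - b, "11" = b)
sum(probs)                                     # 1
probs[["10"]] + probs[["11"]]                  # Pr(X = 1) = p = 0.4
probs[["01"]] + probs[["11"]]                  # Pr(Y = 1) = q = 0.6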

From this information we may compute $\operatorname{Var}(X) = p(1-p),$ $\operatorname{Var}(Y)=q(1-q),$ and $\operatorname{Cov}(X,Y) = b-pq.$ Plugging this into the formula for the correlation gives

$$\rho(X,Y) = \frac{b - pq}{\sqrt{p(1-p)q(1-q)}} = \lambda b - \mu$$

where the positive numbers $\lambda$ and $\mu$ depend on $p$ and $q$ but not on $b.$ This informs us that when the marginal distributions are fixed,

the correlation of $X$ and $Y$ is a linear function of the chance $X$ and $Y$ are simultaneously equal to $1;$ and vice versa.

The latter statement follows by solving for $b,$ which gives $b = (\rho + \mu)/\lambda,$ a linear function of $\rho.$

Since $1-X$ and $1-Y$ are binary variables, too, this result when applied to them translates to a slight generalization: the correlation is a linear function of any one of the four individual probabilities listed in the table.

Consequently, you can always re-interpret the correlation in terms of the chance of any specific joint outcome when the variables are binary.

As an example, suppose $p=q=1/2$ and you have in hand (through a calculation, estimate, or assumption) a correlation coefficient of $\rho = 0.12.$ Compute that $\lambda = 4$ and $\mu = 1.$ Because $0\le b \le 1/2$ is forced on us by the laws of probability, $\rho = 4b-1$ ranges from $-1$ (when $b=0$) to $+1$ (when $b=1/2$). Conversely, $b = (1 + \rho)/4$ in this case, giving $b = (1 + 0.12)/4 = 0.28.$
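
The same computation in R, using the values from this example:

# Worked example: p = q = 1/2 and rho = 0.12.
p <- 1/2; q <- 1/2; rho <- 0.12
lambda <- 1 / sqrt(p * (1 - p) * q * (1 - q))  # 4
mu     <- p * q * lambda                       # 1
b      <- (rho + mu) / lambda                  # 0.28
lambda * b - mu                                # recovers rho = 0.12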


Another natural interpretation would be in terms of the proportion of time the variables are equal. According to the table, that chance would be given by $(1+b-p-q) + b=1+2b-p-q.$ Calling this quantity $e,$ we have $b = (e+p+q-1)/2,$ which when plugged into the formula for $\rho$ gives

$$\rho(X,Y) = \frac{e-(1-p)(1-q)-pq}{2\sqrt{p(1-p)q(1-q)}} = \kappa e - \nu$$

for positive numbers $\kappa$ and $\nu$ that depend on $p$ and $q$ but not on $e.$ Thus, just as before,

the correlation of $X$ and $Y$ is a linear function of the chance $X$ and $Y$ are simultaneously equal to each other; and vice versa.

Continuing the example with $p=q=1/2,$ compute that $\kappa = 2$ and $\nu = 1.$ Consequently $e = (\nu + \rho)/\kappa = (1 + \rho)/2.$
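
And in R, with the same values as before:

# The "proportion equal" interpretation with p = q = 1/2 and rho = 0.12.
p <- 1/2; q <- 1/2; rho <- 0.12
kappa <- 1 / (2 * sqrt(p * (1 - p) * q * (1 - q)))  # 2
nu    <- ((1 - p) * (1 - q) + p * q) * kappa        # 1
e     <- (nu + rho) / kappa                         # 0.56: X and Y agree 56% of the time
(e + p + q - 1) / 2                                 # 0.28: the same b as in the first example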


It might be handy, then, to have efficient code to convert a correlation matrix into a matrix of joint probabilities and vice versa. Here are some examples in R implementing the first interpretation. Of course, both functions require you to supply the vector of binary probabilities ($p,$ $q,$ and so on) and they assume your probabilities and matrices are mathematically possible.

#
# Convert a correlation matrix `Rho` to a matrix of chances that 
# binary variables are jointly equal to 1.  `p` is the array of expected values.
#
corr.to.prop <- function(Rho, p) {
  s <- sqrt(p * (1-p))
  Rho * outer(s, s) + outer(p, p)
}
#
# Convert a matrix of chances `B` that binary variables are jointly equal to 1
# into a correlation matrix.  `p` is the array of expected values.
#
prop.to.corr <- function(B, p) {
  s <- 1/sqrt(p * (1-p))
  (B - outer(p, p)) * outer(s, s)
}
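
As a hypothetical usage sketch (simulated data; the column probabilities are made up), the conversions agree with direct counting and invert each other:

# Simulate 1000 rows of four binary measures, estimate p from the columns,
# and convert the sample correlation matrix into joint-occurrence chances.
set.seed(17)
X <- matrix(rbinom(4000, 1, c(0.2, 0.4, 0.5, 0.7)), ncol = 4, byrow = TRUE)
p.hat <- colMeans(X)                     # estimated marginal probabilities
Rho <- cor(X)                            # sample correlation matrix
B <- corr.to.prop(Rho, p.hat)            # chances of being jointly equal to 1
all.equal(B, crossprod(X) / nrow(X))     # TRUE: matches direct counting of joint 1s
all.equal(prop.to.corr(B, p.hat), Rho)   # TRUE: the two conversions invert each other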
whuber
  • Thank you for your answer, much appreciated. In relation to your code example, given a correlation table [simply cor(w, x, y, z)], how do I find p for the individual measures? Is it simply a vector of their individual proportions, for example the number of w observations over the number of all observations? – tookja Jun 17 '22 at 14:47
  • You have to estimate the $p_i.$ (You cannot deduce them from the correlation matrix in general.) Although there are many methods to do so, a standard one uses the proportion appearing in your data for that estimate, and it is guaranteed to be mathematically compatible with the correlation matrix. – whuber Jun 17 '22 at 14:51
  • Ah, so $p$ is the population proportion, the proportion in my data is the sample proportion, and it therefore estimates the population value? I assume I will have to do significance tests for that assumption to hold? – tookja Jun 17 '22 at 14:56
  • Could you point me towards a source for finding $b$ (the joint probability that both $x$ and $y$ equal $1$) directly otherwise? Because this is the outcome I am actually interested in! My current method of manually typing length(which(x == 1 & y == 1)) for each pair just seems extremely inefficient. Again, thank you so much for helping; I can't stress how helpful this is. – tookja Jun 17 '22 at 14:59
  • You estimate the joint probability in the same way you estimate any probability: use the count of the times both $x$ and $y$ equal $1,$ relative to the total number of times you have values of both. – whuber Jun 17 '22 at 15:00
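
Following up on these comments, here is a minimal R sketch, assuming the 0/1 measures are the columns of a matrix or data frame called dat (a hypothetical name). It estimates every marginal probability and every joint probability at once, avoiding pair-by-pair counting:

# Estimate the marginal probabilities p and the matrix of joint probabilities b
# directly from the 0/1 data, without going through the correlation matrix.
dat <- as.matrix(dat)                  # columns are the 0/1 mitigation measures
p.hat <- colMeans(dat)                 # estimated Pr(measure = 1), one entry per measure
B.hat <- crossprod(dat) / nrow(dat)    # [i, j]: proportion of rows where measures i and j both equal 1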