7

I've been trying to come up with a formal definition for a 'measure of association'. An intuitive definition might be something along the lines of 'a function that tells you about the existence or strength of dependence among a collection of random variables'.

I've constructed the following definition with this intuitive notion of association in mind. Notice that I use an implication rather than a biconditional. This is to allow for a function to tell us about specific types of association, rather than dependence in general.

Given a suitable probability space $(\Omega, \mathcal{F}, P)$ with real-valued random variables $\{X_j(\omega) \mid \omega \in \Omega \}_{j=1}^{n}$, a measure of association of order $n$ is a function $f:\mathbb{R}^n \to \mathbb{R}$ such that $\perp\!\!\!\!\perp \left( X_1, \cdots, X_n \right) \implies f \left( X_1, \cdots, X_n \right) = 0$.
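
As a rough illustration of why the implication is not a biconditional, here is a small R sketch (using sample correlations as stand-ins for the population quantities):

set.seed(1)
n <- 1e5
x <- rnorm(n)
y <- rnorm(n)   # independent of x
z <- x^2        # strongly dependent on x, yet uncorrelated with it since E[X^3] = 0

cor(x, y)   # near 0, as the implication requires under independence
cor(x, z)   # also near 0, even though x and z are dependent

A function like Pearson's R can therefore satisfy the implication while only telling us about a specific (linear) type of association.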

However, it comes up a bit short. This definition doesn't really involve any notion of quantifying the strength of association. I've been mulling over the idea that, with sufficient smoothness, some expression in terms of derivatives might be possible. In a comment below, @whuber nicely summarizes my dissatisfaction with this definition:

It would be more accurate to characterize your definition as an indicator of association. To be a "measure," it ought to change monotonically with some property of "association." The issue revolves around what might constitute a property one would characterize as quantifying some aspect of "association." The main difficulty is that "dependence among variables" is a rich and complex thing that is inadequately characterized by any single scalar-valued function. AFAIK, there is no axiomatization of such things.

How can this definition be revised to include functions that quantify the strength of association?

Galen
  • 8,442
  • Do you mean mutual information? – Dave Jul 13 '21 at 20:20
  • @Dave No, MI is a specific measure of association. I'm trying to formally define a family that includes MI, among other estimators. – Galen Jul 13 '21 at 20:21
  • What association do you not consider mutual information? – Dave Jul 13 '21 at 20:22
  • @Dave A basic example would be Pearson's R. – Galen Jul 13 '21 at 20:23
  • Correlation is part of mutual information. – Dave Jul 13 '21 at 20:24
  • @Dave I'm not sure I know what you mean. It is the case that $MI(X,Y) = 0 \iff \perp\!\!\!\!\perp (X,Y)$ and $R(X,Y) \neq 0 \implies MI(X,Y) \neq 0$. Is that what you're referring to? – Galen Jul 13 '21 at 20:27
  • It looks like you have it. – Dave Jul 13 '21 at 20:29
  • @Dave Okay, I see where you're coming from. That is a logical entailment, and in that sense I agree that correlation is 'part of' mutual information. They're not the same since $\exists X,Y : R(X,Y) =0 \land MI(X,Y) \neq 0$. – Galen Jul 13 '21 at 20:35
  • 4
    It would be more accurate to characterize your definition as an indicator of association. To be a "measure," it ought to change monotonically with some property of "association." The issue revolves around what might constitute a property one would characterize as quantifying some aspect of "association." The main difficulty is that "dependence among variables" is a rich and complex thing that is inadequately characterized by any single scalar-valued function. AFAIK, there is no axiomatization of such things. – whuber Jul 13 '21 at 21:03
  • 1
    @whuber I 100% agree. That nicely summarizes my own dissatisfaction with the definition I proposed above. – Galen Jul 13 '21 at 21:07
  • 1
    Perhaps there's a parallel in https://stats.stackexchange.com/a/507000/17230. – Scortchi - Reinstate Monica Jul 17 '21 at 23:37
  • @Scortchi-ReinstateMonica I think so! There's definitely a point to be made that we're not always precise with our meaning of "measure of". – Galen Jul 17 '21 at 23:40
  • 2
    A copula could be seen as a characterization of association, so maybe a "measure of association" is a functional of the bivariate distribution $F(x,y)$ that only depends on the copula $C(x,y)$ – kjetil b halvorsen Jul 19 '21 at 04:54
  • 1
    @kjetilbhalvorsen I am not familiar with that perspective. I would appreciate your response to this post if you have time. – Galen Jul 19 '21 at 13:11

4 Answers

7

Books on this topic include Correlation and Dependence by Samuel Kotz and Dominique Drouet and Multivariate Models and Multivariate Dependence Concepts by Harry Joe. The second is more practical, the first more theoretical.

And there is a paper by A. Rényi, "On measures of dependence", Acta Mathematica Academiae Scientiarum Hungaricae 10, 441–451 (1959), https://doi.org/10.1007/BF02024507, which proposes criteria that a measure of association $A(x,y)$ should satisfy. Let us list them:

I. Standardization: $A$ should take values in $[0,1]$.
II. Independence: $A=0$ when the variables are independent.
III. Functional dependence: $A=1$ if $x$ is a function of $y$, or vice versa.
IV. Increasing property: $A$ must increase when dependence increases.
V. Invariance: $A$ is invariant under separate linear (or affine) transformations of each of the variables. A stronger requirement would be that $A$ is marginal free, that is, it depends on the bivariate distribution only through its copula.
VI. Symmetry: if the variables are exchangeable, $A$ should be symmetric.
VII. Relationship with measures for ordinal variables: if $A$ is defined for both ordinal and numerical variables, there should be a close connection between the two cases.

As some of these criteria are informal, they cannot really be called axioms. Let us look at Pearson correlation as an association measure and see how it fares:

Pearson correlation only seems to comply with VI and VII. Specifically, it is not marginal free. Let us look at that in more detail, as it has interesting consequences that should be better known, and maybe taken into account in interpretation. If we transform $x$ and $y$ separately with increasing transformations, this only changes the marginal distributions; the copula remains the same. But if these transformations are nonlinear, they will destroy straight lines in the scatter plot, and so the Pearson correlation will change. And when the marginal distributions have different shapes, the maximal correlation value of 1 is not reachable!

Let us use some simple example data distributed with R:

data(mammals, package="MASS")
with(mammals, cor(body, brain))
[1] 0.9341638
with(mammals, cor(log(body), log(brain)))
[1] 0.9595748

Now, to calculate the maximal correlation possible with the actual marginal distributions of the data, we can simply sort the values in increasing order before calculating the correlation. That preserves the marginals, but obviously destroys the copula:

maxcor <- function(x, y, ...) {
    # sorting both variables pairs the smallest with the smallest, and so on
    # (the comonotonic coupling), which maximizes the Pearson correlation
    # attainable with these marginal distributions
    xx <- sort(x) ; yy <- sort(y)
    cor(xx, yy, ...)
}

with(mammals, maxcor(body, brain))
[1] 0.9435413
with(mammals, maxcor(log(body), log(brain)))
[1] 0.9921567

So the maximal correlation attainable on the original scale is only $0.944$, compared with the observed value of $0.934$. And since the log transformation preserves the copula, but not the Pearson correlation, we see that it is not marginal free.
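
As a contrast (an aside, not part of Rényi's list): a rank-based measure such as Spearman's correlation depends on the data only through the ranks, which any increasing transformation preserves. Continuing with the mammals data loaded above, the following two calls therefore return identical values:

with(mammals, cor(body, brain, method = "spearman"))
with(mammals, cor(log(body), log(brain), method = "spearman"))

For continuous variables, Spearman's correlation is a functional of the copula alone, so it is marginal free in the sense discussed above.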

kjetil b halvorsen
  • 2
    +1 It's worth noting that the standardization and functional dependence requirements immediately rule out the usual correlation coefficients, indicating there is some difference between the concepts of "association" you relate here and correlation. – whuber Jul 24 '21 at 13:59
  • 2
    https://www.springer.com/gb/book/9783319989259 is a massive compilation of measures of association. I can't say how helpful it might be for this question. – Nick Cox Jul 25 '21 at 07:03
  • Interesting that the standardization is onto $[0,1]$ rather than $[-1,1]$. Perhaps a corresponding "measure of disassociation" would involve a standardization onto $[-1,0]$. Pearson's correlation would then have subdomains that look like one or the other to some extent, but wouldn't simply be either. – Galen Jul 26 '21 at 03:19
  • 1
    @Galen: I guess the idea is to generalize to $m$ variables, then a direction of association is not so easy to define ... – kjetil b halvorsen Jul 26 '21 at 03:22
  • @kjetilbhalvorsen That makes a lot of sense. – Galen Jul 26 '21 at 03:29
  • @kjetilbhalvorsen I have accepted this answer because your two sources really packed a lot of good information in them. However, I see that you may not have finished saying all you want to say. Please come back and finish your thoughts when you have time! :) – Galen Jul 26 '21 at 04:49
  • 1
    I will certainly finish, I had hoped to get time today, but it is night now, so hopefully tomorrow! – kjetil b halvorsen Jul 26 '21 at 04:52
  • The Spearman correlation involves computing the Pearson correlation on ranks. I imagine there is some relation in the limit between the ranks of the observations and the cumulative distribution function. Since copulae are functionals of the generalized inverses of the marginal CDF's, I wonder if there is a connection there. – Galen Oct 28 '21 at 18:01
  • 2
    I like this answer, but it is interesting that it does not in any way start with a definition of the meaning of association in informal language. – Alexis Mar 08 '22 at 23:22
4

Below are some desiderata that might be useful. I'm not certain these desiderata will work, but they would be a reasonable starting point for inquiry. Essentially, you need some kind of ordering property for the "measure", specified as an inequality that the measure must satisfy. Here I have used the idea that adding an independent random vector should "derogate" from the association.

The first property is the one you give in your question, but I have also added some other properties that I think would be useful in a measure of association. The smoothness property reflects the idea that you want your measure to change continuously when you change the random vector continuously. The derogation desiderata reflect the idea that adding an independent random vector to your existing random vector should not increase the association, and should decrease it when the added vector is non-degenerate. If you want your measure of association to have a maximum value, I would also suggest the last assumption.


No association: If the elements of $\mathbf{X} = (X_1,...,X_n)$ are mutually independent then we have $f(\mathbf{X}) = 0$.

Smoothness: Given a random vector $\mathbf{Y} = (Y_1,...,Y_n)$ independent of $\mathbf{X} = (X_1,...,X_n)$, the function $f(\mathbf{X} + \alpha \mathbf{Y})$ is continuous with respect to $\alpha$.

Derogation (weak): Given a random vector $\mathbf{Y} = (Y_1,...,Y_n)$ independent of $\mathbf{X} = (X_1,...,X_n)$ we have $f(\mathbf{X}+\mathbf{Y}) \leqslant f(\mathbf{X})$.

Derogation (strong): Weak derogation applies, and additionally, if $\mathbf{Y} = (Y_1,...,Y_n)$ is non-degenerate (i.e., it does not have a point-mass distribution) we have $f(\mathbf{X}+\mathbf{Y}) < f(\mathbf{X})$.

Maximum association (optional): If all values in $\mathbf{X}$ have a point-mass distribution when we condition on any single value $X_i = x_i$ then we have $f(\mathbf{X}) = 1.$


One other thing you should bear in mind here is that you might need to define the "measure of association" with respect to the distribution of the random vector rather than the random vector itself (though there are other concepts in probability/statistics where we define an operation on a random vector that implicitly uses its distribution).
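
As a rough numerical sanity check of the weak derogation idea (a sketch only, with absolute Pearson correlation chosen here as the candidate measure $f$), adding a noise vector that is independent of $\mathbf{X}$ and has mutually independent components shrinks the measure. Note the caveat: a $\mathbf{Y}$ whose own components are dependent can increase the correlation, so absolute correlation does not satisfy the derogation desiderata in full generality.

set.seed(1)
n <- 1e5

# X = (X1, X2): components are dependent (population correlation 0.8)
x1 <- rnorm(n)
x2 <- 0.8 * x1 + sqrt(1 - 0.8^2) * rnorm(n)

# Y = (Y1, Y2): independent of X, with mutually independent components
y1 <- rnorm(n, sd = 2)
y2 <- rnorm(n, sd = 2)

f <- function(a, b) abs(cor(a, b))  # candidate measure of association

f(x1, x2)            # roughly 0.8
f(x1 + y1, x2 + y2)  # substantially smaller, consistent with weak derogation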

Ben
  • 124,856
0

I think about this a little differently than I used to, and will respond to myself here. kjetil's answer is still the best-researched answer here and Ben's answer is thought-provoking. The following are just my idiosyncratic brain droppings.

Begin with the definition of statistical independence: for all $(x_1, \cdots, x_n)$,

$$F_{X_1, \cdots, X_n}(x_1, \cdots, x_n) = \prod_{j=1}^n F_{X_j}(x_j)$$

where $F_{X_1, \cdots, X_n}$ is the joint cumulative distribution function (CDF) and $F_{X_j}$ is the marginal CDF for the $j$th variable.

Not all collections of random variables are statistically independent, which prompted me to define the notion of an independence gap:

$$\phi_{X_1, \cdots, X_n}(x_1, \cdots, x_n) \triangleq F_{X_1, \cdots, X_n}(x_1, \cdots, x_n) - \prod_{j=1}^n F_{X_j}(x_j)$$

that is, simply the difference obtained by subtracting the product of the marginals from the joint CDF.

When $\phi > 0$ things are happening more together than they do apart, and similarly when $\phi < 0$ things are happening less together than they do apart. So to speak, anyway. The former to me seems intuitively to match the term "association", and the latter "disassociation".

Any function $g(x_1, \cdots, x_n)$ which is (co)monotonic with the independence gap is a measure of association, and any function $h(x_1, \cdots, x_n)$ which is antimonotonic with the independence gap is a measure of disassociation.
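
For what it's worth, the independence gap at a point is easy to estimate from data using empirical CDFs. A minimal sketch (the function name and the simulated data are purely illustrative):

# Empirical estimate of F_{X,Y}(x0, y0) - F_X(x0) * F_Y(y0)
independence_gap <- function(x, y, x0, y0) {
  mean(x <= x0 & y <= y0) - mean(x <= x0) * mean(y <= y0)
}

set.seed(1)
x <- rnorm(1e4)
y <- 0.7 * x + rnorm(1e4)               # positively dependent on x
independence_gap(x, y, 0, 0)            # positive: values below zero co-occur more than independence predicts
independence_gap(x, rnorm(1e4), 0, 0)   # near zero for an independent pair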

Galen
  • 8,442
  • I have also thought about using this notion in what I have termed dependence entropy. – Galen Jun 06 '23 at 17:44
  • The independence gap is more general in that it can be expressed in terms of any collection of events, not just the cumulative distribution functions.

    https://stats.stackexchange.com/a/623001/69508

    – Galen Aug 05 '23 at 17:51
0

Similar to this answer, which defines a difference between the joint distribution and the product of the marginals, one could instead take a ratio.

I recently encountered exactly that on Wikipedia, termed coherence. I would write it as the following ratio

$$\phi \triangleq \operatorname{coherence}_{X_1, \ldots, X_n}(x_1, \ldots, x_n) \triangleq \frac{F_{X_1, \ldots, X_n}(x_1, \ldots, x_n)}{\prod_{j=1}^n F_{X_j}(x_j)}.$$

Just as zero is the additive identity for the independence gap, one is the multiplicative identity for the coherence. When $\phi < 1$ we have negative association, and when $\phi > 1$ we have positive association. When $\phi = 1$ at every point, there is statistical independence.

Unlike the independence gap, which is defined whenever the joint distribution is defined, the coherence additionally requires that the product of the marginals be nonzero.
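
An empirical version mirrors the empirical independence-gap sketch in the answer above, with the undefined case handled explicitly (again, the function name and data are purely illustrative):

# Empirical estimate of F_{X,Y}(x0, y0) / (F_X(x0) * F_Y(y0)),
# returned as NA when the product of the marginals is zero
coherence <- function(x, y, x0, y0) {
  denom <- mean(x <= x0) * mean(y <= y0)
  if (denom == 0) return(NA_real_)
  mean(x <= x0 & y <= y0) / denom
}

set.seed(1)
x <- rnorm(1e4)
y <- 0.7 * x + rnorm(1e4)
coherence(x, y, 0, 0)            # greater than 1: positive association at this point
coherence(x, rnorm(1e4), 0, 0)   # near 1 for an independent pair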

Galen
  • 8,442