
I'm writing code to calculate if the correlation between two random variables is significant.

I've recently come across Fisher's z transformation as a method for finding significance. But from reading around:

  1. SAS on Fisher's z
  2. Wikipedia on Fisher's z

it seems this transform only applies to normal variables. A lot of the variables I'm working with aren't normal. Is there a corresponding transform for non-normal random variables?

Background

The variables I'm dealing with

  1. Most of my variables have some amount of skew and so are not perfectly normally distributed.
  2. My dataset also has binary indicator variables, with Bernoulli distributions.

The excerpt from Wikipedia I'm concerned about

If $(X, Y)$ has a bivariate normal distribution with correlation $\rho$ and the pairs $(X_i, Y_i)$ are independent and identically distributed, then $z$ is approximately normally distributed with mean $${1 \over 2}\ln \left({{1+\rho } \over {1-\rho }}\right)$$ and standard error $${1 \over {\sqrt {N-3}}}.$$
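For concreteness, here is a minimal sketch of the calculation I have in mind, assuming plain Python with NumPy/SciPy (the helper name `fisher_z_ci` is just my own, for illustration):

```python
import numpy as np
from scipy import stats

def fisher_z_ci(r, n, alpha=0.05):
    """Confidence interval for a correlation via Fisher's z.

    Hypothetical helper; assumes (X, Y) is bivariate normal,
    per the Wikipedia excerpt above.
    """
    z = np.arctanh(r)                # z = (1/2) ln((1 + r) / (1 - r))
    se = 1.0 / np.sqrt(n - 3)        # standard error from the excerpt
    z_crit = stats.norm.ppf(1 - alpha / 2)
    lo, hi = z - z_crit * se, z + z_crit * se
    return np.tanh(lo), np.tanh(hi)  # back-transform to the r scale

# Example: a sample correlation of 0.6 from n = 50 pairs
print(fisher_z_ci(0.6, 50))
```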

Connor
  • Why do you want to apply this transform? What kind of correlation are you interested in testing? – user2974951 May 19 '23 at 08:26
  • Fisher's (not Fischer's) z transform is applied to correlations, not the original data. If you're worried that the original data are a long way from normal, then consider bootstrapping your correlations (a bootstrap sketch appears after this thread) or transforming the variables. – Nick Cox May 19 '23 at 09:05
  • I know that, but if you take a look at the definition on Wikipedia: https://en.m.wikipedia.org/wiki/Fisher_transformation#Definition, I'm concerned about this statement: "If (X, Y) has a bivariate normal distribution with correlation ρ and the pairs (Xi, Yi) are independent and identically distributed, then z is approximately normally distributed with mean"

    Wouldn't this mean the underlying variables (X, Y) have to be normal?

    – Connor May 19 '23 at 09:23
  • As the sentence you quote there makes explicit (it has the form "If A then B"), the derivation of the transformation relies on X and Y being bivariate normal; however, there are a number of issues here, of which I'll mention two. 1. As just alluded to, it's not just the marginal distributions, but the joint distribution of the pair (X,Y) that determines the distribution of the Pearson correlation coefficient (and hence what r-distribution needs to be transformed to be "more nearly normal"). ... ctd – Glen_b May 19 '23 at 09:34
  • ctd ... 2. For most joint distributions, linear correlation doesn't fully describe the dependence (including that "uncorrelated" no longer necessarily implies "independent"), so the more basic question arises of whether linear correlation is even a particularly useful description of the dependence. If you are confident that a linear correlation is what you want, then I tend to agree with Nick Cox that a permutation test might be a good approach, perhaps specifically using a studentized correlation for the statistic, as recommended by DiCiccio and Romano (JASA, 2017); a permutation sketch appears after this thread. – Glen_b May 19 '23 at 09:35
  • I think, from your edit, that we're dealing with an XY problem (see also Wikipedia). Specifically, your problem is not really about doing something like a z-transform at all. At first glance it now sounds like your problem is that you want to test for independence (vs. some form of association) for some specific variables (many of which are binary). However, I think that this, too, may potentially be an XY problem of its own. Why are you looking at testing a large collection of bivariate correlations? – Glen_b May 19 '23 at 09:48
  • Note that (for the bivariate normal) the Fisher transform is really only needed when the population correlation you're testing is non-zero; the usual correlation t-test on $t=r\sqrt{\frac{n-2}{1-r^2}}$ works perfectly well for testing a null correlation (a sketch of this test appears after this thread). The same test works "as is" in a much wider class of cases than bivariate normality (per regression, it's derived assuming conditional normality of one variable given the other) and is fairly robust to that assumption. – Glen_b May 19 '23 at 09:56
  • Although I have written on the z transform elsewhere -- as a way of getting confidence intervals for correlation, although a test can be linked -- I agree with @Glen_b that the expansion of the question, and the extra comments, make this seem a case of the X-Y problem. It seems that you're expecting correlations to do the job of finding relationships and worrying about how to do it best. Whether it will work well at all would be my concern. In particular, correlation has a meaning if either or both variables are (0, 1), so long as neither is constant, but that is a long way from bivariate normal. – Nick Cox May 19 '23 at 11:29
  • It could be X-Y in the sense that I'm trying to find a confidence interval for the amount of information shared between two variables, which is not explicitly what I'm asking for here. Correlation may not be a reasonable measure at all for some of these pairs. But I have stated, at the top of the question, that I'm interested in the significance of the correlation between the two variables. Which I believe Fisher's z test gives me, but only in a specific circumstance, hence the question of other possible transformations! – Connor May 19 '23 at 12:28
  • @Glen_b Thank you for such a detailed comment chain. So does that mean that Fisher's z statistic is slightly broader than I originally thought, and that as long as the bivariate distribution is normal, it's still reasonable to use Fisher's z test? I'd also be interested to know your opinion on using it as a heuristic: it may not be exact, but is it close to the ideal method? – Connor May 19 '23 at 12:29
  • Repeating myself: 1. "Calculate if the correlation between two random variables is significant" means you're testing a null of $\rho=0$ vs $\rho\neq 0$. You don't need an asymptotic adjustment for skewness of $r$ under the bivariate normal when the null distribution is already symmetric; use the t-test in that case, which is exact in small samples. 2. This t-test doesn't rely on bivariate normality, but on the weaker regression assumptions. It's also fairly robust. 3. If you want protection against non-normality, Nick's mention of permutation tests is relevant. (I still think this is an XY problem.) – Glen_b May 20 '23 at 04:32
  • @Glen_b I don't understand what you mean. Why would my null be an assumed correlation of $\rho = 0$? I'm looking at variables with correlations closer to $\rho = 0.6$, I'd like to know how significant that correlation is. In that scenario, won't my distribution be skewed? I would be happy to try the t-test, what's the best way to find a resource on how to do the t-test for correlation? Is it XY though, or just a simplified statement of what the issue is? There's almost always a deeper reason behind every question, but is the extra information useful? Past experience tells me no. – Connor May 20 '23 at 05:18
  • @Glen_b Is this a good example of using the t-test to calculate the significance of a correlation? https://www.statology.org/t-test-for-correlation/ – Connor May 20 '23 at 05:37
  • 1. If your null is not $\rho=0$, what do you mean by "significant"? Where is your null coming from? 2. Consequently, if the population correlation (not the sample correlation) were 0.6, the sampling distribution of $r$ would be skewed, but this is not relevant, because the distribution you need to compute is the null distribution of the test statistic. That is, in order to show that $r$ is not compatible with $\rho=0$, you need to use the distribution of $r$ when $\rho=0$ and show that your $r_\text{obs}$ is too extreme for that to be plausible. – Glen_b May 20 '23 at 06:04
  • (Sorry, this is responding to your earlier comment.) Well, sure, that link is using the same formula I gave above (though I think there are better sources, it seems to be okay in this case). 4. I continue to believe it's an XY problem because you have not addressed why the significance of a large collection of bivariate correlations would be relevant for anything; it's usually not, albeit it's a common practice (somewhat misguided, but common). If you have some specific need to obtain p-values for a lot of bivariate correlations, what would that be? – Glen_b May 20 '23 at 06:09
  • @Glen_b Thank you for engaging so much, this is really helpful! I'm trying to remove all the correlated columns in a dataset to reduce the noise in it before onward input into a machine learning model. I've set a reasonably high limit of 0.6 because I have a secondary feature selection step which can deal with unimportant columns. As the dataset is large I'd like to check correlation using a sample of the rows, but I'd also like some statistical test for significance of that correlation too. – Connor May 21 '23 at 06:07
  • I don't think you need testing for that purpose at all. I also don't think bivariate correlation is necessarily a great way to approach the problem of dependence among features. For example, it's perfectly easy to have large collections of variables that have very low pairwise correlation but which are collectively singular. On the other hand, it's possible to have many middling pairwise correlations (like r=0.6) which don't necessarily cause any problem at all. – Glen_b May 21 '23 at 10:40
  • @Glen_b why don't I need testing? What would you recommend then for getting a numerical representation of the correlation between columns? – Connor May 22 '23 at 11:46
  • @Glen_b I've posted another question that is related to what we've discussed here, that asks if correlation method is problem dependent. If you have the time to have a look I'd be very grateful! https://stats.stackexchange.com/questions/616589/is-correlation-method-problem-dependent – Connor May 22 '23 at 15:18
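
For reference, here is a minimal sketch of the correlation t-test Glen_b describes above, again assuming Python with NumPy/SciPy (the helper name `corr_t_test` is hypothetical):

```python
import numpy as np
from scipy import stats

def corr_t_test(x, y):
    """Two-sided t-test of H0: rho = 0 (hypothetical helper).

    Uses t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of
    freedom, the statistic quoted in the comments.
    """
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt((n - 2) / (1 - r**2))
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    return r, t, p

# Example with simulated data; the p-value matches scipy.stats.pearsonr
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(size=50)
print(corr_t_test(x, y))
```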
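
Likewise, a minimal sketch of the permutation test suggested in the comments (the helper name `corr_permutation_test` is hypothetical; the studentized statistic per DiCiccio and Romano would be a refinement, not implemented here):

```python
import numpy as np

def corr_permutation_test(x, y, n_perm=10_000, seed=0):
    """Permutation p-value for H0: no association (hypothetical helper).

    Shuffles y relative to x and compares |r_obs| against the
    permutation distribution of |r|.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    r_obs = np.corrcoef(x, y)[0, 1]
    perm_r = np.array([np.corrcoef(x, rng.permutation(y))[0, 1]
                       for _ in range(n_perm)])
    # Add 1 to numerator and denominator so the p-value is never exactly 0
    p = (1 + np.sum(np.abs(perm_r) >= abs(r_obs))) / (n_perm + 1)
    return r_obs, p
```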
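
And a minimal sketch of Nick Cox's bootstrap suggestion (the helper name `corr_bootstrap_ci` is hypothetical):

```python
import numpy as np

def corr_bootstrap_ci(x, y, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a correlation (hypothetical helper).

    Resamples (x, y) pairs with replacement; no normality assumption.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample row indices
        boot[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])
```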