3

I have calculated the sample correlation $r$ between $X$ and $Y$. I'd like to test $H_0: \rho \leq \rho_0$ against $H_1: \rho > \rho_0$, where $\rho$ is the true correlation and $\rho_0$ is a specified value.

For the case when $\rho_0 = 0$, I can apply the transform $t = r \frac{\sqrt{n-2}}{\sqrt{1 - r^2}}$, which follows the $t_{n-2}$ distribution when $\rho = 0$. I can then perform a one-tailed test and see if $\int_t^{\infty} p_{t_{n-2}}(x)\,dx < \alpha$, rejecting the null hypothesis if it is.
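The zero-threshold case described above can be sketched in Python as follows. This is only a minimal illustration: the function name `corr_t_test_greater` and the synthetic data are assumptions introduced here, and the t reference distribution presumes roughly bivariate normal data.

```python
import numpy as np
from scipy import stats


def corr_t_test_greater(x, y):
    """One-sided test of 'true correlation <= 0' against '> 0' via the r-to-t transform.

    Hypothetical helper for illustration; returns (sample correlation, t statistic, p-value).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    corr = np.corrcoef(x, y)[0, 1]                    # sample Pearson correlation
    t_stat = corr * np.sqrt(n - 2) / np.sqrt(1 - corr**2)
    p_value = stats.t.sf(t_stat, df=n - 2)            # upper-tail area of t_{n-2}
    return corr, t_stat, p_value


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

corr, t_stat, p = corr_t_test_greater(x, y)
print(f"correlation={corr:.3f}, t={t_stat:.3f}, one-sided p={p:.4g}")
```

The null hypothesis is rejected when the printed p-value is below the chosen $\alpha$.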

How can I generalize this to the described scenario, where the hypothesized correlation is nonzero?

Edit: how would I determine the sample size required to perform this test with specified $\alpha$ and $\beta$ (1 - power)?

Lmnop
  • Check for Fisher's Z transformation. – User1865345 Sep 03 '23 at 12:28
  • @User1865345 Can you elaborate please? Also any thoughts on determining sample size. – Lmnop Sep 03 '23 at 12:43
  • Just to clarify some things. There are two transformations: r-to-t (the above) and r-to-z' (Fisher's Z). – Lmnop Sep 03 '23 at 13:30
  • Suppose we are using the r-to-t transformation. If we want to find the sample size in the case where the hypothesized correlation is 0, then we just find the sample size for the t-test. An effect size is required to do this, specifically Cohen's $f^2$. For a simple linear regression, $R^2$ is the square of the correlation $r$, and Cohen's $f^2$ is $f^2 = \frac{R^2}{1 - R^2}$, so we can easily convert between correlation and effect size to work out the minimum correlation we want to detect. – Lmnop Sep 03 '23 at 13:37
  • Similar Qs: https://stats.stackexchange.com/questions/8192/hypothesis-test-for-whether-correlation-coefficient-is-greater-than-specified-va, https://stats.stackexchange.com/questions/13810/threshold-for-correlation-coefficient-to-indicate-statistical-significance-of-a, https://stats.stackexchange.com/questions/17371/example-of-strong-correlation-coefficient-with-a-high-p-value, https://stats.stackexchange.com/questions/278751/how-do-i-determine-whether-two-correlations-are-significantly-different (NOT duplicates) – kjetil b halvorsen Sep 03 '23 at 16:01

2 Answers

1

Using the Z transform, it's pretty easy to determine the sample size for this test. Suppose we want to test $H_0: \rho \leq \rho_1$, and we specify that for a true correlation of $\rho_2$ ($\rho_1 < \rho_2$) our test must have probability of type I error at most $\alpha$ and probability of type II error at most $\beta$.

We consider the problem in terms of Z-transformed variables (for correlation, the Fisher transform $x' = \operatorname{artanh}(x)$ is commonly used). Let $r'$, $\rho_1'$ and $\rho_2'$ denote the transforms of the sample correlation $r$ and of $\rho_1$ and $\rho_2$. For a true correlation $\rho$, $r'$ is approximately normal with mean $\rho'$ and standard deviation $\frac{1}{\sqrt{n-3}}$. Let $c$ denote the threshold such that we reject the null hypothesis when $c < r'$.

Consider the null hypothesis first. Since the probability of a type I error must be at most $\alpha$ at the boundary $\rho = \rho_1$, we must have $c \geq \rho_1' + z_{1-\alpha} \frac{1}{\sqrt{n-3}}$.

Now consider the alternative hypothesis. For $\rho \geq \rho_2$, the probability of a type II error must be at most $\beta$. Therefore, $c \leq \rho_2' - z_{1 - \beta} \frac{1}{\sqrt{n-3}}$.

It follows that $$\rho_1' + z_{1-\alpha} \frac{1}{\sqrt{n-3}} \leq \rho_2' - z_{1 - \beta} \frac{1}{\sqrt{n-3}}.$$ This can be rearranged to $n \geq 3 + \left(\frac{z_{1-\alpha} + z_{1-\beta}}{\rho_2' - \rho_1'}\right)^2$.
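The final inequality can be evaluated directly. The sketch below is illustrative only: the function name `required_n` and the default values of `alpha` and `beta` are assumptions made here, and the primes are taken to be Fisher transforms (`atanh`).

```python
import math

from scipy.stats import norm


def required_n(rho1, rho2, alpha=0.05, beta=0.2):
    """Smallest integer n with n >= 3 + ((z_{1-alpha} + z_{1-beta}) / (rho2' - rho1'))^2.

    Hypothetical helper; rho1 is the null boundary, rho2 the correlation to detect.
    """
    if not -1 < rho1 < rho2 < 1:
        raise ValueError("need -1 < rho1 < rho2 < 1")
    z_alpha = norm.ppf(1 - alpha)                  # normal quantile z_{1-alpha}
    z_beta = norm.ppf(1 - beta)                    # normal quantile z_{1-beta}
    delta = math.atanh(rho2) - math.atanh(rho1)    # gap on Fisher's z scale
    return math.ceil(3 + ((z_alpha + z_beta) / delta) ** 2)


print(required_n(0.3, 0.5))  # alpha = 0.05, power = 0.8
```

The required sample size grows quickly as $\rho_2$ approaches $\rho_1$, because the gap on the transformed scale appears squared in the denominator.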

Comments and corrections welcome.

Lmnop
0

The standard approach: calculate a confidence interval for the true correlation $\rho$ and check whether it excludes the hypothesized value $\rho_o$. In your case, if the lower limit of a one-sided 95% interval (equivalently, of a two-sided, equal-tailed 90% interval) is higher than $\rho_o$, reject H0 in favour of H1 at the approximate 5% level.

The confidence interval is usually calculated via Fisher's z transformation; this is what `scipy.stats.pearsonr` and R's `cor.test` do by default.
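For reference, one common way to get such a lower limit by hand is to work on Fisher's z scale and back-transform. The sketch below assumes that approach and a large-sample normal approximation; the function name `fisher_z_test_greater` and the example numbers are made up for illustration.

```python
import math

from scipy.stats import norm


def fisher_z_test_greater(r, n, rho0, alpha=0.05):
    """Approximate one-sided test of 'true correlation <= rho0' against '> rho0'.

    Hypothetical helper. Returns the p-value and the lower one-sided
    (1 - alpha) confidence limit for the true correlation.
    """
    se = 1.0 / math.sqrt(n - 3)                        # approx. standard error on the z scale
    z_stat = (math.atanh(r) - math.atanh(rho0)) / se
    p_value = norm.sf(z_stat)                          # upper-tail normal probability
    lower = math.tanh(math.atanh(r) - norm.ppf(1 - alpha) * se)
    return p_value, lower


# Made-up inputs: sample correlation 0.45 from n = 100 pairs, hypothesized value 0.3.
p_value, lower = fisher_z_test_greater(r=0.45, n=100, rho0=0.3)
print(f"p = {p_value:.4f}, lower limit = {lower:.4f}")
```

Rejecting when the p-value is below `alpha` and rejecting when the lower limit exceeds the hypothesized value are the same decision, since both compare the same z statistic with the same normal quantile.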

Example

We are interested in the working hypothesis $\rho > \rho_o = 0.3$ at the significance level of 0.05. We compute a parametric two-sided, equal-tailed 90% confidence interval in Python and R:

Python:

from scipy.stats import pearsonr
from sklearn import datasets

X, _ = datasets.load_iris(as_frame=True, return_X_y=True)
cor = pearsonr(X["petal length (cm)"], X["petal width (cm)"])
cor.confidence_interval(confidence_level=0.9)

ConfidenceInterval(low=0.9515705602254544, high=0.9715644582981584)

R:

cor.test(~ Petal.Length + Petal.Width, data = iris, conf.level = 0.9)
# 90 percent confidence interval:
#   0.9515706 0.9715645

Since even the lower limit 0.952 is above the hypothesized value $\rho_o=0.3$, we reject H0 (at the approximate level 0.05) in favour of our working hypothesis.

Michael M
  • Thanks, this makes sense. – Lmnop Sep 03 '23 at 13:45
  • Do you know if/when the r-to-t transformation is more appropriate than the r-to-z' transformation? – Lmnop Sep 03 '23 at 13:47
  • Confidence intervals for $r$ are usually, or at least better, calculated on Fisher's $z$ scale and back-transformed. The sampling distribution isn't in general going to be symmetric; it is so only when the true correlation is (near) zero. – Nick Cox Sep 03 '23 at 13:49
  • Fortunately, software typically does exactly what @Nick proposes. So one does not need to care about such implementation details and can rather focus on the important aspects like: are my observations independent? Can they be considered a random sample? etc. – Michael M Sep 03 '23 at 13:55
  • Unfortunately performance is an issue so this is being written in Python and eventually C++. – Lmnop Sep 03 '23 at 14:11
  • Does that mean the confidence_interval method of scipy.stats.pearsonr is too slow? Frankly: I can't imagine a situation where calculating such a simple interval is too slow and makes sense... – Michael M Sep 03 '23 at 14:18
  • I wasn't aware of this method - I'm sure that will do - cheers. – Lmnop Sep 03 '23 at 14:21
  • I need to play with it as well :-) – Michael M Sep 03 '23 at 14:27
  • It's actually quite a bit harder to vectorize but the implementation for the Fisher Z and testing shouldn't be too bad. – Lmnop Sep 03 '23 at 14:41
  • I think $r$ and $\rho$ might be mixed up in this. We want to consider the assumptions of the null hypothesis: $r \sim N(\rho, \frac{1}{\sqrt{n-3}})$. If r is belong the 95% upper limit of this distribution we reject the null hypothesis - the p-value is $\int_r^\infty p(\rho, \frac{1}{\sqrt{n-3}})$. – Lmnop Sep 04 '23 at 21:35
  • I meant to say “beyond” but I can’t edit it now for some reason. – Lmnop Sep 04 '23 at 21:46
  • Why -1 without comment? I see the advantage of ChatGPT over our site... no impoliteness over there. – Michael M Sep 05 '23 at 06:38
  • I didn't downvote. Sometimes I will downvote without declaring myself when a bad post has already been explained to be bad and there is some risk of personal hostility from the OP. But that's not true here. So, what to say except that random unexplained downvotes are part of the experience here and can mean all sorts of things such as "I would prefer a different answer" or even "I don't like you". Naturally I am just guessing, because the problem is lack of explanation. – Nick Cox Sep 05 '23 at 11:35
  • Added an example to make it look less meager ;) – Michael M Sep 05 '23 at 12:08
  • I also didn't downvote. Your answer helped me to tackle the crux of the problem (sample size). – Lmnop Sep 07 '23 at 16:26