After reading numerous articles on sample size as it relates to a Pearson Correlation, I have not found any reference to what happens if I have a small sample (n=15) but still get a strong r (.830) that is significant at the 0.01 level (2-tailed). Does this output mean I can trust there is a reasonably strong correlation? Everything I read states small samples are not advised but nothing I've read shares how to interpret strong results like this (even with a small sample). Thank you for any direction on this.
1 Answer
At the first level, I'd say there's nothing wrong with interpreting a significant correlation even if your sample size is small. There are some caveats/reasons to take the results with a slightly bigger grain of salt than usual, discussed below.
First, using R to make up some data that match what you say you've got:
set.seed(101)
## simulate n = 15 bivariate Normal observations; empirical = TRUE forces the
## *sample* correlation to be exactly 0.83, not just the population value
x <- MASS::mvrnorm(n = 15, mu = rep(0, 2),
                   Sigma = matrix(c(1, 0.83, 0.83, 1), 2),
                   empirical = TRUE)
cor.test(x[,1], x[,2])
Results (lightly edited):
t = 5.3654, df = 13, p-value = 0.0001286
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.5527576 0.9418211
sample estimates: cor = 0.83
- If you really have a correlation of 0.83 with n = 15, then your p-value should be about two orders of magnitude smaller than 0.01, in case you care about its precise value. (Perhaps you only meant that your p-value was less than 0.01, i.e. "significant at alpha = 0.01", rather than approximately equal to 0.01.)
- as @ChristianHennig comments, the confidence interval is wide because the sample size is small: you can confidently reject the null hypothesis, but the 95% CI ranges from 0.55 to 0.94 (you should be able to get a similar answer from SPSS; a sketch of the underlying calculation follows this list)
- the tests here assume that the data follow a Normal distribution. While the kinds of statistical analyses discussed here are reasonably robust to non-Normality, the results do tend to be more sensitive to non-Normality when the sample size is small. If you're worried about this, you could try a non-parametric correlation such as Spearman's or Kendall's (see the sketch after this list).
- the last caveat has to do with statistical significance filtering. That is, if your experimental design has low power (lots of noise and/or a small sample size) and you only pay attention to statistically significant results, then you will generally overestimate the effect size (i.e., your correlation may be an overestimate). Illustrating this takes a little bit of work: I'm going to use the retrodesign package for R, first translating the correlation with the Fisher transform (info from Wikipedia), on which scale the standard error of a correlation is approximately 1/sqrt(n-3). I also have to make an assumption about what the true effect size is: for example, below I will suppose that the true correlation for this system is 0.3. Then:
library(retrodesign)
fz <- atanh(0.83)    ## Fisher transformation of the observed correlation
fe <- 1/sqrt(15 - 3) ## standard error on the transformed scale (see Wikipedia)
retrodesign(atanh(0.3), fe)
## $power
## [1] 0.1885498
## $typeS
## [1] 0.006438651
## $exaggeration
## [1] 2.32502
power = 0.188 means I had a fairly low probability of rejecting the null hypothesis; typeS = 0.006 means I also had a low probability of getting the wrong sign (i.e., estimating a negative instead of a positive correlation); exaggeration = 2.32 (also called "Type M error") means that on average a significant result will inflate the correlation (on the transformed scale) by a factor of 2.32. So if we get a correlation of 0.83 (and we are only paying attention to significant results), the true correlation would be more like tanh(atanh(0.83)/2.32) = 0.47. (Guessing that the true correlation is 0.3 may be too pessimistic; that's something you need to think about. But this kind of effect inflation is a real concern.)
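On the second point, the confidence interval reported by cor.test() is based on the Fisher transformation; here is a minimal sketch of that calculation (the numbers match the output above):
r <- 0.83; n <- 15
z <- atanh(r)       ## Fisher transform of the observed correlation
se <- 1/sqrt(n - 3) ## approximate standard error on the transformed scale
tanh(z + c(-1, 1) * qnorm(0.975) * se)  ## back-transform: 0.5527576 0.9418211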
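On the third point, the rank-based alternatives are a one-line change (a quick sketch, reusing the simulated x from above):
cor.test(x[,1], x[,2], method = "spearman")  ## Spearman's rho
cor.test(x[,1], x[,2], method = "kendall")   ## Kendall's tau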
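On the last point, here is a small simulation of my own (not part of the retrodesign package) that shows the significance filter in action, again assuming a true correlation of 0.3:
set.seed(101)
est <- replicate(5000, {
    x <- MASS::mvrnorm(n = 15, mu = rep(0, 2),
                       Sigma = matrix(c(1, 0.3, 0.3, 1), 2))
    ct <- cor.test(x[,1], x[,2])
    c(r = unname(ct$estimate), sig = ct$p.value < 0.05)
})
mean(est["sig", ])                 ## rejection rate: roughly the 0.19 power above
mean(est["r", est["sig", ] == 1])  ## mean significant estimate: well above 0.3
The average of the significant estimates comes out at roughly double the true value of 0.3, consistent with the exaggeration factor reported by retrodesign().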
Finally, if sources on the internet are telling you that you need $n \ge 25$ to test a correlation, they are assuming that the true correlation is around 0.5: power calculations are usually done assuming a target power (probability of rejecting the null hypothesis) of 0.8 and a significance cutoff of 0.05:
library(pwr)
pwr.r.test(n=25, sig.level = 0.05, power = 0.8)
approximate correlation power calculation (arctangh transformation)
n = 25
r = 0.5280313
sig.level = 0.05
power = 0.8
alternative = two.sided
The source you cited in comments assumes we are trying to detect a true correlation around 0.3, in which case you would need $n > 85$. (In R you would do pwr.r.test(r = 0.3, sig.level = 0.05, power = 0.8); the author provides a link so you can do your own calculations.)
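For completeness, running that calculation directly:
pwr.r.test(r = 0.3, sig.level = 0.05, power = 0.8)
## n comes out in the mid-80s, in line with the figure above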
Low-powered experimental designs are bad because (1) they waste time, money, and effort; (2) they provide a strong motivation for questionable research practices (you can look up terms like p-hacking); (3) in conjunction with a statistical significance filter, they lead to inflated estimates of effect size.
With MANY thanks to you and @Dave. This has been rich learning. Thank you. – ND_Coder Jul 09 '23 at 23:31
2"That is, if your experimental design has low power (lots of noise/small sample size), and you only pay attention to statistically significant results, then you will generally overestimate the effect size (i.e., your correlation may be an overestimate)." Answers here get into this in more detail. $//$ $+1$ – Dave Jul 10 '23 at 11:21

n=17 (not 15), and represent only one sample: to compute a correlation you must (??) have two sets of data of equal length ... ?? – Ben Bolker Jul 09 '23 at 21:44