After reading numerous articles on sample size as it relates to a Pearson Correlation, I have not found any reference to what happens if I have a small sample (n=15) but still get a strong r (.830) that is significant at the 0.01 level (2-tailed). Does this output mean I can trust there is a reasonably strong correlation? Everything I read states small samples are not advised but nothing I've read shares how to interpret strong results like this (even with a small sample). Thank you for any direction on this.
1 Answer
At the first level, I'd say there's nothing wrong with interpreting a significant correlation even if your sample size is small. There are some caveats/reasons to take the results with a slightly bigger grain of salt than usual, discussed below.
First, using R to make up some data that match what you say you've got:
set.seed(101)
## simulate n = 15 bivariate Normal observations; empirical = TRUE forces the
## *sample* correlation to be exactly 0.83, not just the population value
x <- MASS::mvrnorm(n = 15, mu = rep(0, 2),
                   Sigma = matrix(c(1, 0.83, 0.83, 1), 2),
                   empirical = TRUE)
cor.test(x[,1], x[,2])
Results (lightly edited):
t = 5.3654, df = 13, p-value = 0.0001286
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.5527576 0.9418211
sample estimates: cor = 0.83
- If you really have a correlation of 0.83 with n = 15, then your p-value should be about two orders of magnitude smaller than 0.01, in case you care about its precise value. (Perhaps you only meant that your p-value was less than 0.01, i.e. "significant at alpha = 0.01", rather than approximately equal to 0.01.)
- as @ChristianHennig comments, the confidence interval is wide because the sample size is small: you can confidently reject the null hypothesis, but the 95% CI ranges from 0.55 to 0.94 (you should be able to get a similar answer from SPSS; a sketch of the underlying calculation follows this list)
- the tests here assume that the data follow a Normal distribution. While the kinds of statistical analyses discussed here are reasonably robust to non-Normality, the results do tend to be more sensitive to non-Normality when the sample size is small. If you're worried about this, you could try a non-parametric correlation such as Spearman's or Kendall's (see the sketch after this list).
- the last caveat has to do with statistical significance filtering. That is, if your experimental design has low power (lots of noise and/or a small sample size) and you only pay attention to statistically significant results, then you will generally overestimate the effect size (i.e., your correlation may be an overestimate). Illustrating this takes a little bit of work: I'm going to use the retrodesign package for R, first translating the correlation with the Fisher transform (info from Wikipedia), on which scale the standard error of a correlation is approximately 1/sqrt(n-3). I also have to make an assumption about what the true effect size is: for example, below I will suppose that the true correlation for this system is 0.3. Then:
library(retrodesign)
fz <- atanh(0.83)    ## Fisher transformation of the observed correlation
fe <- 1/sqrt(15 - 3) ## standard error on the transformed scale (see Wikipedia)
retrodesign(atanh(0.3), fe)
## $power
## [1] 0.1885498
## $typeS
## [1] 0.006438651
## $exaggeration
## [1] 2.32502
power = 0.188 means I had a fairly low probability of rejecting the null hypothesis; typeS = 0.006 means I also had a low probability of getting the wrong sign (i.e., estimating a negative instead of a positive correlation); exaggeration = 2.32 (also called "Type M error") means that on average a significant result will inflate the correlation (on the transformed scale) by a factor of 2.32. So if we get a correlation of 0.83 (and we are only paying attention to significant results), the true correlation would be more like tanh(atanh(0.83)/2.32) = 0.47. (Guessing that the true correlation is 0.3 may be too pessimistic; that's something you need to think about. But this kind of effect inflation is a real concern.)
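On the second point, the confidence interval reported by cor.test() is based on the Fisher transformation; here is a minimal sketch of that calculation (the numbers match the output above):
r <- 0.83; n <- 15
z <- atanh(r)       ## Fisher transform of the observed correlation
se <- 1/sqrt(n - 3) ## approximate standard error on the transformed scale
tanh(z + c(-1, 1) * qnorm(0.975) * se)  ## back-transform: 0.5527576 0.9418211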
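On the third point, the rank-based alternatives are a one-line change (a quick sketch, reusing the simulated x from above):
cor.test(x[,1], x[,2], method = "spearman")  ## Spearman's rho
cor.test(x[,1], x[,2], method = "kendall")   ## Kendall's tau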
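On the last point, here is a small simulation of my own (not part of the retrodesign package) that shows the significance filter in action, again assuming a true correlation of 0.3:
set.seed(101)
est <- replicate(5000, {
    x <- MASS::mvrnorm(n = 15, mu = rep(0, 2),
                       Sigma = matrix(c(1, 0.3, 0.3, 1), 2))
    ct <- cor.test(x[,1], x[,2])
    c(r = unname(ct$estimate), sig = ct$p.value < 0.05)
})
mean(est["sig", ])                 ## rejection rate: roughly the 0.19 power above
mean(est["r", est["sig", ] == 1])  ## mean significant estimate: well above 0.3
The average of the significant estimates comes out at roughly double the true value of 0.3, consistent with the exaggeration factor reported by retrodesign().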
Finally, if sources on the internet are telling you that you need $n \ge 25$ to test a correlation, they are assuming that the true correlation is around 0.5: power calculations are usually done assuming a target power (probability of rejecting the null hypothesis) of 0.8 and a significance cutoff of 0.05:
library(pwr)
pwr.r.test(n=25, sig.level = 0.05, power = 0.8)
approximate correlation power calculation (arctangh transformation)
n = 25
r = 0.5280313
sig.level = 0.05
power = 0.8
alternative = two.sided
The source you cited in comments assumes we are trying to detect a true correlation around 0.3, in which case you would need $n > 85$. (In R you would do pwr.r.test(r = 0.3, sig.level = 0.05, power = 0.8); the author provides a link so you can do your own calculations.)
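For completeness, running that calculation directly:
pwr.r.test(r = 0.3, sig.level = 0.05, power = 0.8)
## n comes out in the mid-80s, in line with the figure above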
Low-powered experimental designs are bad because (1) they waste time, money, and effort; (2) they provide a strong motivation for questionable research practices (you can look up terms like p-hacking); (3) in conjunction with a statistical significance filter, they lead to inflated estimates of effect size.
With MANY thanks to you and @Dave. This has been rich learning. Thank you. – ND_Coder Jul 09 '23 at 23:31
2"That is, if your experimental design has low power (lots of noise/small sample size), and you only pay attention to statistically significant results, then you will generally overestimate the effect size (i.e., your correlation may be an overestimate)." Answers here get into this in more detail. $//$ $+1$ – Dave Jul 10 '23 at 11:21

n=17 (not 15), and represent only one sample: to compute a correlation you must (??) have two sets of data of equal length ... ?? – Ben Bolker Jul 09 '23 at 21:44