
I work in health science research. We have computed correlations between two variables: the amounts of two proteins in a system. I wanted to draw a fitted straight line, but the slopes of those lines are not consistent with the R values from the Spearman correlation test.

Indeed, the R is negative while the slope of the line is positive, or vice versa. Or two lines look similar, but the R values are totally different. This cannot be published in a journal. So I wonder: how should I represent the correlation between the two variables on the graphs and show the trend of the point cloud?

(1 and 2 are two different populations)

Thanks for your advice!

[image]

[image]


Whuber: yes, sorry, I expressed myself badly. We performed Spearman correlations, and we wanted to represent the trend of the data visually with a straight line (not necessarily a nonlinear regression curve). It does not matter whether the correlation is strong or significant; I just want to show at a glance whether there is a correlation or not.

I had in mind that the correlation could be represented by a straight line, and that the values from the correlation test would show scientifically that this is valid.

But from Alexis's comment, I understand that I can't do that. And these graphs prove it: I have negative R values with positive slopes.
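To see why this can happen, here is a minimal pure-Python sketch on made-up toy data (not the data in the question): a single extreme point drags the least-squares slope positive even though the ranks mostly move in opposite directions, so Spearman's rho is negative. The helper names (`ranks`, `ols_slope`, `spearman`) are my own, just for illustration.

```python
# Toy data (not the poster's): one outlier dominates the least-squares fit,
# while the remaining points decrease, so rho and the slope disagree in sign.

def ranks(values):
    # Ranks 1..n; this toy data has no ties, so no tie-correction is needed.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def ols_slope(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    # Spearman's rho is simply Pearson's r computed on the ranks.
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6, 20]
y = [7, 6, 5, 4, 3, 2, 100]

print(ols_slope(x, y))                # positive: the outlier dominates
print(spearman(x, y))                 # negative: -0.25
print(ols_slope(ranks(x), ranks(y)))  # -0.25: slope on the ranks matches rho
```

A straight line fitted to the *ranks* of X and Y always agrees in sign with Spearman's rho, so a rank-rank scatter plot is the honest visual companion to a Spearman coefficient.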

I will try a log scale.

Is there a visual method to represent a correlation on this type of graph? In the life sciences, this kind of representation is often more convincing for scientific journals.

That said, I am not a mathematician or a statistician, so I have a lot of gaps. And I work with the GraphPad Prism software, which may have some limitations.

chrishmorris: Indeed, we have an R and not an R².

Here is the type of data we have, with a Spearman correlation test in the GraphPad Prism software.

[image]

  • Please explain the sense in which a "straight line" might be considered "nonlinear regression." That reads like a direct contradiction. Why are you mentioning Spearman correlation, which otherwise appears unrelated to the rest of the question? – whuber Jan 13 '23 at 16:36
  • Possibly related: https://stats.stackexchange.com/questions/64938/if-linear-regression-is-related-to-pearsons-correlation-are-there-any-regressi – COOLSerdash Jan 13 '23 at 16:38
  • Three quite different comments: 0. You seem to have 3 variables here. 1. The graph to make sense of Spearman correlations is a plot of the ranks of the two variables. 2. Regardless of P-values, straight lines aren't very convincing. Perhaps it would help a bit to work with log Y, depending partly on what makes sense scientifically. – Nick Cox Jan 13 '23 at 16:44
  • I want to link whuber and @NickCox's comments: While you can graph straight lines on the ranks of your data, you cannot validly represent monotonic association (i.e. what Spearman's $r_{\text{S}}$ measures) with (only) a straight line on your data (not ranks), since there are many nonlinear curves which are monotonic functions of two variables. – Alexis Jan 13 '23 at 17:17
  • Is the r presented Spearman's rho? – Sal Mangiafico Jan 13 '23 at 20:38
  • The answers are pretty much in these comments: Spearman correlation and linear regression aren't the same thing, so it's not surprising that the results are different. You could rank-transform both variables, plot them that way, and then Pearson correlation, Spearman correlation, and linear regression will all be consistent. ... But given the bivariate distributions of your observations, you might consider a different approach, perhaps nonparametric regression or transforming the observations. ... Ultimately, it really depends on what you want to know. – Sal Mangiafico Jan 16 '23 at 15:15
  • Hello, thank you for your answers. Indeed, I had misunderstood, and I realize that the two tests are different!

    Nevertheless, I am looking for the best test to apply to my data. I want to know:

    1. Whether X and Y are related: show that if X increases, Y also increases (or vice versa), or show that this is not true.
    2. Whether the two groups (red and blue) are different from each other.

    For 1, I thought to use a correlation test. For 2, I thought to use simple linear regression and test the difference between the two lines. But maybe other tests are more suitable?

    – jujutou Jan 30 '23 at 14:54
  • It might depend on what you mean by "different". – Sal Mangiafico Jan 30 '23 at 15:38
  • Usually when addressing this kind of question, you would use what might be called analysis of covariance, where the data for the two groups are put in a single model, and then you can test for differences for the slopes and for the intercepts. ... There may be a nonparametric model that achieves this. – Sal Mangiafico Jan 30 '23 at 15:38
  • My two groups (red and blue) are healthy and sick patients. So I want to know whether there is a significant difference in the relation between X and Y. For example, red graph 1 seems to indicate that when X is large, Y is small, which is not the case for blue. Is this significantly true? What is the appropriate test? – jujutou Jan 30 '23 at 16:55
  • I use the GraphPad Prism software, I used the "Test whether slopes and intercepts are significantly different" function. And here is the result for the first graph: [Are the slopes equal? F = 3.493. DFn = 1, DFd = 199 P=0.0631 - If the overall slopes were identical, there is a 6.309% chance of randomly choosing data points with slopes this different. You can conclude that the differences between the slopes are not quite significant. Since the slopes are not significantly different, it is possible to calculate one slope for all the data. The pooled slope equals -0.1368.] Is this the right test? – jujutou Jan 30 '23 at 16:57
  • On the internet, they seem to say that this test is equivalent to an analysis of covariance – jujutou Jan 30 '23 at 16:58
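For reference, the Prism comparison quoted above is an extra-sum-of-squares F test: fit each group with its own slope, refit with a single pooled slope (keeping separate intercepts), and ask whether the pooled fit is significantly worse. Below is a minimal pure-Python sketch of that test on made-up toy data; the function names are mine, not Prism's, and turning F into the P value Prism reports would additionally need the F distribution.

```python
# Extra-sum-of-squares F test for equal slopes in two groups (toy data).

def fit_stats(x, y, slope=None):
    # Least-squares fit y = intercept + slope*x; if `slope` is given,
    # only the intercept is estimated. Returns (slope, intercept, RSS).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    if slope is None:
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return slope, intercept, rss

def equal_slopes_test(x1, y1, x2, y2):
    # Full model: separate slope and intercept for each group.
    _, _, rss1 = fit_stats(x1, y1)
    _, _, rss2 = fit_stats(x2, y2)
    rss_full = rss1 + rss2
    df_full = len(x1) + len(x2) - 4   # 4 parameters estimated in total

    # Reduced model: one pooled slope, separate intercepts.
    def moments(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        return sxy, sxx
    sxy1, sxx1 = moments(x1, y1)
    sxy2, sxx2 = moments(x2, y2)
    pooled_slope = (sxy1 + sxy2) / (sxx1 + sxx2)
    _, _, r1 = fit_stats(x1, y1, slope=pooled_slope)
    _, _, r2 = fit_stats(x2, y2, slope=pooled_slope)
    rss_reduced = r1 + r2

    # F with (1, df_full) degrees of freedom, matching Prism's DFn, DFd.
    F = (rss_reduced - rss_full) / (rss_full / df_full)
    return F, 1, df_full, pooled_slope

# Two toy groups with clearly different slopes (roughly +2 and -1):
x1, y1 = [1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.0, 10.1]
x2, y2 = [1, 2, 3, 4, 5], [5.1, 4.0, 3.1, 2.0, 0.9]
F, dfn, dfd, pooled = equal_slopes_test(x1, y1, x2, y2)
print(F, dfn, dfd)   # a very large F here, because the slopes clearly differ
```

In the Prism output quoted above (DFn = 1, DFd = 199), the same comparison was applied to the real data and gave F = 3.493, P = 0.0631.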

1 Answer


It is only with the blue points on the second graph that you have a model - an R value of 0.49 shows that there is a real correlation here, albeit not a very strong one.

A negative value for R-squared shows that the fit is useless: it means that your model is a worse predictor than always predicting the mean value of Y.

Looking at your first graph, this is not surprising: neither the blue points nor the red points seem to show any correlation between X and Y. However, it is conspicuous that for both variables low values are more common than high values. It is worthwhile to try a data transformation (e.g. a square root) to mitigate this and then look again for a correlation.
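As a sketch of what such a transformation can do, on toy data rather than the question's: when Y grows roughly as the square of X, taking the square root of Y straightens the relationship and raises Pearson's r, while a monotone transform like this would leave Spearman's rho unchanged.

```python
# Toy illustration: y = x**2 is curved, so Pearson's r on the raw data
# is below 1; after a square-root transform the relation is exactly linear.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

x = [1, 2, 3, 4, 5, 6]
y = [v ** 2 for v in x]          # strongly convex, right-skewed in y

r_raw = pearson(x, y)
r_sqrt = pearson(x, [v ** 0.5 for v in y])
print(r_raw, r_sqrt)   # r_sqrt is 1.0: exactly linear after the transform
```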

On the second graph, with the red points, X is at best a weak predictor of Y. If there is any causal mechanism at all, there would seem to be two subpopulations: one with low Y values positively correlated with X, and a smaller one with large Y values negatively correlated with X.

chrishmorris