
I have been learning about standard methods in statistics such as Pearson's correlation coefficient, Spearman's correlation and Kendall's tau.

My understanding of this so far is that:

  • Pearson Correlation Coefficient measures the linear correlation between two sets of data

  • Spearman's Correlation measures the "monotonicity" between two sets of data (e.g. do they both increase and decrease at the same time?)

  • Kendall's Tau measures the ordinal association between two sets of data. Supposedly Kendall's Tau is similar to the Spearman Correlation, but has more logical confidence intervals.
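To make the three definitions above concrete, here is a minimal sketch that computes all of them with SciPy on synthetic data. The data and variable names are made up for illustration: one relationship is monotonic but nonlinear, the other is a symmetric parabola.

```python
import numpy as np
from scipy import stats

# Synthetic example 1 (assumed data): strictly increasing but nonlinear.
x_mono = np.linspace(0.1, 5.0, 200)
y_mono = np.exp(x_mono)

pearson_mono = stats.pearsonr(x_mono, y_mono)[0]
spearman_mono = stats.spearmanr(x_mono, y_mono)[0]
kendall_mono = stats.kendalltau(x_mono, y_mono)[0]

# Synthetic example 2 (assumed data): a parabola on a range symmetric about 0.
# The grid is built from integers so that -x and x square to the same value.
x_sym = np.arange(-100, 101) * 0.03
y_sym = x_sym ** 2

pearson_sym = stats.pearsonr(x_sym, y_sym)[0]
spearman_sym = stats.spearmanr(x_sym, y_sym)[0]
kendall_sym = stats.kendalltau(x_sym, y_sym)[0]

print("monotonic curve:", pearson_mono, spearman_mono, kendall_mono)
print("symmetric parabola:", pearson_sym, spearman_sym, kendall_sym)
```

For the monotonic curve, the two rank-based measures are at their maximum while Pearson's coefficient is not. For the symmetric parabola, all three are close to zero even though y is an exact function of x, which is the kind of case the question below is about.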

I had the following question - can any of these methods be used for measuring a specific form of "Non Linear Correlation" between two sets of data?

For example, suppose I want to see how strongly two sets of data are correlated relative to a "second-order curve":

[Image: scatter plot of the two variables (not reproduced here)]

Is there something that could measure the "curved correlation"?

The two ideas I came up with:

  • Use a data transformation (e.g. a log transform) to bring the relationship closer to a linear pattern, so that one of the above measures becomes suitable

  • Fit a polynomial regression model (of order 2) to this data and measure the MSE

But I am not sure if either of these approaches is suitable.
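For reference, the two ideas above could be sketched roughly as follows. This uses NumPy only and synthetic data that I assume follows a power law with multiplicative noise (so that a log transform makes sense); the variable names are illustrative, and nothing here says whether either idea is the right tool.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y grows like x^2,
# with multiplicative noise so that log(y) is linear in log(x).
rng = np.random.default_rng(0)
n = 300
x = rng.uniform(1.0, 10.0, size=n)
y = 3.0 * x**2 * np.exp(rng.normal(0.0, 0.2, size=n))

# Idea 1: transform, then apply Pearson's correlation.
r_raw = np.corrcoef(x, y)[0, 1]
r_log = np.corrcoef(np.log(x), np.log(y))[0, 1]

# Idea 2: fit a second-order polynomial and summarise the fit.
coeffs = np.polyfit(x, y, deg=2)
y_hat = np.polyval(coeffs, x)
mse = float(np.mean((y - y_hat) ** 2))
# R^2 is unit-free, unlike the MSE, so it behaves more like a correlation.
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print("Pearson on raw data:", r_raw)
print("Pearson on log-log data:", r_log)
print("Quadratic fit: MSE =", mse, " R^2 =", r2)
```

One thing this makes visible: the MSE depends on the units of y, so on its own it is hard to read as a "strength of correlation", whereas the R^2 of the quadratic fit is always at least the squared Pearson correlation of the raw data, because the straight line is a special case of the quadratic.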

User1865345
stats_noob
  • Interesting question. Some of the trouble of defining a curved correlation will be deciding on what kind of curvature you want to measure. After all, a logarithm-type of graph has different curvature than a quadratic. Further, determining the sign will be challenging, since many curves (such as quadratics) allow for increasing and decreasing sections. I’ve wondered if the concavity of a parabola (up-opening vs down-opening) could be used for this, but parabolas are just one type of curve. (Maybe you can do this if you restrict to convex or concave functions.) – Dave Nov 09 '22 at 06:59
  • (1) What do you mean by "measuring"? If you want a measure of the "strength" of such a correlation, then you could indeed run a polynomial regression and report the MSE. It should probably be cross-validated; otherwise, if you re-ran this for higher-order polynomials, you would "find" that the "second-order correlation" is smaller than the "third-order correlation" and so on. Conversely, if you want to do statistical inference, the null and alternative hypotheses will need some thinking about - are $x$ and $x^3$ for $-1<x<1$ "significantly second order correlated"? ... – Stephan Kolassa Nov 09 '22 at 07:41
  • ... (2) Especially for inference, the question comes up whether you want to test a specific polynomial correlation, or a general second-order polynomial, or a general polynomial of up to second order. Perhaps you could explain what you want to do with such a nonlinear correlation? – Stephan Kolassa Nov 09 '22 at 07:42
  • Another way to consider Stephan's comments is that every regression you could estimate for the two variables in your plot is, in a sense, a correlation measurement. Testing and comparing arbitrarily many regressions has problems with false discovery and statistical validity, so "just try stuff" isn't a great way to go about it: you need to be specific about what questions you want to ask your data & how you want to ask them. The plot you show is roughly monotonic & Spearman's correlation would characterize the extent. Lots of nonlinear functions are monotonic, so Spearman's is an answer. – Sycorax Nov 10 '22 at 03:28

1 Answer


You may be interested in distance correlation.

Distance correlation measures both linear and nonlinear association between two random variables or random vectors. This is in contrast to Pearson's correlation, which can only detect a linear association between two random variables.

https://en.wikipedia.org/wiki/Distance_correlation
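
As a rough illustration, here is a minimal NumPy sketch of the sample distance correlation for two one-dimensional samples, following the double-centred distance matrix definition given on the Wikipedia page above. The function name `distance_correlation` and the example data are my own, and this is the simple (biased) plug-in estimate, not a tuned implementation.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples (simple plug-in estimate)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute distances within each sample.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double centring: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()    # squared sample distance covariance
    dvar2_x = (A * A).mean()  # squared sample distance variance of x
    dvar2_y = (B * B).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0  # at least one sample is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))

# Assumed example data: a parabola on a range symmetric about zero.
x = np.arange(-100, 101) * 0.03
y = x ** 2

pearson_parabola = float(np.corrcoef(x, y)[0, 1])
dcor_parabola = distance_correlation(x, y)
dcor_linear = distance_correlation(x, 2.0 * x + 1.0)

print("Pearson, parabola:", pearson_parabola)
print("distance correlation, parabola:", dcor_parabola)
print("distance correlation, straight line:", dcor_linear)
```

On the parabola, Pearson's correlation is essentially zero while the distance correlation is clearly positive. Note that it lies between 0 and 1 and has no sign, and it does not target a second-order curve specifically: it responds to any kind of dependence.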