4

I have used quadratic regression on a dataset to find the curve of best fit, that is, to find the coefficients a, b and c in the general formula y = ax^2 + bx + c.

Having done that, I would now like to find the correlation coefficient of that fit to the data. Can anybody help with the formula for either the correlation coefficient or the coefficient of determination for a quadratic?

whuber
  • 322,774
havelly
  • 41
  • 1
    This is just multiple regression on the variables $x$ and $x^2,$ so go ahead and use the standard formulas. – whuber Mar 26 '19 at 13:27
  • Welcome to CV havelly. It is somewhat unclear what you mean by correlation coefficient for a quadratic. Pearson's correlation coefficient assumes a linear, not quadratic relationship between $x$ and $y$. Spearman's correlation coefficient assumes a monotonic relationship between $x$ and $y$. I assume you are not asking about Pearson's (or Spearman's) correlation coefficient between $x^{2}$ and $y$, since that seems very obvious. – Alexis Mar 26 '19 at 16:14
  • Thank you for the welcome. Perhaps I will re-frame the question as it can be confusing. I have used quadratic regression on a dataset with two variables and from that the a, b and c coefficients have been determined so I have an equation like y = 5x^2 + 2x + 7. Now this is not a perfect match to the data, that is, the graph does not exactly go through all the data points but will be fairly close to them. How can I now calculate the correlation coefficient for this quadratic equation to the dataset? – havelly Mar 27 '19 at 02:01
  • Nope. That does not clarify. See my first comment. – Alexis Mar 27 '19 at 15:16
  • It's plausible you're looking for the correlation coefficient between the fitted values and the responses. This is closely related to $R^2,$ the so-called "coefficient of determination." See, for instance, https://stats.stackexchange.com/questions/36064 for a formula. – whuber Apr 19 '22 at 22:07

2 Answers

2

I notice that for this sort of question there is always a lot of pedantry in the community about the use of the term "correlation". We non-statisticians generally use the term to mean "relationship", but some people might not get that. As others have told you, the usual (Pearson) correlation coefficient describes a linear relationship, so it does not apply directly to a non-linear relationship such as a quadratic one. However, you can measure the Root Mean Squared Error and the Adjusted R-squared, which will tell you about the "goodness of fit" of your model. You can also do an F-test, which will tell you whether your model fits significantly better than a degenerate model consisting of only a constant term. All of these measures can be computed in Matlab using the function fitnlm. I know it's been a while since this question was posted, so you have probably figured this out, but this could still help others. Best of luck.
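For readers who want the formulas behind those measures, here is a sketch under stated assumptions: the fit is an ordinary least-squares quadratic with $n$ data points, $\hat{y}_i = a x_i^2 + b x_i + c$ are the fitted values, $\bar{y}$ is the mean of the observed $y_i$, and the symbols $SS_{res}$ and $SS_{tot}$ are notation introduced here rather than anything from the question. The RMSE below divides by the residual degrees of freedom $n - 3$; some texts divide by $n$ instead.

$$SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

$$\mathrm{RMSE} = \sqrt{\frac{SS_{res}}{n - 3}}, \qquad R^2_{adj} = 1 - \frac{SS_{res}/(n - 3)}{SS_{tot}/(n - 1)}, \qquad F = \frac{(SS_{tot} - SS_{res})/2}{SS_{res}/(n - 3)}$$

Under the usual normal-error assumptions, the $F$ statistic is compared with an $F$ distribution on $2$ and $n - 3$ degrees of freedom.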

  • 3
    I would like to suggest that you might see "pedantry" where others who see multiple possible meanings seek enough clarity to meet this site's requirements for posing objectively answerable, mutually understood questions. The distinction is perhaps due to relative levels of experience, because with more experience comes the awareness of alternative interpretations: but please do not mistake the comments of experienced statisticians for the nit-picking of pedants unless you have good evidence. (cc @Alexis) – whuber Sep 29 '20 at 21:18
  • 3
    found the statistician :-) – MilesWinter Nov 06 '20 at 12:44
  • Isn't transforming to quadratic function values and then deriving a correlation coefficient sufficient? – thistleknot Jul 31 '21 at 15:41
0

Thank you for the welcome. Perhaps I will re-frame the question as it can be confusing. I have used quadratic regression on a dataset with two variables and from that the a, b and c coefficients have been determined so I have an equation like y = 5x^2 + 2x + 7. Now this is not a perfect match to the data, that is, the graph does not exactly go through all the data points but will be fairly close to them. How can I now calculate the correlation coefficient for this quadratic equation to the dataset?

This is exactly what the $R^2$ will tell you! If you do a simple linear regression, $R^2$ is equal to the squared correlation, so this is highly analogous.
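For reference, a sketch of the standard definition, writing $\hat{y}_i = a x_i^2 + b x_i + c$ for the fitted values and $\bar{y}$ for the mean of the observed $y_i$ (this notation is introduced here, not taken from the question):

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

For a least-squares fit that includes the constant term $c$, this value equals the squared Pearson correlation between the observed $y_i$ and the fitted $\hat{y}_i$, which is the sense in which it generalizes the simple linear case.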

When you do the parabolic fit, particularly if your parabola is symmetric, you lose the usual meaning of the sign. I have not seen this done, but you could take the square root of $R^2$ and give it the sign of the coefficient on $x^2$, so that the "quadratic correlation" indicates an upward- or downward-opening parabola; that might make sense. Since this is not standard, however, please do define what you're doing.

$R^2$, however, will be what gives you some sense of how tight the fit is to the parabola of best fit, much as $R^2$ in simple linear regression tells you how tight the fit is to the line of best fit.
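To make this concrete, here is a minimal Python sketch that fits a quadratic by least squares, computes $R^2$, and checks that it matches the squared correlation between the observed and fitted values. The data are made up for illustration, and the helper names (`fit_quadratic`, `r_squared`, `pearson`) are hypothetical, not from any library. The last line computes the non-standard signed root described above.

```python
import math

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    # Each design row holds the three regressors: x^2, x and the constant 1.
    rows = [(x * x, x, 1.0) for x in xs]
    # Augmented 3x4 system (X^T X | X^T y).
    A = [
        [sum(row[i] * row[j] for row in rows) for j in range(3)]
        + [sum(row[i] * y for row, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(3):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [u - factor * v for u, v in zip(A[k], A[col])]
    return tuple(A[i][3] / A[i][i] for i in range(3))  # (a, b, c)

def r_squared(ys, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    var_u = sum((p - mu) ** 2 for p in u)
    var_v = sum((q - mv) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)

# Made-up data scattered around a parabola (illustrative only).
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [46.5, 22.0, 10.5, 6.0, 14.5, 31.5, 57.0]

a, b, c = fit_quadratic(xs, ys)
fitted = [a * x * x + b * x + c for x in xs]

r2 = r_squared(ys, fitted)
r = pearson(ys, fitted)  # correlation between observed and fitted values

# Non-standard "signed" root: the sign of a marks the opening direction.
signed_root = math.copysign(math.sqrt(r2), a)

print("a, b, c:", a, b, c)
print("R^2:", r2, " corr(y, fitted)^2:", r * r, " signed root:", signed_root)
```

The two printed quantities `R^2` and `corr(y, fitted)^2` should agree up to floating-point error, because the fit includes the constant term. In practice a library routine would replace the hand-rolled solver; it is written out here only to keep the sketch dependency-free.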

Dave
  • 62,186