sklearn.metrics.r2_score vs sklearn.LinearRegression.score

Question

I'm using sklearn to calculate the coefficient of determination between X (true age) and Y (predicted age). But I'm getting two different values for two different methods, which to the best of my understanding should be identical.

Here is the data with a line fit. (Not a great fit, I know, but I'm working on the pipeline before I work on the model architecture.)

Then, trying to calculate the R2 value, checking two different ways. Am I doing something wrong?

>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.metrics import r2_score
>>> Xs
array([[6.80000000e+001],
       [7.50000000e+001],
       [5.90000000e+001],
       ...,
       [1.15882924e-310],
       [1.15882924e-310],
       [1.15882924e-310]])
>>> Ys
array([[58.503006],
       [67.75964 ],
       [63.875973],
       ...,
       [67.37394 ],
       [67.37394 ],
       [67.37394 ]])
>>> regressor = LinearRegression()
>>> regressor.fit(Xs, Ys)
LinearRegression()
>>> regressor.score(Xs, Ys)
0.006946203557267383
>>> r2_score(Xs, Ys)
-0.16379061117029314

Dave · Accepted Answer · 2024-01-15T23:57:13.297


score calculates the predictions for the features Xs and then compares those predictions to the input Ys.

r2_score assumes that you have already calculated the predictions. Thus, both the Xs and the Ys go directly into the calculation: the first argument is treated as the true values and the second as the predictions.
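To make the difference in inputs concrete, here is a minimal sketch on synthetic data. The data, the seed, and the variable names are invented for illustration and are not the asker's ages. It shows that score(X, y) agrees with r2_score(y, model.predict(X)), while feeding the raw features to r2_score measures something else.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data, made up for this illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(20, 90, size=(200, 1))
y = 0.5 * X[:, 0] + 30 + rng.normal(0, 5, size=200)

model = LinearRegression().fit(X, y)

# score() predicts from X internally, then compares those predictions with y.
r2_from_score = model.score(X, y)

# r2_score() expects the predictions to be supplied: (y_true, y_pred).
r2_from_metric = r2_score(y, model.predict(X))

# Passing the raw features as "truth" compares X and y point by point,
# with no fitted line involved at all.
r2_features_as_truth = r2_score(X[:, 0], y)

print(r2_from_score, r2_from_metric, r2_features_as_truth)
```

The first two numbers should match up to floating-point error; the third generally will not.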

Once you have the predictions, each function seems to calculate $R^2$ the same way (and I have thoughts on that calculation), but they assume different inputs.
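For reference, the calculation in question is $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$, where $\bar{y}$ is the mean of the true values. Below is a small sketch that computes it by hand and compares against r2_score. The helper name r2_manual and the four data points are made up for the example.

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_manual(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot, SS_tot about mean(y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Illustrative values only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

manual = r2_manual(y_true, y_pred)
library = r2_score(y_true, y_pred)
swapped = r2_score(y_pred, y_true)  # arguments reversed

print(manual, library, swapped)
```

Because the denominator uses only the first argument, the formula is not symmetric: swapping the two arguments changes the result, which is one more reason the argument order matters.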

For a really extreme example, try fitting a model to $X=(-1,-2,-3,-4)$ and $Y=(1,2,3,4)$, and check out how score gives a perfect result of $1$ while r2_score is below zero to denote an awful fit.
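The extreme example can be run directly. This is a sketch using the four points given above; the variable names are mine.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = np.array([-1.0, -2.0, -3.0, -4.0]).reshape(-1, 1)
Y = np.array([1.0, 2.0, 3.0, 4.0])

model = LinearRegression().fit(X, Y)

# The fitted line Y = -X passes through every point, so the fit is perfect.
fit_score = model.score(X, Y)

# Here X is treated as the truth and Y as the predictions; no line is fitted.
# SS_res = 4 + 16 + 36 + 64 = 120 and SS_tot = 5, so R^2 = 1 - 120/5 = -23.
raw_r2 = r2_score(X.ravel(), Y)

print(fit_score, raw_r2)
```

The regression sees a perfect linear relationship, while r2_score only sees that each "prediction" is far from its "true" value.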

edited Jan 15 '24 at 23:57

answered Jan 15 '24 at 23:42


Thanks... ok, so when you say "score calculates the predictions" ... it's using the fitted line to essentially calculate the Y values of points on that line, then comparing those to the second parameter (ys) by ... what, ANOTHER fitted line? I think I'm getting confused.
But I suspect I need to use r2_score anyway, because the Xs I have are real observations, and the Ys are predicted values from an RNN model.
– reas0n Jan 21 '24 at 06:47
