I'm using sklearn to calculate the coefficient of determination between X (true age) and Y (predicted age). But I'm getting two different values for two different methods, which to the best of my understanding should be identical.
Here is the data with a line fit. (Not a great fit, I know, but I'm working on the pipeline before I work on the model architecture)
Then I tried to calculate the R² value in two different ways. Am I doing something wrong?
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.metrics import r2_score
>>> Xs
array([[6.80000000e+001],
       [7.50000000e+001],
       [5.90000000e+001],
       ...,
       [1.15882924e-310],
       [1.15882924e-310],
       [1.15882924e-310]])
>>> Ys
array([[58.503006],
       [67.75964 ],
       [63.875973],
       ...,
       [67.37394 ],
       [67.37394 ],
       [67.37394 ]])
>>> regressor = LinearRegression()
>>> regressor.fit(Xs, Ys)
LinearRegression()
>>> regressor.score(Xs, Ys)
0.006946203557267383
>>> r2_score(Xs, Ys)
-0.16379061117029314

"score calculates the predictions" ... it's using the fitted line to essentially calculate the Y values of points on that line, then comparing those to the second parameter (Ys) by ... what, ANOTHER fitted line? I think I'm getting confused. But I suspect I need to use r2_score anyway, because the Xs I have are real observations, and the Ys are predicted values from an RNN model.
– reas0n Jan 21 '24 at 06:47
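
For what it's worth, here is a minimal pure-Python sketch of what the two calls compare, as far as I understand them: `regressor.score(Xs, Ys)` returns the R² of `regressor.predict(Xs)` measured against `Ys`, while `r2_score(Xs, Ys)` treats its first argument as the true values and its second as the predictions, with no line fit in between. The helpers `r2` and `fit_line` and the six ages are made up for illustration; they are not sklearn functions and not the data from the question.

```python
# Pure-Python sketch; r2 and fit_line are illustrative helpers, not sklearn functions.

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares slope and intercept for a single feature."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up ages, NOT the data from the question.
true_age = [68.0, 75.0, 59.0, 40.0, 82.0, 51.0]
pred_age = [58.5, 67.8, 63.9, 61.0, 66.0, 64.5]

# Fit a line through (true_age, pred_age) and evaluate it at each true age.
slope, intercept = fit_line(true_age, pred_age)
on_line = [slope * a + intercept for a in true_age]

# Analogue of regressor.score(Xs, Ys): Ys compared against the fitted line.
like_score = r2(pred_age, on_line)

# Analogue of r2_score(Xs, Ys): Xs as ground truth, Ys as the predictions.
like_r2_score = r2(true_age, pred_age)

print(like_score, like_r2_score)
```

The two numbers answer different questions (how well a straight line explains the Ys versus how close the Ys are to the Xs), so there is no reason for them to match. If the Ys already come from the RNN, the second comparison is presumably the one that evaluates the model's predictions.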