
I am having trouble computing my R-squared value. I fit a polynomial regression:

fit3 <- lm(value ~ date + I(date^2) + I(date^3), data = training)

I get an R-squared value of 0.9416 when I run

summary(fit3)

But when I compute it on the testing dataset, my R-squared value is -84.20259. I don't understand why, because when I plot the fit the results look good.

I use two methods.

  • First one

pred.lin <- predict(fit3, newdata = testing)
    actual <- testing$value
    SS.total <- sum((actual - mean(actual))^2)
    SS.residual <- sum((actual - pred.lin)^2)
    SS.regression <- sum((pred.lin - mean(actual))^2)

    test.rsq <- 1 - SS.residual/SS.total
    test.rsq

  • And the second method is:

    1 - sum((actual-pred.lin)^2)/sum((actual-mean(actual))^2)
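(Both snippets compute the same quantity, 1 − SS.residual/SS.total. As a self-contained illustration with made-up numbers — not the questioner's data — this statistic goes strongly negative as soon as every prediction is systematically off, even if the predictions track the shape of the data:)

```r
# Toy numbers (hypothetical, not the questioner's data):
# every prediction is exactly 4 units too high
actual   <- c(10, 11, 9, 10.5, 9.5)
pred.lin <- actual + 4

SS.total    <- sum((actual - mean(actual))^2)  # spread around the test mean: 2.5
SS.residual <- sum((actual - pred.lin)^2)      # squared prediction errors: 5 * 4^2 = 80

test.rsq <- 1 - SS.residual / SS.total
test.rsq  # -31: far worse than simply predicting mean(actual) everywhere
```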

Here is the graph: the points are the real data and the curve is the model. The blue points are the testing dataset.


Can you help, please? I am new to this area :)

Gigi
  • The term "testing $R^2$" is rarely heard. Why not use the mean prediction error? – Zhanxiong Jul 24 '15 at 13:45
  • The crux of the problem is that polynomial regression is not an appropriate procedure for these data. This could be determined from the training data alone by (say) applying a goodness-of-fit test. – whuber Jul 24 '15 at 14:34
  • Even if we forget that this is probably time-series data, polynomial regression is a very bad choice if the aim is prediction outside the domain of your predictor. – Michael M Jul 24 '15 at 14:39
  • @Zhanxiong, how do you compute the mean prediction error? – Gigi Jul 24 '15 at 14:53
  • @whuber, I looked into goodness-of-fit tests and found the chi-squared goodness-of-fit test, but I don't see how to perform it here... – Gigi Jul 24 '15 at 14:54
  • @MichaelM, yes, I am currently working on my thesis, "A study of data mining approaches to Computational Finance". I use different machine learning techniques to compare prediction accuracy. – Gigi Jul 24 '15 at 14:56
  • Issue the command plot(fit3) and study the four graphs it produces. – whuber Jul 24 '15 at 14:58

1 Answer


You say the results look good. Why? It looks like every prediction on the red line is higher than the blue point you want to predict. Consequently, I would say that you are doing a rather poor job of predicting those points, which is totally consistent with $R^2<0$.
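A minimal sketch of how this happens (with simulated data under assumed conditions, not the questioner's actual dataset): when the test points lie beyond the training range, a cubic fit extrapolates away from a trend that is flattening out, so its in-sample $R^2$ is high while its out-of-sample $R^2$ is strongly negative.

```r
set.seed(1)

# Hypothetical data: a trend that flattens, like log(t), plus noise
training <- data.frame(date = 1:50)
training$value <- log(training$date) + rnorm(50, sd = 0.1)

fit3 <- lm(value ~ date + I(date^2) + I(date^3), data = training)
summary(fit3)$r.squared  # high in-sample fit

# Test points lie beyond the training range, where the cubic is unconstrained
testing <- data.frame(date = 51:70)
testing$value <- log(testing$date) + rnorm(20, sd = 0.1)

pred.lin <- predict(fit3, newdata = testing)
test.rsq <- 1 - sum((testing$value - pred.lin)^2) /
                sum((testing$value - mean(testing$value))^2)
test.rsq  # strongly negative: the extrapolated curve misses the test points badly
```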

Dave
  • Because the out-of-sample data sure do not look to be selected at random, the means of the in-sample and out-of-sample observations are not equal or even approximately equal. Consequently, if you use the out-of-sample $R^2$ that makes sense to me, you may find that your value increases quite dramatically. However, you seem to be comparing to a model that predicts the out-of-sample mean every time, and a horizontal line midway through the blue points sure seems to be a better predictor of the blue points than the red curve is. – Dave Mar 25 '23 at 21:17
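A sketch of the comparison described in that comment, with toy numbers (not the real data): the same predictions score very differently depending on whether the benchmark is the test-set mean or the training-set mean.

```r
# Toy numbers (hypothetical): test values cluster tightly, predictions run
# uniformly high, and the training mean lies far from the test cluster
actual     <- c(3.0, 3.1, 2.9, 3.05)
pred       <- c(3.6, 3.7, 3.5, 3.65)  # each prediction 0.6 too high
train_mean <- 5.0

SS.residual <- sum((actual - pred)^2)

# Benchmark 1: predict the test-set mean every time (conventional test R^2)
r2_vs_test_mean  <- 1 - SS.residual / sum((actual - mean(actual))^2)
# Benchmark 2: predict the training-set mean every time
r2_vs_train_mean <- 1 - SS.residual / sum((actual - train_mean)^2)

r2_vs_test_mean   # strongly negative
r2_vs_train_mean  # close to 1: the training mean is an even worse predictor here
```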