Polynomial regression has good Mean Squared Error but poor prediction on unseen data

Question

The true target value for an unseen observation in my dataset is 2500000000. Fitting polynomial regressions of degree 1 through 7, I get the following results:

Degree: 1, Mean Squared Error: 410983500.2950994, Predicted: -81178512721.65013

Degree: 2, Mean Squared Error: 5829093.43611284, Predicted: -81178512721.65013

Degree: 3, Mean Squared Error: 0.006954321149912846, Predicted: -81178512721.65013

Degree: 4, Mean Squared Error: 0.9152467870231109, Predicted: -81178512721.65013

Degree: 5, Mean Squared Error: 19.06090147224352, Predicted: -81178512721.65013

Degree: 6, Mean Squared Error: 8651.727673046189, Predicted: -81178512721.65013

Degree: 7, Mean Squared Error: 78583221.21056704, Predicted: -81178512721.65013

The polynomial regression has the lowest error at degree 3, but its predicted value is far from the true value. Moreover, the predicted value is identical, and negative, for every degree. Can anybody shed light on this?

High-degree polynomial regression without regularization is awful :) This has been addressed in other questions, such as this one: https://stats.stackexchange.com/questions/549012/why-is-the-use-of-high-order-polynomials-for-regression-discouraged — John Madden, Oct 31 '22 at 14:38
Can the data be made available? This apparent numeric instability can probably be fixed with appropriate centering and scaling of the predictor variable. — JimB, Oct 31 '22 at 14:49
@JohnMadden I followed the link you mentioned and found regression splines suggested as a solution. I read the article at https://www.analyticsvidhya.com/blog/2018/03/introduction-regression-splines-python-codes/ and found it useful. Thank you. — PS Nayak, Oct 31 '22 at 16:49
@JimB Sorry, the data cannot be made public because it is proprietary. I'll try your method. Thank you. — PS Nayak, Oct 31 '22 at 16:51
The real fix is in the link in @JohnMadden's comment. But if you really need polynomials, centering and scaling the predictor should reduce numerical instability. Using orthogonal polynomials will reduce it as well. — JimB, Oct 31 '22 at 17:07
@JimB I do not specifically need polynomials, just the best fit. To that end, I'll try all the suggestions mentioned. — PS Nayak, Oct 31 '22 at 17:24
"Best fit" means nothing until you describe the space of possible functions you will permit to fit the data. Otherwise there is a bewilderingly large and varied set of optimal fits. — whuber, Nov 01 '22 at 11:33
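
The centering, scaling, and orthogonal-polynomial suggestions from the comments above can be illustrated with a minimal sketch. Since the original data is not available, everything here is an assumption made for illustration: the synthetic predictor `x` (year-like values with a large offset), the quadratic target `y`, the chosen `degree`, and the new point `x_new`. The sketch compares the condition number of the raw power-basis design matrix with that of a centered and scaled one, then fits the same cubic with NumPy's `Chebyshev.fit`, which maps the predictor onto [-1, 1] before fitting.

```python
import numpy as np
from numpy.polynomial import Chebyshev

# Synthetic data (an assumption, not the asker's data): a predictor with a
# large offset, similar to a year or a timestamp, and a quadratic target.
rng = np.random.default_rng(0)
x = np.linspace(2000.0, 2020.0, 40)
y = 3.0 + 0.5 * (x - 2010.0) ** 2 + rng.normal(scale=0.1, size=x.size)

degree = 3  # hypothetical choice for the illustration

# Raw power basis: columns x^3, x^2, x, 1 differ by many orders of magnitude.
raw_design = np.vander(x, degree + 1)

# Centered and scaled predictor: the same columns are now of comparable size.
z = (x - x.mean()) / x.std()
scaled_design = np.vander(z, degree + 1)

cond_raw = np.linalg.cond(raw_design)
cond_scaled = np.linalg.cond(scaled_design)
print(f"condition number, raw x:      {cond_raw:.3e}")
print(f"condition number, scaled x:   {cond_scaled:.3e}")

# Least-squares fit on the scaled design matrix.
coef, *_ = np.linalg.lstsq(scaled_design, y, rcond=None)

# A new point must be transformed with the training mean and std.
x_new = 2021.0
z_new = (x_new - x.mean()) / x.std()
pred_scaled = np.polyval(coef, z_new)

# Orthogonal-polynomial alternative: Chebyshev.fit rescales x to [-1, 1]
# internally and applies the same mapping when the fit is evaluated.
cheb = Chebyshev.fit(x, y, deg=degree)
pred_cheb = cheb(x_new)

print(f"prediction, scaled power basis: {pred_scaled:.4f}")
print(f"prediction, Chebyshev basis:    {pred_cheb:.4f}")
```

Both fits describe the same cubic, so their predictions should agree closely; the difference lies in how well conditioned the least-squares problem is. Note that this only addresses numerical instability. It does not make a high-degree polynomial extrapolate sensibly far outside the range of the training data.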
