
I was trying to show that in polynomial regression, the model overfits the data as the polynomial degree $k$ increases. To demonstrate this, I fitted polynomials of degree $k = 1,\dots,18$ to 30 two-dimensional data points and plotted the training MSE against $k$.
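For reference, here is a minimal sketch of what I am doing (the data here are synthetic placeholders; only the fitting loop matches my actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# placeholder data: 30 noisy points on [0, 1]
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

for k in range(1, 19):
    X = np.vander(x, k + 1, increasing=True)      # raw features 1, x, ..., x^k
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
    mse = np.mean((X @ coef - y) ** 2)            # training MSE
    print(k, mse)
```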

However, the plot shows that the training error keeps decreasing only until $k=10$; after that point it fluctuates erratically instead of continuing to fall.

From this answer I realise the problem might be that my design matrix is ill-conditioned, so the least-squares solution becomes numerically unstable.
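Indeed, a quick check of the condition number (assuming the same raw `np.vander` design matrix as in the sketch above) shows it blowing up as $k$ grows:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 30)
for k in (1, 5, 10, 14, 18):
    X = np.vander(x, k + 1, increasing=True)
    print(k, np.linalg.cond(X))  # grows explosively with k
```

Once the condition number gets anywhere near $1/\epsilon \approx 10^{16}$ in double precision, the fitted coefficients have essentially no reliable digits, which would explain the erratic MSE.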

How can I get around this problem?

Thanks.

  • Have you used orthogonal polynomials? – Dave Nov 06 '19 at 16:43
  • @Dave Hi, I haven't used them. In fact I don't think I learned that in class. – FrankieYin Nov 06 '19 at 16:49
  • I found two links on CV: https://stats.stackexchange.com/questions/241703/orthogonal-polynomials-for-regression and https://stats.stackexchange.com/questions/258307/raw-or-orthogonal-polynomial-regression. My suspicion is that the issue is numerical instability. – Dave Nov 06 '19 at 17:09
  • That's a truly elusive issue that is not directly obvious unless you delve into the hairy math behind these models. The ill-conditioned matrix is a suspect, since you could end up with more parameters than observations if you are using interaction terms too (i.e. $x^5y^5, x^4y^6, x^3y^7$, and so on). Notice also that more parameters do not necessarily mean higher marginal likelihood, which can make the model suddenly underfit the data after some point. I explained this a bit in the following answer. – Ammar Rashed Jun 03 '22 at 00:05
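Following the orthogonal-polynomials suggestion in the comments above, here is a hedged sketch using NumPy's built-in Legendre basis (same placeholder data as before; `Legendre.fit` also maps $x$ onto $[-1, 1]$, which keeps the design matrix well conditioned):

```python
import numpy as np
from numpy.polynomial import Legendre

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

for k in range(1, 19):
    # fit in the orthogonal Legendre basis instead of raw powers of x
    p = Legendre.fit(x, y, k)
    mse = np.mean((p(x) - y) ** 2)  # training MSE
    print(k, mse)
```

Since the models are nested, the training MSE is non-increasing in $k$ in exact arithmetic, so with a well-conditioned basis the curve should decrease monotonically, which is what the overfitting demonstration needs.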

0 Answers