
I have fitted a polynomial regression (a degree-4 model) to describe a non-linear relationship between my two variables. My question is: why does this model begin to decrease towards the right-hand side of the plot, from x = 39 onwards, despite the only available data point suggesting that the line should continue to increase?

I have read previously, in the context of researching and interpreting GAMs, that some types of smooths/splines are unreliable towards the upper and lower bounds of the fitted data. Unfortunately, I have been unable to find the reference where I read this. I am wondering whether the same is characteristic of polynomial models: are they also known to be unreliable towards the extremes of the fitted data? Based on my data there seems to be no valid reason why the curve should begin to decrease at the upper end of the data distribution, so I believe this decrease to be an artefact of the polynomial model, with no real basis in the data. I would appreciate any explanation or confirmation of this observation.

[plot: scatter of the data with the fitted degree-4 polynomial curve, which turns downward from around x = 39]
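
For concreteness, here is a minimal sketch of the kind of fit I mean, using invented stand-in data (my real data is not reproduced here), so the numbers are for illustration only. Whether the fitted slope actually turns negative at the right edge depends on the data, but this is a direct way to inspect it:

```python
import numpy as np

# Invented stand-in data (for illustration only; not the real data).
# The last point sits above its neighbour, mimicking the upturn in the plot.
x = np.array([1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40], dtype=float)
y = np.array([8.0, 6.2, 5.1, 4.6, 4.4, 4.5, 4.7, 4.9, 5.0, 5.1, 5.0, 4.3, 5.2, 5.4])

coeffs = np.polyfit(x, y, deg=4)   # least-squares degree-4 polynomial fit
poly = np.poly1d(coeffs)           # callable fitted polynomial
slope = np.polyder(poly)           # its first derivative, also callable

# Inspect the fitted value and slope near the upper boundary of the data
for xi in (35.0, 37.0, 39.0, 40.0):
    print(f"x={xi}: fitted={poly(xi):.2f}, slope={slope(xi):+.3f}")
```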

  • Can you edit your post to include the data you fitted your model to, and ideally also the code you used in fitting? – Stephan Kolassa Jan 29 '23 at 14:23
  • Does this answer your question? "Why is the use of high order polynomials for regression discouraged?" In particular, from Frank Harrell's answer on that page: "The shape of the fit in one region of the data is influenced by far away points." – EdM Jan 29 '23 at 16:45
  • High-degree polynomial models are unreliable everywhere. Degree 4 isn't too bad, but it's already concerning unless you have fairly densely sampled data. I can't see how you can support your claim that "the only available data" points towards an increase: the second-to-last data point shows a sizable drop from 5 to about 4.3 in height. You have no basis to let the last data point, at a height of 5.2, wholly determine where the curve should go. Together these last two points tell you to be highly uncertain about the polynomial shape here, because polynomials are just too flexible. – whuber Jan 29 '23 at 19:02
  • I mostly agree with @whuber but I would say polynomials are too inflexible. They make a global fit to all the datapoints at once. If the steep downward curve on the far left of your plot is achieved by a large negative $\hat\beta$ for the $x^2$ term, for instance, then that same parabola has to curve downward somewhere on the far right too. If instead you wanted to fit a model that does go up past $x=39$, a piecewise polynomial such as a spline may be able to fit local behavior on the left half and right half more independently than a single polynomial curve can. – civilstat Jan 30 '23 at 13:54
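
A minimal sketch of the contrast described in the last comment, again with invented stand-in data: a single global degree-4 polynomial versus a smoothing cubic spline (a piecewise polynomial) near the right edge of the data. The smoothing parameter `s` is an arbitrary choice for illustration:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Invented stand-in data (for illustration only)
x = np.array([1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40], dtype=float)
y = np.array([8.0, 6.2, 5.1, 4.6, 4.4, 4.5, 4.7, 4.9, 5.0, 5.1, 5.0, 4.3, 5.2, 5.4])

# Global fit: one set of coefficients constrained by every data point at once
poly = np.poly1d(np.polyfit(x, y, deg=4))

# Smoothing cubic spline: piecewise cubics joined at knots, so the left and
# right halves of the data are fitted more independently of each other
spline = UnivariateSpline(x, y, k=3, s=len(x) * np.var(y) * 0.05)

# Compare the two fits near the upper boundary, where they can diverge
for xi in (37.0, 39.0, 40.0):
    print(f"x={xi}: polynomial={poly(xi):.2f}, spline={spline(xi):.2f}")
```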

0 Answers