Polynomial regression fits a non-linear model to the data. But as a statistical estimation problem it's still linear in the sense that the regression function $h\left(\Theta, X\right)$ is linear in the unknown parameters $\Theta$.

When we use polynomial regression, we are really giving our linear model additional features such as $X^2$ or $XY$. You could equally well give the model features such as $\log\left(X\right)$ or $\exp\left(X\right)$ and then apply least squares. So you can fit any kind of curvature to your data.
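To make the "transform the feature, then apply least squares" idea concrete, here is a minimal sketch. The data are made up (a noise-free curve $y = 1 + 3\log(x)$), and `fit_simple_ols` is a hypothetical helper written for this illustration, not part of any library. The model is curved in $x$ but linear in its two coefficients, so the closed-form least-squares solution applies.

```python
import math

def fit_simple_ols(features, targets):
    """Closed-form least squares for y = intercept + slope * feature."""
    n = len(features)
    mean_f = sum(features) / n
    mean_y = sum(targets) / n
    sxy = sum((f - mean_f) * (y - mean_y) for f, y in zip(features, targets))
    sxx = sum((f - mean_f) ** 2 for f in features)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_f
    return intercept, slope

# Hypothetical noise-free data generated from y = 1 + 3*log(x)
xs = [0.5 + 0.25 * i for i in range(40)]
ys = [1.0 + 3.0 * math.log(x) for x in xs]

# Transform the input first, then run ordinary least squares on the new feature
log_features = [math.log(x) for x in xs]
intercept, slope = fit_simple_ols(log_features, ys)

print(f"intercept={intercept:.6f}, slope={slope:.6f}")
```

The transformation $\log(x)$ is fixed before fitting; only the intercept and slope are estimated.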

My question is: Why does non-linear regression assume a more general hypothesis space of functions - one that encompasses the hypothesis space of functions that you can get with linear regression? I mean why do we think that non-linear regression can fit more types of curvatures to the data than linear regression if linear regression itself (e.g. with polynomial or logarithmic features) can fit any curvature to the data?

mathgeek

1 Answer

  1. Model Parsimony

If you have a sine curve, you can approximate it to arbitrary accuracy with its series expansion.

I’d probably rather estimate the two parameters of $\mathbb E[y]= A\sin(Bx)$ than the many parameters in a long series expansion.

Note that, because the $B$ is inside the nonlinear sine function, you cannot create the estimated-frequency sine curve with a sine basis function; you would have to pick a $B$, rather than estimate it from the data.

  2. Interpretation

Parameters in the nonlinear equation can have interpretations of interest. In the above equation, $A$ is the amplitude and $B$ relates to the frequency. Perhaps you can wrestle with a long polynomial that approximates the sine curve in order to get at frequency and amplitude, but they are immediate from the nonlinear equation.
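As a rough illustration of estimating both parameters of $\mathbb E[y]= A\sin(Bx)$ from data, here is a minimal sketch. It is not a standard library routine: it uses a plain grid search over candidate values of $B$, and for each candidate it solves for the best $A$ in closed form (which is possible because $A$ enters linearly once $B$ is fixed). The simulated data, the noise level, the grid, and the helper name `best_amplitude` are all assumptions made for the example.

```python
import math
import random

def best_amplitude(b, xs, ys):
    """For a fixed frequency b, return the least-squares amplitude and its SSE."""
    s = [math.sin(b * x) for x in xs]
    a = sum(si * yi for si, yi in zip(s, ys)) / sum(si * si for si in s)
    sse = sum((yi - a * si) ** 2 for si, yi in zip(s, ys))
    return a, sse

# Simulated data (assumed for illustration): A = 1.5, B = 2, Gaussian noise
rng = random.Random(0)
xs = [2 * math.pi * i / 199 for i in range(200)]
ys = [1.5 * math.sin(2.0 * x) + rng.gauss(0, 0.1) for x in xs]

# B sits inside the sine, so it cannot be a regression coefficient;
# here it is estimated by searching candidate values from 0.10 to 5.00
candidates = [0.01 * k for k in range(10, 501)]
b_hat = min(candidates, key=lambda b: best_amplitude(b, xs, ys)[1])
a_hat, sse = best_amplitude(b_hat, xs, ys)

print(f"A_hat={a_hat:.3f}, B_hat={b_hat:.2f}")
```

The two estimates can be read directly as amplitude and frequency. In practice an iterative nonlinear least-squares solver would usually replace the grid search.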

Dave
  • @mathgeek: your question can be generalized even further: why use a non-linear function when any function can be approximated by a neural net (the proof is by Hornik et al. in the early 90's; I forget the journal title)? So, why aren't neural nets used instead of non-linear functions? Dave's answer still applies to this case, particularly the second part about interpretation. – mlofton Oct 02 '21 at 18:58
  • @Dave, Thank you for the answer! You stated a couple of advantages of the nonlinear model, but I still can't see where the main difference between the two lies with respect to the fitted curvature. How come a non-linear model can fit more types of curvature than a linear model can? – mathgeek Oct 02 '21 at 19:02
  • @mlofton 1) If you mean the universal approximation theorem, it does not address all functions. Sycorax and I commented about this yesterday, I thought in response to something you posted. 2) Neural networks are nonlinear regression models. (Okay, yes, one could see a linear regression as a neural network with no hidden layer. Let’s say that we want a hidden layer with a nonlinear activation function.) – Dave Oct 02 '21 at 19:02
  • @mathgeek If you didn’t know the frequency $(B)$ of the sine curve, how would you fit to that with a linear equation? – Dave Oct 02 '21 at 19:04
  • @Dave, with Taylor series for example. We can fit a polynomial to any function. Just use a linear regression with polynomial features and you don't actually need to know that frequency $B$. – mathgeek Oct 02 '21 at 19:09
  • @mathgeek How many polynomial terms do you need to get the Taylor series to equal the sine curve? Because of the potential to do series expansions, you are correct that linear regressions with nonlinear basis functions are going to be possible and will be able to approximate an awful lot of curves, but if you know (perhaps from your knowledge of the physics of how springs work) to expect $A\sin(Bx)$, then you might prefer to fit just those two parameters. – Dave Oct 02 '21 at 19:13
  • @Dave, In fact, infinitely many. But forget about polynomials. Just give your linear model a single additional feature $\sin\left(X\right)$. – mathgeek Oct 02 '21 at 19:17
  • I want you to run a simulation in the software of your choosing where you simulate a relationship $y=\sin(2x)$ and then fit a linear regression with that nonlinear basis function $\sin(X)$. Plot the real scatter plot along with the predicted curve. Are you happy with the performance? – Dave Oct 02 '21 at 19:19
  • @Dave, I think I got it. You can use linear regression for data that looks like $\sin(X)$, but you actually can't use linear regression for data that looks like $\sin(BX)$, because in this case your coefficient $B$ is no longer linear, and no linear model is able to approximate it well, right? – mathgeek Oct 02 '21 at 19:20
  • You might approximate it well if you take five or five-thousand terms of the Taylor expansion, but that puts you at risk of overfitting, far more than the two parameters of $A\sin(Bx)$ (if you know to expect a sine curve). Further, if you don’t restrict the Taylor expansion to be a polynomial regression on the appropriate monomial terms (sine and cosine take alternating terms of the exponential expansion, right?), then you would needlessly fit parameters that should be zero. – Dave Oct 02 '21 at 19:26
  • @Dave, Yes, I got it! Please accept or reject this statement so I can check my understanding: "If your data looks like $\sin(2X)$, then there is no linear model that's going to generalize well. A linear model is always going to just approximate the given data but not capture the actual trend", right? – mathgeek Oct 02 '21 at 19:36
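
The simulation suggested in the comments can be sketched roughly as follows. This is one possible setup, not the commenter's own code: the grid of $x$ values over one full period, the noise-free response, and the helper names `fit_one_feature` and `r_squared` are assumptions for illustration. It fits $y=\sin(2x)$ by least squares on the fixed basis function $\sin(x)$, then repeats the fit with $\sin(2x)$ as the basis, which only works if the frequency is already known.

```python
import math

def fit_one_feature(features, targets):
    """Closed-form least squares for y = intercept + slope * feature."""
    n = len(features)
    mean_f = sum(features) / n
    mean_y = sum(targets) / n
    sxy = sum((f - mean_f) * (y - mean_y) for f, y in zip(features, targets))
    sxx = sum((f - mean_f) ** 2 for f in features)
    slope = sxy / sxx
    return mean_y - slope * mean_f, slope

def r_squared(features, targets, intercept, slope):
    """Share of the variance in targets explained by the fitted line."""
    mean_y = sum(targets) / len(targets)
    sse = sum((y - (intercept + slope * f)) ** 2 for f, y in zip(features, targets))
    sst = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - sse / sst

# Simulated relationship y = sin(2x) on one full period of sin(x)
xs = [2 * math.pi * i / 200 for i in range(200)]
ys = [math.sin(2 * x) for x in xs]

# Linear regression with the fixed basis function sin(x)
sin_x = [math.sin(x) for x in xs]
a1, c1 = fit_one_feature(sin_x, ys)
r2_sin_x = r_squared(sin_x, ys, a1, c1)

# Same regression, but with the frequency B = 2 assumed known in advance
sin_2x = [math.sin(2 * x) for x in xs]
a2, c2 = fit_one_feature(sin_2x, ys)
r2_sin_2x = r_squared(sin_2x, ys, a2, c2)

print(f"basis sin(x):  coefficient={c1:.4f}, R^2={r2_sin_x:.4f}")
print(f"basis sin(2x): coefficient={c2:.4f}, R^2={r2_sin_2x:.4f}")
```

On this grid $\sin(x)$ and $\sin(2x)$ are orthogonal, so the first fit explains essentially none of the variation, while the second recovers the curve. The linear model succeeds only when the basis function already contains the right frequency; the least-squares coefficients cannot change $B$.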