
As in the title, I just want to confirm whether that's the case.

Richard Hardy
79999
    Welcome to Cross Validated! Could you please give more detail? There are many ways to incorporate nonlinearity into ordinary least squares. – Dave Dec 08 '22 at 13:05
  • 1
    If "regressor" means something like "explanatory variable", what do you mean by "not linear"? – Henry Dec 08 '22 at 13:05
  • 2
    $y_i=\beta_0+\beta_1 x_i+\beta_2x_i^2+\epsilon_i$ can be modelled with ordinary least squares as linear regression – Henry Dec 08 '22 at 13:07
  • The problem is about finding the optimal regressor, which for regression, by theorem, should be the expectation of the random variable. However, that expectation is not linear, and that's why I claim it's not good to use OLS. Sorry, I cannot give more information. – 79999 Dec 08 '22 at 13:19
  • I suspect the OP is interested in linearity/nonlinearity of a model w.r.t. the regressor. Previous comments neglect an important point that OLS is an estimation technique, not a model. – Richard Hardy Dec 08 '22 at 13:30
  • I agree with Richard. And it's not a polynomial model. So it is because of the nonlinearity, right? – 79999 Dec 08 '22 at 13:39
  • 2
    “So it is because of the nonlinearity right?” I am having trouble making sense of this. Could you please clarify? In particular, what is “it”? // I’m not actually sold on OLS referring to an estimation technique rather than a model. Yes, we can estimate the coefficients of a linear model many ways and can apply minimization of square loss to estimating the coefficients of a nonlinear regression. However, the “ordinary” in OLS suggests to me an interest in a linear model whose coefficients are estimated by minimizing square loss. – Dave Dec 08 '22 at 13:43
  • "where for regression by theorem it should be the expectation of the random variable" – this is not clearly formulated. "that expectation is not linear" – what does a non-linear expectation mean? – Sextus Empiricus Dec 08 '22 at 14:23
  • OLS is an estimation technique, but it uses a model that is a linear combination of one or more functions/regressors. – Sextus Empiricus Dec 08 '22 at 14:26
  • @Dave, thank you for an interesting perspective. I have now asked a question about it. – Richard Hardy Dec 08 '22 at 14:41
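To make Henry's point concrete: a model can be nonlinear in the regressor yet still linear in the parameters, and OLS estimates the latter without trouble. A minimal R sketch (the data and the true coefficients here are made up purely for illustration):

```r
#### Linear in parameters, nonlinear in x ####
set.seed(1)
x <- rnorm(200)
y <- 1 + 2 * x + 3 * x^2 + rnorm(200)  # quadratic in x, linear in the betas

# I(x^2) adds x^2 as an extra regressor; the model is still a linear
# combination of columns, so lm() fits it by ordinary least squares.
fit_quad <- lm(y ~ x + I(x^2))
coef(fit_quad)  # estimates should land near the true (1, 2, 3)
```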

1 Answer


The main purpose of the linearity assumption is that it allows one to predict accurately. When this assumption is not met, many issues arise, including violations of other important assumptions (such as normally distributed residuals and constant error variance). To give a direct practical example, here is a simulated data set with a curvilinear trend, created in R. I plot the data below to show what it looks like after simulation, then fit an OLS regression to it.

#### Simulate Data ####
set.seed(123)                            # reproducibility
x <- rnorm(n = 1000)                     # predictor
y <- sin(x) + runif(n = 1000, max = .9)  # sinusoidal trend plus uniform noise
df <- data.frame(x, y)
plot(df)

[Scatter plot of the simulated data, showing a curvilinear (sinusoidal) trend]

This is quite artificial, but it at least gives you an idea of what a curvilinear trend looks like. If we fit an OLS regression to this data, we will see a couple of issues right away.

#### Fit Data ####
fit <- lm(
  y ~ x,
  data = df
)

#### Draw Fit Line on Plot ####
abline(fit, col = "red")

You can see that the line fits the middle of the data well, but it misses the tails by a lot:

[Scatter plot with the fitted regression line in red, which overshoots the data in both tails]

This isn't the most problematic fit. Something like a parabolic trend, for example, causes much worse problems, but we will stick with this example for now. If we run plot(fit), the resulting diagnostic plots reveal plenty of issues with our residuals.

[Diagnostic plots from plot(fit), showing clear structure in the residuals]

As the diagnostics show, there is extreme residual behavior (meaning the model's predictions will be far off) and there are sizeable outliers because the regression fits poorly. Basically, the model cannot accommodate the curvature in the tails of the data, and predictions from this model will ultimately be bad (which can have serious consequences depending on what the regression is used for).
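To put a rough number on that residual structure, here is a quick check on the same simulation (the cutoffs of 2 and 0.5 are arbitrary choices for "tails" and "middle"):

```r
#### Quantify the misfit ####
set.seed(123)
x <- rnorm(n = 1000)
y <- sin(x) + runif(n = 1000, max = .9)
fit <- lm(y ~ x)

# If the straight-line model were adequate, residuals would look similar
# everywhere. Here they blow up in the tails, where sin(x) flattens out
# but the fitted line keeps climbing.
mean(abs(resid(fit)[abs(x) > 2]))    # tail residuals: large
mean(abs(resid(fit)[abs(x) < 0.5]))  # middle residuals: small
```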

In short, linearity is one of the crucial parts of getting an OLS regression right. There are of course nonlinear options and transformations that get around this issue, but that is another topic.
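One such option, hinted at in the comments: since the simulated trend is sinusoidal, a model using sin(x) as the regressor is still linear in its coefficients, so plain lm()/OLS can fit it. A self-contained sketch regenerating the same simulation:

```r
#### A correctly specified, still-linear model ####
set.seed(123)
x <- rnorm(n = 1000)
y <- sin(x) + runif(n = 1000, max = .9)
df <- data.frame(x, y)

fit_lin <- lm(y ~ x,      data = df)  # straight line: misses the tails
fit_sin <- lm(y ~ sin(x), data = df)  # linear in coefficients, not in x

# The sin(x) model should recover a slope near the true value of 1 and
# explain far more variance than the straight line does.
c(summary(fit_lin)$r.squared, summary(fit_sin)$r.squared)
```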

  • Thanks! That helps a lot. – 79999 Dec 08 '22 at 13:55
  • 2
    I find it misleading to suggest that OLS cannot model a curving pattern. This is why, @79999, it is so important for you to clarify what you mean, since a linear model of $y=a+b\sin(x)$ should be able to model this pattern just fine. – Dave Dec 08 '22 at 13:56
  • That is what the last paragraph highlights though. There are of course polynomials, log transformations, etc. that get around this. – Shawn Hemelstrand Dec 08 '22 at 13:58
  • I thought only polynomial models are considered linear; sin/cos is not linear. – 79999 Dec 08 '22 at 14:26
  • What do you mean by 'the linearity constraint'? Why does it allow one to predict accurately? – Sextus Empiricus Dec 08 '22 at 14:28
  • I think I understand now where Dave disagrees with my answer, given that the title of this thread is "If the regressor is not linear, then is OLS not a good idea?" OLS can certainly be done if the data is nonlinear, but you still have to formulate the model correctly. His comment about introducing $\sin$ into the regression formula expresses this explicitly: you have to specify the terms in a way where prediction is legitimate. – Shawn Hemelstrand Dec 08 '22 at 14:42
  • @79999 there is an interesting follow up conversation going on here if you are interested in learning more: https://stats.stackexchange.com/questions/598430/does-the-use-of-ols-imply-the-model-is-linear-in-parameters?noredirect=1#comment1108052_598430 – Shawn Hemelstrand Dec 08 '22 at 15:57