I can't find a proper explanation for my question on Cross Validated. The closest I found was this one from Medium, but I still don't see the visual difference among the four cases in that explanation. So here we are.
I have this `df.head()` and two plots, where:

- `Y`: the column `Y`
- `Y_predicted`: the output of the linear regression (see code below for details)
- `error`: `Y - Y_predicted`
The code of the linear regression:
```python
import pandas as pd
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(X, y)   # X: feature DataFrame, y: target Series
y_pred = pd.Series(
    model.predict(X),                  # predictions aligned with X's index
    index=X.index,
    name='Fitted',
)
error = (y - y_pred).rename('Error')   # residuals
```
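The two plots come from something like this (a minimal sketch; the plotting code is my reconstruction, not the exact script):

```python
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
y.plot(ax=ax1, label='Y')
y_pred.plot(ax=ax1, label='Fitted')
ax1.legend()
error.plot(ax=ax2, label='Error')
ax2.legend()
plt.show()
```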
I have always been taught that linear regression is linear, yet the prediction here is not a straight line, and I just can't understand why. Why is the fit not a straight line if it's a linear regression?
I have been playing around with this linear regression, and the more features I add, the more complex (in other words, the more "curved") the prediction becomes, but it is still not a straight line in the plot. I have also been trying to get the linear equation out of this `sklearn.linear_model.LinearRegression` model, but it seems that all I can get are the fitted numbers (the intercept and the coefficients), and from those arrays alone I can't see how the equation changes as I change the features. So I have three questions:
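This is how I read the fitted parameters out of the model (a sketch; `intercept_` and `coef_` are the attributes sklearn sets after `fit`):

```python
# After model = LinearRegression().fit(X, y):
print(model.intercept_)  # the fitted intercept
print(model.coef_)       # one coefficient per column of X
# So the fitted equation is y_pred = model.intercept_ + X @ model.coef_
```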
- Why is this linear regression not linear?
- What is the explanation, at least visually, for the differences among the four equations in the linked explanation? Specifically the differences among the equations below. Are all of them linear?
- (1) $Y = a + bX$
- (2) $Y = a + bX + cX^2$
- (3) $Y = a + b^2X$
- (4) $Y = a + b^2X + cX^2$
- When I used one feature (`const`) in `X`, the prediction was a straight line with `slope = 0`. When I used `const` and `trend`, it was a straight line with `slope != 0`. Then I added one column of a Fourier series, and the straight line turned into a curve. Does this mean that, if I plotted it in 3D, it would still be a straight line? (I can't imagine how.) A sketch of how I build these feature columns is shown after this list.
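For completeness, this is roughly how I construct the feature matrix (a minimal sketch with made-up data; the column names `const`, `trend`, and the sin/cos pair are my own, built by hand here rather than with a library):

```python
import numpy as np
import pandas as pd

n = 100
t = np.arange(n)

# Hand-built deterministic features: a constant, a linear trend,
# and one sin/cos pair of a Fourier series with period 12
X = pd.DataFrame({
    'const': np.ones(n),
    'trend': t.astype(float),
    'sin(1,12)': np.sin(2 * np.pi * t / 12),
    'cos(1,12)': np.cos(2 * np.pi * t / 12),
})
```

Dropping the sin/cos columns brings back the straight-line cases described above.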
