Why is the blue line of a regression model not a straight line?

Question

For the following regression model, $$ y=\hat{\beta}_0+\hat{\beta}_1\cdot|m(a)-m(b)|+\sum_{k=0}^{12}\hat{\beta}_{k}\, 1\big[|m(a)-m(b)|=k\big] $$ the fitted curve in the plot is not linear.

Why is the blue line (fitted using a regression model that assumes a linear trend) not a straight line?

The blue line obviously is not a linear function of the horizontal axis. Please explain to us what your notation means. What are "$m(a)$" and "$m(b)$"?? — whuber, Jul 23 '22 at 02:45
Also, what is $y$ doing in there? Can you edit your post to include the parameter estimates? The part on the right, with the indicator function, seems to fit six-month cycles (since $k$ goes up to $6$), but with the absolute values in there, it may also yield the 12-month periodicity of the blue line. Also, why does your question carry the "logistic" tag? Am I misunderstanding something? — Stephan Kolassa, Jul 23 '22 at 07:12
Too much information is omitted to make sense of this. Please give more context. — Glen_b, Jul 23 '22 at 09:19
It may help you to read my answer to "Why is polynomial regression considered a special case of multiple linear regression?" — gung - Reinstate Monica, Jul 23 '22 at 16:55

score 1 · Accepted Answer · answered Jul 23 '22 at 11:24

"Linear" regression estimate coefficients in order to explain the response of a variable $Y$ to changes in variables $X_k$ using a linear equation of the form $Y=X\beta$. Ex:

$$y_i = \beta_0 + \beta_1 x_{1, i} + \beta_2 x_{2, i} + \epsilon_i$$
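As a minimal sketch (Python with NumPy; the data and coefficient values are invented for illustration and are unrelated to the question's data), the coefficients of this equation can be estimated by ordinary least squares, i.e. by choosing $\hat\beta$ to minimise $\lVert y - X\beta\rVert^2$:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is reproducible
n = 500
x1 = rng.uniform(0, 1, n)
x2 = rng.uniform(0, 1, n)
beta_true = np.array([1.0, 2.0, -0.5])  # illustrative values, not from the question

# Design matrix X: a column of ones for beta_0, then x1 and x2.
X = np.column_stack([np.ones(n), x1, x2])
y = X @ beta_true + rng.normal(0, 0.1, n)  # add noise epsilon_i

# Ordinary least squares: minimise ||y - X beta||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [1.0, 2.0, -0.5]
```

The column of ones in `X` is what carries $\beta_0$; every other column is simply a regressor.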

"Linear" refers to the fact that $\mathbb{E}(y_i)$ is defined as a linear combination of the parameters $\beta$, not necessarily $X$, which could be modified depending on your objectives/interpretations of the data. Indeed, if some variables $X_k$ are "transformed" variables derived from your actual variable of interest, then the responses will not be linear, but you are still using a linear regression model. Ex:

$$\begin{split} y_i &= \beta_0 + \beta_1 x_{1, i} + \beta_2 x_{2, i} + \epsilon_i\\ &= \beta_0 + \beta_1 t_{i}^2 + \beta_2 \sqrt{t_{i}} + \epsilon_i \end{split}$$

In the equation above, $\mathbb{E}(y_i)$ is a linear combination of the parameters $\beta$ and of the transformed variables $X_k$. But it is not a linear combination of the actual variable of interest, time $t$, which has a non-linear effect on your dependent variable $Y$.
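A small sketch of the same idea, assuming NumPy and made-up coefficients: the regressors are $t^2$ and $\sqrt{t}$, the fit is still ordinary least squares, and yet the fitted curve evaluated on an even grid of $t$ is not straight (its second differences are far from zero). The final check confirms that predictions remain linear in $\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
t = np.sort(rng.uniform(0, 10, n))  # the raw variable of interest, e.g. time
beta_true = np.array([0.5, 0.3, -2.0])  # illustrative values only

# The regressors are transformations of t: x1 = t^2, x2 = sqrt(t).
X = np.column_stack([np.ones(n), t**2, np.sqrt(t)])
y = X @ beta_true + rng.normal(0, 0.2, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Evaluate the fitted model on an evenly spaced grid of t.
t_grid = np.linspace(0, 10, 11)
X_grid = np.column_stack([np.ones_like(t_grid), t_grid**2, np.sqrt(t_grid)])
y_fit = X_grid @ beta_hat

# A straight line in t would have (numerically) zero second differences.
print(np.diff(y_fit, n=2))  # entries are clearly non-zero

# Still linear in the parameters: the prediction for a combination of
# coefficient vectors equals the same combination of predictions.
b1, b2 = beta_hat, np.array([1.0, -1.0, 0.5])
assert np.allclose(X_grid @ (2 * b1 + 3 * b2), 2 * (X_grid @ b1) + 3 * (X_grid @ b2))
```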

This is what is happening in your case, where your variables $X_k$ are non-linear functions of time $t$, since you introduced absolute values of time differences and binary variables depending on the time of each observation.
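To illustrate with a hypothetical stand-in (the question does not define $m(\cdot)$, so below `d` plays the role of $|m(a)-m(b)|$ as an integer from 0 to 12, and the response is simulated, not the question's data), this sketch builds the question's regressors: an intercept, the linear term $d$, and the indicators $1[d=k]$ for $k=0,\dots,12$. Because the indicators give each value of $d$ its own level, the least-squares fitted value at each $d$ is the sample mean of $y$ at that $d$, so the fit follows the data's pattern rather than a straight trend.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
# Hypothetical stand-in for |m(a) - m(b)|: an integer distance from 0 to 12.
d = rng.integers(0, 13, n)
# Illustrative response with a non-monotone pattern in d.
y = np.cos(2 * np.pi * d / 12) + rng.normal(0, 0.1, n)

# Columns: intercept, linear term d, and indicators 1[d == k], k = 0..12.
dummies = (d[:, None] == np.arange(13)).astype(float)
X = np.column_stack([np.ones(n), d, dummies])

# X is rank-deficient (intercept and d are both combinations of the dummies);
# lstsq returns the minimum-norm coefficients, but the fitted values are unique.
beta_hat, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat

# The fitted value at each distance k equals the sample mean of y at that k.
group_means = np.array([y[d == k].mean() for k in range(13)])
fitted_by_k = np.array([fitted[d == k][0] for k in range(13)])
print(rank)                      # 13, not 15
print(np.round(fitted_by_k, 2))  # follows the cosine, not a straight line
```

Note that with this full set of columns the individual coefficients are not identified, even though the fitted curve is.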

Thanks! On Wikipedia, it says "Linearity. This means that the mean of the response variable is a linear combination of the parameters (regression coefficients) and the predictor variables." in https://en.wikipedia.org/wiki/Linear_regression. What do you mean $E(y_i)$ is a linear combination of $\beta$ and $X$? — Hermi, Jul 23 '22 at 14:36
You're welcome! Indeed, but in my second equation, the predictors of the linear regression are the variables $X_k$, so these two sentences have the same meaning. — FP0, Jul 23 '22 at 15:31
