This post answers the question for the case where you fit a line to your data. However, my university teacher's notes claim that the mean of the residuals is zero for all models that are linear in their parameters, such as the parabolic model, and no proof is given for this general case.

Wikipedia seems to agree that linear regression covers only lines and does not include parabolas (my teacher mixes up "linear" and "linear in its parameters"...).

  • In this problem, the only nonlinear component is the squared data $x$. Regression estimates the parameters, not $x$. So a parabolic regression estimates parameters $a,b,c$ in $\hat y = ax^2 + bx + c$. This is manifestly linear in the parameters and the proof is complete. – Sycorax Apr 23 '23 at 19:16
  • @Sycorax Why does that prove the statement? Please note I am being first exposed to statistics and the proof I know for the simple linear case is elementary and relies on knowing the coefficients explicitly. – ChristmasTree Apr 23 '23 at 19:40
  • The comment is hard to understand — if you know the coefficients explicitly, what are you estimating? Anyway, linear in the parameters is what a linear model is, so the proof you already have is all the proof you need. – Sycorax Apr 23 '23 at 20:00
  • It's a question of vector algebra. As long as least squares is used, and the intercept term is in the model, then the sum of residuals is zero, mathematically, regardless of what other terms are fit. – BigBendRegion Apr 24 '23 at 11:51
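
The vector-algebra argument in the last comment can be sketched as follows. The notation is an assumption, not taken from the thread: $X$ is the $n \times p$ design matrix whose first column is the all-ones vector $\mathbf{1}$ (the intercept), $\hat\beta$ is the least-squares estimate, and $e = y - X\hat\beta$ is the residual vector. Least squares minimizes $\lVert y - X\beta \rVert^2$, and setting the gradient to zero gives the normal equations

$$ X^\top (y - X\hat\beta) = X^\top e = 0 $$

Each row of this system says that the residuals are orthogonal to one column of $X$. The row belonging to the intercept column reads

$$ \mathbf{1}^\top e = \sum_{i=1}^{n} e_i = 0 $$

so the residuals sum to zero, and hence have mean zero, whatever the remaining columns are. A column of squared values $x_i^2$ is just one more column of $X$, which is why the parabolic model is covered.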

1 Answer

Let $x_1$ denote the raw values of your variable, and let $x_2$ be some other feature. The model below sure looks linear.

$$ y_i =\beta_0 + \beta_1x_{i1}+ \beta_2x_{i2}+\varepsilon_i $$

Indeed, such a model is linear, and the usual theorems about linear regression apply.

Next, define $x_2$ by $x_2=x_1^2$.

You now have a parabolic model written in a form that is known to satisfy the usual properties of a linear model, including residuals that average to zero when the model is fit by least squares with an intercept.
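
If it helps to see this numerically, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration: the simulated data, the "true" coefficients, the seed, and the variable names. It builds $x_2 = x_1^2$ as an ordinary second feature, fits the model by least squares with an intercept column, and checks the mean of the residuals. For contrast, it also fits the same model without the intercept, where the zero-mean property is no longer guaranteed.

```python
import numpy as np

# Simulated data; the coefficients and noise level are arbitrary choices.
rng = np.random.default_rng(0)
x1 = rng.uniform(-3, 3, size=50)
y = 1.0 + 2.0 * x1 - 0.5 * x1**2 + rng.normal(scale=1.0, size=50)

# Treat the squared values as just another feature: x2 = x1^2.
x2 = x1**2

# Design matrix with an intercept column of ones.
X = np.column_stack([np.ones_like(x1), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
mean_residual = residuals.mean()

# Same features without the intercept column, for comparison.
X_no_int = np.column_stack([x1, x2])
beta_no_int, *_ = np.linalg.lstsq(X_no_int, y, rcond=None)
mean_residual_no_int = (y - X_no_int @ beta_no_int).mean()

print("mean residual with intercept:   ", mean_residual)
print("mean residual without intercept:", mean_residual_no_int)
```

With the intercept included, the printed mean should be zero up to floating-point rounding. Without it, the mean will generally not be zero, which is the condition the comments point to.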

Dave
  • Oh, I understand. The problem is that I only knew about regression with one variable. I will get into linear regression with multiple variables to see how that goes... Thanks – ChristmasTree Apr 23 '23 at 19:46
  • @ChristmasTree glad I could help. It seems that your instructor either assumes students have seen this material before or is alluding to a future lesson on regression with multiple features (which might be part of a different course). – Dave Apr 23 '23 at 19:49