
The simple linear regression model is given by $y_i = \beta_0 + \beta_1 x_i + e_i$.

It is my understanding that this can be rewritten in matrix-vector form as $\vec{y} = X\vec{\beta} + \vec{e}$, where $X$ is the design matrix, $\vec{y}$ is the vector of observed responses $y_i$, $\vec{\beta}$ is the vector of true coefficients $\beta_0, \beta_1$, and $\vec{e}$ is the vector of irreducible errors $e_i$, one for each $y_i$. Thus, for $i = 1, \ldots, n$ observations, we have two $n \times 1$ vectors ($\vec{y}$ and $\vec{e}$), one $2 \times 1$ vector $\vec{\beta}$, and an $n \times 2$ matrix $X$.
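
For concreteness, here is a minimal numpy sketch (the data are made up) of the shapes described above: stacking a column of ones for the intercept next to the predictor column gives the $n \times 2$ design matrix.

```python
import numpy as np

# Made-up data: n = 5 observations of a single predictor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix: a column of ones (for the intercept beta_0) next to x.
X = np.column_stack([np.ones_like(x), x])

print(X.shape)  # (5, 2) -- the n x 2 design matrix
print(y.shape)  # (5,)   -- the n observed responses
```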

By definition, a matrix represents a transformation on a vector. Since the linear model can be written as a matrix-vector product, are we transforming the coefficient vector $\vec{\beta}$ into a new vector (which I would think would be the best-fit line in the linear model)? I'd like to properly understand what is occurring here.
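
Writing out the product makes the "transformation" view concrete: $X$ maps the $2 \times 1$ vector $\vec{\beta}$ to an $n \times 1$ vector, namely a linear combination of the columns of $X$,

$$X\vec{\beta} = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}\begin{pmatrix}\beta_0 \\ \beta_1\end{pmatrix} = \beta_0\begin{pmatrix}1 \\ \vdots \\ 1\end{pmatrix} + \beta_1\begin{pmatrix}x_1 \\ \vdots \\ x_n\end{pmatrix},$$

so $X\vec{\beta}$ always lies in the column space of $X$.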

1 Answer


As mentioned in one of my old posts, in the linear regression model $\mathbf y=\mathbf X\boldsymbol\beta+\boldsymbol\varepsilon$, in general $\mathbf y\ne \mathbf X\boldsymbol\beta^\star$ for any $\boldsymbol\beta^\star\in\mathbb R^p$; that is, $\mathbf y\notin \mathcal C(\mathbf X)$, since $\mathbf y$ does not lie in the column space of $\mathbf X$. This means the system $\mathbf y= \mathbf X\boldsymbol\beta^\star$ is not solvable.

What we can do is find $\mathbf X\hat{\boldsymbol\beta}=:\hat{\mathbf y}\in\mathcal C(\mathbf X)$ such that it has the smallest squared distance to $\mathbf y\notin \mathcal C(\mathbf X)$.
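
A minimal numpy sketch (with made-up data; `np.linalg.lstsq` is one of several ways to compute this) of finding $\hat{\mathbf y} = \mathbf X\hat{\boldsymbol\beta}$, the point of $\mathcal C(\mathbf X)$ closest to $\mathbf y$ in squared distance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: y is (almost surely) not in the column space of X.
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])            # n x 2 design matrix
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=20)   # noisy responses

# Least-squares coefficients and the corresponding point in C(X).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

# y_hat is at least as close to y (in squared distance) as any other X @ beta.
beta_other = beta_hat + np.array([0.5, -0.5])
print(np.sum((y - y_hat) ** 2) <= np.sum((y - X @ beta_other) ** 2))  # True
```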

As explained in my post and shown in the snapshot:

[snapshot from the linked post]

We would minimize the distance $\rm AB$ when we resort to $\hat{\boldsymbol{\theta}}$ (the $\hat{\mathbf y}$ above), and that is precisely when $$\left(\mathbf y-\hat{\boldsymbol\theta}\right) \perp \Omega:= \mathcal C(\mathbf X).$$

And such a $\hat{\boldsymbol\theta}$ does exist, as shown in the former post.
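
Spelling out that orthogonality condition gives the usual normal equations: the residual must be orthogonal to every column of $\mathbf X$, so

$$\mathbf X^\top\!\left(\mathbf y-\mathbf X\hat{\boldsymbol\beta}\right)=\mathbf 0 \iff \mathbf X^\top\mathbf X\,\hat{\boldsymbol\beta}=\mathbf X^\top\mathbf y,$$

and when $\mathbf X$ has full column rank this has the unique solution $\hat{\boldsymbol\beta}=(\mathbf X^\top\mathbf X)^{-1}\mathbf X^\top\mathbf y$, which is one way to see that such a $\hat{\boldsymbol\theta}=\mathbf X\hat{\boldsymbol\beta}$ exists.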

User1865345