Question: I want to clarify my understanding of OLS regression using Matrix Algebra.
Let's assume we have 2 different independent variables $x_1$ and $x_2$.
Our 'model' will be the plane in $\mathbb{R}^3$ that minimises the sum of squared vertical distances between the point on the plane at each observed pair $(x_{1i}, x_{2i})$ and the corresponding observed value $y_i$. These individual distances are our estimated residuals; call them $\hat u_i$.
Vector form: In vector form we instead work with single vectors in $n$-dimensional space, where $n$ is the number of observations: $y$ and each regressor $x_i$ are $n$-dimensional vectors in this space.
The span of these $x_i$ vectors forms a subspace (the column space, call it $X$) of this $n$-dimensional space. Our fitted vector $\hat y$, which represents our model, is the orthogonal projection of $y$ onto $X$, and the coefficients of the linear combination of the $x_i$ vectors that gives $\hat y$ are the estimates $\hat\beta_i$ of our model parameters.
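To make this concrete for myself, here is a minimal NumPy sketch of that picture (the data and variable names are made up by me, not taken from the lecture): the OLS coefficients give the linear combination of the columns of the design matrix, the fitted vector $\hat y$ is the orthogonal projection of $y$ onto their span, and the residual vector is perpendicular to every column.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50                                   # number of observations
x1 = rng.normal(size=n)                  # first regressor
x2 = rng.normal(size=n)                  # second regressor
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)   # noisy, so y is not in span of the columns

# n x 3 design matrix: intercept column plus the two regressors
X = np.column_stack([np.ones(n), x1, x2])

# OLS coefficients: the weights of the linear combination of X's columns
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat                     # fitted vector = orthogonal projection of y onto col(X)
u_hat = y - y_hat                        # estimated residual vector

# Orthogonality check: the residual is perpendicular to every column of X
print(X.T @ u_hat)                       # ~ [0, 0, 0] up to floating-point error
```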
What's bothering me is that in this lecture series the author states:
In OLS regression we are trying to get as close to this dependent variable vector $y$ as we can, given that we don't have a vector or a space $X$ which is as highly dimensional [as $y$].
What's bothering me here is that each of the vectors $x_i$ is just as "highly dimensional" as $y$, because they all live in the same $n$-dimensional space; but I appreciate that $X$ does not span all of $\mathbb{R}^n$, so we cannot recover $y$ with just a linear combination of our different $x_i$. Is this interpretation correct?
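As a sanity check on that interpretation, here is a small self-contained sketch (again with data I made up): each column of the design matrix is an $n$-vector, yet together the columns span only a 3-dimensional subspace of $\mathbb{R}^n$, so a generic $y$ cannot be written exactly as a linear combination of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = rng.normal(size=n)                   # a generic vector in R^n

# Each column of X has n components, but together they span only rank(X) dimensions
print(np.linalg.matrix_rank(X), "vs n =", n)     # 3 vs n = 50

# Hence a generic y is not an exact linear combination of the columns of X
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(X @ b, y))             # False
```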
Question: If we had the "true model", then the vector from the true plane (call it $\bar X$) to $y$ would be our error term vector $u$, and I assume that $u$ would itself be a linear combination of many other unaccounted-for independent variables?
Question: When I try to visualise this, naturally all I come up with is a plane in $\mathbb{R}^3$ with $y$ sitting outside of it. This visual analogy implies that only one $x$ variable is missing, i.e. we just need a 3rd linearly independent vector. But surely, for whatever dimension of space we choose, $y$ is just one vector away from $X$, since we can always define an extra vector pointwise, $y - \operatorname{lin}(X)$; indeed, if we take $\hat y$ to be our vector $\in X$, then $y - \hat y$ is just $\hat u$, our residual vector.
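To check this "one vector away" intuition numerically, here is another small made-up sketch: appending the residual vector $\hat u$ as a single extra column of the design matrix makes $y$ exactly reachable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([2.0, 1.5, -0.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat                     # projection of y onto col(X)
u_hat = y - y_hat                        # residual vector, orthogonal to col(X)

# Augment the design with the single extra direction u_hat:
# y now lies exactly in the span of the augmented columns (coefficients [beta_hat, 1]).
X_aug = np.column_stack([X, u_hat])
b_aug, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(np.allclose(X_aug @ b_aug, y))     # True: one extra vector closes the gap
```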
So I'm unclear how to think about this missing extra dimensionality the author mentions. I suppose in reality this $\hat u$ could be decomposed into many more linearly independent vectors of unaccounted-for independent variables. But I would appreciate clarification.