Question: I want to clarify my understanding of OLS regression using Matrix Algebra.
Let's assume we have 2 different independent variables $x_1$ and $x_2$.
Our 'model' will be the plane in $\mathbb{R}^3$ that minimises the sum of squared vertical distances between the point on the plane at each observed pair $(x_{1i}, x_{2i})$ and the corresponding observed value $y_i$. These individual distances are our estimated residuals; call them $\hat u_i$.
Vector form: In vector form we instead work with single vectors in $n$-dimensional space, where $n$ is the number of observations: $y$ and each regressor $x_i$ are $n$-dimensional vectors in this space.
The span of these $x_i$ vectors forms a subspace (the column space, call it $X$) of this $n$-dimensional space. Our fitted vector $\hat y$, which represents our model, is the orthogonal projection of $y$ onto $X$, and the coefficients of the linear combination of the $x_i$ vectors that gives $\hat y$ are the estimates $\hat\beta_i$ of our model parameters.
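To make this concrete for myself, here is a minimal NumPy sketch of that picture (the data and variable names are made up by me, not taken from the lecture): the OLS coefficients give the linear combination of the columns of the design matrix, the fitted vector $\hat y$ is the orthogonal projection of $y$ onto their span, and the residual vector is perpendicular to every column.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50                                   # number of observations
x1 = rng.normal(size=n)                  # first regressor
x2 = rng.normal(size=n)                  # second regressor
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)   # noisy, so y is not in span of the columns

# n x 3 design matrix: intercept column plus the two regressors
X = np.column_stack([np.ones(n), x1, x2])

# OLS coefficients: the weights of the linear combination of X's columns
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat                     # fitted vector = orthogonal projection of y onto col(X)
u_hat = y - y_hat                        # estimated residual vector

# Orthogonality check: the residual is perpendicular to every column of X
print(X.T @ u_hat)                       # ~ [0, 0, 0] up to floating-point error
```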
What's bothering me is that in this lecture series the author states:
In OLS regression we are trying to get as close to this dependent variable vector $y$ as we can, given that we don't have a vector or a space $X$ which is as highly dimensional [as $y$].
What's bothering me here is that each of the vectors $x_i$ is just as "highly dimensional" as $y$, because they all live in the same $n$-dimensional space; but I appreciate that $X$ does not span all of $\mathbb{R}^n$, so we cannot recover $y$ with just a linear combination of our different $x_i$. Is this interpretation correct?
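As a sanity check on that interpretation, here is a small self-contained sketch (again with data I made up): each column of the design matrix is an $n$-vector, yet together the columns span only a 3-dimensional subspace of $\mathbb{R}^n$, so a generic $y$ cannot be written exactly as a linear combination of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = rng.normal(size=n)                   # a generic vector in R^n

# Each column of X has n components, but together they span only rank(X) dimensions
print(np.linalg.matrix_rank(X), "vs n =", n)     # 3 vs n = 50

# Hence a generic y is not an exact linear combination of the columns of X
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(X @ b, y))             # False
```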
Question: If we had the "true model", then the vector from the true plane (call it $\bar X$) to $y$ would be our error term vector $u$, and I assume that $u$ would itself be a linear combination of many other unaccounted-for independent variables?
Question: When I try to visualise this, naturally all I come up with is a plane in $\mathbb{R}^3$ with $y$ sitting outside of it. This visual analogy implies that only one $x$ variable is missing, i.e. we just need a 3rd linearly independent vector. But surely, for whatever dimension of space we choose, $y$ is just one vector away from $X$, since we can always define an extra vector pointwise, $y - \operatorname{lin}(X)$; indeed, if we take $\hat y$ to be our vector $\in X$, then $y - \hat y$ is just $\hat u$, our residual vector.
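To check this "one vector away" intuition numerically, here is another small made-up sketch: appending the residual vector $\hat u$ as a single extra column of the design matrix makes $y$ exactly reachable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([2.0, 1.5, -0.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat                     # projection of y onto col(X)
u_hat = y - y_hat                        # residual vector, orthogonal to col(X)

# Augment the design with the single extra direction u_hat:
# y now lies exactly in the span of the augmented columns (coefficients [beta_hat, 1]).
X_aug = np.column_stack([X, u_hat])
b_aug, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(np.allclose(X_aug @ b_aug, y))     # True: one extra vector closes the gap
```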
So I'm unclear how to think about this missing extra dimensionality the author mentions. I suppose in reality this $\hat u$ could be decomposed into many more linearly independent vectors of unaccounted-for independent variables. But I would appreciate clarification.